Note on Fwd: Import AI 317: DeepMind Speeds Up Language Model Sampling; Voice Cloning Tech Gets Abused; More Scaling Laws for RL via Josh Beckman

Use a small model to generate a ‘draft’ output, then use a larger and smarter model to score the ‘draft’, then use a rejection sampling scheme to accept the tokens on which the small and large models agree.
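The draft-then-verify loop described above can be sketched in a few lines of Python. This is a toy illustration, not DeepMind's implementation: `draft_probs` and `target_probs` are hypothetical stand-ins for the small and large models, and the four-token vocabulary is an assumption. The accept rule (keep a drafted token with probability min(1, p/q), otherwise resample from the positive part of p - q) is the usual formulation of speculative sampling; the note itself only says "a rejection sampling scheme".

```python
import random

VOCAB = 4  # toy vocabulary size (assumption for illustration)

def draft_probs(context):
    # Hypothetical cheap draft model: uniform over the vocabulary.
    return [1.0 / VOCAB] * VOCAB

def target_probs(context):
    # Hypothetical expensive target model: favors (last token + 1) % VOCAB.
    last = context[-1] if context else 0
    probs = [0.1] * VOCAB
    probs[(last + 1) % VOCAB] = 0.7
    return probs

def sample(weights, rng):
    # Draw one token index in proportion to the (possibly unnormalized) weights.
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]

def speculative_step(context, k, rng):
    """Draft k tokens with the small model, then verify them with the large one.

    Returns between 1 and k + 1 new tokens.
    """
    # 1. The draft model proposes k tokens autoregressively.
    drafted, draft_dists = [], []
    ctx = list(context)
    for _ in range(k):
        q = draft_probs(ctx)
        tok = sample(q, rng)
        drafted.append(tok)
        draft_dists.append(q)
        ctx.append(tok)

    # 2. The target model scores each drafted position. A real system would
    #    do this in one batched forward pass; here it is a plain loop.
    accepted = []
    ctx = list(context)
    for tok, q in zip(drafted, draft_dists):
        p = target_probs(ctx)
        if rng.random() < min(1.0, p[tok] / q[tok]):
            accepted.append(tok)  # the models agree closely enough: keep it
            ctx.append(tok)
        else:
            # Rejected: resample from the positive part of (p - q) and stop.
            residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
            accepted.append(sample(residual, rng))
            return accepted

    # 3. Every draft was accepted: take one extra token from the target model.
    accepted.append(sample(target_probs(ctx), rng))
    return accepted
```

Calling `speculative_step([0], 4, random.Random(0))` yields between one and five new tokens per round. The speedup comes from the large model checking several drafted tokens at once instead of generating them one at a time, while the accept-or-resample rule keeps the output distributed as if the large model had sampled it directly.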

In tests, the DeepMind authors find that a draft model can give them speedups ranging from 1.92X (on a summarization benchmark called XSum) to 2.46X (on a code generation task called HumanEval).

✉️
FROM:
Josh Beckman
Fwd: Import AI 317: DeepMind Speeds Up Language Model Sampling; Voice Cloning Tech Gets Abused; More Scaling Laws for RL

Reference

ai
performance
scalability
2023, February 07, Tuesday