Training Compute-Optimal LLMs (Chinchilla)

A plain summary, so you can get the gist here without leaving.

In 2022, DeepMind found that many large models had been built too big for the amount of data they were trained on, and that a smaller model fed more data can do better for the same training budget.

What it is

This work, which produced a model nicknamed Chinchilla, revisits how to spend a fixed training budget. The two main things you can scale are model size (the number of parameters) and data (the amount of text the model reads, usually counted in tokens). The question is how to balance them.

DeepMind's answer challenged the prevailing habit. Many earlier models had grown very large without being trained on a proportionate amount of data. Chinchilla showed that rebalancing toward more data, with a smaller model, put the same compute to better use.
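To make "not enough data for their size" concrete, here is a minimal sketch that compares training tokens per parameter for a few well-known models. The parameter and token counts are approximate publicly reported figures, not numbers taken from this summary, and the helper name tokens_per_parameter is made up for illustration.

```python
# Approximate, publicly reported figures: (parameters, training tokens).
# These are assumptions added for illustration, not values from this summary.
MODELS = {
    "GPT-3": (175e9, 300e9),
    "Gopher": (280e9, 300e9),
    "Chinchilla": (70e9, 1.4e12),
}


def tokens_per_parameter(params: float, tokens: float) -> float:
    """Return how many training tokens the model saw per parameter."""
    return tokens / params


for name, (params, tokens) in MODELS.items():
    ratio = tokens_per_parameter(params, tokens)
    print(f"{name:<11} {ratio:5.1f} training tokens per parameter")
```

With these figures, GPT-3 and Gopher land below 2 tokens per parameter, while Chinchilla sits at 20, which is the imbalance the paper set out to correct.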

The core idea

For a given amount of compute, there is a sweet spot between size and data. Going too big without enough text wastes the budget, because the model has more capacity than it has examples to learn from.

Chinchilla, a 70-billion-parameter model trained on about 1.4 trillion tokens, outperformed considerably larger models trained on less data, including the 280-billion-parameter Gopher, at a comparable compute budget. The lesson is that data and size should grow together in a balanced way, not size alone.
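The sweet spot can be sketched with two common approximations, neither of which is stated in this summary: training compute is roughly 6 FLOPs per parameter per token, and a compute-optimal run uses roughly 20 tokens per parameter. Both constants are assumptions here, and the function name compute_optimal_split is hypothetical. Treat the result as a back-of-the-envelope estimate, not the paper's fitted scaling law.

```python
import math

# Assumed rules of thumb (not from this summary):
FLOPS_PER_PARAM_PER_TOKEN = 6.0  # training compute C is roughly 6 * N * D
TOKENS_PER_PARAM = 20.0          # compute-optimal D is roughly 20 * N


def compute_optimal_split(flops_budget: float,
                          tokens_per_param: float = TOKENS_PER_PARAM):
    """Estimate (parameters, tokens) for a training budget given in FLOPs.

    Substituting D = r * N into C = 6 * N * D gives C = 6 * r * N**2,
    so N = sqrt(C / (6 * r)) and D = r * N.
    """
    if flops_budget <= 0:
        raise ValueError("flops_budget must be positive")
    params = math.sqrt(flops_budget / (FLOPS_PER_PARAM_PER_TOKEN * tokens_per_param))
    tokens = tokens_per_param * params
    return params, tokens


# Illustrative budget chosen so the estimate lands on a Chinchilla-sized run.
params, tokens = compute_optimal_split(5.88e23)
print(f"parameters: {params:.3g}, tokens: {tokens:.3g}")
```

Under these assumptions, both parameters and tokens grow with the square root of the budget, so quadrupling compute doubles each of them. That is one way to read "size and data should grow together."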

Why it matters

This reshaped how teams plan training runs. Smaller, well-fed models are cheaper to run once training is done, because every generated token passes through fewer parameters, and they can match or beat bloated ones. That matters for both cost and accessibility.
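A rough sense of the savings after training, as a sketch under two stated assumptions that do not come from this summary: a forward pass costs about 2 FLOPs per parameter per generated token, and weights are stored at 2 bytes per parameter (16-bit precision). The function names are hypothetical, and real serving costs also depend on context length, batching, and hardware.

```python
def approx_inference_flops_per_token(params: float) -> float:
    """Assumed rule of thumb: about 2 FLOPs per parameter per generated token."""
    return 2.0 * params


def approx_weight_memory_gb(params: float, bytes_per_param: float = 2.0) -> float:
    """Memory needed just to hold the weights, in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


small, large = 70e9, 280e9  # parameter counts of a Chinchilla-sized and a Gopher-sized model

flops_ratio = approx_inference_flops_per_token(large) / approx_inference_flops_per_token(small)
print(f"per-token compute ratio (large / small): {flops_ratio:.1f}x")
print(f"weights, small model: {approx_weight_memory_gb(small):.0f} GB")
print(f"weights, large model: {approx_weight_memory_gb(large):.0f} GB")
```

Because both estimates are linear in parameter count, a model with a quarter of the parameters needs about a quarter of the per-token compute and weight memory.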

For builders, it is a reminder that more parameters are not automatically better. Thinking about the balance between model size and training data leads to models that are both stronger and more efficient to use.

Key points

Published in 2022 by DeepMind.
Asks how to split a fixed budget between model size and training data.
Found that many big models were undertrained, with too little data for their size.
A smaller model with more data beat larger models at similar compute.
Size and data should scale together, and the smaller models that result are also cheaper to run.

