Overview
Chinchilla is DeepMind’s 2022 language model that showed a smaller model trained on far more tokens can beat much larger ones. With about 70 billion parameters trained on roughly 1.4 trillion tokens, it set a new compute-optimal recipe, improving benchmark accuracy while cutting inference cost.
Description
Chinchilla is a dense decoder-only Transformer built to test compute-optimal scaling. Instead of pushing parameter count ever higher, DeepMind kept the model moderate in size and dramatically increased the training corpus, landing near an optimal ratio of roughly 20 training tokens per parameter. Trained on roughly 1.4 trillion tokens with around 70 billion parameters, it outperformed larger predecessors such as Gopher (280 billion parameters) on a wide range of benchmarks, while being faster and cheaper to serve at inference. The result reshaped industry practice: for a fixed training budget, allocate more compute to data and less to parameters, scaling the two together, and you get better generalization, stronger few-shot performance, and more practical deployment costs. Chinchilla’s findings influenced later model families that emphasized token budgets, data quality, and extended pretraining over sheer parameter scale.
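To make the recipe concrete, here is a minimal Python sketch of the compute-optimal bookkeeping described above. It assumes the standard approximation that training a dense Transformer costs about 6 FLOPs per parameter per token (C ≈ 6·N·D) and uses the parametric loss fit reported in the Chinchilla paper, L(N, D) = E + A/N^α + B/D^β with E ≈ 1.69, A ≈ 406.4, B ≈ 410.7, α ≈ 0.34, β ≈ 0.28. The function names are illustrative, not from any DeepMind codebase.

```python
# Sketch of Chinchilla-style compute-optimal accounting.
# Assumptions: C ~= 6*N*D training FLOPs, ~20 tokens per parameter,
# and the parametric loss fit from Hoffmann et al. (2022).
# All names are illustrative, not from a DeepMind codebase.

def train_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute: ~6 FLOPs per parameter per token."""
    return 6.0 * n_params * n_tokens

def compute_optimal_split(flop_budget: float, tokens_per_param: float = 20.0):
    """Given a FLOP budget C = 6*N*D and a target D/N ratio r,
    solve C = 6 * N * (r * N) for the optimal N, then D = r * N."""
    n_params = (flop_budget / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

def chinchilla_loss(n_params: float, n_tokens: float) -> float:
    """Parametric loss fit L(N, D) = E + A/N**alpha + B/D**beta,
    using the constants reported in the Chinchilla paper."""
    E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28
    return E + A / n_params**alpha + B / n_tokens**beta

if __name__ == "__main__":
    # Chinchilla itself: ~70B parameters on ~1.4T tokens.
    N, D = 70e9, 1.4e12
    C = train_flops(N, D)  # roughly 5.9e23 FLOPs
    print(f"Budget: {C:.2e} FLOPs, {D / N:.0f} tokens/param")

    # Re-deriving the optimal split for that same budget lands
    # back near 70B parameters / 1.4T tokens, as expected.
    n_opt, d_opt = compute_optimal_split(C)
    print(f"Optimal: {n_opt:.2e} params, {d_opt:.2e} tokens")
    print(f"Predicted loss: {chinchilla_loss(n_opt, d_opt):.3f}")
```

The key design point the sketch illustrates: because C ≈ 6·N·D, a fixed token-per-parameter ratio makes both N and D scale as the square root of the compute budget, which is why doubling compute should roughly split between more parameters and more data rather than going entirely to model size.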
About DeepMind
DeepMind is an artificial intelligence research company, a subsidiary of Alphabet, specializing in machine learning research and its applications.
Industry: Research Services
Company Size: 501-1000
Location: London, GB