Pretraining on the test set is all you need
Abstract
We introduce a novel technique to achieve state-of-the-art performance on any benchmark: simply pretrain on the test set. Our method achieves 100% accuracy on all benchmarks tested.
Summary
Satirical paper showing that pretraining on the test set yields perfect benchmark scores.
This satirical paper highlights the problem of data contamination in large language model evaluation by taking the concept to its logical extreme.
The “Method”:
We propose a revolutionary new technique: simply include the test set in pretraining data. Our approach achieves:
- 100% accuracy on MMLU
- 100% accuracy on GSM8K
- 100% accuracy on every other benchmark tested
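Why the "method" works can be illustrated with a toy sketch. The code below is not the paper's implementation: `MemorizingModel`, `accuracy`, and both tiny datasets are hypothetical, invented only to show that a system which has merely stored the test set scores perfectly on it while having learned nothing that transfers to unseen questions.

```python
class MemorizingModel:
    """Hypothetical toy 'model': a lookup table of the examples it has seen."""

    def __init__(self):
        self.memory = {}

    def pretrain(self, examples):
        # "Pretraining" here is nothing more than storing question -> answer.
        for question, answer in examples:
            self.memory[question] = answer

    def predict(self, question):
        # Unseen questions get an empty answer; no generalization happens.
        return self.memory.get(question, "")


def accuracy(model, examples):
    """Fraction of examples whose predicted answer matches exactly."""
    correct = sum(model.predict(q) == a for q, a in examples)
    return correct / len(examples)


# Made-up stand-ins for a benchmark test set and a fresh held-out set.
test_set = [("2 + 2 = ?", "4"), ("Capital of France?", "Paris")]
fresh_set = [("3 + 5 = ?", "8"), ("Capital of Japan?", "Tokyo")]

model = MemorizingModel()
model.pretrain(test_set)  # the "revolutionary technique"

contaminated_score = accuracy(model, test_set)
fresh_score = accuracy(model, fresh_set)
print(f"test set: {contaminated_score:.0%}, fresh set: {fresh_score:.0%}")
```

The gap between the two scores is the point of the satire: the benchmark number says nothing about capability once the test set has leaked into training.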
The Serious Point:
This paper draws attention to the growing problem of benchmark contamination in LLM evaluation:
- Test sets may inadvertently appear in web-scraped pretraining corpora
- Models may memorize benchmark examples rather than learning general capabilities
- The field needs better evaluation practices that are robust to contamination
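One common way to look for the first problem above is to check whether word n-grams from test examples also occur in the pretraining corpus. The following is a minimal sketch of that idea, not a method described in the paper: the function names, the whitespace tokenizer, the window size `n`, and the sample texts are all assumptions chosen for illustration, and a real pipeline would need proper tokenization and an index that scales to a full corpus.

```python
def ngrams(text, n):
    """Set of lowercase word n-grams, using a crude whitespace tokenizer."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def contamination_rate(test_examples, corpus_docs, n=8):
    """Return (fraction flagged, flagged examples).

    A test example is flagged if it shares at least one n-gram
    with any corpus document.
    """
    corpus_ngrams = set()
    for doc in corpus_docs:
        corpus_ngrams |= ngrams(doc, n)

    flagged = [ex for ex in test_examples if ngrams(ex, n) & corpus_ngrams]
    rate = len(flagged) / len(test_examples) if test_examples else 0.0
    return rate, flagged


# Made-up data: one corpus page quotes a test question verbatim.
corpus = [
    "forum post quoting a benchmark: if a train travels 60 miles "
    "in 2 hours what is its speed",
    "an unrelated page about gardening and soil quality in spring",
]
test_examples = [
    "If a train travels 60 miles in 2 hours what is its speed",
    "How many apples remain after Sam eats three of his ten apples",
]

rate, flagged = contamination_rate(test_examples, corpus, n=5)
print(f"{rate:.0%} of test examples overlap with the corpus")
```

Exact n-gram matching only catches verbatim leakage; paraphrased or translated copies of a test item would pass this check, which is one reason freshly written held-out sets remain valuable.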
Impact: This work sparked important discussions about evaluation integrity and the need for held-out, uncontaminated test sets.
