Correlating and Predicting Human Evaluations of Language Models from Natural Language Processing Benchmarks
Summary
Predicting human evaluations of language models from NLP benchmark scores.