Correlating and Predicting Human Evaluations of Language Models from Natural Language Processing Benchmarks
Abstract
Summary
Predicting human evaluations of language models from NLP benchmark scores.
