Curriculum Learning for Language Modeling
Authors: Campos
Venue: arXiv
PDF: https://arxiv.org/pdf/2108.02170.pdf
Idea
A wide variety of curriculum learning (CL) experiments on language modeling show no compelling evidence
that CL improves language modeling results.
Background
- Curriculum learning (CL) has been shown to help models train faster and produce better results in
neural machine translation
Results
- Train ELMo with a variety of curricula on WikiText-2 and WikiText-103, then evaluate on GLUE
- Use Platanios et al.'s competence-based curriculum (CBC): sort training sequences by a difficulty heuristic
such as sentence length or unigram rarity. The model starts with an initial competence scalar that grows over
training, and at each step it may only train on examples whose difficulty is below its current competence
(see the sketch after this list)
- Test 2 baselines: no curriculum, random difficulty
- Test 6 heuristics: sample length, unigram/bigram/trigram entropy, parse tree depth, part-of-speech diversity
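
A minimal sketch of how CBC sampling could work, assuming sentence length as the difficulty heuristic and the
square-root competence schedule from Platanios et al.; the function names, `c0`, and the toy corpus are
illustrative, not the paper's code:

```python
import math
import random

def sequence_difficulty(tokens):
    # Assumed heuristic: sentence length (one of the paper's six difficulty measures).
    return len(tokens)

def build_curriculum(examples):
    # Rank examples by difficulty; an example's rank / N acts as its difficulty CDF,
    # which is compared against the model's competence.
    return sorted(examples, key=sequence_difficulty)

def competence(step, total_steps, c0=0.01):
    # Square-root competence schedule: starts at c0 and reaches 1.0 (full data)
    # by total_steps.
    return min(1.0, math.sqrt(step * (1.0 - c0 ** 2) / total_steps + c0 ** 2))

def sample_batch(ranked, step, total_steps, batch_size=32, c0=0.01):
    # The model may only train on examples whose difficulty CDF is below its
    # current competence, so sample uniformly from the easiest prefix.
    c = competence(step, total_steps, c0)
    cutoff = max(1, int(c * len(ranked)))
    return random.choices(ranked[:cutoff], k=batch_size)

# Hypothetical toy usage:
corpus = [s.split() for s in ["a b", "a b c d", "a", "a b c d e f g"]]
ranked = build_curriculum(corpus)
for step in range(1, 5):
    batch = sample_batch(ranked, step, total_steps=4, batch_size=2)
```

Swapping the difficulty heuristic (e.g., n-gram entropy or parse tree depth) only changes
`sequence_difficulty`; the competence schedule and sampling stay the same.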
Notes
- Not a great paper: one master's student, lots of typos, experiments incomplete and poorly plotted
- LOL: “Seeking to represent natural language, researchers have found language models (LM) with Sesame Street-inspired
names [1] [2] [3] to be incredibly effective methods of producing language representations (LR).”