# Curriculum Learning for Natural Language Understanding

**Authors**: Xu et al.

**Venue**: ACL 2020

## Idea

Given a training dataset for a natural language understanding model, the authors aim
to automate construction of a curriculum, i.e. an easy-to-hard ordering of the
training data. To do this, they propose the following:

- Split the dataset into N random, equal-sized subsets
- Train N models, one per subset
- For each model, feed it the subsets used to train the other models
and measure its performance (e.g. accuracy, F1, MSE) on each datum. Each datum therefore gets
N-1 scores, one from each of the N-1 models that did not see it during training.
- For each datum, average the N-1 scores, then sort the data by average score
- Train a new model from scratch, starting with the easiest data and adding harder data over time
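
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: `cross_review_order`, `growing_stages`, `train_fn`, and `score_fn` are hypothetical names, the score is assumed to be "higher is easier" (so an error metric such as MSE would need to be negated), and the equal-fraction pacing in `growing_stages` is an assumption, since these notes only say that data is added over time.

```python
import random
from statistics import mean
from typing import Any, Callable, Iterator, List, Sequence


def cross_review_order(
    data: Sequence[Any],
    n_splits: int,
    train_fn: Callable[[List[Any]], Any],   # hypothetical: subset -> trained model
    score_fn: Callable[[Any, Any], float],  # hypothetical: (model, datum) -> score
    seed: int = 0,
) -> List[int]:
    """Return indices into `data`, sorted from easiest to hardest."""
    if n_splits < 2:
        raise ValueError("need at least 2 subsets so every datum has a reviewer")

    # Step 1: random split into N subsets whose sizes differ by at most one.
    indices = list(range(len(data)))
    random.Random(seed).shuffle(indices)
    subsets = [indices[k::n_splits] for k in range(n_splits)]

    # Step 2: train one model per subset.
    models = [train_fn([data[i] for i in subset]) for subset in subsets]

    # Steps 3-4: score each datum with the N-1 models that never saw it,
    # then average those scores.
    avg_score = {}
    for k, subset in enumerate(subsets):
        reviewers = [m for j, m in enumerate(models) if j != k]
        for i in subset:
            avg_score[i] = mean([score_fn(m, data[i]) for m in reviewers])

    # Assumption: higher score means easier, so sort in descending order.
    return sorted(avg_score, key=lambda i: avg_score[i], reverse=True)


def growing_stages(order: List[int], n_stages: int) -> Iterator[List[int]]:
    """Step 5 (assumed pacing): yield growing easiest-first prefixes of `order`."""
    for stage in range(1, n_stages + 1):
        # Integer ceiling of len(order) * stage / n_stages.
        cutoff = (len(order) * stage + n_stages - 1) // n_stages
        yield order[:cutoff]
```

A training loop would then iterate over `growing_stages(order, n_stages)` and train the fresh model on each stage's indices in turn, so the final stage covers the full dataset.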

## Results

On SQuAD and NewsQA (both reading comprehension), a model trained with the curriculum
performs better at the end of training than one trained without it.

They make a similar claim on GLUE.

Bizarrely, even N=2 seemed to deliver almost as much benefit as N=10,
and performance did not improve monotonically as N increased.

## Notes

- I didn’t see any learning curves, which would have helped show how the curriculum affects training.