Rylan Schaeffer


BERT

BERT is a language model built from a stack of Transformer self-attention (encoder) layers and pretrained on two objectives: masked language modeling (MLM), in which randomly selected tokens are corrupted and the model must recover the originals, and next sentence prediction (NSP), in which the model classifies whether two segments appeared consecutively in the source text.
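The MLM corruption step above can be sketched in plain Python. This is a minimal illustration of the standard 80/10/10 recipe (of the ~15% of selected positions, 80% become `[MASK]`, 10% become a random token, 10% are left unchanged); the tiny `VOCAB` and the helper name `mask_tokens` are made up for the example.

```python
import random

MASK = "[MASK]"
VOCAB = ["the", "cat", "sat", "on", "mat", "dog", "ran"]  # toy vocabulary

def mask_tokens(tokens, mask_prob=0.15, rng=None):
    """BERT-style MLM corruption.

    Selects ~mask_prob of positions; of those, 80% are replaced with
    [MASK], 10% with a random vocabulary token, and 10% kept as-is.
    Returns (corrupted, labels), where labels hold the original token at
    selected positions and None elsewhere (unselected positions are not
    scored by the MLM loss).
    """
    rng = rng or random.Random(0)
    corrupted, labels = list(tokens), [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            labels[i] = tok
            r = rng.random()
            if r < 0.8:
                corrupted[i] = MASK
            elif r < 0.9:
                corrupted[i] = rng.choice(VOCAB)
            # else: keep the original token (the 10% "unchanged" case)
    return corrupted, labels

tokens = ["the", "cat", "sat", "on", "the", "mat"]
corrupted, labels = mask_tokens(tokens, rng=random.Random(42))
```

Only positions with a non-`None` label contribute to the loss; everything else is context the model can attend to freely.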

Fine-tuning is commonly done by adding a task-specific linear layer on top of BERT's final-layer output (often the representation of the special [CLS] token) and training the whole model end-to-end on the downstream task.
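The shape of that added head can be sketched with NumPy. This is not a real fine-tuning loop: the [CLS] embedding here is randomly sampled as a stand-in for the pretrained encoder's output, the hidden size 768 is BERT-base's, and `num_labels = 2` assumes a hypothetical binary classification task.

```python
import numpy as np

hidden_size, num_labels = 768, 2  # BERT-base width; hypothetical 2-class task
rng = np.random.default_rng(0)

# Stand-in for BERT's final-layer [CLS] embedding. In practice this is
# produced by the pretrained encoder, not sampled randomly.
cls_embedding = rng.standard_normal(hidden_size)

# The added task head: a single linear layer, logits = W x + b.
W = rng.standard_normal((num_labels, hidden_size)) * 0.02
b = np.zeros(num_labels)

logits = W @ cls_embedding + b

# Softmax over the logits gives class probabilities.
probs = np.exp(logits - logits.max())
probs /= probs.sum()
```

During fine-tuning, gradients flow through both the new head (`W`, `b`) and the pretrained encoder weights, so the whole network adapts to the downstream task.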