BERT is a language model whose architecture consists of a stack of self-attention (Transformer encoder) layers, pretrained on masked language modeling (MLM) and next sentence prediction (NSP) tasks.
Fine-tuning is commonly done by adding a task-specific linear layer on top of BERT's final layer and training on the downstream task.
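To make this concrete, here is a minimal sketch of attaching a linear classification head to BERT's final layer, assuming PyTorch and the Hugging Face `transformers` library; the checkpoint name, `num_labels`, and the choice of the [CLS] token representation are illustrative assumptions, not details fixed by the text above.

```python
import torch
from transformers import BertModel, BertTokenizer

class BertClassifier(torch.nn.Module):
    def __init__(self, num_labels: int = 2):
        super().__init__()
        # Pretrained BERT encoder: a stack of self-attention layers.
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        # Task-specific linear layer added on top of BERT's final layer.
        self.classifier = torch.nn.Linear(self.bert.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        # Final-layer representation of the [CLS] token feeds the linear head.
        cls_repr = outputs.last_hidden_state[:, 0]
        return self.classifier(cls_repr)

# Example forward pass; in practice all parameters are fine-tuned end to end.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertClassifier(num_labels=2)
batch = tokenizer(["an example sentence"], return_tensors="pt", padding=True)
logits = model(batch["input_ids"], batch["attention_mask"])
```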