BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

A plain summary, so you can get the gist without reading the full paper.

In 2018, Google released BERT, a language model that reads a sentence in both directions at once. It first learns from huge amounts of unlabeled text, so it can then be adapted to many specific tasks with little extra training.

What it is

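BERT is a Transformer-based model for understanding language. Its name stands for Bidirectional Encoder Representations from Transformers. The key word is bidirectional. When BERT processes a word, it uses the words on both sides, the left context and the right context together, to work out that word's meaning. Left-to-right models such as the original GPT can only draw on the words that come before. BERT uses the encoder part of the Transformer, where self-attention lets every word in the input look at every other word.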
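One way to picture the difference is as an attention mask: a grid saying which positions each word is allowed to look at. The toy sketch below compares a full (bidirectional, BERT-style) mask with a left-to-right (causal) mask. The function names `attention_mask` and `visible_context` are made up for illustration. Real models split text into subword tokens and apply the mask numerically to attention scores, not to lists of words.

```python
def attention_mask(n, bidirectional=True):
    """mask[i][j] is True when position i may attend to position j."""
    if bidirectional:
        # Encoder-style (BERT): every position sees the whole input.
        return [[True] * n for _ in range(n)]
    # Left-to-right (causal) model: position i sees only positions 0..i.
    return [[j <= i for j in range(n)] for i in range(n)]


def visible_context(tokens, i, bidirectional=True):
    """Words that position i can use as context, excluding itself."""
    row = attention_mask(len(tokens), bidirectional)[i]
    return [t for j, (t, ok) in enumerate(zip(tokens, row)) if ok and j != i]


tokens = "he went to the bank to deposit cash".split()
bank = tokens.index("bank")
print(visible_context(tokens, bank, bidirectional=True))
print(visible_context(tokens, bank, bidirectional=False))
```

Only the bidirectional view lets "bank" see "deposit cash", which is what tells you it means a financial bank and not a riverbank.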

It learns in two phases. First, it is pretrained on enormous amounts of ordinary, unlabeled text. Then it is fine-tuned: trained further on a much smaller set of labeled, task-specific examples so it can do a particular job, such as answering questions or classifying reviews.

The core idea

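During pretraining, BERT plays a fill-in-the-blank game called masked language modeling. Some words in each sentence are hidden, and the model learns to predict them from everything around them, on both sides. To do that well, it has to pick up a great deal about grammar, word meaning, and context. The original paper pairs this with a second task, next sentence prediction, where the model guesses whether one sentence really follows another.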
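The hiding step can be sketched in a few lines. The proportions here come from the BERT paper, not from this summary. About 15% of tokens are chosen as prediction targets. Of those, 80% are replaced with a `[MASK]` token, 10% with a random token, and 10% are left unchanged. This sketch makes some simplifying assumptions. It picks each token independently with probability 0.15, where the paper chooses 15% of positions. It also uses whitespace-split words in place of WordPiece subwords. The name `mask_tokens` is hypothetical.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, vocab, rng, select_prob=0.15):
    """Corrupt a token list for masked language modeling.

    Returns (masked, targets): targets[i] holds the original token where
    the model must make a prediction, and None everywhere else.
    """
    masked = list(tokens)
    targets = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= select_prob:
            continue  # not selected: no prediction at this position
        targets[i] = tok  # the model is trained to recover this token
        roll = rng.random()
        if roll < 0.8:
            masked[i] = MASK  # 80% of selected: hide behind [MASK]
        elif roll < 0.9:
            masked[i] = rng.choice(vocab)  # 10%: swap in a random token
        # remaining 10%: keep the original token as-is
    return masked, targets


rng = random.Random(0)
sentence = "the model must guess the hidden words from context".split()
masked, targets = mask_tokens(sentence, vocab=sentence, rng=rng)
print(masked)
print(targets)
```

The random-token and unchanged cases exist because `[MASK]` never appears during fine-tuning. Mixing them in keeps the model from relying on seeing that symbol.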

This split between general pretraining and targeted fine-tuning is a form of transfer learning. You do the expensive, broad learning once, then reuse that knowledge cheaply for many specific tasks. BERT made this the standard way to work in natural language processing (NLP).
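To show the "reuse cheaply" half, here is a deliberately tiny sketch. A fixed encoder turns text into a vector, and only a small classifier head is trained on a handful of labeled reviews. There are two big assumptions here. First, `encode` is a hypothetical, hand-made stand-in for BERT's output vector for its `[CLS]` token; it is not a real model. Second, the encoder is frozen. In BERT's actual fine-tuning, all the pretrained weights are updated along with the new output layer. The head is plain logistic regression trained by gradient descent.

```python
import math

# Hypothetical stand-in for a pretrained encoder. A real BERT would produce a
# learned vector for the [CLS] token; this hand-made feature map just lets
# the sketch run without any model weights.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "awful", "hate", "boring"}


def encode(text):
    words = text.lower().split()
    return [
        float(sum(w in POSITIVE for w in words)),
        float(sum(w in NEGATIVE for w in words)),
        1.0,  # constant bias feature
    ]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fine_tune_head(examples, epochs=200, lr=0.5):
    """Train only a small logistic-regression head; the encoder stays fixed."""
    weights = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for text, label in examples:
            x = encode(text)
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
            # Log-loss gradient for each weight is (p - label) * x_i.
            weights = [w - lr * (p - label) * xi for w, xi in zip(weights, x)]
    return weights


def predict(weights, text):
    """True means the head predicts a positive review."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, encode(text)))) > 0.5


reviews = [("a great film", 1), ("i love it", 1),
           ("boring and bad", 0), ("awful plot", 0)]
head = fine_tune_head(reviews)
print(predict(head, "good acting"), predict(head, "bad ending"))
```

The design point is the division of labor. The costly part, the encoder, is built once and shared. Each new task only needs a few labeled examples and a small head.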

Why it matters

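BERT set new state-of-the-art results on eleven language tasks, including the GLUE benchmark and the SQuAD question-answering datasets. It also showed that a single pretrained model could be adapted to each of them by adding just one small task-specific output layer. That changed the default workflow in the field.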

The pattern of "pretrain once, adapt often" is now everywhere in AI, including in the foundation models you fine-tune or prompt today. For builders, BERT is a clear demonstration of why starting from a model that already understands a lot saves enormous effort.

Key points

Published in 2018 by Google.
Reads context from both directions at once, hence bidirectional.
Pretrains on huge text corpora by filling in hidden words, then fine-tunes per task.
Made transfer learning (pretrain once, adapt often) the NLP default.
Raised accuracy across many language tasks with a single base model.
