Training Language Models to Follow Instructions (InstructGPT)

A plain-language summary of the paper.

In 2022, OpenAI showed how to make a language model follow instructions more reliably by fine-tuning it on human preferences. The method, reinforcement learning from human feedback (RLHF), became the basis for ChatGPT.

What it is

A raw language model is good at predicting text, but that is not the same as being helpful. It might ignore your instruction or answer in an unhelpful way. InstructGPT addresses this by adding a layer of human feedback to the training.

People review the model's answers and indicate which ones are better. The model is then adjusted to produce more of the kind of answers humans prefer. This technique is known as reinforcement learning from human feedback, or RLHF.

The core idea

Rather than learning only from raw text, the model also learns from human judgments about quality. Reviewers rank responses, those rankings train a separate reward model that predicts what people prefer, and that reward model then guides the language model toward more helpful, honest, and on-target replies.
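To make the ranking step concrete, here is a minimal sketch of how pairwise human preferences can train a scoring model. It is an illustration under stated assumptions, not the paper's implementation: the real reward model is a neural network that reads text, while this one is a tiny linear model over two made-up features (`follows_instruction`, `rambles`). The loss is a pairwise logistic loss, -log(sigmoid(score of preferred - score of rejected)), the kind commonly used for preference models. All names and data here are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def reward(weights, features):
    """Score a response: a linear stand-in for a learned reward model."""
    return sum(w * f for w, f in zip(weights, features))


def pairwise_loss(weights, preferred, rejected):
    """-log(sigmoid(margin)), written as log(1 + exp(-margin))."""
    margin = reward(weights, preferred) - reward(weights, rejected)
    return math.log1p(math.exp(-margin))


def train(pairs, n_features, lr=0.5, epochs=200):
    """Plain gradient descent on the pairwise loss."""
    weights = [0.0] * n_features
    for _ in range(epochs):
        for preferred, rejected in pairs:
            margin = reward(weights, preferred) - reward(weights, rejected)
            # d(loss)/d(weights) = -(1 - sigmoid(margin)) * (preferred - rejected)
            scale = 1.0 - sigmoid(margin)
            for i in range(n_features):
                weights[i] += lr * scale * (preferred[i] - rejected[i])
    return weights


# Hypothetical features per response: [follows_instruction, rambles].
# Each pair is (response the reviewer preferred, response they rejected).
pairs = [
    ([1.0, 0.0], [0.0, 1.0]),
    ([1.0, 1.0], [0.0, 1.0]),
    ([1.0, 0.0], [1.0, 1.0]),
]

weights = train(pairs, n_features=2)
for preferred, rejected in pairs:
    print(reward(weights, preferred) > reward(weights, rejected))
```

After training, the model scores every preferred response above its rejected counterpart. In RLHF, a score like this is what the reinforcement learning step then tries to maximize.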

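The result is a model that follows instructions far more reliably. A smaller model tuned this way can be more useful than a much larger one trained only to predict text: in the paper, human evaluators preferred outputs from a 1.3-billion-parameter InstructGPT model over those from the 175-billion-parameter GPT-3. Following the instruction is what people care about.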

Why it matters

This alignment recipe is what turned powerful but unwieldy language models into assistants people can talk to. ChatGPT and the wave of chat assistants that followed are built on this idea.

For builders, the takeaway is that capability and helpfulness are different things, and human feedback is a practical way to bridge them. Much of what makes modern AI feel cooperative comes from this step.

Key points

Published in 2022 by OpenAI.
Uses human feedback to teach a model to follow instructions.
The method is reinforcement learning from human feedback (RLHF).
Makes models more helpful, honest, and on-target than raw prediction alone.
The basis for ChatGPT and modern chat assistants.
