From base model to assistant · Oslo Vibe Coding

The takeaway

The assistant you chat with is a statistical simulation of a helpful, knowledgeable human labeler, built by training the base model on example conversations that labelers wrote by hand following company guidelines. It is not a person and it is not a database. It is a statistical average of what skilled labelers would write. Read its answers that way.

What the base model is

After the pre-training stage covered earlier in this series, you get a base model. Karpathy calls it an internet document simulator. Feed it some tokens and it continues them the way internet text tends to continue, sampling one token at a time from the statistical patterns it absorbed. Its knowledge sits compressed in a few hundred billion parameters, a lossy zip of the internet that it recalls vaguely rather than looks up.

The base model is genuinely useful, but it is not an assistant. Ask a base model "what is 2 plus 2" and it will not tell you it is 4 and ask what else you need. It treats your question as a prefix and keeps writing, because that is what a web page would do. Karpathy shows it drifting into philosophy or simply generating more questions. It is a very expensive autocomplete.

You can coax useful behavior out of a base model with clever prompting. A few-shot prompt with ten English-Korean pairs gets it translating. Wrapping a prompt to look like a transcript of a helpful AI talking to a human gets it to play along. These tricks work, but they are fragile. To get something like ChatGPT, you need a second stage.

The base model is a very expensive autocomplete that dreams internet pages.

Post-training swaps the dataset

The second stage is called post-training, and the first step in it is supervised fine-tuning. The mechanics are almost boring. You take the finished base model and keep training it with the exact same algorithm. The only thing that changes is the data. You throw out the pile of internet documents and substitute a dataset of conversations.

This is cheap compared to pre-training. Pre-training can run for roughly three months across thousands of computers and cost millions of dollars. Post-training is more like three hours, because the conversation dataset is far smaller than the whole internet. The heavy compute already happened. This stage adjusts the model's behavior on top of the knowledge it already has.

The model adapts fast. Trained on enough example conversations, it stops behaving like a document simulator and starts behaving like a participant in a chat. It learns the statistics of how an assistant responds to a human question. That new behavior is layered onto everything it already knew from the internet.

The conversation format

Everything a model touches has to become a sequence of tokens, including conversations. So we need an agreed way to encode who said what. Karpathy compares it to a network protocol: precise rules everyone follows so a structured object turns into a flat sequence and back again.

In practice the encoding uses special tokens that mark the turns. A turn begins with a token like the one OpenAI writes as im_start, followed by a tag for whose turn it is (user or assistant), a separator, the actual text of the message, and a closing token. These special tokens are new. They did not exist during pre-training. They get introduced in post-training precisely so the model can learn where a user's turn ends and the assistant's turn begins.

Once the conversation is flattened into tokens, nothing about training is new. It is the same next-token prediction as before. At inference time, ChatGPT builds the context up to the point where the assistant's turn should start, then samples tokens to fill in the reply. The exact format differs between models and is still a bit of a wild west, but the idea is constant: conversations become one-dimensional token sequences so the old machinery still applies.

Special tokens mark the start and end of each turn and tag it as user or assistant.
These tokens are added in post-training; the base model never saw them.
A conversation becomes a flat token sequence, so training is still just next-token prediction.
At inference, the model is handed the context and asked to complete the assistant's turn.

Where the conversations come from

The conversations are written by people. OpenAI's 2022 InstructGPT paper described hiring human contractors whose job was to invent prompts and then write the ideal assistant response to each one. "List five ideas for how to regain enthusiasm for my career." "What are the top ten science fiction books I should read next." A person wrote both the question and the model answer.

How does a labeler know what counts as an ideal answer? The company gives them labeling instructions. At a high level the instruction is to be helpful, truthful, and harmless, and to refuse requests the company does not want the assistant to handle. In reality these documents run to hundreds of pages, and labelers study them professionally. The assistant's personality is not coded in software. It is programmed by example, through the answers labelers write under those guidelines.

The state of the art has shifted since 2022. Labelers rarely write every answer from scratch now. They lean on existing language models to draft responses and then edit them. Modern datasets like UltraChat are largely synthetic with some human involvement, spanning millions of conversations across a huge range of topics. The seed is still humans following labeling instructions, but models now help produce the bulk of the data.

A simulation of a labeler

This is Karpathy's central mental model, and it changes how you read every answer. When you ask ChatGPT something, you are not talking to a magical intelligence that went and researched your question. You are getting a statistical simulation of the kind of human labeler the company hired. The model imitates what a labeler would have written in that situation.

The labeler is not a random person off the internet. These companies hire skilled people, and for questions about code or medicine the labelers behind those conversations tend to be educated experts. So you are talking to an instantaneous simulation of a fairly skilled labeler, averaged across many of them. Not one specific person. A statistical average of the labelers.

Take his example: ask for the top five landmarks to see in Paris. The model did not rank the world's landmarks with infinite intelligence. If that exact question sat in the post-training dataset, you get something close to what a labeler wrote after twenty minutes of their own research. If it did not, the answer is more emergent: the model combines its pre-training knowledge of what Paris landmarks people talk about with the labeler style it learned in post-training, and produces an imitation of what such a labeler would say.

You are not talking to a magical AI. You are getting a statistical simulation of a labeler that was hired by OpenAI.

System prompts and what this means for you

The persona you meet is a default set by post-training, not the only one possible. A system prompt sits at the front of the conversation and can shift the model into a different role, the same way the base model took on a role when it was prompted to look like a chat transcript. It is still the same simulator underneath, nudged toward a different character.

Reading the model as a simulation of a labeler makes its strengths and failures legible. It answers with the confident tone of a labeler because labelers wrote confident answers, which is exactly why a model will sometimes state something false with total assurance. It does not consult a database and it does not verify facts. It writes what a knowledgeable person plausibly would write.

The practical move is to treat an answer as a well-informed draft from a skilled generalist, not a lookup from an oracle. It is often right, especially on common topics the labelers and the internet cover well. It is neither a person with beliefs nor a search engine with citations. Knowing which of those two things you are dealing with, and knowing it is neither, is most of what it takes to use these tools well.

The default persona comes from post-training; a system prompt can shift it into another role.
The confident tone is learned from labelers, which is why wrong answers can still sound sure.
The model writes what a knowledgeable person would write; it does not look facts up.
Read answers as skilled drafts to verify, not authoritative lookups to trust.

Watch the full 3.5-hour video

Read it, then build it with people.

Bring this to a free Oslo Vibe Coding drop-in and put it to work with people around you.

RSVP for the next session New to this? Start here