After this lesson, you will be able to explain how LLMs work: tokens, transformers, context windows, and why they seem intelligent.
Large Language Models (LLMs) predict the next token given the previous tokens. That simple task, performed at scale with the transformer architecture, produces systems like ChatGPT. This lesson demystifies what's under the hood.
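LLMs see text as tokens: sub-word units. 'Hello world' is typically 2 or 3 tokens, depending on the tokenizer. A long, rare word like 'antidisestablishmentarianism' is split into many. Tokens are the atomic unit of LLM input and output, and they are what you pay for in API pricing.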
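To make sub-word splitting concrete, here is a minimal sketch of a toy tokenizer. It uses greedy longest-match against a tiny hand-written vocabulary, which is an assumption for illustration only: production tokenizers learn their vocabulary from data (for example with byte-pair encoding) and will split the same text differently. The names `TOY_VOCAB`, `toy_tokenize`, and `estimate_cost`, and the price used below, are all hypothetical.

```python
# Hypothetical vocabulary; real tokenizers learn tens of thousands of pieces.
TOY_VOCAB = {"Hello", " world", "anti", "dis", "establish", "ment", "arian", "ism"}


def toy_tokenize(text, vocab=TOY_VOCAB):
    """Split text by greedy longest-match; unknown characters become 1-char tokens."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate piece starting at position i first.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No vocabulary piece matched: fall back to a single character.
            tokens.append(text[i])
            i += 1
    return tokens


def estimate_cost(num_tokens, usd_per_million_tokens):
    """Billing is per token, so cost scales linearly with token count."""
    return num_tokens * usd_per_million_tokens / 1_000_000


print(toy_tokenize("Hello world"))
print(toy_tokenize("antidisestablishmentarianism"))
print(estimate_cost(1_000_000, 2.0))  # 2.0 USD/million is a made-up price
```

With this toy vocabulary, the short phrase becomes two tokens while the long word becomes six, which is why rare or long words cost more than their character count suggests.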
1. Your prompt is tokenized.
2. Tokens flow through transformer layers.
3. The last layer outputs a probability distribution over the vocabulary.
4. Pick one token from that distribution (greedy, top-k, or top-p sampling).
5. Append it to the sequence and repeat for the next token.
6. Stop at end-of-sequence or max tokens.
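The steps above can be sketched as a short loop. This is a minimal illustration, not how any particular model is implemented: `toy_logits` is a hypothetical stand-in for the transformer layers (it simply favors the next entry in a five-word vocabulary), and the prompt is assumed to be tokenized already, so `generate` takes token IDs directly.

```python
import math
import random

VOCAB = ["<eos>", "the", "cat", "sat", "down"]  # hypothetical 5-token vocabulary


def toy_logits(token_ids):
    """Stand-in for the transformer: scores the next vocabulary entry highest."""
    nxt = (token_ids[-1] + 1) % len(VOCAB)
    return [3.0 if i == nxt else 0.0 for i in range(len(VOCAB))]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pick_token(probs, strategy="greedy", k=2, p=0.9, rng=random):
    """Choose the next token ID with greedy, top-k, or top-p selection."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if strategy == "greedy":
        return ranked[0]  # always the single most likely token
    if strategy == "top_k":
        keep = ranked[:k]  # the k most likely tokens
    elif strategy == "top_p":
        keep, cumulative = [], 0.0
        for i in ranked:  # smallest set whose probability mass reaches p
            keep.append(i)
            cumulative += probs[i]
            if cumulative >= p:
                break
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    # Sample among the kept tokens, weighted by their probabilities.
    return rng.choices(keep, weights=[probs[i] for i in keep], k=1)[0]


def generate(prompt_ids, max_new_tokens=10, strategy="greedy", eos_id=0):
    """Repeat: score, pick, append. Stop at end-of-sequence or the token limit."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        probs = softmax(toy_logits(ids))
        next_id = pick_token(probs, strategy)
        if next_id == eos_id:
            break
        ids.append(next_id)
    return ids


print(" ".join(VOCAB[i] for i in generate([1])))
```

Greedy selection is deterministic, while top-k and top-p restrict sampling to the most likely tokens and then draw at random, so repeated runs can differ.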
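The context window is the total number of tokens the model can see at once (prompt + response). GPT-4 Turbo: 128K. Claude: 200K. Gemini 1.5: 1M+. Larger windows let you fit longer documents in one shot, but each generated token costs more compute because it attends to everything in the window.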
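Because the prompt and the response share the same window, a long prompt leaves less room for the answer. The sketch below shows that arithmetic. The function names are hypothetical, and the window sizes passed in are just the round figures quoted in this lesson, so check your provider's documentation for actual limits.

```python
def remaining_response_budget(prompt_tokens, context_window):
    """Tokens left for the response once the prompt occupies part of the window."""
    if prompt_tokens < 0 or context_window <= 0:
        raise ValueError("token counts must be non-negative and the window positive")
    return max(0, context_window - prompt_tokens)


def fits_in_context(prompt_tokens, max_response_tokens, context_window):
    """True if the prompt plus the requested response length fit in the window."""
    return prompt_tokens + max_response_tokens <= context_window


# A 120K-token document in a 128K window leaves 8K tokens for the answer.
print(remaining_response_budget(120_000, 128_000))
print(fits_in_context(120_000, 10_000, 128_000))
print(fits_in_context(120_000, 10_000, 200_000))
```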
Think about how text is represented inside the model.