After this lesson, you will be able to explain pre-training, fine-tuning, RLHF, and LoRA, and how they fit into the lifecycle of making an LLM useful.
An LLM goes through three stages: pre-training (raw next-token prediction on internet-scale data), supervised fine-tuning (instruction following), and RLHF (alignment with human preferences). This lesson also covers how to customize a model for your domain.
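1. Pre-training: predict the next token over trillions of tokens. This costs millions of dollars and takes months. The resulting base model 'knows things' but does not yet follow instructions reliably.
2. Supervised Fine-Tuning (SFT): train on question/answer pairs so the model learns to follow instructions.
3. RLHF (Reinforcement Learning from Human Feedback): humans rank model outputs, a reward model is trained on those rankings, and the LLM is then optimized against that reward model.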
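The objectives behind these stages can be sketched in a few lines. This is a minimal illustration, not a training loop: it assumes a toy vocabulary, made-up logits and rewards, and hypothetical function names (`next_token_loss`, `preference_loss`). Pre-training minimizes the negative log-probability of the correct next token, SFT typically reuses the same loss on curated instruction/response pairs, and a reward model is commonly trained with a pairwise (Bradley-Terry style) loss on human rankings.

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def next_token_loss(logits, target_index):
    """Loss for one position: -log p(correct next token)."""
    probs = softmax(logits)
    return -math.log(probs[target_index])

def preference_loss(reward_chosen, reward_rejected):
    """Pairwise loss for one human-ranked pair:
    -log sigmoid(reward_chosen - reward_rejected)."""
    diff = reward_chosen - reward_rejected
    return math.log(1.0 + math.exp(-diff))

# Toy example (assumed values): 4-token vocabulary, model favors token 0.
logits = [2.0, 0.5, 0.1, -1.0]
loss_if_token_0_is_correct = next_token_loss(logits, 0)  # small loss
loss_if_token_3_is_correct = next_token_loss(logits, 3)  # large loss

# The reward model is penalized when it scores the rejected answer higher.
loss_agrees_with_human = preference_loss(2.0, 0.0)
loss_disagrees_with_human = preference_loss(0.0, 2.0)
```

In both cases the loss shrinks as the model assigns more probability (or more reward) to what the data or the human rater preferred.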
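Options for customizing an LLM for your domain:
1. Prompt engineering: cheapest, often enough.
2. RAG (retrieval-augmented generation): inject relevant docs into the prompt.
3. Fine-tuning: train on your data; permanent capability change.
4. LoRA / QLoRA: low-rank adaptation; fine-tune cheaply by training small adapter matrices while the base weights stay frozen (QLoRA also quantizes the frozen base model to 4-bit).
5. Full retraining: almost never the right choice.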
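To see why LoRA is cheap, here is a minimal pure-Python sketch of the low-rank update. It assumes the common formulation W' = W + (alpha / r) * B * A, where W is frozen and only the small matrices A and B are trained. The function names and the layer sizes are illustrative assumptions, not part of any library.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def lora_effective_weight(w, a, b, alpha, r):
    """Return W + (alpha / r) * B @ A.

    w: frozen weight, shape d_out x d_in
    b: trainable,     shape d_out x r
    a: trainable,     shape r x d_in
    """
    delta = matmul(b, a)  # low-rank update, shape d_out x d_in
    scale = alpha / r
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

def trainable_params(d_out, d_in, r):
    """Return (full fine-tuning params, LoRA adapter params) for one layer."""
    return d_out * d_in, r * (d_out + d_in)

# Tiny example with rank r = 1 (assumed values).
W = [[1.0, 2.0], [3.0, 4.0]]
A = [[0.5, -0.5]]
B_init = [[0.0], [0.0]]      # B starts at zero, so the model is unchanged
B_trained = [[1.0], [2.0]]   # pretend values after some training

unchanged = lora_effective_weight(W, A, B_init, alpha=2, r=1)
adapted = lora_effective_weight(W, A, B_trained, alpha=2, r=1)
```

For an assumed 4096 x 4096 layer with r = 8, `trainable_params(4096, 4096, 8)` gives 16,777,216 parameters for full fine-tuning versus 65,536 for the adapter, a 256x reduction for that layer.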
Fine-tune when: you have 1,000+ examples of the desired behavior, prompting is brittle, you need a consistent format or tone. Don't fine-tune to teach facts; RAG is better for that, because retrieved documents can be updated without retraining the model.
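A minimal sketch of the RAG idea shows why it suits factual knowledge: the facts live in documents that are looked up at query time and pasted into the prompt. The word-overlap scoring below is a deliberately simple stand-in (real systems typically use embedding similarity), and the documents, prompt wording, and function names are hypothetical.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into a set of words."""
    return set(re.findall(r"\w+", text.lower()))

def score(query, doc):
    """Toy relevance score: number of words shared by query and document."""
    return len(tokenize(query) & tokenize(doc))

def retrieve(query, docs, k=2):
    """Return the k highest-scoring documents."""
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]

def build_prompt(query, docs, k=2):
    """Inject the retrieved documents into the prompt ahead of the question."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

# Hypothetical knowledge base; updating a fact means editing a string here.
DOCS = [
    "The refund window is 30 days from delivery.",
    "Support hours are 9am to 5pm on weekdays.",
    "Shipping to Canada takes 5 to 7 business days.",
]

prompt = build_prompt("What is the refund window?", DOCS, k=1)
```

The resulting `prompt` string is what would be sent to the model; changing the refund policy only requires changing the document, not the model weights.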