Training and Fine-Tuning LLMs

After this lesson, you will understand pre-training, supervised fine-tuning, RLHF, and LoRA: the lifecycle of making an LLM useful.

An LLM goes through three stages: pre-training (raw next-token prediction on internet-scale data), supervised fine-tuning (instruction following), and RLHF (alignment with human preferences). This lesson also covers how to customize a trained model for your own domain.

Prerequisites: The LLM Landscape: Models and Companies

Three training phases

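1. Pre-training: predict the next token over trillions of tokens of text. This costs millions of dollars and takes months. The resulting base model 'knows things' but does not yet reliably follow instructions.
2. Supervised Fine-Tuning (SFT): train on question/answer pairs so the model learns to follow instructions.
3. RLHF (Reinforcement Learning from Human Feedback): humans rank model outputs, a reward model is trained on those rankings, and the LLM is then optimized to produce outputs the reward model scores highly.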
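The objectives behind phases 1 and 3 can be sketched in a few lines. This is a minimal illustration in plain Python, not any lab's actual training code: `next_token_loss` is the average cross-entropy of the true next token, and `preference_loss` is a pairwise (Bradley-Terry style) loss that is commonly used for reward models. The lesson does not specify a loss, so both function names and the choice of pairwise loss are assumptions made for illustration.

```python
import math


def softmax(logits):
    """Turn raw scores over the vocabulary into probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def next_token_loss(logits_per_position, target_ids):
    """Pre-training objective (sketch): average cross-entropy of the
    true next token at each position in the sequence."""
    losses = []
    for logits, target in zip(logits_per_position, target_ids):
        probs = softmax(logits)
        losses.append(-math.log(probs[target]))
    return sum(losses) / len(losses)


def preference_loss(reward_chosen, reward_rejected):
    """Reward-model objective (assumed pairwise form): the loss is small
    when the human-preferred output gets the higher reward score."""
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))


# A model that is uniform over a 4-token vocabulary has loss log(4).
uniform_loss = next_token_loss([[0.0, 0.0, 0.0, 0.0]], [2])

# Ranking the preferred output higher gives a lower loss than the reverse.
good_ranking = preference_loss(2.0, 0.0)
bad_ranking = preference_loss(0.0, 2.0)
```

In real training these losses are computed on batches of tensors and minimized with gradient descent; the arithmetic per example is the same.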

Customizing a model

1. Prompt engineering: the cheapest option, and often enough.
2. RAG (retrieval-augmented generation): inject relevant documents into the prompt.
3. Fine-tuning: train on your own data; a permanent change to the model's capabilities.
4. LoRA / QLoRA: low-rank adaptation, a way to fine-tune cheaply.
5. Full retraining: almost never the right choice.
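To see why low-rank adaptation is cheap, here is a small sketch of the LoRA idea in plain Python. It assumes the standard formulation: the frozen weight matrix W (d_out x d_in) is left untouched, and only two small matrices B (d_out x r) and A (r x d_in) are trained, giving an effective weight of W + (alpha / r) * B @ A. The function names and the example dimensions are hypothetical and chosen only for illustration.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def lora_effective_weight(w, a, b, alpha, r):
    """Return W + (alpha / r) * B @ A without modifying the frozen W."""
    delta = matmul(b, a)  # (d_out x r) @ (r x d_in) -> d_out x d_in
    scale = alpha / r
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


def trainable_params(d_out, d_in, r):
    """Compare parameters updated by full fine-tuning vs. a LoRA adapter."""
    full = d_out * d_in
    lora = r * (d_in + d_out)
    return full, lora


# Tiny worked example: 2x2 frozen weight, rank-1 adapter.
w = [[1.0, 0.0], [0.0, 1.0]]
b = [[1.0], [2.0]]   # d_out x r
a = [[3.0, 4.0]]     # r x d_in
adapted = lora_effective_weight(w, a, b, alpha=2, r=1)

# For one 4096 x 4096 layer with rank 8 (illustrative sizes):
full, lora = trainable_params(4096, 4096, 8)
```

With these illustrative sizes, the adapter trains 65,536 parameters instead of 16,777,216 for that layer, a 256x reduction. QLoRA applies the same idea on top of a quantized base model to cut memory further.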

When to fine-tune

Fine-tune when: you have 1,000+ examples of the desired behavior, prompting is brittle, you need a consistent format or tone. Don't fine-tune to teach facts; RAG is better for that, because retrieved documents can be updated without retraining the model.
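The "use RAG for facts" advice can be sketched as follows. This is a toy example under loud assumptions: real systems typically retrieve with embedding similarity over a vector index, while this sketch uses simple keyword overlap as a stand-in, and the documents, function names, and prompt wording are all hypothetical.

```python
import re


def tokens(text):
    """Lowercased word set, with punctuation stripped."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query, docs, k=2):
    """Rank documents by keyword overlap with the query (a stand-in
    for embedding similarity) and return the top k."""
    query_tokens = tokens(query)
    ranked = sorted(docs, key=lambda d: len(query_tokens & tokens(d)),
                    reverse=True)
    return ranked[:k]


def build_prompt(query, docs, k=2):
    """Inject the retrieved documents into the prompt sent to the LLM."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


# Hypothetical knowledge base; updating a fact means editing a string,
# not retraining a model.
docs = [
    "The refund window is 30 days.",
    "Support hours are 9 to 5 on weekdays.",
    "Shipping takes 3 business days.",
]
query = "What is the refund window?"
prompt = build_prompt(query, docs, k=1)
```

The resulting `prompt` string is what would be passed to the model. The facts live in `docs`, which is why changing them requires no training run.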
