45 min · Intermediate

Training and Fine-Tuning LLMs

After this lesson, you will be able to: explain pre-training, supervised fine-tuning, RLHF, and LoRA, and how they fit into the lifecycle of making an LLM useful.

An LLM goes through three stages: pre-training (raw next-token prediction on internet-scale data), supervised fine-tuning (instruction following), and RLHF (alignment with human preferences). This lesson also covers how to customize a model for your domain.

Prerequisites: The LLM Landscape: Models and Companies

Three training phases

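  1. Pre-training: predict the next token across trillions of tokens. It costs millions of dollars and takes months. The resulting base model 'knows things' but does not reliably follow instructions.
  2. Supervised Fine-Tuning (SFT): train on question/answer pairs so the model learns to follow instructions.
  3. RLHF (Reinforcement Learning from Human Feedback): humans rank model outputs, a reward model is trained on those rankings, and the LLM is then optimized to score well against that reward model.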
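To make "predict next token" concrete, here is a minimal sketch of the loss that pre-training minimizes: the average cross-entropy between the model's predicted distribution and the token that actually came next. The three-token vocabulary, the scores, and the function names are made up for illustration; real training computes the same quantity over huge batches on GPUs.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def next_token_loss(logits_per_position, target_ids):
    """Average cross-entropy between predictions and the true next tokens."""
    losses = []
    for logits, target in zip(logits_per_position, target_ids):
        probs = softmax(logits)
        # Penalize the model by how little probability it gave the true token.
        losses.append(-math.log(probs[target]))
    return sum(losses) / len(losses)

# Hypothetical model scores for 2 positions over a 3-token vocabulary.
logits = [[2.0, 0.5, 0.1], [0.1, 0.2, 3.0]]
targets = [0, 2]  # the tokens that actually came next
loss = next_token_loss(logits, targets)
print(f"loss = {loss:.3f}")
```

The loss shrinks as the model puts more probability on the correct next token. SFT typically uses the same loss, just on curated instruction/response pairs instead of raw text.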

💡 Why RLHF matters

Pre-trained models can produce harmful, false, or unhelpful content. RLHF aligns them with human preferences (helpful, harmless, honest). It's why ChatGPT feels more usable than GPT-3 base.
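One common way to turn human rankings into a training signal is a pairwise preference loss: the reward model should score the output a human preferred higher than the one they rejected. The sketch below assumes that formulation (often called a Bradley-Terry style loss); the function name and reward values are hypothetical, and real pipelines differ in the details.

```python
import math

def preference_loss(reward_chosen, reward_rejected):
    """Pairwise loss: -log(sigmoid(reward_chosen - reward_rejected))."""
    margin = reward_chosen - reward_rejected
    # log(1 + exp(-margin)) is algebraically equal to -log(sigmoid(margin)).
    return math.log1p(math.exp(-margin))

# Hypothetical reward-model scores for two answers to the same prompt.
good_ordering = preference_loss(reward_chosen=2.0, reward_rejected=0.0)
bad_ordering = preference_loss(reward_chosen=0.0, reward_rejected=2.0)
print(good_ordering < bad_ordering)
```

The loss is small when the reward model agrees with the human ranking and large when it disagrees, so minimizing it teaches the reward model the preferences.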

Customizing a model

  1. Prompt engineering: cheapest, often enough.
  2. RAG (retrieval-augmented generation): inject relevant docs into the prompt.
  3. Fine-tuning: train on your data; permanent capability change.
  4. LoRA / QLoRA: low-rank adaptation; fine-tune cheaply.
  5. Full retraining: almost never the right choice.
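Why LoRA is cheap comes down to arithmetic. Instead of updating a full weight matrix W, LoRA freezes W and trains two small matrices A and B whose product forms a low-rank update. The sketch below uses plain Python lists and illustrative sizes (a 4096 x 4096 layer, rank 8); the helper names are hypothetical, and libraries such as Hugging Face PEFT handle this for you in practice.

```python
def lora_param_counts(d_out, d_in, rank):
    """Trainable parameters: full fine-tuning vs. LoRA for one weight matrix."""
    full = d_out * d_in             # every entry of W is trainable
    lora = rank * (d_in + d_out)    # A is rank x d_in, B is d_out x rank
    return full, lora

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def lora_forward(W, A, B, x, alpha, rank):
    """h = W x + (alpha / rank) * B (A x); W stays frozen, only A and B train."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]

full, lora = lora_param_counts(d_out=4096, d_in=4096, rank=8)
print(full, lora, full // lora)  # 16777216 65536 256

# Tiny example: B starts at zero, so the adapted layer initially equals W x.
W = [[1, 0], [0, 1]]
A = [[1, 1]]        # rank 1: shape 1 x 2
B = [[0], [0]]      # shape 2 x 1, zero-initialized
print(lora_forward(W, A, B, x=[3, 4], alpha=2, rank=1))  # [3.0, 4.0]
```

For this layer, LoRA trains 256 times fewer parameters than full fine-tuning. QLoRA goes further by also storing the frozen base weights in a quantized, lower-precision format.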

When to fine-tune

Fine-tune when: you have 1,000+ examples of the desired behavior, prompting is brittle, you need a consistent format or tone. Don't fine-tune to teach facts; RAG is better for that.

💡 RAG vs fine-tuning

RAG = inject knowledge dynamically. Best for facts that change. Fine-tuning = embed behavior. Best for tone/format/structure. Many production apps use both.
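"Inject knowledge dynamically" can be sketched in a few lines: find the documents most relevant to the question and paste them into the prompt. The example below is a toy that scores relevance by shared words; real systems usually use embedding similarity and a vector database, and the documents, function names, and prompt wording here are made up for illustration.

```python
def score(query, doc):
    """Count shared lowercase words; a stand-in for embedding similarity."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query, docs, k=1):
    """Return the k documents that overlap most with the query."""
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]

def build_prompt(query, docs, k=1):
    """Put the retrieved documents into the prompt ahead of the question."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

# Hypothetical knowledge base; update these strings and answers change
# without retraining the model.
docs = [
    "Refunds are processed within 5 business days.",
    "Support is available Monday through Friday.",
]
prompt = build_prompt("How long do refunds take?", docs)
print(prompt)
```

Because the facts live in the documents rather than in the model's weights, updating the knowledge base is enough to change the answers. That is why RAG suits facts that change, while fine-tuning suits tone and format.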
