BiTree

© 2026 BiTree. All rights reserved.
50 min · Intermediate

What are Large Language Models?

After this lesson, you will be able to explain how LLMs work: tokens, transformers, context windows, and why they seem intelligent.

Large Language Models predict the next token given the previous tokens. That simple task, done at scale with the transformer architecture and followed by fine-tuning to follow instructions, produces systems like ChatGPT. This lesson demystifies what's under the hood.

Prerequisites: Neural Networks and Deep Learning

Tokens, not words

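LLMs see text as tokens: sub-word units rather than whole words or single characters. 'Hello world' might be 2 or 3 tokens, depending on the tokenizer. 'antidisestablishmentarianism' is many. Tokens are the atomic unit of LLM input and output, and they are what you pay for in API pricing.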
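To make the idea of sub-word units concrete, here is a minimal sketch of a greedy longest-match tokenizer. The vocabulary `VOCAB` and the function `tokenize` are hypothetical and chosen only for illustration; production tokenizers (for example, byte-pair encoding) learn their vocabulary from data and will split text differently.

```python
# Hypothetical toy vocabulary, chosen only for illustration.
VOCAB = ["Hello", " world", "anti", "dis", "establish", "ment", "arian", "ism"]

def tokenize(text, vocab=VOCAB):
    """Greedy longest-match tokenization.

    At each position, take the longest vocabulary entry that matches.
    Characters not covered by the vocabulary become single-character tokens.
    """
    tokens = []
    i = 0
    while i < len(text):
        candidates = (v for v in vocab if text.startswith(v, i))
        match = max(candidates, key=len, default=text[i])
        tokens.append(match)
        i += len(match)
    return tokens

print(tokenize("Hello world"))
# ['Hello', ' world']
print(tokenize("antidisestablishmentarianism"))
# ['anti', 'dis', 'establish', 'ment', 'arian', 'ism']
```

With this toy vocabulary the long word costs six tokens while the two-word greeting costs two, which is why token counts, not word counts, drive API cost.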

💡 The transformer

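Introduced in the 2017 paper 'Attention Is All You Need', the transformer largely replaced RNNs for language tasks. Key idea: attention. Every token can attend to every other token, weighted by learned relevance. Because all positions are processed at once rather than one after another, training parallelizes well and scales efficiently on GPUs.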
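The attention step can be sketched in a few lines of plain Python. This is a simplified illustration of scaled dot-product attention, not a full transformer layer: in a real model the queries, keys, and values are learned projections of the token embeddings, there are multiple heads, and LLMs add a mask so tokens only attend to earlier positions. The function names here are our own.

```python
import math

def softmax(xs):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors.

    For each query, score every key, convert the scores to weights,
    and return the weighted average of the value vectors.
    """
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
    return outputs

# Three toy "token" vectors attending to each other (self-attention).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in attention(x, x, x):
    print([round(v, 3) for v in row])
```

Each output row is a blend of all the input vectors, with more weight on the inputs whose keys best match that row's query.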

How an LLM generates

  1. Your prompt is tokenized.

  2. Tokens flow through transformer layers.

  3. Last layer outputs probabilities over the vocabulary.

  4. Sample one token (greedy, top-k, top-p).

  5. Append to prompt, repeat for next token.

  6. Stop at end-of-sequence or max tokens.
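The loop above can be sketched as follows. This is a toy under stated assumptions: `toy_logits` is a hypothetical stand-in for the transformer (it simply favors the next entry in a four-word vocabulary), and the prompt is given directly as token IDs. The sampling strategies are simplified versions of greedy, top-k, and top-p selection.

```python
import math
import random

VOCAB = ["the", "cat", "sat", "<eos>"]  # hypothetical toy vocabulary

def toy_logits(token_ids):
    """Stand-in for the transformer: favors the vocabulary entry after the last token."""
    nxt = (token_ids[-1] + 1) % len(VOCAB)
    return [3.0 if i == nxt else 0.0 for i in range(len(VOCAB))]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample(probs, strategy="greedy", k=2, p=0.9):
    """Pick one token ID from a probability distribution."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if strategy == "greedy":
        return ranked[0]              # always the most likely token
    if strategy == "top-k":
        keep = ranked[:k]             # the k most likely tokens
    elif strategy == "top-p":
        keep, total = [], 0.0         # smallest set whose probability reaches p
        for i in ranked:
            keep.append(i)
            total += probs[i]
            if total >= p:
                break
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return random.choices(keep, weights=[probs[i] for i in keep])[0]

def generate(prompt_ids, max_tokens=10, strategy="greedy"):
    ids = list(prompt_ids)
    eos = VOCAB.index("<eos>")
    for _ in range(max_tokens):            # stop at max tokens...
        probs = softmax(toy_logits(ids))   # probabilities over the vocabulary
        nxt = sample(probs, strategy)      # sample one token
        ids.append(nxt)                    # append and repeat
        if nxt == eos:                     # ...or at end-of-sequence
            break
    return ids

print([VOCAB[i] for i in generate([0])])
# ['the', 'cat', 'sat', '<eos>']
```

Greedy decoding is deterministic, which is why the output above is fixed; top-k and top-p draw randomly from the retained tokens, so repeated runs can differ.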

Context window

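The context window is the total number of tokens the model sees at once (prompt + response). GPT-4 Turbo: 128K. Claude 3: 200K. Gemini 1.5 Pro: 1M+. Larger windows fit longer documents in one shot, but cost more compute, because standard attention compares every token with every other token in the window.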
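Because prompt and response share one window, applications often trim old input to leave room for the reply. Below is a minimal sketch of that budgeting. The helper names are hypothetical, and `count_tokens` counts whitespace-separated words as a crude stand-in for a real tokenizer, so the numbers are illustrative only.

```python
def count_tokens(text):
    """Crude stand-in: one token per whitespace-separated word (real tokenizers differ)."""
    return len(text.split())

def fit_to_window(messages, window, reserve_for_response):
    """Drop the oldest messages until prompt + reserved response fits in the window."""
    budget = window - reserve_for_response
    if budget < 0:
        raise ValueError("response reservation exceeds the context window")
    kept = list(messages)
    while kept and sum(count_tokens(m) for m in kept) > budget:
        kept.pop(0)  # discard the oldest message first
    return kept

history = ["a b c", "d e", "f"]
print(fit_to_window(history, window=6, reserve_for_response=3))
# ['d e', 'f']
```

Dropping the oldest messages first is only one policy; summarizing older turns is a common alternative when early context still matters.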

💡 Why they seem to reason

LLMs don't reason the way humans do; they pattern-match on a vast corpus of human reasoning. With chain-of-thought prompting, they decompose problems into intermediate steps. This is effective for many tasks but fails in surprising ways. Don't anthropomorphize.

Quick Check

Why are LLMs bad at counting characters?

Think about how text is represented inside the model.
