Hands-on: Hugging Face and Open-Source Models

After this lesson, you will be able to use Hugging Face to find, evaluate, and run an open-source model, and to explain where open models fit alongside frontier APIs.

Hugging Face is the main hub for open-source AI. It hosts well over a million models, along with large collections of datasets and demo Spaces. This lesson takes you from 'never heard of HF' to 'I can pick a model, run it, and explain when open beats closed.'

Prerequisite: The LLM Landscape: Models and Companies

What Hugging Face is, in plain language

Hugging Face (HF) hosts AI models the way GitHub hosts code: open weights, datasets, version control, and a community page per model. Three things you'll use the most: Models (downloadable open-source weights), Datasets (training/evaluation data), and Spaces (live demos hosted by the community). The Transformers Python library is HF's standard interface for running models locally.
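
Every repo on the Hub is addressed by an owner/name id, and the three repo types live under different URL prefixes on huggingface.co. The helper below is a hypothetical illustration, not part of any HF library. It builds the web page URL for each repo type, and the "model"/"dataset"/"space" labels mirror the naming the Hub uses.

```python
# Hypothetical helper: map a Hub repo id ("owner/name") to its web page URL.
# Models live at the site root; datasets and Spaces sit under their own prefixes.
HUB = "https://huggingface.co"

PREFIXES = {
    "model": "",
    "dataset": "datasets/",
    "space": "spaces/",
}


def hub_url(repo_id: str, repo_type: str = "model") -> str:
    """Return the huggingface.co page for an 'owner/name' repo id."""
    if repo_type not in PREFIXES:
        raise ValueError(f"unknown repo type: {repo_type!r}")
    owner, sep, name = repo_id.partition("/")
    if not sep or not owner or not name or "/" in name:
        raise ValueError(f"expected 'owner/name', got {repo_id!r}")
    return f"{HUB}/{PREFIXES[repo_type]}{repo_id}"


print(hub_url("meta-llama/Llama-3.2-1B-Instruct"))
print(hub_url("some-user/some-demo", "space"))  # hypothetical Space id
```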

First-pass exploration: pick a small open model

Stay small to keep iteration fast. Llama-3.2-1B-Instruct runs on an ordinary laptop CPU. Mistral-7B-Instruct is a good next step once you have a GPU.
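
Here is a quick way to see why small models iterate faster. The weights alone need roughly parameters × bytes per parameter of memory. The sketch below assumes 2 bytes per parameter for 16-bit weights (fp16/bf16), 4 for fp32, and about 0.5 for 4-bit quantization. It uses nominal parameter counts and ignores activations and the KV cache, so treat the numbers as lower bounds.

```python
# Rough lower bound on the memory needed just to hold model weights.
# Assumed bytes per parameter; real usage is higher (activations, KV cache, overhead).
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(n_params: float, dtype: str = "bf16") -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


# Nominal sizes; e.g. Llama-3.2-1B actually has a bit more than 1e9 parameters.
for name, n in [("1B", 1e9), ("7B", 7e9), ("70B", 70e9)]:
    print(f"{name}: ~{weight_memory_gb(n):.0f} GB in bf16, "
          f"~{weight_memory_gb(n, 'int4'):.1f} GB in 4-bit")
```

A 7B model in 16-bit precision already needs about 14 GB before any overhead. That is why it usually calls for a GPU or a quantized build, while a 1B model fits in ordinary laptop RAM.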

1. Sign up at huggingface.co (free) and generate a read access token under Settings → Access Tokens.
2. Browse hf.co/models, filter by Task = Text Generation, and sort by Trending.
3. Open a model card (e.g. meta-llama/Llama-3.2-1B-Instruct). Read the description, the license (important!), and the intended use cases.
4. Note whether the model is gated, meaning you must accept its license agreement on the model page before you can download it (Llama and Gemma work this way).
5. Open "Use this model" at the top right of the model card and choose Transformers to get a ready-made code snippet.
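
Steps 3 and 4 can be partly automated. A model card is a README.md file whose YAML metadata sits between two --- lines, and that metadata usually includes a license key. The sketch below is a hypothetical, deliberately naive parser that reads only top-level key: value lines. It treats extra_gated_* keys, which customize the access form on gated repos, as a hint that the repo is gated. A repo can be gated without those keys, so the model page itself remains the authority. The sample card header is abbreviated and illustrative.

```python
# Hypothetical helper: pull the license and gating hints out of a model card.
# Naive on purpose: only top-level "key: value" lines are read; use a real YAML
# parser (e.g. PyYAML) for anything beyond a quick check.


def card_metadata(readme: str) -> dict:
    lines = readme.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no metadata block at the top of the card
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":  # end of the metadata block
            break
        if line[:1] in (" ", "\t", "-") or ":" not in line:
            continue  # skip nested entries, list items, and blank lines
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta


def review(readme: str) -> str:
    meta = card_metadata(readme)
    license_id = meta.get("license", "<missing: check the card text>")
    # extra_gated_* keys are only a hint; gating itself is a repo setting.
    gated = any(k.startswith("extra_gated_") for k in meta)
    return f"license={license_id}, gated_form={'yes' if gated else 'no'}"


example = """---
license: llama3.2
extra_gated_prompt: You agree to the license terms...
pipeline_tag: text-generation
---
# Model card text
"""
print(review(example))  # → license=llama3.2, gated_form=yes
```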

Run an open model locally with Transformers

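Install the libraries with pip install transformers torch accelerate. The accelerate package is what lets device_map="auto" place the model on whatever hardware you have. The 1B model works on a CPU, and a GPU makes it much faster.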

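```python
# Requires: pip install transformers torch accelerate
# Llama models are gated: accept the license on the model page, then authenticate
# with your access token (e.g. `huggingface-cli login`, or set the HF_TOKEN env var).
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.2-1B-Instruct",
    device_map="auto",  # GPU if available, else CPU (needs accelerate)
)

# Chat-style input: a list of {"role": ..., "content": ...} messages.
out = pipe(
    [
        {"role": "user", "content": "Explain RAG in 3 sentences."},
    ],
    max_new_tokens=200,
)
# generated_text holds the whole conversation; the model's reply is the last message.
print(out[0]["generated_text"][-1]["content"])
```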
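
With chat input, the pipeline returns one result per input. Its generated_text field holds the whole conversation, with the model's reply appended as the last assistant message. Multi-turn chat is therefore a matter of feeding that list back in with your next question. The sketch below uses fake_pipe, a hypothetical stand-in that mimics this output shape, so it runs without downloading a model. Pass the real pipe from the example above to chat() to use Llama instead.

```python
# Multi-turn chat loop. fake_pipe is a hypothetical stand-in that mimics the
# output shape of a transformers text-generation pipeline given chat messages:
# [{"generated_text": [...input messages..., {"role": "assistant", ...}]}]


def fake_pipe(messages, max_new_tokens=200):
    reply = f"(reply to: {messages[-1]['content']})"
    return [{"generated_text": list(messages) + [{"role": "assistant", "content": reply}]}]


def chat(pipe, history, user_text, max_new_tokens=200):
    """Append a user turn, run the model, and return (reply, updated history)."""
    messages = history + [{"role": "user", "content": user_text}]
    out = pipe(messages, max_new_tokens=max_new_tokens)
    conversation = out[0]["generated_text"]  # full conversation, reply included
    return conversation[-1]["content"], conversation


history = [{"role": "system", "content": "You are a concise tutor."}]
reply, history = chat(fake_pipe, history, "Explain RAG in 3 sentences.")
reply, history = chat(fake_pipe, history, "Now give one example.")
print(len(history))  # → 5 (system + 2 user + 2 assistant)
print(reply)         # → (reply to: Now give one example.)
```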

When open beats closed (and when it doesn't)

Open models win on data privacy, because everything runs on your own hardware. They also offer cost predictability: you pay for hardware rather than per-token fees. Add domain fine-tuning (LoRA plus a small dataset) and edge or offline use. Closed models (Claude, GPT, Gemini) win on raw capability for frontier tasks, breadth of multimodal support, reliability, and ease of integration. A common pattern in 2026 production stacks splits the work. A closed frontier model handles the user-facing reasoning, and a fine-tuned open model handles high-volume classification and extraction subtasks.
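
The hybrid pattern described above comes down to a routing decision per request. Here is a minimal sketch. It assumes hypothetical task labels and backend names (open-finetuned, closed-frontier) and a made-up policy. Privacy-sensitive data and narrow, high-volume tasks go to the self-hosted open model, and everything else goes to the frontier API.

```python
from dataclasses import dataclass

# Hypothetical task labels that a small fine-tuned open model handles well.
NARROW_TASKS = {"classify", "extract", "tag"}


@dataclass
class Request:
    task: str                            # e.g. "classify", "extract", "chat"
    contains_private_data: bool = False


def route(req: Request) -> str:
    """Pick a backend name for a request (illustrative policy only)."""
    if req.contains_private_data:
        return "open-finetuned"   # keep sensitive data on your own hardware
    if req.task in NARROW_TASKS:
        return "open-finetuned"   # cheap, high-volume subtask
    return "closed-frontier"      # open-ended reasoning on the user-facing path


print(route(Request("extract")))                            # → open-finetuned
print(route(Request("chat")))                               # → closed-frontier
print(route(Request("chat", contains_private_data=True)))   # → open-finetuned
```

A production router would also weigh latency, cost, and results from your own quality evaluations. The sketch only shows that the choice is made per task, not once per company.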

Common mistakes that experienced practitioners avoid

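- Picking a 70B model when a 7B one handles the task. Per-token inference compute scales roughly with parameter count, so the 70B model costs about 10x as much to run, often for marginal quality gains.
- Ignoring license terms. Some weights restrict commercial use: the original Llama release was research-only, and later Llama licenses add conditions of their own. Many Mistral and Qwen checkpoints are released under Apache 2.0, but not all of them, so check each model card.
- Skipping the dataset card. Models trained on poorly documented data can inherit biases you have no way to see.
- Running forward passes without torch.no_grad() (or torch.inference_mode()). Autograd then keeps activations in memory for a backward pass that never happens, wasting GPU memory. pipeline() and model.generate() already disable gradient tracking, so this mainly affects hand-written inference code that calls the model directly.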
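
The 10x figure in the first mistake follows from a common rule of thumb. A dense decoder-only model spends roughly 2 FLOPs per parameter for each generated token. This ignores attention over long contexts and hardware effects, and for mixture-of-experts models you would count only the active parameters. The sketch below applies that approximation to a hypothetical daily workload.

```python
# Rule of thumb for dense decoder-only models: ~2 FLOPs per parameter per
# generated token (ignores long-context attention cost and hardware effects).


def flops_per_token(n_params: float) -> float:
    return 2.0 * n_params


def relative_cost(big_params: float, small_params: float) -> float:
    """How many times more compute per token the bigger model needs."""
    return flops_per_token(big_params) / flops_per_token(small_params)


tokens_per_day = 50_000_000  # hypothetical workload
for n in (7e9, 70e9):
    print(f"{n / 1e9:.0f}B: {flops_per_token(n) * tokens_per_day:.1e} FLOPs/day")
print(f"70B vs 7B: {relative_cost(70e9, 7e9):.0f}x")  # → 70B vs 7B: 10x
```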

Quick Check

Which Hugging Face surface is most useful for trying a model without installing anything?

Pick the no-install option.

- Models
- Datasets
- Spaces
- Inference Endpoints
