After this lesson, you will be able to use Hugging Face to find, evaluate, and run an open-source model, and explain where open models fit alongside frontier APIs.
Hugging Face is the canonical hub for open-source AI: more than a million models, plus datasets and demo Spaces. This lesson takes you from 'never heard of HF' to 'I can pick a model, run it, and explain when open beats closed.'
Hugging Face (HF) hosts AI models the way GitHub hosts code: open weights, datasets, version control, and a community page per model. Three things you'll use the most: Models (downloadable open-source weights), Datasets (training/evaluation data), and Spaces (live demos hosted by the community). The Transformers Python library is HF's standard interface for running models locally.
Stay small to keep iteration fast. Llama-3.2-1B-Instruct or Mistral-7B-Instruct are good starting points.
Sign up at huggingface.co (free), then generate a read access token under Settings → Access Tokens.
Browse hf.co/models, filter by Task = Text Generation, and sort by Trending.
Open a model card (e.g. meta-llama/Llama-3.2-1B-Instruct). Read the description, the license (important!), and the intended use cases.
Note whether the model is gated, meaning you must accept a license agreement on the model page before you can download it (Llama and Gemma are).
Check the usage snippet behind the 'Use this model' button (labelled 'Use in Transformers' in older versions of the site) at the top right of the model card.
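The card-reading steps above can also be scripted. What follows is a minimal sketch, assuming the huggingface_hub package is installed (pip install huggingface_hub) and that the Hub exposes a model's license as a tag of the form license:<id>, which is how it appears on model pages at the time of writing. The helper names license_from_tags and describe_model are illustrative and not part of any library.

```python
from typing import Iterable, Optional


def license_from_tags(tags: Iterable[str]) -> Optional[str]:
    """Return the license id from Hub-style tags such as 'license:apache-2.0'."""
    for tag in tags:
        if tag.startswith("license:"):
            return tag.split(":", 1)[1]
    return None


def describe_model(repo_id: str) -> dict:
    """Fetch basic facts about a model from the Hub (needs network access)."""
    # Imported lazily so the pure helper above works without the package.
    from huggingface_hub import HfApi

    info = HfApi().model_info(repo_id)
    return {
        "id": info.id,
        "task": info.pipeline_tag,
        "license": license_from_tags(info.tags or []),
        # Truthy ("auto" or "manual") when access terms must be accepted first.
        "gated": bool(info.gated),
    }
```

Calling describe_model("meta-llama/Llama-3.2-1B-Instruct") gives a quick check of the license and gating status before you commit to a download. It is a shortcut for triage, not a substitute for reading the full license text on the card.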
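pip install transformers torch accelerate. Works on a CPU; faster on a GPU. The accelerate package is needed because the snippet below passes device_map="auto". Because this Llama model is gated, accept its license on the model page and set the HF_TOKEN environment variable to your access token before running the code.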
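from transformers import pipeline

# Build a text-generation pipeline; device_map="auto" uses a GPU if one is available.
pipe = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.2-1B-Instruct",
    device_map="auto",
)

# Passing a list of chat messages makes the pipeline apply the model's chat template.
out = pipe(
    [
        {"role": "user", "content": "Explain RAG in 3 sentences."},
    ],
    max_new_tokens=200,
)

# generated_text holds the whole conversation; the last message is the model's reply.
print(out[0]["generated_text"][-1]["content"])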
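The index chain at the end of the snippet above is easier to follow against a concrete value. The sketch below assumes the chat-style return shape that the snippet relies on: a list with one dict per input, whose generated_text is the full message list. The mock_output value is hand-written to mimic that shape and is not real model output, and last_reply is a hypothetical helper name.

```python
from typing import Any, Dict, List


def last_reply(pipeline_output: List[Dict[str, Any]]) -> str:
    """Return the text of the final message in the first generated conversation."""
    conversation = pipeline_output[0]["generated_text"]
    final_message = conversation[-1]
    if final_message.get("role") != "assistant":
        raise ValueError("expected the last message to come from the assistant")
    return final_message["content"]


# Hand-written stand-in with the same shape as the pipeline output (not real model text).
mock_output = [
    {
        "generated_text": [
            {"role": "user", "content": "Explain RAG in 3 sentences."},
            {"role": "assistant", "content": "(model reply goes here)"},
        ]
    }
]

print(last_reply(mock_output))
```

Wrapping the lookup in a function with a role check makes a failure obvious if the pipeline is ever called with a plain string prompt, in which case generated_text is a string and not a list of messages.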
Open wins on: data privacy (run entirely on your hardware), cost predictability (no per-token fees), domain fine-tuning (LoRA + a small dataset), and edge / offline use.
Closed (Claude, GPT, Gemini) wins on: raw capability at frontier tasks, multimodal support, reliability, and ease of integration.
A common pattern in 2026 production stacks is a closed frontier model for the user-facing reasoning and a fine-tuned open model for high-volume classification/extraction subtasks.
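The split described above can be expressed as a small routing rule. This is only an illustrative sketch: the task labels, the Request type, and choose_backend are hypothetical names, and a real stack would route on whatever signals it actually has (volume, latency budget, data classification).

```python
from dataclasses import dataclass

# Hypothetical labels for the high-volume, narrow subtasks.
OPEN_MODEL_TASKS = {"classification", "extraction"}


@dataclass(frozen=True)
class Request:
    task: str                            # e.g. "classification", "reasoning"
    contains_private_data: bool = False  # must stay on your own hardware


def choose_backend(request: Request) -> str:
    """Return "open" (self-hosted model) or "closed" (frontier API)."""
    # Privacy takes priority: private data never leaves your hardware.
    if request.contains_private_data:
        return "open"
    # Narrow, high-volume subtasks go to the fine-tuned open model.
    if request.task in OPEN_MODEL_TASKS:
        return "open"
    # Everything else, including user-facing reasoning, goes to the frontier API.
    return "closed"


print(choose_backend(Request(task="extraction")))
print(choose_backend(Request(task="reasoning")))
```

Checking the privacy flag first is a deliberate choice in this sketch: it keeps the data-privacy advantage of open models from being overridden by a capability preference.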
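Picking a 70B model when a 7B handles the task. Inference cost can differ by roughly 10x for marginal quality gains.
Ignoring license terms. Some weights (older Llama variants, for example) restrict commercial use. Many Mistral and Qwen releases are permissively licensed under Apache 2.0, but not all of them, so check each model card.
Skipping the dataset card. Models trained on poorly documented data inherit biases you cannot see.
Forgetting torch.no_grad() in hand-written inference scripts. Without it, a direct forward pass keeps activations for a backward pass that never happens, which eats GPU memory for nothing. (pipeline and generate() already disable gradients for you.)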
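A quick back-of-the-envelope calculation shows why model size matters so much. The sketch below estimates memory for the weights alone as parameter count times bytes per parameter. It uses nominal parameter counts (7e9 and 70e9) and ignores the KV cache, activations, and framework overhead, so real usage will be higher. The function name weight_memory_gb is illustrative.

```python
# Bytes needed to store one parameter in each numeric format.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def weight_memory_gb(num_params: float, dtype: str = "float16") -> float:
    """Approximate memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# Nominal sizes; actual parameter counts differ slightly from the marketing name.
for name, params in [("7B", 7e9), ("70B", 70e9)]:
    print(f"{name}: about {weight_memory_gb(params):.0f} GB of weights in float16")
```

At 16-bit precision this works out to about 14 GB for a 7B model and about 140 GB for a 70B model, which is the difference between one consumer or workstation GPU and a multi-GPU server.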
Pick the no-install option.