Agenda — How LLMs Work

Session length: ~3 hours | Difficulty: Beginner → Intermediate | Coding time: ~1 hour

Why this session matters

Every practical decision you make in this course (which model to pick, how to set temperature, why retrieval over your 500-page PDF fails) traces back to one question: what is actually happening inside the model? This session answers that question in enough depth to make you dangerous, without requiring a PhD in mathematics.

Learning objectives

By the end of Part 1 you will be able to:

  • Explain how the transformer architecture processes a sentence in parallel
  • Describe what attention scores represent and why they matter
  • Trace the journey from raw text → tokens → embeddings → next-token prediction
  • Reason about context window limits when designing RAG pipelines
  • Tune temperature, top_p, and top_k with confidence for any task
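
To make the last objective concrete, here is a minimal sketch of what temperature, top_k, and top_p do to a next-token distribution. The four-word vocabulary, the logit values, and the function names are all made up for illustration; hosted APIs apply these steps on the provider's side, and their exact ordering and edge-case handling may differ.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_filter(probs, k):
    """Keep the k most likely tokens, zero out the rest, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(order[:k])
    kept = [p if i in keep else 0.0 for i, p in enumerate(probs)]
    total = sum(kept)
    return [p / total for p in kept]


def top_p_filter(probs, p):
    """Keep the smallest set of most likely tokens whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = set(), 0.0
    for i in order:
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    kept = [q if i in keep else 0.0 for i, q in enumerate(probs)]
    total = sum(kept)
    return [q / total for q in kept]


# Hypothetical next-token scores for a toy four-word vocabulary.
vocab = ["cat", "dog", "car", "sky"]
logits = [2.0, 1.0, 0.5, -1.0]

base = softmax_with_temperature(logits, temperature=1.0)
cold = softmax_with_temperature(logits, temperature=0.5)
hot = softmax_with_temperature(logits, temperature=2.0)

for name, dist in [("T=0.5", cold), ("T=1.0", base), ("T=2.0", hot)]:
    print(name, {w: round(q, 3) for w, q in zip(vocab, dist)})

print("top_k=2 ", [round(q, 3) for q in top_k_filter(base, 2)])
print("top_p=0.8", [round(q, 3) for q in top_p_filter(base, 0.8)])
```

Lowering the temperature concentrates probability on the top-scoring token, while raising it spreads probability across the alternatives. top_k and top_p then cut off the unlikely tail before a token is sampled.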

Session outline

| Time | Topic | File |
|---|---|---|
| 0:00 – 0:40 | Transformers and the attention mechanism | 01-transformers-and-attention |
| 0:40 – 1:10 | Tokenization: how text becomes numbers | 02-tokenization |
| 1:10 – 1:30 | Context windows: what the model can "see" | 03-context-windows |
| 1:30 – 2:00 | How LLMs generate text: sampling and decoding | 04-how-llms-generate-text |
| 2:00 – 2:30 | Hands-on practice exercises | 05-practice-exercises |
| 2:30 – 3:00 | Interview questions review | 06-interview-questions |

Prerequisites

None. This is Day 1. Bring Python 3.10+ and a terminal.

Setup

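Install the dependencies from your terminal:

pip install openai anthropic tiktoken

Then, in Python, read your API keys from the environment:

import os

# os.getenv returns None when the variable is not set
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
ANTHROPIC_API_KEY = os.getenv("ANTHROPIC_API_KEY")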
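
Because os.getenv returns None when a variable is unset, a missing key may only surface later as a confusing authentication error. The sketch below is one way to check up front. The helper name missing_keys and the REQUIRED_KEYS tuple are hypothetical, not part of either SDK.

```python
import os

# Environment variable names used in the setup step above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def missing_keys(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All API keys found.")
```

Passing a plain dict as env lets you try the check without touching your real environment.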

Before the session

Skim the abstract of "Attention Is All You Need" (Vaswani et al., 2017). Don't worry about the math; just absorb the vocabulary. Every concept we cover today traces back to that paper.

