Skip to content

Week 01 Assignment

Complete this assignment before starting Week 2. It covers the five core skills from Week 1.


Submission format

Create a GitHub repository named llm-week01 with the following structure:

llm-week01/
├── README.md      ← Summary of what you built and your results
├── task1.py
├── task2.py
├── task3.py
├── task4.py
├── task5.py
└── requirements.txt

Each script should run with python taskN.py and print its results.


Task 1 — API fundamentals (2 points)

Write a Python script that: - Calls the OpenAI chat API to answer 5 different questions - Measures and prints the latency for each call - Prints the total token count and estimated cost for all 5 calls combined

Expected output:

Q1: What is Python? → Answer in 10 words... | 45 tokens | 312ms
Q2: ...
...
Total: 210 tokens | $0.000032 | avg 350ms


Task 2 — Embeddings and similarity (2 points)

Write a script that: - Takes 10 sentences (your choice of topic) - Embeds all 10 using text-embedding-3-small - Finds and prints the top-3 most similar pairs using cosine similarity - Finds and prints the most dissimilar pair

Expected output:

Most similar pairs:
  1. "Python is a programming language" ↔ "Python is used for data science" — similarity: 0.943
  2. ...
Most dissimilar pair:
  "Python is a programming language" ↔ "The sky is blue" — similarity: 0.121


Task 3 — Basic RAG (3 points)

Build a minimal RAG system: - Ingest at least 5 markdown files (or 1 PDF) into ChromaDB - Accept a question from the user (use input()) - Retrieve the top-3 relevant chunks - Generate an answer with citations - Print the answer and which source(s) it came from

Run it end-to-end and screenshot or paste the output in your README.


Task 4 — Evaluation (2 points)

Using your Task 3 system: - Define 5 test questions with expected source documents - Run retrieval for each question - Calculate Recall@3: fraction of test questions where the expected source appears in the top-3 results - Print the results

Expected output:

Q1: "What is chunking?" → Sources: [faq_2, faq_5, guide_1] | Expected: faq_2 | HIT
Q2: ...
Recall@3: 4/5 = 80%


Task 5 — Structured output (1 point)

Write a script that uses function calling (OpenAI tools API or with_structured_output) to extract structured data from 3 different unstructured text samples. The schema should include at least 4 fields with different types (string, int/float, list, optional).

Print each extraction result as a Pydantic model.


Grading rubric

Task Points Pass criteria
1 2 All 5 calls work, latency and cost printed accurately
2 2 Similarity computation correct, output includes actual similarity scores
3 3 End-to-end pipeline works with real documents, citations present
4 2 Recall@3 computed correctly over ≥5 test questions
5 1 Extraction returns a valid Pydantic model for all 3 inputs

Total: 10 points. Pass threshold: 7/10.


Don't optimize until it works

Get a working end-to-end result first, then improve. A working system with 60% retrieval recall is worth more than a theoretically perfect system that doesn't run.