Skip to content

Agenda — Responsible AI and Safety

Deploying LLMs without safety measures is like deploying a web server without authentication. Your model will face adversarial inputs the moment it's public — from users probing its limits, from automated bots, and from genuinely harmful use cases. This session builds the practical skills to detect, prevent, and respond to these failure modes.

Learning objectives

By the end of this session you will be able to:

  • Identify and defend against jailbreaks and prompt injection attacks
  • Implement input/output guardrails using moderation APIs and custom classifiers
  • Apply content filtering at multiple pipeline stages
  • Audit your system for demographic bias and fairness issues

Schedule

Time Topic File
0:00 – 0:35 Jailbreaks and prompt injection 01-jailbreaks-and-prompt-injection
0:35 – 1:10 Guardrails — input validation and output checks 02-guardrails
1:10 – 1:40 Content filtering APIs and custom classifiers 03-content-filtering
1:40 – 2:20 Bias, fairness, and demographic auditing 04-bias-and-fairness
2:20 – 3:00 Practice exercises 05-practice-exercises

Setup

pip install openai anthropic presidio-analyzer presidio-anonymizer transformers
import os
from openai import OpenAI
from anthropic import Anthropic

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
anthropic = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))

Why responsible AI is not optional

In the EU, the AI Act classifies LLMs deployed in high-risk domains (employment, credit, healthcare) under strict requirements for bias auditing, documentation, and human oversight. In the US, executive orders require federal agencies to assess AI risks. Beyond regulation: a single high-profile safety failure can end a product faster than a bad review.

← Day 04 Part 1 | Start →