Anthropic Messages API
The Anthropic Messages API powers Claude. It shares core concepts with OpenAI's Chat Completions API (messages, tools, streaming), but several design differences affect how you write production code.
Learning objectives
- Understand the structural differences between OpenAI and Anthropic APIs
- Use Claude's tool_use blocks correctly
- Implement streaming with event-based parsing
- Apply prompt caching to reduce cost on long system prompts
API structure differences at a glance
| Feature | OpenAI | Anthropic |
|---|---|---|
| System prompt | `messages[0].role = "system"` | Separate `system` parameter |
| Response object | `response.choices[0].message.content` | `response.content` (list of blocks) |
| Tool calls | `message.tool_calls` (list) | Content blocks with `type: "tool_use"` |
| Tool results | `role: "tool"` messages | `role: "user"` with `type: "tool_result"` block |
| Streaming | `stream=True`, delta chunks | Event-based stream with typed events |
| Context window | 128K (gpt-4o) | 200K (claude-sonnet-4-6) |
| Token counting | `response.usage` | `response.usage` + separate `client.messages.count_tokens()` |
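To make the table concrete, here is a minimal sketch of converting an OpenAI-style message list into the Messages API shape. The system message moves to a separate string, assistant `tool_calls` become `tool_use` blocks, and `role: "tool"` messages become `tool_result` blocks inside a user turn. `openai_to_anthropic` is a hypothetical helper, not part of either SDK. It assumes string content and ignores images, `name` fields, and other options.

```python
import json


def openai_to_anthropic(messages: list[dict]) -> tuple[str, list[dict]]:
    """Split OpenAI Chat Completions messages into (system, messages) for the Messages API.

    Minimal sketch: assumes string content; ignores images, names, and other fields.
    """
    system_parts = []
    converted = []
    tool_turn = None  # the user turn currently collecting tool_result blocks
    for msg in messages:
        role = msg["role"]
        if role == "tool":
            # Anthropic: tool results are tool_result blocks inside a user turn
            if tool_turn is None:
                tool_turn = {"role": "user", "content": []}
                converted.append(tool_turn)
            tool_turn["content"].append({
                "type": "tool_result",
                "tool_use_id": msg["tool_call_id"],
                "content": msg["content"],
            })
            continue
        tool_turn = None  # any other message closes the current tool-result turn
        if role == "system":
            # Anthropic: the system prompt is a separate top-level parameter
            system_parts.append(msg["content"])
        elif role == "assistant" and msg.get("tool_calls"):
            blocks = [{"type": "text", "text": msg["content"]}] if msg.get("content") else []
            for call in msg["tool_calls"]:
                blocks.append({
                    "type": "tool_use",
                    "id": call["id"],
                    "name": call["function"]["name"],
                    # OpenAI sends arguments as a JSON string; Anthropic expects an object
                    "input": json.loads(call["function"]["arguments"]),
                })
            converted.append({"role": "assistant", "content": blocks})
        else:
            converted.append({"role": role, "content": msg["content"]})
    return "\n\n".join(system_parts), converted


system, msgs = openai_to_anthropic([
    {"role": "system", "content": "Be terse."},
    {"role": "user", "content": "Look up cust_123."},
    {"role": "assistant", "content": None, "tool_calls": [
        {"id": "call_1", "type": "function",
         "function": {"name": "get_customer", "arguments": '{"customer_id": "cust_123"}'}},
    ]},
    {"role": "tool", "tool_call_id": "call_1", "content": '{"plan": "free"}'},
])
print(system)   # Be terse.
print(msgs[2])  # {'role': 'user', 'content': [{'type': 'tool_result', 'tool_use_id': 'call_1', 'content': '{"plan": "free"}'}]}
```

Consecutive OpenAI `tool` messages are grouped into a single user turn, which mirrors how the tool loop later in this page sends all results for one assistant turn together.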
Basic messages API call
```python
import os

from anthropic import Anthropic

client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=500,  # required on every Messages API call
    system="You are a concise technical assistant. Answer in 2-3 sentences.",
    messages=[
        {"role": "user", "content": "What is the difference between RAG and fine-tuning?"}
    ],
)

print(response.content[0].text)
print(f"\nInput tokens: {response.usage.input_tokens}")
print(f"Output tokens: {response.usage.output_tokens}")
```
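`response.content` is a list of content blocks. For a plain text request (no tools, no extended thinking), the first block is normally a text block, so `content[0].text` works. With tool use, the list can hold several blocks of different types, so filter by `block.type` instead of indexing blindly.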
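Because `content` is a list, a small helper that concatenates only the text blocks is more robust than reading `content[0]`. This is a minimal sketch. `extract_text` is a hypothetical helper name, and the example uses `SimpleNamespace` objects as stand-ins for SDK content blocks, which expose the same `.type` and `.text` attributes.

```python
from types import SimpleNamespace


def extract_text(content) -> str:
    """Join the text of every text block, skipping tool_use and other block types."""
    return "".join(block.text for block in content if block.type == "text")


# Stand-ins for SDK content blocks from a response that mixes text and a tool call
blocks = [
    SimpleNamespace(type="text", text="Let me look that up."),
    SimpleNamespace(type="tool_use", id="toolu_01", name="get_customer", input={"customer_id": "cust_123"}),
]
print(extract_text(blocks))  # Let me look that up.
```

An empty `content` list yields an empty string instead of an `IndexError`.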
Multi-turn conversations
```python
def chat(client: Anthropic, system: str, messages: list[dict], user_input: str) -> tuple[str, list[dict]]:
    # The API is stateless: the whole history is sent on every call
    messages = messages + [{"role": "user", "content": user_input}]
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=500,
        system=system,
        messages=messages,
    )
    assistant_reply = response.content[0].text
    # Return a new list instead of mutating the caller's history
    messages = messages + [{"role": "assistant", "content": assistant_reply}]
    return assistant_reply, messages


# Usage
system = "You are a helpful Python tutor."
history = []

reply, history = chat(client, system, history, "What is a decorator?")
print(f"Claude: {reply}\n")

reply, history = chat(client, system, history, "Show me a simple example.")
print(f"Claude: {reply}\n")
```
Tool use with Claude
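Claude's tool use follows the same loop as OpenAI function calling: the model requests a tool, your code runs it, and you send the result back. The data structures differ, though. Each tool declares its parameters under `input_schema`, and calls arrive as `tool_use` content blocks.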
```python
import json

tools = [
    {
        "name": "get_customer",
        "description": "Retrieve customer information by customer ID.",
        "input_schema": {
            "type": "object",
            "properties": {
                "customer_id": {
                    "type": "string",
                    "description": "The customer's unique identifier",
                }
            },
            "required": ["customer_id"],
        },
    },
    {
        "name": "update_subscription",
        "description": "Update a customer's subscription plan.",
        "input_schema": {
            "type": "object",
            "properties": {
                "customer_id": {"type": "string"},
                "plan": {"type": "string", "enum": ["free", "pro", "enterprise"]},
            },
            "required": ["customer_id", "plan"],
        },
    },
]


# Fake implementations
def get_customer(customer_id: str) -> dict:
    return {"id": customer_id, "name": "Alice Johnson", "plan": "free", "email": "alice@example.com"}


def update_subscription(customer_id: str, plan: str) -> dict:
    return {"success": True, "customer_id": customer_id, "new_plan": plan}


TOOL_REGISTRY = {"get_customer": get_customer, "update_subscription": update_subscription}


def run_claude_with_tools(user_message: str, max_turns: int = 10) -> str:
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_turns):  # guard against an endless tool loop
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1000,
            tools=tools,
            messages=messages,
        )
        if response.stop_reason != "tool_use":
            # end_turn, max_tokens, stop_sequence, ...: return whatever text Claude produced
            return "".join(b.text for b in response.content if b.type == "text")

        # Append Claude's full response, including its tool_use blocks
        messages.append({"role": "assistant", "content": response.content})

        # Execute each tool call and collect the results
        tool_results = []
        for block in response.content:
            if block.type != "tool_use":
                continue
            fn = TOOL_REGISTRY[block.name]
            result = fn(**block.input)
            tool_results.append({
                "type": "tool_result",
                "tool_use_id": block.id,  # must match the id of the tool_use block
                "content": json.dumps(result),
            })

        # Send the results back as a user message
        messages.append({"role": "user", "content": tool_results})
    raise RuntimeError(f"No final answer after {max_turns} tool-use turns")


print(run_claude_with_tools("Upgrade customer cust_123 to the pro plan."))
```
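Tool results go in `role: "user"` messages
This is the most common source of confusion when migrating from OpenAI. In Anthropic's API, tool results are sent back as a user turn containing `type: "tool_result"` content blocks, not as `role: "tool"` messages. Each block's `tool_use_id` must match the `id` of a `tool_use` block in the immediately preceding assistant turn. The `tool_result` blocks must also come before any text in that user message.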
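A tool that raises an exception should not crash the loop. The `tool_result` block format has an `is_error` flag for reporting failures back to Claude, which can then retry or explain the problem. The following is a minimal sketch. `run_tool_block` is a hypothetical helper, and `SimpleNamespace` stands in for an SDK `tool_use` block.

```python
import json
from types import SimpleNamespace


def run_tool_block(block, registry: dict) -> dict:
    """Execute one tool_use block and wrap the outcome as a tool_result block.

    Failures are reported back to Claude with is_error=True instead of raising.
    """
    try:
        fn = registry[block.name]
        result = fn(**block.input)
        return {"type": "tool_result", "tool_use_id": block.id, "content": json.dumps(result)}
    except Exception as exc:  # unknown tool, bad arguments, or a failure inside the tool
        return {
            "type": "tool_result",
            "tool_use_id": block.id,
            "content": f"Error: {exc}",
            "is_error": True,
        }


registry = {"add": lambda a, b: {"sum": a + b}}
ok = run_tool_block(SimpleNamespace(id="toolu_1", name="add", input={"a": 2, "b": 3}), registry)
bad = run_tool_block(SimpleNamespace(id="toolu_2", name="missing_tool", input={}), registry)
print(ok)               # {'type': 'tool_result', 'tool_use_id': 'toolu_1', 'content': '{"sum": 5}'}
print(bad["is_error"])  # True
```

Inside the tool loop, you could build `tool_results` with this helper instead of calling `TOOL_REGISTRY[block.name]` directly.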
Streaming
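Anthropic streams responses as a sequence of strongly typed server-sent events. The Python SDK's `client.messages.stream()` helper wraps that stream. Its `text_stream` iterator yields only the text deltas, which covers the common case.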
```python
def stream_claude(prompt: str) -> str:
    full_text = ""
    with client.messages.stream(
        model="claude-sonnet-4-6",
        max_tokens=500,
        messages=[{"role": "user", "content": prompt}],
    ) as stream:
        for text in stream.text_stream:
            print(text, end="", flush=True)
            full_text += text
    print()  # newline
    return full_text


result = stream_claude("Explain the CAP theorem in 3 bullet points.")
```
For streaming with tool use, handle typed events directly:
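```python
with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=1000,
    tools=tools,
    messages=[{"role": "user", "content": "Get info for customer cust_456"}],
) as stream:
    for event in stream:
        if event.type == "content_block_start":
            # A new block begins: "text", "tool_use", ...
            print(f"\n[Block type: {event.content_block.type}]")
        elif event.type == "content_block_delta":
            if event.delta.type == "text_delta":
                print(event.delta.text, end="", flush=True)
            elif event.delta.type == "input_json_delta":
                # Tool arguments arrive as partial JSON fragments
                print(event.delta.partial_json, end="", flush=True)
    # The fully assembled message, with tool_use inputs already parsed
    final_message = stream.get_final_message()

for block in final_message.content:
    if block.type == "tool_use":
        print(f"\n{block.name}({block.input})")
```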
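The raw stream is a fixed sequence of events. It starts with `message_start`, then emits `content_block_start`, `content_block_delta` and `content_block_stop` for each block, and ends with `message_delta` (which carries `stop_reason`) and `message_stop`. The SDK's `get_final_message()` reassembles these for you. The sketch below does the same by hand over plain dicts shaped like the raw event payloads, mainly to show that tool input arrives as `input_json_delta` fragments that only parse as JSON once the block is complete. `accumulate` is a hypothetical name, and the events are a hand-written illustration rather than captured output. Real streams also interleave `ping` events, which this loop simply ignores.

```python
import json
from typing import Optional


def accumulate(events: list[dict]) -> tuple[list[dict], Optional[str]]:
    """Rebuild content blocks and the stop reason from raw stream events (as dicts)."""
    blocks = {}      # block index -> block being built
    json_parts = {}  # block index -> partial_json fragments for tool_use blocks
    stop_reason = None
    for event in events:
        kind = event["type"]
        if kind == "content_block_start":
            blocks[event["index"]] = dict(event["content_block"])
            json_parts[event["index"]] = []
        elif kind == "content_block_delta":
            delta = event["delta"]
            if delta["type"] == "text_delta":
                blocks[event["index"]]["text"] += delta["text"]
            elif delta["type"] == "input_json_delta":
                # Fragments are not valid JSON on their own; buffer until the block stops
                json_parts[event["index"]].append(delta["partial_json"])
        elif kind == "content_block_stop":
            block = blocks[event["index"]]
            if block["type"] == "tool_use":
                raw = "".join(json_parts[event["index"]])
                block["input"] = json.loads(raw) if raw else {}
        elif kind == "message_delta":
            stop_reason = event["delta"].get("stop_reason")
        # message_start, message_stop, ping, ...: nothing to accumulate here
    return [blocks[i] for i in sorted(blocks)], stop_reason


events = [
    {"type": "message_start", "message": {"id": "msg_01", "role": "assistant", "content": []}},
    {"type": "content_block_start", "index": 0, "content_block": {"type": "text", "text": ""}},
    {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Looking up "}},
    {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "the customer."}},
    {"type": "content_block_stop", "index": 0},
    {"type": "content_block_start", "index": 1,
     "content_block": {"type": "tool_use", "id": "toolu_01", "name": "get_customer", "input": {}}},
    {"type": "content_block_delta", "index": 1, "delta": {"type": "input_json_delta", "partial_json": '{"customer_'}},
    {"type": "content_block_delta", "index": 1, "delta": {"type": "input_json_delta", "partial_json": 'id": "cust_456"}'}},
    {"type": "content_block_stop", "index": 1},
    {"type": "message_delta", "delta": {"stop_reason": "tool_use"}, "usage": {"output_tokens": 42}},
    {"type": "message_stop"},
]
content, stop = accumulate(events)
print(content[1]["input"], stop)  # {'customer_id': 'cust_456'} tool_use
```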
Prompt caching
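Prompt caching lets Claude reuse a long, unchanging prompt prefix such as a system prompt. Cached tokens are billed at 10% of the base input price, a 90% discount. The prefix must reach a minimum length (1024 tokens for Sonnet); shorter prefixes are processed normally but not cached. An `ephemeral` cache entry lives for 5 minutes by default and is refreshed each time it is read.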
LONG_SYSTEM = """
You are an expert Python developer with deep knowledge of:
- FastAPI and async patterns
- SQLAlchemy ORM and database optimization
- Redis caching strategies
- Docker and Kubernetes deployment
- Testing with pytest and hypothesis
[... imagine 2000 more tokens of detailed instructions ...]
"""
def cached_query(user_question: str) -> str:
response = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=500,
system=[
{
"type": "text",
"text": LONG_SYSTEM,
"cache_control": {"type": "ephemeral"} # marks this block for caching
}
],
messages=[{"role": "user", "content": user_question}]
)
print(f"Cache read tokens: {response.usage.cache_read_input_tokens}")
print(f"Cache write tokens: {response.usage.cache_creation_input_tokens}")
return response.content[0].text
# First call: cache_creation_input_tokens > 0 (charged at 1.25× rate)
r1 = cached_query("How do I write a FastAPI dependency for database sessions?")
# Second call: cache_read_input_tokens > 0 (charged at 0.1× rate — 90% discount)
r2 = cached_query("What's the best way to handle connection pooling?")
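To see what each request actually costs, you can combine the three usage fields. The following is a minimal sketch. It assumes the default 5-minute cache multipliers (1.25× for writes, 0.1× for reads) and relies on `input_tokens` counting only the uncached part of the prompt, with cached tokens reported in the two `cache_*` fields. `input_cost_usd` is a hypothetical helper, and the token counts in the example are made up for illustration.

```python
from types import SimpleNamespace

# Multipliers relative to the base input price, assuming the default 5-minute cache
CACHE_WRITE_MULT = 1.25
CACHE_READ_MULT = 0.10


def input_cost_usd(usage, price_per_mtok: float) -> float:
    """Input-side cost of one request from its usage fields (None counts as 0)."""
    billed = (
        (usage.input_tokens or 0)
        + (usage.cache_creation_input_tokens or 0) * CACHE_WRITE_MULT
        + (usage.cache_read_input_tokens or 0) * CACHE_READ_MULT
    )
    return billed * price_per_mtok / 1_000_000


# Illustrative numbers: a 2,000-token cached system prompt plus a 50-token question
first = SimpleNamespace(input_tokens=50, cache_creation_input_tokens=2000, cache_read_input_tokens=0)
second = SimpleNamespace(input_tokens=50, cache_creation_input_tokens=0, cache_read_input_tokens=2000)
print(round(input_cost_usd(first, 3.0), 6))   # 0.00765
print(round(input_cost_usd(second, 3.0), 6))  # 0.00075
```

Passing `response.usage` from `cached_query` in place of the stand-in objects gives the same breakdown for real calls.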
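When caching pays off
A cache write costs 1.25× the base input rate and a cache read costs 0.1×. Caching therefore comes out ahead as early as the second call: 1.25 + 0.1 = 1.35 units, compared with 2 units uncached. The following figures use the 2,000-token system prompt above on Claude Sonnet 4.6 ($3 per 1M input tokens). They count only the system-prompt tokens and assume all 100 calls land within the cache lifetime:
- Without cache: $0.006 per call × 100 calls = $0.60
- With cache: one write (2,000 × $3.75/M = $0.0075) + 99 reads (99 × 2,000 × $0.30/M = $0.0594) ≈ $0.067 total, about 89% less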
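The arithmetic above fits in a few lines. This sketch assumes one cache write followed only by cache hits, which means no entry expires between calls. `total_cost` is a hypothetical helper.

```python
def total_cost(prompt_tokens: int, calls: int, price_per_mtok: float, cached: bool) -> float:
    """Cost of sending the same prompt prefix `calls` times, ignoring cache expiry."""
    per_call = prompt_tokens * price_per_mtok / 1_000_000
    if not cached:
        return calls * per_call
    # One cache write at 1.25x, then (calls - 1) cache reads at 0.1x
    return per_call * 1.25 + (calls - 1) * per_call * 0.10


for n in (1, 2, 100):
    uncached = total_cost(2000, n, 3.0, cached=False)
    with_cache = total_cost(2000, n, 3.0, cached=True)
    print(f"{n:>3} calls: ${uncached:.4f} uncached vs ${with_cache:.4f} cached")
```

With a single call, caching costs more ($0.0075 vs $0.0060). With two calls it is already cheaper ($0.0081 vs $0.0120). At 100 calls the totals are $0.60 and about $0.067.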
Token counting
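Count tokens before sending to avoid context-limit errors on large documents. `client.messages.count_tokens()` takes the same `model`, `system`, and `messages` (and `tools`) arguments as `messages.create()`. It returns the input token count without generating a response. Treat the result as an estimate.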
```python
def count_and_check(messages: list[dict], system: str, max_tokens_budget: int = 150_000) -> bool:
    token_count = client.messages.count_tokens(
        model="claude-sonnet-4-6",
        system=system,
        messages=messages,
    )
    print(f"Request will use {token_count.input_tokens:,} input tokens")
    if token_count.input_tokens > max_tokens_budget:
        print(f"Warning: exceeds budget of {max_tokens_budget:,} tokens")
        return False
    return True


messages = [{"role": "user", "content": "Summarize this document: " + "word " * 5000}]
count_and_check(messages, "You are a summarizer.", max_tokens_budget=100_000)
```
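`count_tokens` reports input tokens only, but the output budget you request also has to fit. For recent Claude models, the API rejects a request whose input tokens plus `max_tokens` exceed the context window, so verify this against the current documentation for your model. The following is a minimal sketch assuming the 200K window from the comparison table. `fits_context` and `largest_max_tokens` are hypothetical helpers.

```python
def fits_context(input_tokens: int, max_tokens: int, context_window: int = 200_000) -> bool:
    """True if the counted input plus the requested output budget fits the window."""
    return input_tokens + max_tokens <= context_window


def largest_max_tokens(input_tokens: int, context_window: int = 200_000) -> int:
    """Largest max_tokens that still fits next to the given input (0 if none)."""
    return max(0, context_window - input_tokens)


# input_tokens would come from client.messages.count_tokens(...).input_tokens
print(fits_context(190_000, 8_000))  # True
print(fits_context(195_000, 8_000))  # False
print(largest_max_tokens(195_000))   # 5000
```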
Model options
| Model | Context | Strengths | Cost per 1M tokens (input / output) |
|---|---|---|---|
| `claude-opus-4-7` | 200K | Hardest reasoning, coding, research | $15 / $75 |
| `claude-sonnet-4-6` | 200K | Best capability/cost, general purpose | $3 / $15 |
| `claude-haiku-4-5-20251001` | 200K | Fastest, cheapest, simple tasks | $1 / $5 |
Claude Sonnet 4.6 as your default
For most production use cases, claude-sonnet-4-6 hits the right balance. Use Haiku for classification, routing, and high-volume extraction where you've already verified quality. Reach for Opus only for tasks where Sonnet measurably falls short.