Interview Questions — AI Agents¶
Q1: What distinguishes an agent from a chain? Give a concrete example of each.
Show answer
A chain executes a fixed, predetermined sequence of steps. Every invocation follows the same path. Example: a RAG pipeline that always does Retrieve → Compress → Generate → Return. No branching, no tool selection, no variable-length execution.
An agent uses the model's output to decide what to do next. The sequence is determined at runtime. Example: a research agent given the question "What company recently acquired OpenAI's main competitor?" might: (1) search for "OpenAI competitor" → (2) see "Anthropic" → (3) search "recent acquisitions of Anthropic" → (4) discover there are none → (5) reframe the search → (6) answer. The number of steps and which tools are called depend on what the agent finds.
Key distinction: in a chain, the developer decides the flow. In an agent, the model decides the flow.
Q2: Explain the ReAct loop. What does "Thought → Action → Observation" mean in practice?
Show answer
ReAct (Reasoning + Acting) is a prompting framework where the model writes its reasoning before each action:
- Thought: The model's internal reasoning about what to do next. It's not a tool call — it's the model "thinking out loud." Example: "I need to find the population of Paris to answer this question."
- Action: A specific tool call the model requests. Formatted as
tool_name[input]. Example:search[population of Paris]. - Observation: The actual result of executing the action. This is inserted by your code, not generated by the model. Example:
Paris has approximately 2.16 million people.
The loop continues until the model writes "Final Answer:" instead of another Thought/Action pair.
The key implementation detail: stop generation at "Observation:" (before the model writes its own), execute the real tool, inject the real result, then continue generation.
Q3: What is the max_steps guard and why is it non-negotiable in production agents?
Show answer
max_steps is a hard limit on the number of tool-calling iterations an agent can perform before forcibly stopping.
Without it: a misbehaving agent can enter an infinite loop — calling the same tool repeatedly, receiving results it can't process, and continuing to call. Each iteration costs tokens and API call fees. An agent looping for an hour against the OpenAI API can accumulate hundreds of dollars in costs.
The loop is always possible because: (1) the model might receive a tool result that confuses it, (2) the original question might be unanswerable with the available tools, causing repeated failed attempts, (3) a bug in your tool execution might return malformed results the model keeps retrying.
Set max_steps based on your task: 5 for simple queries, 10–15 for research tasks, and never more than 20 without a human-in-the-loop checkpoint.
Q4: Compare in-context memory, sliding window, and summary buffer. When is each appropriate?
Show answer
In-context (full history): Keep every message in the conversation. Simple but context grows unboundedly. Appropriate for short tasks (under 20 turns) where losing any context would break correctness.
Sliding window: Keep only the last N turns. Constant memory cost, but older context is completely lost. Appropriate for chat assistants where recent turns are sufficient and the user doesn't expect the agent to remember things said 50 messages ago.
Summary buffer: When history exceeds a threshold, summarize older messages and store the summary alongside recent messages. Preserves key facts from the full history without paying full token cost. Appropriate for long research sessions where earlier findings need to be available but verbatim reproduction isn't required.
Vector memory (not listed but worth knowing): Store important facts as embeddings; retrieve the most relevant ones for each query. Best for very long-running or personalized agents where the memory is large but any given query only needs a subset.
In practice: combine summary buffer (for conversation flow) + vector memory (for durable facts).
Q5: What is the difference between plan-and-execute and ReAct for multi-step tasks?
Show answer
ReAct (reactive): Decides one step at a time. Each Thought is based on the result of the previous Observation. Handles surprises gracefully — if step 2 reveals the original approach was wrong, the agent can pivot. More robust to unexpected tool results. Slower per query due to sequential LLM calls.
Plan-and-execute: Generates the complete plan upfront, then executes each step. Allows parallel execution of independent steps. More efficient when the task structure is known. Brittle when early assumptions are wrong — if step 2 fails in an unexpected way, the remaining plan may be useless.
When to use each: - Unknown task length, dynamic tool selection → ReAct - Structured task with predictable steps → Plan-and-execute - Mixed: use plan-and-execute with a reflection loop (re-plan when a step fails)
Q6: How would you debug an agent that is looping — calling the same tool repeatedly without progress?
Show answer
Step 1: Read the trace. Every tool call should be logged. Look for patterns: is the agent calling the same tool with the same input repeatedly? Is it calling a tool with increasingly confused inputs?
Step 2: Check the observation. The repeated loop usually means the model isn't understanding the tool's output. Check: is the output in an unexpected format? Is it too long, truncating relevant information? Does it contain an error the model doesn't know how to handle?
Step 3: Check the system prompt. Does the agent know what to do when a tool returns no results? Add explicit instructions: "If search returns no results, try a different search term. If after 3 searches you find nothing, say so."
Step 4: Inspect context length. If the context is near the limit, older messages may be truncated, losing the original question. The agent loses its goal and starts looping. Add windowing.
Step 5: Add a guard. Track tool calls in a set; if the agent calls the same (tool, input) pair twice, inject a message: "You already tried this. Try a different approach."