Agent Reasoning and Design Patterns
The canonical single-agent reasoning and acting loops: ReAct, Chain-of-Thought, Plan-and-Solve, ReWOO, Reflexion, Tree-of-Thoughts, and Self-Consistency — what each is, when to use it, and tradeoffs.
Every agent implementation rests on the same substrate: a loop that perceives input, reasons about what to do, executes an action (usually a tool call), observes the result, and repeats until a termination condition is met. The named patterns below are systematic ways of structuring that loop. Choose the simplest pattern that meets your correctness and cost requirements.
The core loop
Perceive → Reason → Act → Observe → repeat.
Each turn the agent reads its context window (system prompt, history, tool results), produces reasoning or a plan, calls a tool or returns a final answer, and appends the result to context. Two decisions recur in every pattern: (1) which tool to call (driven by the reasoning step and the available tool schema); (2) when to stop (goal-satisfaction check, max-iteration guard, or explicit DONE signal). Error recovery follows a retry-then-replan ladder: retry the same call on transient failures, replan on semantic failures, escalate to the caller on persistent failures.
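The loop just described can be sketched in a few lines. This is a minimal illustration, not a framework API. It assumes a hypothetical `Step` record that the model returns (either a tool call or a final answer) and uses a scripted function in place of a real LLM call. The two recurring decisions map to the `tools[step.tool]` dispatch and to the two exits: the final-answer check and the `max_iters` guard.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Step:
    """One model decision: a tool call or a final answer (hypothetical shape)."""
    tool: Optional[str] = None
    args: Optional[dict] = None
    final: Optional[str] = None


def run_agent(model: Callable[[List[str]], Step],
              tools: Dict[str, Callable[..., str]],
              task: str,
              max_iters: int = 8) -> str:
    context = [f"Task: {task}"]              # perceive: the context window
    for _ in range(max_iters):               # max-iteration guard
        step = model(context)                # reason: pick the next action
        if step.final is not None:           # explicit final-answer signal
            return step.final
        result = tools[step.tool](**(step.args or {}))              # act
        context.append(f"Observation from {step.tool}: {result}")   # observe
    raise RuntimeError(f"no final answer after {max_iters} iterations")


# Usage with a scripted stand-in for the LLM (hypothetical, for illustration).
def scripted_model(context: List[str]) -> Step:
    if len(context) == 1:                    # nothing observed yet: call a tool
        return Step(tool="lookup", args={"key": "capital_of_france"})
    return Step(final=context[-1].split(": ", 1)[1])   # answer from the observation


tools = {"lookup": lambda key: {"capital_of_france": "Paris"}[key]}
print(run_agent(scripted_model, tools, "What is the capital of France?"))  # Paris
```

In a real agent, `model` would send the context and tool schema to an LLM and parse its reply into a `Step`; everything else in the loop stays the same.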
Named patterns
ReAct — reason + act interleaved
The model emits a Thought (free-form reasoning trace), then an Action (tool call), then an Observation (tool result), cycling until it can emit a final answer. Interleaving reasoning with actions keeps the scratchpad grounded: each thought is informed by the most recent real observation. When to use: general-purpose tool-using agents, short-to-medium horizon tasks. Tradeoff: reasoning tokens paid on every turn; long loops accumulate context fast.
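A minimal sketch of the Thought / Action / Observation cycle as a text protocol. The `name[argument]` action syntax is modeled loosely on the ReAct paper's HotpotQA prompts (for example `finish[answer]` to stop). The `llm` callable and the `search` tool are hypothetical stand-ins. The key detail is that each Observation is the real tool output appended to the scratchpad before the next model call.

```python
import re
from typing import Callable, Dict

# Matches a line such as "Action: search[water boiling point]".
ACTION_RE = re.compile(r"^Action:\s*(\w+)\[(.*)\]\s*$", re.MULTILINE)


def react(llm: Callable[[str], str],
          tools: Dict[str, Callable[[str], str]],
          question: str,
          max_turns: int = 6) -> str:
    scratchpad = f"Question: {question}\n"
    for _ in range(max_turns):
        completion = llm(scratchpad)         # model writes one Thought + Action
        scratchpad += completion + "\n"
        match = ACTION_RE.search(completion)
        if match is None:
            scratchpad += "Observation: could not parse an Action line.\n"
            continue
        name, arg = match.group(1).lower(), match.group(2)
        if name == "finish":
            return arg
        tool = tools.get(name)
        obs = tool(arg) if tool else f"unknown tool {name!r}"
        scratchpad += f"Observation: {obs}\n"   # ground the next Thought in real output
    raise RuntimeError("ReAct loop hit max_turns without finish")


# Hypothetical scripted model: search once, then finish.
def fake_llm(scratchpad: str) -> str:
    if "Observation:" not in scratchpad:
        return "Thought: I need the boiling point.\nAction: search[water boiling point]"
    return "Thought: The observation answers it.\nAction: finish[100 C at 1 atm]"


tools = {"search": lambda q: "Water boils at 100 C at 1 atm."}
print(react(fake_llm, tools, "At what temperature does water boil?"))  # 100 C at 1 atm
```

Because the whole scratchpad is resent every turn, this sketch also shows the stated tradeoff: context grows with each Thought and Observation.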
Chain-of-Thought (CoT)
The model reasons step-by-step in natural language before producing an answer, without interleaved tool calls. Strong for arithmetic, logic, and multi-step inference when all information is already in context. Limit: no mechanism to retrieve missing information or recover from wrong intermediate steps.
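Chain-of-Thought is a prompting pattern rather than a control loop, so the code side is small: a prompt that asks for intermediate steps and a parser that pulls the final answer out of the reasoning. The sketch below assumes a hypothetical `Answer: <value>` output convention; any fixed, parseable final line works. Note that nothing here calls a tool: every fact must already be in the question.

```python
from typing import Optional


def build_cot_prompt(question: str) -> str:
    # Ask for visible intermediate steps, then a fixed-format final line.
    return (
        f"Question: {question}\n"
        "Think step by step, showing each intermediate result. "
        "Finish with one line of the form 'Answer: <value>'.\n"
    )


def extract_answer(completion: str) -> Optional[str]:
    # Use the last 'Answer:' line so intermediate reasoning is ignored.
    for line in reversed(completion.strip().splitlines()):
        if line.startswith("Answer:"):
            return line[len("Answer:"):].strip()
    return None


completion = "Each box holds 12 eggs.\n3 boxes hold 3 * 12 = 36 eggs.\nAnswer: 36"
print(extract_answer(completion))  # 36
```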
Plan-and-Execute / Plan-and-Solve
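Phase 1: produce an explicit multi-step plan. Phase 2: execute each step, optionally re-planning when a step fails. Separating planning from execution lets the agent reason globally before committing to actions. The two names are related but not identical: Plan-and-Solve (Wang et al.) is a zero-shot prompting technique in which a single generation first devises a plan and then carries it out, while Plan-and-Execute is the agent form, in which a planner call produces the step list and an executor runs each step with tools. When to use: long-horizon tasks where the full sequence of steps can be enumerated upfront. Tradeoff: the plan may go stale if early steps produce unexpected results, so the agent needs a replanning trigger.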
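A minimal sketch of the two-phase loop with a replanning trigger, assuming hypothetical `planner` and `execute` callables (in practice an LLM planning call and a tool-backed executor). The trigger used here is the simplest one: any failed step discards the rest of the plan and asks the planner again, passing along what succeeded and what failed.

```python
from typing import Callable, List, Tuple

History = List[Tuple[str, str]]   # (step, result) pairs


def plan_and_execute(task: str,
                     planner: Callable[[str, History], List[str]],
                     execute: Callable[[str], Tuple[bool, str]],
                     max_replans: int = 2) -> History:
    done: History = []        # steps that succeeded, with their results
    failures: History = []    # failed steps, kept so the planner can avoid them
    plan = planner(task, done)                 # phase 1: plan everything up front
    while plan:
        step = plan.pop(0)
        ok, result = execute(step)             # phase 2: run one step
        if ok:
            done.append((step, result))
            continue
        # Replanning trigger: a failed step invalidates the rest of the plan.
        failures.append((step, f"FAILED: {result}"))
        if len(failures) > max_replans:
            raise RuntimeError(f"gave up after {max_replans} replans; last error: {result}")
        plan = planner(task, done + failures)
    return done


# Hypothetical stand-ins for an LLM planner and a tool-backed executor.
def toy_planner(task: str, history: History) -> List[str]:
    if any(r.startswith("FAILED") for _, r in history):
        return ["fetch report from mirror", "summarize report"]
    return ["fetch report from primary", "summarize report"]


def toy_execute(step: str) -> Tuple[bool, str]:
    if step == "fetch report from primary":
        return False, "HTTP 503"
    return True, f"ok: {step}"


print(plan_and_execute("summarize the Q3 report", toy_planner, toy_execute))
```

Richer triggers (for example, replanning when a result is merely surprising rather than an outright failure) fit in the same place as the `if ok` check.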
ReWOO — reasoning without observation
All tool calls are planned upfront in a single reasoning pass, with placeholder variables standing in for their outputs so that a later call can reference an earlier result. The planner runs once; an executor (the paper's "worker") runs the tool calls in sequence, substituting real results for placeholders; a solver synthesizes the final answer from the plan and the collected evidence. Because the LLM is not re-invoked between tool calls, token consumption is substantially lower than in a ReAct-style loop; the ReWOO paper reports roughly 5× better token efficiency on HotpotQA. When to use: tasks whose tool calls, and the dependencies between them, can be predicted before any call runs. Tradeoff: cannot adapt mid-sequence if an early result changes which later calls should be made.
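A minimal sketch of the worker stage, assuming plan lines in the style of the ReWOO paper (`#E1 = Tool[input]`, with later inputs referring to `#E1`). The plan text is hard-coded here in place of the single planner LLM call, and the `Search` tool is a hypothetical dictionary lookup. The solver would be the second and final LLM call, fed the prompt built by `build_solver_prompt`.

```python
import re
from typing import Callable, Dict

# Matches worker lines such as "#E2 = Search[birthplace of #E1]".
STEP_RE = re.compile(r"^(#E\d+)\s*=\s*(\w+)\[(.*)\]$")
PLACEHOLDER_RE = re.compile(r"#E\d+")


def run_worker(plan_text: str, tools: Dict[str, Callable[[str], str]]) -> Dict[str, str]:
    evidence: Dict[str, str] = {}
    for line in plan_text.splitlines():
        m = STEP_RE.match(line.strip())
        if m is None:
            continue                  # "Plan:" lines are reasoning only
        var, tool, arg = m.groups()
        # Substitute earlier results; a reference to a later #E raises KeyError.
        arg = PLACEHOLDER_RE.sub(lambda p: evidence[p.group(0)], arg)
        evidence[var] = tools[tool](arg)   # no LLM call between tool calls
    return evidence


def build_solver_prompt(task: str, plan_text: str, evidence: Dict[str, str]) -> str:
    filled = "\n".join(f"{k} = {v}" for k, v in evidence.items())
    return f"Task: {task}\nPlan:\n{plan_text}\nEvidence:\n{filled}\nAnswer:"


# The plan would come from ONE planner LLM call; here it is hard-coded.
plan_text = """Plan: Find who directed Inception.
#E1 = Search[director of Inception]
Plan: Find where that person was born.
#E2 = Search[birthplace of #E1]"""

search_index = {
    "director of Inception": "Christopher Nolan",
    "birthplace of Christopher Nolan": "London",
}
evidence = run_worker(plan_text, {"Search": search_index.__getitem__})
print(evidence)  # {'#E1': 'Christopher Nolan', '#E2': 'London'}
```

Regardless of how many tool calls the plan contains, the LLM is invoked exactly twice (planner and solver), which is where the token savings come from.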
Reflexion — self-reflection and retry
After a failed or low-quality attempt, the agent critiques its own output in natural language ("verbal reinforcement"), stores the critique in an episodic memory buffer, and retries with that critique in context. No weight updates are required; the feedback loop is entirely in-context. When to use: tasks with a verifiable success signal (unit tests, factual checks) where correctness matters more than token cost. Tradeoff: multiple full attempts multiply cost, and the benefit depends on how accurately the model diagnoses its own mistakes.
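A minimal sketch of the attempt, evaluate, reflect, retry cycle. The actor, evaluator, and reflector are hypothetical stand-ins: a scripted "model" that writes a Python function, a small unit-test runner as the external success signal, and a template critique. The memory list is the episodic buffer; it is the only thing that changes between trials.

```python
from typing import Callable, List, Optional, Tuple


def reflexion(task: str,
              attempt: Callable[[str, List[str]], str],
              evaluate: Callable[[str], Optional[str]],
              reflect: Callable[[str, str], str],
              max_trials: int = 3) -> Tuple[str, List[str]]:
    memory: List[str] = []                  # episodic buffer of self-critiques
    for _ in range(max_trials):
        output = attempt(task, memory)      # actor sees all past reflections
        error = evaluate(output)            # external signal: None means pass
        if error is None:
            return output, memory
        memory.append(reflect(output, error))   # verbal feedback, no weight update
    raise RuntimeError(f"no passing attempt in {max_trials} trials")


# Hypothetical actor: only handles negatives once a reflection mentions them.
def toy_attempt(task: str, memory: List[str]) -> str:
    if any("negative" in m for m in memory):
        return "def my_abs(x):\n    return -x if x < 0 else x"
    return "def my_abs(x):\n    return x"


def run_unit_tests(source: str) -> Optional[str]:
    namespace: dict = {}
    exec(source, namespace)   # real systems should sandbox model-written code
    for x, want in [(3, 3), (-2, 2), (0, 0)]:
        got = namespace["my_abs"](x)
        if got != want:
            return f"my_abs({x}) returned {got}, expected {want}"
    return None


def toy_reflect(output: str, error: str) -> str:
    return f"Previous attempt failed: {error}. Handle negative inputs explicitly."


code, memory = reflexion("implement my_abs", toy_attempt, run_unit_tests, toy_reflect)
print(len(memory))  # 1: one critique was needed before the tests passed
```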
Tree-of-Thoughts (ToT)
At each step, the agent generates multiple candidate reasoning branches, scores or votes on them, and searches the tree (BFS or DFS) for the most promising path. Generalizes Chain-of-Thought from a single chain to a search over reasoning. When to use: problems requiring deliberate exploration where greedy reasoning routinely fails (planning puzzles, creative tasks with many candidate solutions). Tradeoff: compute cost scales with branching factor and depth; impractical for latency-sensitive tasks.
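A minimal sketch of the breadth-first variant: expand every kept state into candidate next "thoughts", check for a goal, then keep only the `beam_width` best-scoring states. In the ToT paper both proposal and evaluation are LLM prompts; here, as an assumption for illustration, a toy arithmetic puzzle supplies `propose` and a distance heuristic stands in for the LLM value prompt. Cost grows with `beam_width` times the number of proposals per state times `max_depth`, which is the tradeoff noted above.

```python
from typing import Callable, List, Optional, Tuple

State = Tuple[int, Tuple[str, ...]]   # (current value, operations applied so far)


def tot_bfs(root: State,
            propose: Callable[[State], List[State]],
            score: Callable[[State], float],
            is_goal: Callable[[State], bool],
            beam_width: int = 2,
            max_depth: int = 4) -> Optional[State]:
    frontier = [root]
    for _ in range(max_depth):
        # Expand: every kept state proposes several candidate next thoughts.
        candidates = [child for state in frontier for child in propose(state)]
        for c in candidates:
            if is_goal(c):
                return c
        # Evaluate and prune: keep only the beam_width most promising states.
        frontier = sorted(candidates, key=score, reverse=True)[:beam_width]
        if not frontier:
            break
    return None


# Toy puzzle (hypothetical): reach 11 from 1 using "+3" and "*2".
TARGET = 11


def propose(state: State) -> List[State]:
    value, ops = state
    return [(value + 3, ops + ("+3",)), (value * 2, ops + ("*2",))]


def score(state: State) -> float:
    return -abs(TARGET - state[0])   # stand-in for an LLM value judgment


def is_goal(state: State) -> bool:
    return state[0] == TARGET


print(tot_bfs((1, ()), propose, score, is_goal))  # (11, ('+3', '*2', '+3'))
```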
Self-Consistency
Sample multiple independent reasoning paths (temperature > 0), then select the final answer that appears most often across paths (majority vote). There is no tree search or explicit scoring; diversity comes from sampling, and agreement is the signal. When to use: arithmetic and commonsense reasoning where the final answer is short and has a canonical form (a number, a multiple-choice option) that can be compared exactly across samples. Tradeoff: cost scales linearly with the number of samples, with diminishing returns beyond ~10–20 paths.
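A minimal sketch of the vote, assuming each sampled completion ends with an `Answer: <value>` line (a hypothetical convention) and using canned strings in place of real temperature > 0 samples. Only final answers are compared; the reasoning text is discarded. The returned vote share is not part of the original method, but it can serve as a rough agreement signal.

```python
from collections import Counter
from itertools import cycle
from typing import Callable, Optional, Tuple


def parse_final_answer(completion: str) -> Optional[str]:
    for line in reversed(completion.strip().splitlines()):
        if line.startswith("Answer:"):
            return line[len("Answer:"):].strip()
    return None


def self_consistency(sample: Callable[[], str], n: int = 10) -> Tuple[Optional[str], float]:
    answers = []
    for _ in range(n):
        ans = parse_final_answer(sample())   # one independent reasoning path
        if ans is not None:                  # drop paths with no parseable answer
            answers.append(ans)
    if not answers:
        return None, 0.0
    answer, votes = Counter(answers).most_common(1)[0]   # majority vote
    return answer, votes / len(answers)


# Canned completions stand in for temperature > 0 samples (hypothetical).
paths = cycle([
    "12 per box, 3 boxes: 36.\nAnswer: 36",
    "3 * 12 = 36\nAnswer: 36",
    "3 * 14 = 42\nAnswer: 42",
    "Three boxes of a dozen is 36.\nAnswer: 36",
    "I am not sure.",
])
print(self_consistency(lambda: next(paths), n=5))  # ('36', 0.75)
```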
Tool-use and stopping
- Tool selection: driven by the tool schema in the system prompt; structured output (JSON function-call format) reduces mis-calls. See /resources/reliable-tool-calling.
- Termination: explicit DONE / final-answer token, goal-satisfaction check in code, or max-iteration guard (always set one to prevent runaway loops).
- Error recovery ladder: (1) retry identical call on transient/network errors; (2) replan the current step on semantic errors; (3) escalate to the caller on persistent failures. Log every failure with its tool name and arguments.
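A minimal sketch of the three-rung ladder around a single tool call. It assumes the tool layer raises one of two hypothetical exception types, `TransientToolError` or `SemanticToolError`, and that `replan` is a hypothetical callable (in practice an LLM call) that proposes corrected arguments from the error message. Every failure is logged with the tool name and arguments, and `Escalate` carries persistent failures back to the caller.

```python
import time
from typing import Any, Callable, Dict


class TransientToolError(Exception):
    """Timeouts, rate limits, dropped connections: worth retrying as-is."""


class SemanticToolError(Exception):
    """Bad arguments or an unusable result: retrying unchanged won't help."""


class Escalate(Exception):
    """Persistent failure handed back to the caller (rung 3)."""


def call_with_ladder(name: str,
                     tool: Callable[..., Any],
                     args: Dict[str, Any],
                     replan: Callable[[Dict[str, Any], str], Dict[str, Any]],
                     max_retries: int = 2,
                     max_replans: int = 1,
                     backoff_s: float = 0.5,
                     log: Callable[[str], None] = print) -> Any:
    retries = replans = 0
    while True:
        try:
            return tool(**args)
        except TransientToolError as exc:
            log(f"[transient] tool={name} args={args}: {exc}")
            if retries >= max_retries:
                raise Escalate(f"{name}: transient errors after {retries} retries") from exc
            retries += 1
            time.sleep(backoff_s * 2 ** (retries - 1))   # rung 1: retry identical call
        except SemanticToolError as exc:
            log(f"[semantic] tool={name} args={args}: {exc}")
            if replans >= max_replans:
                raise Escalate(f"{name}: still failing after {replans} replans") from exc
            replans += 1
            retries = 0
            args = replan(args, str(exc))                 # rung 2: replan this step


calls = []


def flaky_events(date: str) -> str:
    """Hypothetical tool: times out once, then rejects non-ISO dates."""
    calls.append(date)
    if len(calls) == 1:
        raise TransientToolError("timeout")
    if "/" in date:
        raise SemanticToolError("expected YYYY-MM-DD")
    return f"events on {date}"


def fix_date(args: Dict[str, Any], error: str) -> Dict[str, Any]:
    # Stand-in for an LLM replanning call that sees the error message.
    month, day, year = args["date"].split("/")
    return {"date": f"{year}-{month}-{day}"}


print(call_with_ladder("events", flaky_events, {"date": "03/14/2024"}, fix_date, backoff_s=0))
```

The exponential backoff on rung 1 is a common choice, not a requirement of the ladder; the essential split is that transient errors repeat the same call while semantic errors change the arguments.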
Reliability techniques
- Explicit planning reduces mid-sequence confusion on long-horizon tasks.
- Structured scratchpads (labeled Thought / Action / Observation blocks) prevent the model from conflating reasoning with output.
- Verification / self-check steps after each action ground the agent in what actually happened, not what it expected.
- Grounding in tool results means: never reason forward from an assumed tool output; always wait for the real observation.
- More reasoning steps cost more tokens and add latency — see /resources/agent-cost-latency-optimization.
- Pattern choice interacts with how you structure the system prompt and history — see /resources/prompt-context-engineering.
Decision guide
- Start with ReAct + well-typed tools. It covers the majority of tool-using tasks with minimal complexity. Add structured output / function-calling format to reduce mis-calls.
- Add explicit planning (Plan-and-Execute or ReWOO) when the task is long-horizon and the step sequence is predictable.
- Add reflection / verification (Reflexion, self-check steps) when correctness matters more than cost and a success signal exists.
- Use self-consistency or ToT only when single-pass reasoning demonstrably fails on your task and you can afford the compute.
- Escalate to multi-agent only when a single agent genuinely cannot hold the required context, tools, or specialization — see /resources/multi-agent-orchestration-patterns.
Cross-links: /resources/reliable-tool-calling · /resources/evaluating-ai-agents · /resources/agent-cost-latency-optimization · /resources/prompt-context-engineering · /resources/multi-agent-orchestration-patterns
Verified sources
- ReAct (arXiv:2210.03629) — Yao et al., "ReAct: Synergizing Reasoning and Acting in Language Models", 2022: https://arxiv.org/abs/2210.03629
- Reflexion (arXiv:2303.11366) — Shinn et al., "Reflexion: Language Agents with Verbal Reinforcement Learning", NeurIPS 2023: https://arxiv.org/abs/2303.11366
- Tree-of-Thoughts (arXiv:2305.10601) — Yao et al., "Tree of Thoughts: Deliberate Problem Solving with Large Language Models", NeurIPS 2023: https://arxiv.org/abs/2305.10601
- ReWOO (arXiv:2305.18323) — Xu et al., "ReWOO: Decoupling Reasoning from Observations for Efficient Augmented Language Models", 2023: https://arxiv.org/abs/2305.18323
- Self-Consistency (arXiv:2203.11171) — Wang et al., "Self-Consistency Improves Chain of Thought Reasoning in Language Models", 2022: https://arxiv.org/abs/2203.11171
- Plan-and-Solve (arXiv:2305.04091) — Wang et al., "Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models", ACL 2023: https://arxiv.org/abs/2305.04091