ChangeGamer


Agent Reasoning and Design Patterns

Guide · updated 2026-06-16 · Markdown variant

The canonical single-agent reasoning and acting loops: ReAct, Chain-of-Thought, Plan-and-Solve, ReWOO, Reflexion, Tree-of-Thoughts, and Self-Consistency — what each is, when to use it, and tradeoffs.


Every agent implementation rests on the same substrate: a loop that perceives input, reasons about what to do, executes an action (usually a tool call), observes the result, and repeats until a termination condition is met. The named patterns below are systematic ways of structuring that loop. Choose the simplest pattern that meets your correctness and cost requirements.

The core loop

Perceive → Reason → Act → Observe → repeat.

Each turn the agent reads its context window (system prompt, history, tool results), produces reasoning or a plan, calls a tool or returns a final answer, and appends the result to context. Two decisions recur in every pattern: (1) which tool to call (driven by the reasoning step and the available tool schema); (2) when to stop (goal-satisfaction check, max-iteration guard, or explicit DONE signal). Error recovery follows a retry-then-replan ladder: retry the same call on transient failures, replan on semantic failures, escalate to the caller on persistent failures.
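
The loop and the retry-then-replan ladder can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference implementation: `run_agent`, `Step`, `TransientError`, `SemanticError`, and the `decide` callback (standing in for the model's reasoning step) are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


class TransientError(Exception):
    """Hypothetical: a failure worth retrying unchanged (timeout, rate limit)."""


class SemanticError(Exception):
    """Hypothetical: the call itself was wrong (bad arguments, wrong tool)."""


@dataclass
class Step:
    tool: Optional[str]  # None means "stop and return `arg` as the final answer"
    arg: str


def run_agent(
    decide: Callable[[List[str]], Step],
    tools: Dict[str, Callable[[str], str]],
    task: str,
    max_iters: int = 8,
    max_retries: int = 2,
) -> str:
    context = [f"task: {task}"]                 # perceive
    for _ in range(max_iters):                  # max-iteration guard
        step = decide(context)                  # reason
        if step.tool is None:                   # explicit DONE signal
            return step.arg
        for attempt in range(max_retries + 1):
            try:
                result = tools[step.tool](step.arg)                     # act
                context.append(f"{step.tool}({step.arg}) -> {result}")  # observe
                break
            except TransientError:
                if attempt == max_retries:
                    # Persistent failure: escalate to the caller.
                    raise RuntimeError(f"{step.tool} kept failing; escalating")
            except SemanticError as err:
                # Record the error so the next decide() call can replan.
                context.append(f"error from {step.tool}: {err}")
                break
    raise RuntimeError("max iterations reached without a final answer")
```

In this sketch a transient failure repeats the identical call, a semantic failure is written into context so the next reasoning step sees it, and only an exhausted retry budget or iteration budget surfaces to the caller.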

Named patterns

ReAct — reason + act interleaved

The model emits a Thought (free-form reasoning trace), then an Action (tool call), then an Observation (tool result), cycling until it can emit a final answer. Interleaving reasoning with actions keeps the scratchpad grounded: each thought is informed by the most recent real observation. When to use: general-purpose tool-using agents, short-to-medium horizon tasks. Tradeoff: reasoning tokens paid on every turn; long loops accumulate context fast.
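
A minimal sketch of the Thought / Action / Observation cycle, assuming a text format of `Action: tool[argument]` and `Final Answer: ...`. That format, the `react` function, and the scripted `toy_llm` (which stands in for a real model call) are illustrative assumptions, not a fixed standard.

```python
import re

ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.*?)\]")
FINAL_RE = re.compile(r"Final Answer:\s*(.*)")


def react(llm, tools, question, max_turns=5):
    scratchpad = f"Question: {question}\n"
    for _ in range(max_turns):
        output = llm(scratchpad)                 # Thought plus Action or answer
        scratchpad += output + "\n"
        final = FINAL_RE.search(output)
        if final:
            return final.group(1).strip(), scratchpad
        action = ACTION_RE.search(output)
        if action is None:
            scratchpad += "Observation: no valid action found\n"
            continue
        name, arg = action.groups()
        tool = tools.get(name)
        observation = tool(arg) if tool else f"unknown tool {name}"
        scratchpad += f"Observation: {observation}\n"   # grounds the next thought
    return None, scratchpad                      # turn budget exhausted


# Scripted stand-in for a model: acts first, then answers from the observation.
def toy_llm(scratchpad):
    if "Observation:" not in scratchpad:
        return "Thought: I should add the numbers.\nAction: add[2,3]"
    last = scratchpad.strip().rsplit("Observation: ", 1)[1]
    return f"Thought: the tool returned {last}.\nFinal Answer: {last}"


toy_tools = {"add": lambda arg: sum(int(x) for x in arg.split(","))}
answer, trace = react(toy_llm, toy_tools, "What is 2 + 3?")
```

Note that the whole scratchpad is passed back on every turn, which is where the per-turn token cost mentioned above comes from.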

Chain-of-Thought (CoT)

The model reasons step-by-step in natural language before producing an answer, without interleaved tool calls. Strong for arithmetic, logic, and multi-step inference when all information is already in context. Limit: no mechanism to retrieve missing information or recover from wrong intermediate steps.
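
As a small illustration, a CoT call usually amounts to a prompt that asks for the reasoning first plus a parser for the final line. The wording of the prompt, the `Answer:` convention, and both function names below are assumptions made for this sketch.

```python
def cot_prompt(question):
    # Ask for visible step-by-step reasoning, then a machine-readable last line.
    return (
        f"Question: {question}\n"
        "Think step by step, then give the result on a final line "
        "starting with 'Answer:'."
    )


def extract_answer(completion):
    # Scan from the end so intermediate mentions of 'Answer:' are ignored.
    for line in reversed(completion.strip().splitlines()):
        if line.startswith("Answer:"):
            return line[len("Answer:"):].strip()
    return None  # the model did not follow the format


# Canned completion standing in for a model response.
sample = "There are 3 boxes with 4 pens each.\n3 * 4 = 12.\nAnswer: 12"
result = extract_answer(sample)
```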

Plan-and-Execute / Plan-and-Solve

Phase 1: produce an explicit multi-step plan. Phase 2: execute each step, optionally re-planning when a step fails. Separating planning from execution lets the agent reason globally before committing to actions. When to use: long-horizon tasks where the full sequence of steps can be enumerated upfront. Tradeoff: plan may go stale if early steps produce unexpected results; requires a replanning trigger.
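
The two phases and a replanning trigger might look like the sketch below. The `planner(goal, done, failed)` and `executor(step)` signatures are hypothetical; here the only trigger for replanning is a failed step, which is one choice among several.

```python
def plan_and_execute(planner, executor, goal, max_replans=2):
    plan = list(planner(goal, [], None))         # phase 1: full plan up front
    done = []
    replans = 0
    while plan:
        step = plan.pop(0)
        ok, result = executor(step)              # phase 2: run one step
        if ok:
            done.append((step, result))
            continue
        if replans == max_replans:
            raise RuntimeError(f"giving up after {replans} replans at {step!r}")
        replans += 1
        # Replanning trigger: hand the planner what succeeded and what failed.
        plan = list(planner(goal, done, step))
    return done


# Stub planner and executor for illustration only.
def toy_planner(goal, done, failed):
    if failed == "parse_v1":
        return ["parse_v2", "summarize"]
    return ["fetch", "parse_v1", "summarize"]


def toy_executor(step):
    if step == "parse_v1":
        return False, "unsupported format"
    return True, f"{step} ok"


trace = plan_and_execute(toy_planner, toy_executor, "summarize the report")
print([step for step, _ in trace])  # ['fetch', 'parse_v2', 'summarize']
```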

ReWOO — reasoning without observation

All tool calls are planned upfront in a single reasoning pass, with placeholders for their outputs. The planner runs once; an executor runs the tool calls in sequence, substituting real results for placeholders; a solver synthesizes the final answer. Because the LLM is not re-invoked between tool calls, the growing history is not re-sent on every step as it is in ReAct, so token consumption is substantially lower when the tool sequence can be fixed in advance. When to use: tasks where the tools to call, and their order, are predictable before any result is seen. Tradeoff: cannot adapt mid-sequence if an early result changes what later calls should be.
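
A sketch of the planner / executor / solver split with placeholder substitution. The `#E1`, `#E2` placeholder syntax, the tuple layout of a plan step, and all function names are assumptions for illustration; the planner and solver are stubs where a real system would make one model call each.

```python
import re

PLACEHOLDER = re.compile(r"#E\d+")


def execute_plan(plan, tools):
    """Run every planned call in order without invoking the model."""
    evidence = {}
    for var, tool_name, arg in plan:
        # Substitute results of earlier calls for their placeholders.
        resolved = PLACEHOLDER.sub(lambda m: str(evidence[m.group(0)]), arg)
        evidence[var] = tools[tool_name](resolved)
    return evidence


def rewoo(planner, tools, solver, task):
    plan = planner(task)                    # single planning pass
    evidence = execute_plan(plan, tools)    # tool calls only
    return solver(task, evidence)           # single synthesis pass


# Stubs for illustration only.
def toy_planner(task):
    return [("#E1", "length", "reasoning"), ("#E2", "double", "#E1")]


toy_tools = {"length": lambda s: len(s), "double": lambda s: int(s) * 2}


def toy_solver(task, evidence):
    return f"answer: {evidence['#E2']}"


print(rewoo(toy_planner, toy_tools, toy_solver, "double the length of 'reasoning'"))
# answer: 18
```

The second step consumes the first step's output through `#E1`, but which tools run, and in what order, was decided before any result existed.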

Reflexion — self-reflection and retry

After a failed or low-quality attempt, the agent critiques its own output in natural language ("verbal reinforcement"), stores the critique in an episodic memory buffer, and retries with that critique in context. No weight updates are required; the feedback loop is entirely in-context. When to use: tasks with a verifiable success signal (unit tests, factual checks) where correctness matters more than token cost. Tradeoff: multiple full attempts multiply cost; critique quality depends on how well the model can diagnose its own errors.
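
The attempt, evaluate, reflect, retry cycle can be sketched as below. The three callbacks and their signatures are hypothetical; in practice `attempt` and `reflect` would be model calls, while `evaluate` is the external success signal (here, two toy unit tests).

```python
def reflexion(attempt, evaluate, reflect, task, max_trials=3):
    memory = []                                  # episodic buffer of critiques
    for trial in range(1, max_trials + 1):
        output = attempt(task, memory)           # earlier critiques are in context
        ok, feedback = evaluate(output)          # verifiable success signal
        if ok:
            return output, trial, memory
        memory.append(reflect(task, output, feedback))
    return None, max_trials, memory              # every trial failed


# Stubs for illustration only.
def toy_attempt(task, memory):
    if not memory:
        return lambda x: x                       # first try mishandles negatives
    return lambda x: -x if x < 0 else x


def toy_evaluate(candidate):
    for given, expected in [(3, 3), (-4, 4)]:
        got = candidate(given)
        if got != expected:
            return False, f"f({given}) returned {got}, expected {expected}"
    return True, "all tests passed"


def toy_reflect(task, output, feedback):
    return f"Last attempt failed: {feedback}. Handle negative inputs."


solution, trials, notes = reflexion(toy_attempt, toy_evaluate, toy_reflect, "implement abs")
```

Each failed trial costs a full attempt plus a critique, which is the cost multiplier noted in the tradeoff.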

Tree-of-Thoughts (ToT)

At each step, the agent generates multiple candidate reasoning branches, scores or votes on them, and searches the tree (BFS or DFS) for the most promising path. Generalizes Chain-of-Thought from a single chain to a search over reasoning. When to use: problems requiring deliberate exploration where greedy reasoning routinely fails (planning puzzles, creative tasks with many candidate solutions). Tradeoff: compute cost scales with branching factor and depth; impractical for latency-sensitive tasks.
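
A sketch of the BFS variant, simplified to keep only the best `beam_width` states per level. The `propose`, `score`, and `is_goal` callbacks are hypothetical; in a real agent `propose` and `score` would each be model calls, whereas here they solve a toy number puzzle.

```python
def tree_of_thoughts(root, propose, score, is_goal, beam_width=2, max_depth=3):
    frontier = [root]
    for _ in range(max_depth):
        # Expand every surviving state into its candidate next thoughts.
        candidates = [child for state in frontier for child in propose(state)]
        if not candidates:
            break
        # Evaluate the candidates and keep only the most promising ones.
        candidates.sort(key=score, reverse=True)
        frontier = candidates[:beam_width]
        for state in frontier:
            if is_goal(state):
                return state
    return None  # no goal state found within the depth budget


# Toy problem: pick three numbers from (2, 3, 7) that sum to TARGET.
TARGET = 12


def toy_propose(state):
    return [state + (n,) for n in (2, 3, 7)]


def toy_score(state):
    return -abs(TARGET - sum(state))             # closer to the target is better


def toy_is_goal(state):
    return len(state) == 3 and sum(state) == TARGET


best = tree_of_thoughts((), toy_propose, toy_score, toy_is_goal)
```

The number of `propose` and `score` calls grows with both `beam_width` and `max_depth`, which is the compute tradeoff described above.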

Self-Consistency

Sample multiple independent reasoning paths (temperature > 0), then select the answer that appears most often across paths (majority vote). There is no tree search or explicit scoring: diversity comes from sampling, and agreement between paths is the signal. When to use: arithmetic and commonsense reasoning where each path ends in a single short answer that can be compared exactly across samples. Tradeoff: cost scales linearly with the number of samples; diminishing returns beyond ~10–20 paths.
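
The vote itself is a few lines. In this sketch `sample_answer` is a hypothetical callback that would run one sampled reasoning path and return only its final answer; the stub below replays canned answers so the example is deterministic.

```python
from collections import Counter
from itertools import cycle


def self_consistency(sample_answer, question, n_samples=10):
    # One independent sample per path; only the final answers are compared.
    answers = [sample_answer(question) for _ in range(n_samples)]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / n_samples             # answer and agreement rate


# Stub sampler for illustration: replays a fixed sequence of final answers.
_canned = cycle(["42", "42", "41", "42", "40"])


def toy_sampler(question):
    return next(_canned)


answer, agreement = self_consistency(toy_sampler, "What is 6 * 7?")
print(answer, agreement)  # 42 0.6
```

The agreement rate can double as a rough confidence signal, and answers need normalizing (whitespace, units, formatting) before counting so that equivalent answers are not split across buckets.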

Tool-use and stopping

Reliability techniques

Decision guide

  1. Start with ReAct + well-typed tools. It covers the majority of tool-using tasks with minimal complexity. Add structured output / function-calling format to reduce malformed or mistaken tool calls.
  2. Add explicit planning (Plan-and-Execute or ReWOO) when the task is long-horizon and the step sequence is predictable.
  3. Add reflection / verification (Reflexion, self-check steps) when correctness matters more than cost and a success signal exists.
  4. Use self-consistency or ToT only when single-pass reasoning demonstrably fails on your task and you can afford the compute.
  5. Escalate to multi-agent only when a single agent genuinely cannot hold the required context, tools, or specialization — see /resources/multi-agent-orchestration-patterns.
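
The guide above can be read as a layered checklist. The sketch below encodes it directly; `TaskProfile`, its field names, and `recommend` are hypothetical, and the boolean flags stand in for judgments that in practice come from evaluation on your own task.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    long_horizon: bool = False
    predictable_steps: bool = False
    correctness_over_cost: bool = False
    has_success_signal: bool = False
    single_pass_fails: bool = False
    compute_budget_ok: bool = False
    exceeds_single_agent: bool = False


def recommend(profile):
    stack = ["ReAct + well-typed tools"]                         # step 1: default
    if profile.long_horizon and profile.predictable_steps:       # step 2
        stack.append("explicit planning (Plan-and-Execute or ReWOO)")
    if profile.correctness_over_cost and profile.has_success_signal:  # step 3
        stack.append("reflection / verification (Reflexion)")
    if profile.single_pass_fails and profile.compute_budget_ok:  # step 4
        stack.append("Self-Consistency or ToT")
    if profile.exceeds_single_agent:                             # step 5
        stack.append("multi-agent orchestration")
    return stack


print(recommend(TaskProfile(long_horizon=True, predictable_steps=True)))
# ['ReAct + well-typed tools', 'explicit planning (Plan-and-Execute or ReWOO)']
```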

Cross-links: /resources/reliable-tool-calling · /resources/evaluating-ai-agents · /resources/agent-cost-latency-optimization · /resources/prompt-context-engineering · /resources/multi-agent-orchestration-patterns


#agents #reasoning #react #chain-of-thought #planning #reflection #patterns
