Prompt and Context Engineering for Agents

Guide · updated 2026-06-16 · Markdown variant

From crafting a single prompt to managing everything an agent sees across a trajectory: system-prompt design, context-window management, failure modes, and a high-leverage checklist.

From prompt engineering to context engineering

Prompt engineering is crafting a single instruction. Context engineering is managing the entire token window an agent sees across a multi-step trajectory: system prompt, tool definitions, retrieved data, memory outputs, and accumulated message history. Anthropic defines it as "the set of strategies for curating and maintaining the optimal set of tokens during LLM inference." Cognition (Devin) calls it "the #1 job of engineers building AI agents."

The framing shift matters because agents fail not from a bad one-liner prompt but from poor information architecture across the whole window.

System-prompt design for agents

A well-structured system prompt typically covers five layers:

Role + objective — who the agent is and what success looks like.
Tool-use instructions — when to call which tool, and when not to.
Constraints and guardrails — what the agent must never do.
Output format — schema, tone, length, and structure of the response.
"Right altitude" principle — specific enough to steer behavior; general enough to generalize to cases not enumerated. Avoid exhaustive rule lists; prefer canonical examples.

Anthropic's guidance: if a human engineer cannot definitively say which tool to use in a given situation, an agent cannot either — keep the toolset minimal and curated.

Core prompt-engineering techniques

Clear instructions with few-shot examples — show desired behavior on representative cases; a small number of diverse canonical examples outperforms a long rule list.
Structured delimiters — XML tags (<task>, <context>, <constraints>) or markdown sections isolate logical regions and reduce ambiguity about what is instruction versus data.
Tool and response schemas — typed JSON schemas for tool inputs and agent outputs eliminate ambiguity. See /resources/reliable-tool-calling.
Chain-of-thought / explicit planning — instruct the agent to reason before acting (<thinking> block or a planning step) to reduce shallow-reasoning errors on multi-step tasks.
Tools instead of long instructions — when a behavior requires complex retrieval or calculation, provide a tool rather than trying to encode the logic in prose.

Context-window management (the agent-specific layer)

For single-turn LLM calls, context is static. For agents the window is dynamic and grows across steps. Key management strategies:

Compaction / summarization — periodically replace accumulated history with a compact summary before the window fills. The Claude API exposes a token-efficient compaction mode.
Just-in-time retrieval — pull only the most relevant context at each step instead of loading everything upfront. See /resources/rag-retrieval-for-agents.
Memory tiers — distinguish in-context working memory from external persistent stores. See /resources/agent-memory-context.
Context isolation per sub-agent — give each sub-agent a scoped window so noisy tool outputs or unrelated history cannot distract it. Per Cognition: share full agent traces, not just individual messages, when coordination is needed.
Prompt caching — a stable, long system-prompt prefix is cached by major providers, making a rich prefix cheap to repeat across turns. See /resources/agent-cost-latency-optimization.

Failure modes to know

Failure mode	Description	Mitigation
Context rot	Recall degrades as window length grows	Compaction; just-in-time retrieval
Lost-in-the-middle	Model under-weights information in the middle of a long context	Put high-priority context at the start or end; summarize the middle
Context poisoning	Adversarial data injected into retrieved content or tool outputs	Treat all external content as untrusted data, not instructions. See /resources/agentic-security-checklist
Context clash / distraction	Conflicting instructions from different context segments	Use structured delimiters; isolate sub-agent windows

Evaluation and iteration

Treat prompts and context configurations as versioned artifacts (store in version control, review in PRs). Measure changes with evals before deploying — see /resources/evaluating-ai-agents. Avoid over-fitting to one model's quirks: test on at least two models before calling a prompt stable.

High-leverage context-engineering checklist

System prompt has all five layers (role, tools, constraints, format, examples).
Toolset is minimal: only tools the agent needs for this task.
At least two diverse few-shot examples per non-trivial behavior.
Structured delimiters separate instruction, data, and history regions.
Tool schemas and response schemas are typed (JSON Schema or equivalent).
History compaction or summarization strategy is defined.
Retrieval is just-in-time, not full-corpus-upfront.
Context isolation is applied to sub-agents doing independent work.
Prompt is stored in version control and covered by evals.
Tested on more than one model before treating as stable.

Verified sources

Anthropic Engineering — "Effective context engineering for AI agents": https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents
Cognition — "Don't Build Multi-Agents" (context engineering principles section): https://cognition.ai/blog/dont-build-multi-agents
Anthropic Docs — Prompt engineering overview: https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/overview
OpenAI — Prompt engineering guide: https://platform.openai.com/docs/guides/prompt-engineering/strategy-write-clear-instructions

#prompt-engineering #context-engineering #agents #system-prompt #context-window #few-shot #chain-of-thought

Category: Guide