Prompt and Context Engineering for Agents
From crafting a single prompt to managing everything an agent sees across a trajectory: system-prompt design, context-window management, failure modes, and a high-leverage checklist.
From prompt engineering to context engineering
Prompt engineering is crafting a single instruction. Context engineering is managing the entire token window an agent sees across a multi-step trajectory: system prompt, tool definitions, retrieved data, memory outputs, and accumulated message history. Anthropic defines it as "the set of strategies for curating and maintaining the optimal set of tokens during LLM inference." Cognition (Devin) calls it "the #1 job of engineers building AI agents."
The framing shift matters because agents fail not from a bad one-liner prompt but from poor information architecture across the whole window.
System-prompt design for agents
A well-structured system prompt typically covers five layers:
- Role + objective — who the agent is and what success looks like.
- Tool-use instructions — when to call which tool, and when not to.
- Constraints and guardrails — what the agent must never do.
- Output format — schema, tone, length, and structure of the response.
- "Right altitude" principle — specific enough to steer behavior; general enough to generalize to cases not enumerated. Avoid exhaustive rule lists; prefer canonical examples.
Anthropic's guidance: if a human engineer cannot definitively say which tool to use in a given situation, an agent cannot either — keep the toolset minimal and curated.
Core prompt-engineering techniques
- Clear instructions with few-shot examples — show desired behavior on representative cases; a small number of diverse canonical examples outperforms a long rule list.
- Structured delimiters — XML tags (
<task>,<context>,<constraints>) or markdown sections isolate logical regions and reduce ambiguity about what is instruction versus data. - Tool and response schemas — typed JSON schemas for tool inputs and agent outputs eliminate ambiguity. See /resources/reliable-tool-calling.
- Chain-of-thought / explicit planning — instruct the agent to reason before acting (
<thinking>block or a planning step) to reduce shallow-reasoning errors on multi-step tasks. - Tools instead of long instructions — when a behavior requires complex retrieval or calculation, provide a tool rather than trying to encode the logic in prose.
Context-window management (the agent-specific layer)
For single-turn LLM calls, context is static. For agents the window is dynamic and grows across steps. Key management strategies:
- Compaction / summarization — periodically replace accumulated history with a compact summary before the window fills. The Claude API exposes a token-efficient compaction mode.
- Just-in-time retrieval — pull only the most relevant context at each step instead of loading everything upfront. See /resources/rag-retrieval-for-agents.
- Memory tiers — distinguish in-context working memory from external persistent stores. See /resources/agent-memory-context.
- Context isolation per sub-agent — give each sub-agent a scoped window so noisy tool outputs or unrelated history cannot distract it. Per Cognition: share full agent traces, not just individual messages, when coordination is needed.
- Prompt caching — a stable, long system-prompt prefix is cached by major providers, making a rich prefix cheap to repeat across turns. See /resources/agent-cost-latency-optimization.
Failure modes to know
| Failure mode | Description | Mitigation |
|---|---|---|
| Context rot | Recall degrades as window length grows | Compaction; just-in-time retrieval |
| Lost-in-the-middle | Model under-weights information in the middle of a long context | Put high-priority context at the start or end; summarize the middle |
| Context poisoning | Adversarial data injected into retrieved content or tool outputs | Treat all external content as untrusted data, not instructions. See /resources/agentic-security-checklist |
| Context clash / distraction | Conflicting instructions from different context segments | Use structured delimiters; isolate sub-agent windows |
Evaluation and iteration
Treat prompts and context configurations as versioned artifacts (store in version control, review in PRs). Measure changes with evals before deploying — see /resources/evaluating-ai-agents. Avoid over-fitting to one model's quirks: test on at least two models before calling a prompt stable.
High-leverage context-engineering checklist
- System prompt has all five layers (role, tools, constraints, format, examples).
- Toolset is minimal: only tools the agent needs for this task.
- At least two diverse few-shot examples per non-trivial behavior.
- Structured delimiters separate instruction, data, and history regions.
- Tool schemas and response schemas are typed (JSON Schema or equivalent).
- History compaction or summarization strategy is defined.
- Retrieval is just-in-time, not full-corpus-upfront.
- Context isolation is applied to sub-agents doing independent work.
- Prompt is stored in version control and covered by evals.
- Tested on more than one model before treating as stable.
Verified sources
- Anthropic Engineering — "Effective context engineering for AI agents": https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents
- Cognition — "Don't Build Multi-Agents" (context engineering principles section): https://cognition.ai/blog/dont-build-multi-agents
- Anthropic Docs — Prompt engineering overview: https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/overview
- OpenAI — Prompt engineering guide: https://platform.openai.com/docs/guides/prompt-engineering/strategy-write-clear-instructions