Agentic Security Checklist

Guide · updated 2026-06-15 · Markdown variant

Cross-vendor, threat-surface-organized security checklist for building and operating AI agents — synthesizing OWASP, NIST, Anthropic, OpenAI, Google SAIF, and MITRE ATLAS.

AI labs and standards bodies each publish their own guidance. This checklist synthesizes it into one agent-consumable reference, organized by threat surface. Check each item before shipping an agent to production.

Summary table

Threat surface	Highest-impact control
Prompt injection	Treat all external content as untrusted data, not instructions
Tool/function-call abuse	Least-privilege toolset; confirm before destructive calls
Excessive agency	Scope permissions to the minimum required for each task
Untrusted content handling	Sanitize and size-limit external content before it enters context
Secrets & credentials	Never pass secrets through the context window
MCP / supply chain	Pin versions; audit tool descriptions before connecting
Output / action sandboxing	Human-in-the-loop gate on irreversible actions
Memory & context poisoning	Validate and sanitize retrieved context before injection
Data exfiltration	Network egress allowlist; outbound content inspection
Auth & OAuth scopes	Request only the OAuth scopes the agent needs per task
Logging & auditability	Log every tool call, input, and output with a trace ID

1. Prompt injection — direct and indirect

Enforce a strict separation between trusted instructions (system prompt) and untrusted data (user input, tool outputs, retrieved documents). Never concatenate them without a structural delimiter.
Treat all content fetched from the web, email, databases, or tool responses as untrusted user data — not as instructions — regardless of where it originated (OWASP Agentic ASI01 Agent Goal Hijack; OWASP LLM01:2025 Prompt Injection).
Strip or escape instruction-like patterns ("Ignore previous instructions", "<system>", "ASSISTANT:") from retrieved content before inserting it into context. Treat this as defense in depth: attackers can paraphrase around any fixed pattern list.
Validate that tool outputs conform to the expected schema and type before the model acts on them.
Run a separate classification step or guardrail on user input to detect jailbreak patterns before the main agent sees them (OpenAI agent builder safety guidance).
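A minimal sketch of the delimiter-plus-escaping items above, assuming the system prompt tells the model that anything inside an `<untrusted_data>` block is data only. The tag name, the helper names `neutralize` and `wrap_untrusted`, and the pattern list are illustrative assumptions, not taken from any of the cited frameworks; the pattern list is deliberately small and will not catch paraphrased attacks.

```python
import html
import re

# Illustrative, non-exhaustive patterns (hypothetical); real attacks paraphrase freely,
# so this is one defense-in-depth layer, not a complete filter.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
    re.compile(r"</?\s*system\s*>", re.IGNORECASE),
    re.compile(r"^\s*(assistant|system)\s*:", re.IGNORECASE | re.MULTILINE),
]


def neutralize(text: str) -> tuple[str, list[str]]:
    """Redact instruction-like phrases and escape markup; return hits for logging."""
    hits: list[str] = []
    for pattern in SUSPICIOUS_PATTERNS:
        hits.extend(m.group(0) for m in pattern.finditer(text))
        text = pattern.sub("[removed]", text)
    # Escape <, >, & so retrieved text cannot close or forge the delimiter tag.
    return html.escape(text, quote=False), hits


def wrap_untrusted(source: str, text: str) -> str:
    """Wrap retrieved content in a labeled block; a real pipeline would also log the hits."""
    cleaned, _hits = neutralize(text)
    return f'<untrusted_data source="{html.escape(source)}">\n{cleaned}\n</untrusted_data>'
```

Escaping after redaction means a document containing `</untrusted_data>` arrives as `&lt;/untrusted_data&gt;` and cannot break out of its block.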

2. Tool and function-call abuse

Maintain a minimal toolset: expose only the tools needed for the current task; disable or detach others at runtime (Anthropic trustworthy-agents framework — harness layer).
Require explicit confirmation before any tool call that is irreversible, has financial impact, or modifies shared state (email, file system, database write, API mutation).
Log the full tool-call request (name and arguments) before execution, and log the response once the call returns.
Rate-limit tool calls per agent turn to bound blast radius from runaway loops.
Reject tool calls with arguments that reference paths, URIs, or identifiers outside the expected domain.
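The per-turn rate limit and the out-of-domain argument check can be combined in one small guard. This is a sketch assuming a hypothetical file tool whose only argument is a path that must stay under one root directory; `ToolCallGuard` and its parameters are illustrative names. It needs Python 3.9+ for `Path.is_relative_to`.

```python
from pathlib import Path


class ToolCallGuard:
    """Per-turn call budget plus a path-domain check for a hypothetical file tool."""

    def __init__(self, allowed_root: str, max_calls_per_turn: int = 10):
        self.allowed_root = Path(allowed_root).resolve()
        self.max_calls = max_calls_per_turn
        self.calls_this_turn = 0

    def start_turn(self) -> None:
        # Reset the budget at the start of each agent turn.
        self.calls_this_turn = 0

    def check(self, tool_name: str, path_arg: str) -> Path:
        # Bound blast radius: refuse once the per-turn budget is spent.
        self.calls_this_turn += 1
        if self.calls_this_turn > self.max_calls:
            raise PermissionError(f"{tool_name}: per-turn call limit reached")
        # Resolve '..' and symlinks before comparing, so traversal tricks fail.
        # An absolute path_arg replaces the root entirely and is then rejected.
        target = (self.allowed_root / path_arg).resolve()
        if not target.is_relative_to(self.allowed_root):
            raise PermissionError(f"{tool_name}: {path_arg!r} is outside {self.allowed_root}")
        return target
```

Resolving before comparing matters: a plain string-prefix check on `root/../etc/passwd` would pass.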

3. Excessive agency and least-privilege

Grant the agent only the permissions it needs for the specific task, not for all tasks it might ever do (OWASP LLM06:2025 Excessive Agency; Google SAIF principle of minimizing blast radius).
Prefer read-only tool variants when write access is not required for the step.
Time-bound credentials: issue short-lived tokens per task rather than long-lived agent credentials.
Review and prune the tool list each time the agent's scope changes.
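A sketch of per-task, short-lived credentials, assuming you control a token issuer. `TaskToken`, `issue_task_token`, `authorize`, the scope strings, and the 300-second default lifetime are all hypothetical; the point is that a request for any scope outside the task's allowance fails rather than being granted "for convenience", and every token expires.

```python
from __future__ import annotations

import secrets
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskToken:
    value: str
    scopes: frozenset[str]
    expires_at: float


def issue_task_token(requested: set[str], task_scopes: set[str], ttl_s: int = 300) -> TaskToken:
    """Mint a short-lived token only if every requested scope is allowed for this task."""
    excess = requested - task_scopes
    if excess:
        raise PermissionError(f"scopes not allowed for this task: {sorted(excess)}")
    return TaskToken(secrets.token_urlsafe(32), frozenset(requested), time.time() + ttl_s)


def authorize(token: TaskToken, scope: str, now: float | None = None) -> bool:
    """A call is allowed only while the token is unexpired and carries the scope."""
    now = time.time() if now is None else now
    return now < token.expires_at and scope in token.scopes
```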

4. Untrusted content handling

Sanitize HTML, Markdown, and JSON retrieved from external sources before insertion into the context window.
Do not render or execute content returned by tools without validation.
Limit the size of individual tool responses injected into context to prevent context-flooding attacks.
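Apply context-appropriate encoding when agent-generated content is passed to downstream systems: parameterized queries for SQL, argument lists rather than interpolated strings for shell commands, and escaping for HTML rendering.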
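Two of the items above, sketched with the standard library only. The character budget is an assumed placeholder (tune it to your model's context), and `orders` is a hypothetical table; the SQL example uses `sqlite3` placeholder binding so that agent-supplied text is passed as a value and never spliced into the query.

```python
import html
import sqlite3

MAX_TOOL_RESPONSE_CHARS = 8_000  # assumed budget, not a recommendation from any framework


def clamp_for_context(text: str, limit: int = MAX_TOOL_RESPONSE_CHARS) -> str:
    """Truncate oversized tool output and say so, instead of flooding the context."""
    if len(text) <= limit:
        return text
    return text[:limit] + f"\n[truncated {len(text) - limit} chars]"


def render_html(agent_text: str) -> str:
    # HTML-escape agent output before embedding it in a page.
    return f"<p>{html.escape(agent_text)}</p>"


def find_orders(conn: sqlite3.Connection, customer: str) -> list[tuple]:
    # Parameterized query: the driver binds `customer` as a value.
    return conn.execute(
        "SELECT id, total FROM orders WHERE customer = ?", (customer,)
    ).fetchall()
```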

5. Secrets and credential management

Never place API keys, passwords, or tokens in the system prompt, user message, or any field visible to the model.
Inject secrets at the infrastructure layer (environment variables, secret managers) and resolve them in tool-call wrappers, not in context.
Rotate credentials on a schedule; revoke immediately on suspected compromise.
Audit which tool servers hold credentials on your behalf — long-lived third-party tokens stored server-side are high blast-radius targets.
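A sketch of resolving a secret inside the tool-call wrapper rather than in context. The model-visible signature is just `lookup(query)`; the key is read from an environment variable at call time. `SERVICE_API_KEY`, `api.example.com`, and the injected `transport` callable are hypothetical stand-ins for your secret source and HTTP client.

```python
import os
from typing import Callable
from urllib.parse import quote

# Hypothetical transport: (url, headers) -> response body. Swap in your HTTP client.
Transport = Callable[[str, dict[str, str]], str]


def make_tool(transport: Transport, env_var: str = "SERVICE_API_KEY"):
    """Build a tool whose model-visible arguments never include the credential."""

    def lookup(query: str) -> str:
        api_key = os.environ[env_var]  # resolved at call time, outside the context window
        body = transport(
            f"https://api.example.com/search?q={quote(query)}",
            {"Authorization": f"Bearer {api_key}"},
        )
        # Scrub the key in case the upstream service echoes it back into the response.
        return body.replace(api_key, "[REDACTED]")

    return lookup
```

Only `query` and the scrubbed response ever reach the model; the key lives in the process environment (or a secret manager you read from at the same point).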

6. MCP server trust and supply chain

Only connect to MCP servers you can review or whose publisher you trust; third-party registries vary in vetting rigor.
Audit the tool description text of every MCP server before connecting — malicious servers embed instructions in descriptions to hijack model behavior (prompt injection via tool metadata).
Pin package versions with lock files and verify checksums; the first publicly reported malicious MCP package in the wild surfaced in September 2025.
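Prefer OAuth 2.1 with PKCE for remote MCP server auth. Authorization is optional in the MCP spec, but when an HTTP-based server implements it, the June 2025 revision (2025-06-18) bases it on OAuth 2.1 and requires clients to implement PKCE. Use per-agent client registrations, not shared credentials.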
Scope OAuth tokens to the minimum required tools; use per-tool scopes where the server supports them.
Revoke server access when the agent task is complete.
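A sketch of the pre-connection audit: scan each tool's description (MCP `tools/list` entries carry `name` and `description` fields) for red flags, and verify a downloaded artifact against the checksum pinned in your lock file. The `RED_FLAGS` patterns are heuristic assumptions; a hit means "review by hand", and an empty result does not mean the server is safe.

```python
import hashlib
import re

# Heuristic red flags in tool metadata (hypothetical list; extend for your environment).
RED_FLAGS = [
    r"ignore (all )?(previous|prior) instructions",
    r"do not (tell|inform|mention).{0,40}user",
    r"<\s*(important|system)\s*>",
    r"(id_rsa|\.ssh/|\.env\b|api[_ ]?key|password)",
]


def audit_tool_descriptions(tools: list[dict]) -> dict[str, list[str]]:
    """Map tool name -> red-flag patterns found in its description."""
    findings: dict[str, list[str]] = {}
    for tool in tools:
        desc = tool.get("description", "")
        hits = [p for p in RED_FLAGS if re.search(p, desc, re.IGNORECASE)]
        if hits:
            findings[tool["name"]] = hits
    return findings


def verify_sha256(path: str, expected_hex: str) -> bool:
    """Compare a downloaded artifact against the checksum pinned in your lock file."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.lower()
```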

7. Output and action sandboxing; human-in-the-loop gates

Define a pre-flight checklist of action categories that always require human approval: financial transfers, sending external communications, deleting data, provisioning infrastructure (Anthropic trustworthy-agents framework; NIST AI RMF GOVERN function).
Sandbox code execution and file operations in isolated environments with no network egress by default.
Implement a rollback or undo path for every reversible action the agent takes.
Emit a structured pre-action summary to the user before high-stakes tool calls and wait for acknowledgement.
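A sketch of the approval gate: actions in an always-approve category, or any action marked irreversible, are shown to a human as a structured JSON summary and run only on approval. The category names mirror the list above but are illustrative; `ProposedAction`, `run_with_gate`, and the `ask_human` callback (your UI or chat prompt) are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Callable

# Categories that always need a human (names are illustrative).
ALWAYS_APPROVE = {"financial_transfer", "external_message", "delete_data", "provision_infra"}


@dataclass
class ProposedAction:
    category: str
    tool: str
    arguments: dict
    reversible: bool


def pre_action_summary(action: ProposedAction) -> str:
    """Structured, human-readable description of exactly what will run."""
    return json.dumps(
        {"tool": action.tool, "category": action.category,
         "arguments": action.arguments, "reversible": action.reversible},
        indent=2, sort_keys=True,
    )


def run_with_gate(action: ProposedAction, execute: Callable[[], str],
                  ask_human: Callable[[str], bool]) -> str:
    """Execute only after approval when the category or irreversibility requires it."""
    if action.category in ALWAYS_APPROVE or not action.reversible:
        if not ask_human(pre_action_summary(action)):
            return "declined by human reviewer"
    return execute()
```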

8. Memory and context poisoning

Validate retrieved memories or RAG results against a known-good schema before injecting into context.
Treat vector-store content with the same distrust as external web content; it may have been poisoned at ingestion time (MITRE ATLAS AML.T0070 RAG Poisoning; AML.T0020 Poison Training Data covers the training and fine-tuning case).
Separate short-term working memory from long-term persistent storage; apply stricter validation before promoting content to persistent memory.
Periodically audit long-term memory stores for injected instructions.
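A sketch of tiered memory with validation at promotion time and a periodic audit. The memory schema (`ALLOWED_KINDS`, `MAX_LEN`) and the instruction-like regex are hypothetical; everything lands in working memory, but only records that pass `validate` are copied to persistent storage, and `audit` re-runs the same checks over an existing store.

```python
import re
from dataclasses import dataclass

INSTRUCTION_LIKE = re.compile(
    r"(ignore (all )?(previous|prior) instructions|you must now|system prompt|<\s*/?\s*system\s*>)",
    re.IGNORECASE,
)
ALLOWED_KINDS = {"user_preference", "fact", "task_note"}  # hypothetical memory schema
MAX_LEN = 500


@dataclass
class MemoryRecord:
    kind: str
    text: str
    source: str


def validate(record: MemoryRecord) -> list[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    if record.kind not in ALLOWED_KINDS:
        problems.append(f"unknown kind {record.kind!r}")
    if len(record.text) > MAX_LEN:
        problems.append("text too long")
    if INSTRUCTION_LIKE.search(record.text):
        problems.append("instruction-like content")
    return problems


def promote(record: MemoryRecord, working: list, persistent: list) -> bool:
    """Keep everything in working memory; persist only records that validate."""
    working.append(record)
    if validate(record):
        return False
    persistent.append(record)
    return True


def audit(store: list) -> list[tuple[int, list[str]]]:
    """Periodic sweep: index and problems for every stored record that now fails."""
    flagged = []
    for i, record in enumerate(store):
        problems = validate(record)
        if problems:
            flagged.append((i, problems))
    return flagged
```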

9. Data exfiltration channels

Restrict agent network egress to a known allowlist of destinations; deny by default.
Inspect outbound tool-call arguments for PII and secrets before execution; block calls that reference data outside their expected domain.
Do not let the agent construct arbitrary URLs or shell commands from user-supplied input without sanitization.
Monitor outbound tool calls and network requests for unusual data volumes or frequencies that may indicate exfiltration via covert channels.
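A sketch of a deny-by-default egress check that also inspects outbound content. The allowlisted hosts and the secret/PII patterns are hypothetical examples; production systems typically use a dedicated DLP or secret scanner. Hosts are compared exactly after `urlsplit` parsing, so look-alikes such as `api.github.com.evil.io` or `api.github.com@evil.io` are rejected.

```python
import re
from urllib.parse import urlsplit

EGRESS_ALLOWLIST = {"api.github.com", "api.example.com"}  # hypothetical destinations

# Illustrative detectors only.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "bearer_token": re.compile(r"\bBearer\s+[A-Za-z0-9._~+/-]{20,}"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def check_outbound(url: str, payload: str) -> list[str]:
    """Return reasons to block an outbound call; an empty list means allowed."""
    reasons = []
    parts = urlsplit(url)
    if parts.scheme != "https":
        reasons.append(f"non-https scheme {parts.scheme!r}")
    # Exact host match; no suffix or substring matching.
    if parts.hostname not in EGRESS_ALLOWLIST:
        reasons.append(f"host {parts.hostname!r} not on allowlist")
    for label, pattern in SECRET_PATTERNS.items():
        if pattern.search(url) or pattern.search(payload):
            reasons.append(f"possible {label} in request")
    return reasons
```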

10. Auth and OAuth scopes

Request OAuth scopes at the minimum granularity needed for each specific task; do not request broad scopes for future convenience.
Use separate OAuth clients (and credentials) per agent instance; never share a client ID across agent instances.
Implement token refresh and revocation; treat access tokens as ephemeral.
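Send the OAuth resource indicator (RFC 8707) naming the intended MCP server in authorization and token requests, and have each server reject tokens whose audience is not itself, so a token issued for one server cannot be replayed against another.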
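A resource-server-side sketch of the audience and scope checks, assuming the JWT's signature has already been verified by a proper JWT library and you are inspecting its decoded claims. `THIS_SERVER` is a hypothetical canonical URI; `aud` may be a string or a list, and `scope` is treated as a space-delimited string, as in the JWT access-token profile.

```python
from __future__ import annotations

import time

# Canonical URI of this MCP server (hypothetical); clients should send this
# exact value as the RFC 8707 `resource` parameter.
THIS_SERVER = "https://mcp.example.com/mcp"


def check_access_token(claims: dict, required_scope: str, now: float | None = None) -> None:
    """Validate already-signature-verified JWT claims; raise on any mismatch."""
    now = time.time() if now is None else now
    aud = claims.get("aud")
    audiences = [aud] if isinstance(aud, str) else list(aud or [])
    if THIS_SERVER not in audiences:
        raise PermissionError("token was issued for a different resource")
    if claims.get("exp", 0) <= now:
        raise PermissionError("token expired")
    if required_scope not in claims.get("scope", "").split():
        raise PermissionError(f"missing scope {required_scope!r}")
```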

11. Logging and auditability

Assign a unique trace ID to every agent run; propagate it through all tool calls and sub-agent invocations.
Log: timestamp, trace ID, tool name, full arguments, full response, latency, and outcome for every tool call.
Store logs in an append-only, tamper-evident store; agents must not be able to delete their own logs.
Alert on anomalous patterns: high tool-call rates, calls to unexpected endpoints, or sudden changes in action type distribution (Google SAIF core element: extend detection and response to AI systems).
Retain logs long enough to support incident reconstruction; the NIST AI RMF MANAGE function calls for documented incident response and recovery plans, which depend on this record.
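A sketch of trace-ID propagation and tamper evidence: each log entry includes the SHA-256 hash of the previous entry, so editing or deleting any past entry breaks `verify()`. `HashChainedLog` is a hypothetical in-memory illustration; in production the store itself should also be write-once and outside the agent's permissions.

```python
import hashlib
import json
import time
import uuid


def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class HashChainedLog:
    """Append-only log where each entry commits to the previous entry's hash."""

    GENESIS = "0" * 64

    def __init__(self) -> None:
        self.entries: list[dict] = []

    def append(self, **fields) -> dict:
        prev = self.entries[-1]["hash"] if self.entries else self.GENESIS
        body = {"ts": time.time(), **fields, "prev": prev}
        body["hash"] = _digest(body)  # hash covers every field, including `prev`
        self.entries.append(body)
        return body

    def verify(self) -> bool:
        prev = self.GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev"] != prev or _digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True


trace_id = str(uuid.uuid4())  # one per agent run; pass it to every tool and sub-agent
```

Usage: `log.append(trace_id=trace_id, tool="search", arguments={...}, response=..., latency_ms=..., outcome="ok")` for every tool call, matching the fields listed above.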

Verified sources

OWASP Top 10 for Agentic Applications (2026): https://genai.owasp.org/resource/owasp-top-10-for-agentic-applications-for-2026/
OWASP LLM01:2025 Prompt Injection: https://genai.owasp.org/llmrisk/llm01-prompt-injection/
Anthropic — Our framework for developing safe and trustworthy agents: https://www.anthropic.com/news/our-framework-for-developing-safe-and-trustworthy-agents
OpenAI — Safety in building agents: https://platform.openai.com/docs/guides/agent-builder-safety
OpenAI — A practical guide to building agents: https://openai.com/business/guides-and-resources/a-practical-guide-to-building-ai-agents/
Google SAIF (Secure AI Framework): https://saif.google/secure-ai-framework
MITRE ATLAS (adversarial threat landscape for AI): https://atlas.mitre.org/
NIST AI Risk Management Framework: https://www.nist.gov/itl/ai-risk-management-framework

#security #agents #prompt-injection #mcp #checklist #owasp
