ChangeGamer

Code Execution Sandboxing for Agents

Guide · updated 2026-06-15 · Markdown variant

Isolation spectrum from language sandboxes to microVMs, WebAssembly as a portable sandbox, and a verified comparison of hosted agent-sandbox APIs — for agents that need to run model-generated code safely.


Running model-generated code is arbitrary code execution. Without isolation, a single malicious or buggy output can read host secrets, exfiltrate data, pivot to other tenants, or destroy infrastructure. Sandboxing is not optional for production agents that execute generated code or computer-use actions. See also: /resources/agentic-security-checklist (§7 — output and action sandboxing) and /resources/computer-use-browser-automation.

The isolation spectrum

The layers below run from weakest to strongest. Each adds real isolation at the cost of startup time, resource overhead, or operational complexity.

1. In-process language sandboxes

What they are: code is restricted at the language level before it runs — no separate process, no OS boundary.

Isolation strength: weak. Language sandboxes have no OS-level boundary. They are defeated by native extensions, JIT vulnerabilities, or overlooked builtins. Use only for very-low-risk inputs or as a first filter in a layered stack.
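To see why a language-level restriction is not a boundary, here is a minimal sketch in Python. The helper `naive_eval` is hypothetical and stands in for any restriction built on an emptied `__builtins__`; it is not taken from a specific library.

```python
def naive_eval(expr: str):
    """Evaluate expr with every builtin removed (a common but weak 'sandbox')."""
    return eval(expr, {"__builtins__": {}}, {})


# Direct use of a builtin is blocked, as the restriction intends.
try:
    naive_eval("open('/etc/hostname')")
    open_blocked = False
except NameError:
    open_blocked = True

# Attribute access on a plain literal still walks up to `object`...
root_class = naive_eval("().__class__.__base__")

# ...and from there to every class currently loaded in the interpreter.
reachable_classes = naive_eval("().__class__.__base__.__subclasses__()")
```

The restricted expression never names a builtin, yet it ends up holding a list of every loaded class. That list is the usual starting point for reaching file or process APIs, which is the "overlooked builtins" failure mode in practice.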

2. OS containers (Docker / standard runtimes)

What they are: Linux namespaces (PID, net, mount, UTS, IPC) plus cgroups isolate a process from the host filesystem and network. Docker is the dominant packaging and runtime implementation.

Isolation strength: moderate. Containers share the host kernel. A kernel exploit inside a container can escape to the host. For internal developer tooling or low-privilege workloads this is often acceptable, but standard Docker is NOT a strong security boundary against untrusted or adversarially generated code. The attack surface is most of the Linux kernel's syscall interface: Docker's default seccomp profile filters only a small subset of calls.

Common misconception: "We run it in Docker so it is safe." A container restricts what the workload can see, not what kernel vulnerabilities it can trigger. Run untrusted code behind a stronger layer of the spectrum.
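When a plain container is acceptable for the workload, the usual step is to narrow what it can see and do. The sketch below builds a `docker run` argument list from standard Docker CLI flags. The function name, image, and limit values are illustrative assumptions, and none of these flags change the fact that the kernel is shared.

```python
def hardened_docker_argv(image: str, command: list[str]) -> list[str]:
    """Build a `docker run` argv that narrows what the workload can reach.

    All limit values are placeholders to tune per workload.
    """
    return [
        "docker", "run", "--rm",
        "--network", "none",                     # no network access
        "--read-only",                           # root filesystem is read-only
        "--tmpfs", "/tmp:rw,noexec,nosuid,size=64m",  # small scratch space
        "--cap-drop", "ALL",                     # drop every Linux capability
        "--security-opt", "no-new-privileges",   # block setuid escalation
        "--pids-limit", "64",                    # cap process count
        "--memory", "256m",                      # cap memory
        "--cpus", "0.5",                         # cap CPU share
        "--user", "65534:65534",                 # run as an unprivileged uid:gid
        image, *command,
    ]


argv = hardened_docker_argv("python:3.12-slim", ["python", "-c", "print(1)"])
```

Passing the result to `subprocess.run(argv, timeout=...)` would launch the container. The list form avoids shell interpolation of model-generated strings.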

3. Hardened container runtimes

What they are: drop-in replacements for the container runtime that add a security layer between the container and the host kernel — without requiring full VM provisioning.

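Isolation strength: strong. The host-kernel attack surface is sharply reduced rather than shared wholesale: gVisor intercepts the workload's syscalls in a user-space kernel, so only a narrow set reaches the host, while Kata Containers runs each container inside a lightweight VM, moving the boundary to the hypervisor. Startup overhead: gVisor adds milliseconds; Kata adds a VM boot (100–300 ms depending on hypervisor).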
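Because these runtimes are drop-in replacements, switching is typically a matter of naming the runtime at launch. The sketch below assumes the runtimes were already registered with the Docker daemon. `runsc` is the name gVisor's documentation uses, while `kata` is a placeholder for whatever name Kata Containers was registered under on the host.

```python
# Assumed runtime names; they must match the host's Docker daemon configuration.
RUNTIMES = {"gvisor": "runsc", "kata": "kata"}


def sandboxed_run_argv(kind: str, image: str, command: list[str]) -> list[str]:
    """Build a `docker run` argv that selects a hardened runtime by kind."""
    try:
        runtime = RUNTIMES[kind]
    except KeyError:
        raise ValueError(f"unknown hardened runtime: {kind!r}") from None
    return ["docker", "run", "--rm", f"--runtime={runtime}", image, *command]


gvisor_argv = sandboxed_run_argv("gvisor", "alpine", ["echo", "hi"])
```

Rejecting unknown kinds, rather than falling back to the default runtime, keeps a typo from silently downgrading isolation.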

4. MicroVMs

What they are: purpose-built VMMs (Virtual Machine Monitors) that boot a minimal Linux kernel in a hardware-virtualized VM in under 200 ms, with a minimal device model and small memory footprint.

Isolation strength: very strong. Hardware virtualization boundary separates each workload's kernel from the host. Industry-standard for multi-tenant serverless infrastructure.

5. Full VMs

Conventional VMs (KVM/QEMU, Hyper-V, VMware) provide the strongest isolation at the cost of the highest startup time (seconds to minutes) and resource overhead. They are rarely the right choice for agent code execution, where fast ephemeral sandboxes are needed; microVMs provide a comparable hardware-virtualization boundary with orders of magnitude lower startup latency.
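The spectrum above can be encoded as an ordered type so that a harness refuses to run code on a tier that is too weak. This is a sketch under stated assumptions: the `Tier` names mirror the five layers, but the policy in `minimum_tier` is a hypothetical reading of this guide, not a standard.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Isolation layers, ordered weakest to strongest."""
    LANGUAGE_SANDBOX = 1
    OS_CONTAINER = 2
    HARDENED_RUNTIME = 3
    MICROVM = 4
    FULL_VM = 5


def minimum_tier(untrusted: bool, multi_tenant: bool) -> Tier:
    """Hypothetical policy: pick the weakest tier considered acceptable."""
    if multi_tenant:
        return Tier.MICROVM           # separate kernel per tenant
    if untrusted:
        return Tier.HARDENED_RUNTIME  # plain containers share the host kernel
    return Tier.OS_CONTAINER


def is_sufficient(actual: Tier, untrusted: bool, multi_tenant: bool) -> bool:
    """True when the deployed tier meets or exceeds the policy minimum."""
    return actual >= minimum_tier(untrusted, multi_tenant)
```

A harness could call `is_sufficient` at startup and fail closed when the configured backend is below the required tier.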

WebAssembly as a portable sandbox

WebAssembly (WASM) modules run in a capability-based, deny-by-default sandbox. A module cannot access memory outside its own linear memory, cannot make syscalls directly, and cannot use network or filesystem unless the host explicitly grants those capabilities.
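The deny-by-default model can be illustrated with a toy host in plain Python. This is not a WASM runtime. `Host`, `CapabilityDenied`, the capability names, and the in-memory file table are all hypothetical, and the sketch only shows the shape of the rule: guest code reaches the outside world solely through functions the host chose to grant.

```python
from typing import Callable, Dict, FrozenSet


class CapabilityDenied(PermissionError):
    """Raised when guest code asks for a capability the host did not grant."""


# Stand-ins for real host resources (illustrative only).
_FAKE_FILES = {"/data/input.txt": "hello"}

_HOST_FUNCTIONS: Dict[str, Callable[..., object]] = {
    "fs.read": lambda path: _FAKE_FILES[path],
    "net.fetch": lambda url: f"<response from {url}>",
}


class Host:
    """Exposes host functions to a guest; nothing is granted by default."""

    def __init__(self, granted: FrozenSet[str] = frozenset()) -> None:
        self._granted = granted

    def call(self, capability: str, *args: object) -> object:
        if capability not in self._granted:
            raise CapabilityDenied(capability)
        return _HOST_FUNCTIONS[capability](*args)


locked_down = Host()                           # no capabilities at all
reader = Host(granted=frozenset({"fs.read"}))  # filesystem read only
```

With `reader`, a call to `fs.read` succeeds while `net.fetch` still raises, which mirrors a module that was granted a directory but no sockets.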

WASM's limitation: compiled languages (Rust, C, Go) map well to WASM. Python via Pyodide is usable for many agent tasks, but performance-sensitive workloads may run too slowly, and packages with native C extensions will not run unless a precompiled WASM wheel exists. For general agent code execution, microVM-backed APIs (below) are typically the right default.

Hosted agent-sandbox APIs

For agents that need to execute code without managing isolation infrastructure, these APIs provide sandboxed environments callable from agent code. All are web-verified as of June 2026.

| Product | Isolation | Cold start | GPU | Key differentiator |
|---|---|---|---|---|
| E2B (e2b.dev) | Firecracker microVM | ~150 ms | No | Agent-first SDK; Python/JS; MCP integration; free tier; production references (Perplexity, Manus) |
| Modal (modal.com) | gVisor containers | Fast | Yes (A100, H100) | 50k–100k concurrent sandboxes; GPU access; Lovable and Quora in production |
| Daytona (daytona.io) | gVisor | <90 ms | No | Open-source; stateful persistent workspace; sub-90 ms via pre-warmed pools; $24M Series A Feb 2026 |
| Cloudflare Sandbox (developers.cloudflare.com/sandbox) | Containers (GA Apr 2026) + Dynamic Workers V8 isolates (beta) | ms (isolates) / container | No | Two-tier: full Linux containers via Sandbox SDK + isolate-based Dynamic Workers (100× faster); edge-distributed |
| Northflank (northflank.com) | Kata Containers (Cloud Hypervisor) + gVisor | | No | Only platform offering both Kata and gVisor; BYOC (AWS/GCP/Azure); unlimited sessions; 2M+ isolated workloads/month |
| Vercel Sandbox (vercel.com/sandbox) | Firecracker microVM | | No | Beta; free; 45 min–5 hr session cap; backed by Vercel Fluid compute; open-source Open Agents stack |
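The comparison can be treated as data when narrowing down a provider. The sketch below copies only the isolation and GPU columns from the table; the `Offering` type and `shortlist` helper are hypothetical names, and the isolation strings are abbreviated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offering:
    name: str
    isolation: str
    gpu: bool


# Values transcribed from the comparison table (isolation text shortened).
OFFERINGS = (
    Offering("E2B", "Firecracker microVM", False),
    Offering("Modal", "gVisor containers", True),
    Offering("Daytona", "gVisor", False),
    Offering("Cloudflare Sandbox", "Containers + V8 isolates", False),
    Offering("Northflank", "Kata Containers + gVisor", False),
    Offering("Vercel Sandbox", "Firecracker microVM", False),
)


def shortlist(need_gpu: bool = False, isolation_contains: str = "") -> list[str]:
    """Return product names matching a GPU need and an isolation keyword."""
    needle = isolation_contains.lower()
    return [
        o.name
        for o in OFFERINGS
        if (o.gpu or not need_gpu) and needle in o.isolation.lower()
    ]
```

For example, `shortlist(need_gpu=True)` leaves only Modal, and `shortlist(isolation_contains="firecracker")` leaves E2B and Vercel Sandbox.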

Provider built-ins

Hardening checklist

Regardless of the isolation layer, apply these controls at the harness level:
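As one example of what a harness-level control can look like, the sketch below wraps a child interpreter with a wall-clock timeout, CPU, memory, and file-size limits, an empty environment, and a throwaway working directory. It assumes a Linux host and Python's standard `resource` module. `run_limited`, `RunResult`, and the limit values are hypothetical, and these limits complement an isolation layer rather than replace one.

```python
import resource
import subprocess
import sys
import tempfile
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunResult:
    stdout: str
    returncode: Optional[int]  # None when the wall-clock timeout fired
    timed_out: bool


def _lower_limit(kind: int, value: int) -> None:
    """Lower both soft and hard limits, never exceeding the current hard limit."""
    _, hard = resource.getrlimit(kind)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(kind, (value, value))


def _apply_limits() -> None:
    """Runs in the child just before exec; values are placeholders."""
    _lower_limit(resource.RLIMIT_CPU, 2)            # seconds of CPU time
    _lower_limit(resource.RLIMIT_AS, 1024 ** 3)     # bytes of address space
    _lower_limit(resource.RLIMIT_FSIZE, 1024 ** 2)  # bytes per written file


def run_limited(code: str, timeout_s: float = 5.0) -> RunResult:
    """Run code in a resource-limited child interpreter (not a sandbox)."""
    with tempfile.TemporaryDirectory() as scratch:
        try:
            proc = subprocess.run(
                [sys.executable, "-I", "-c", code],  # -I: isolated mode
                cwd=scratch,                 # throwaway working directory
                env={},                      # no inherited secrets
                preexec_fn=_apply_limits,
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return RunResult(stdout="", returncode=None, timed_out=True)
    return RunResult(proc.stdout, proc.returncode, timed_out=False)
```

Calling `run_limited("while True: pass", timeout_s=1.0)` returns with `timed_out=True` instead of hanging the agent loop.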

Verified sources
