Why, and how you need to sandbox AI-Generated Code? — Harshil Agrawal, Cloudflare

Key Concepts

Untrusted Code: LLM-generated code, which should be treated as potentially malicious or buggy regardless of which model or prompt produced it.
Capability-Based Security: A security model based on "default deny," where code is granted only the specific, minimal permissions required to function.
V8 Isolates: Lightweight, fast, sandboxed execution environments for JavaScript and WebAssembly (e.g., Cloudflare Workers, where TypeScript is compiled to JavaScript and Python runs on WebAssembly).
Containers: Full Linux environments providing file systems, process management, and networking for complex tasks.
Prompt Injection: An attack vector where adversarial input manipulates an LLM into executing unauthorized actions.
Proxy Pattern: A method to handle secrets by keeping them in the host environment and proxying requests through a secure gateway rather than passing keys into the sandbox.

1. The Threat Model of AI-Generated Code

The speaker argues that we are currently running untrusted code from the internet without sufficient oversight. Three primary threat scenarios are identified:

Hallucination: The model generates syntactically correct but logically flawed code (e.g., infinite loops, bad imports, or missing base cases), which can crash production systems or exhaust compute resources.
The "Over-helpful" LLM: The model attempts to be helpful by accessing sensitive environment variables, API keys, or database credentials to "configure" a task, inadvertently exposing secrets.
Compromised Prompts:
- Direct Injection: A user explicitly tells the LLM to ignore instructions and exfiltrate data.
- Indirect Injection: The LLM processes an external document or webpage containing hidden, adversarial instructions.

Key Argument: AI agents operate with the developer's full production privileges (file system, network, database). If the code is compromised or flawed, it has the "keys to the kingdom."

2. Framework: Capability-Based Security

The speaker advocates for Capability-Based Security over traditional block-listing.

Block-list (Ineffective): Trying to anticipate and block every dangerous system call or API. Anything the list misses stays allowed.
Allow-list (Recommended): Default deny everything. Explicitly grant only the specific capabilities (e.g., a specific database query method) required for the task.
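
The allow-list idea can be sketched as a small table that denies every call unless it was explicitly granted. This is a minimal illustration, not code from the talk: the `CapabilityTable` class and the capability names (`db.readOrders`, `net.fetch`, and so on) are hypothetical.

```typescript
// Hypothetical capability names, used only for illustration.
type Capability = "db.readOrders" | "db.writeOrders" | "net.fetch" | "fs.read";

type Handler = (...args: unknown[]) => unknown;

class CapabilityTable {
  private readonly granted = new Map<Capability, Handler>();

  // Nothing is callable until it has been explicitly granted.
  grant(name: Capability, handler: Handler): this {
    this.granted.set(name, handler);
    return this;
  }

  // Default deny: an ungranted capability throws instead of running.
  invoke(name: Capability, ...args: unknown[]): unknown {
    const handler = this.granted.get(name);
    if (handler === undefined) {
      throw new Error(`capability denied: ${name}`);
    }
    return handler(...args);
  }
}

// Grant exactly one read-only capability for this task (stub data).
const caps = new CapabilityTable().grant("db.readOrders", (userId) => [
  { userId, orderId: "o-1" },
]);
```

With this table, `caps.invoke("db.readOrders", "u-1")` runs the granted handler, while `caps.invoke("net.fetch", ...)` throws because nobody granted it. There is no list of forbidden calls to keep up to date.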

3. Sandboxing Methodologies

The speaker presents two primary solutions based on the complexity of the task:

A. V8 Isolates (The "Fast Brain")

Use Case: Quick functions, tool calls, plugins, and data transformation.
Characteristics: Sub-millisecond startup, no file system, no process model, stateless.
Implementation: Uses dynamically loaded Worker isolates. Outbound network access is set to null (disabled) by default. Bindings pass in only the interfaces the code needs (e.g., a restricted database query method) rather than a full client.
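
A rough sketch of the binding idea follows, assuming a hypothetical host-side database client. The `IsolateConfig` shape, its field names, and `buildIsolateConfig` are illustrative assumptions and are not the Cloudflare API. The point is only that the generated code receives a narrow interface and a null network, never the full client.

```typescript
// Full client that lives on the host. The sandbox never receives this object.
interface FullDatabase {
  query(sql: string, params: unknown[]): unknown[];
}

// The only interface handed to generated code: one narrow, parameterized read.
interface OrderLookup {
  getOrderStatus(orderId: string): string | undefined;
}

function makeOrderLookup(db: FullDatabase): OrderLookup {
  return {
    getOrderStatus(orderId) {
      // The SQL is fixed on the host; the sandbox only supplies the parameter.
      const rows = db.query("SELECT status FROM orders WHERE id = ?", [orderId]);
      const first = rows[0] as { status: string } | undefined;
      return first?.status;
    },
  };
}

// Hypothetical config shape (an assumption, not a real platform type).
interface IsolateConfig {
  code: string;
  network: null; // null = no outbound network access
  bindings: { orders: OrderLookup };
}

function buildIsolateConfig(code: string, db: FullDatabase): IsolateConfig {
  return { code, network: null, bindings: { orders: makeOrderLookup(db) } };
}
```

Because `makeOrderLookup` closes over the database client, the object passed into the sandbox has no `query` method at all, so generated code cannot issue arbitrary SQL even if it tries.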

B. Containers (The "Workbench")

Use Case: Building/deploying apps, cloning repositories, installing npm packages, running dev servers.
Characteristics: Full Linux environment, real file system, process management, networking.
Implementation: Managed via a Durable Object (stateful coordinator) that orchestrates the container lifecycle.
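
The coordinator role can be sketched as a plain class that tracks which containers exist and destroys them when they expire. This is not the Durable Object API: `SandboxCoordinator`, its methods, and the lifetime cap are hypothetical, and a real coordinator would start and stop actual containers where the comments indicate.

```typescript
type ContainerState = "running" | "destroyed";

interface ContainerRecord {
  id: string;
  state: ContainerState;
  startedAtMs: number;
}

class SandboxCoordinator {
  private readonly byOwner = new Map<string, ContainerRecord>();
  private readonly maxLifetimeMs: number;
  private nextId = 1;

  constructor(maxLifetimeMs: number) {
    this.maxLifetimeMs = maxLifetimeMs;
  }

  // Returns the owner's running container, starting one if none exists.
  acquire(ownerId: string, nowMs: number): ContainerRecord {
    const existing = this.byOwner.get(ownerId);
    if (existing !== undefined) return existing;
    const record: ContainerRecord = {
      id: `c-${this.nextId++}`,
      state: "running", // a real coordinator would start the container here
      startedAtMs: nowMs,
    };
    this.byOwner.set(ownerId, record);
    return record;
  }

  // Marks the container destroyed and forgets it; false if none was running.
  destroy(ownerId: string): boolean {
    const record = this.byOwner.get(ownerId);
    if (record === undefined) return false;
    record.state = "destroyed"; // a real coordinator would stop the container here
    this.byOwner.delete(ownerId);
    return true;
  }

  // Destroys every container at or past the lifetime cap; returns their ids.
  reapExpired(nowMs: number): string[] {
    const reaped: string[] = [];
    for (const [ownerId, record] of Array.from(this.byOwner.entries())) {
      if (nowMs - record.startedAtMs >= this.maxLifetimeMs) {
        reaped.push(record.id);
        this.destroy(ownerId);
      }
    }
    return reaped;
  }
}
```

Passing the clock in as `nowMs` keeps the sketch deterministic. In the design the talk describes, this state would live inside the Durable Object so that it survives between requests.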

4. Practical Patterns for Production

User Isolation: Always maintain a 1:1 ratio between users and sandboxes. Never share environments, as this creates a data leak vector.
The Proxy Pattern: Never pass API keys or secrets into the sandbox. Instead, have the sandbox send its request to a proxy endpoint on your server, which attaches the secret and forwards the request to the external service.
Cleanup: Use try...finally blocks to ensure containers are destroyed immediately after use, which prevents resource waste and reduces the attack surface.
Resource Limits: Enforce strict timeouts (e.g., 10 minutes) and memory/CPU caps to prevent Denial of Service (DoS) via infinite loops.
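
Two of these patterns are sketched below under stated assumptions. `proxyRequest` shows the host-side half of the proxy pattern, and `withSandbox` combines try...finally cleanup with a timeout. The `Sandbox` interface, the host name, the placeholder token, and the Bearer header scheme are all hypothetical; a real proxy would also perform the forwarded network call.

```typescript
// Host-side secrets keyed by upstream host. The value is a placeholder.
const HOST_SECRETS = new Map<string, string>([["api.example.com", "host-only-token"]]);

interface OutboundRequest {
  url: string;
  headers: Record<string, string>;
}

// Runs on the host. Rejects unknown hosts, drops any credentials the sandbox
// tried to set, and attaches the real secret before the request is forwarded.
function proxyRequest(fromSandbox: OutboundRequest): OutboundRequest {
  const host = new URL(fromSandbox.url).host;
  const secret = HOST_SECRETS.get(host);
  if (secret === undefined) throw new Error(`host not allowed: ${host}`);
  const headers: Record<string, string> = {};
  for (const [name, value] of Object.entries(fromSandbox.headers)) {
    if (name.toLowerCase() !== "authorization") headers[name] = value;
  }
  headers["authorization"] = `Bearer ${secret}`;
  return { url: fromSandbox.url, headers };
}

// Hypothetical sandbox handle.
interface Sandbox {
  run(code: string): Promise<string>;
  destroy(): Promise<void>;
}

// Runs a task against a fresh sandbox and always destroys it afterwards,
// whether the task succeeds, throws, or exceeds the timeout.
async function withSandbox<T>(
  create: () => Promise<Sandbox>,
  timeoutMs: number,
  task: (sandbox: Sandbox) => Promise<T>,
): Promise<T> {
  const sandbox = await create();
  let timer: ReturnType<typeof setTimeout> | undefined;
  try {
    const timeout = new Promise<never>((_, reject) => {
      timer = setTimeout(
        () => reject(new Error(`sandbox timed out after ${timeoutMs} ms`)),
        timeoutMs,
      );
    });
    // The race only stops waiting; destroy() below is what stops the code.
    return await Promise.race([task(sandbox), timeout]);
  } finally {
    if (timer !== undefined) clearTimeout(timer);
    await sandbox.destroy();
  }
}
```

The timeout by itself does not halt runaway code. It only makes the host stop waiting, which is why destroying the sandbox in `finally` matters for the resource limit as well as for cleanup.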

5. Decision Tree for Implementation

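Does the task need a file system, process management, or package installation (cloning repositories, installing npm packages, running dev servers)? Use a Container.
Is it a quick function, tool call, plugin, or data transformation that needs none of those? Use a V8 Isolate.

Note: The speaker suggests using both in tandem: Isolates for the "thinking/tool-calling" loop and Containers for the "building/deployment" phase.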
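
The choice between the two sandbox types can be sketched as a small routing function that follows the use cases listed in Section 3. The `TaskNeeds` fields and `chooseSandbox` are hypothetical names chosen for illustration.

```typescript
type SandboxKind = "isolate" | "container";

// Hypothetical task description; the field names are illustrative.
interface TaskNeeds {
  fileSystem: boolean;     // e.g. cloning a repository
  processes: boolean;      // e.g. running a dev server
  packageInstall: boolean; // e.g. installing npm packages
}

// Anything that needs a real Linux environment goes to a container;
// everything else stays in the cheaper, faster isolate.
function chooseSandbox(needs: TaskNeeds): SandboxKind {
  const needsLinux = needs.fileSystem || needs.processes || needs.packageInstall;
  return needsLinux ? "container" : "isolate";
}
```

An agent loop could call this once per step, so a tool call that only transforms data runs in an isolate while the later build step is routed to a container.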

6. Universal Checklist for AI Sandboxing

Default Deny Network: Block all outbound traffic unless explicitly required.
Grant Minimal Capabilities: Only provide what is strictly necessary.
Isolate Per User: One sandbox per user, no exceptions.
Set Resource Limits: Cap CPU, memory, and execution time.
Keep Secrets Outside: Use the proxy pattern.
Cleanup: Destroy sandboxes immediately after use.
Log Everything: Maintain an audit trail of what code ran and when.
Validate Input: Perform basic syntax and security checks on the generated code before execution.
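
One way to keep the checklist from being skipped is to encode it as a pre-flight check that runs before any sandbox starts. The sketch below is an assumption about how that might look: `SandboxPlan`, its fields, and the wildcard host convention are hypothetical and do not come from the talk.

```typescript
// Hypothetical description of how a sandbox is about to be launched.
interface SandboxPlan {
  allowedHosts: string[];    // empty = no outbound network
  capabilities: string[];    // what the code will be granted
  sharedAcrossUsers: boolean;
  timeoutMs?: number;
  memoryMb?: number;
  cpuMs?: number;
  secretsInEnv: string[];    // names of secrets passed into the sandbox
  destroyOnExit: boolean;
  auditLog: boolean;
  codeValidated: boolean;
}

// Returns one message per violated checklist item; an empty array means pass.
function checkPlan(plan: SandboxPlan, required: string[]): string[] {
  const problems: string[] = [];
  if (plan.allowedHosts.includes("*")) {
    problems.push("network: wildcard host defeats default deny");
  }
  const extra = plan.capabilities.filter((c) => !required.includes(c));
  if (extra.length > 0) {
    problems.push(`capabilities: not required: ${extra.join(", ")}`);
  }
  if (plan.sharedAcrossUsers) {
    problems.push("isolation: sandbox is shared across users");
  }
  if (plan.timeoutMs === undefined || plan.memoryMb === undefined || plan.cpuMs === undefined) {
    problems.push("limits: CPU, memory, and execution time must all be capped");
  }
  if (plan.secretsInEnv.length > 0) {
    problems.push("secrets: keep them on the host behind a proxy");
  }
  if (!plan.destroyOnExit) problems.push("cleanup: sandbox is not destroyed after use");
  if (!plan.auditLog) problems.push("logging: no audit trail");
  if (!plan.codeValidated) problems.push("validation: code was not checked before execution");
  return problems;
}
```

A launcher could refuse to start whenever `checkPlan` returns a non-empty array, and write the returned messages to the audit trail.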

Synthesis

The core takeaway is that AI-generated code is untrusted code. Developers must stop treating LLMs as "magic" and start treating their output with the same security rigor applied to third-party code. By implementing a strict capability-based security model and choosing the appropriate sandbox (Isolates vs. Containers), developers can leverage the productivity gains of AI without compromising their production infrastructure. As the speaker notes: "The cost of an extra sandbox is always less than the cost of a data leak."