# Sandboxed AI Agent Execution

> Source: https://ibl.ai/resources/capabilities/sandboxed-agent-execution

*Run real code in fully isolated containers — Python, R, shell, SQL — with enterprise-grade security, audit trails, and zero risk to your host infrastructure.*

AI agents that can only generate text are half an agent. ibl.ai enables agents to execute real code — Python scripts, R models, shell commands, SQL queries — inside fully isolated containers that are completely separated from your host systems. Built on the enterprise-hardened OpenClaw framework, ibl.ai's sandboxed execution environment gives agents the computational power to solve real problems: running data pipelines, installing packages, querying databases, and automating workflows — all within strict security boundaries.

With three layered security models — NanoClaw, IronClaw, and OpenClaw application-level controls — organizations get defense-in-depth isolation that satisfies compliance requirements without sacrificing agent capability. Trusted by 400+ organizations and 1.6M+ users in production.

## The Challenge

Most enterprise AI deployments hit a hard ceiling: the agent can reason about a problem but cannot act on it. Generating a Python script is not the same as running one. Without sandboxed execution, agents are advisory tools — they describe what should be done but cannot do it, forcing humans back into the loop for every computational task.

Worse, organizations that attempt to give agents direct system access face catastrophic risk. An agent with unrestricted access to a host environment can overwrite files, exfiltrate data, exhaust resources, or trigger cascading failures. Without container isolation, network restrictions, resource limits, and audit trails, real code execution is an unacceptable security liability — leaving enterprises stuck between capability and safety.

## How It Works

1. **Agent Task Is Received and Scoped:** The OpenClaw Brain receives a task via any of 12+ supported channels — Slack, Teams, API, or scheduled Heartbeat trigger. The ReAct reasoning loop determines that code execution is required and selects the appropriate skill from 5,700+ available plugins.
2. **Isolated Container Is Provisioned:** A fresh, ephemeral Linux container is spun up for the agent session. Under NanoClaw, OS-level isolation is enforced in ~500 lines of auditable code. Under IronClaw, five independent security layers activate: network restrictions, request filtering, credential isolation, WASM sandboxing, and Docker containment.
3. **Code Executes Inside the Sandbox:** The agent executes Python, R, shell scripts, or SQL queries inside the container. It can install packages, read and write files within its scoped file system, query databases, and browse the web — all within defined permission boundaries and resource limits.
4. **Results Are Captured and Persisted:** Execution outputs — stdout, stderr, generated files, query results — are captured and written to the agent's persistent memory layer (Markdown files + SQLite vector/keyword search). Results are available for subsequent reasoning steps or downstream agent tasks.
5. **Audit Trail Is Written:** Every code execution event, file access, network call, and permission check is logged to an immutable audit trail. Logs include timestamps, agent identity, skill invoked, resource consumption, and execution outcome — satisfying compliance and forensic requirements.
6. **Container Is Destroyed and Resources Released:** On task completion or timeout, the container is torn down and all ephemeral state is purged. Only explicitly persisted outputs survive. Host systems remain completely unaffected, and resource limits are enforced throughout the lifecycle.

## Features

### Multi-Language Code Execution

Agents execute Python, R, shell scripts, SQL, and more inside sandboxed containers.
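To make the container-based execution model concrete, here is a minimal sketch of how an ephemeral, network-isolated sandbox with hard resource caps could be provisioned using standard `docker run` flags. The function name and default limits are illustrative assumptions, not ibl.ai's actual provisioning API.

```python
import shlex

def build_sandbox_cmd(image: str, code: str,
                      cpus: float = 1.0, memory: str = "512m",
                      timeout_s: int = 120) -> list:
    """Build an argv list that runs `code` under Python inside an
    ephemeral, network-isolated container with hard resource caps.
    Illustrative only; not ibl.ai's actual provisioning API."""
    return [
        "docker", "run",
        "--rm",                      # destroy the container on exit
        "--network=none",            # no network access from the sandbox
        f"--cpus={cpus}",            # CPU quota
        f"--memory={memory}",        # RAM cap
        "--pids-limit=128",          # bound the process count
        "--read-only",               # immutable root filesystem
        "--tmpfs", "/tmp:size=64m",  # scoped scratch space
        image,
        "timeout", str(timeout_s),   # hard wall-clock limit
        "python3", "-c", code,
    ]

cmd = build_sandbox_cmd("python:3.12-slim", "print(2 + 2)")
print(shlex.join(cmd))
```

The key design choice is that every limit is enforced by the container runtime, outside the agent's reach: nothing the code does inside the sandbox can lift its own network, memory, or lifetime restrictions.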
Custom packages can be installed per-session without affecting host environments or other agent workloads.

### Three-Tier Defense-in-Depth Security

NanoClaw provides OS-level container isolation in ~500 lines of auditable code. IronClaw adds five independent security layers: network restrictions, request filtering, credential isolation, WASM sandboxing, and Docker containment. OpenClaw enforces application-level per-user and per-skill permission checks.

### Immutable Audit Trails

Every agent action inside the sandbox — code executed, files accessed, network calls made, packages installed — is logged with full context. Audit logs support compliance reporting, incident investigation, and policy enforcement.

### Resource Limits and Quotas

CPU, memory, disk I/O, and network bandwidth limits are enforced at the container level. Runaway processes are automatically terminated. Resource quotas can be configured per agent, per skill, or per organizational unit.

### Persistent Memory Across Sessions

Unlike stateless sandboxes, ibl.ai agents retain execution outputs, generated files, and learned context across sessions via Markdown-based persistent memory and SQLite vector search — enabling multi-step, long-horizon workflows.

### Self-Hosted on Any Infrastructure

Sandboxed execution runs on your infrastructure — on-premises, private cloud, AWS, GCP, or Azure. No data leaves your environment. Fully compatible with air-gapped deployments for defense and government use cases.

### Model-Agnostic Execution

The sandbox layer is decoupled from the LLM. Organizations can run sandboxed agents on GPT-4, Claude, Gemini, Llama, or any custom model — switching or mixing models without changing execution infrastructure.

## With vs. Without

| Aspect | Without | With |
|--------|---------|------|
| Code Execution Capability | Agents generate code as text output; a human must copy, paste, and run it manually | Agents execute Python, R, shell, and SQL directly inside sandboxed containers with results returned to the agent in real time |
| Host System Security | Granting agents system access exposes host file systems, credentials, and network interfaces to agent errors or prompt injection attacks | Container isolation, network restrictions, and credential separation ensure zero host exposure regardless of agent behavior inside the sandbox |
| Compliance and Audit | No structured record of what code ran, what data was accessed, or what outputs were produced — failing audit requirements in regulated industries | Immutable, structured audit trails capture every execution event with full context, satisfying SOC 2, HIPAA, FedRAMP, and internal governance requirements |
| Package and Dependency Management | Agents cannot install packages; they are limited to pre-installed libraries, blocking specialized analytical or scientific workflows | Agents install any required packages (pip, conda, apt) inside ephemeral containers without affecting host environments or other agent sessions |
| Resource Safety | Runaway agent processes can exhaust host CPU, memory, or disk — destabilizing shared infrastructure and generating unbounded cloud costs | Hard resource limits (CPU, RAM, disk, timeout) are enforced at the container level; runaway processes are automatically terminated |
| Memory and State Persistence | Each execution is stateless; agents cannot build on prior results, forcing repetitive re-computation and losing analytical context between sessions | Execution outputs are persisted to the agent's memory layer (Markdown + SQLite) and available for subsequent reasoning steps and future sessions |
| Infrastructure Flexibility | Vendor-hosted sandboxes (GPTs, Gems) lock execution to the vendor's cloud, blocking on-premises, air-gapped, or sovereign cloud deployments | Sandboxed execution runs on any infrastructure — on-premises, private cloud, AWS, GCP, Azure, or air-gapped environments — with full data sovereignty |

## FAQ

**Q: What programming languages can AI agents execute inside the sandbox?**

ibl.ai sandboxed agents support Python, R, Bash and shell scripting, SQL (PostgreSQL, MySQL, SQLite), and Node.js. Agents can install additional packages and runtimes inside the container using pip, conda, apt, or npm — scoped to the container lifetime without affecting host systems.

**Q: How does ibl.ai prevent a sandboxed agent from accessing host systems or other tenants' data?**

ibl.ai uses a defense-in-depth architecture with three security models. NanoClaw provides OS-level Linux container isolation. IronClaw adds five independent layers: network isolation, HTTP request filtering, credential vault separation, WASM sandboxing, and Docker containment. OpenClaw enforces application-level per-user and per-skill permission checks. Compromise of any single layer does not expose the host.

**Q: Can sandboxed agent execution be deployed on-premises or in an air-gapped environment?**

Yes. ibl.ai's sandboxed execution infrastructure runs on any environment — on-premises servers, private cloud, AWS, GCP, Azure, or fully air-gapped networks. No data is sent to ibl.ai or any external service. This makes it suitable for defense, government, and regulated enterprise deployments with strict data sovereignty requirements.

**Q: What audit trail is generated when an agent executes code in the sandbox?**

Every execution event is logged: the code run, files accessed, network calls made, packages installed, resource consumption, execution duration, agent identity, and outcome. Logs are structured as JSON events compatible with SIEM platforms including Splunk, Elastic, and Datadog. Audit logs can be exported in formats aligned with SOC 2, HIPAA, and FedRAMP reporting requirements.
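As a sketch of what such a structured, SIEM-ready audit event could look like, the snippet below serializes one execution record as a JSON log line. The field names and schema here are illustrative assumptions; the actual ibl.ai event schema is not shown in this document.

```python
import hashlib
import json
from datetime import datetime, timezone

def audit_event(agent_id: str, skill: str, code: str,
                outcome: str, cpu_ms: int, files: list) -> str:
    """Serialize one sandbox execution event as a structured JSON
    log line for SIEM ingestion. Field names are illustrative, not
    ibl.ai's actual audit schema."""
    event = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "event_type": "sandbox.code_execution",
        "agent_id": agent_id,
        "skill": skill,
        # Log a hash rather than the code body, so events stay
        # compact and secrets are not duplicated into audit logs.
        "code_sha256": hashlib.sha256(code.encode()).hexdigest(),
        "files_accessed": files,
        "resource": {"cpu_ms": cpu_ms},
        "outcome": outcome,
    }
    return json.dumps(event, sort_keys=True)

line = audit_event("agent-42", "python.exec", "print('hi')",
                   outcome="success", cpu_ms=37, files=["/tmp/out.csv"])
print(line)
```

One line per event, with stable keys, is the format Splunk, Elastic, and Datadog all ingest directly, which is why structured JSON is the natural shape for an immutable audit trail.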
**Q: How does ibl.ai's sandboxed execution compare to the code interpreter in ChatGPT or Google Gemini?**

ibl.ai offers significantly broader capability. Unlike GPT or Gemini sandboxes, ibl.ai agents support persistent memory across sessions, can install arbitrary packages, run on your own infrastructure, work with any LLM model, and integrate with 5,700+ OpenClaw skills. Security is enforced via three independent models (NanoClaw, IronClaw, OpenClaw) rather than a single vendor-controlled boundary.

**Q: What happens if a sandboxed agent process runs out of control or exceeds resource limits?**

Hard resource limits — CPU quotas, RAM caps, disk write limits, and execution timeouts — are enforced at the container level. If an agent process exceeds any limit, it is automatically terminated. The container is then destroyed and resources are released. Host systems and other agent sessions are completely unaffected.

**Q: Can agents retain the results of sandboxed code execution for use in future sessions?**

Yes. ibl.ai agents use a persistent memory layer — Markdown files combined with SQLite vector and keyword search — to store execution outputs, generated files, and analytical results. This state persists across sessions, enabling agents to build on prior work and execute long-horizon, multi-step workflows without re-computing from scratch.

**Q: Which security model should an organization choose — NanoClaw, IronClaw, or OpenClaw?**

The choice depends on risk profile and operational requirements. NanoClaw is ideal for organizations that need lightweight, auditable OS-level isolation with minimal overhead. IronClaw is designed for high-security environments — defense, finance, healthcare — requiring five independent security layers. OpenClaw application-level controls complement either and are always active, enforcing per-user and per-skill permissions regardless of the underlying isolation model.
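To illustrate the runaway-process behavior described in the FAQ, here is a minimal POSIX sketch that enforces a wall-clock timeout plus per-process CPU and memory caps on a child process. This mirrors container-level enforcement in spirit only: it uses OS resource limits on a bare subprocess, not ibl.ai's actual containment mechanism, and all limit values are illustrative.

```python
import resource
import subprocess
import sys

def run_with_limits(argv, timeout_s=5, cpu_s=2, mem_bytes=512 * 2**20):
    """Run argv with a hard wall-clock timeout plus per-process CPU
    and address-space caps. POSIX only; a sketch of the idea, not
    ibl.ai's container-level enforcement."""
    def apply_limits():
        # Applied in the child just before exec, so the limits
        # cannot be lifted by the sandboxed code itself.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_s, cpu_s))
        resource.setrlimit(resource.RLIMIT_AS, (mem_bytes, mem_bytes))

    return subprocess.run(argv, capture_output=True, text=True,
                          timeout=timeout_s, preexec_fn=apply_limits)

# A well-behaved task completes and returns its output.
ok = run_with_limits([sys.executable, "-c", "print('ok')"])

# A runaway task is killed when it exceeds the wall-clock timeout.
try:
    run_with_limits([sys.executable, "-c", "import time; time.sleep(30)"],
                    timeout_s=1)
    timed_out = False
except subprocess.TimeoutExpired:
    timed_out = True
```

The essential property, as in the containerized case, is that enforcement lives outside the supervised process: the runaway task is terminated by the supervisor, and nothing it does internally can extend its own budget.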