Agent tool contract
Agent integrations should expose mvm as a narrow tool, not as ambient shell
access. The application owns validation, policy selection, redaction, and
cleanup. The microVM owns guest execution.
Use this contract when an LLM, coding agent, or workflow engine needs to run generated code, inspect files, call a command, or preserve recoverable state.
Tool boundary
| Tool action | App responsibility | Sandbox responsibility |
|---|---|---|
| Create sandbox | Choose image, policy, TTL, resource limits, and owner. | Boot the selected microVM artifact. |
| Write input files | Validate paths, size, content type, and retention. | Receive files under guest paths. |
| Run command | Validate argv, timeout, working dir, env refs, and output budget. | Execute inside the guest policy boundary. |
| Read output files | Restrict paths and size; scan or redact content before model use. | Return requested bytes from guest storage. |
| Return logs/results | Redact secrets and user data; attach audit/run IDs. | Emit command status, logs, and receipts where available. |
| Preserve state | Decide pause, cold, snapshot, or volume retention. | Save backend-specific state. |
| Cleanup | Stop, destroy, lock volumes, delete snapshots, or keep with explicit TTL. | Release compute and retained state according to command. |
Request schema
Keep the model-facing request small and typed. Do not let the model pass raw host paths, arbitrary environment variables, or host network bindings.

```json
{
  "language": "python",
  "files": [
    { "path": "/work/main.py", "content": "print('hello')" }
  ],
  "command": ["python", "/work/main.py"],
  "timeout_seconds": 20,
  "network": { "mode": "none" },
  "state": { "retention": "destroy" }
}
```

Validation rules:
- reject absolute host paths and path traversal;
- bound file count, file size, stdout, stderr, and runtime;
- allow only known image or flake targets;
- require deny-by-default network unless the caller has a policy grant;
- require secret references, not literal secret values;
- require an explicit retention choice: destroy, stop, cold, or snapshot.
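
The rules above can be sketched as a single validation pass. This is a minimal illustration, not part of mvm: the limits, the `ALLOWED_LANGUAGES` allowlist, the `/work` guest root, the `has_network_grant` flag, and the `ValidationError` type are all assumptions chosen for the example.

```python
from pathlib import PurePosixPath

# Hypothetical limits and allowlists; tune them per deployment.
MAX_FILES = 16
MAX_FILE_BYTES = 256 * 1024
MAX_TIMEOUT_SECONDS = 60
ALLOWED_LANGUAGES = {"python"}
GUEST_ROOT = PurePosixPath("/work")
RETENTION_CHOICES = {"destroy", "stop", "cold", "snapshot"}


class ValidationError(ValueError):
    """Raised when a model-supplied request breaks the contract."""


def check_guest_path(raw: str) -> str:
    """Accept only traversal-free paths below the guest work directory."""
    path = PurePosixPath(raw)
    if ".." in path.parts:
        raise ValidationError(f"path traversal rejected: {raw!r}")
    if GUEST_ROOT not in path.parents:
        raise ValidationError(f"path outside {GUEST_ROOT}: {raw!r}")
    return str(path)


def validate_request(request: dict, has_network_grant: bool = False) -> dict:
    """Return a checked copy of the request or raise ValidationError."""
    if request.get("language") not in ALLOWED_LANGUAGES:
        raise ValidationError("unknown language or image target")

    files = request.get("files", [])
    if len(files) > MAX_FILES:
        raise ValidationError("too many files")
    checked_files = []
    for item in files:
        if len(item["content"].encode("utf-8")) > MAX_FILE_BYTES:
            raise ValidationError("file too large")
        checked_files.append(
            {"path": check_guest_path(item["path"]), "content": item["content"]}
        )

    command = request.get("command")
    if not isinstance(command, list) or not command:
        raise ValidationError("command must be a non-empty argv list")
    if not all(isinstance(arg, str) for arg in command):
        raise ValidationError("command arguments must be strings")

    timeout = request.get("timeout_seconds")
    if type(timeout) is not int or not 0 < timeout <= MAX_TIMEOUT_SECONDS:
        raise ValidationError("timeout out of bounds")

    # Deny by default: any mode other than "none" needs a caller-side grant.
    mode = request.get("network", {}).get("mode", "none")
    if mode != "none" and not has_network_grant:
        raise ValidationError("network access requires a policy grant")

    # Secrets must be references such as {"ref": "name"}, never literal values.
    for name, value in request.get("secrets", {}).items():
        if not (isinstance(value, dict) and set(value) == {"ref"}):
            raise ValidationError(f"secret {name!r} must be a reference")

    retention = request.get("state", {}).get("retention")
    if retention not in RETENTION_CHOICES:
        raise ValidationError("explicit retention choice required")

    return {
        "language": request["language"],
        "files": checked_files,
        "command": command,
        "timeout_seconds": timeout,
        "network": {"mode": mode},
        "retention": retention,
    }
```

Output budgets for stdout and stderr are not part of the model-facing request in this sketch; the application would set them itself when it runs the command.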
Current CLI-backed implementation
Today the broadest shipped lifecycle surface is the CLI. A tool runner can use the CLI while preserving the same contract the SDK target should expose.

```sh
mvmctl build ./agent-tool
mvmctl up ./agent-tool --name agent-tool-call
mvmctl fs write agent-tool-call /work/main.py < /tmp/request-main.py
mvmctl exec agent-tool-call --timeout 20 -- python /work/main.py
mvmctl logs agent-tool-call
mvmctl down agent-tool-call
```

For one-off command execution, prefer a single bounded run when the workflow does not need staged files or persistent state:

```sh
mvmctl run --timeout 20 -- python -c 'print("bounded tool call")'
```

Keep the sandbox name, command exit status, receipt path, and audit/run IDs with the agent trace when those values are available.
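
A tool runner might wrap the staged CLI sequence roughly as follows. This is a sketch under assumptions: it uses only the `mvmctl` invocations shown in this section, passes file content on stdin to mirror the shell redirect, and the `Step`, `plan_tool_call`, and `run_tool_call` names are hypothetical. It expects inputs that have already been validated.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Step:
    argv: List[str]
    stdin: Optional[bytes] = None  # file content piped in, like `< file`


def plan_tool_call(
    project: str,
    name: str,
    files: Dict[str, str],
    command: List[str],
    timeout_seconds: int,
) -> List[Step]:
    """Build the argv sequence without executing anything."""
    steps = [
        Step(["mvmctl", "build", project]),
        Step(["mvmctl", "up", project, "--name", name]),
    ]
    for guest_path, content in files.items():
        steps.append(
            Step(["mvmctl", "fs", "write", name, guest_path], content.encode("utf-8"))
        )
    steps.append(
        Step(["mvmctl", "exec", name, "--timeout", str(timeout_seconds), "--", *command])
    )
    steps.append(Step(["mvmctl", "logs", name]))
    return steps


def run_tool_call(
    steps: List[Step], name: str, runner: Callable = subprocess.run
) -> List[Tuple[List[str], int]]:
    """Run steps in order, stop at the first failure, and always tear down."""
    results = []
    try:
        for step in steps:
            done = runner(step.argv, input=step.stdin, capture_output=True, check=False)
            results.append((step.argv, done.returncode))
            if done.returncode != 0:
                break
    finally:
        # Cleanup runs even when an earlier step failed or raised.
        runner(["mvmctl", "down", name], input=None, capture_output=True, check=False)
    return results
```

Passing argv lists rather than a shell string keeps model-generated arguments from being interpreted by a host shell. The recorded `(argv, returncode)` pairs are one way to keep exit status with the agent trace.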
Runtime SDK target
The SDK target should make the same contract easier to write without hiding security decisions:

```python
from mvm import NetworkPolicy, Sandbox


def run_agent_tool(request: dict) -> dict:
    checked = validate_request(request)

    with Sandbox.create(
        image=checked["image"],
        network=NetworkPolicy.deny_by_default(),
        ttl_seconds=checked["ttl_seconds"],
    ) as sandbox:
        for item in checked["files"]:
            sandbox.files.write(item["path"], item["content"])

        result = sandbox.commands.run(
            checked["command"],
            timeout_seconds=checked["timeout_seconds"],
            max_output_bytes=checked["max_output_bytes"],
        )

        return {
            "exit_code": result.exit_code,
            "stdout": redact(result.stdout),
            "stderr": redact(result.stderr),
            "audit_id": result.audit_id,
        }
```

```ts
import { NetworkPolicy, Sandbox } from "@mvm/sdk";

async function runAgentTool(request: Record<string, unknown>) {
  const checked = validateRequest(request);

  using sandbox = await Sandbox.create({
    image: checked.image,
    network: NetworkPolicy.denyByDefault(),
    ttlSeconds: checked.ttlSeconds,
  });

  for (const item of checked.files) {
    await sandbox.files.write(item.path, item.content);
  }

  const result = await sandbox.commands.run(checked.command, {
    timeoutSeconds: checked.timeoutSeconds,
    maxOutputBytes: checked.maxOutputBytes,
  });

  return {
    exit_code: result.exitCode,
    stdout: redact(result.stdout),
    stderr: redact(result.stderr),
    audit_id: result.auditId,
  };
}
```

Check Operations cookbook and Lifecycle matrix before treating a helper as shipped in a language SDK.
Network policy
Start closed:

```json
{
  "network": { "mode": "none" }
}
```

If the agent needs outbound access, issue a separate reviewed grant:

```json
{
  "network": {
    "mode": "bridge",
    "allow": [
      { "host": "api.openai.com", "port": 443 }
    ]
  }
}
```

Do not let the model invent network destinations. The application should map approved tool capabilities to concrete policy, then pass only that policy to the sandbox.
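
One way to keep destinations out of the model's hands is a reviewed lookup table owned by the application. The sketch below is illustrative: the capability name `llm_api`, the `REVIEWED_DESTINATIONS` table, and the `policy_for` helper are assumptions, and only the policy shapes come from the examples above.

```python
from typing import Dict, List, Set

# Hypothetical table maintained by reviewers, never by model output.
REVIEWED_DESTINATIONS: Dict[str, List[dict]] = {
    "llm_api": [{"host": "api.openai.com", "port": 443}],
}


def policy_for(requested: Set[str], granted: Set[str]) -> dict:
    """Map capability names to a concrete network policy.

    The model may only name capabilities. Hosts and ports come from the
    reviewed table, and every capability needs a caller-side grant.
    """
    allow: List[dict] = []
    for capability in sorted(requested):
        if capability not in REVIEWED_DESTINATIONS:
            raise PermissionError(f"unknown capability: {capability!r}")
        if capability not in granted:
            raise PermissionError(f"no grant for capability: {capability!r}")
        allow.extend(dict(entry) for entry in REVIEWED_DESTINATIONS[capability])

    if not allow:
        return {"mode": "none"}  # start closed
    return {"mode": "bridge", "allow": allow}
```

For example, `policy_for(set(), set())` yields the closed policy, while `policy_for({"llm_api"}, {"llm_api"})` yields the bridge policy with the single reviewed destination.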
Secrets
Secrets should enter through references controlled by the application or operator, not through model output.

```json
{
  "secrets": {
    "OPENAI_API_KEY": { "ref": "openai-api-key" }
  }
}
```

Security rules:
- never pass secrets in command args;
- do not echo resolved secrets back to the model;
- redact stdout, stderr, logs, error messages, and receipts before returning them to a model context;
- grant secrets per operation, not per long-lived agent identity.
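
A minimal sketch of the reference-then-redact flow implied by these rules, assuming an in-memory mapping stands in for the real secret store. The `resolve_env` and `redact` helpers and the `[REDACTED]` placeholder are hypothetical names for illustration.

```python
from typing import Dict, Iterable, Mapping


def resolve_env(
    refs: Mapping[str, Mapping[str, str]], store: Mapping[str, str]
) -> Dict[str, str]:
    """Resolve secret references to environment values for one operation.

    The result is meant for the guest environment, never for command args.
    """
    return {name: store[spec["ref"]] for name, spec in refs.items()}


def redact(
    text: str, secret_values: Iterable[str], placeholder: str = "[REDACTED]"
) -> str:
    """Replace every exact occurrence of a resolved secret in text."""
    # Longest first, so a secret that contains another is replaced whole.
    for value in sorted(set(secret_values), key=len, reverse=True):
        if value:
            text = text.replace(value, placeholder)
    return text
```

Exact-match replacement does not catch encoded or partially printed secrets (for example a base64 form), so it works best alongside output scanning rather than as the only control.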
State retention
Choose one retention behavior per tool call:
| Retention | Use when | Cleanup requirement |
|---|---|---|
| destroy | The call is disposable. | Stop compute and remove retained local state. |
| stop | Build artifacts can remain, but compute should stop. | Review logs, receipts, volumes, and snapshots separately. |
| cold | The next call needs memory or filesystem continuity. | Treat cold state as sensitive and set a TTL. |
| snapshot | A reviewer or retry path needs a named recovery point. | Store retention metadata and delete when no longer needed. |
Cold state and snapshots can contain prompts, tool outputs, files, process memory, browser sessions, and credentials. Treat them as sensitive artifacts.
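
Retention metadata can be kept as a small record per tool call. The sketch below is an assumption-heavy illustration: the `RetentionRecord` type and its fields are hypothetical, and requiring a TTL for snapshots as well as cold state is a stricter choice than the table above, which asks for a TTL only on cold state.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

RETENTION_CHOICES = {"destroy", "stop", "cold", "snapshot"}
# Assumption: both kinds of retained state must carry an expiry.
TTL_REQUIRED = {"cold", "snapshot"}


@dataclass(frozen=True)
class RetentionRecord:
    sandbox_id: str
    retention: str
    created_at: datetime
    ttl_seconds: Optional[int] = None

    def __post_init__(self) -> None:
        if self.retention not in RETENTION_CHOICES:
            raise ValueError(f"unknown retention: {self.retention!r}")
        if self.retention in TTL_REQUIRED and not self.ttl_seconds:
            raise ValueError(f"{self.retention} state requires a TTL")

    def expired(self, now: datetime) -> bool:
        """True once retained state is past its TTL and should be deleted."""
        if self.ttl_seconds is None:
            return False
        return now >= self.created_at + timedelta(seconds=self.ttl_seconds)


record = RetentionRecord(
    sandbox_id="agent-tool-call",
    retention="cold",
    created_at=datetime(2024, 1, 1, tzinfo=timezone.utc),
    ttl_seconds=3600,
)
```

A periodic sweep could call `expired()` on each stored record and issue the matching cleanup command for anything past its TTL.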
Response schema
Return bounded, structured results to the model:

```json
{
  "sandbox_id": "agent-tool-call",
  "exit_code": 0,
  "stdout": "redacted output",
  "stderr": "",
  "timed_out": false,
  "audit_id": "run_...",
  "retention": "destroy"
}
```

If the run fails, distinguish policy denial, validation failure, timeout, guest command failure, transport failure, and cleanup failure. They require different retries and different user-facing messages.
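
The failure categories can be made explicit in the response so that callers do not have to infer them from the exit code. In this sketch the `FailureKind` enum mirrors the six categories named above; the `failure` response field, the `MAX_OUTPUT_CHARS` budget, and the retry table are assumptions for illustration, not mvm behavior.

```python
from enum import Enum
from typing import Optional


class FailureKind(Enum):
    POLICY_DENIAL = "policy_denial"
    VALIDATION_FAILURE = "validation_failure"
    TIMEOUT = "timeout"
    GUEST_COMMAND_FAILURE = "guest_command_failure"
    TRANSPORT_FAILURE = "transport_failure"
    CLEANUP_FAILURE = "cleanup_failure"


# Hypothetical guidance: whether resending the identical request may help.
RETRY_SAME_REQUEST = {
    FailureKind.POLICY_DENIAL: False,          # needs a new grant
    FailureKind.VALIDATION_FAILURE: False,     # needs a corrected request
    FailureKind.TIMEOUT: False,                # needs a smaller task or more budget
    FailureKind.GUEST_COMMAND_FAILURE: False,  # needs changed code or input
    FailureKind.TRANSPORT_FAILURE: True,       # possibly transient
    FailureKind.CLEANUP_FAILURE: False,        # needs operator attention
}

MAX_OUTPUT_CHARS = 4096  # assumed output budget for model-facing text


def build_response(
    sandbox_id: str,
    exit_code: Optional[int],
    stdout: str,
    stderr: str,
    timed_out: bool,
    audit_id: str,
    retention: str,
    failure: Optional[FailureKind] = None,
) -> dict:
    """Assemble a bounded response; stdout and stderr must already be redacted."""
    if failure is None and timed_out:
        failure = FailureKind.TIMEOUT
    elif failure is None and exit_code not in (0, None):
        failure = FailureKind.GUEST_COMMAND_FAILURE
    return {
        "sandbox_id": sandbox_id,
        "exit_code": exit_code,
        "stdout": stdout[:MAX_OUTPUT_CHARS],
        "stderr": stderr[:MAX_OUTPUT_CHARS],
        "timed_out": timed_out,
        "audit_id": audit_id,
        "retention": retention,
        "failure": failure.value if failure else None,
        "retry_same_request": RETRY_SAME_REQUEST[failure] if failure else False,
    }
```

Policy denial, validation, transport, and cleanup failures cannot be derived from an exit code, so the caller is expected to pass `failure` explicitly for those cases.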