The Matryoshka model: how mvm isolates untrusted code

mvm’s job is to let you run untrusted code — third-party software, AI-generated scripts, CI runners, sandbox workloads — and trust the isolation. This page explains the security model in one diagram and one matrix.

┌───────────────────────────────────────────────────────────┐
│ L5 — Workload (your untrusted code) │
├───────────────────────────────────────────────────────────┤
│ L4 — Guest agent (parses host messages, launches code) │
├───────────────────────────────────────────────────────────┤
│ L3 — Guest kernel (Linux, ephemeral, isolated) │
├───────────────────────────────────────────────────────────┤
│ L2 — VMM (Firecracker, Rust, seccomp-jailed) │
├───────────────────────────────────────────────────────────┤
│ L1 — Host + hypervisor (KVM / Apple VZ / HVF) │
└───────────────────────────────────────────────────────────┘

Each layer trusts only the layer below it. An attacker who starts in the workload (L5) has to break through every boundary beneath it to reach the host. A failure in any one layer is bounded: the layer below still enforces its own contract.

This pattern (sometimes called the matryoshka model after the nested Russian dolls) is the same defense-in-depth used across the production microVM / hardened-isolation ecosystem. mvm’s adaptation is that L5 is enforced inside the guest — even a guest-kernel compromise doesn’t give arbitrary access to other in-guest services. See ADR-002 for the full decision record.

mvm makes seven CI-enforced security claims. Each one is backed by a continuous-integration check that fails the build if the claim ceases to hold.

| # | Claim | Defends layer | How it’s enforced |
| --- | --- | --- | --- |
| 1 | No host-fs access from a guest beyond explicit shares | L2 / L5 | Per-service uid + seccomp standard default + setpriv bounding-set drop |
| 2 | No guest binary can elevate to uid 0 | L2 / L4 | setpriv --no-new-privs in launch path; /etc/{passwd,group} are read-only bind-mounts |
| 3 | A tampered rootfs ext4 fails to boot | L3 | dm-verity sidecar + roothash on cmdline + mvm-verity-init initramfs |
| 4 | The guest agent does not contain do_exec in production builds | L4 | CI symbol-grep on the prod binary; absence is enforced |
| 5 | Vsock framing is fuzzed | L2 / L4 | cargo-fuzz targets cover every host↔guest message; deny_unknown_fields on every type |
| 6 | Pre-built dev image is hash-verified | supply chain | SHA-256 manifest streamed through the download |
| 7 | Cargo deps are audited on every PR | supply chain | cargo-deny + cargo-audit jobs; reproducibility double-build |
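
Claim 6 describes hashing the image as it streams through the download. The following is a minimal Python sketch of that general technique, not mvm’s actual implementation: the names `verified_chunks`, `download_verified`, and `HashMismatch` are hypothetical, and the expected digest is assumed to come from an already-trusted manifest.

```python
import hashlib
import hmac
import io
from typing import BinaryIO, Iterator


class HashMismatch(Exception):
    """Raised when the streamed digest differs from the manifest value."""


def verified_chunks(stream: BinaryIO, expected_sha256: str,
                    chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield chunks from `stream`, feeding each one into SHA-256 as it passes.

    The digest is only checked after the last chunk, so callers must treat
    the data as untrusted until the generator has been fully consumed.
    """
    digest = hashlib.sha256()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
        yield chunk
    # Constant-time comparison of the hex digests.
    if not hmac.compare_digest(digest.hexdigest(), expected_sha256.lower()):
        raise HashMismatch("image digest does not match manifest")


def download_verified(stream: BinaryIO, expected_sha256: str) -> bytes:
    """Return the full image only if the whole stream verifies."""
    return b"".join(verified_chunks(stream, expected_sha256))


if __name__ == "__main__":
    image = b"example image bytes"
    manifest_hash = hashlib.sha256(image).hexdigest()
    download_verified(io.BytesIO(image), manifest_hash)  # returns the bytes
```

Hashing in the same pass as the download avoids reading the file a second time, at the cost that a mismatch is only known once the stream ends.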

L1 (host + hypervisor) doesn’t carry its own claim — the host is trusted by definition. If your host is compromised, every layer falls. Locking down the host (firewall, package hygiene, full-disk encryption) is your responsibility.

Every workload also goes through a signed admission step before boot. mvmctl up synthesizes an ExecutionPlan, signs it with the host key, checks its validity window and replay nonce, then emits a chain-signed audit entry.
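
The admission sequence above (sign, check validity window, check nonce, append to a chained audit log) can be sketched in Python as follows. This is an illustration under stated assumptions, not mvm’s code: HMAC-SHA256 stands in for whatever signature scheme the host key actually uses, and the `Admitter` class and the field names `not_before`, `not_after`, and `nonce` are hypothetical.

```python
import hashlib
import hmac
import json
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Admitter:
    host_key: bytes                                  # stand-in for the host signing key
    seen_nonces: set = field(default_factory=set)    # replay protection state
    audit_log: list = field(default_factory=list)    # chained audit entries

    def sign(self, obj: dict) -> str:
        # Canonical JSON so the same plan always produces the same signature.
        payload = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()
        return hmac.new(self.host_key, payload, hashlib.sha256).hexdigest()

    def admit(self, plan: dict, signature: str, now: Optional[float] = None) -> dict:
        now = time.time() if now is None else now
        if not hmac.compare_digest(self.sign(plan), signature):
            raise PermissionError("bad signature")
        if not (plan["not_before"] <= now <= plan["not_after"]):
            raise PermissionError("outside validity window")
        if plan["nonce"] in self.seen_nonces:
            raise PermissionError("replayed nonce")
        self.seen_nonces.add(plan["nonce"])
        return self._append_audit(plan, now)

    def _append_audit(self, plan: dict, now: float) -> dict:
        # Each entry signs over the previous entry's signature, forming a chain.
        prev = self.audit_log[-1]["entry_sig"] if self.audit_log else "0" * 64
        body = {"plan_sig": self.sign(plan), "admitted_at": now, "prev": prev}
        entry = dict(body, entry_sig=self.sign(body))
        self.audit_log.append(entry)
        return entry
```

Because every audit entry covers the signature of the one before it, removing or reordering an entry breaks verification of everything after it.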

The plan now carries an admission_profile, a compact record of the workload’s declared intent and the controls selected for that intent:

  • intent, for example vm:boot, code:execute, or agent:web-research
  • seccomp tier selected for the run
  • network, filesystem, egress, and tool policy refs
  • secret-release posture (none, plan-bound, or attestation-bound)
  • audit taxonomy and required labels
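
As a rough illustration of the shape such a record might take, here is a Python sketch. The field names and types are assumptions derived from the list above, not mvm’s actual schema; only the three secret-release values and the example intents come from this page.

```python
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Tuple


class SecretRelease(Enum):
    NONE = "none"
    PLAN_BOUND = "plan-bound"
    ATTESTATION_BOUND = "attestation-bound"


@dataclass(frozen=True)
class AdmissionProfile:
    # Hypothetical field names; the real schema may differ.
    intent: str                      # e.g. "vm:boot", "code:execute", "agent:web-research"
    seccomp_tier: str                # tier selected for the run
    network_policy_ref: str
    filesystem_policy_ref: str
    egress_policy_ref: str
    tool_policy_ref: str
    secret_release: SecretRelease
    audit_taxonomy: str
    required_labels: Tuple[str, ...] = ()

    def to_plan_fields(self) -> dict:
        """Flatten to plain JSON-friendly values for embedding in a signed plan."""
        fields = asdict(self)
        fields["secret_release"] = self.secret_release.value
        fields["required_labels"] = list(self.required_labels)
        return fields
```

Making the record immutable (`frozen=True`) mirrors the idea that the profile is fixed once the plan is signed.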

This does not add a second seccomp implementation or new execution capability inside the sandbox. Runtime syscall filtering still comes from mvm-security and the guest seccomp.json manifest. The admission profile records the selected tier in the signed plan so the audit chain can prove which security posture the workload was admitted under.

mvm runs on multiple backends. Not all backends carry all seven claims. The tier you actually get depends on which backend mvm picks for your run.

| Backend | L1–L5 marks | Tier |
| --- | --- | --- |
| Firecracker (Linux + KVM) | | Tier 1: full ADR-002. All seven claims hold. |
| Apple Container (macOS 26+ Apple Silicon) | ⚠️ | Tier 2: claim 3 (verified boot) is partial. Other six claims hold. |
| libkrun (Linux KVM, macOS Apple Silicon HVF) | ⚠️ | Tier 2: same as Apple Container. |
| Docker (any host with Docker) | | Tier 3: claims 1, 2, 3 do not hold. L1–L3 collapse to the host kernel. |
| microvm.nix (QEMU + KVM) | ⚠️⚠️ | Tier 2: QEMU’s larger device model raises L2 audit cost. |

✅ = layer fully enforced. ⚠️ = layer partial (named exception). ❌ = layer collapsed (claim does not apply).
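
The claim-level part of the matrix can be expressed as a small lookup. The Python sketch below is illustrative only: the backend keys and the `BackendTier` type are hypothetical, and microvm.nix is omitted because the table names no specific affected claims for it.

```python
from dataclasses import dataclass

ALL_CLAIMS = frozenset(range(1, 8))  # the seven CI-enforced claims


@dataclass(frozen=True)
class BackendTier:
    tier: int
    partial_claims: frozenset = frozenset()   # claims that hold only partially
    dropped_claims: frozenset = frozenset()   # claims that do not hold


# Hypothetical keys; values follow the Tier column of the matrix.
BACKENDS = {
    "firecracker": BackendTier(tier=1),
    "apple-container": BackendTier(tier=2, partial_claims=frozenset({3})),
    "libkrun": BackendTier(tier=2, partial_claims=frozenset({3})),
    "docker": BackendTier(tier=3, dropped_claims=frozenset({1, 2, 3})),
}


def claims_that_fully_hold(backend: str) -> frozenset:
    """Claims that are neither partial nor dropped on the given backend."""
    info = BACKENDS[backend]
    return ALL_CLAIMS - info.partial_claims - info.dropped_claims


if __name__ == "__main__":
    for name in BACKENDS:
        print(name, sorted(claims_that_fully_hold(name)))
```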

Tier 3 (Docker) is convenience, not isolation

mvm’s Docker backend exists so you can run a workload in a non-virt environment (e.g., a CI host without /dev/kvm, a developer laptop without nested virt). It’s not a microVM. The isolation comes from the Linux kernel’s namespace and cgroup machinery, which is shared with the host kernel.

In 2024–2025 the container ecosystem produced seven CVEs (Leaky Vessels, NVIDIAScape, runc race conditions, Buildah mount, Docker Desktop priv-esc, runc masked-path, runc /dev/console) that all yielded host escape from inside a container. None of those matter inside a microVM — the guest kernel is isolated by hardware. They all matter inside a Docker container.

If mvmctl auto-selects Tier 3 because no microVM-capable backend is available, the CLI prints a banner naming the dropped claims and the recent CVEs. You can suppress the banner once you’ve acknowledged the tier with:

export MVM_ACK_DOCKER_TIER=1

or in ~/.mvm/config.toml:

[security]
ack_docker_tier = true

Which tier to use:

  • Production / untrusted code → Tier 1. Linux + KVM + Firecracker. No exceptions.
  • macOS dev or CI on Apple Silicon → Tier 2 (Apple Container or libkrun). Verified boot is the open item.
  • macOS Intel / native Windows / WSL2 → unsupported for local microVM isolation today. WSL2 nested KVM and Hyper-V managed Linux builder support are future backend work.
  • Anywhere else → Tier 3 (Docker), with the banner caveats.

mvmctl doctor reports your current tier on the running host.

ADR-002 names three explicit non-goals so we don’t accidentally commit to defending against them:

  • A malicious host. mvm trusts the host with the hypervisor and the build keys. If your laptop or your server is compromised, every layer falls.
  • Multi-tenant guests. One guest = one workload. Sharing a single guest VM between mutually-distrusting tenants is out of scope.
  • Hardware-backed key attestation. TPM, SEV, and similar mechanisms are out of scope for v1.

If your threat model needs any of those, mvm is not the right tool today. ADR-002 documents these limits explicitly.