Building MicroVM Images

mvm is a library, not a project to fork. You keep your code, your flake.nix, and your mvm.toml in your own repository, and mvmctl builds your microVM image by running nix build against your flake. You should never need to edit anything inside the mvm repository.

Under the hood, mvm wraps microvm.nix (MIT) — that’s the NixOS module that abstracts Firecracker, Cloud Hypervisor, QEMU, crosvm, kvmtool, and stratovirt. The choice is recorded in ADR-013.

Every mvm project has a mvm.toml and a flake.nix:

my-app/mvm.toml

flake = "."
profile = "default"
vcpus = 1
memory_mib = 256

my-app/flake.nix

{
  description = "my microVM app";

  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-25.11";
    mvm.url = "github:tinylabscom/mvm";
  };

  outputs = { self, nixpkgs, mvm, ... }: {
    packages.x86_64-linux.default = mvm.lib.x86_64-linux.mkGuest {
      name = "my-app";
      services.web = {
        command = [ "/usr/local/bin/web" ];
      };
    };
  };
}

That’s the whole user-side surface. mvmctl build reads mvm.toml, follows flake = "." to your flake, and runs nix build against it.
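As a rough illustration of that flow, the sketch below derives a nix build invocation from an mvm.toml. The helper names toml_get and nix_build_command are made up for this example, the sed-based parsing only handles simple key = "value" lines, and treating profile as the flake output name is an assumption; mvmctl's real logic may differ.

```shell
# Hypothetical helpers; not part of mvmctl.

# toml_get FILE KEY: print the unquoted string value of the first `KEY = "value"` line.
toml_get() {
  sed -n "s/^$2[[:space:]]*=[[:space:]]*\"\(.*\)\"[[:space:]]*\$/\1/p" "$1" | head -n 1
}

# nix_build_command FILE: print the nix build command implied by an mvm.toml.
# Assumption: `profile` names the flake output; both keys fall back to the
# values used in the example project when missing.
nix_build_command() {
  flake=$(toml_get "$1" flake)
  profile=$(toml_get "$1" profile)
  printf 'nix build %s#%s\n' "${flake:-.}" "${profile:-default}"
}
```

Calling nix_build_command mvm.toml in the example project prints the command rather than running it, which makes the mapping easy to inspect.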

From your project directory:

mvmctl build # reads mvm.toml; builds the named flake target
mvmctl run # builds (if needed) + boots

mvmctl build is a host command. You run it from macOS or Linux, and mvm sends the Linux-only Nix work into the builder VM. You do not need to enter mvmctl dev shell first. The shell is for debugging the build environment manually, not a prerequisite for normal builds.

mvmctl selects the runtime backend automatically when you boot the finished image. Use --hypervisor on runtime commands when you want to force a specific runtime backend:

mvmctl up --flake . --hypervisor apple-container
mvmctl up --flake . --hypervisor firecracker

If you want to drive nix build directly without mvmctl in the loop:

nix build .#default

That direct Nix command is only for users who intentionally manage their own Nix environment. It bypasses mvm’s builder VM orchestration and is not required for the normal workflow. See Builder VM for the detailed build boundary.

mvm.lib.<system>.mkGuest { … } takes a single attribute set:

| Field | Type | Purpose |
| --- | --- | --- |
| name | string | Human-readable identifier; baked into the rootfs at /etc/mvm/name. |
| entrypoint | attrs | The boot-time workload. Exactly one of three forms (see below). |
| services | attrs (optional) | Auxiliary supervised services. Same shape as entrypoint.services. |
| packages | [pkg] (optional) | Extra Nix packages added to the rootfs closure. |
| hypervisor | string (optional) | Override the default (firecracker). |
| vcpus, memory_mib | int (optional) | Resource defaults; mvm.toml overrides at run time. |
| dev | bool (optional) | Explicit accessible-vs-sealed override. Inferred from entrypoint by default. |
| uids | attrs (optional) | `{ agent = 990; entrypoint = 0 |
| extraFiles | attrs (optional) | { "/abs/path" = { content; mode?; }; } baked into the rootfs at build time. |

entrypoint declares exactly one of:

# Form 1: interactive PTY shell (accessible image, dev-friendly)
entrypoint.shell = "/bin/bash";

# Form 2: single sealed program (production default)
entrypoint.command = [ "/usr/local/bin/serve" "--port" "8080" ];

# Form 3: supervised multi-service
entrypoint.services = {
  web = { command = [ "/bin/web" ]; };
  worker = { command = [ "/bin/worker" ]; restart = "always"; };
};

Attached vs detached — lifecycle of the running VM

Independent of the sealed/accessible distinction, mvm exposes two runtime lifecycle modes modeled after libkrun’s SpawnMode:

| Mode | What it means | When to use |
| --- | --- | --- |
| attached | VM lifecycle is bound to the calling process: Ctrl-C or process exit sends SIGTERM to the VM. | mvmctl run interactive, mvmctl dev shell sessions, test harnesses that want deterministic teardown. |
| detached | VM survives caller exit; only mvmctl down (or VmBackend::stop) terminates it. | mvmctl up (background), production agents, CI fixtures that boot once and run multiple phases. |

The default is detached. Override:

mvmctl run --attached # attached mode; CLI Ctrl-C kills VM
mvmctl run # detached mode (default); VM keeps running
mvmctl detach my-app # convert a running attached VM to detached
mvmctl wait my-app # block until VM exits (only meaningful for attached)

The lifecycle mode is orthogonal to the sealed/accessible distinction:

| Combination | Use case |
| --- | --- |
| accessible + attached | Dev-mode debug session: entrypoint.shell, Ctrl-C ends the session. |
| accessible + detached | Long-running dev container: shell available, survives reconnect. |
| sealed + attached | Test harness running an entrypoint to completion, exit captured. |
| sealed + detached | Production: entrypoint.command, runs forever until mvmctl down. |

The trait surface lives at mvm_core::vm_backend::{StartMode, VmBackend::start_with_mode, VmBackend::wait, VmBackend::detach}. The libkrun backend records StartMode intent at ~/.mvm/vms/<name>/mode.json; mvmctl status surfaces it.

Sealed vs accessible — the same flake works for both

The mvm builder transparently determines whether the resulting image is sealed (production — no console attach) or accessible (dev — mvmctl console <vm> opens an interactive PTY over vsock). The decision is encoded in passthru.mvm.{accessible, sealed, entrypointKind} on the resulting derivation, and mvmctl reads that metadata to gate the console subcommand.

The default inference:

| Entrypoint form | Default mode |
| --- | --- |
| entrypoint.shell = … | accessible (dev = true) |
| entrypoint.command = … | sealed (dev = false) |
| entrypoint.services = … | sealed (dev = false) |

Override either way with the explicit dev field:

# A shell entrypoint that's still sealed (no console attach allowed)
mkGuest { entrypoint.shell = "/bin/bash"; dev = false; ... }
# A command entrypoint that's accessible for debugging
mkGuest { entrypoint.command = [ "..." ]; dev = true; ... }

The same flake source is consumed in both dev and production builds — there’s no separate “dev flake” the user has to maintain. The difference is purely in the resulting image’s metadata + the host-side console gate.
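The inference table and the dev override can be summarized in a few lines of shell. This is only a sketch of the documented rules, not mvm's implementation; the function name resolve_mode and its argument convention are invented for illustration.

```shell
# Hypothetical helper mirroring the documented defaults.
# $1 = entrypoint form (shell | command | services)
# $2 = explicit dev value (true | false), or empty to use the inferred default
resolve_mode() {
  kind=$1
  dev=${2:-}
  case "$kind" in
    shell)            inferred=true ;;
    command|services) inferred=false ;;
    *) echo "unknown entrypoint form: $kind" >&2; return 1 ;;
  esac
  # An explicit dev value always wins over the inferred one.
  [ -n "$dev" ] || dev=$inferred
  if [ "$dev" = true ]; then echo accessible; else echo sealed; fi
}
```

For example, resolve_mode shell prints accessible, while resolve_mode shell false prints sealed, matching the first override example above.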

The mkGuest library produces a busybox-as-PID-1 rootfs (no NixOS, no systemd) and emits an ext4 image directly. The boot path is: kernel → /init script → mounts /proc /sys /dev → execs your entrypoint. No service manager between the kernel and your code. mvm’s security overlay (per-service uids, seccomp tier, dm-verity, read-only /etc) layers on top in Phase 6 without changing this base.
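To make the boot path concrete, here is a minimal sketch of what such an /init could look like. It is not mvm's actual script: the run and boot helpers and the DRY_RUN switch are assumptions added so the sequence can be printed on any machine, and the agent fork and privilege drop described elsewhere on this page are left out.

```shell
# Hypothetical /init sketch; mvm's real script is not shown in this document.

# run CMD...: execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# boot ENTRYPOINT...: mount the pseudo-filesystems, then hand PID 1 to the workload.
boot() {
  run mount -t proc proc /proc
  run mount -t sysfs sysfs /sys
  run mount -t devtmpfs devtmpfs /dev
  # exec replaces this shell, so no service manager sits between kernel and workload.
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "exec $*"; else exec "$@"; fi
}
```

With DRY_RUN=1 set, boot /bin/web prints the three mount commands followed by exec /bin/web instead of executing them.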

Floor: ≤ 300 ms cold p50 on every backend. A backend that can’t hit it is a backend we drop.

| Backend | Cold p50 | Snapshot-cloned p50 | Notes |
| --- | --- | --- | --- |
| Firecracker (Linux/KVM) | ≤ 300 ms | ≤ 30 ms | Default for typical workloads. |
| Cloud Hypervisor (Linux/KVM) | ≤ 300 ms | ≤ 50 ms | Tier-1 peer of FC. Adds VFIO/GPU, virtio-gpu, virtio-fs, larger guests. Opt-in via --hypervisor cloud-hypervisor. |
| libkrun (Linux/KVM) | ≤ 300 ms | ≤ 30 ms | Cross-platform default; libkrun-backed. |
| libkrun (macOS HVF) | ≤ 300 ms | ≤ 60 ms | macOS path; HVF adds ~100 ms over KVM. |
| Apple Virtualization framework | ≤ 300 ms | ≤ 200 ms | Legacy ladder; superseded by libkrun per ADR-013. |

The numbers are surfaced on every mkGuest derivation as passthru.mvm.expectedBootMs, so you can run nix eval .#default.passthru.mvm.expectedBootMs to confirm them. Phase 9 enforces the floor with xtask perf --backend <name> --p50-ms 300 --runs 100. See ADR-013 §“Boot-time budget” for the rationale.
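For intuition about what a p50 check against the 300 ms floor involves, the sketch below computes the median of a set of boot-time samples and compares it to a budget. How xtask perf actually collects and aggregates samples is not described here, so the nearest-rank method, the integer-millisecond input, and the helper names p50 and within_budget are all assumptions.

```shell
# Hypothetical helpers; not the xtask perf implementation.

# p50: read integer millisecond samples (one per line) on stdin and print the
# median using the nearest-rank method, i.e. the ceil(n/2)-th smallest value.
p50() {
  sort -n | awk '{ v[NR] = $1 } END { if (NR == 0) exit 1; print v[int((NR + 1) / 2)] }'
}

# within_budget [BUDGET_MS]: succeed when the p50 of the samples on stdin
# is at or below the budget (300 ms when no argument is given).
within_budget() {
  budget=${1:-300}
  median=$(p50) || return 1
  [ "$median" -le "$budget" ]
}
```

A harness could pipe one measured boot time per line into within_budget 300 and fail the run on a non-zero exit status.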

The floor is achievable because the rootfs uses busybox-as-PID-1 with a custom /init (no NixOS, no systemd, no OpenRC). See ADR-013 for why this matters and the implementation breadcrumb.

What’s inside the mvm repository (and why you don’t touch it)

The repository’s nix/ directory contains:

  • nix/flake.nix — exposes lib.<system>.mkGuest for your flake to consume.
  • nix/profiles/minimal.nix — an internal test fixture used by mvm’s own smoke tests (tests/smoke_libkrun.rs, tests/nix_flake_structure.rs). Not a starter template.

The internal fixture lives under the internal- namespace in flake outputs (nixosConfigurations.internal-minimal-…, packages.<system>.internal-minimal-runner) so the boundary is mechanical: anything internal-* is for mvm developers, not for users.

cd my-app
nix flake check --no-build

mvmctl validate does the same with extra mvm.toml checks layered on.

mvm runs Nix builds inside the project builder VM and copies the finished kernel/rootfs artifacts back to the host cache. You don’t need host-side Nix, and you don’t need to enter a dev shell before building.

  • Linux: the builder VM provides the Linux build boundary and cache policy. Firecracker is the default runtime backend when /dev/kvm is available.
  • macOS: the host mvmctl build command orchestrates a Linux builder VM. The resulting runtime image can then boot with Apple Virtualization (--hypervisor apple-container) or another available macOS runtime backend.
  • Windows: Tauri-only (the mvm-studio desktop app packages a WSL2-backed builder + runtime). See ADR-031.

PID 1 must be uid 0 (kernel mandate). Everything else can — and by default in production does — run non-root. mkGuest’s uids knob controls the privilege drop:

| Process | Default uid | Role |
| --- | --- | --- |
| /init (PID 1) | 0 | Mounts pseudofs, forks the agent in the background, drops privs, exec’s the entrypoint |
| mvm-guest-agent | 990 | Vsock RPC handler (never needs root); supervised by /init |
| Entrypoint (workload) | 0 in dev, 1000 in prod | Your service or shell |

Agent binary status: as of Phase 1 W6.1.1 the agent at /usr/local/bin/mvm-guest-agent is a stub — a sh script that logs startup and sleeps. The supervision pattern is real (init forks it under uid 990 before setpriv-exec’ing the entrypoint); the vsock RPC surface lands when W6.1.2 swaps in the cross-compiled Rust binary. Every derivation surfaces passthru.mvm.agentBinary = "stub" | "real" so production deployments can refuse to boot a stub image.
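A deployment pipeline could act on that attribute with a small gate like the one below. This is a sketch under assumptions: the function name require_real_agent is invented, and the way mvm itself enforces the check is not specified here. The only inputs taken from this page are the attribute path and its two documented values.

```shell
# Hypothetical deploy-time gate; mvm's own enforcement may differ.
# $1 = value of passthru.mvm.agentBinary ("stub" or "real")
require_real_agent() {
  case "$1" in
    real) return 0 ;;
    stub) echo "refusing to deploy: guest agent is a stub" >&2; return 1 ;;
    *)    echo "unexpected agentBinary value: $1" >&2; return 2 ;;
  esac
}

# Usage (needs Nix on the host, so it is left commented out):
# require_real_agent "$(nix eval --raw .#default.passthru.mvm.agentBinary)"
```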

The dev/prod default split is intentional:

  • Dev keeps entrypoint as root because debug shells expect root: apt install, mount, tcpdump. Forcing rootless dev would break those flows on first try.
  • Prod drops to uid 1000 by default per ADR-002 W2.1 — “no guest binary can elevate to uid 0.” A workload that isn’t root can’t be re-elevated.

/init uses setpriv --reuid=N --regid=N --clear-groups --no-new-privs -- to drop privileges. --no-new-privs blocks setuid re-elevation, so even if the workload finds a SUID binary, it can’t reach uid 0.
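The default split and the drop command can be sketched together. The helpers entrypoint_uid and drop_command are hypothetical names for this example, and using the same number for uid and gid simply follows the N placeholders above; the sketch prints the command line rather than executing it, and it does not claim to match how /init is written.

```shell
# Hypothetical helpers illustrating the documented defaults.

# entrypoint_uid DEV [OVERRIDE]: 0 in dev, 1000 in prod, unless a uid is given.
entrypoint_uid() {
  if [ -n "${2:-}" ]; then
    echo "$2"
  elif [ "$1" = true ]; then
    echo 0
  else
    echo 1000
  fi
}

# drop_command UID CMD...: print the setpriv invocation for the workload.
drop_command() {
  uid=$1
  shift
  echo "setpriv --reuid=$uid --regid=$uid --clear-groups --no-new-privs -- $*"
}
```

For a production image, drop_command "$(entrypoint_uid false)" /usr/local/bin/serve prints a setpriv line with --reuid=1000 and --regid=1000.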

# Rootless dev shell: forces non-root even in dev mode.
mkGuest {
  entrypoint.shell = "/bin/bash";
  uids = { entrypoint = 1000; };
}

# Rootful prod workload: explicit override, rarely the right call.
mkGuest {
  entrypoint.command = [ "/usr/local/bin/serve" ];
  uids = { entrypoint = 0; };
}

# Non-default agent uid (e.g. to avoid collisions with host-side ranges).
mkGuest {
  entrypoint.command = [ "/bin/x" ];
  uids = { agent = 5000; };
}

The resolved values surface as passthru.mvm.uids = { agent; entrypoint; } and passthru.mvm.rootlessEntrypoint :: bool so mvmctl status can cross-check against /proc/<pid>/status at runtime.
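The runtime side of that cross-check amounts to reading the Uid: line of a process status file. The sketch below shows one way to do it; the helper names are invented, and where mvmctl status actually performs the comparison is not covered here. On Linux the Uid: line lists the real, effective, saved, and filesystem uids in that order, so the second field is the real uid.

```shell
# Hypothetical helpers; not mvmctl's implementation.

# status_real_uid FILE: print the real uid from a /proc/<pid>/status style file.
status_real_uid() {
  awk '/^Uid:/ { print $2; exit }' "$1"
}

# uid_matches FILE EXPECTED: succeed when the real uid equals the expected value.
uid_matches() {
  [ "$(status_real_uid "$1")" = "$2" ]
}

# Usage on a Linux system (commented out because it needs a real pid):
# uid_matches "/proc/$pid/status" 1000
```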

mvm runs microVMs, not containers. Even though the underlying libkrun library exposes OCI image pulls (RootfsSource::Oci), mvm uses only the host-local disk-image path. The bridge between your Nix-built .ext4 rootfs and the runtime is a sibling .raw hard link with fstype("ext4"): no registry, no auth, no pull cache, and fully offline by default once your rootfs is built. ADR-013 §“Non-goal: OCI / container images” carries the full rationale.
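The hard-link bridge itself is a single ln call. The sketch below creates the sibling link next to an .ext4 image; the helper name link_raw_sibling and the exact naming rule (same basename with a .raw suffix) are assumptions for illustration, and mvm creates this link for you in the normal workflow.

```shell
# Hypothetical helper; mvm manages this link itself.
# link_raw_sibling PATH.ext4: create PATH.raw as a hard link (same inode, no copy)
# and print the new path.
link_raw_sibling() {
  src=$1
  case "$src" in
    *.ext4) ;;
    *) echo "expected a .ext4 path: $src" >&2; return 1 ;;
  esac
  dst="${src%.ext4}.raw"
  ln -f "$src" "$dst" && echo "$dst"
}
```

Because both names point at the same inode, the link costs no extra disk space and later writes through one name are visible through the other.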