Debugging firecracker-containerd from scratch

Apr 5, 2026

Mikrom runs Firecracker microVMs. Until now our firecracker-agent talked to Firecracker directly via its HTTP API — creating the VM, attaching drives, configuring the network, booting. That works, but it means we own the full lifecycle: kernel path, rootfs path, jailer setup, socket management.

firecracker-containerd offers a better model: it wraps all of that behind a standard containerd interface. You pull an OCI image, run ctr run, and the aws.firecracker runtime shim handles the Firecracker process. The agent only needs to manage networking and expose its gRPC API.

Switching to it took one afternoon, not because it is complex, but because each failure was silent in a different way. This post is a log of every error we hit and how we resolved it.


The setup

The host is a Debian Trixie machine with KVM available. The target architecture looks like this:

firecracker-ctr / grpcurl
        │
        ▼
firecracker-containerd   (containerd.sock)
        │  devmapper snapshotter
        │  aws.firecracker runtime shim
        ▼
Firecracker process
        │  virtio-mmio block + vsock
        ▼
microVM kernel (hello-vmlinux.bin)
microVM rootfs (default-rootfs.img  ← squashfs with guest agent)
        │  overlayfs
        ▼
OCI image layers (busybox, ubuntu, …)

The shim boots the VM from a base squashfs rootfs, then mounts the OCI image’s layers on top via overlayfs. Inside the VM, a guest agent binary listens on vsock and proxies container lifecycle calls back to the shim.


Error 1: snapshotter not loaded — naive

The first attempt used the config that came with our initial setup:

snapshotter: "naive"
failed to prepare extraction snapshot "extract-…": snapshotter not loaded: naive: invalid argument

The naive snapshotter in firecracker-containerd is a proxy plugin — a separate daemon that must be running and registered in config.toml under [proxy_plugins]. We did not have it.
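For reference, keeping the naive snapshotter would have meant running its daemon and registering it as a proxy plugin. A sketch of what that registration looks like; the plugin name and socket path here are assumptions, not our config:

```toml
# Hypothetical [proxy_plugins] entry in /etc/firecracker-containerd/config.toml.
# The naive snapshotter daemon must be started separately and listen on this socket.
[proxy_plugins]
  [proxy_plugins.naive]
    type    = "snapshot"
    address = "/var/run/firecracker-containerd/naive-snapshotter.sock"
```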

Fix: switch to the devmapper snapshotter, which ships as a built-in plugin. The config change is straightforward:

# /etc/firecracker-containerd/config.toml
[plugins."io.containerd.snapshotter.v1.devmapper"]
  pool_name       = "fc-dev-thinpool"
  base_image_size = "10GB"
  root_path       = "/var/lib/firecracker-containerd/snapshotter/devmapper"

That requires a thin-pool device to exist first.


Error 2: devmapper thin-pool does not exist

Setting up a devmapper thin-pool for development does not require LVM. Loop-device-backed files are enough:

mkdir -p /var/lib/firecracker-containerd/snapshotter/devmapper

dd if=/dev/zero \
   of=/var/lib/firecracker-containerd/snapshotter/devmapper/data \
   bs=1M count=102400   # 100 GB

dd if=/dev/zero \
   of=/var/lib/firecracker-containerd/snapshotter/devmapper/metadata \
   bs=1M count=2048     # 2 GB

DATA_DEV=$(losetup --find --show .../data)
META_DEV=$(losetup --find --show .../metadata)

dmsetup create fc-dev-thinpool \
  --table "0 $(blockdev --getsz $DATA_DEV) thin-pool \
  $META_DEV $DATA_DEV 128 32768 1 skip_block_zeroing"
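The table arguments are worth decoding: dmsetup lengths and block sizes are in 512-byte sectors, so 128 is a 64 KiB thin-pool block size and 32768 sectors is the low-water mark. A quick sanity check of the length field for the 100 GB data file (a standalone sketch, not part of the setup):

```shell
# Sanity-check the thin-pool table length field (illustrative only).
# thin-pool table: <start> <length> thin-pool <meta> <data> <block size> <low water mark> ...
# All lengths are in 512-byte sectors.
DATA_BYTES=$((102400 * 1024 * 1024))   # the dd above wrote 102400 blocks of 1 MiB
SECTORS=$((DATA_BYTES / 512))          # what blockdev --getsz reports for the loop device
echo "$SECTORS"                        # 209715200
```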

dmsetup ls confirmed the pool was there. But the image pull still failed, now with snapshotter not loaded: devmapper: invalid argument.


Error 3: dmsetup not in PATH

The daemon starts the devmapper plugin during initialization. The plugin calls dmsetup version to check that the tool is available. On Debian, dmsetup lives in /sbin, which is not in the default PATH when processes are launched by the system.

The daemon startup log made this explicit:

ERRO dmsetup not available
WARN failed to load plugin io.containerd.snapshotter.v1.devmapper
     error="exec: \"dmsetup\": executable file not found in $PATH"

Fix: set PATH explicitly when launching the daemon:

sudo env PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin" \
  /usr/local/bin/firecracker-containerd \
  --config /etc/firecracker-containerd/config.toml

After this, firecracker-ctr images pull --snapshotter devmapper docker.io/library/busybox:latest succeeded.
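Launching the daemon by hand with env works, but if firecracker-containerd is managed by systemd, a drop-in is the durable version of the same fix (hypothetical file, assuming a firecracker-containerd.service unit exists):

```ini
# Hypothetical /etc/systemd/system/firecracker-containerd.service.d/10-path.conf
[Service]
Environment=PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
```

A systemctl daemon-reload and restart then gives the plugin a PATH containing /sbin.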


Error 4: VM kernel hangs after “loop: module loaded”

The pull worked. The next step was ctr run:

VM "…" didn't start within 1m0s: failed to dial the VM over vsock: context deadline exceeded

The daemon serial console output (logged as vmm_stream=stdout) showed the Linux kernel booting up to loop: module loaded at ~2.6 seconds and then going completely silent. No userspace, no systemd, no panic.

The root cause was an orphaned devmapper snapshot from a previous failed run. A leftover fc-dev-thinpool-snap-N device was still registered, and the shim’s attempt to create a new snapshot for the rootfs drive silently failed. Firecracker booted with no root drive, so the kernel stalled waiting for /dev/vda.

Fix: list and remove orphaned snapshots before retrying:

sudo dmsetup ls
# fc-dev-thinpool       (253:0)
# fc-dev-thinpool-snap-3  (253:1)   ← orphan

sudo dmsetup remove fc-dev-thinpool-snap-3
sudo rm -rf /var/lib/firecracker-containerd/shim-base/default#<old-vmID>

After cleanup, the VM booted all the way to systemd.

One thing to keep in mind: the thin-pool and loop devices are lost on host reboot. Re-running the losetup + dmsetup create commands is required after each restart.
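One way to automate that recreation is a oneshot unit ordered before the daemon. This is a sketch: setup-thinpool.sh is a hypothetical wrapper around the losetup and dmsetup create commands above, and the unit names are assumptions:

```ini
# Hypothetical /etc/systemd/system/fc-thinpool.service
[Unit]
Description=Recreate firecracker-containerd devmapper thin-pool
Before=firecracker-containerd.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/local/bin/setup-thinpool.sh

[Install]
WantedBy=multi-user.target
```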


Error 5: guest agent exits with code 1 — vsock EOF

With a clean state the VM now reached systemd and started the firecracker-agent.service. The daemon side showed rapid EOF errors instead of the previous i/o timeout:

attempt=514 error="vsock ack message failure: failed to read \"OK <port>\" within 1s: EOF"

EOF instead of i/o timeout is meaningful: the guest agent was connecting to the vsock and then immediately dying, rather than never connecting at all.

Systemd inside the VM confirmed this:

systemd[1]: Started Firecracker VM agent.
systemd[1]: firecracker-agent.service: Main process exited, code=exited, status=1/FAILURE

But no error message from the agent itself. To capture it we modified the service’s ExecStart to redirect output directly to /dev/console (bypassing the journal, which was not forwarding the agent’s output reliably):

ExecStart=/bin/sh -c '/usr/local/bin/agent --debug >/dev/console 2>&1; echo "agent exit $?" >/dev/console'

Since default-rootfs.img is a squashfs (read-only by design), we had to repack it:

unsquashfs -d /tmp/fc-rootfs /var/lib/firecracker-containerd/runtime/default-rootfs.img
# edit /tmp/fc-rootfs/etc/systemd/system/firecracker-agent.service
mksquashfs /tmp/fc-rootfs \
  /var/lib/firecracker-containerd/runtime/default-rootfs.img \
  -comp gzip -noappend -force-uid 0 -force-gid 0

The next run revealed the actual error:

/usr/local/bin/agent: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found
/usr/local/bin/agent: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found
agent exit 1

Error 6: glibc version mismatch in the guest agent

The agent binary inside default-rootfs.img was compiled on a Debian Bookworm/Ubuntu 22.04 host (glibc ≥ 2.34). The rootfs ships Debian Bullseye, which has glibc 2.31.

Go binaries are statically linked when compiled with CGO_ENABLED=0. Our binary was not — it had been built without that flag, picking up a dynamic dependency on the host’s glibc.
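This class of failure is cheap to catch on the host before repacking the rootfs. A small check based on ldd's exit code (illustrative; /bin/sh stands in for the agent binary):

```shell
# ldd exits 0 for dynamically linked executables and non-zero for static ones
# ("not a dynamic executable"), which is exactly the property that bit us.
is_dynamic() {
  ldd "$1" >/dev/null 2>&1
}

# /bin/sh is a stand-in target; point this at the agent binary instead.
if is_dynamic /bin/sh; then
  echo "dynamically linked: check its glibc version requirements"
else
  echo "statically linked: safe to drop into any rootfs"
fi
```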

Fix: rebuild the agent statically and inject it into the rootfs:

# In the firecracker-containerd source tree
cd agent
CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -o agent-static .

# Inject
unsquashfs -d /tmp/fc-rootfs \
  /var/lib/firecracker-containerd/runtime/default-rootfs.img
cp agent-static /tmp/fc-rootfs/usr/local/bin/agent
mksquashfs /tmp/fc-rootfs \
  /var/lib/firecracker-containerd/runtime/default-rootfs.img \
  -comp gzip -noappend -force-uid 0 -force-gid 0

It works

After the static rebuild:

$ sudo firecracker-ctr \
    --address /run/firecracker-containerd/containerd.sock \
    run --snapshotter devmapper --runtime aws.firecracker \
    --rm --tty docker.io/library/busybox:latest busybox-test
/ #

A busybox shell inside a Firecracker microVM, booted from an OCI image via firecracker-containerd.


Summary of fixes

| Error | Root cause | Fix |
| --- | --- | --- |
| snapshotter not loaded: naive | naive snapshotter daemon not running | Switch to devmapper |
| snapshotter not loaded: devmapper | dmsetup not in PATH for the daemon process | Launch with explicit PATH=…:/sbin:/usr/sbin |
| Kernel hangs at loop: module loaded | Orphaned devmapper snapshot blocks rootfs drive creation | dmsetup remove + rm -rf shim directory |
| vsock: EOF, agent exits 1 | Guest agent dynamically linked against newer glibc than rootfs provides | Rebuild agent with CGO_ENABLED=0 |

What’s next

The remaining step is wiring this up to the Mikrom API. The firecracker-agent gRPC service already delegates VM lifecycle to containerd: manager.go calls ctrd.Pull and ctrd.NewContainer with the aws.firecracker runtime. Now that the underlying stack is verified working, the integration test suite can run against a real VM instead of mocks.

The full host setup procedure is documented in firecracker-agent/docs/host-setup.md.

~Antonio Pardo