Harry Hodge

Getting Started with Firecracker

Getting Started

I’m probably late to the nerd party but I recently decided to take a closer look at running Firecracker VMs locally. I used to run a lot of QEMU VMs, explored libvirt to test Puppet and Ansible, and tried running a VMware vSphere host. Firecracker turned up on Hacker News 4 years ago so I’m almost certainly out of the loop.

Firecracker has a great public presence and the GitHub repo is fantastic. The getting-started docs got me up and running quickly. There are good examples of building the Firecracker binary in containers, which is great if you don’t have the Rust toolchain installed. That’s especially useful for controlling the build environment (it works on my laptop!) or for toolchains not everyone is familiar with.
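
If I remember right there’s also a wrapper script in the repo that runs the containerised build for you, so you never need Rust installed locally - something along these lines:

# From a checkout of the firecracker repo; devtool wraps the build in a container
./tools/devtool build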

I’m not going to walk through all the steps for starting and running Firecracker VMs. The getting started docs are comprehensive and I’m not going to do a better job.

If you want to see the configs and scripts I ended up with take a look at my sandbox repo. It’s rough, fairly grubby, but mostly functional!

We start by manually setting up the TAP network interface for the Firecracker VM. This could get fiddly if we want to run multiple VMs.

# Setup network interface
sudo ip link del "$TAP_DEV" 2> /dev/null || true
sudo ip tuntap add dev "$TAP_DEV" mode tap
sudo ip addr add "${TAP_IP}${MASK_SHORT}" dev "$TAP_DEV"
sudo ip link set dev "$TAP_DEV" up

I guess I could write some bash to parse CIDRs and manage network interfaces for me. Luckily Julia Evans has a fantastic series of posts on her experiments with Firecracker and I can borrow some of her network setup code. The knowledge that I could put a Firecracker VM on a network bridge is very helpful but I’ve still got to manually set an IP address for each interface.

ip link add firecracker0 type bridge
ip link set firecracker0 up
ip addr add 172.21.0.1/16 dev firecracker0
iptables -t nat -A POSTROUTING -s 172.21.0.0/16 ! -o firecracker0 -j MASQUERADE
iptables -I FORWARD 1 -i firecracker0 ! -o firecracker0 -j ACCEPT

ip tuntap add dev "$TAP_DEV" mode tap
brctl addif firecracker0 "$TAP_DEV"
ip link set dev "$TAP_DEV" up

## In the guest VM
echo "nameserver 8.8.8.8" > /etc/resolv.conf

With some further sleuthing I discover that Tim Gross has written a comprehensive post on his Firecracker journey and shares a lot of great insights. The most useful to us is that we can make use of the Container Network Interface and plugins to manage the network for us!

We can use cnitool to execute the plugins outside of the Kubernetes orchestration use-case. We’ve upgraded from sandpaper to a belt sander.

Belt Sanding with CNI

CNI creates interfaces from JSON configuration, so we need to write some to define our network interfaces. Another revelation from Tim’s post is the tc-redirect-tap CNI plugin published by awslabs on GitHub. It’s used in an AWS proof-of-concept demonstrating how to pack a host with Firecracker VMs, and it’s recommended in the firecracker-go-sdk project for configuring the network.

This is the networking configuration I use locally; /srv/vm/networks is used by the CNI to persist network data. The bridge interface is created if it doesn’t exist, and IPs are assigned starting from rangeStart, inheriting the rest of the settings from the ipam config.

{
    "name": "firecracker",
    "cniVersion": "1.0.0",
    "plugins": [
      {
        "type": "bridge",
        "name": "firecracker-bridge",
        "bridge": "fcbr0",
        "isGateway": true,
        "ipMasq": true,
        "ipam": {
          "type": "host-local",
          "resolvConf": "/etc/resolv.conf",
          "dataDir": "/srv/vm/networks",
          "subnet": "192.168.30.0/24",
          "rangeStart": "192.168.30.32",
          "gateway": "192.168.30.1"
        }
      },
      {
        "type": "firewall"
      },
      {
        "type": "tc-redirect-tap"
      }
    ]
}

Once we have the CNI plugins installed to /opt/cni/bin and the local CNI configuration in the relative path ./net.d/*.conflist, we can create a network interface. CNI operates on network namespaces, so we create a unique namespace for each VM’s interfaces.

# Generate an ID for the network namespace
id="$(uuidgen | tr A-Z a-z)"
# Create the network namespace
ip netns add "$id"
# Create the network interfaces in the namespace
CNI_ARGS="IgnoreUnknown=1;TC_REDIRECT_TAP_NAME=tap1" CNI_PATH="/opt/cni/bin" NETCONFPATH="$(pwd)/net.d" \
	cnitool add firecracker "/var/run/netns/${id}" | tee /srv/vm/networks/${id}.json

This sets up the iptables forwarding for the bridge interface if it doesn’t already exist, plus the network namespace and the interfaces inside it. cnitool then dumps JSON with the details of the network, which we can pipe somewhere and parse out the IPs and interface MACs for our VM configuration with trusty jq.

vm_ip=$(jq -r '.ips[0].address | rtrimstr("/24")' < "${netcfg}")
guest_mac=$(jq -r '.interfaces[] | select(.name == "eth0").mac' < "${netcfg}")
gateway=$(jq -r '.ips[0].gateway' < "${netcfg}")

In the VM boot_args we can set the network interface IP and gateway.

1"boot_args": "... ip=192.168.30.39::192.168.30.1:255.255.255.0::eth0:off"

We also set the MAC address in network-interfaces from our CNI output - the guest_mac we extracted above.

1    "network-interfaces": [
2        {
3            "iface_id": "eth0",
4            "guest_mac": "42:e2:53:5d:d6:c8",
5            "host_dev_name": "tap1"
6        }
7    ],
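
To avoid hand-editing the VM config on every boot, these values could be templated in - a rough sketch with sed, where vm-config.tmpl.json and the placeholder names are entirely made up:

# Hypothetical templating step: substitute the values we pulled out with jq
sed -e "s|__GUEST_MAC__|${guest_mac}|" \
    -e "s|__VM_IP__|${vm_ip}|" \
    -e "s|__GATEWAY__|${gateway}|" \
    vm-config.tmpl.json > vm-config.json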

Once we are done we can use cnitool to remove the network interface using the saved JSON output, then drop the network namespace.

1CNI_ARGS="IgnoreUnknown=1;TC_REDIRECT_TAP_UID=${uid};TC_REDIRECT_TAP_GID=${gid};TC_REDIRECT_TAP_NAME=tap1" CNI_PATH="/opt/cni/bin" NETCONFPATH="/etc/cni/net.d/" \
2	cnitool del firecracker "/var/run/netns/${1}"
3
4ip netns del <network_namespace_id>

What are we going to run?

The firecracker docs use a pre-built Ubuntu rootfs, kernel image and ssh keypair. This works but I’ve heard that Fly.io run Docker containers with Firecracker and I’d like a piece of the action.

We can create our own rootfs from a container image, and again Julia Evans is out here making it easy for us. We build an empty filesystem image, start a container, and copy the contents of the Docker container into the filesystem. I’ve no idea if this is at all sensible or what the integrity of the resulting image is, but it runs!

CONTAINER_ID=$(docker run -td ubuntu:22.04 /bin/bash)
MOUNTDIR=mnt
IMAGE=ubuntu.ext4
# Create the raw image first, then put an ext4 filesystem on it
qemu-img create -f raw $IMAGE 800M
mkfs.ext4 $IMAGE
mkdir -p $MOUNTDIR
sudo mount $IMAGE $MOUNTDIR
# Copy the container's filesystem into the mounted image
sudo docker cp $CONTAINER_ID:/ $MOUNTDIR
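
Once the copy finishes it’s worth tidying up - roughly:

# Unmount the image and remove the throwaway container
sudo umount $MOUNTDIR
docker rm -f $CONTAINER_ID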

So now we can do this each time we want to start a VM. Or we could copy the base OS image each time? Oh thank goodness, Julia Evans has solved this for us too. We can use dmsetup and losetup to create a copy-on-write filesystem.

So instead of making a copy, we overlay another image on top. Reads fall through to the bottom layer, but any writes only go to the top.

# Credit Julia Evans
# https://jvns.ca/blog/2021/01/27/day-47--using-device-mapper-to-manage-firecracker-images/
BASEIMAGE=/path/to/base/image.ext4
OVERLAY=/path/to/overlay.ext4

# Step 1: Create an empty image
qemu-img create -f raw $OVERLAY 1200M
OVERLAY_SZ=`blockdev --getsz $OVERLAY`

# Step 2: Create a loop device for the BASEIMAGE file (like /dev/loop16)
LOOP=$(losetup --find --show --read-only $BASEIMAGE)
SZ=`blockdev --getsz $BASEIMAGE`

# Step 3: Create /dev/mapper/mybase
printf "0 $SZ linear $LOOP 0\n$SZ $OVERLAY_SZ zero" | dmsetup create mybase

# Step 4: Create another loop device for the OVERLAY file
LOOP2=$(losetup /dev/loop23 --show $OVERLAY)

# Step 5: Create the final device mapper
echo "0 $OVERLAY_SZ snapshot /dev/mapper/mybase $LOOP2 P 8" | dmsetup create myoverlay

I can’t remember how to mount a devmapper device into the Firecracker VM. It’s probably easy but I’ve lost the configurations I used. Probably just pass it the path to the device mapper device.
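
If I had to guess, it’s the usual drives entry in the VM config with path_on_host pointing at the mapper device rather than an image file - an untested sketch:

    "drives": [
        {
            "drive_id": "rootfs",
            "path_on_host": "/dev/mapper/myoverlay",
            "is_root_device": true,
            "is_read_only": false
        }
    ],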

We can take this further and use the script in the firecracker-containerd repo that runs debootstrap to create a minimal Debian rootfs without Docker. Alternatively, there is a script in the firecracker repo that uses ctr to mount a container filesystem and copy out the contents. Fly.io have also written about using the devmapper backend for containerd to unpack container images into block devices for Firecracker and manage their lifecycle.
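
The debootstrap route is appealingly simple - roughly something like this, with the suite and package list purely illustrative:

# Bootstrap a minimal Debian system straight into the mounted image
sudo debootstrap --include=openssh-server,iproute2 bookworm $MOUNTDIR http://deb.debian.org/debian/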

Writing a custom init

The firecracker-containerd project is a plugin to have containerd manage Firecracker VMs. This could be a great way to quickly accomplish most of the things I’m exploring here - maybe another time.

There is a shell script called overlay-init in the base image building tool that creates an overlay filesystem when the VM starts, so that writes don’t touch our rootfs layer. The script is passed via the init= kernel argument in the Firecracker VM’s boot_args:

ro console=ttyS0 noapic reboot=k panic=1 pci=off nomodules systemd.journald.forward_to_console systemd.unit=firecracker.target init=/sbin/overlay-init

I didn’t know that the init process could be so easily intercepted! I really like this approach; it’s much more lightweight compared to our devmapper setup. I also learnt about pivot_root.
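
From memory the script boils down to something like this - a paraphrased sketch rather than the real overlay-init, assuming the rootfs already contains empty /overlay, /rom and /mnt directories:

#!/bin/sh
# Paraphrased sketch of an overlay-init style script (not the real one)
# Writable tmpfs for the upper layer
mount -t tmpfs tmpfs /overlay
mkdir -p /overlay/root /overlay/work
# Overlay the tmpfs on top of the read-only rootfs
mount -t overlay overlay \
    -o lowerdir=/,upperdir=/overlay/root,workdir=/overlay/work /mnt
# Swap the overlay in as / (the old root ends up under /rom) and hand off to the real init
cd /mnt
pivot_root . rom
exec /sbin/init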

Are you sure you want to compile the kernel?

Should I? I bet there are kernel compilation memes on TikTok already and I’m out here wondering if I’ll accidentally torch my workstation. I’d like to try running another kernel instead of the one I downloaded following the getting started docs.

Oh alright - the firecracker docs make it look easy. I’d better download TikTok.

KERNEL_VERSION=6.6.6
curl -L https://cdn.kernel.org/pub/linux/kernel/v6.x/linux-${KERNEL_VERSION}.tar.xz > linux-${KERNEL_VERSION}.tar.xz
mkdir -p linux-${KERNEL_VERSION}
tar --skip-old-files --strip-components=1 -xf linux-${KERNEL_VERSION}.tar.xz -C linux-${KERNEL_VERSION}
cd linux-${KERNEL_VERSION}
make defconfig
make -j "$(nproc)" # use all our available cores!

We point our firecracker VM configuration at our new kernel image and boot! Here it’s part of the JSON configuration file rather than set through the Firecracker API socket.

{
    "boot-source": {
      "kernel_image_path": "/build/path/images/linux-6.6.6/vmlinux",
      ...
    },
    ...

This is a bad example though - I haven’t actually booted a Firecracker VM successfully with this 6.6.6 kernel version. The supported versions look like 4.14, 5.10 and 6.1.

I’m still trying to get my head around the defconfig part of the compilation. From the kernel Makefile, I think this is where we choose what gets compiled: options and other configuration. defconfig sounds like the defaults. The Arch Linux kernel config has over 11k lines. Did I mention I run Arch Linux? There are some microvm kernel configs in firecracker-containerd that are probably better suited to microVMs - in my testing the defconfig results in slower start-up times.
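
If I revisit this, the plan would be to start from one of those microvm configs rather than defconfig - roughly like the following, assuming the config has been fetched and saved as microvm-kernel.config (the filename is mine):

# Use the microvm config as a starting point and fill in any unset options with defaults
cp microvm-kernel.config linux-${KERNEL_VERSION}/.config
cd linux-${KERNEL_VERSION}
make olddefconfig
make -j "$(nproc)" vmlinux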

Booting VMs in under a second

One of the primary advantages of Firecracker is the speed at which it can start VMs. It operates with a fraction of the functionality of QEMU, and this narrow focus allows it to be fast. That said, there are projects and configurations that have optimised QEMU boot times to be comparable with Firecracker, so it can be fast too.

When we start a Firecracker VM we can set the kernel and boot args. The firecracker docs give us this set of options:

KERNEL_BOOT_ARGS="console=ttyS0 reboot=k panic=1 pci=off"

VMs still start fast like this - mine took around 1.5-3 seconds. Not bad. Can we go faster? We can disable initialisation of input devices in the i8042 module - the parameters are described in the kernel docs.

ro console=ttyS0 noapic reboot=k panic=1 pci=off nomodules random.trust_cpu=on i8042.noaux i8042.nomux i8042.nopnp i8042.nokbd

Now we start in 250ms! Not bad at all, I think I’ll leave it at that.
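
One crude way to sanity-check a number like this is to time from launching the VM until the guest answers on the network - a sketch, where launch_vm.sh stands in for whatever start script you use and vm_ip is the address we pulled out earlier:

# Crude end-to-end timing: launch the VM and wait for the guest to answer a ping
start=$(date +%s%N)
./launch_vm.sh &
until ping -c 1 -W 1 "$vm_ip" > /dev/null 2>&1; do :; done
echo "guest reachable after $(( ($(date +%s%N) - start) / 1000000 ))ms"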

Wrapping up

This is already long, so I won’t drag it out any further. I’ve learnt a lot diving into Firecracker, picked up more Rust, isolation techniques, kernel compilation, and networking. I did start to look into the jailer process for Firecracker and the security benefits, but I didn’t get that far. This has been a lot of fun.