ssh-hypervisor: "SimCity for VMs"

Tackling a larger systems programming project with AI tools.

Sep 23, 2025

This weekend I tried to make a hypervisor hooked up to SSH. It’s like:

ssh <YOUR_NAME>@vmcity.ekzhang.com

But every time someone logs in with a different name, instead of being a user on the host machine, it greets you and then spins up a virtual machine with Firecracker.

$ ssh eric@vmcity.ekzhang.com

Hello, eric! 🌸

Today is Sunday. It's your first time here.

Recent logins:
┌─────────┬──────────────┐
│  User   │  Last login  │
├─────────┼──────────────┤
│ matthew │ 2 hours ago  │
│ kathy   │ 4 hours ago  │
│ linus   │ 16 hours ago │
│ sen     │ 4 days ago   │
└─────────┴──────────────┘

Booting up your fresh VM:
💡 ▮▮▮▯▯▯▯▯▯▯▯▯ 25%

If you haven’t logged in for a while, we store your VM in a snapshot.

This isn’t an original idea, by the way! I had seen this somewhere online, with a person showing off their tiny OS with Firecracker microVMs over public SSH. Unfortunately I don’t remember where I saw this, but I wanted to take this idea and make it a bit whimsical, while adding a couple toy features.

Update: A commenter shared the project https://github.com/nuta/kerla

Back in high school and college, I used to make a lot of smaller, fun projects over the weekend and share them with people. I don’t do this as much now with a job. These tiny projects became less interesting as I grew familiar with systems: they started to feel more like implementation work than new ideas.

I think that’s sad though. This project would maybe have taken me 1-2 weeks in the past, so I was hoping that with AI tools, I could do it in just a weekend (inspiration). Then I can spend time on more frivolous projects. I still get ideas all the time. This is one of them, so let’s just build it, see where it goes, and let my creative side take control!

What is a hypervisor?

I saw this quote recently that sums it up well:

Hypervisor is essentially a hardware-assisted catch block

This is all what I want you to learn from this book. Hardware-assisted hypervisors are event handlers. They are not like a CPU emulator.

In JavaScript, the life of a hypervisor looks like this:
A hypervisor runs the guest OS in a try block, catches events (VM exits), and goes back to the guest mode again.

I want to keep this in mind while working through the project. Firecracker is a very lightweight hypervisor, and they spin up “microVMs” — since hypervisors are catch-blocks, that essentially means the catch-block is small. Firecracker only emulates a few devices and relies on host features for as much as possible. This makes it really fast to boot compared to QEMU.
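To make the catch-block picture concrete, here is a toy sketch in Go. Nothing in it talks to KVM or Firecracker: the guest type, the exit reasons, and superviseLoop are all made up for illustration, and the "guest" is just a scripted list of exits that is assumed to end with a halt.

```go
package main

import "fmt"

// exitReason is a made-up stand-in for the reasons a guest can trap to the host.
type exitReason int

const (
	exitIO   exitReason = iota // guest touched an emulated device
	exitHalt                   // guest executed a halt instruction
)

// guest is a toy "virtual CPU": each call to run pretends to execute guest
// code until the next VM exit. The script must end with exitHalt.
type guest struct {
	pending []exitReason
}

func (g *guest) run() exitReason {
	next := g.pending[0]
	g.pending = g.pending[1:]
	return next
}

// superviseLoop plays the role of the catch block: enter the guest, handle
// whatever exit comes back, and re-enter until the guest halts. It returns
// the number of exits it handled.
func superviseLoop(g *guest) int {
	handled := 0
	for {
		reason := g.run()
		handled++
		switch reason {
		case exitIO:
			fmt.Println("emulating a device access, then resuming the guest")
		case exitHalt:
			fmt.Println("guest halted, shutting down")
			return handled
		}
	}
}

func main() {
	g := &guest{pending: []exitReason{exitIO, exitIO, exitHalt}}
	fmt.Println("exits handled:", superviseLoop(g))
}
```

The smaller the switch statement, the smaller the hypervisor, which is roughly the microVM pitch.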

However, this doesn’t mean that Firecracker is any simpler to set up than other hypervisors. You still need to hook up all the parts of a virtual computer in the right places to get things working! For instance:

Bring your own init system like OpenRC / Systemd.
Attach a kernel ramfs, disk at startup.
Want network? Set up a MAC address, TAP device, bridge, IP routing rules, firewall filters, packet forwarding, and so on.
- Want multiple VMs? Create a network bridge, set the controller of the TAP to that bridge, allocate private IPs from a pool, dynamically configure iptables.
Want serial logs? Edit your kernel boot arguments to send them at a baud rate over the /dev/console TTY.
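As one small piece of that list, here is a minimal sketch of allocating private IPs from a pool. The ipPool type and its methods are hypothetical, not taken from the project, and the sketch assumes an IPv4 subnet where the first address goes to the bridge and the last one is the broadcast address.

```go
package main

import (
	"errors"
	"fmt"
	"net/netip"
)

// ipPool hands out guest addresses from a private subnet. The first address
// after the network address is reserved for the bridge (the guests' gateway).
type ipPool struct {
	prefix  netip.Prefix
	gateway netip.Addr
	inUse   map[netip.Addr]bool
}

func newIPPool(cidr string) (*ipPool, error) {
	prefix, err := netip.ParsePrefix(cidr)
	if err != nil {
		return nil, err
	}
	prefix = prefix.Masked()        // normalize to the network address
	gateway := prefix.Addr().Next() // e.g. 192.168.100.1
	return &ipPool{prefix: prefix, gateway: gateway, inUse: map[netip.Addr]bool{}}, nil
}

// allocate returns the lowest free address after the gateway. The loop stops
// before the last address in the subnet, which is the IPv4 broadcast address.
func (p *ipPool) allocate() (netip.Addr, error) {
	for addr := p.gateway.Next(); p.prefix.Contains(addr) && p.prefix.Contains(addr.Next()); addr = addr.Next() {
		if !p.inUse[addr] {
			p.inUse[addr] = true
			return addr, nil
		}
	}
	return netip.Addr{}, errors.New("address pool exhausted")
}

// release returns an address to the pool, for example when a VM is snapshotted.
func (p *ipPool) release(addr netip.Addr) { delete(p.inUse, addr) }

func main() {
	pool, err := newIPPool("192.168.100.0/24")
	if err != nil {
		fmt.Println("bad CIDR:", err)
		return
	}
	for i := 0; i < 2; i++ {
		addr, err := pool.allocate()
		if err != nil {
			fmt.Println("allocate failed:", err)
			return
		}
		fmt.Println("guest IP:", addr)
	}
}
```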

It’s a good reminder that VMs are tiny little computers that live in your own. When you start up VMs, you’re building up your own computer from scratch!

Apple M4 (photo from iFixit’s M4 MacBook Air teardown). Look at that wild chip. Virtualization is taming the beast. (source)

What I found in this project is that, while AI tools made coding a lot faster (thousands of lines in minutes), they didn’t speed up the debugging process all that much. This, understandably, ended up being a big part of the work. 😅

Setting up the project

First step is to decide what we’re building. I’m going to use Go because:

It is a simple, relatively easy systems programming language that compiles fast.
- This makes AI very good at writing Go code.
It has good support for Firecracker via the official Go SDK.
It compiles to native, statically-linked executables (without cgo). I could deal with glibc / musl issues, but that’s just one more dimension added to an already tricky setup with kernels and virtualization.

So we initialize the project, and I give the AI some basic direction. To start up the VM, we’ll embed a static firecracker binary + vmlinux inside our own binary during the build process, which can be spooled to a tempfile and execv’d.
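A rough sketch of the spooling step, assuming the binary is already available as a byte slice. In the real build that slice would come from a //go:embed variable; spoolExecutable is a hypothetical helper name, and the shell script in main is only a stand-in payload so the example runs without a Firecracker binary.

```go
package main

import (
	"fmt"
	"os"
)

// spoolExecutable writes an in-memory binary to a temporary file, marks it
// executable, and returns its path. The caller is responsible for removing it.
func spoolExecutable(data []byte, pattern string) (string, error) {
	f, err := os.CreateTemp("", pattern)
	if err != nil {
		return "", err
	}
	if _, err := f.Write(data); err != nil {
		f.Close()
		os.Remove(f.Name())
		return "", err
	}
	if err := f.Chmod(0o755); err != nil {
		f.Close()
		os.Remove(f.Name())
		return "", err
	}
	// Close before returning: the file must not be open for writing when
	// something tries to execute it.
	if err := f.Close(); err != nil {
		os.Remove(f.Name())
		return "", err
	}
	return f.Name(), nil
}

func main() {
	payload := []byte("#!/bin/sh\necho hello from a spooled binary\n")
	path, err := spoolExecutable(payload, "firecracker-*")
	if err != nil {
		fmt.Println("spool failed:", err)
		return
	}
	defer os.Remove(path)
	fmt.Println("spooled to", path)
}
```

The explicit Close matters because Linux refuses to exec a file that is still open for writing.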

This will live in a Manager on the Go side that creates VMs and sets up networking. Then, we build a custom SSH server (not relying on OpenSSH) to forward connections to a VM. Here’s what the AI spit back out at me:

Architecture Overview

ssh-hypervisor is a Go-based SSH server that dynamically provisions
Firecracker microVMs. When users SSH in, they get a fresh or restored VM
 instance.

Core Components

1. Network Architecture
- Host SSH server (using Wish library) on configurable port (default
2222)
- TAP devices provide VM network connectivity (configurable CIDR,
default 192.168.100.0/24)
- SSH channel proxying: Host SSH channels → TAP network → Guest SSH
server
- Health check: Poll guest SSH port (22) to detect VM boot completion

2. VM Lifecycle
- Provisioning: Extract embedded Firecracker binary, configure VM
(memory/CPU), boot with TAP networking
- Snapshots: Idle VMs suspended to disk, restored on user reconnection
- Guest OS: Standard Linux with sshd, handles all SSH protocol
complexity (SFTP, port forwarding, etc.)

3. Code Structure
cmd/main.go              # CLI entry point, Linux+amd64/arm64 only
internal/
  config.go              # Configuration & validation
  server/server.go       # SSH server implementation (TODO)
scripts/download-firecracker.sh  # Downloads & embeds Firecracker binary
binaries/firecracker     # Embedded static binary

4. Key Features
- Static binary: Firecracker embedded via go:embed, no external
dependencies
- Build constraints: Prevents compilation on unsupported platforms
- Configuration: CLI flags for port, VM specs, network CIDR, data
directory
- Channel transparency: Host becomes invisible proxy between client SSH
and guest SSH

Next Steps

1. Integrate Wish SSH server
2. Implement VM management & Firecracker integration
3. Build TAP networking & health checks
4. Add snapshot/restore logic

I think some things in this summary are a bit overcomplicated, but overall the direction seems fine for now.

It sets up the scaffold, and I rearrange some files manually to get it working and to fix the //go:embed paths. Alright, this seems like it will work.

Getting Firecracker to run & SSH

The hardest part of this project will be getting the VM to run and be accessible by SSH. So before I add more complexity, we should figure this out first. I ask the AI to comment out the server startup code in the CLI entrypoint, and temporarily just have the entrypoint start up a machine. This works.

I need to make some manual changes to get logging with logrus working, as well as other things that the AI can’t figure out:

Adding syscall.SysProcAttr{ Setpgid: true } to the process and providing the firecracker.WithProcessRunner() option. The latter specifies the spooled Firecracker binary path, and the former is needed so that Ctrl+C on the controlling terminal (server) doesn’t also interrupt the Firecracker subprocess before the Go code can shut down gracefully.
Setting up the network of the machine using auto-generated MAC addresses, passing an ip=… option to the kernel boot to configure its eth0 network interface with the proper gateway and netmask. This was based on Julia Evans’s gist.
Creating a network bridge on startup and assigning the VM’s TAP device to that bridge. Also, setting up CI in GitHub Actions (with kvm support) and figuring out CAP_NET_ADMIN permissions to run the aforementioned setup.
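For the networking item, here is a small sketch of the two strings involved: a generated MAC address and the ip= kernel argument. The function names are hypothetical and this is not the project’s code. The field order follows the kernel’s documented ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf> layout, with the server and hostname fields left empty and autoconfiguration turned off.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"net"
)

// randomMAC returns a random MAC address that is locally administered
// (bit 1 of the first octet set) and unicast (bit 0 of the first octet clear).
func randomMAC() (net.HardwareAddr, error) {
	mac := make(net.HardwareAddr, 6)
	if _, err := rand.Read(mac); err != nil {
		return nil, err
	}
	mac[0] = (mac[0] | 0x02) &^ 0x01
	return mac, nil
}

// kernelIPArg builds a static ip= boot argument for one guest interface.
// The server-ip and hostname fields are intentionally empty.
func kernelIPArg(guestIP, gatewayIP, netmask, device string) string {
	return fmt.Sprintf("ip=%s::%s:%s::%s:off", guestIP, gatewayIP, netmask, device)
}

func main() {
	mac, err := randomMAC()
	if err != nil {
		fmt.Println("could not generate MAC:", err)
		return
	}
	fmt.Println("guest MAC:", mac)
	fmt.Println(kernelIPArg("192.168.100.2", "192.168.100.1", "255.255.255.0", "eth0"))
}
```

The addresses in main match the bridge (192.168.100.1) and guest (192.168.100.2) that show up in the transcripts further down.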

So now it’s working. I can run the binary with a provided rootfs (built from an Alpine image with customization) and it starts a full VM! I can see the serial logs too.

Unfortunately, I can’t SSH into the machine. Something is wrong. And there are no serial logs for the sshd daemon with OpenRC. 😭

$ ssh root@192.168.100.2
ssh: connect to host 192.168.100.2 port 22: Connection refused

So we begin the network debugging, yet again. Let’s see if packets are reaching the destination at least.

$ ping 192.168.100.2
PING 192.168.100.2 (192.168.100.2) 56(84) bytes of data.
64 bytes from 192.168.100.2: icmp_seq=1 ttl=64 time=0.401 ms
64 bytes from 192.168.100.2: icmp_seq=2 ttl=64 time=0.359 ms

$ tcptraceroute 192.168.100.2
Selected device sshvm-br0, address 192.168.100.1, port 52619 for outgoing packets
Tracing the path to 192.168.100.2 on TCP port 80 (http), 30 hops max
 1  192.168.100.2 [closed]  0.430 ms  0.243 ms  0.287 ms

Ok, … so that works. Looks like ICMP and TCP packets are both reaching the VM’s destination IP address on the bridge, but port 22 is still not accepting connections.

At this point, the issue can be broken up into two possibilities:

sshd is not starting in the guest VM. We have no logs, so it could just be not listening at all, and then the host isn’t reaching port 22 of course.
There is some kind of networking problem. I find this less likely because “Connection refused” (ECONNREFUSED) usually means that a server actively rejected a connection attempt due to a port not being open; network issues usually show up as timeout / no reachable route.
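The distinction between those two cases can also be checked from Go rather than by reading ssh’s error text. This is a sketch under my own naming (classify and probe are hypothetical helpers), using only the standard library: a refused connection surfaces as ECONNREFUSED, while a dropped or unroutable one tends to surface as a timeout or an unreachable error.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"syscall"
	"time"
)

// classify maps the error from a TCP dial onto the cases discussed above.
func classify(err error) string {
	var netErr net.Error
	switch {
	case err == nil:
		return "open"
	case errors.Is(err, syscall.ECONNREFUSED):
		return "refused" // host answered, but nothing is listening on the port
	case errors.Is(err, syscall.EHOSTUNREACH), errors.Is(err, syscall.ENETUNREACH):
		return "unreachable" // routing problem
	case errors.As(err, &netErr) && netErr.Timeout():
		return "timeout" // packets are probably being dropped somewhere
	default:
		return "other"
	}
}

// probe dials addr once and reports which case it falls into.
func probe(addr string, timeout time.Duration) string {
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err == nil {
		conn.Close()
	}
	return classify(err)
}

func main() {
	// The guest address from the transcript above.
	fmt.Println("192.168.100.2:22 ->", probe("192.168.100.2:22", 2*time.Second))
}
```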

Still, both are possible, so I think we should try and figure out first which category it’s in. So I will try and see if sshd is indeed starting up.

At this point, I copy-pasted everything back into Claude Code and had it try to figure things out. It flailed around for a while; luckily, it’s able to undo its own work.

Okay, let’s go back to the basics. How do we get sshd to run on init? I think the immediate issue for observability is that I can’t SSH into the Firecracker machine and figure out why it’s not running. This turns out to be a whole rabbit hole:

The next step is to debug why “agetty” is not running on startup, but I can’t figure this out either.

So I add agetty to my VM’s inittab instead of as an OpenRC service.

cat > /etc/inittab <<'EOF'
ttyS0::respawn:/sbin/agetty -L 115200 ttyS0 linux
EOF

But even after that, I can’t log in on the serial console as root! It doesn’t accept the password that I set with chpasswd. I look this up, and maybe the issue is using busybox login, so I switch to util-linux-login. No dice though.
I’m not really a sysadmin. The only thing that worked so far is opening up a “rescue shell” by setting init=/bin/sh in the kernel boot arguments, so I’ll just try that next.
This does work! I can’t run sshd still, but running nc -l -p 42 in the guest shell and nc 192.168.100.2 42 outside establishes network communication between the host and guest, so that’s great. The problem is no longer in the network, but in sshd itself. 🥳

I’m kind of tired of working in a VM. So I see if I can get sshd working in a simple Alpine container instead, which this VM is based on.

$ docker run -it --rm -p 1022:22 alpine sh

/ # apk add --no-cache util-linux util-linux-login openssh
...
OK: 21 MiB in 63 packages

/ # apk add --no-cache openrc
...
OK: 23 MiB in 71 packages

/ # echo "root:root" | chpasswd
chpasswd: password for 'root' changed

/ # sed -i 's/^#PermitRootLogin.*/PermitRootLogin yes/' /etc/ssh/sshd_config

/ # ssh-keygen -A
ssh-keygen: generating new host keys: RSA ECDSA ED25519 

/ # /usr/sbin/sshd -D -e
Server listening on 0.0.0.0 port 22.
Server listening on :: port 22.

And well, yes it works. I can ssh into the container from port 1022. Well okay, so something is very different about the VM, and it is causing sshd to hang.

I’m very confused, but the very next thing I try fixes the issue. The issue is entropy. If I run cat /proc/sys/kernel/random/entropy_avail, it only has a few bits of random entropy available! So the operating system blocks on reading random initial state (needed for cryptography in sshd). This is because Firecracker does not provide a virtio-rng device by default.
- Originally validated this by installing rngd and running it in the background manually, which causes sshd to work.
- But it’s probably better to produce actual entropy, so I’ll add virtio-rng now.
- Never mind, the Firecracker Go SDK doesn’t support adding an entropy device during machine creation. Maybe there’s a raw way to do it?
- Eh… this is not worth spending more time on, will just use rngd.
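The same entropy check can be done programmatically, for example from a guest-side health check. This is a minimal sketch: parseEntropy is a hypothetical helper, and the lowWatermark threshold is an arbitrary value I picked for illustration, not something the kernel or the project defines.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

const (
	entropyAvailPath = "/proc/sys/kernel/random/entropy_avail"
	lowWatermark     = 128 // arbitrary threshold, chosen only for this example
)

// parseEntropy parses the contents of entropy_avail, which is a single
// decimal number of bits followed by a newline.
func parseEntropy(contents string) (int, error) {
	return strconv.Atoi(strings.TrimSpace(contents))
}

func main() {
	raw, err := os.ReadFile(entropyAvailPath)
	if err != nil {
		fmt.Println("could not read entropy_avail (not on Linux?):", err)
		return
	}
	bits, err := parseEntropy(string(raw))
	if err != nil {
		fmt.Println("unexpected contents:", err)
		return
	}
	fmt.Printf("entropy_avail = %d bits\n", bits)
	if bits < lowWatermark {
		fmt.Println("entropy looks low; anything waiting on the kernel RNG may stall")
	}
}
```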

It works now! Yes! Turns out that running VMs isn’t just like Docker, it’s being your own sysadmin but even more difficult than usual. :’)

During this whole debugging session (5+ hours), I asked ChatGPT a lot of stuff. Gave up on Claude Code since it kept making changes. The AI very confidently guided me toward directions that didn’t work, and it gave me a lot of false hope. But it did eventually find the issue, which was the lack of random entropy causing silent blocking, which I wouldn’t have found without Google search or strace. I think it probably saved time overall?

Then, I spent another hour trying to get this working with OpenRC. It does not work. I’m just going to call it quits and use bash as my init process, oh well.

And then! It’s working now, but SSH still takes 6 seconds to start up.

virtio-rng and building my own vmlinux

Remember the entropy device from earlier? I still have this rngd hack in my init script that initializes fake entropy:

rngd -f -r /dev/urandom &

Lately it’s become clear that this is a bad idea, and it adds exactly 5 seconds to VM startup for starting a “jitter” generator, which makes the time between boot and getting a shell ~4x slower. I work out how to add an entropy device by manually hitting the Firecracker HTTP endpoint, but it’s still not appearing as /dev/hwrng on the guest.
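For reference, hitting the Firecracker API by hand from Go might look roughly like this. It is a sketch with assumptions: the API is served over a Unix domain socket, and I am assuming the PUT /entropy request with an empty JSON body as described in Firecracker’s entropy device documentation. The socket path /tmp/firecracker.sock and both helper names are hypothetical.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"net/http"
	"strings"
	"time"
)

// unixClient returns an HTTP client whose connections all go to the given
// Unix domain socket, ignoring the host in the request URL.
func unixClient(socketPath string) *http.Client {
	return &http.Client{
		Timeout: 5 * time.Second,
		Transport: &http.Transport{
			DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
				var d net.Dialer
				return d.DialContext(ctx, "unix", socketPath)
			},
		},
	}
}

// newEntropyRequest builds the request that asks for an entropy device.
// Assumption: an empty JSON object means "no rate limiter".
func newEntropyRequest() (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPut, "http://localhost/entropy", strings.NewReader("{}"))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newEntropyRequest()
	if err != nil {
		fmt.Println("could not build request:", err)
		return
	}
	resp, err := unixClient("/tmp/firecracker.sock").Do(req) // hypothetical socket path
	if err != nil {
		fmt.Println("request failed (is Firecracker running?):", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```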

I think this is because the guest kernel that I’m using is from the quickstart_guide public bucket in S3, and it’s a very old Linux 4.14 image without many devices. Or maybe not. In any case, if I have a newer Linux version then random.trust_cpu (introduced in Linux 4.19) will be respected, and I shouldn’t have a problem either way since it can rely on hardware RNG instructions.

So I try to build an image based on Linux 6.1, and I run into—problems!

[   13.203150] clk: Disabling unused clocks
[   13.207664] /dev/root: Can't open blockdev
[   13.209652] VFS: Cannot open root device "vda" or unknown-block(0,0): error -6
[   13.212952] Please append a correct "root=" boot option; here are the available partitions:
[   13.216454] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
[   13.219839] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.1.153 #2
[   13.220411] Call Trace:
[   13.220411]  <TASK>
[   13.220411]  show_stack+0x3a/0x40
[   13.220411]  dump_stack_lvl+0x3d/0x51
[   13.220411]  dump_stack+0x10/0x16
[   13.220411]  panic+0x100/0x297
[   13.220411]  mount_block_root+0x13e/0x1d9
[   13.220411]  mount_root+0x117/0x138
[   13.220411]  prepare_namespace+0x135/0x16a
[   13.220411]  kernel_init_freeable+0x166/0x188
[   13.220411]  ? rest_init+0xc0/0xc0
[   13.220411]  kernel_init+0x15/0x120
[   13.220411]  ret_from_fork+0x1f/0x30
[   13.220411]  </TASK>
[   13.220411] Kernel Offset: disabled
[   13.220411] Rebooting in 1 seconds..
2025-09-23T00:36:38.075124657 [anonymous-instance:main] Vmm is stopping.
2025-09-23T00:36:38.075549933 [anonymous-instance:main] Vmm is stopping.
2025-09-23T00:36:38.090112389 [anonymous-instance:main] Firecracker exiting successfully. exit_code=0

This is a giant pain. The “Cannot open root device” is a completely useless error message that could mean any number of things, whether APIC issues or uninitialized modules, or even Firecracker bugs. The AI is equally confused.

I spend about an hour stuck on this, trying different Linux and Firecracker versions and flipping kernel configs on/off.

At this point, it’s Monday. So I go to work. And while I’m on the train there, I do some Googling and find this bit from kernel-policy.md:

We use these configurations to build microVM-specific kernels vended by Amazon Linux … As a result, kernel configurations found in this repo should be used to build exclusively the aforementioned Amazon Linux kernels. We do not guarantee that using these configurations to build upstream kernels, will work or produce usable kernel images.

😭

Okay, so that’s it. I need to use Amazon Linux, and then it will work, right? Of course the people at Amazon would use their own Linux fork. So it’s back to the AI.

Let’s run this build again, using OrbStack for their seamless VMs on macOS. Now that we’re building from Amazon Linux, it should work with Firecracker, right?

And it fails again — but I then removed the pci=off acpi=off options, and this combined with Amazon Linux allows it to finally boot. Hooray.

Even better, I’m no longer on an ancient Linux version. Timidly, I decide to try returning to OpenRC despite my issues from earlier. And yes: OpenRC works, sshd is running, and even agetty is finally no longer stalling. Everything is blissful. Yay! It all makes sense again, definitely worth debugging.

Now that things boot, everything just got a lot easier. I also build the vmlinux kernel for ARM64, just for fun, again inside an OrbStack VM. :)

Hooking Firecracker up to an SSH server

We have VMs working! It’s time to hook it up to SSH and build our app.

I’m relying heavily on the AI to figure out the implementation on this, and it’s going swimmingly. It worked out the SSH protocol with no issues at all, and it’s especially good at making cute interactive terminal output, like animated progress bars.

With things like session management and architecture, it’s also good to work in broad strokes as we make changes.

(At some point my VM ran out of RAM and started thrashing.)

But for the most part, this was pretty easy to code since nothing was too tricky to debug on the application side. Just kept iterating, trying it out and fixing things that didn’t quite look right.

On the systems side, I worked out some iptables rules. These entries are only added when -allow-internet is passed in, so that the VMs get Internet access.

iptables -A FORWARD -i sshvm-br0 ! -o sshvm-br0 -j ACCEPT -m comment --comment "ssh-hypervisor"
iptables -A FORWARD ! -i sshvm-br0 -o sshvm-br0 -j ACCEPT -m comment --comment "ssh-hypervisor"
iptables -t nat -A POSTROUTING -s <VM_CIDR> ! -o sshvm-br0 -j MASQUERADE -m comment --comment "ssh-hypervisor"
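On the Go side, rules like these are easiest to handle as argument lists, so they can be passed to the iptables command without shell quoting. A minimal sketch, assuming hypothetical helper names (internetRules, deleteRule) and not the project’s actual code:

```go
package main

import (
	"fmt"
	"strings"
)

const ruleComment = "ssh-hypervisor"

// internetRules returns the argument lists for the three rules above,
// parameterized by bridge name and VM subnet.
func internetRules(bridge, vmCIDR string) [][]string {
	tag := []string{"-m", "comment", "--comment", ruleComment}
	rules := [][]string{
		// Let traffic from the bridge out to other interfaces.
		{"-A", "FORWARD", "-i", bridge, "!", "-o", bridge, "-j", "ACCEPT"},
		// Let traffic from other interfaces in to the bridge.
		{"-A", "FORWARD", "!", "-i", bridge, "-o", bridge, "-j", "ACCEPT"},
		// Rewrite the source address of guest traffic leaving the host.
		{"-t", "nat", "-A", "POSTROUTING", "-s", vmCIDR, "!", "-o", bridge, "-j", "MASQUERADE"},
	}
	for i := range rules {
		rules[i] = append(rules[i], tag...)
	}
	return rules
}

// deleteRule turns an append rule into the matching delete (-A becomes -D),
// which is handy for cleaning up on shutdown.
func deleteRule(rule []string) []string {
	out := make([]string, len(rule))
	copy(out, rule)
	for i, arg := range out {
		if arg == "-A" {
			out[i] = "-D"
			break
		}
	}
	return out
}

func main() {
	for _, rule := range internetRules("sshvm-br0", "192.168.100.0/24") {
		fmt.Println("iptables", strings.Join(rule, " "))
	}
}
```

Tagging every rule with the same comment makes it possible to find and remove exactly the entries this program added.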

The end result

It works! And it is very cute :)

I am still hosting this at vmcity.ekzhang.com for now, but I will stop at some point, earlier if I notice any crypto miners or other unscrupulous folk.

ssh <YOUR_NAME>@vmcity.ekzhang.com  # try it now!

This was lots of fun, and we ended up with a static binary that runs VMs hooked up to on-demand SSH.

You can get the code here: https://github.com/ekzhang/ssh-hypervisor

eric makes software
