Running accelerators securely is a balancing act. GPUs and FPGAs crave bare-metal access, but multi-tenant platforms demand isolation. Enterprises like Bloomberg, iRobot, and Fly.io have shared how Firecracker microVMs thread the needle, but the playbook for combining accelerators with microVMs is still emerging.
This article captures the architecture we deploy for clients who need accelerator density, strong isolation, and developer-friendly workflows.
Hardware-aware scheduling
We start with host profiles:
- GPU hosts with SR-IOV support, MIG partitioning (for NVIDIA A100s), and NICs configured for low latency.
- FPGA hosts with toolchains baked into immutable AMIs.
Cluster schedulers (Kubernetes with KubeVirt/Kata, Nomad, or custom controllers) expose microVM runtime classes. Nodes advertise accelerator capabilities through node labels (e.g., microvm.accelerator=nvidia-a100).
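As a minimal sketch, the wiring might look like the following, assuming a Kata handler named kata-fc and our microvm.accelerator label convention (both names are ours, not upstream defaults):

```yaml
# RuntimeClass that routes pods to the Kata handler installed on microVM-capable hosts
# and keeps them on nodes advertising the right accelerator.
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata-fc                      # our naming convention
handler: kata-fc                     # must match the handler registered by the Kata deployment
scheduling:
  nodeSelector:
    microvm.accelerator: nvidia-a100
---
# The corresponding label as applied to a GPU host by the provisioning pipeline.
apiVersion: v1
kind: Node
metadata:
  name: gpu-host-01                  # placeholder node name
  labels:
    microvm.accelerator: nvidia-a100
```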
MicroVM runtimes
We blend:
- Kata Containers with the Firecracker VMM for OCI-compatible workloads, switching to Kata with QEMU or Cloud Hypervisor where VFIO device passthrough is needed (Firecracker does not support PCI passthrough).
- gVisor for workloads that do not require dedicated accelerators but still need isolation.
- Weave Ignite for GitOps-managed microVM provisioning outside Kubernetes.
Each microVM boots from distroless images that include only required drivers and tooling. Rootfs images are sealed and signed with Cosign.
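A trimmed example of what a workload on this stack might look like; the image path and digest are placeholders, and the runtime class name follows the convention above:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: inference-worker
spec:
  runtimeClassName: kata-fc          # boots the container inside a Kata microVM
  containers:
    - name: worker
      # Digest-pinned reference to a distroless, Cosign-signed image; admission policy
      # verifies the signature against this digest before the pod is admitted.
      image: registry.example.com/ml/inference@sha256:0000000000000000000000000000000000000000000000000000000000000000
      resources:
        limits:
          cpu: "4"
          memory: 8Gi
```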
Secure device passthrough
Isolation hinges on device control:
- VFIO/IOMMU ensures microVMs access only their assigned GPU or FPGA.
- NVIDIA MIG profiles offer fractional GPUs while maintaining memory isolation.
- cgroup limits constrain CPU, memory, and device access, while seccomp profiles restrict system calls; Kata's agent policies codify the operations a workload may perform.
Auditable logs capture device assignments and teardown events for compliance.
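For the passthrough path itself, a trimmed KubeVirt VirtualMachineInstance gives a sense of the shape; the deviceName must match an entry in KubeVirt's permittedHostDevices configuration, and the values below are placeholders:

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  name: gpu-worker
spec:
  domain:
    devices:
      disks:
        - name: rootfs
          disk:
            bus: virtio
      gpus:
        - name: gpu0
          # Extended resource advertised for the VFIO-bound GPU on the host;
          # placeholder name, derived from the device's vendor/product IDs.
          deviceName: nvidia.com/GA100_A100_PCIE_40GB
    resources:
      requests:
        memory: 16Gi
  volumes:
    - name: rootfs
      containerDisk:
        image: registry.example.com/microvm/rootfs:sealed   # placeholder for the signed rootfs image
```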
GitOps pipeline for accelerators
Developers define microVM workloads as code:
- Terraform provisions accelerator pools and registers them with the scheduler.
- Helm charts or KubeVirt manifests declare workloads with resource requests (e.g., resources.limits.nvidia.com/gpu-mig-1g.10gb: 1); a sketch follows this list.
- CI/CD pipelines run integration tests with Testcontainers' GPU support or custom harnesses before promotion.
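A minimal sketch of that request pattern; the extended-resource name mirrors the example above, though the exact name depends on how the NVIDIA device plugin is configured to expose MIG profiles:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: training-job
spec:
  replicas: 1
  selector:
    matchLabels:
      app: training-job
  template:
    metadata:
      labels:
        app: training-job
    spec:
      runtimeClassName: kata-fc                          # microVM runtime class from earlier
      containers:
        - name: trainer
          image: registry.example.com/ml/trainer:1.4.2   # placeholder; digest-pinned and signed in practice
          resources:
            limits:
              nvidia.com/gpu-mig-1g.10gb: 1              # one MIG slice, as named by the cluster's device plugin
```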
Argo CD reconciles manifests; policy controllers validate that workloads request approved accelerator shapes and signed images.
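One way to encode the image check, assuming Kyverno as the policy controller; the registry pattern and signing key are placeholders, and a companion validate rule can restrict accelerator shapes in the same way:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-signed-accelerator-images
spec:
  validationFailureAction: Enforce
  rules:
    - name: verify-cosign-signature
      match:
        any:
          - resources:
              kinds:
                - Pod
      verifyImages:
        - imageReferences:
            - "registry.example.com/ml/*"      # placeholder registry pattern
          attestors:
            - entries:
                - keys:
                    publicKeys: |-
                      -----BEGIN PUBLIC KEY-----
                      <platform Cosign public key>
                      -----END PUBLIC KEY-----
```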
Observability and capacity planning
Accelerators are expensive. We monitor:
- GPU/FPGA utilization via DCGM or vendor APIs, aggregated by tenant.
- MicroVM lifecycle metrics (startup time, snapshot restore, failure rate).
- Thermal and power telemetry to avoid hardware throttling.
Dashboards expose cost per minute per accelerator, driving FinOps conversations. Incident runbooks include microVM-specific probes to diagnose kernel-level faults.
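A hedged sketch of the alerting side, assuming dcgm-exporter metric names and the Prometheus Operator's PrometheusRule CRD; the thresholds are illustrative:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: accelerator-health
spec:
  groups:
    - name: accelerators
      rules:
        - alert: GpuRunningHot
          expr: DCGM_FI_DEV_GPU_TEMP > 83          # near thermal-throttle territory for many data-center GPUs
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "GPU {{ $labels.gpu }} on {{ $labels.Hostname }} is near its thermal limit"
        - alert: GpuIdleButAllocated
          expr: avg_over_time(DCGM_FI_DEV_GPU_UTIL[30m]) < 5
          for: 30m
          labels:
            severity: info
          annotations:
            summary: "Allocated GPU has been under 5% utilization for 30 minutes; candidate for reclamation"
```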
Developer experience
We provide:
- Self-service templates in Backstage to request accelerator-backed environments.
- Local emulation using containerized GPU simulators or QEMU, enabling early testing without hardware.
- Feature flag integration so teams can roll out accelerated features gradually and revert quickly.
Feedback loops with ML engineers ensure the platform evolves with their needs.
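For the self-service piece above, a rough sketch of a Backstage scaffolder template that opens a GitOps pull request rather than touching hardware directly; the parameter names, accelerator shapes, and repository are placeholders:

```yaml
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: accelerator-microvm-environment
  title: Accelerator-backed microVM environment
spec:
  owner: platform-team
  type: environment
  parameters:
    - title: Accelerator shape
      required: [acceleratorShape]
      properties:
        acceleratorShape:
          type: string
          enum: [nvidia-a100-mig-1g.10gb, nvidia-a100-full, fpga-dev]
  steps:
    - id: render
      name: Render microVM manifests
      action: fetch:template
      input:
        url: ./skeleton                                 # placeholder path to the manifest skeleton
        values:
          acceleratorShape: ${{ parameters.acceleratorShape }}
    - id: open-pr
      name: Open GitOps pull request
      action: publish:github:pull-request
      input:
        repoUrl: github.com?owner=example&repo=gitops   # placeholder repository
        branchName: request-${{ parameters.acceleratorShape }}
        title: Request accelerator environment
```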
MicroVM-orchestrated accelerators deliver the density, security, and developer delight modern platforms demand. When combined with GitOps, observability, and policy, they transform hardware from a bottleneck into a competitive advantage.