Cloudythings Blog

Distroless Machine Learning Pipelines Without the Headaches

Applying distroless principles, attestation, and reproducibility to ML model serving and training so security and data science stop fighting.

March 07, 2023 at 09:49 AM EST · 12 min read
Distroless · Machine Learning · Supply Chain Security · Kubernetes · DevOps
Engineers tuning machine learning pipelines with charts projected on a wall
Image: Franki Chamaki / Unsplash

Machine learning pipelines love dependencies. CUDA drivers, Python wheels, system packages—everything feels mission critical, and everything bloats container images. After SolarWinds and Log4Shell, security teams started demanding distroless builds and provenance for ML too. Data scientists balked: “How do we debug TensorFlow without bash?” The answer lies in embracing distroless principles and building supportive tooling.

We helped a fintech client refactor its ML serving stack using distroless images, Sigstore signatures, and reproducible builds. The journey unearthed best practices that apply to any ML org aiming for supply-chain hardening without crushing velocity.

Start with environment contracts

We defined two personas:

  • Training jobs (batch, GPU-heavy) running in Kubeflow or Argo Workflows.
  • Serving endpoints (REST/gRPC) behind Istio or AWS App Mesh.

Each persona received a contract covering base images, runtime libraries, observability, and debugging workflows. Contracts set expectations before code changes land.
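
A contract works best as a small, version-controlled file that CI validates before any build runs. The schema below is invented for illustration — the field names and registry paths are not from a real spec:

```yaml
# Hypothetical contract schema; fields and paths are illustrative.
persona: serving
base_image: cgr.dev/chainguard/wolfi-base      # assumption: Wolfi-based distroless
runtime:
  python: "3.11"
  allowed_system_libs: [ca-certificates, tzdata, libstdc++]
observability:
  traces: opentelemetry
  metrics: prometheus
debugging:
  companion_image: registry.example.com/ml/debug-tools   # dev namespaces only
```

Because the contract lives in Git, a request for a new runtime library becomes a reviewable diff rather than an ad-hoc image change.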

Build reproducible images

Our pipeline uses:

  • Bazel with rules_docker to build multi-stage images deterministically.
  • Chainguard’s Wolfi base layers for glibc-compatible distroless images, providing CA certs and timezone data.
  • Python dependency locking via Poetry exports and uv (a fast Python package installer), ensuring deterministic wheels.
  • GPU builds using NVIDIA’s container toolkit in a builder stage, copying only the required CUDA libraries to the final image (the driver itself stays on the host).

We version everything with Git tags. Rebuilds produce identical digests, satisfying auditors and enabling quick rollback.
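
The client’s builds run through Bazel, but the multi-stage pattern is easier to see as a plain Dockerfile. This is a sketch, not their build file — the image tags and file names are illustrative:

```dockerfile
# Builder stage: full toolchain for resolving and compiling wheels.
FROM cgr.dev/chainguard/python:latest-dev AS builder
WORKDIR /app
COPY requirements.lock .
# Hash-pinned lock file keeps the install deterministic across rebuilds.
RUN pip install --require-hashes --no-deps -r requirements.lock --target /app/deps

# Final stage: distroless runtime with CA certs and tzdata, no shell, no pip.
FROM cgr.dev/chainguard/python:latest
WORKDIR /app
COPY --from=builder /app/deps /app/deps
COPY serve.py .
ENV PYTHONPATH=/app/deps
ENTRYPOINT ["python", "serve.py"]
```

Only the second stage ships; the compiler toolchain, package manager, and shell never reach production.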

ML engineers discussing distroless image architecture at a whiteboard
Photo by You X Ventures on Unsplash. Start with architecture, not Dockerfiles.

Ship debugging companions

Distroless does not mean debugging dies. We provide:

  • Ephemeral debug images with shells and tooling, accessible only in development namespaces. kubectl debug attaches the companion image with temporary privileges.
  • Live debugging notebooks (Jupyter, VS Code) running in sandboxed clusters with access to the same feature stores. They use distroless runtime images but layer user tooling through sidecar containers.
  • Extensive observability: OpenTelemetry traces, Prometheus metrics, and structured logs highlight tensor shapes, batch sizes, and latency contributions.
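
The ephemeral-companion workflow amounts to a single command in a development namespace. The pod, container, and image names here are placeholders:

```shell
# Attach a tooling image as an ephemeral container that shares the
# serving process's namespaces; nothing is added to the production image.
kubectl debug -it serving-pod-abc123 \
  --image=registry.example.com/ml/debug-tools:latest \
  --target=model-server -n ml-dev
```

When the session ends, the ephemeral container disappears and the distroless pod is untouched.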

Security gets minimal production images; data scientists retain ergonomic tooling in controlled environments.

Sign and attest everything

Each image ships with:

  • Cosign signatures tied to workload identities (GitHub Actions OIDC).
  • SLSA provenance describing source repo, commit, dependencies, and build parameters.
  • SBOMs (CycloneDX) listing Python wheels, CUDA libraries, and OS components.
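
With GitHub Actions OIDC, keyless signing is a few lines of workflow. This is a trimmed sketch — the IMAGE and DIGEST variables and the SBOM file name are placeholders your pipeline would populate:

```yaml
permissions:
  id-token: write   # lets the job mint an OIDC token for Sigstore keyless signing
  packages: write
steps:
  - uses: sigstore/cosign-installer@v3
  - name: Sign image and attach SBOM attestation
    run: |
      cosign sign --yes "${IMAGE}@${DIGEST}"
      cosign attest --yes --type cyclonedx \
        --predicate sbom.cdx.json "${IMAGE}@${DIGEST}"
```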

Kubernetes admission controllers (Kyverno with Sigstore integration) enforce signature verification. Serving clusters reject unsigned or tampered images.
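
A Kyverno rule enforcing keyless signatures looks roughly like this — the registry path and GitHub org are placeholders:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: verify-ml-images
spec:
  validationFailureAction: Enforce
  rules:
    - name: require-cosign-signature
      match:
        any:
          - resources:
              kinds:
                - Pod
      verifyImages:
        - imageReferences:
            - "registry.example.com/ml/*"
          attestors:
            - entries:
                - keyless:
                    subject: "https://github.com/example-org/*"
                    issuer: "https://token.actions.githubusercontent.com"
```

Pods referencing an unsigned image, or one signed by an identity outside the allowed org, are rejected at admission time.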

Automate reproducible pipelines

We integrate distroless builds into MLops tooling:

  • Kubeflow Pipelines run training steps in distroless images; metadata tracks digests and signatures.
  • MLflow / Weights & Biases models store image digests alongside model versions.
  • Feature stores (Feast) reference signed transformation containers, ensuring data lineage and code provenance align.

Promotion to production happens via GitOps: Argo CD updates manifests referencing the signed image digest. No kubectl set image hacks allowed.
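
Digest pinning keeps Git as the single source of truth. With Kustomize, the promotion PR touches one field; names are placeholders and the digest is truncated:

```yaml
# kustomization.yaml in the production overlay
images:
  - name: registry.example.com/ml/serving
    # CI writes the digest here after signing, and Argo CD syncs it,
    # so production always runs an exact, verified build.
    digest: sha256:4f5c…   # truncated placeholder
```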

Observe performance and cost

Distroless images cut image size by 50–70%, accelerating cold starts and reducing registry egress costs. We monitor:

  • Container startup time per endpoint to ensure no regression.
  • GPU utilization using DCGM exporters—lean images leave more VRAM for models.
  • Memory and CPU footprints to catch missing libraries early.
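
The GPU bullet maps to a single PromQL expression against the DCGM exporter’s metrics; the namespace label is a placeholder that depends on your scrape config:

```promql
# Mean GPU utilization per pod over the last 15 minutes
avg by (pod) (avg_over_time(DCGM_FI_DEV_GPU_UTIL{namespace="ml-serving"}[15m]))
```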

When developers need a missing tool (e.g., tcpdump), they request it via PR. Security reviews the change, we update the contract, and automation rebuilds the image.

Coach the culture

We host workshops based on Chainguard and Google security blogs, explaining why distroless matters. We pair data scientists with platform engineers to refactor Dockerfiles. We run incident reviews focusing on supply-chain gaps and celebrate teams that shave hundreds of MBs off their images.

The result: security trusts the pipeline, SREs enjoy deterministic rollouts, and data scientists keep shipping. Distroless ML is not a constraint; it is a catalyst for better engineering discipline.