Supply-chain checks are infamous for slowing teams down. SBOM generation, vulnerability scans, policy evaluations—if they produce noise, developers bypass them. But when checks connect to observability and SLOs, they become trusted gates. Projects like Keptn, Harness SRM, and Dynatrace’s Cloud Automation have shown the way: deploy decisions should be evidence-driven.
Here is how we make GitOps supply chains both secure and fast.
Capture evidence for every change
We instrument pipelines to attach evidence metadata to artifacts:
- SBOMs and vulnerability scan results (Trivy, Grype) stored as OCI artifacts alongside the image.
- Test outcomes from unit, integration, chaos, and performance stages.
- SLO snapshots capturing burn rate, latency, and error-rate metrics before and after deployment.
Evidence lives in an artifact catalog (Harbor, Artifactory) tagged with Git SHA, service, and environment. Keptn’s evidence spec inspired our metadata schema.
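To make the idea of an evidence record concrete, here is a minimal sketch of what such metadata could look like. The `EvidenceRecord` class, its field names, and the tag keys are hypothetical and chosen for illustration; they are not taken from Keptn, Harbor, or Artifactory.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class EvidenceRecord:
    """Hypothetical evidence metadata attached to one build artifact."""
    git_sha: str
    service: str
    environment: str
    sbom_ref: str          # reference to the SBOM stored next to the image
    scan_findings: dict    # severity -> number of findings
    test_outcomes: dict    # pipeline stage -> "pass" or "fail"
    slo_snapshot: dict     # metric name -> value at capture time

    def catalog_tags(self) -> dict:
        # The three tags the catalog is indexed by: Git SHA, service, environment.
        return {
            "git-sha": self.git_sha,
            "service": self.service,
            "environment": self.environment,
        }


def to_catalog_json(record: EvidenceRecord) -> str:
    # Stable key order makes stored records easy to diff between releases.
    return json.dumps(asdict(record), sort_keys=True)
```

A pipeline step could build one record per artifact and push the JSON to the catalog under the tags returned by `catalog_tags()`.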
Wire evidence into GitOps decisions
Argo CD PreSync hooks, ordered with sync waves, call out to a decision service (Keptn or a custom Lambda function) that:
- Fetches evidence for the artifact referenced by the manifest.
- Evaluates policy rules (OPA) covering CVSS thresholds, unresolved vulnerabilities, or missing attestations.
- Queries observability backends for live SLO state. If the error-budget burn rate exceeds the agreed threshold, the deploy pauses.
- Returns go/no-go verdicts with human-readable reasons.
Verdicts appear as PR comments and Slack updates. If a deploy pauses, engineers know exactly which evidence failed.
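The decision logic can be sketched roughly as follows. Plain Python stands in here for the OPA policy and the observability query; the `evaluate` function, the evidence dictionary shape, and the default thresholds (`max_cvss`, `max_burn_rate`) are assumptions for illustration, not values from our production setup.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    go: bool
    reasons: list  # human-readable explanations for a no-go


def evaluate(evidence: dict, burn_rate: float,
             max_cvss: float = 7.0, max_burn_rate: float = 2.0) -> Verdict:
    """Return a go/no-go verdict from artifact evidence and live SLO state."""
    reasons = []

    # Missing attestations block the deploy outright.
    if not evidence.get("attestations"):
        reasons.append("missing attestations")

    # Compare the worst unresolved vulnerability against the CVSS threshold.
    scores = [v["cvss"] for v in evidence.get("vulnerabilities", [])]
    worst = max(scores, default=0.0)
    if worst >= max_cvss:
        reasons.append(f"CVSS {worst} meets or exceeds threshold {max_cvss}")

    # Pause when the error budget is already burning too fast.
    if burn_rate > max_burn_rate:
        reasons.append(f"burn rate {burn_rate} exceeds threshold {max_burn_rate}")

    return Verdict(go=not reasons, reasons=reasons)
```

Returning every failed reason, rather than stopping at the first, is what lets the PR comment or Slack message list all the evidence an engineer needs to fix in one pass.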
Automate post-deploy verification
Once Argo CD completes a sync, we trigger:
- Keptn evaluation sequences comparing pre/post-deploy metrics (latency, error rate, resource usage).
- Feature flag rollouts (LaunchDarkly) that gradually expose functionality while monitoring user-centric KPIs.
- OpenTelemetry trace diffing to spot regressions in critical code paths.
Results feed back into the evidence catalog. Future deployments reference historical performance to highlight risk trends.
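As an illustration of the pre/post comparison, the sketch below flags metrics whose relative increase exceeds a per-metric tolerance. It is a simplified stand-in, not Keptn's evaluation logic; the function name, metric names, and tolerance values are hypothetical, and it assumes that higher values are worse for every metric passed in.

```python
def compare_metrics(pre: dict, post: dict, tolerances: dict) -> dict:
    """Return {metric: relative_change} for metrics that regressed.

    tolerances maps metric name -> maximum allowed relative increase,
    for example 0.10 for a 10% increase.
    """
    regressions = {}
    for name, allowed in tolerances.items():
        before, after = pre[name], post[name]
        if before == 0:
            # No baseline to divide by: any increase counts as unbounded.
            change = float("inf") if after > 0 else 0.0
        else:
            change = (after - before) / before
        if change > allowed:
            regressions[name] = change
    return regressions
```

An empty result means the deploy stayed within tolerance; a non-empty one can be written back to the evidence catalog alongside the release.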
Close the loop with SLOs
Every deploy attaches an SLO delta. We built a “deploy impact” dashboard showing:
- Error budget consumed per release.
- Time to detect anomalies.
- Rollbacks triggered by evidence failures.
This data informs product conversations. If a feature repeatedly erodes SLOs, product leads can see it on the dashboard, mirroring the telemetry-driven culture advocated by Honeycomb and Google.
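One way to compute "error budget consumed per release" is sketched below. It assumes a request-based availability SLO and that failed requests can be attributed to a release, for example by counting failures during its rollout window; the function and parameter names are hypothetical.

```python
def error_budget_consumed(failed_requests: int,
                          total_requests_in_window: int,
                          slo_target: float) -> float:
    """Fraction of the SLO window's error budget used by one release.

    slo_target is a ratio such as 0.999 for a 99.9% availability SLO.
    """
    # The budget is the number of failures the SLO tolerates over the window.
    allowed_failures = (1 - slo_target) * total_requests_in_window
    if allowed_failures <= 0:
        raise ValueError("SLO target leaves no error budget to consume")
    return failed_requests / allowed_failures
```

For example, with a 99.9% target and 10,000,000 requests in the SLO window, the budget is about 10,000 failed requests, so a release that caused 500 failures consumed roughly 5% of it.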
Keep developers happy
We guard against friction by:
- Caching scan results for unchanged dependencies.
- Providing fast feedback (~5 minutes) for evidence checks using parallel pipelines.
- Allowing documented, time-bound exceptions through Git PRs with explicit approvals.
When developers understand the “why” and experience minimal delay, they embrace supply-chain checks rather than fighting them.
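Two of these friction guards are easy to sketch. The helpers below are hypothetical: `scan_cache_key` derives a cache key from a dependency lockfile, and `exception_active` checks whether an approved exception is still in force. Including the scanner database version in the key is our assumption; without it, a cached result could hide vulnerabilities published after the scan ran.

```python
import hashlib
from datetime import datetime, timezone


def scan_cache_key(lockfile_bytes: bytes, scanner_db_version: str) -> str:
    """Key that changes when dependencies or the vulnerability DB change."""
    digest = hashlib.sha256()
    digest.update(scanner_db_version.encode("utf-8"))
    digest.update(b"\0")  # separator so the two inputs cannot blur together
    digest.update(lockfile_bytes)
    return digest.hexdigest()


def exception_active(exception: dict, now: datetime) -> bool:
    """An exception counts only if it was approved and has not expired."""
    approved = bool(exception.get("approved_by"))
    return approved and now < exception["expires_at"]


# Usage: an exception merged through a PR, valid until the end of June 2024.
waiver = {
    "approved_by": ["security-lead"],
    "expires_at": datetime(2024, 7, 1, tzinfo=timezone.utc),
}
```

Because the expiry is evaluated at decision time, an exception lapses on its own and has to be renewed through another reviewed PR.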
Evidence-driven GitOps blends security and reliability without sacrificing speed. It is the antidote to checkbox compliance—a system where data, not intuition, guides every deploy decision.