Cloudythings Blog

Observability-Driven GitOps Supply Chains

Combining Argo CD, Keptn, and evidence-driven observability to turn supply-chain checks into fast, reliable deploy gates.

May 02, 2023 at 10:07 AM EST · 12 min read
Tags: GitOps, Supply Chain Security, Observability, CI/CD, SRE
[Image: DevOps engineers reviewing supply-chain observability dashboards. Credit: Kaleidico / Unsplash]

Supply-chain checks are infamous for slowing teams down. When SBOM generation, vulnerability scans, and policy evaluations produce noise, developers bypass them. But when checks connect to observability and SLOs, they become trusted gates. Projects like Keptn, Harness SRM, and Dynatrace’s Cloud Automation have shown the way: deploy decisions should be evidence-driven.

Here is how we make GitOps supply chains both secure and fast.

Capture evidence for every change

We instrument pipelines to attach evidence metadata to artifacts:

  • SBOMs and vulnerability scan results (Trivy, Grype) stored as OCI artifacts alongside the image.
  • Test outcomes from unit, integration, chaos, and performance stages.
  • SLO snapshots capturing burn rate, latency, and error metrics before and after deployment.

Evidence lives in an artifact catalog (Harbor, Artifactory) tagged with Git SHA, service, and environment. Keptn’s evidence spec inspired our metadata schema.
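
To make the tagging concrete, here is a minimal sketch of what one evidence record could look like. The field names, the `catalog_key` layout, and the sample values are assumptions for illustration only; they are not the Keptn schema or the format of any particular artifact catalog.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass(frozen=True)
class EvidenceRecord:
    """One piece of evidence attached to a build artifact (hypothetical schema)."""
    git_sha: str
    service: str
    environment: str
    kind: str  # e.g. "sbom", "vuln-scan", "test-results", "slo-snapshot"
    payload: dict = field(default_factory=dict)

    def catalog_key(self) -> str:
        # Key layout is an assumption; a real catalog may use OCI references or tags.
        return f"{self.service}/{self.environment}/{self.git_sha}/{self.kind}"

    def to_json(self) -> str:
        # sort_keys keeps the serialized form stable across pipeline runs.
        return json.dumps(asdict(self), sort_keys=True)


# Sample values are made up for the example.
record = EvidenceRecord(
    git_sha="3f2a9c1",
    service="checkout",
    environment="staging",
    kind="vuln-scan",
    payload={"scanner": "trivy", "critical": 0, "high": 2},
)
print(record.catalog_key())
print(record.to_json())
```

Keying every record on service, environment, and Git SHA is what lets a later gate look up all evidence for exactly the artifact a manifest references.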

[Image: Engineer correlating deployment evidence with observability data. Photo by Austin Distel on Unsplash. Caption: Evidence must be easy to correlate.]

Wire evidence into GitOps decisions

Argo CD PreSync hooks, ordered with sync waves, call out to a decision service (Keptn or a custom Lambda) that:

  1. Fetches evidence for the artifact referenced by the manifest.
  2. Evaluates policy rules (OPA) covering CVSS thresholds, unresolved vulnerabilities, or missing attestations.
  3. Queries observability backends for live SLO state. If burn rate is high, the deploy pauses.
  4. Returns go/no-go verdicts with human-readable reasons.
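
The four steps above can be sketched as a single gate function. This is an illustration under stated assumptions, not the actual decision service: plain Python conditions stand in for the OPA policy, the evidence dictionary shape is hypothetical, the thresholds (`max_cvss`, `max_burn_rate`) are made-up defaults, and the finding ID is a placeholder. Burn rate is computed in the usual way, as the observed error rate divided by the error rate the SLO allows.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    go: bool
    reasons: list


def burn_rate(error_rate: float, slo_target: float) -> float:
    """Observed error rate divided by the error rate the SLO allows."""
    return error_rate / (1.0 - slo_target)


def evaluate_gate(evidence: dict, error_rate: float, slo_target: float,
                  max_cvss: float = 7.0, max_burn_rate: float = 2.0) -> Verdict:
    """Return a go/no-go verdict with human-readable reasons."""
    reasons = []

    # Missing attestations: each required evidence kind must be present.
    for required in ("sbom", "vuln-scan"):
        if required not in evidence:
            reasons.append(f"missing evidence: {required}")

    # CVSS threshold: block on unresolved findings at or above the limit.
    for finding in evidence.get("vuln-scan", {}).get("findings", []):
        if finding["cvss"] >= max_cvss and not finding.get("resolved", False):
            reasons.append(
                f"{finding['id']} has CVSS {finding['cvss']} (limit {max_cvss})"
            )

    # Live SLO state: pause the deploy while the error budget burns too fast.
    rate = burn_rate(error_rate, slo_target)
    if rate > max_burn_rate:
        reasons.append(f"burn rate {rate:.1f} exceeds {max_burn_rate}")

    return Verdict(go=not reasons, reasons=reasons or ["all checks passed"])


evidence = {
    "sbom": {"format": "spdx"},
    "vuln-scan": {"findings": [{"id": "CVE-EXAMPLE-1", "cvss": 9.1}]},
}
verdict = evaluate_gate(evidence, error_rate=0.004, slo_target=0.999)
print(verdict.go)
for reason in verdict.reasons:
    print("-", reason)
```

Collecting every failed rule before returning, rather than stopping at the first one, means a single verdict can list all the evidence an engineer needs to fix.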

Verdicts appear as PR comments and Slack updates. If a deploy pauses, engineers know exactly which evidence failed.

Automate post-deploy verification

Once Argo CD completes a sync, we trigger:

  • Keptn evaluation sequences comparing pre/post-deploy metrics (latency, error rate, resource usage).
  • Feature flag rollouts (LaunchDarkly) that gradually expose functionality while monitoring user-centric KPIs.
  • OpenTelemetry trace diffing to spot regressions in critical code paths.

Results feed back into the evidence catalog. Future deployments reference historical performance to highlight risk trends.
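
A pre/post-deploy comparison can be as simple as a relative-change check per metric. The sketch below is not how Keptn evaluates a sequence; the metric names, the 10% tolerance, and the rule that a higher value is always worse are assumptions chosen to keep the example short.

```python
def compare_metrics(before: dict, after: dict, tolerance: float = 0.10) -> dict:
    """Return metrics whose value grew by more than `tolerance` (relative change).

    Assumes every metric is "higher is worse" (latency, error rate, resource usage).
    """
    regressions = {}
    for name, old in before.items():
        new = after.get(name)
        if new is None or old <= 0:
            continue  # skip metrics that cannot be compared
        change = (new - old) / old
        if change > tolerance:
            regressions[name] = round(change, 3)
    return regressions


# Sample numbers are invented for the example.
before = {"p95_latency_ms": 200.0, "error_rate": 0.010, "cpu_cores": 1.5}
after = {"p95_latency_ms": 260.0, "error_rate": 0.010, "cpu_cores": 1.6}
print(compare_metrics(before, after))
```

Here only the latency change (30%) crosses the tolerance; the CPU increase of roughly 7% does not. Storing the returned dictionary with the release is one way such a result could feed the evidence catalog.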

Close the loop with SLOs

Every deploy attaches an SLO delta. We built a “deploy impact” dashboard showing:

  • Error budget consumed per release.
  • Time to detect anomalies.
  • Rollbacks triggered by evidence failures.
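
As a worked example of the first dashboard metric, the error budget a release consumes can be estimated from the failed requests attributed to it and the number of failures the SLO permits over its window. The function name, the attribution of failures to a release, and the sample figures are assumptions for illustration.

```python
def error_budget_consumed(failed_requests: int, slo_target: float,
                          window_requests: int) -> float:
    """Fraction of the SLO window's error budget used by `failed_requests`.

    allowed failures = (1 - slo_target) * requests expected in the SLO window
    """
    allowed_failures = (1.0 - slo_target) * window_requests
    return failed_requests / allowed_failures


# Hypothetical release: 1,500 failed requests attributed to it, against a
# 99.9% SLO over a window of 10 million requests (about 10,000 allowed failures).
consumed = error_budget_consumed(1_500, slo_target=0.999, window_requests=10_000_000)
print(f"{consumed:.1%} of the error budget")
```

With these numbers the release uses about 15% of the window's budget, which is the kind of per-release figure the dashboard can rank.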

This data informs product conversations. If a feature repeatedly erodes SLOs, product leads can see it on the dashboard, mirroring the telemetry-driven culture advocated by Honeycomb and Google.

Keep developers happy

We guard against friction by:

  • Caching scan results for unchanged dependencies.
  • Providing fast feedback (~5 minutes) for evidence checks using parallel pipelines.
  • Allowing documented, time-bound exceptions through Git PRs with explicit approvals.
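
Two of these guards lend themselves to a short sketch: a cache key for scan results and an expiry check for exceptions. Both functions are hypothetical. The key hashes the dependency lockfile together with a scanner database version, on the assumption that a cached result should be invalidated when the vulnerability data changes even if the dependencies did not; the exception check assumes each exception carries an expiry date and a list of approvers.

```python
import hashlib
from datetime import date


def scan_cache_key(lockfile_bytes: bytes, scanner_db_version: str) -> str:
    """Same dependencies and same vulnerability DB give the same key."""
    digest = hashlib.sha256()
    digest.update(lockfile_bytes)
    digest.update(b"\0")  # separator so the two inputs cannot run together
    digest.update(scanner_db_version.encode("utf-8"))
    return digest.hexdigest()


def exception_active(expires_on: date, approvers: list, today: date,
                     min_approvers: int = 1) -> bool:
    """An exception counts only while unexpired and explicitly approved."""
    return today <= expires_on and len(approvers) >= min_approvers


# Sample inputs are invented for the example.
lockfile = b"requests==2.31.0\nurllib3==2.0.7\n"
key_a = scan_cache_key(lockfile, "db-2023-05-01")
key_b = scan_cache_key(lockfile, "db-2023-05-02")
print(key_a == scan_cache_key(lockfile, "db-2023-05-01"))  # unchanged inputs reuse the cache
print(key_a == key_b)  # a new vulnerability DB forces a rescan

print(exception_active(date(2023, 6, 1), ["alice"], today=date(2023, 5, 2)))
```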

When developers understand the “why” and experience minimal delay, they embrace supply-chain checks rather than fighting them.

Evidence-driven GitOps blends security and reliability without sacrificing speed. It is the antidote to checkbox compliance: a system where data, not intuition, guides every deploy decision.