Progressive delivery used to feel like a luxury feature for unicorn startups. Then the DORA 2021 report and a wave of battle stories on Medium—most notably from companies like Delivery Hero, Kameleoon, and LaunchDarkly—made it clear: without guardrails that meter change, SRE teams do not stand a chance of defending their error budgets. Rolling back a bad deploy is table stakes, but catching failing code before customers feel it is now the minimum viable reliability program.
This article breaks down how we build progressive delivery capabilities for Cloudythings clients who live under strict SLOs. It is a fusion of Argo Rollouts canaries, Flagger’s traffic shaping, feature-flag orchestration, and the observability patterns that folks like Honeycomb and Slack keep writing about. The result is a playbook that a platform team can roll out in weeks, not quarters, while giving SREs the control levers they crave.
Start from the error budget math
Before shipping a single controller, we align on service level objectives. Every progressive delivery step must report directly against budget consumption. Google’s SRE workbook and Nobl9’s SLO Academy both emphasize that alerting on raw metrics—latency, error rate—lacks context. We go further by:
- Declaring rolling 30-day SLOs per critical user journey with multi-window burn-rate alerts (a practice recently detailed by Grafana Labs on their engineering blog).
- Capturing historical deployment impact. By tagging all releases in Honeycomb and New Relic, we quantify how much of the error budget each deployment consumed.
- Agreeing on burn-rate thresholds that freeze deployment lanes. For example, if the two-hour burn rate exceeds 4×, progressive rollouts automatically pause.
The math becomes the north star. SREs are no longer begging teams to slow down—they are simply enforcing a contract that everyone understands.
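To make the freeze rule concrete, here is a minimal sketch of the multi-window burn-rate alert behind that 4× two-hour threshold, written as a Prometheus alerting rule. It assumes a 99.9% availability SLO (a 0.1% error budget) and an http_requests_total metric with service and code labels; those names are illustrative, not taken from any client setup:

groups:
  - name: checkout-api-slo-burn
    rules:
      - alert: CheckoutApiBudgetBurnTooFast
        # Burn rate = observed error ratio / error budget (0.001 for a 99.9% SLO).
        # Fire only when both the long (2h) and short (10m) windows exceed 4x,
        # so a brief blip does not freeze the deployment lane.
        expr: |
          (
            sum(rate(http_requests_total{service="checkout-api",code=~"5.."}[2h]))
            /
            sum(rate(http_requests_total{service="checkout-api"}[2h]))
          ) > (4 * 0.001)
          and
          (
            sum(rate(http_requests_total{service="checkout-api",code=~"5.."}[10m]))
            /
            sum(rate(http_requests_total{service="checkout-api"}[10m]))
          ) > (4 * 0.001)
        for: 2m
        labels:
          severity: page
          action: pause-progressive-rollouts
        annotations:
          summary: "checkout-api is burning its 30-day error budget at more than 4x"

An alert like this is what the rollout automation subscribes to when deciding whether to pause a lane.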
Compose your control plane
Progressive delivery is seldom a single tool; it is plumbing. Our reference architecture blends:
- Argo Rollouts for Kubernetes services. It orchestrates blue/green, canary, and analysis phases. We lean on its analysis templates to bind Prometheus, Kayenta, or other metric-provider checks (typically the same queries that back our Grafana panels) directly into the rollout.
- Flagger or Linkerd SMI for fine-grained traffic weighting, especially when service meshes (Istio, Linkerd) are at play. Flagger’s load-testing hooks let us run k6 or Vegeta scripts mid-rollout.
- LaunchDarkly / OpenFeature for edge-level gating. Feature flags let product teams meter functionality independent of infrastructure rollouts.
- Harness or Spinnaker when non-Kubernetes systems (data pipelines, serverless) need progressive control.
We package these into a GitOps-managed platform. Each team defines a rollout spec like:
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: checkout-api
spec:
  # replicas, selector, and the pod template are omitted here for brevity
  strategy:
    canary:
      canaryService: checkout-api-canary
      stableService: checkout-api
      steps:
        - setWeight: 20
        - pause: { duration: 10m }
        - analysis:
            templates:
              - templateName: fourxx-rate
              - templateName: p95-latency
        - setWeight: 50
        - pause: { duration: 20m }
        - setWeight: 100
Each analysis template calls out to a Prometheus query (often the same one that backs a Grafana panel) or a Kayenta canary config. The templates include burn-rate guards so the rollout aborts if the error budget drains faster than tolerated.
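For example, a sketch of the fourxx-rate template using Argo Rollouts’ Prometheus provider might look like the following; the Prometheus address, metric names, and the 5% threshold are assumptions for illustration:

apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: fourxx-rate
spec:
  metrics:
    - name: fourxx-rate
      interval: 1m
      count: 5
      # Two failed measurements abort the rollout and trigger an automatic rollback.
      failureLimit: 2
      # Pass while fewer than 5% of canary requests return a 4xx status.
      successCondition: result[0] < 0.05
      provider:
        prometheus:
          address: http://prometheus.monitoring.svc:9090
          query: |
            sum(rate(http_requests_total{service="checkout-api-canary",code=~"4.."}[2m]))
            /
            sum(rate(http_requests_total{service="checkout-api-canary"}[2m]))

The burn-rate guard can be expressed as a second metric in the same template, pointed at the budget-burn query instead of the raw 4xx ratio.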
Instrument for decisions, not dashboards
Observability is what makes progressive delivery trustworthy. Drawing from Honeycomb’s “observability-driven development” writing and Slack’s release platform posts, we bake telemetry into the pipeline:
- Deployment markers: Every stage emits OpenTelemetry spans with attributes for rollout step, traffic weight, and Git SHA. Spans correlate to traces across the stack.
- Golden signals within the canary scope: We execute queries that isolate traffic hitting the canary subset—Grafana’s exemplar feature makes this trivial when paired with tracing.
- User experience metrics: We integrate Real User Monitoring (RUM) tools such as SpeedCurve or Akamai. Progressive delivery loses meaning if the UI team cannot see visitors’ perspectives.
- Synthetic probes: Flagger’s webhooks run Playwright or Cypress journeys against the canary. Results feed back into Argo’s analysis stage.
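As one way to wire that synthetic-probe hook, the sketch below attaches a Flagger rollout webhook that calls a Playwright runner on each analysis iteration; the runner service URL and metadata keys are placeholders, and a failing hook pauses the canary rather than promoting it:

apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: checkout-api
spec:
  # targetRef, service, and metric checks omitted for brevity
  analysis:
    interval: 1m
    threshold: 5
    webhooks:
      - name: playwright-checkout-journey
        # "rollout" webhooks run on every analysis iteration, before metric checks
        type: rollout
        url: http://synthetic-runner.test.svc/run
        timeout: 2m
        metadata:
          suite: checkout-critical-path
          target: https://canary.checkout.example.com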
The acid test is whether an incident commander can open a single dashboard (usually in Grafana, Honeycomb, or Datadog) labeled “Rollout Checkout API – Step 2” and know exactly which metrics guard the go/no-go decision.
Integrate LaunchDarkly (or OpenFeature) from day zero
Feature flags are not an afterthought. Delivery leaders like Heidi Waterhouse and John Kodumal continue to stress that flags and progressive delivery are two halves of the same reliability story. We integrate LaunchDarkly/OpenFeature by:
- Linking flag toggles to SLO budgets. For high-impact features, we require “error budget aware” rules—if burn rate exceeds 2×, LaunchDarkly triggers an automation hook that disables the flag or routes to a safer cohort.
- Treating flags as infrastructure-as-code. Using Terraform or the LaunchDarkly CLI, we version flag definitions, ensuring review and rollback parity with application code.
- Creating flag health dashboards. We map flag adoption to on-call load. When a flag is toggled frequently, we treat it as a risk indicator and refine the rollout plan.
Flags also make it easy to run “dark launches.” We ship code behind a disabled flag, complete the infrastructure rollout, and then progressively ramp user traffic through LaunchDarkly. This two-phase pattern dramatically lowers the chance of cascading failure.
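As a sketch of what flags-as-code can look like for that two-phase pattern, the snippet below is a version-controlled flag definition. The schema is purely illustrative (in practice we express it through the LaunchDarkly Terraform provider or an OpenFeature-compatible backend such as flagd), but it captures the dark-launch shape of shipping disabled and ramping later:

# flags/checkout.yaml (illustrative schema, reviewed and rolled back like application code)
flags:
  new-payment-flow:
    description: Dark launch of the rewritten payment flow
    state: enabled            # the flag is evaluable as soon as the code ships
    defaultVariant: "off"     # every user stays on the old path during the infra rollout
    variants:
      "on": true
      "off": false
    ramp:
      - cohort: internal-users
        weight: 100           # dogfood first
      - cohort: everyone
        weight: 0             # increased progressively once canary analysis stays green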
Automate policy and experimentation
Progressive delivery without guardrails is just delayed failure. We encode policy by:
- Using Open Policy Agent to require canary steps for high-risk services. OPA evaluates rollout manifests in CI and blocks merges if guardrails (analysis templates, pause durations) are missing.
- Enforcing error budget freezes through GitOps. If a service breaches budget thresholds, Argo CD injects a temporary freeze annotation, preventing further deploys until SREs clear the condition.
- Documenting experiment hypotheses. For each rollout, we expect a short hypothesis (e.g., “New gRPC marshalling reduces P95 latency by >10%”). Argo’s analysis reports include verdicts on these hypotheses, creating a feedback loop akin to what Etsy and Shopify have written about for years.
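One way to implement the GitOps-enforced freeze described above is an Argo CD sync window on the project that owns the service, rather than a bespoke annotation; this is a sketch of that mechanism, with the project and application names assumed:

apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: checkout
  namespace: argocd
spec:
  syncWindows:
    # A deny window scheduled every minute is effectively "frozen until removed";
    # SREs delete it (via Git) once the error budget recovers.
    - kind: deny
      schedule: "* * * * *"
      duration: 24h
      applications:
        - checkout-api
      manualSync: false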
Run game days and retrospectives
Once the platform is live, we run progressive delivery game days. Inspired by the chaos engineering practices Netflix popularized, we simulate:
- Metric spikes to ensure analysis templates trigger aborts.
- Unreachable metric backends to validate that rollouts fail safe (pause instead of promoting bad code).
- Flag misconfigurations by flipping LaunchDarkly rules mid-rollout.
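These scenarios can be rehearsed with whatever fault-injection tooling is already in place; as one hypothetical example of the unreachable-metric-backend drill, the sketch below uses Chaos Mesh to partition the rollout controller’s namespace from Prometheus for ten minutes (namespaces and labels are assumptions):

apiVersion: chaos-mesh.org/v1alpha1
kind: NetworkChaos
metadata:
  name: gameday-block-metrics-backend
  namespace: argo-rollouts
spec:
  action: partition
  mode: all
  selector:
    namespaces:
      - argo-rollouts
  direction: to
  target:
    mode: all
    selector:
      namespaces:
        - monitoring
      labelSelectors:
        app.kubernetes.io/name: prometheus
  duration: "10m"

A healthy platform responds by pausing the in-flight rollout and alerting, not by promoting the canary on missing data.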
Every incident or aborted rollout gets a post-incident review. We capture metrics such as “time to detection,” “percent of rollouts aborted by automation vs. humans,” and “number of customer-impacting incidents during progressive phases.” Over time, these metrics demonstrate whether progressive delivery is saving error budgets or just adding ceremony.
Start small, then scale responsibly
The journey usually follows three phases:
- Pilot with one service and one team. Measure change lead time, incidents avoided, and developer sentiment.
- Standardize by templatizing rollouts, policies, and dashboards. Publish a playbook so new services can onboard in <1 sprint.
- Evolve by layering advanced techniques—multi-armed bandit traffic routing (as Airbnb has blogged about), workload-specific SLOs, and machine-learning-driven anomaly detection.
Progressive delivery is not about slowing teams down; it is about maximizing safe throughput. That idea echoes across Medium, InfoQ, and CNCF blogs where reliability leads show their scars. When SREs anchor the conversation in error budgets and pair the right tools with crisp policy, progressive delivery becomes a force multiplier. Instead of arguing over freeze windows, teams celebrate fast deploys that prove they protect the customer experience.