
deploy

Use when rolling out a new version of a service to staging or production. Runs tests, builds and pushes a container, performs a blue/green or canary rollout with health gates, and rolls back automatically on SLO breach.

Department

DevOps

Safety

Destructive

Supported stacks

helm+k8s, kubernetes

When to use

Do not use for local kubectl apply against a dev cluster or for one-off hotfixes that bypass CI. For those, use kubectl rollout directly and document why.

Inputs

Outputs

Tool dependencies

Project scripts you supply

The procedure below shells out to two project-side scripts. They’re not shipped by this skill — drop them in your repo at ./scripts/ (or adjust the paths). Section 7 below documents the exact query contract check-slo.sh must satisfy; smoke.sh is a thin wrapper around your service’s existing health/smoke endpoint.
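For orientation, here is a minimal sketch of what the body of smoke.sh could look like. It assumes the service answers HTTP 200 on a /healthz path and that a few retries are acceptable while pods warm up. The /healthz path, the smoke function name, and the SMOKE_RETRIES and SMOKE_DELAY variables are all illustrative assumptions, not something this skill defines.

```shell
# Hypothetical smoke.sh body: poll a health endpoint until it answers 200.
# Assumptions: the /healthz path and the SMOKE_* tuning variables.
smoke() {
  local base_url="$1"
  local retries="${SMOKE_RETRIES:-5}"
  local delay="${SMOKE_DELAY:-3}"
  local attempt=1
  local code

  while [ "$attempt" -le "$retries" ]; do
    # -s: silent, -o /dev/null: discard the body, -w: print only the status code
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "${base_url}/healthz") || code="000"
    if [ "$code" = "200" ]; then
      echo "smoke ok after ${attempt} attempt(s)"
      return 0
    fi
    echo "attempt ${attempt}/${retries}: got HTTP ${code}" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
  done
  return 1
}
```

A real smoke.sh would end by calling smoke "$1" so that its exit code gates the staging step. Most services will want more than a single health probe here, for example one read and one write against a known test account.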

Procedure

0. Detect the stack

Before pre-flight, confirm this skill is the right tool for the target:

kubectl config current-context                                    # must return a context
kubectl auth can-i create deployments -n "$NAMESPACE" 2>/dev/null  # must be yes
helm version --short 2>/dev/null                                  # Helm 3.x
ls argocd/ fluxcd/ kustomize/ 2>/dev/null | head                  # GitOps layer in play?
grep -l 'serverless\|sam\|fargate' serverless.yml template.yaml 2>/dev/null | head  # non-K8s?

This skill supports helm+k8s and kubernetes canary / blue-green rollouts. If detection shows:

1. Pre-flight

kubectl config current-context
kubectl -n "$NAMESPACE" get deploy "$SERVICE" -o jsonpath='{.spec.template.spec.containers[0].image}'
helm -n "$NAMESPACE" history "$SERVICE"
git rev-parse --short HEAD

Abort if:

Walk through the deployment-checklist.md reference and confirm every pre-deploy item.

2. Build and push the image

IMAGE="ghcr.io/${ORG}/${SERVICE}:sha-$(git rev-parse --short HEAD)"
docker buildx build \
  --platform linux/amd64 \
  --tag "$IMAGE" \
  --label org.opencontainers.image.revision="$(git rev-parse HEAD)" \
  --label org.opencontainers.image.source="https://github.com/${ORG}/${SERVICE}" \
  --provenance=true \
  --sbom=true \
  --push .

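# Resolve the pushed manifest digest (requires jq); cosign triangulate prints the signature location, not the image digest
DIGEST=$(docker buildx imagetools inspect "$IMAGE" --format '{{json .Manifest}}' | jq -r '.digest')
# Sign by digest so the signature is bound to immutable content rather than a mutable tag
cosign sign --yes "ghcr.io/${ORG}/${SERVICE}@${DIGEST}"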

Pin by digest in the Helm values (image.digest: sha256:...), not by tag.
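Because every later step trusts $DIGEST, it may be worth refusing to continue when the variable is empty or holds a tag by mistake. The guard below is a small sketch; valid_digest is a hypothetical helper name and the check only covers sha256 digests.

```shell
# Hypothetical guard: succeed only if the argument looks like a sha256 content digest.
valid_digest() {
  printf '%s\n' "$1" | grep -Eq '^sha256:[0-9a-f]{64}$'
}
```

One way to use it is right after the digest is resolved: valid_digest "$DIGEST" || { echo "refusing to deploy, bad digest: $DIGEST" >&2; exit 1; }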

3. Run tests

make test
make integration-test
trivy image --exit-code 1 --severity HIGH,CRITICAL "$IMAGE"

Do not proceed if any step fails. Surface failing test names to the user.

4. Staging smoke

helm -n staging upgrade --install "$SERVICE" "$CHART_PATH" \
  -f values-staging.yaml \
  --set image.repository="ghcr.io/${ORG}/${SERVICE}" \
  --set image.digest="$DIGEST" \
  --atomic --timeout 5m --wait

kubectl -n staging rollout status deploy/"$SERVICE" --timeout=5m
./scripts/smoke.sh "https://staging.${SERVICE}.example.com"

5. Request prod approval

gh workflow run deploy-prod.yml \
  -f service="$SERVICE" \
  -f image_digest="$DIGEST" \
  -f strategy="$STRATEGY"

The production GitHub Environment must require at least one approver (see pipeline-builder reference).

6. Rollout

Canary (Argo Rollouts):

kubectl -n "$NAMESPACE" argo rollouts set image "$SERVICE" \
  "$SERVICE=${IMAGE}@${DIGEST}"

for step in 5 25 50 100; do
  kubectl -n "$NAMESPACE" argo rollouts promote "$SERVICE"
  kubectl -n "$NAMESPACE" argo rollouts status "$SERVICE" --timeout 10m
  ./scripts/check-slo.sh "$SERVICE" "$step" || {
    kubectl -n "$NAMESPACE" argo rollouts abort "$SERVICE"
    kubectl -n "$NAMESPACE" argo rollouts undo "$SERVICE"
    exit 1
  }
done

Blue/green with Helm:

helm -n "$NAMESPACE" upgrade --install "${SERVICE}-green" "$CHART_PATH" \
  -f "$VALUES_FILE" \
  --set image.digest="$DIGEST" \
  --set color=green \
  --atomic --timeout 10m --wait

./scripts/check-slo.sh "${SERVICE}-green" 100 || { helm uninstall "${SERVICE}-green" -n "$NAMESPACE"; exit 1; }

kubectl -n "$NAMESPACE" patch svc "$SERVICE" \
  -p '{"spec":{"selector":{"color":"green"}}}'

sleep 300  # soak
./scripts/check-slo.sh "$SERVICE" 100 || {
  kubectl -n "$NAMESPACE" patch svc "$SERVICE" -p '{"spec":{"selector":{"color":"blue"}}}'
  exit 1
}

helm -n "$NAMESPACE" uninstall "${SERVICE}-blue"
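The commands above hardcode green as the incoming colour and blue as the outgoing one, which only holds for every other deploy. A possible way to avoid that, sketched below, is to read the live colour from the Service selector and derive the idle one. idle_color is a hypothetical helper, and the sketch assumes the selector carries exactly the color label used above.

```shell
# Hypothetical helper: print the colour that is NOT currently serving traffic.
idle_color() {
  case "$1" in
    blue)  echo green ;;
    green) echo blue ;;
    *)     echo "unknown live colour: '$1'" >&2; return 1 ;;
  esac
}

# Intended use (not executed here):
#   LIVE=$(kubectl -n "$NAMESPACE" get svc "$SERVICE" -o jsonpath='{.spec.selector.color}')
#   NEW=$(idle_color "$LIVE") || exit 1
#   then install "${SERVICE}-${NEW}" with --set color="$NEW" and flip the selector to "$NEW"
```

Failing on an unknown colour is deliberate: a Service without the label is a sign that the release was not deployed with this blue/green layout.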

Rolling (simple services only):

helm -n "$NAMESPACE" upgrade --install "$SERVICE" "$CHART_PATH" \
  -f "$VALUES_FILE" --set image.digest="$DIGEST" \
  --atomic --timeout 10m --wait
kubectl -n "$NAMESPACE" rollout status deploy/"$SERVICE"

7. SLO gate (scripts/check-slo.sh)

Query Prometheus for the configured window. Example query contract:

error_rate   = sum(rate(http_requests_total{service="$SERVICE",status=~"5.."}[5m]))
             / sum(rate(http_requests_total{service="$SERVICE"}[5m]))
p95_latency  = histogram_quantile(0.95,
                 sum by (le) (rate(http_request_duration_seconds_bucket{service="$SERVICE"}[5m])))

Fail the gate if error_rate > slo.error_rate_max or p95_latency > slo.p95_latency_ms / 1000.
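The comparison above can be sketched as two small shell functions. This is an illustration under stated assumptions rather than the contract itself: PROM_URL, prom_query, and slo_gate are hypothetical names, jq is assumed to be installed, and treating an empty or NaN result (no traffic) as a failed gate is a design choice you may want to reverse for low-traffic services.

```shell
# prom_query QUERY -> prints the first sample value of an instant query, or "NaN" if none.
# Assumes PROM_URL points at a Prometheus server and jq is available.
prom_query() {
  curl -sG "${PROM_URL}/api/v1/query" --data-urlencode "query=$1" \
    | jq -r '.data.result[0].value[1] // "NaN"'
}

# slo_gate ERROR_RATE P95_SECONDS ERROR_RATE_MAX P95_LATENCY_MS
# Returns 0 when both metrics are within budget, 1 otherwise.
slo_gate() {
  awk -v er="$1" -v p95="$2" -v er_max="$3" -v p95_ms="$4" 'BEGIN {
    # Assumption: no data means we cannot prove health, so fail the gate
    if (er == "" || p95 == "" || er == "NaN" || p95 == "NaN") exit 1
    if (er + 0 > er_max + 0) exit 1        # error budget exceeded
    if (p95 + 0 > p95_ms / 1000) exit 1    # latency in seconds vs. threshold in ms
    exit 0
  }'
}
```

With the thresholds from Example 1, slo_gate 0.002 0.250 0.005 300 passes, while slo_gate 0.010 0.250 0.005 300 fails on error rate. awk does the comparison because POSIX shell arithmetic is integer-only.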

8. Rollback

helm -n "$NAMESPACE" history "$SERVICE"
helm -n "$NAMESPACE" rollback "$SERVICE" <previous-revision> --wait --timeout 10m
kubectl -n "$NAMESPACE" rollout status deploy/"$SERVICE"

If the rollout used Argo:

kubectl -n "$NAMESPACE" argo rollouts undo "$SERVICE"

Post the rollback reason and metric values to #deploys and page the service owner.

9. Post-deploy

Examples

Example 1 — Canary rollout of checkout-api to prod

Inputs:

service: checkout-api
namespace: prod-us-east-1
strategy: canary
chart_path: ./charts/checkout-api
values_file: values-prod.yaml
slo: { error_rate_max: 0.005, p95_latency_ms: 300, window_minutes: 5 }
canary_steps: [5, 25, 50, 100]
approvers: ["@acme/sre"]

Expected flow: build+sign image, staging deploy + smoke, request approval, promote canary through 5 -> 25 -> 50 -> 100, run SLO check after each step, emit deploy note.
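The procedure reads its inputs from shell variables, so Example 1 could be expressed as below before running any step. The mapping from input names to variable names is inferred from the commands in the procedure, the ORG value is a placeholder guess based on the approvers entry, and check_inputs with its list of accepted strategy values (in particular rolling) is a hypothetical helper.

```shell
# Example 1 inputs as the variables the procedure's commands expand.
SERVICE="checkout-api"
NAMESPACE="prod-us-east-1"
STRATEGY="canary"
CHART_PATH="./charts/checkout-api"
VALUES_FILE="values-prod.yaml"
ORG="acme"   # assumption: registry organisation, not listed in the example inputs

# Hypothetical guard: stop early if a required variable is unset or empty,
# or if the strategy is not one this skill describes.
check_inputs() {
  : "${SERVICE:?SERVICE is required}"
  : "${NAMESPACE:?NAMESPACE is required}"
  : "${CHART_PATH:?CHART_PATH is required}"
  : "${VALUES_FILE:?VALUES_FILE is required}"
  : "${ORG:?ORG is required}"
  case "${STRATEGY:-}" in
    canary|blue-green|rolling) ;;
    *) echo "unsupported STRATEGY: '${STRATEGY:-}'" >&2; return 1 ;;
  esac
}
```

Note that ${VAR:?message} terminates a non-interactive shell when the variable is missing, which is the intended behaviour for a deploy script: nothing after the guard runs with a half-configured environment.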

Example 2 — Blue/green after schema migration

payments-api ships a new column. The migration has already run as a separate PR (expand phase). The deploy uses strategy: blue-green so that both versions can read and write while the cut happens atomically at the Service selector. If p95 latency on green exceeds 300 ms during the 5-minute soak, the selector is flipped back to blue and green is uninstalled.

Constraints

Quality checks
