release-to-prod orchestrator

Use when a merged PR needs to be shipped to production end-to-end. Chains CI verification, canary deploy, SLO-driven auto-rollback, and changelog generation into a single auditable release run.

Department

DevOps

Safety

Destructive

Supported stacks

Stack-agnostic — no detection required.

Produces

devops/reports/release-<version>.md

Consumes

  • devops/status/ci-green.json
  • devops/status/deploy-result.json

When to use

Do not use for dev/staging-only deploys (call deploy directly), for emergency kubectl rollout undo in the middle of an active incident (call incident-response directly), or when the commit has not yet merged to the release branch.

Chained skills

Executed strictly in this order:

  1. pipeline-builder — read-only check that the CI pipeline for the target SHA finished green. No mutation; consumes GitHub Actions / GitLab CI / Azure DevOps status and emits devops/status/ci-green.json.
  2. deploy — performs the canary-aware rollout to the target environment (5% -> 25% -> 100%) with SLO gates between each step. Emits devops/status/deploy-result.json.
  3. incident-response — invoked ONLY if the SLO gate inside deploy trips. Runs auto-rollback, opens an incident, and drafts the postmortem skeleton with the offending PR/commit pre-populated.
  4. changelog — enumerates the PRs merged since the previous release tag, groups them by type (feat/fix/chore/docs), and renders user-facing release notes.

Inputs

Outputs

Tool dependencies

Procedure

1. Validate inputs

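test -n "$VERSION" && test -n "$SHA" && test -n "$ENVIRONMENT" \
  || { echo "VERSION, SHA and ENVIRONMENT are all required"; exit 1; }
git cat-file -e "$SHA" || { echo "SHA not found"; exit 1; }
# -F matches the tag literally, so the dots in a version are not regex wildcards
git tag --list | grep -Fqx "$VERSION" && { echo "Tag already exists"; exit 1; }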

Abort if any assertion fails. Print the unresolved input and stop.
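
The check above does not say which input was empty. A minimal sketch of a helper that names every unresolved input, assuming the three variables used in the check; require_inputs is a hypothetical name, not something this skill ships:

```shell
#!/usr/bin/env bash
# Hypothetical helper: report every required input that is unset or empty.
# Returns 0 when all named variables are non-empty, 1 otherwise.
require_inputs() {
  local name missing=0
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable named by $name.
    if [ -z "${!name:-}" ]; then
      echo "unresolved input: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

It could be called as `require_inputs VERSION SHA ENVIRONMENT || exit 1` ahead of the git checks.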

2. CI gate (chain: pipeline-builder)

Read-only status check for the SHA. No pipeline YAML is generated here — only verification.

gh api "repos/${ORG}/${REPO}/commits/${SHA}/check-runs" \
  --jq '{sha: "'"${SHA}"'", runs: [.check_runs[] | {name, status, conclusion}]}' \
  > devops/status/ci-green.json

# Require at least one check run: all() is true for an empty list, which would
# let a SHA with no CI results pass as green.
jq -e '.runs | length > 0 and all(.conclusion == "success")' devops/status/ci-green.json \
  || { echo "CI not green for $SHA"; exit 2; }

If any check failed or is still in progress, halt the orchestrator. Write the failing check name into the release report under a Blocked header and exit with code 2.
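
One way to produce the Blocked entry is to pull the non-passing check names out of devops/status/ci-green.json. A sketch, assuming the file has the shape written by the command above (a runs array of name, status, conclusion); the function name blocked_section and the exact heading text are illustrative only:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a "Blocked" section listing every check run whose
# conclusion is not "success". A run that is still in progress has a null
# conclusion, which is rendered here as "pending".
blocked_section() {
  local status_file="$1"
  echo "## Blocked"
  jq -r '.runs[]
         | select(.conclusion != "success")
         | "- \(.name): status=\(.status), conclusion=\(.conclusion // "pending")"' \
    "$status_file"
}
```

For example, `blocked_section devops/status/ci-green.json >> "devops/reports/release-${VERSION}.md"` before exiting with code 2.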

3. Tag the release

git tag -a "$VERSION" "$SHA" -m "Release $VERSION"
git push origin "$VERSION"

Skipped if the chart deployment uses digest pinning and the caller does not want a git tag yet (dry-run mode).
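
The skip condition can be made explicit with a guard. A sketch, assuming dry-run mode is signalled by a caller-supplied DRY_RUN variable; both that variable and the tag_release function are hypothetical, since the skill does not define how dry-run is requested:

```shell
#!/usr/bin/env bash
# Hypothetical guard around step 3: create and push the tag only when
# DRY_RUN is not set to 1.
tag_release() {
  local version="$1" sha="$2"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "dry-run: skipping git tag $version for $sha"
    return 0
  fi
  # Push only if the annotated tag was created successfully.
  git tag -a "$version" "$sha" -m "Release $version" &&
    git push origin "$version"
}
```

Called as `DRY_RUN=1 tag_release "$VERSION" "$SHA"`, it prints the skip message and leaves the repository untouched.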

4. Canary deploy (chain: deploy)

Invoke deploy with strategy: canary and the caller-supplied SLO and steps. Each step runs the SLO query configured inside deploy; a breach triggers abort + undo locally, and the orchestrator catches the non-zero exit.

./scripts/check-slo.sh is a project-side script (not shipped by this skill); see deploy section 7 for the query contract it must satisfy.

SLO_BREACHED=0
for step in "${CANARY_STEPS[@]}"; do
  START=$(date -u +%FT%TZ)
  # The namespace flag follows the plugin subcommand: kubectl rejects flags
  # placed before a plugin name ("argo rollouts" is a plugin).
  kubectl argo rollouts set image "$SERVICE" \
    "$SERVICE=ghcr.io/${ORG}/${SERVICE}@${DIGEST}" -n "$ENVIRONMENT"
  kubectl argo rollouts promote "$SERVICE" -n "$ENVIRONMENT"
  kubectl argo rollouts status "$SERVICE" -n "$ENVIRONMENT" --timeout 10m

  # Gate on the SLO before moving to the next step; record the breach and stop.
  if ! ./scripts/check-slo.sh "$SERVICE" "$step"; then
    jq -n --arg s "$step" --arg t "$START" \
      '{aborted_at_step: $s, started: $t, state: "slo_breach"}' \
      > devops/status/deploy-result.json
    SLO_BREACHED=1
    break
  fi
  echo "step $step ok at $(date -u +%FT%TZ)"
done

5. Incident branch (chain: incident-response)

Only runs if step 4 set SLO_BREACHED=1.

if [ "${SLO_BREACHED:-0}" = "1" ]; then
  kubectl argo rollouts abort "$SERVICE" -n "$ENVIRONMENT"
  kubectl argo rollouts undo "$SERVICE" -n "$ENVIRONMENT"
  # Assuming the workload is the Argo Rollout from step 4, wait on it with the
  # plugin; `kubectl rollout status deploy/...` only watches Deployments.
  kubectl argo rollouts status "$SERVICE" -n "$ENVIRONMENT" --timeout 10m

  # Sequence number = incidents already filed today + 1, zero-padded.
  TODAY=$(date -u +%Y-%m-%d)
  INC_ID="INC-${TODAY}-$(printf '%02d' "$(( $(ls incidents/ 2>/dev/null | grep -c "$TODAY") + 1 ))")"
  # Hand off to incident-response with pre-populated context:
  #   - service, version, SHA, breached metric, breach timestamp
  #   - the set of PRs between previous_release_tag..SHA (suspect list)
  # incident-response writes incidents/$INC_ID.md
  exit 3
fi

The orchestrator stops here on a rollback path — the changelog step is skipped because the release was aborted. The release report is still written and marked ABORTED.
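
The exit codes used across the steps (1 for invalid inputs, 2 for a red CI gate, 3 for an SLO rollback) can drive the status line of the release report. A sketch of that mapping; only the Blocked and ABORTED labels come from this document, and report_status plus the remaining labels are assumptions for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: translate the orchestrator's exit code into the status
# recorded in the release report. SUCCESS, INVALID_INPUT and UNKNOWN are
# illustrative labels, not defined by the skill.
report_status() {
  case "$1" in
    0) echo "SUCCESS" ;;        # all steps completed
    1) echo "INVALID_INPUT" ;;  # step 1 validation failed
    2) echo "BLOCKED" ;;        # step 2 CI gate not green
    3) echo "ABORTED" ;;        # step 5 rollback after an SLO breach
    *) echo "UNKNOWN" ;;
  esac
}
```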

6. Changelog (chain: changelog)

Only runs on the success path.

PRS=$(gh pr list --state merged --search \
  "merged:>$(git log -1 --format=%cI "$PREVIOUS_RELEASE_TAG")" \
  --json number,title,author,labels --limit 200)

echo "$PRS" | jq '.' > devops/status/prs-in-release.json
# changelog skill reads prs-in-release.json and renders:
#   - Features
#   - Fixes
#   - Chores / internal
# into devops/reports/release-<version>.md under the "## Changes" section.
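
How the changelog skill assigns a PR to a group is not specified here. A sketch of one possible rule, assuming PR titles carry a Conventional Commits style prefix such as `feat:` or `fix(scope):`, and that anything unprefixed falls into the chore group; pr_type is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical classifier: derive the changelog group from a PR title.
# Recognises "type:", "type(scope):" and "type!:" prefixes; titles without a
# recognised prefix default to "chore" (an assumption, not the skill's rule).
pr_type() {
  local title="$1"
  local re='^(feat|fix|docs|chore)(\([^)]*\))?!?:'
  if [[ $title =~ $re ]]; then
    echo "${BASH_REMATCH[1]}"
  else
    echo "chore"
  fi
}
```

For instance, `pr_type "fix(cart): handle empty basket"` yields `fix`, which would place that PR under Fixes.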

7. Write the release report

devops/reports/release-<version>.md always includes:

Examples

Example 1 — clean release of v2.14.0

Inputs:

environment: prod-us-east-1
version: v2.14.0
sha: 9a3f1c8b7e2d4f5a6c8b9d1e2f3a4b5c6d7e8f90
service: checkout-api
canary_steps: [5, 25, 100]
slo: { error_rate_max: 0.005, p95_latency_ms: 300, window_minutes: 5 }
previous_release_tag: v2.13.2
approvers: ["@acme/sre"]

Observed flow:

Example 2 — canary-triggered rollback at 25%

Inputs identical to Example 1 except SHA and version (v2.14.1). During the 25% stage, the SLO gate observes:

error_rate   = 0.032   # threshold 0.005
p95_latency  = 0.287s  # threshold 0.300s (still OK)
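
The gate decision for these readings can be reproduced with a small comparison. A sketch, assuming the thresholds from Example 1 (error_rate_max 0.005, p95_latency_ms 300) and that a breach of either metric fails the gate; slo_ok is a hypothetical helper, not the project's check-slo.sh:

```shell
#!/usr/bin/env bash
# slo_ok ERROR_RATE P95_MS ERROR_RATE_MAX P95_MAX_MS
# Returns 0 when both readings are within their thresholds, 1 otherwise.
# awk does the floating-point comparison that plain [ ] cannot.
slo_ok() {
  awk -v er="$1" -v p95="$2" -v er_max="$3" -v p95_max="$4" \
    'BEGIN { exit !(er + 0 <= er_max + 0 && p95 + 0 <= p95_max + 0) }'
}
```

With the values above, `slo_ok 0.032 287 0.005 300` returns non-zero: latency is within bounds, but the error rate of 0.032 exceeds 0.005, so the gate trips.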

Orchestrator path:

Constraints

Quality checks
