When to use
- User says “audit our certs”, “some cert is expiring”, “install cert-manager”, “switch to Let’s Encrypt”, “rotate the ingress cert”, or asks about HTTP-01 vs DNS-01.
- CI or monitoring flagged a cert expiring within 30 days.
- Migrating from manual cert upload (Secret holding
tls.crt/tls.key) to automated issuance. - A wildcard cert is required (forces DNS-01).
- A mutual-TLS setup needs an internal CA (private PKI; this skill can drive cert-manager’s CA issuer).
Do not use this skill for code-signing certs, SSH keys, or client certs unrelated to TLS termination.
Inputs
- Kubeconfig context; list of target namespaces.
- List of hostnames that must be covered (may include wildcards).
- Issuance strategy: HTTP-01 (public, per-host), DNS-01 (wildcards or private), or an internal CA.
- DNS provider (Route53, Cloud DNS, Cloudflare, Azure DNS) and credentials for DNS-01.
- ACME contact email.
- Alert destinations: reuse the Alertmanager config from
monitoring-setup.
Outputs
- A cert inventory CSV/table:
host | issuer | NotBefore | NotAfter | daysLeft | source (cert-manager | manual | external) | namespace | secret. - cert-manager installed and healthy.
ClusterIssuerresources:letsencrypt-staging,letsencrypt-prod, optionallyinternal-ca.Certificateresources for every managed host.- Alertmanager rules for
CertExpiringWarn(<30d) andCertExpiringSoon(<7d) — already defined inmonitoring-setup/references/alertmanager-rules-template.yaml. - A rotation runbook: how to force-renew without downtime, how to roll back.
Tool dependencies
kubectl,helm,openssl,dig,curl,jq,yq.cmctl(cert-manager CLI) for issue/renew/status.- Kubernetes MCP, filesystem MCP, optionally AWS/GCP/Azure MCP for DNS-01 IAM setup.
Procedure
-
Detect the stack. Run these read-only commands and record findings:
kubectl config current-context— confirm a Kubernetes cluster is addressable. No cluster → stop; this skill requireskubernetes.kubectl get crd certificates.cert-manager.io 2>/dev/null— cert-manager installed?helm list -A 2>/dev/null | grep -Ei 'cert-manager|jetstack'— cert-manager via Helm?kubectl get ingressclass 2>/dev/null— what ingress controller is in use (nginx, traefik, istio, contour)?kubectl get secrets -A --field-selector type=kubernetes.io/tls -o name 2>/dev/null | head— existing TLS secrets (manual uploads vs managed)?ls aws-load-balancer-controller/ 2>/dev/nullorkubectl get crd targetgroupbindings.elbv2.k8s.aws 2>/dev/null— AWS ALB/ACM in the mix instead?
Confirm the detected stack is
cert-manager+k8s. If the cluster uses AWS ACM directly via ALB, GCP managed certs via GKE ingress, Azure Key Vault certs, or a commercial CA workflow (Sectigo / DigiCert via manual CSR), STOP and report the detected stack to the user — those are different issuance/rotation surfaces and cert-manager config here would be misleading. -
Inventory. Build the full cert list.
# cert-manager-managed certs kubectl get certificates -A -o json | jq -r ' .items[] | [.metadata.namespace, .metadata.name, .spec.secretName, (.status.notAfter // "pending")] | @tsv' # TLS secrets regardless of source kubectl get secrets -A --field-selector type=kubernetes.io/tls -o json | jq -r ' .items[] | [.metadata.namespace, .metadata.name, ((.data."tls.crt" | @base64d) | split("-----BEGIN CERTIFICATE-----")[1])] | @tsv' # Live-edge scan per hostname (what a client actually sees) for h in api.example.com app.example.com; do echo | openssl s_client -connect "$h:443" -servername "$h" 2>/dev/null \ | openssl x509 -noout -issuer -subject -dates -ext subjectAltName doneFor each TLS secret, decode and inspect:
kubectl -n <ns> get secret <name> -o json \ | jq -r '.data["tls.crt"]' | base64 -d \ | openssl x509 -noout -issuer -subject -dates -ext subjectAltNameCompare to the live edge. Mismatches (ingress serving a stale cert) usually mean the ingress controller needs a reload.
-
Install cert-manager if missing:
helm repo add jetstack https://charts.jetstack.io helm repo update helm upgrade --install cert-manager jetstack/cert-manager \ --namespace cert-manager --create-namespace \ --version v1.15.3 --set crds.enabled=true --waitValidate:
kubectl -n cert-manager get pods,cmctl check api. -
Create ClusterIssuers. Start with staging to avoid the Let’s Encrypt rate limit while iterating.
apiVersion: cert-manager.io/v1 kind: ClusterIssuer metadata: name: letsencrypt-staging spec: acme: server: https://acme-staging-v02.api.letsencrypt.org/directory email: ops@example.com privateKeySecretRef: { name: letsencrypt-staging-key } solvers: - http01: ingress: { ingressClassName: nginx } --- apiVersion: cert-manager.io/v1 kind: ClusterIssuer metadata: name: letsencrypt-prod spec: acme: server: https://acme-v02.api.letsencrypt.org/directory email: ops@example.com privateKeySecretRef: { name: letsencrypt-prod-key } solvers: - http01: ingress: { ingressClassName: nginx } selector: dnsZones: ["example.com"] - dns01: route53: region: eu-west-1 hostedZoneID: Z123ABC selector: dnsZones: ["example.com"]HTTP-01 requires the ingress to be reachable from the public internet on :80. DNS-01 requires IRSA / Workload Identity binding to a DNS-write IAM role; never use static long-lived AWS keys.
-
Issue certs. For each hostname group:
apiVersion: cert-manager.io/v1 kind: Certificate metadata: name: api-example-com namespace: app spec: secretName: api-example-com-tls issuerRef: { name: letsencrypt-prod, kind: ClusterIssuer } dnsNames: - api.example.com privateKey: algorithm: ECDSA size: 256 rotationPolicy: Always renewBefore: 720h # 30dReference the secret from the Ingress:
spec: tls: - hosts: [api.example.com] secretName: api-example-com-tlsWait for
Ready=True:kubectl -n app wait certificate/api-example-com --for=condition=Ready --timeout=5m cmctl status certificate -n app api-example-com -
For wildcards (
*.example.com): HTTP-01 cannot issue wildcards; DNS-01 only. Use thedns01solver above and confirm the_acme-challenge.example.comTXT record is being created and deleted by cert-manager (check withdig +short TXT _acme-challenge.example.com). -
Alerts. Apply the rules from
monitoring-setup/references/alertmanager-rules-template.yaml(thetlsgroup). Add a blackbox-exporter probe per hostname:apiVersion: monitoring.coreos.com/v1 kind: Probe metadata: name: public-hosts namespace: monitoring spec: jobName: blackbox-https interval: 60s module: http_2xx prober: url: blackbox-exporter.monitoring.svc:9115 targets: staticConfig: static: - https://api.example.com - https://app.example.com -
Zero-downtime rotation. To force a renew:
cmctl renew -n app api-example-comcert-manager writes a new
tls.crt/tls.keyinto the existing secret. Ingress controllers (nginx, Traefik, HAProxy, Istio) hot-reload when the secret’s data changes; no pod restart required. Validate:echo | openssl s_client -connect api.example.com:443 -servername api.example.com 2>/dev/null \ | openssl x509 -noout -datesConfirm
notAfteradvanced. If an ingress is pinned to a mounted file (e.g. Envoy withpath:instead ofsds:), rotation requires a pod restart — flag this during the inventory. -
Roll-back. Keep the prior secret for 7 days. cert-manager retains the previous issuance in
status.previousIssuedCertificatewhenrotationPolicy: Alwaysis set. To pin the old cert temporarily, revert the IngresssecretNameto the previous secret.
Examples
Happy path: HTTP-01 for 5 public hosts on nginx-ingress
Inventory found 5 manually-uploaded certs expiring within 45 days. After install:
ClusterIssuer: letsencrypt-prod (HTTP-01 via nginx)
Certificates issued: 5/5 Ready
Secrets rotated: 5 (zero-downtime; nginx hot-reload confirmed)
Alerts active: CertExpiringWarn (<30d), CertExpiringSoon (<7d)
Edge case: wildcard cert for a private zone
*.internal.example.com only resolves inside the VPC. HTTP-01 is impossible (no public reachability). Approach:
- Create a Route53 hosted zone and delegate
internal.example.comto it from the public zone. - Attach an IRSA role with
route53:ChangeResourceRecordSetsscoped to the zone ID. - Use DNS-01 solver as shown above; certificate issues even though the service itself is VPC-only.
- Alternative for air-gapped clusters: use cert-manager’s
CAissuer with an internal root; distribute the root to clients via MDM / Trust Bundle. No Let’s Encrypt involved.
Constraints
- Do not produce output for a stack outside
supported-stacks. If detection shows AWS ACM+ALB, GCP managed certs, Azure Key Vault certs, or a commercial-CA manual-CSR workflow as the issuance surface, STOP and report the detected stack. cert-manager YAML in those environments either does nothing or competes with the real issuance path. - Never use ACME HTTP-01 for a wildcard cert; the protocol does not support it. DNS-01 only.
- Never commit DNS-provider long-lived credentials. Always use IRSA (EKS), Workload Identity (GKE), or Managed Identity (AKS) for the DNS-01 solver.
- Never issue directly against
letsencrypt-prodwhile iterating; stage first. The Let’s Encrypt rate limit is 50 certs per registered domain per week. - Never reuse the same
privateKeySecretRefacross staging and prod issuers. - Never set
renewBeforeshorter than 360h (15d); renewal failures need buffer time. - Never remove the previous secret the same day you rotate; keep it for at least 7 days for rollback.
Quality checks
- Every production hostname has a
CertificatewithReady=TrueandnotAftermore than 30 days out. cmctl check apiexits 0.- The live-edge scan (
openssl s_client) issuer matches the cert-manager issuer for every managed host. - Alertmanager shows
CertExpiringWarn/CertExpiringSoonrules loaded (amtool config routes test severity=critical alertname=CertExpiringSoon). kubectl get events -n cert-manager --sort-by=.lastTimestamp | tailshows noFailedorErrorevents in the last 24 h.- Blackbox-exporter
probe_successfor every host == 1. - DNS-01 credentials are provided via IRSA / Workload Identity, not static keys (verified by inspecting the solver pod’s environment).
- The rotation runbook has been executed once end-to-end during the install (force-renew, verify new
notAfter, verify no 5xx during the window).