Infrastructure
Monitoring, log aggregation, SSL certs, network diagnostics, backups, cluster health.
7 skills · 1 orchestrator
skillskit install infrastructure installs just this department into ~/.claude/skills/.
Need the CLI? Install it first.
Task skills
backup-strategy
Use when a user wants to design or audit a backup posture across databases, object storage, and Kubernetes state; targets 3-2-1 (3 copies, 2 media, 1 offsite); sets RPO/RTO per workload tier; schedules Velero for K8s state and logical + physical backups for databases; enforces monthly restore tests. Produces a backup inventory, schedules, and a restore-test calendar.
cluster-health
Use when a user wants a Kubernetes cluster health check, says "is the cluster healthy", "something is off with the cluster", inherits an unfamiliar cluster, or is triaging an ongoing incident. Walks node conditions, control-plane components, resource pressure, critical DaemonSets, pod lifecycle states, and recent events, then produces a severity-ranked issue list.
log-aggregation
Use when a user wants to centralise Kubernetes logs, install Loki + Promtail or ELK (Elasticsearch + Logstash/Fluent Bit + Kibana), configure retention, wire log shipping from pods, or tune label/index hygiene. Picks the lightweight (Loki) or heavyweight (ELK) stack based on scale and budget, installs, validates ingestion, and produces a LogQL or KQL query cheat sheet.
monitoring-setup
Use when a user wants to provision Kubernetes observability, install Prometheus/Grafana/Alertmanager, wire ServiceMonitors, import Golden Signal dashboards, or configure alert routing to Slack/PagerDuty. Installs kube-prometheus-stack via Helm, applies ServiceMonitors, loads dashboards for latency/traffic/errors/saturation, and commits Alertmanager routes.
network-diagnostics
Use when a user reports connectivity failures, "can't reach X", DNS issues, TLS handshake errors, timeouts, or suspected firewall/NetworkPolicy problems. Walks a layered flow from DNS to TCP to TLS to application, audits K8s NetworkPolicy, cloud firewall / NSG rules, MTU, and emits a structured diagnosis with the exact failing layer and fix.
ssl-certificate-manager
Use when a user wants to audit TLS certificates across a Kubernetes estate, migrate to cert-manager with Let's Encrypt (HTTP-01 or DNS-01), set up expiry alerts (≤30d warning / ≤7d critical), or rotate certs without downtime. Runs a cert inventory, issues / renews via cert-manager, and validates the ingress still serves the new chain.
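The expiry thresholds named for ssl-certificate-manager (≤30d warning / ≤7d critical) can be sketched as a small classifier. This is a minimal illustration, not the skill's actual implementation: the function name expiry_severity, the constant names, the "ok" label, and the choice to treat already-expired certificates as critical are all assumptions.

```python
from datetime import datetime, timedelta, timezone

# Thresholds taken from the skill description; the constant names are illustrative.
WARNING_DAYS = 30
CRITICAL_DAYS = 7


def expiry_severity(not_after, now=None):
    """Classify a certificate by the time remaining until its notAfter date.

    Both datetimes must be timezone-aware. Expired certificates
    (negative time remaining) are reported as critical, which is an assumption.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    # Dividing two timedeltas yields the remaining time as a float number of days.
    days_left = (not_after - now) / timedelta(days=1)
    if days_left <= CRITICAL_DAYS:
        return "critical"
    if days_left <= WARNING_DAYS:
        return "warning"
    return "ok"
```

In practice the notAfter value would come from the certificate inventory the skill produces; passing an explicit now keeps the check reproducible in tests.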
Workflow orchestrators
Orchestrators chain the task skills above into an end-to-end flow. Invoke them the
same way as any other skill — they declare chains: in frontmatter, which
means tooling can pass artifacts between steps automatically.