Awesome-omni-skill ensure-prod
Verifies that the latest main commit is deployed to production. Checks CI pipeline, fixbot activity, and k8s pod status. Use when you want to confirm a deploy completed or troubleshoot why prod is behind.
```bash
# Clone the whole repo
git clone https://github.com/diegosouzapw/awesome-omni-skill

# Or install just this skill
T=$(mktemp -d) && git clone --depth=1 https://github.com/diegosouzapw/awesome-omni-skill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/devops/ensure-prod" ~/.claude/skills/diegosouzapw-awesome-omni-skill-ensure-prod && rm -rf "$T"
```

skills/devops/ensure-prod/SKILL.md

Ensure Main is on Prod
Verify that the current HEAD of main is deployed to the production k8s cluster. If it's not, diagnose why and take corrective action.
Step 1: Compare git HEAD with what's actually running on prod
```bash
# Get current HEAD
git rev-parse HEAD
```
IMPORTANT: You must verify against the actual running pods, not just Argo CD's sync status. Argo CD's `REVISION` field reflects the git manifest revision, NOT the container image — a "Synced" app can still be running old pods if the image build hasn't finished or the rollout hasn't completed.
Check what's actually running using both of these methods (run in parallel):
- Public build-info endpoint (most reliable — shows exactly what the running pod reports): Use WebFetch on `https://abaci.one/api/build-info` and extract `git.commit` and `git.commitShort`. This also shows `environment`, `git.isDirty`, and build timestamp.
- Pod image inspection via Kubernetes MCP: Use `kubectl_describe` on one of the abaci-app pods in the `abaci` namespace. Look at the `Image` field — the tag or digest tells you which build is running. Cross-reference with the image updater annotations if needed.
Match criteria: HEAD matches prod when the build-info endpoint's `git.commit` equals `git rev-parse HEAD`. If they don't match, proceed to Step 2.
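A minimal sketch of that comparison, assuming `curl` and `jq` are available and that the endpoint returns JSON with the commit nested under a `git` object:

```bash
# Sketch: compare local HEAD with what prod reports (JSON shape is an assumption)
head=$(git rev-parse HEAD)
prod=$(curl -s https://abaci.one/api/build-info | jq -r '.git.commit')
if [ "$head" = "$prod" ]; then
  echo "prod is running HEAD ($head)"
else
  echo "prod ($prod) is behind HEAD ($head); go to Step 2"
fi
```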
Step 2: Check CI pipeline history
```bash
# Get enough history to see the pattern — not just the latest run
gh run list --branch main --limit 15 --json headSha,displayTitle,status,conclusion,createdAt,databaseId,name
```
Look at the full picture across recent runs:
- If the latest build succeeded AND prod is close behind: Argo CD image updater should pick it up within a few minutes. Check Step 4.
- If a build is in progress: Don't stop here. Look at the history of prior `Build and Deploy` runs:
  - If prior builds were all succeeding: The current in-progress build is likely fine. Report status and wait.
  - If prior builds were failing: The current in-progress build will almost certainly fail with the same error. Investigate the failures immediately (go to Step 2a) rather than waiting.
- If the latest completed build failed: Go to Step 2a immediately.
Key insight: A single "build in progress" is only reassuring if recent history is clean. Multiple consecutive failures mean the current build is broken and needs fixing now, not after it fails again.
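One way to apply that heuristic mechanically is to count consecutive non-success conclusions among recent completed runs. A sketch, assuming the workflow is named `Build and Deploy` as above:

```bash
# Sketch: number of consecutive non-success runs before the most recent success
# (0 means the last completed run succeeded; 2+ suggests the pipeline is broken)
gh run list --workflow "Build and Deploy" --branch main --limit 15 \
  --json status,conclusion \
  --jq '[.[] | select(.status == "completed") | .conclusion] | index("success") // length'
```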
Step 2a: Diagnose build failures
When `Build and Deploy` runs have been failing, fetch the failure logs:
```bash
gh run view <run-id> --log-failed
```
Use the most recently completed (failed) `Build and Deploy` run ID. Look for the actual error — TypeScript errors, test failures, Docker build errors, etc.
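If you don't already have the run ID, a sketch that grabs the newest failed run and dumps its failure logs (workflow name taken from above):

```bash
# Sketch: fetch the most recent failed "Build and Deploy" run and show its logs
run_id=$(gh run list --workflow "Build and Deploy" --branch main \
  --status failure --limit 1 --json databaseId --jq '.[0].databaseId')
gh run view "$run_id" --log-failed
```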
- Fix the code error if it's a straightforward compile/type error. Commit and push the fix.
- If fixbot has already opened a PR: Review it (Step 3).
- If the error needs judgment: Explain what's failing and what fix is needed.
Step 3: Check fixbot activity
```bash
# Open fixbot issues
gh issue list --label fixbot --state open

# Recent fixbot PRs
gh pr list --label fixbot --state open
```
- If a fixbot PR exists: Review it with `gh pr view <number>`. If checks pass and the fix looks correct, tell the user it's ready to merge. If it needs judgment, explain what needs review.
- If a fixbot issue exists but no PR yet: Fixbot is still working on it. Report the issue and wait.
- If no fixbot activity: Check if the diagnostic workflow ran: `gh run list --workflow "Diagnose CI Failure" --limit 3`. If it didn't, the failure may be too new or fixbot itself failed. Investigate the CI failure manually.
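A small sketch tying the PR checks together (`gh pr checks` reports per-check CI status):

```bash
# Sketch: inspect the newest open fixbot PR, if any, and its check results
pr=$(gh pr list --label fixbot --state open --limit 1 --json number --jq '.[0].number')
if [ -n "$pr" ] && [ "$pr" != "null" ]; then
  gh pr view "$pr"
  gh pr checks "$pr"
else
  echo "no open fixbot PR"
fi
```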
Step 4: Check Argo CD image updater
If CI passed but pods haven't rolled, check the image updater:
Use the Kubernetes MCP to:
- Check `kubectl logs -n argocd -l app.kubernetes.io/name=argocd-image-updater --tail=30` for recent activity
- Check `kubectl get applications -n argocd` for sync status
If the image updater hasn't picked up the new image yet, it may just need a few more minutes. Report the status.
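To narrow the log check to this app, a sketch that filters the updater logs for the image name (assuming the image name contains "abaci", consistent with the namespace and pod names above):

```bash
# Sketch: look for recent image updater decisions about the abaci image
kubectl logs -n argocd -l app.kubernetes.io/name=argocd-image-updater --tail=100 \
  | grep -i abaci
```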
Step 5: Verify pod rollout
Once pods are rolling, monitor via the Kubernetes MCP:
- `kubectl get pods -n abaci -l app=abaci-app` — check that pods are running the new revision
- If pods are in CrashLoopBackOff or ImagePullBackOff, report the error
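If the pods belong to a Deployment named after the app label (an assumption; adjust to the real workload name), `kubectl rollout status` blocks until the rollout finishes or times out:

```bash
# Sketch: wait for the rollout to complete (deployment name is assumed)
kubectl rollout status deployment/abaci-app -n abaci --timeout=5m
```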
Step 6: Report
Provide a clear summary:
- Current HEAD SHA (short)
- Prod SHA (from build-info endpoint, short)
- Argo CD revision (short) — note if this differs from what pods are actually running
- CI status (passed/failed/in-progress)
- Fixbot status (if relevant)
- Pod status
- Any build metadata issues (e.g. environment showing "development" instead of "production", dirty flag set incorrectly)
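A sketch that gathers the first few report fields in one pass, under the same `curl`/`jq` and JSON-shape assumptions as Step 1:

```bash
# Sketch: collect report inputs; field paths under .git are assumptions
head_sha=$(git rev-parse --short HEAD)
info=$(curl -s https://abaci.one/api/build-info)
prod_sha=$(echo "$info" | jq -r '.git.commitShort')
env=$(echo "$info" | jq -r '.environment')
dirty=$(echo "$info" | jq -r '.git.isDirty')
echo "HEAD: $head_sha | prod: $prod_sha | env: $env | dirty: $dirty"
```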
Edge case: fixbot committed directly to main
If you discover fixbot committed directly to main (check `git log --oneline -5` for commits by the github-actions bot that aren't merge commits), push an empty commit to trigger a clean CI run:
```bash
git pull
git commit --allow-empty -m "ci: trigger build after fixbot commit <sha>"
git push
```
Then re-enter this procedure from Step 2.
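For the detection step itself, `git log` can filter to non-merge bot commits directly:

```bash
# Sketch: recent non-merge commits authored by the github-actions bot
git log --oneline -5 --no-merges --author="github-actions"
```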