# claude-skill-registry: fix-ci

Fetch GitHub CI failure information, analyze root causes, reproduce locally, and propose a fix plan. Use `/fix-ci` for the current branch or `/fix-ci <run-id>` for a specific run.

Clone the registry:

```bash
git clone https://github.com/majiayu000/claude-skill-registry
```

Or install just this skill:

```bash
T=$(mktemp -d) \
  && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" \
  && mkdir -p ~/.claude/skills \
  && cp -r "$T/skills/data/fix-ci" ~/.claude/skills/majiayu000-claude-skill-registry-fix-ci \
  && rm -rf "$T"
```

## Fix CI Skill (`skills/data/fix-ci/SKILL.md`)
Automates CI troubleshooting by fetching GitHub Actions failures, analyzing logs, reproducing issues locally, and creating a fix plan for user approval.
## Execution Workflow

### Step 1: Prerequisites Check

Verify the GitHub CLI is installed and authenticated:

```bash
gh --version && gh auth status
```
If `gh` is not installed:

- Inform the user: "GitHub CLI is required. Install with: `brew install gh`"
- Exit gracefully

If `gh` is not authenticated:

- Inform the user: "Please authenticate with: `gh auth login`"
- Exit gracefully
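The prerequisite gate can be sketched as a small guard function; `require_cmd` is an illustrative helper name, not part of the skill's API:

```shell
# Hypothetical guard: print guidance and fail fast when a tool is missing.
require_cmd() {
  # $1 = command name, $2 = message to show if it is absent
  command -v "$1" >/dev/null 2>&1 || { echo "$2"; return 1; }
}

require_cmd gh "GitHub CLI is required. Install with: brew install gh" \
  && gh auth status >/dev/null 2>&1 \
  || echo "Exiting gracefully"
```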
### Step 2: Parse Arguments

Determine the mode based on arguments:

- No arguments (`/fix-ci`): fetch failures for the current branch only
- With run-id (`/fix-ci <run-id>`): fetch the specific run (bypasses branch scoping)
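A minimal sketch of the dispatch, assuming run IDs are purely numeric (the function name `mode_for` is illustrative):

```shell
# Hypothetical dispatcher: map the /fix-ci argument to a fetch mode.
mode_for() {
  case "$1" in
    "")        echo "branch" ;;   # /fix-ci -> scope to current branch
    *[!0-9]*)  echo "invalid" ;;  # run IDs from gh are numeric
    *)         echo "run" ;;      # /fix-ci <run-id> -> specific run
  esac
}

mode_for ""       # -> branch
mode_for 12345    # -> run
mode_for abc      # -> invalid
```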
### Step 3: Fetch Failed Run

Default mode (current branch):

```bash
BRANCH=$(git branch --show-current)
gh run list --branch "$BRANCH" --status failure --limit 1 --json databaseId,name,headBranch,workflowName,createdAt
```

Specific run mode:

```bash
gh run view <run-id> --json databaseId,name,headBranch,workflowName,jobs,conclusion
```
If no failures are found:

- Report: "No failed runs found for branch `$BRANCH`. CI is green!"
- Optionally show recent successful runs:

  ```bash
  gh run list --branch "$BRANCH" --limit 3 --json databaseId,conclusion,workflowName,createdAt
  ```

- Exit gracefully
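To feed the later steps, the run ID can be captured from the JSON output; `gh` supports `--jq` directly, and the `extract_id` sed helper below is only an illustrative offline fallback, not part of the skill:

```shell
# Hypothetical helper: pull the first databaseId out of gh's JSON output.
extract_id() {
  sed -n 's/.*"databaseId": *\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Real usage (requires gh):
#   RUN_ID=$(gh run list --branch "$BRANCH" --status failure --limit 1 \
#              --json databaseId --jq '.[0].databaseId')

# Offline demo on a captured response:
printf '%s\n' '[{"databaseId": 12345, "workflowName": "test"}]' | extract_id   # -> 12345
```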
### Step 4: Get Failure Details

Once a failed run is identified, gather comprehensive details:

```bash
RUN_ID=<the-run-id>

# Get failed jobs with their failed steps
gh run view $RUN_ID --json jobs --jq '.jobs[] | select(.conclusion == "failure") | {name, conclusion, steps: [.steps[] | select(.conclusion == "failure")]}'

# Get failed step logs (critical for debugging)
gh run view $RUN_ID --log-failed 2>&1 | head -500

# Get verbose run info
gh run view $RUN_ID --verbose
```
Log handling:

- Truncate logs to 500 lines to avoid context overflow
- Note to the user: "Showing first 500 lines of failed logs. Full logs available on GitHub."
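One way to make the 500-line cap land on signal rather than setup noise is to pre-filter for known error markers. `triage_log` is an illustrative helper, and the pattern list is an assumption, not the skill's canonical set:

```shell
# Hypothetical filter: keep only likely-error lines from a CI log.
triage_log() {
  grep -E 'FAILED|ERROR|Error:|Traceback|AssertionError' | head -n 50
}

# Real usage (requires gh):
#   gh run view "$RUN_ID" --log-failed 2>&1 | triage_log

# Offline demo:
printf '%s\n' \
  'Installing dependencies...' \
  'FAILED tests/test_api.py::test_health_check' \
  'AssertionError: expected 200, got 500' | triage_log
```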
### Step 5: Download Artifacts (if available)

Attempt to download any debug artifacts:

```bash
# Try common artifact names - failures are OK (not all runs have artifacts)
gh run download $RUN_ID -n "coverage" -D /tmp/ci-debug/ 2>/dev/null || true
gh run download $RUN_ID -n "test-results" -D /tmp/ci-debug/ 2>/dev/null || true
gh run download $RUN_ID -n "logs" -D /tmp/ci-debug/ 2>/dev/null || true
```

If artifacts were downloaded, read them for additional context.
### Step 6: Analyze Failure Type

Categorize the failure by matching common log patterns such as:

| Pattern | Failure Type | Root Cause Area |
|---|---|---|
| `FAILED`, `FAIL` | Test Failure | Specific test case |
| `lint`, `format` | Lint Error | Code style/formatting |
| `ImportError`, `ModuleNotFoundError` | Import Error | Missing dependency |
| `TypeError`, `AttributeError` | Runtime Error | Type mismatch |
| `SyntaxError` | Syntax Error | Invalid code |
| `AssertionError` | Assertion Failure | Test expectation mismatch |
| `timeout`, `timed out` | Timeout | Performance/hang |
| `PermissionError`, `EACCES` | Permission Error | File/resource access |
| `ConnectionError`, `ECONNREFUSED` | Network Error | External service |
Extract key information:

- Failed test name/file (if applicable)
- Error message
- Stack trace location (file:line)
- Environment variable or config issues
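For Python tracebacks, the file:line of the deepest frame can be extracted mechanically; `frame_location` is an illustrative helper, not part of the skill:

```shell
# Hypothetical helper: print the deepest "file:line" from a Python traceback.
frame_location() {
  sed -n 's/^ *File "\(.*\)", line \([0-9][0-9]*\).*/\1:\2/p' | tail -n 1
}

printf '%s\n' \
  'Traceback (most recent call last):' \
  '  File "server/app.py", line 42, in health_check' \
  'AssertionError: expected 200, got 500' | frame_location   # -> server/app.py:42
```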
### Step 7: Map to Local Test Commands

Determine the appropriate local command for the CI job; the exact mapping depends on the repository layout. For a monorepo with `uv`-managed Python components and a Go CLI, for example:

| CI Workflow/Job | Local Command |
|---|---|
| test (server) | `cd server && uv run pytest` |
| test (rag) | `cd rag && uv run pytest` |
| test (config) | `cd config && uv run pytest` |
| test (runtime) | `cd runtime && uv run pytest` |
| test (python) | `cd <component> && uv run pytest` |
| test (go) | `cd cli && go test ./...` |

For specific test failures, narrow down the command:

- Python: `cd <dir> && uv run pytest -v <test_file>::<test_name>`
- Go: `cd cli && go test -v -run <TestName> ./...`
### Step 8: Reproduce Locally

Run the mapped local command to confirm the failure reproduces:

```bash
# Example for a Python test
cd server && uv run pytest -v tests/test_api.py::test_health_check
```
Outcome A - failure reproduces locally:

- Good! Continue to the fix plan
- Report: "Successfully reproduced failure locally"

Outcome B - failure does NOT reproduce locally:

- Note: "Could not reproduce locally. Possible causes:"
  - Flaky test (timing-dependent)
  - Environment difference (CI has different deps/config)
  - Race condition
- Suggest: "Consider re-running CI with `gh run rerun $RUN_ID`"
- Ask the user how to proceed (investigate further or skip)
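When reproduction is inconsistent, repeating the test a few times helps separate flaky from deterministic failures. `flake_check` is a hypothetical helper, not part of the skill:

```shell
# Hypothetical helper: run a command N times and report the failure rate.
flake_check() {
  n=$1; shift
  fails=0; i=0
  while [ "$i" -lt "$n" ]; do
    "$@" >/dev/null 2>&1 || fails=$((fails + 1))
    i=$((i + 1))
  done
  echo "$fails/$n runs failed"
}

# e.g. flake_check 5 uv run pytest -q tests/test_api.py::test_health_check
flake_check 3 true    # -> 0/3 runs failed
flake_check 3 false   # -> 3/3 runs failed
```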
### Step 9: Analyze Root Cause
Based on the failure type and logs, identify:
- What failed: Specific test, lint rule, or build step
- Why it failed: The actual error condition
- Where to fix: File(s) and line(s) that need changes
- How to fix: Proposed changes
Use available tools to explore:
- Read the failing test file
- Read the code being tested
- Search for related patterns in the codebase
- Check recent changes that might have caused the failure
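Checking recent history for the failing file often points straight at the regressing commit. A sketch, demonstrated on a throwaway repository (the file name `app.py` is illustrative):

```shell
# Hypothetical helper: list recent commits that touched a given file.
recent_touches() {
  git log --oneline -5 -- "$1"
}
# In a real repo, also compare against the base branch, e.g.:
#   git diff origin/main...HEAD -- path/to/file

# Offline demo in a temporary repository:
R=$(mktemp -d)
cd "$R"
git init -q .
echo 'print("ok")' > app.py
git add app.py
git -c user.email=ci@example.com -c user.name=ci commit -q -m "touch app.py"
recent_touches app.py
```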
### Step 10: Enter Plan Mode

Use `EnterPlanMode` to create a formal fix plan. The plan should include:

```markdown
# CI Fix Plan

## Problem Statement
[Summary of the CI failure from logs]

## Failure Details
- **Run ID**: <run-id>
- **Workflow**: <workflow-name>
- **Job**: <job-name>
- **Error Type**: <categorized-type>

## Root Cause Analysis
[Explanation of why the failure occurred]

## Affected Files
- `path/to/file1.py` (line X)
- `path/to/file2.py` (line Y)

## Proposed Changes
### Change 1: [Brief description]
[Specific edit to make]

### Change 2: [Brief description]
[Specific edit to make]

## Verification Steps
1. Run: `<local-test-command>`
2. Expected: All tests pass
3. Optional: Run full test suite with `<full-suite-command>`

## Notes
- [Any caveats or considerations]
```
### Step 11: User Approval Gate

Present the plan and wait for explicit user approval:

- User approves: proceed to execute fixes
- User modifies: incorporate feedback, update the plan
- User rejects: exit gracefully without changes

CRITICAL: Never make code changes without user approval.
### Step 12: Execute Fix (after approval only)

- Make the proposed code changes using the Edit tool
- Run local tests to verify the fix: `<local-test-command>`
- Report results:
  - Success: "Fix verified locally. Tests pass."
  - Failure: "Fix did not resolve the issue. [details]"

IMPORTANT: Do NOT auto-commit changes. Leave committing to the user or the `/commit-push-pr` skill.
## Error Handling

| Scenario | Action |
|---|---|
| `gh` CLI not installed | Direct user to install: `brew install gh` |
| `gh` not authenticated | Direct user to: `gh auth login` |
| No failures found | Report CI is green, exit gracefully |
| Rate limit exceeded | Suggest waiting for the limit to reset |
| Run not found | Verify the run ID, suggest `gh run list` to find valid IDs |
| Large logs (>500 lines) | Truncate, note full logs on GitHub |
| Local reproduction fails | Note as flaky/env issue, offer re-run option |
| Network errors | Suggest retry, check connection |
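Several of these scenarios reduce to "a `gh` call failed"; wrapping calls with a fallback message keeps the skill degrading gracefully instead of dumping raw errors. `with_fallback` is an illustrative helper, not part of the skill:

```shell
# Hypothetical wrapper: run a command, printing a fallback message on failure.
with_fallback() {
  msg=$1; shift
  "$@" 2>/dev/null || echo "$msg"
}

# e.g. with_fallback "Rate limited or network error; retry later" \
#        gh run view "$RUN_ID" --log-failed

with_fallback "gh unavailable" false   # -> gh unavailable
with_fallback "unused" echo "ok"       # -> ok
```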
## Output Format

On finding a failure:

```text
CI Failure Found

Run: #12345 (workflow-name)
Branch: feature-branch
Failed Job: test-python
Error Type: Test Failure

Analyzing logs...
[Summary of failure]

Reproducing locally...
[Result]

Entering plan mode to propose fix...
```

On success (after fix):

```text
Fix Applied

- Modified: path/to/file.py
- Verification: Tests pass locally

Next steps:
- Review the changes
- Run `/commit-push-pr` to commit and push
- CI will re-run automatically on push
```
## Notes for the Agent

- Always scope to the current branch by default - users expect `/fix-ci` to fix their current work, not random failures
- Truncate logs wisely - CI logs can be huge; extract the relevant error sections
- Reproduce before fixing - don't propose fixes for issues that can't be reproduced
- Plan mode is mandatory - always use `EnterPlanMode` before making changes
- Never auto-commit - the user controls when changes are committed
- Be specific in analysis - generic advice isn't helpful; identify exact files and lines
- Handle flaky tests - if reproduction fails, acknowledge it might be flaky