Claude-skill-registry agent-ops-recovery
Handle failures and errors during workflow. Use when build breaks, tests fail unexpectedly, or agent gets stuck. Semi-automatic recovery with user confirmation for destructive actions.
install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/agent-ops-recovery" ~/.claude/skills/majiayu000-claude-skill-registry-agent-ops-recovery && rm -rf "$T"
manifest: skills/data/agent-ops-recovery/SKILL.md
Error Recovery workflow
Trigger conditions
Use this skill when:
- Build/lint fails unexpectedly after agent changes
- Tests fail that were passing in baseline
- Agent encounters ambiguity it cannot resolve
- Implementation is stuck or going in circles
Recovery procedure
Step 1: Diagnose (invoke debugging)
For non-trivial failures, invoke agent-ops-debugging:
- Apply systematic debugging process:
  - Reproduce the issue consistently
  - Define expected vs actual behavior
  - Form hypothesis about root cause
- Use debugging output to inform recovery decision
- If root cause unclear after initial analysis, continue debugging before recovery
Step 2: Assess rollback options
- Option A: Fix forward — issue is minor, can be resolved quickly
- Option B: Partial rollback — revert specific file(s) to last good state
- Option C: Full rollback — revert all agent changes since checkpoint
- Option D: Escalate — document the issue, mark task blocked, ask user
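The rollback options above can be sketched as a mapping from option to git commands. This is a minimal Python sketch, not part of the skill itself: the RecoveryOption enum, the rollback_commands helper, and the assumption that the checkpoint is available as a commit reference are all hypothetical. The helper only builds the commands; running them still needs the confirmation described in Step 4.

```python
from enum import Enum
from typing import List, Sequence


class RecoveryOption(Enum):
    """Hypothetical labels for the four options listed above."""
    FIX_FORWARD = "A"
    PARTIAL_ROLLBACK = "B"
    FULL_ROLLBACK = "C"
    ESCALATE = "D"


def rollback_commands(
    option: RecoveryOption,
    checkpoint: str,
    files: Sequence[str] = (),
) -> List[List[str]]:
    """Return git argv lists for a rollback option (nothing is executed).

    `checkpoint` is assumed to be a commit hash or ref recorded before the
    agent started making changes.
    """
    if option is RecoveryOption.PARTIAL_ROLLBACK:
        if not files:
            raise ValueError("partial rollback needs at least one file")
        # Restore only the named files to their state at the checkpoint.
        return [["git", "checkout", checkpoint, "--", *files]]
    if option is RecoveryOption.FULL_ROLLBACK:
        # Resets tracked files and the branch pointer to the checkpoint.
        # Untracked files created since then are left in place.
        return [["git", "reset", "--hard", checkpoint]]
    # Fix forward and escalate involve no git rollback.
    return []
```

Returning argv lists instead of running git directly keeps the destructive step separate, so the proposal shown to the user can list exactly what would be run.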
Step 3: Propose action
Present options to user with:
- What will be reverted/changed
- Risk assessment
- Recommendation
Step 4: Execute (with confirmation)
- For non-destructive actions (fix forward): proceed
- For destructive actions (rollback): ask user first
- Update .agent/focus.md with recovery action taken
Destructive actions (require confirmation)
- git reset
- git checkout -- <file> (discard changes)
- git revert
- Deleting files
- Overwriting files with previous versions
Non-destructive actions (can proceed)
- git stash
- Reading files
- Running diagnostics
- Updating focus/tasks with findings
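The two lists above, together with Step 4, amount to a confirmation gate. A minimal Python sketch of such a gate follows. The function names, the callback shapes, and the use of rm to stand for "deleting files" are assumptions for illustration, not part of the skill.

```python
import shlex
from typing import Callable

# Git subcommands the skill lists as destructive.
DESTRUCTIVE_GIT_SUBCOMMANDS = frozenset({"reset", "checkout", "revert"})
# Assumption: file deletion is represented by the rm command.
FILE_DELETION_COMMANDS = frozenset({"rm"})


def requires_confirmation(command: str) -> bool:
    """Return True if the command matches a destructive action.

    This is a simple check on the first git subcommand. It is conservative
    for checkout (any `git checkout` counts) and does not handle global git
    options such as `-C <path>` placed before the subcommand.
    """
    argv = shlex.split(command)
    if not argv:
        return False
    if argv[0] == "git":
        return len(argv) > 1 and argv[1] in DESTRUCTIVE_GIT_SUBCOMMANDS
    return argv[0] in FILE_DELETION_COMMANDS


def run_with_gate(
    command: str,
    confirm: Callable[[str], bool],
    run: Callable[[str], None],
) -> bool:
    """Run the command, asking `confirm` first if it is destructive.

    Returns True if the command was run, False if the user declined.
    """
    if requires_confirmation(command) and not confirm(command):
        return False
    run(command)
    return True
```

For example, `run_with_gate("git stash", confirm, run)` calls `run` without asking, while `run_with_gate("git reset --hard HEAD", confirm, run)` calls `run` only if `confirm` returns True.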
Post-recovery
- Update .agent/focus.md with what happened
- Invoke agent-ops-tasks to create issue for root cause investigation
- Update .agent/memory.md with "pitfall to avoid" if applicable
- Re-run baseline comparison before continuing
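The baseline comparison in the last step can be sketched as a check for tests that passed in the baseline but do not pass now. The skill does not specify how baseline results are stored; the dictionaries of test name to status and the find_regressions helper below are assumptions for illustration.

```python
from typing import Dict, List


def find_regressions(baseline: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """Return names of tests that passed in the baseline but not in the current run.

    Both arguments map a test name to a status string such as "passed" or
    "failed" (a hypothetical format). A test missing from the current run
    is treated as a regression.
    """
    return sorted(
        name
        for name, status in baseline.items()
        if status == "passed" and current.get(name) != "passed"
    )
```

An empty result suggests recovery restored the baseline. Tests that were already failing in the baseline are ignored, which matches the trigger condition "tests fail that were passing in baseline".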
Issue Discovery After Recovery
After recovery, invoke agent-ops-tasks discovery procedure:
- Create issue for the incident:

  📋 Recovery completed. Create issue to track root cause?

  Suggested:
  - [BUG] Investigate: {description of what failed}
    - What happened: {failure description}
    - Recovery action: {what was done}
    - Root cause: TBD

  Create this issue? [Y]es / [N]o

- If pattern detected, create prevention issue:

  This failure pattern has occurred before. Create improvement issue?
  - [CHORE] Add validation to prevent {failure type}
  - [TEST] Add regression test for {scenario}

  Create these? [A]ll / [S]elect / [N]one

- After creating issues:

  Created {N} issues for tracking. What's next?
  1. Investigate root cause now (BUG-0024@abc123)
  2. Continue with original work (defer investigation)
  3. Review recovery actions
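The incident issue suggested in the first prompt can be filled in from the recovery record. A minimal Python sketch, assuming a hypothetical format_incident_issue helper; the field labels are taken from the prompt above, and how agent-ops-tasks actually stores issues is not specified here.

```python
def format_incident_issue(description: str, what_happened: str, recovery_action: str) -> str:
    """Render the suggested [BUG] issue text from the recovery details.

    The root cause is left as TBD, as in the prompt template, since it is
    investigated after recovery.
    """
    lines = [
        f"[BUG] Investigate: {description}",
        f"- What happened: {what_happened}",
        f"- Recovery action: {recovery_action}",
        "- Root cause: TBD",
    ]
    return "\n".join(lines)
```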