Hive hive.error-recovery
Follow a structured recovery decision tree when tool calls fail instead of blindly retrying or giving up.
install
source · Clone the upstream repo
git clone https://github.com/aden-hive/hive
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/aden-hive/hive "$T" && mkdir -p ~/.claude/skills && cp -r "$T/core/framework/skills/_default_skills/error-recovery" ~/.claude/skills/aden-hive-hive-hive-error-recovery && rm -rf "$T"
manifest:
core/framework/skills/_default_skills/error-recovery/SKILL.md
source content
Operational Protocol: Error Recovery
When a tool call fails:
- Diagnose: classify the failure as transient (network blip, rate limit, timeout) or structural (wrong selector, missing auth, invalid schema, permission denied).
- Decide:
  - Transient → retry once.
  - Structural + fixable → fix the input and retry.
  - Structural + unfixable → record the failure and move to the next item.
  - Blocking all progress → escalate.
- Adapt: if the same tool has failed {{max_retries_per_tool}}+ times in a row, stop using it and find an alternative approach.
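The three steps above can be sketched as a small pure function. This is an illustrative sketch, not code from the hive repository: the `Failure` fields, the `Action` names, and the order in which the checks run (blocking first, then the consecutive-failure limit, then the transient retry) are all assumptions. The threshold is passed in by the caller, standing in for the `{{max_retries_per_tool}}` template value.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    RETRY = "retry"
    FIX_AND_RETRY = "fix_and_retry"
    RECORD_AND_SKIP = "record_and_skip"
    ESCALATE = "escalate"
    SWITCH_APPROACH = "switch_approach"


@dataclass
class Failure:
    # Hypothetical fields; the skill text does not define a data model.
    transient: bool                 # network blip, rate limit, timeout
    fixable: bool = False           # structural, but the input can be corrected
    blocking: bool = False          # no other work can proceed
    consecutive_failures: int = 1   # failures of this tool in a row, this one included
    already_retried: bool = False   # the single transient retry has been used


def decide(failure: Failure, max_retries_per_tool: int) -> Action:
    """Map a diagnosed failure to the next action (check order is an assumption)."""
    if failure.blocking:
        return Action.ESCALATE
    # Adapt: stop using a tool that keeps failing.
    if failure.consecutive_failures >= max_retries_per_tool:
        return Action.SWITCH_APPROACH
    # Transient failures get exactly one retry.
    if failure.transient and not failure.already_retried:
        return Action.RETRY
    if failure.fixable:
        return Action.FIX_AND_RETRY
    # Unfixable (or a transient failure that survived its retry): record and move on.
    return Action.RECORD_AND_SKIP
```

Treating a transient failure that has already been retried as "record and move on" is one reading of "retry once"; the skill text does not say what happens after the retry fails.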
Never silently drop a failed item. If the item is a task in the colony queue, write the failure to the DB instead of an in-memory buffer:
sqlite3 "$DB_PATH" "UPDATE tasks SET status='failed', last_error='<one-sentence reason>', completed_at=datetime('now'), updated_at=datetime('now') WHERE id='<task-id>' AND worker_id='<your-worker-id>';"
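If the worker runs Python, the same update can be issued through the standard-library `sqlite3` module with bound parameters, which avoids quoting problems when the error reason contains an apostrophe. A minimal sketch, assuming the `tasks` columns named in the command above; the function name `mark_task_failed` is hypothetical.

```python
import sqlite3


def mark_task_failed(
    conn: sqlite3.Connection, task_id: str, worker_id: str, reason: str
) -> bool:
    """Record a failed task; return True if exactly one row was updated."""
    with conn:  # commits on success, rolls back if the statement raises
        cur = conn.execute(
            "UPDATE tasks "
            "SET status = 'failed', last_error = ?, "
            "completed_at = datetime('now'), updated_at = datetime('now') "
            "WHERE id = ? AND worker_id = ?",
            (reason, task_id, worker_id),
        )
    # 0 rows means the task id was not found or is owned by another worker.
    return cur.rowcount == 1
```

A caller would open the connection with `sqlite3.connect(db_path)`, where `db_path` holds the same location as `$DB_PATH`, and treat a `False` return as a sign that the task is no longer claimed by this worker.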
The tasks.retry_count column and the stale-claim reclaimer handle auto-retry for crashes; your job is the within-run decision tree above. See hive.colony-progress-tracker for the full queue protocol.