Hive hive.error-recovery

Follow a structured recovery decision tree when tool calls fail instead of blindly retrying or giving up.

install

source · Clone the upstream repo

git clone https://github.com/aden-hive/hive

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/aden-hive/hive "$T" && mkdir -p ~/.claude/skills && cp -r "$T/core/framework/skills/_default_skills/error-recovery" ~/.claude/skills/aden-hive-hive-hive-error-recovery && rm -rf "$T"

manifest: core/framework/skills/_default_skills/error-recovery/SKILL.md

source content

Operational Protocol: Error Recovery

When a tool call fails:

Diagnose — classify the failure as transient (network blip, rate limit, timeout) or structural (wrong selector, missing auth, invalid schema, permission denied).
Decide:
- Transient → retry once.
- Structural + fixable → fix the input and retry.
- Structural + unfixable → record the failure and move to the next item.
- Blocking all progress → escalate.
Adapt — if the same tool has failed {{max_retries_per_tool}}+ times in a row, stop using it and find an alternative approach.

Never silently drop a failed item. If the item is a task in the colony queue, write the failure to the DB instead of an in-memory buffer:

sqlite3 "$DB_PATH" "UPDATE tasks SET status='failed', last_error='<one-sentence reason>', completed_at=datetime('now'), updated_at=datetime('now') WHERE id='<task-id>' AND worker_id='<your-worker-id>';"

The

tasks.retry_count

column and the stale-claim reclaimer handle auto-retry for crashes; your job is the within-run decision tree above. See

hive.colony-progress-tracker

for the full queue protocol.