Skilllibrary human-interrupt-handling
install
source · Clone the upstream repo
git clone https://github.com/merceralex397-collab/skilllibrary
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/merceralex397-collab/skilllibrary "$T" && mkdir -p ~/.claude/skills && cp -r "$T/05-agentic-orchestration-and-autonomy/human-interrupt-handling" ~/.claude/skills/merceralex397-collab-skilllibrary-human-interrupt-handling && rm -rf "$T"
manifest:
05-agentic-orchestration-and-autonomy/human-interrupt-handling/SKILL.md
Purpose
Define when and how an autonomous agent should pause execution to request human input, how to preserve context so the human has full situational awareness, and how to resume execution cleanly after receiving a response. Prevents agents from making irreversible decisions without appropriate human oversight.
When to use
- An agent encounters a decision that exceeds its delegated authority (e.g., deleting files, modifying production config).
- Confidence in the next action is below a defined threshold and human judgment is needed.
- The agent detects ambiguity that cannot be resolved from available context.
- A destructive or irreversible operation requires explicit human approval.
- The agent has been running autonomously and a scheduled human review point is reached.
Do NOT use when
- The agent has all information and authority needed to proceed without human input.
- The task scope was pre-approved and the agent is operating within those bounds.
- The agent framework already provides a built-in interrupt mechanism that is correctly configured.
- The question can be answered by reading existing documentation or code.
Operating procedure
- Define interrupt severity levels in a table with these columns: | Level | Name | Trigger Condition | Max Wait Time | Fallback Action |
- Level 1 (Info): non-blocking notification, agent continues. Wait: 0s. Fallback: log and proceed.
- Level 2 (Query): agent pauses current branch but continues other work. Wait: 300s. Fallback: pick safest option.
- Level 3 (Approval): agent fully pauses, no further actions. Wait: 600s. Fallback: abort and rollback.
- Level 4 (Emergency): agent halts all work and alerts immediately. Wait: 1800s. Fallback: full rollback.
- When an interrupt condition is detected, classify it into the appropriate severity level.
- Create a context snapshot containing: { "run_id", "current_step", "files_modified", "pending_actions", "decision_needed", "options_considered", "confidence_per_option", "relevant_code_snippets" }
- Format the human-facing interrupt message: state the decision needed in one sentence, list 2–4 options with pros/cons, highlight the recommended option and its confidence score, and include the context snapshot as a collapsible detail block.
- Write the interrupt to the designated channel: inline prompt (for interactive sessions), GitHub issue comment (for async), or webhook (for external systems).
- Start a response timer matching the severity level's max wait time.
- While waiting: if severity ≤ 2, continue executing non-dependent work branches. If severity ≥ 3, write a checkpoint and halt all execution.
- On human response: parse the response, validate it matches one of the presented options (or is a free-form override), and log the decision with timestamp and author.
- Resume protocol: (a) reload the checkpoint or current state, (b) apply the human's decision, (c) verify the working tree is consistent (run git status, check for conflicts), (d) continue execution from the interrupted step.
- On timeout (no human response within max wait): execute the fallback action defined for that severity level. Log that the fallback was used.
- After the run completes, emit an interrupt summary: count of interrupts by level, average response time, timeouts triggered, decisions made.
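The severity levels and the timer/fallback steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the names Severity, LevelPolicy, POLICIES, must_halt, and await_decision are hypothetical, and the poll callback stands in for whatever channel actually delivers the human's answer. Only the wait times and fallback labels come from the procedure itself.

```python
import time
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable, Optional, Tuple


class Severity(IntEnum):
    INFO = 1
    QUERY = 2
    APPROVAL = 3
    EMERGENCY = 4


@dataclass(frozen=True)
class LevelPolicy:
    max_wait_s: float
    fallback: str


# Wait times and fallbacks copied from the severity levels in the procedure.
POLICIES = {
    Severity.INFO: LevelPolicy(0, "log and proceed"),
    Severity.QUERY: LevelPolicy(300, "pick safest option"),
    Severity.APPROVAL: LevelPolicy(600, "abort and rollback"),
    Severity.EMERGENCY: LevelPolicy(1800, "full rollback"),
}


def must_halt(severity: Severity) -> bool:
    """Severity >= 3 halts all execution; <= 2 may continue non-dependent work."""
    return severity >= Severity.APPROVAL


def await_decision(
    severity: Severity,
    poll: Callable[[], Optional[str]],  # hypothetical: returns None until a human answers
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
    interval_s: float = 1.0,
) -> Tuple[str, bool]:
    """Poll until the level's max wait elapses.

    Returns (decision, used_fallback). On timeout the decision is the
    fallback label for that level, and the caller should log that the
    fallback was used.
    """
    policy = POLICIES[severity]
    deadline = clock() + policy.max_wait_s
    while clock() < deadline:
        response = poll()
        if response is not None:
            return response, False
        sleep(interval_s)
    return policy.fallback, True
```

Passing clock and sleep as parameters is a design choice for testability: a run can be simulated without actually waiting 300 or 1800 seconds.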
Decision rules
- Any operation that deletes files, modifies environment variables, or changes production configuration is minimum Level 3.
- If the agent's confidence in all available options is below 50%, escalate to Level 3 regardless of the operation type.
- Never execute a Level 3+ fallback that involves destructive actions — prefer abort over guess.
- Human responses override agent recommendations without exception, even if the agent disagrees.
- If the same interrupt type fires 3+ times in one run, batch remaining instances into a single interrupt with a summary table.
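The escalation and batching rules above might be encoded as below. This is a sketch under stated assumptions: the operation labels in DESTRUCTIVE_KINDS are invented for illustration, confidences are assumed to be fractions between 0 and 1, and should_batch takes one possible reading of the batching rule, namely that the third and later occurrences of a type are batched.

```python
from collections import Counter
from typing import Sequence

# Hypothetical labels for the operations the rules name as minimum Level 3.
DESTRUCTIVE_KINDS = {"delete_file", "modify_env_var", "change_prod_config"}


def classify(operation_kind: str, confidences: Sequence[float], base_level: int = 1) -> int:
    """Return the severity level after applying the minimum-level rules."""
    level = base_level
    # Deleting files, changing env vars, or changing production config: at least Level 3.
    if operation_kind in DESTRUCTIVE_KINDS:
        level = max(level, 3)
    # Every available option is below 50% confidence: escalate to Level 3.
    if confidences and max(confidences) < 0.5:
        level = max(level, 3)
    return level


def should_batch(interrupt_type: str, fired: Counter) -> bool:
    """Record one firing and report whether it belongs in a batched interrupt.

    Assumption: once a type has fired 3 times in a run, that occurrence and
    all later ones are collected into a single summary interrupt.
    """
    fired[interrupt_type] += 1
    return fired[interrupt_type] >= 3
```

For example, classify("rename_variable", [0.4, 0.3]) escalates to Level 3 purely because no option reaches 50% confidence.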
Output requirements
- Interrupt Severity Table — levels, triggers, wait times, and fallback actions.
- Context Snapshot — structured JSON with all state needed for human to make an informed decision.
- Interrupt Message — formatted question with options, pros/cons, and recommendation.
- Decision Log — timestamped record of each interrupt, human response, and outcome.
- Interrupt Summary — end-of-run statistics on interrupt count, response times, and timeouts.
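As a rough illustration of three of these outputs (context snapshot, interrupt message, and interrupt summary), the sketch below uses only the field names listed in the operating procedure. The function names, the log-entry keys (level, response_time_s, decision), and the use of an HTML details element for the collapsible block are assumptions made for this example.

```python
import json
from collections import Counter
from statistics import mean

SNAPSHOT_FIELDS = (
    "run_id", "current_step", "files_modified", "pending_actions",
    "decision_needed", "options_considered", "confidence_per_option",
    "relevant_code_snippets",
)


def build_snapshot(**state) -> str:
    """Serialize the snapshot as JSON, refusing to omit any required field."""
    missing = [f for f in SNAPSHOT_FIELDS if f not in state]
    if missing:
        raise ValueError(f"snapshot is missing fields: {missing}")
    return json.dumps({f: state[f] for f in SNAPSHOT_FIELDS}, indent=2)


def format_message(decision: str, options: dict, confidence: dict, snapshot_json: str) -> str:
    """Render the human-facing message.

    options maps an option name to a (pros, cons) pair; confidence maps the
    same names to a fraction between 0 and 1.
    """
    if not 2 <= len(options) <= 4:
        raise ValueError("present between 2 and 4 options")
    recommended = max(confidence, key=confidence.get)
    lines = [f"Decision needed: {decision}"]
    for name, (pros, cons) in options.items():
        tag = f" (recommended, confidence {confidence[name]:.0%})" if name == recommended else ""
        lines.append(f"- {name}{tag}: pros: {pros}; cons: {cons}")
    # Assumption: the channel renders HTML <details> as a collapsible block.
    lines += ["<details><summary>Context snapshot</summary>", snapshot_json, "</details>"]
    return "\n".join(lines)


def summarize(log: list) -> dict:
    """End-of-run statistics; response_time_s is None for timed-out interrupts."""
    answered = [e["response_time_s"] for e in log if e["response_time_s"] is not None]
    return {
        "count_by_level": dict(Counter(e["level"] for e in log)),
        "average_response_time_s": mean(answered) if answered else None,
        "timeouts": len(log) - len(answered),
        "decisions": [e["decision"] for e in log],
    }
```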
References
- references/failure-escalation.md: escalation paths and severity classification.
- references/checkpoint-rules.md: checkpoint format used for context preservation during interrupts.
- references/delegate-contracts.md: delegation boundaries that define when interrupts are needed.
Related skills
- autonomous-run-control: run-control pause/resume interacts with interrupt handling.
- collaboration-checkpoints: multi-agent checkpoints may trigger synchronized interrupts.
- goal-decomposition: ambiguous goals may require human interrupt before decomposition begins.
- verification-before-advance: failed verification may escalate to human interrupt.
Failure handling
- Interrupt delivery failure: If the interrupt message cannot be delivered (channel unavailable), fall back to the next available channel. If all channels fail, halt the run and write the interrupt to a local interrupts.log file.
- Malformed human response: If the response does not match any presented option and cannot be parsed as a free-form override, re-send the interrupt with a clarification request (once). On second failure, execute the safest fallback.
- Context snapshot too large: If the snapshot exceeds 5000 tokens, summarize to key facts and link to full state. Never truncate without a summary.
- Resume state corruption: If the working tree has changed between interrupt and resume (e.g., another agent modified files), run conflict detection before resuming. If conflicts exist, trigger a collaboration checkpoint.
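The first two failure cases above (delivery failure and malformed response) could be handled along these lines. This is a sketch only: ChannelUnavailable, deliver, parse_response, and resolve are hypothetical names, each channel is assumed to be a callable that raises ChannelUnavailable when it cannot send, and the "override:" prefix is an invented convention for recognizing a free-form override.

```python
from typing import Callable, List, Optional, Sequence, Tuple


class ChannelUnavailable(Exception):
    """Raised by a (hypothetical) channel sender that cannot deliver."""


def deliver(
    message: str,
    channels: Sequence[Tuple[str, Callable[[str], None]]],
    log_path: str = "interrupts.log",
) -> Optional[str]:
    """Try channels in order and return the name of the one that worked.

    If every channel fails, append the interrupt to the local log and
    return None; the caller is then expected to halt the run.
    """
    for name, send in channels:
        try:
            send(message)
            return name
        except ChannelUnavailable:
            continue
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(message + "\n")
    return None


def parse_response(text: str, options: List[str], override_prefix: str = "override:"):
    """Return ("option", name), ("override", text), or None if malformed."""
    cleaned = text.strip()
    for opt in options:
        if cleaned.lower() == opt.lower():
            return ("option", opt)
    if cleaned.lower().startswith(override_prefix):
        body = cleaned[len(override_prefix):].strip()
        if body:
            return ("override", body)
    return None


def resolve(responses: List[str], options: List[str], safest: str):
    """Accept the first valid reply; allow one clarification retry only.

    responses holds the original reply followed by the reply to the
    clarification request. A second malformed reply yields the safest fallback.
    """
    for text in responses[:2]:
        parsed = parse_response(text, options)
        if parsed is not None:
            return parsed
    return ("fallback", safest)
```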