Skilllibrary autonomous-run-control
install
source · Clone the upstream repo
git clone https://github.com/merceralex397-collab/skilllibrary
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/merceralex397-collab/skilllibrary "$T" && mkdir -p ~/.claude/skills && cp -r "$T/05-agentic-orchestration-and-autonomy/autonomous-run-control" ~/.claude/skills/merceralex397-collab-skilllibrary-autonomous-run-control && rm -rf "$T"
manifest:
05-agentic-orchestration-and-autonomy/autonomous-run-control/SKILL.mdsource content
Purpose
Govern autonomous agent execution by defining start/stop policies, resource budgets, timeout thresholds, checkpoint frequency, and rollback triggers. Prevents runaway agents from consuming unbounded resources or producing unrecoverable state.
When to use
- An agent will execute autonomously for more than 60 seconds or multiple tool calls.
- Token or cost budgets must be enforced for an agent run.
- A long-running workflow needs periodic checkpoints so partial progress is recoverable.
- An agent has previously run away (exceeded time or token limits) and guardrails are needed.
- Rollback capability is required in case an autonomous run produces bad output.
Do NOT use when
- The task is a single synchronous command completing in under 30 seconds.
- The user is manually stepping through execution and controlling each action.
- The agent framework already provides built-in run control that is correctly configured.
Operating procedure
- Read the agent configuration file (e.g.,
,agents.yaml
, or equivalent) and extract current timeout, token limit, and checkpoint settings.AGENTS.md - Define a run budget table:
covering: wall-clock time, token count, tool-call count, and cost (USD).| Resource | Limit | Current Setting | Recommended | - Set wall-clock timeout: default 300 seconds for standard tasks, 900 seconds for multi-file refactors, 1800 seconds for full-repo operations.
- Set token budget: default 50,000 tokens for standard tasks, 150,000 for complex tasks. Include both input and output tokens.
- Set tool-call limit: default 30 calls for standard tasks, 100 for complex multi-step workflows.
- Define checkpoint frequency: emit a checkpoint every N tool calls (default N=10) or every M seconds (default M=120), whichever comes first.
- Write checkpoint format:
.{ "run_id": "<uuid>", "step": <n>, "timestamp": "<ISO-8601>", "state_snapshot": "<serialized>", "files_modified": ["<paths>"] } - Define runaway detection rules: trigger alert if any of these occur — (a) 3 consecutive identical tool calls, (b) token usage exceeds 80% of budget with <20% of task complete, (c) no new file modifications in last 10 tool calls despite active execution.
- Define graceful shutdown sequence: (a) complete current tool call, (b) write final checkpoint, (c) emit summary of progress and remaining work, (d) revert any uncommitted partial changes, (e) set run status to
.paused - Define rollback trigger conditions: (a) test suite failures exceed pre-run baseline, (b) linter errors increase, (c) agent explicitly reports unrecoverable error. On trigger: revert to last checkpoint via
or state restore.git checkout - Write all policies to a
or update the agent config with the new limits.run-control-policy.md
Decision rules
- Never allow an agent to run without at least a wall-clock timeout — infinite runs are always prohibited.
- If token usage exceeds 80% of budget, switch to summary-only mode: stop generating new code and produce a progress report instead.
- Checkpoint data must include the list of modified files so rollback can be surgical, not full-repo.
- Runaway detection must halt the agent, not just warn — warnings are insufficient for autonomous runs.
- If the agent is within 10% of any resource limit, emit a "budget warning" before the next tool call.
Output requirements
- Run Budget Table — resource limits with current vs. recommended values.
- Checkpoint Policy — frequency, format, and storage location for checkpoints.
- Runaway Detection Rules — enumerated conditions that trigger automatic halt.
- Graceful Shutdown Sequence — ordered steps for clean termination.
- Rollback Policy — conditions and mechanism for reverting to last-known-good state.
References
— detailed checkpoint format and storage patterns.references/checkpoint-rules.md
— escalation paths when run control detects anomalies.references/failure-escalation.md- Agent framework documentation for configuring native timeout and budget settings.
Related skills
— human interrupts interact with run-control pause/resume.human-interrupt-handling
— multi-agent checkpoints build on single-agent checkpoint format.collaboration-checkpoints
— run completion requires producing contracted artifacts.artifact-contracts
— state persistence across paused/resumed runs.workflow-state-memory
Failure handling
- Checkpoint write failure: If checkpoint cannot be persisted, halt the run immediately — do not continue without recovery capability.
- Budget exceeded silently: If the framework does not enforce budgets natively, wrap each tool call in a budget-check decorator and abort if exceeded.
- Rollback failure: If
or state restore fails, preserve the current state as-is, log all modified file paths, and escalate to human.git checkout - Runaway false positive: If the agent is halted but was making legitimate progress (e.g., repetitive but correct bulk operations), allow resume with a 2× budget increase and a note explaining the override.