Skilllibrary autonomous-run-control

install

source · Clone the upstream repo

git clone https://github.com/merceralex397-collab/skilllibrary

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/merceralex397-collab/skilllibrary "$T" && mkdir -p ~/.claude/skills && cp -r "$T/05-agentic-orchestration-and-autonomy/autonomous-run-control" ~/.claude/skills/merceralex397-collab-skilllibrary-autonomous-run-control && rm -rf "$T"

manifest: 05-agentic-orchestration-and-autonomy/autonomous-run-control/SKILL.md

source content

Purpose

Govern autonomous agent execution by defining start/stop policies, resource budgets, timeout thresholds, checkpoint frequency, and rollback triggers. Prevents runaway agents from consuming unbounded resources or producing unrecoverable state.

When to use

An agent will execute autonomously for more than 60 seconds or multiple tool calls.
Token or cost budgets must be enforced for an agent run.
A long-running workflow needs periodic checkpoints so partial progress is recoverable.
An agent has previously run away (exceeded time or token limits) and guardrails are needed.
Rollback capability is required in case an autonomous run produces bad output.

Do NOT use when

The task is a single synchronous command completing in under 30 seconds.
The user is manually stepping through execution and controlling each action.
The agent framework already provides built-in run control that is correctly configured.

Operating procedure

Read the agent configuration file (e.g.,
```
agents.yaml
```
,
```
AGENTS.md
```
, or equivalent) and extract current timeout, token limit, and checkpoint settings.
Define a run budget table:
```
| Resource | Limit | Current Setting | Recommended |
```
covering: wall-clock time, token count, tool-call count, and cost (USD).
Set wall-clock timeout: default 300 seconds for standard tasks, 900 seconds for multi-file refactors, 1800 seconds for full-repo operations.
Set token budget: default 50,000 tokens for standard tasks, 150,000 for complex tasks. Include both input and output tokens.
Set tool-call limit: default 30 calls for standard tasks, 100 for complex multi-step workflows.
Define checkpoint frequency: emit a checkpoint every N tool calls (default N=10) or every M seconds (default M=120), whichever comes first.

Write checkpoint format:

{ "run_id": "<uuid>", "step": <n>, "timestamp": "<ISO-8601>", "state_snapshot": "<serialized>", "files_modified": ["<paths>"] }

Define runaway detection rules: trigger alert if any of these occur — (a) 3 consecutive identical tool calls, (b) token usage exceeds 80% of budget with <20% of task complete, (c) no new file modifications in last 10 tool calls despite active execution.
Define graceful shutdown sequence: (a) complete current tool call, (b) write final checkpoint, (c) emit summary of progress and remaining work, (d) revert any uncommitted partial changes, (e) set run status to
```
paused
```
.
Define rollback trigger conditions: (a) test suite failures exceed pre-run baseline, (b) linter errors increase, (c) agent explicitly reports unrecoverable error. On trigger: revert to last checkpoint via
```
git checkout
```
or state restore.
Write all policies to a
```
run-control-policy.md
```
or update the agent config with the new limits.

Decision rules

Never allow an agent to run without at least a wall-clock timeout — infinite runs are always prohibited.
If token usage exceeds 80% of budget, switch to summary-only mode: stop generating new code and produce a progress report instead.
Checkpoint data must include the list of modified files so rollback can be surgical, not full-repo.
Runaway detection must halt the agent, not just warn — warnings are insufficient for autonomous runs.
If the agent is within 10% of any resource limit, emit a "budget warning" before the next tool call.

Output requirements

Run Budget Table — resource limits with current vs. recommended values.
Checkpoint Policy — frequency, format, and storage location for checkpoints.
Runaway Detection Rules — enumerated conditions that trigger automatic halt.
Graceful Shutdown Sequence — ordered steps for clean termination.
Rollback Policy — conditions and mechanism for reverting to last-known-good state.

References

```
references/checkpoint-rules.md
```
— detailed checkpoint format and storage patterns.
```
references/failure-escalation.md
```
— escalation paths when run control detects anomalies.
Agent framework documentation for configuring native timeout and budget settings.

Related skills

```
human-interrupt-handling
```
— human interrupts interact with run-control pause/resume.
```
collaboration-checkpoints
```
— multi-agent checkpoints build on single-agent checkpoint format.
```
artifact-contracts
```
— run completion requires producing contracted artifacts.
```
workflow-state-memory
```
— state persistence across paused/resumed runs.

Failure handling

Checkpoint write failure: If checkpoint cannot be persisted, halt the run immediately — do not continue without recovery capability.
Budget exceeded silently: If the framework does not enforce budgets natively, wrap each tool call in a budget-check decorator and abort if exceeded.
Rollback failure: If
```
git checkout
```
or state restore fails, preserve the current state as-is, log all modified file paths, and escalate to human.
Runaway false positive: If the agent is halted but was making legitimate progress (e.g., repetitive but correct bulk operations), allow resume with a 2× budget increase and a note explaining the override.