Skilllibrary workflow-state-memory
install
source · Clone the upstream repo
git clone https://github.com/merceralex397-collab/skilllibrary
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/merceralex397-collab/skilllibrary "$T" && mkdir -p ~/.claude/skills && cp -r "$T/05-agentic-orchestration-and-autonomy/workflow-state-memory" ~/.claude/skills/merceralex397-collab-skilllibrary-workflow-state-memory && rm -rf "$T"
manifest: 05-agentic-orchestration-and-autonomy/workflow-state-memory/SKILL.md
Purpose
Defines how to serialize, store, and reload workflow state so that agent sessions can be interrupted and resumed without losing progress — replacing reliance on conversational memory with explicit, file-backed checkpoints.
When to use
- A workflow spans multiple sessions or may be interrupted mid-run.
- The user says "save progress", "checkpoint", or "I'll continue later."
- A long-running task (>10 steps) risks losing intermediate results if the session drops.
- Multiple agents need to share state through a common file rather than passing context through prompts.
Do NOT use when
- The entire task fits in one session and produces a single atomic output.
- An external system (CI pipeline, database, issue tracker) already tracks the workflow state.
- The workflow is stateless by design (each run is independent).
- The user explicitly says state persistence is not needed.
Operating procedure
- Identify state-worthy data. List every piece of intermediate data that would be lost if the session ended now. Typical candidates:
  - Current stage index and stage name.
  - Todo/task list with per-item status (pending, in_progress, done, failed).
  - File paths that have been modified.
  - Gate check results from `verification-before-advance`.
  - Research findings from `subagent-research-patterns`.
- Choose a storage location. Default to `.workflow-state/` in the repo root. If the repo has a `.gitignore`, check whether the directory is excluded; if not, add it. For non-repo contexts, use `/tmp/workflow-state/`.
- Define the checkpoint schema. Create a JSON file named `<workflow-name>-state.json` with this structure:

      {
        "workflow": "<name>",
        "version": 1,
        "created_at": "<ISO-8601>",
        "updated_at": "<ISO-8601>",
        "current_stage": "<stage-id>",
        "stages": [
          { "id": "…", "status": "…", "evidence": "…" }
        ],
        "artifacts": [ "<path1>", "<path2>" ],
        "notes": "<free-text for context>"
      }

- Write the initial checkpoint. Run `bash` to create the directory and write the JSON file with current values. Validate the JSON with `python3 -m json.tool <file>` before proceeding.
- Update the checkpoint after each stage. After completing a stage (and passing its gate if `verification-before-advance` is active), update the `current_stage`, the stage's `status`, and `updated_at`. Write the file atomically: write to a `.tmp` file first, then `mv` it over the original.
- Resume from checkpoint. At session start, check for an existing state file: `ls .workflow-state/*-state.json`. If found, read it with `view`, parse the `current_stage`, and skip all stages marked `done`. Report to the user: "Resuming from stage N: <name>. Stages 1–(N-1) already done."
- Handle version migration. If the checkpoint schema `version` does not match the expected version, run a migration function: read old fields, map them to new fields, bump the version, and write a new file. Keep the old file as `<name>-state.v<old>.json.bak`.
- Clean up on completion. When all stages are done, move the state file to `.workflow-state/archive/` with a timestamp suffix. Do not delete it; it serves as an audit trail.
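The write, update, and resume steps above can be sketched in Python roughly as follows. This is a minimal illustration, not the skill's own implementation: the function names (`new_state`, `write_state_atomic`, `complete_stage`, `resume`) are hypothetical, and it assumes stages run in the order they appear in `stages`.

```python
import json
import os
from datetime import datetime, timezone
from pathlib import Path


def now_iso() -> str:
    """Current UTC time as an ISO-8601 string."""
    return datetime.now(timezone.utc).isoformat()


def new_state(workflow: str, stage_ids: list) -> dict:
    """Build an initial checkpoint following the schema described above."""
    ts = now_iso()
    return {
        "workflow": workflow,
        "version": 1,
        "created_at": ts,
        "updated_at": ts,
        "current_stage": stage_ids[0],
        "stages": [{"id": s, "status": "pending", "evidence": ""} for s in stage_ids],
        "artifacts": [],
        "notes": "",
    }


def write_state_atomic(path: Path, state: dict) -> None:
    """Write to a sibling .tmp file, then rename it over the original."""
    path.parent.mkdir(parents=True, exist_ok=True)
    state["updated_at"] = now_iso()
    tmp_path = path.with_name(path.name + ".tmp")
    tmp_path.write_text(json.dumps(state, indent=2), encoding="utf-8")
    os.replace(tmp_path, path)  # same-directory rename, the Python analogue of `mv`


def complete_stage(path: Path, stage_id: str, evidence: str) -> dict:
    """Mark one stage done, advance current_stage if possible, and checkpoint."""
    state = json.loads(path.read_text(encoding="utf-8"))
    ids = [s["id"] for s in state["stages"]]
    idx = ids.index(stage_id)
    state["stages"][idx]["status"] = "done"
    state["stages"][idx]["evidence"] = evidence
    if idx + 1 < len(ids):  # on the last stage, current_stage is left unchanged
        state["current_stage"] = ids[idx + 1]
    write_state_atomic(path, state)
    return state


def resume(path: Path):
    """Return (report message, ids of stages that still need to run)."""
    if not path.exists():
        return "No prior state found, beginning from stage 1.", []
    state = json.loads(path.read_text(encoding="utf-8"))
    ids = [s["id"] for s in state["stages"]]
    n = ids.index(state["current_stage"]) + 1
    remaining = [s["id"] for s in state["stages"] if s["status"] != "done"]
    if n == 1:
        return f"Resuming from stage 1: {state['current_stage']}.", remaining
    return (
        f"Resuming from stage {n}: {state['current_stage']}. "
        f"Stages 1–{n - 1} already done.",
        remaining,
    )
```

Writing the temporary file in the same directory as the target matters here: a rename is only atomic when source and destination are on the same filesystem.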
Decision rules
- Always write state to a file; never rely on SQL session tables alone (they are scoped to the CLI session, not the workflow, and disappear with it).
- Use JSON for state files — not YAML, not plain text — to enable programmatic parsing on resume.
- If two sessions attempt to resume the same workflow, compare `updated_at` timestamps; the newer one wins. Archive the older state file with a `conflict-` prefix.
- Checkpoint frequency: after every stage completion by default. The user may lower this to every N stages for fast workflows.
- Never store secrets, tokens, or credentials in the state file.
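As an illustration of the `updated_at` comparison rule, here is a minimal Python sketch. It assumes each session's candidate state is available as a separate file, that both timestamps carry a timezone offset, and that the losing file is moved into an archive directory chosen by the caller. The function names `parse_ts` and `resolve_conflict` are hypothetical.

```python
import json
import shutil
from datetime import datetime
from pathlib import Path


def parse_ts(value: str) -> datetime:
    """Parse an ISO-8601 timestamp, accepting a trailing 'Z' for UTC."""
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def resolve_conflict(a: Path, b: Path, archive_dir: Path) -> Path:
    """Return the newer state file; archive the older one with a conflict- prefix."""
    ts_a = parse_ts(json.loads(a.read_text(encoding="utf-8"))["updated_at"])
    ts_b = parse_ts(json.loads(b.read_text(encoding="utf-8"))["updated_at"])
    if ts_a == ts_b:
        # Identical timestamps cannot be resolved automatically.
        raise ValueError("identical updated_at; ask the user which state to keep")
    winner, loser = (a, b) if ts_a > ts_b else (b, a)
    archive_dir.mkdir(parents=True, exist_ok=True)
    shutil.move(str(loser), str(archive_dir / f"conflict-{loser.name}"))
    return winner
```

Raising on equal timestamps leaves the choice to the caller, which can then prompt the user instead of silently picking a side.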
Output requirements
- State File — a valid JSON checkpoint at the defined storage location.
- Resume Report — on session start, a brief summary of what was recovered: stages done, current stage, any conflicts resolved.
- Archive Entry — on workflow completion, the final state file moved to the archive directory.
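The archive entry could be produced with something like the sketch below. The helper name `archive_state` and the compact UTC timestamp format are assumptions for illustration; the skill only requires that the file be moved to the archive directory with a timestamp suffix and not deleted.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path


def archive_state(state_file: Path) -> Path:
    """Move a finished state file into an archive/ directory next to it."""
    archive_dir = state_file.parent / "archive"
    archive_dir.mkdir(exist_ok=True)
    # Assumed suffix format: compact UTC timestamp, e.g. 20240101T120000Z.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = archive_dir / f"{state_file.stem}-{stamp}{state_file.suffix}"
    shutil.move(str(state_file), str(target))
    return target
```

For a file at `.workflow-state/demo-state.json`, this yields a path of the form `.workflow-state/archive/demo-state-<timestamp>.json`.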
References
- `references/checkpoint-rules.md`: canonical checkpoint format and rules.
- `references/delegate-contracts.md`: how sub-agents reference shared state.
Related skills
- `verification-before-advance`: gate checks that update stage status.
- `subagent-research-patterns`: research results stored as state artifacts.
- `swarm-patterns`: shared blackboard state for multi-agent coordination.
- `session-resume-rehydration`: higher-level session resume orchestration.
Failure handling
- State file is corrupted (invalid JSON): Attempt to recover by parsing with `python3 -c "import json; …"` in lenient mode. If unrecoverable, rename the file with a `.corrupt` suffix, start a fresh state, and warn the user about lost progress.
- Storage location is read-only: Fall back to `/tmp/workflow-state/` and warn the user that state will not survive a machine restart.
- Conflicting concurrent writes: Use the `updated_at` comparison rule. If timestamps are identical (rare), prompt the user to choose which state to keep.
- Missing state file on resume: Treat as a fresh start. Do not error; log "No prior state found, beginning from stage 1."
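A rough Python sketch of three of these failure paths (missing file, unrecoverable corruption, and an unwritable storage location) might look like this. The function names `load_state` and `pick_state_dir` are hypothetical, and the lenient-recovery attempt is deliberately left out because the skill does not spell it out.

```python
import json
import os
from pathlib import Path
from typing import Optional, Tuple


def load_state(path: Path) -> Tuple[Optional[dict], str]:
    """Return (state or None, message for the user)."""
    if not path.exists():
        return None, "No prior state found, beginning from stage 1."
    try:
        return json.loads(path.read_text(encoding="utf-8")), "State loaded."
    except ValueError:  # covers json.JSONDecodeError and bad encodings
        corrupt = path.with_name(path.name + ".corrupt")
        os.replace(path, corrupt)  # keep the damaged file for inspection
        return None, (
            f"State file was not valid JSON; moved to {corrupt.name}. "
            "Starting fresh; earlier progress is lost."
        )


def pick_state_dir(
    preferred: Path, fallback: Path = Path("/tmp/workflow-state")
) -> Tuple[Path, Optional[str]]:
    """Use the preferred directory if it is writable, otherwise the fallback."""
    try:
        preferred.mkdir(parents=True, exist_ok=True)
        probe = preferred / ".write-probe"
        probe.write_text("", encoding="utf-8")  # confirm we can actually write
        probe.unlink()
        return preferred, None
    except OSError:
        fallback.mkdir(parents=True, exist_ok=True)
        return fallback, (
            f"Cannot write to {preferred}; using {fallback}. "
            "State will not survive a machine restart."
        )
```

Probing with a real write, rather than checking permission bits, also catches read-only mounts and similar cases where the directory exists but cannot be modified.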