Skilllibrary workflow-state-memory

install

source · Clone the upstream repo

git clone https://github.com/merceralex397-collab/skilllibrary

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/merceralex397-collab/skilllibrary "$T" && mkdir -p ~/.claude/skills && cp -r "$T/05-agentic-orchestration-and-autonomy/workflow-state-memory" ~/.claude/skills/merceralex397-collab-skilllibrary-workflow-state-memory && rm -rf "$T"

manifest: 05-agentic-orchestration-and-autonomy/workflow-state-memory/SKILL.md

source content

Purpose

Defines how to serialize, store, and reload workflow state so that agent sessions can be interrupted and resumed without losing progress — replacing reliance on conversational memory with explicit, file-backed checkpoints.

When to use

A workflow spans multiple sessions or may be interrupted mid-run.
The user says "save progress", "checkpoint", or "I'll continue later."
A long-running task (>10 steps) risks losing intermediate results if the session drops.
Multiple agents need to share state through a common file rather than passing context through prompts.

Do NOT use when

The entire task fits in one session and produces a single atomic output.
An external system (CI pipeline, database, issue tracker) already tracks the workflow state.
The workflow is stateless by design (each run is independent).
The user explicitly says state persistence is not needed.

Operating procedure

Identify state-worthy data. List every piece of intermediate data that would be lost if the session ended now. Typical candidates:
- Current stage index and stage name.
- Todo/task list with per-item status (pending, in_progress, done, failed).
- File paths that have been modified.
- Gate check results from
```
verification-before-advance
```
  .
- Research findings from
```
subagent-research-patterns
```
  .
Choose a storage location. Default to
```
.workflow-state/
```
in the repo root. If the repo has a
```
.gitignore
```
, check whether the directory is excluded; if not, add it. For non-repo contexts, use
```
/tmp/workflow-state/
```
.

Define the checkpoint schema. Create a JSON file named

<workflow-name>-state.json

with this structure:

{
  "workflow": "<name>",
  "version": 1,
  "created_at": "<ISO-8601>",
  "updated_at": "<ISO-8601>",
  "current_stage": "<stage-id>",
  "stages": [ { "id": "…", "status": "…", "evidence": "…" } ],
  "artifacts": [ "<path1>", "<path2>" ],
  "notes": "<free-text for context>"
}

Write the initial checkpoint. Run
```
bash
```
to create the directory and write the JSON file with current values. Validate the JSON with
```
python3 -m json.tool <file>
```
before proceeding.
Update the checkpoint after each stage. After completing a stage (and passing its gate if
```
verification-before-advance
```
is active), update the
```
current_stage
```
, the stage's
```
status
```
, and
```
updated_at
```
. Write the file atomically: write to a
```
.tmp
```
file first, then
```
mv
```
it over the original.
Resume from checkpoint. At session start, check for an existing state file:
```
ls .workflow-state/*-state.json
```
. If found, read it with
```
view
```
, parse the
```
current_stage
```
, and skip all stages marked
```
done
```
. Report to the user: "Resuming from stage N: <name>. Stages 1–(N-1) already done."
Handle version migration. If the checkpoint schema
```
version
```
does not match the expected version, run a migration function: read old fields, map them to new fields, bump the version, and write a new file. Keep the old file as
```
<name>-state.v<old>.json.bak
```
.
Clean up on completion. When all stages are done, move the state file to
```
.workflow-state/archive/
```
with a timestamp suffix. Do not delete it — it serves as an audit trail.

Decision rules

Always write state to a file, never rely on SQL session tables alone (those are ephemeral to the CLI session, not the workflow).
Use JSON for state files — not YAML, not plain text — to enable programmatic parsing on resume.
If two sessions attempt to resume the same workflow, compare
```
updated_at
```
timestamps; the newer one wins. Archive the older state file with a
```
conflict-
```
prefix.
Checkpoint frequency: after every stage completion by default. The user may lower this to every N stages for fast workflows.
Never store secrets, tokens, or credentials in the state file.

Output requirements

State File — a valid JSON checkpoint at the defined storage location.
Resume Report — on session start, a brief summary of what was recovered: stages done, current stage, any conflicts resolved.
Archive Entry — on workflow completion, the final state file moved to the archive directory.

References

```
references/checkpoint-rules.md
```
— canonical checkpoint format and rules.
```
references/delegate-contracts.md
```
— how sub-agents reference shared state.

Related skills

```
verification-before-advance
```
— gate checks that update stage status.
```
subagent-research-patterns
```
— research results stored as state artifacts.
```
swarm-patterns
```
— shared blackboard state for multi-agent coordination.
```
session-resume-rehydration
```
— higher-level session resume orchestration.

Failure handling

State file is corrupted (invalid JSON): Attempt to recover by parsing with
```
python3 -c "import json; …"
```
in lenient mode. If unrecoverable, rename the file with a
```
.corrupt
```
suffix, start a fresh state, and warn the user about lost progress.
Storage location is read-only: Fall back to
```
/tmp/workflow-state/
```
and warn the user that state will not survive a machine restart.
Conflicting concurrent writes: Use the
```
updated_at
```
comparison rule. If timestamps are identical (rare), prompt the user to choose which state to keep.
Missing state file on resume: Treat as a fresh start. Do not error — log "No prior state found, beginning from stage 1."