Session-orchestrator vault-sync
Validates vault frontmatter (Zod schema) and wiki-link integrity. Phase 1: hard gate at session-end. Used by session-end skill to block close on invalid vault state.
git clone https://github.com/Kanevry/session-orchestrator
T=$(mktemp -d) && git clone --depth=1 https://github.com/Kanevry/session-orchestrator "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/vault-sync" ~/.claude/skills/kanevry-session-orchestrator-vault-sync && rm -rf "$T"
Vault Sync Skill
Status
STATUS: PHASE 1 IMPLEMENTED (2026-04-13). Session-End hard gate (section 3.1) operational. Phase 2 (wave-executor incremental, 3.2) and Phase 3 (evolve advisory, 3.3) not yet implemented.
Implementation
Phase 1 ships a self-contained validator that reads every `.md` file under `VAULT_DIR`, parses YAML frontmatter, validates against the canonical `vaultFrontmatterSchema`, and flags dangling wiki-links as warnings.
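The first step of that pipeline, separating frontmatter from the note body, could look roughly like the sketch below. The function name `splitFrontmatter` and its return shape are hypothetical; the real `validator.mjs` may do this differently, and parsing the YAML itself is left to a library such as `yaml`.

```javascript
// Sketch: separate a leading YAML frontmatter block from the markdown body.
// Hypothetical helper, not taken from validator.mjs.
// Returns null when the file does not start with a "---" delimited block,
// which a validator could count as "skipped" rather than as an error.
function splitFrontmatter(text) {
  const match = /^---\r?\n([\s\S]*?)\r?\n---[ \t]*(?:\r?\n|$)/.exec(text);
  if (match === null) return null;
  return {
    raw: match[1], // unparsed YAML source
    body: text.slice(match[0].length), // markdown after the closing delimiter
  };
}
```

A file for which this returns `null` would plausibly be the kind counted under `files_skipped_no_frontmatter`.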
Files
- `validator.mjs` -- Node.js ESM validator. Uses `zod` + `yaml` npm packages. Reads `VAULT_DIR` (env or default cwd), walks the tree, skipping `node_modules/`, `.git/`, `.obsidian/`, `90-archive/`. For each `.md`: parses frontmatter, validates against the inline Zod schema, extracts `[[wiki-links]]`, verifies each target resolves. Emits JSON report on stdout.
- `validator.sh` -- Thin POSIX wrapper. Resolves `VAULT_DIR` from arg 1 or env, self-bootstraps deps via `pnpm install --silent` on first run, execs the Node validator. Session-end and other callers use this entry point.
- `package.json` -- Declares `zod` (`^3.24.0`, matching projects-baseline) and `yaml` (`^2.5.0`) as deps. `pnpm-lock.yaml` is committed; `node_modules/` is gitignored.
- `tests/validator.bats` -- 16 BATS cases covering clean vaults, broken frontmatter, missing required fields, dangling links, no-vault skipping, README-style files, nested directories, and archive/obsidian exclusion.
- `tests/fixtures/` -- Seven fixture vaults matching each test scenario.
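The wiki-link step described for `validator.mjs` can be sketched as follows. This is an illustration under stated assumptions, not the shipped code: the function names are hypothetical, and the resolution rule (a link resolves when some file's basename without `.md` equals the target) is a guess at one reasonable policy.

```javascript
// Sketch only: link grammar and resolution rule are assumptions,
// not the behavior of the real validator.mjs.

// Extract the target of every [[wiki-link]] in a markdown string.
// Handles [[Target]], [[Target|alias]] and [[Target#heading]].
function extractWikiLinks(markdown) {
  const targets = [];
  const pattern = /\[\[([^\[\]]+)\]\]/g;
  for (const match of markdown.matchAll(pattern)) {
    // Drop the alias (after "|") and the heading anchor (after "#").
    const target = match[1].split("|")[0].split("#")[0].trim();
    if (target !== "") targets.push(target);
  }
  return targets;
}

// Report links whose target matches no note in the vault.
// `files` maps a vault-relative path to that file's markdown content.
function findDanglingLinks(files) {
  // Assumption: a link resolves when a file's basename (minus .md) equals the target.
  const known = new Set(
    Object.keys(files).map((p) => p.split("/").pop().replace(/\.md$/, ""))
  );
  const warnings = [];
  for (const [file, content] of Object.entries(files)) {
    for (const target of extractWikiLinks(content)) {
      if (!known.has(target)) {
        warnings.push({
          file,
          type: "dangling-wiki-link",
          message: `[[${target}]] does not resolve to a note`,
        });
      }
    }
  }
  return warnings;
}
```

Each returned object follows the `warnings` entry shape used in the JSON report (`file`, `type`, `message`).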
Schema source
The inline Zod schema is vendored from the canonical source at `projects-baseline/packages/zod-schemas/src/vault-frontmatter.ts`. The skill is intentionally self-contained (no monorepo workspace dependency), so the schema is duplicated with a header comment pointing at the SSOT. Drift is to be caught by a future smoke test that imports the canonical schema and diffs the shape (NOT YET IMPLEMENTED). Until that test exists, any change to the canonical schema must be mirrored here in the same commit.
How session-end invokes it
VAULT_DIR=/path/to/vault bash ~/Projects/session-orchestrator/skills/vault-sync/validator.sh
- Exit `0` -- vault valid (or skipped because no vault exists / no `.md` files). Warnings may still be present in the JSON report.
- Exit `1` -- one or more validation errors. Session-end surfaces them in the quality gate report and refuses to close.
- Exit `2` -- invalid invocation or infrastructure error. Two cases: (a) `VAULT_DIR` is not set and `cwd` does not look like a Meta-Vault (no `_meta/`, no `.obsidian/`, no `CLAUDE.md` with `## Session Config` + `vault-sync:` block), in which case an actionable error is printed to stderr; (b) infrastructure error (missing `node`, missing `validator.mjs`, cannot bootstrap deps). In both cases no JSON is emitted to stdout.
JSON output shape (stdout):
{ "status": "ok|invalid|skipped", "vault_dir": "...", "files_checked": N, "files_skipped_no_frontmatter": N, "errors": [{"file": "...", "path": "frontmatter.id", "message": "..."}], "warnings": [{"file": "...", "type": "dangling-wiki-link", "message": "..."}] }
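A caller such as session-end could interpret the exit code and the JSON report along these lines. This is a minimal sketch: the function name and the `outcome` labels are hypothetical, and because the text above does not say how a caller should treat exit 2, the sketch only reports it without deciding whether to block.

```javascript
// Sketch of a caller-side interpretation of the validator result.
// `exitCode` and `stdout` are whatever the caller captured from validator.sh.
// The returned { outcome, errors, warnings } shape is hypothetical.
function interpretValidatorResult(exitCode, stdout) {
  if (exitCode === 2) {
    // Exit 2 emits no JSON on stdout; details are on stderr.
    return { outcome: "infra-error", errors: [], warnings: [] };
  }
  if (exitCode !== 0 && exitCode !== 1) {
    throw new Error(`unexpected exit code: ${exitCode}`);
  }
  const report = JSON.parse(stdout);
  return {
    outcome: exitCode === 0 ? "pass" : "fail",
    errors: report.errors ?? [],
    warnings: report.warnings ?? [], // may be non-empty even on exit 0
  };
}
```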
Opt-in `--check-expires` flag downgrades expired notes to warnings; default off (Phase 1 leaves freshness for the Phase 3 evolve advisory).
CLI Flags
The validator (both `validator.mjs` and the `validator.sh` wrapper) accepts:
- `--mode <hard|warn|off>` -- gate severity. `hard` (default) exits 1 on any frontmatter/schema error. `warn` exits 0 but still populates the `errors` array in the JSON output so callers can surface them as warnings. `off` short-circuits to `status: "skipped-mode-off"`, which is useful during onboarding when the gate is enabled but the vault is not yet clean.
- `--exclude <glob>` -- repeatable. Glob patterns (relative to `VAULT_DIR`, POSIX-style forward slashes) matching files to skip. Supports `**` (any number of segments), `*` (any chars except `/`), and `?` (single char except `/`). Excluded files are counted in `excluded_count` and contribute nothing to `errors` / `warnings`. Example: `--exclude "**/_MOC.md" --exclude "**/README.md"`.
- `--check-expires` -- flag expired notes (`expires:` date in the past) as warnings. Default off.
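The `--exclude` glob rules can be illustrated with a small matcher. This is a sketch of the stated semantics, not the validator's own code: `globToRegExp` and `isExcluded` are hypothetical names, and it assumes that `**/` may also match zero directories, so `**/README.md` matches a `README.md` at the vault root.

```javascript
// Sketch of the --exclude glob semantics; not taken from validator.mjs.
// Assumption: "**/" matches zero or more whole path segments.
function globToRegExp(glob) {
  let source = "";
  let i = 0;
  while (i < glob.length) {
    const ch = glob[i];
    if (glob.startsWith("**/", i)) {
      source += "(?:[^/]+/)*"; // zero or more directories
      i += 3;
    } else if (glob.startsWith("**", i)) {
      source += ".*"; // bare "**": anything, across segments
      i += 2;
    } else if (ch === "*") {
      source += "[^/]*"; // any chars except "/"
      i += 1;
    } else if (ch === "?") {
      source += "[^/]"; // exactly one char except "/"
      i += 1;
    } else {
      source += ch.replace(/[.+^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
      i += 1;
    }
  }
  return new RegExp(`^${source}$`);
}

// True when a vault-relative, forward-slash path matches any exclude pattern.
function isExcluded(relativePath, patterns) {
  return patterns.some((glob) => globToRegExp(glob).test(relativePath));
}
```

Under these assumptions, `isExcluded("notes/_MOC.md", ["**/_MOC.md"])` is true, while `*.md` does not match `notes/README.md` because `*` stops at `/`.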
Environment variables:
- `VAULT_DIR` -- directory to scan. Defaults to `$PWD`. Can also be passed as the first positional argument to `validator.sh`.
Example invocation:
    VAULT_DIR=~/Projects/vault bash validator.sh \
      --mode warn \
      --exclude "**/_MOC.md" \
      --exclude "**/_overview.md" \
      --exclude "**/README.md"
The JSON output always includes the `mode` and `excluded_count` fields when the validator runs past the mode-off / no-vault short-circuits.
Purpose
A "project vault" is a markdown-based knowledge base living under `vault/` at the project root. Each file carries strict YAML frontmatter (`id`, `title`, `tags`, `status`, `created`, `expires`, `sources`) and uses wiki-style links to cross-reference peer notes. The vault is consumed by two audiences: humans browsing the knowledge base, and Sophie-style RAG agents that embed and retrieve notes during chat. Because both audiences depend on the same content, drift is expensive: a stale `status: verified` note with a dead source URL quietly poisons retrieval results, and a broken wiki-link breaks both navigation and graph traversal.
Automated validation is therefore mandatory, not optional. The vault needs four kinds of checks: frontmatter schema conformance, wiki-link integrity, source whitelist enforcement (especially for regulated content like `austrian-law` that must cite only approved government URLs), and freshness (`expires` date in the past). Session-orchestrator is the right home for the in-session layer because every project with a vault will eventually want this, and session lifecycle hooks (wave boundaries, session end, evolve) are exactly the points where drift becomes visible.
The reference architecture is 3 layers:
- Layer A: local git hooks (pre-commit, pre-push) -- IMPLEMENTED in reference project. Fast, fail-early, blocks bad commits.
- Layer B: session-orchestrator:vault-sync skill (THIS SPEC) -- PARTIALLY IMPLEMENTED (Phase 1 session-end gate operational; Phases 2 and 3 pending). Continuous freshness inside normal session flow.
- Layer C: remote CI job -- IMPLEMENTED in reference project's `.gitlab-ci.yml`. Final gate, catches anything the other two miss.
Layer B is the continuous freshness layer. Its job is to run inside normal session flow without requiring developers to remember to validate. If Layers A and C are the bookends, Layer B is the spine.
Invocation Points
3.1 Session-End Hard Gate
- Trigger: called by `session-orchestrator:session-end` skill as part of Phase 1 (quality gates), alongside typecheck / lint / test.
- Behavior: full validation run over the entire vault. No incremental mode here -- a clean session close must prove the whole vault is valid.
- Error handling: validation errors block the session close. The session-end skill surfaces them in the quality gate report and refuses to commit until they are fixed.
- Rationale: a clean session must leave the vault in a valid state. This is the one place where the hard gate is non-negotiable.
- Timeout budget: ~30s for typical vaults (<500 files). Projects with larger vaults should override via `full-validation-threshold` (see Inputs).
3.2 Wave-Executor Incremental Check
- Trigger: called by `session-orchestrator:wave-executor` after any wave whose agents modified files under `vault/**` (detected via `git diff --name-only`).
- Behavior: incremental validation scoped to the files changed in that wave (diff against `$WAVE_START_REF..HEAD`). Frontmatter and wiki-link resolution run on the touched files only; the source whitelist check runs on touched files only.
- Error handling: findings are reported inline in the wave progress output. Warnings do not block. Errors trigger a fix task for the next wave rather than aborting the current one -- the vault is a living document and incremental corrections are normal.
- Rationale: catch drift within a single session, not only at session end. A wave that introduces a broken wiki-link should surface it in the next wave's plan, not at the finish line.
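Since this phase is not implemented yet, the following is only a sketch of how the trigger could narrow a wave's changes down to vault notes. It assumes the caller has already captured the output of `git diff --name-only "$WAVE_START_REF..HEAD"` as a string; the function name and the `vault/` default prefix are illustrative.

```javascript
// Sketch: pick vault markdown files out of `git diff --name-only` output.
// Assumptions: the vault lives under "vault/" and only .md files are validated.
function selectChangedVaultFiles(diffOutput, vaultPrefix = "vault/") {
  return diffOutput
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.startsWith(vaultPrefix) && line.endsWith(".md"));
}
```

An empty result would mean the wave did not touch the vault and the incremental check can be skipped. Deleted files also appear in `git diff --name-only` output, so a real implementation would need to check that each path still exists before validating it.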
3.3 Evolve Advisory Scan
- Trigger: called by `session-orchestrator:evolve` as part of learning extraction.
- Behavior: read-only freshness audit. Flags notes with `status: verified` whose `expires` date has passed, and optionally (opt-in via config) probes source URLs for 404s.
- Error handling: output is an advisory section in the evolve report. Never blocks.
- Rationale: learning extraction is the moment when patterns surface. Staleness is a pattern. Surfacing it here turns vault maintenance into a natural byproduct of the evolve cycle.
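The freshness rule in the Behavior bullet can be sketched as a pure function. This phase is not implemented, so the names and the input shape are hypothetical; the sketch assumes `expires` is an ISO date string (`YYYY-MM-DD`) in already-parsed frontmatter.

```javascript
// Sketch of the freshness rule: status "verified" plus an `expires` date in the past.
// `notes` is a hypothetical array of { file, frontmatter } objects.
function findExpiredNotes(notes, now = new Date()) {
  const advisories = [];
  for (const { file, frontmatter } of notes) {
    if (frontmatter.status !== "verified" || !frontmatter.expires) continue;
    const expires = new Date(frontmatter.expires);
    if (Number.isNaN(expires.getTime())) continue; // unparseable dates are a schema concern
    if (expires < now) {
      advisories.push({
        file,
        type: "expired",
        message: `expired on ${frontmatter.expires}`,
      });
    }
  }
  return advisories;
}
```

The result is advisory data only; nothing in this sketch affects an exit code, in keeping with the rule that the evolve scan never blocks.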
Inputs
Phase 1 (implemented)
- Environment variable: `VAULT_DIR` -- directory to scan for `.md` files. Defaults to `$PWD`. Can also be passed as the first positional argument to `validator.sh`.
- Session Config -- optional `vault-sync` section in `CLAUDE.md`:
  - `vault-sync.enabled: true|false` (default: `false`; when `false`, the gate is skipped silently)
  - `vault-sync.mode: hard|warn|off` (default: `warn` via Session Config; `hard` blocks session close on errors, `warn` reports but does not block, `off` short-circuits to `status: skipped-mode-off`. Note: `validator.sh` invoked directly without `--mode` defaults to `hard`; the `warn` default applies only when dispatched by session-end.)
  - `vault-sync.vault-dir: <path>` (default: project root `$PWD`; passed to the validator via `VAULT_DIR`)
  - `vault-sync.exclude: [<glob>, ...]` (default: `[]`; repeatable glob patterns; files matching any pattern are counted in `excluded_count` but not validated)
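To make the defaults above concrete, here is a sketch of how a dispatcher might turn a parsed `vault-sync` config into a validator call. The config object shape, the function name, and the returned `{ env, args }` shape are hypothetical; only the keys, defaults, and flags come from the list above.

```javascript
// Sketch: translate a parsed `vault-sync` Session Config into a validator call.
// Returns null when the gate is disabled, so the caller can skip silently.
function buildValidatorInvocation(config = {}, projectRoot = ".") {
  const enabled = config.enabled ?? false; // documented default: false
  if (!enabled) return null;
  const mode = config.mode ?? "warn"; // Session Config default is warn, not hard
  const vaultDir = config["vault-dir"] ?? projectRoot;
  const exclude = config.exclude ?? [];
  const args = ["--mode", mode];
  for (const glob of exclude) {
    args.push("--exclude", glob); // the flag is repeatable, one per pattern
  }
  return { env: { VAULT_DIR: vaultDir }, args };
}
```

Passing `--mode` explicitly matters here: without it, `validator.sh` would fall back to its own default of `hard`.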
Future Phases (not yet implemented)
The following fields are planned for Phase 2/3 but are not consumed by the current validator:
- `full-validation-threshold: N` -- files above this count trigger incremental mode instead of full scan (Phase 2, wave-executor)
- `network-source-check: true|false` -- opt-in URL probe for source whitelist enforcement (Phase 3, evolve advisory)
- Runtime context: current git HEAD, session-start ref (for incremental diff in wave-executor, Phase 2)
Outputs
- Return value: JSON object emitted to stdout with the shape:

      {
        "status": "ok|invalid|skipped|skipped-mode-off",
        "mode": "hard|warn|off",
        "vault_dir": "<absolute path>",
        "files_checked": N,
        "excluded_count": N,
        "files_skipped_no_frontmatter": N,
        "errors": [{ "file": "<relative path>", "path": "<frontmatter field path>", "message": "<description>" }],
        "warnings": [{ "file": "<relative path>", "type": "dangling-wiki-link|expired", "message": "<description>" }]
      }

  When the validator short-circuits early (no vault directory, no `.md` files, or `mode: off`), the `files_checked`, `excluded_count`, and `files_skipped_no_frontmatter` fields may be omitted or zero. In `warn` mode, `errors` is still populated but `status` is `"ok"` so callers can surface findings without blocking.
- Exit codes:
  - `0` -- vault is valid (or scan was skipped for a legitimate reason)
  - `1` -- validation errors (one or more files failed a rule)
  - `2` -- invalid invocation (`VAULT_DIR` unset + cwd not a vault) or infrastructure error (validator command not found, validator crashed). No JSON emitted to stdout in this case.
- Surface points:
  - Wave-executor (3.2): inline in the wave progress output, next to typecheck / lint results
  - Session-end (3.1): in the quality gate report, same format as other gates
  - Evolve (3.3): in the evolve report under "Vault Advisory"
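The relationship between mode, findings, `status`, and exit code described under Outputs can be summarized in a few lines. This is a sketch of the documented behavior with a hypothetical function name, not an excerpt from `validator.mjs`; it leaves out the no-vault `skipped` case and the other report fields.

```javascript
// Sketch: map gate mode and findings to the report status and process exit code.
function finalizeReport(mode, errors, warnings) {
  if (mode === "off") {
    // Short-circuit: nothing is validated, nothing can block.
    return { exitCode: 0, report: { status: "skipped-mode-off", mode } };
  }
  // Only hard mode turns errors into a failing status and exit code.
  const blocking = mode === "hard" && errors.length > 0;
  return {
    exitCode: blocking ? 1 : 0,
    report: {
      status: blocking ? "invalid" : "ok",
      mode,
      errors, // still populated in warn mode so callers can surface them
      warnings,
    },
  };
}
```

Warnings never influence the exit code in this sketch, which matches the statement that exit 0 may still carry warnings in the report.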
Error Handling Matrix
| Error Type | Severity | Action |
|---|---|---|
| Frontmatter schema violation (missing required field) | ERROR | Block (hard gate) / add fix task (wave-executor) |
| Wiki-link resolution failure | ERROR | Block (hard gate) / add fix task (wave-executor) |
| Source whitelist violation (austrian-law without approved URL) | ERROR | Block (hard gate) / add fix task (wave-executor) |
| Stale note (`expires` date in past) | WARNING | Advisory only -- never blocks |
| Validator command not found | INFRA ERROR | Skip with clear warning; do NOT fail the session |
| Vault directory does not exist | INFO | Skip silently (project may not use vault) |
Open Design Questions
- Should incremental mode include wiki-link validation across the FULL vault, or only the files touched in this wave? Cross-reference bugs can hide in untouched files (note X suddenly has no backlinks because note Y was renamed), which argues for full scan. Performance argues for incremental. Probably hybrid: incremental for schema + touched-file links, full for the backlink graph.
- Should the skill auto-fix trivial issues (missing `created:` date, tag case normalization, trailing whitespace) or always error out and leave fixes to humans? Auto-fix is convenient but risky inside an AI-driven session because it writes to files that agents are simultaneously editing.
- How does this skill integrate with a future pgvector embeddings pipeline (a later roadmap phase)? Should it trigger re-embedding of changed notes, or stay strictly validation-only and leave embedding to a separate skill?
- Should validation failures in wave-executor block the NEXT wave from starting, or only be reported as a fix task? Blocking is safer but reduces parallelism; reporting is faster but risks compounding errors across waves.
- What is the contract for projects that have no vault at all? Skip silently (current default) or require an explicit `vault-sync.enabled: false` opt-out in Session Config? Silent skip is user-friendly but masks misconfiguration.
- How do we handle multi-repo vaults -- e.g. a monorepo where each package has its own `vault/`, or a project that references a sibling repo's vault via a git submodule? Is there a single `VAULT_ROOT` or a list?
- Should the skill learn from previous findings (cache last-known-clean state, skip files whose mtime has not changed) or always run fresh? Caching is a significant perf win on big vaults but introduces its own correctness risks.
- How are secrets in vault files handled? If a note contains an accidentally-committed token in its `sources` field, should the skill block on it (SEC-type check) or is that strictly the job of the existing secret-scan pre-commit hook?
Schema sync
Source of truth
The canonical schema lives in the private GitLab monorepo:
projects-baseline/packages/zod-schemas/src/vault-frontmatter.ts
The vendored copy in `validator.mjs` is auto-generated; it is never edited by hand inside the sentinel block.
Sync workflow
When the canonical schema changes:
- Maintainer runs `node scripts/sync-vault-schema.mjs --write` from the `session-orchestrator` root.
- Commits the resulting `validator.mjs` change (only the schema block between sentinels changes).
- GitLab CI verifies via `--check` mode: it clones the canonical source and asserts that the vendored copy contains no drift. The pipeline fails if the generated output differs from what is currently committed.
Sentinel markers
The schema block in `validator.mjs` is delimited by:
    // ── BEGIN GENERATED SCHEMA (sync-vault-schema.mjs) — do not edit between sentinels ──
    ...
    // ── END GENERATED SCHEMA ──
Never edit the content between these sentinels by hand. Any manual edit will be overwritten on the next `--write` run and will cause `--check` to report drift.
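The sentinel mechanism can be illustrated with a short sketch. It is not the code of `scripts/sync-vault-schema.mjs`: the function names are hypothetical, and the marker strings are passed in as parameters rather than hard-coded, so a caller would supply the exact BEGIN and END sentinel lines.

```javascript
// Sketch: replace whatever sits between two sentinel markers with `generated`.
// `markers` is a hypothetical { begin, end } pair of exact marker strings.
// Throws when a marker is missing or the markers are out of order.
function replaceSchemaBlock(source, generated, markers) {
  const begin = source.indexOf(markers.begin);
  const end = source.indexOf(markers.end);
  if (begin === -1 || end === -1 || end < begin) {
    throw new Error("malformed sentinels");
  }
  const head = source.slice(0, begin + markers.begin.length); // up to and including BEGIN
  const tail = source.slice(end); // END marker and everything after it
  return `${head}\n${generated.trim()}\n${tail}`;
}

// --check style comparison: drift means a rewrite would change the file.
function hasDrift(source, generated, markers) {
  return replaceSchemaBlock(source, generated, markers) !== source;
}
```

Because only the text between the markers is rebuilt, running the replacement twice with the same generated block yields the same file, which is the idempotency property a `--write` mode needs.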
Drift detection
`tests/schema-drift.test.mjs` covers 5 scenarios via vitest:
- Idempotency: running `--write` twice produces identical output.
- `--check` clean: exits 0 when vendored copy matches canonical.
- `--check` drift: exits 1 when vendored copy has been modified outside sentinels.
- Missing canonical: exits 2 when the canonical source file is not found.
- Sentinel presence: asserts both sentinel markers exist in `validator.mjs`.
CLI reference
node scripts/sync-vault-schema.mjs [options]
| Flag / Env var | Description |
|---|---|
| `--write` | Overwrite the vendored schema block in `validator.mjs` with the canonical source. |
| `--check` | Exit 0 if vendored copy matches canonical; exit 1 if drift detected. Used in CI. |
|  | Override path to the canonical source file. |
|  | Override path to the target file. |
|  | Env var alternative to . |
Failure modes
| Exit code | Meaning |
|---|---|
| `0` | No drift (`--check`) or write succeeded (`--write`). |
| `1` | Drift detected (`--check` only). |
| `2` | Source or target file not found. |
|  | Malformed sentinels (BEGIN/END markers missing or out of order in `validator.mjs`). |
References
- `<project>/scripts/vault/validate.ts` -- reference implementation of the validator (Layer A/C entry point)
- `<project>/scripts/vault/schema.ts` -- frontmatter schema (Zod)
- `<project>/.husky/pre-commit` STEP 3.1 -- Layer A local gate example
- `<project>/.gitlab-ci.yml` `vault:validate` job -- Layer C remote gate example
- `<project>/vault/_meta/frontmatter-schema.md` -- human-readable schema doc
Implementation Roadmap
- Phase 1 -- Minimal hard gate: session-end integration only. Full validation, no incremental, no caching. Must ship as a single wave. Reads `VAULT_VALIDATOR_CMD`, executes it, parses exit code + JSON output, surfaces errors in the quality gate report. Acceptance: a session with a broken vault file refuses to close; a session with a clean vault closes with a "vault: valid (N files)" line in the report.
- Phase 2 -- Incremental wave-executor integration: add incremental diff-based scan. Wire into the wave-executor post-wave hook so it runs only when `vault/**` was touched. Wave progress reports findings inline. Introduce `full-validation-threshold` config. Acceptance: a wave that edits one vault file triggers a scan of exactly that file (plus its backlink graph); findings appear in the wave summary; errors generate a fix task for the next wave.
- Phase 3 -- Advisory evolve integration + staleness: add evolve hook, `expires` date check, optional network source check (gated behind `network-source-check: true`). Output goes to the evolve report as a non-blocking advisory. Acceptance: an evolve run over a vault with 3 expired notes produces an advisory listing all 3, with no impact on session success or exit code.