Multi-Stage Parallel Documentation Audit
Version: 2.0
Total Agents: 18 parallel agents across 5 stages + 1 synthesis stage
Overview
This audit runs parallel agents across five stages, followed by a sequential synthesis stage, to analyze documentation quality, accuracy, and lifecycle status. Stage 1 produces the inventory, baseline, and link-graph files; Stages 2 through 5 produce JSONL findings that feed the final synthesis.
Output Directory: docs/audits/single-session/documentation/audit-[YYYY-MM-DD]/
Pre-Audit Setup
Step 0: Episodic Memory Search (Session #128)
Before running the documentation audit, search for context from past sessions:
// Search for past documentation audit findings
mcp__plugin_episodic-memory_episodic-memory__search({
  query: ["documentation audit", "stale docs", "broken links"],
  limit: 5,
});

// Search for doc structure decisions
mcp__plugin_episodic-memory_episodic-memory__search({
  query: ["DOCUMENTATION_STANDARDS", "tier", "lifecycle"],
  limit: 5,
});
Why this matters:
- Compare against previous doc health metrics
- Identify recurring documentation gaps
- Track which docs were flagged for updates before
- Prevent re-flagging known orphans or intentional gaps
Step 1: Create Output Directory
AUDIT_DIR="docs/audits/single-session/documentation/audit-$(date +%Y-%m-%d)"
mkdir -p "$AUDIT_DIR"
echo "Audit output: $AUDIT_DIR"
Step 2: Load False Positives Database
Read docs/audits/FALSE_POSITIVES.jsonl and note patterns to exclude from findings (filter by category: documentation).
Step 3: Check Thresholds
Run npm run review:check, then proceed regardless of the result (the user invoked the audit intentionally).
Stage 1: Inventory & Baseline (3 Parallel Agents)
Launch these 3 agents in parallel:
Agent 1A: Document Inventory
Task: Build complete document catalog
Count all .md files by directory and tier:
- Root level: ROADMAP.md, README.md, etc.
- docs/: by subdirectory
- .claude/: skills, plans

Extract metadata from each:
- Version number (if present)
- Last Updated date (if present)
- Status field (if present)
- Word count

Output: ${AUDIT_DIR}/stage-1-inventory.md
Format: Markdown summary with counts and file list
Agent 1B: Baseline Metrics
Task: Capture current state via existing tools
# Run these commands and capture output:
npm run docs:check > ${AUDIT_DIR}/baseline-docs-check.txt 2>&1
npm run docs:sync-check > ${AUDIT_DIR}/baseline-sync-check.txt 2>&1
npm run format:check -- docs/ > ${AUDIT_DIR}/baseline-format-check.txt 2>&1

# Check DOCUMENTATION_INDEX.md for orphans.
# grep -c already prints 0 when nothing matches (and exits non-zero),
# so only the exit status needs guarding.
grep -c "orphan" docs/DOCUMENTATION_INDEX.md || true

Output: ${AUDIT_DIR}/stage-1-baselines.md
Agent 1C: Link Extraction
Task: Build link graph for later stages
Extract from all .md files:
1. Internal links: [text](path.md) -> list with source file:line
2. External URLs: https://... -> list with source file:line
3. Anchor links: #section -> list with source file:line

Output: ${AUDIT_DIR}/stage-1-links.json

Schema:
{
  "internal": [{"source": "file.md", "line": 1, "target": "other.md", "text": "..."}],
  "external": [{"source": "file.md", "line": 1, "url": "https://...", "text": "..."}],
  "anchors": [{"source": "file.md", "line": 1, "anchor": "#section", "text": "..."}]
}
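As a rough illustration of the extraction step, the sketch below builds the three lists of the schema from one file's text. It is a regex-based approximation, not the skill's actual implementation: `extractLinks` is a hypothetical helper, it does not skip links inside code blocks, and it treats every non-HTTP, non-anchor target as internal.

```javascript
// Hypothetical helper: collects inline Markdown links from one file's text
// into the internal / external / anchors lists of stage-1-links.json.
function extractLinks(source, markdown) {
  const graph = { internal: [], external: [], anchors: [] };
  const linkPattern = /\[([^\]]*)\]\(([^)\s]+)\)/g;

  markdown.split("\n").forEach((lineText, index) => {
    for (const match of lineText.matchAll(linkPattern)) {
      const [, text, target] = match;
      const entry = { source, line: index + 1, text };
      if (/^https?:\/\//.test(target)) {
        graph.external.push({ ...entry, url: target });
      } else if (target.startsWith("#")) {
        graph.anchors.push({ ...entry, anchor: target });
      } else {
        graph.internal.push({ ...entry, target });
      }
    }
  });
  return graph;
}
```

Merging the per-file results and serializing them with `JSON.stringify` would yield a file in the shape shown in the schema.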
Stage 1 Completion Audit
Before proceeding to Stage 2, verify:
- stage-1-inventory.md exists and is non-empty
- stage-1-baselines.md exists with metrics
- stage-1-links.json exists and is valid JSON
- Display summary: "Stage 1 Complete: X docs, Y internal links, Z external URLs"
Stage 2: Link Validation (4 Parallel Agents)
Launch these 4 agents in parallel using Stage 1 outputs:
Agent 2A: Internal Link Checker
Task: Verify internal .md links resolve
For each internal link from stage-1-links.json:
1. Check target file exists
2. If link has anchor (#section), verify heading exists in target
3. Detect circular references (A→B→C→A)

Output: ${AUDIT_DIR}/stage-2-internal-links.jsonl

JSONL schema per finding (JSONL_SCHEMA_STANDARD.md format):
{
  "category": "documentation",
  "title": "Broken internal link to target.md",
  "fingerprint": "documentation::source.md::broken-link-target",
  "severity": "S1|S2",
  "effort": "E0",
  "confidence": 90,
  "files": ["source.md:123"],
  "why_it_matters": "Broken links frustrate readers and indicate stale documentation",
  "suggested_fix": "Update link to correct path or remove if target no longer exists",
  "acceptance_tests": ["Link resolves correctly", "No 404 when clicking"],
  "evidence": ["target: path.md", "resolved: /full/path.md"]
}
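The circular-reference check can be sketched as a depth-first search over the internal link graph. This is one possible approach, not necessarily what the agent does: `findCycles` is a hypothetical helper, it assumes link targets have already been normalized to the same path form as the sources, and it reports one cycle per back edge rather than every simple cycle.

```javascript
// Hypothetical helper: edges is an array of { source, target } pairs taken
// from the "internal" list of stage-1-links.json.
function findCycles(edges) {
  const graph = new Map();
  for (const { source, target } of edges) {
    if (!graph.has(source)) graph.set(source, []);
    graph.get(source).push(target);
  }

  const cycles = [];
  const state = new Map(); // node -> "visiting" | "done"
  const stack = [];

  function visit(node) {
    state.set(node, "visiting");
    stack.push(node);
    for (const next of graph.get(node) || []) {
      if (state.get(next) === "visiting") {
        // Back edge: the cycle is the stack from `next` onward, closed by `next`.
        cycles.push([...stack.slice(stack.indexOf(next)), next]);
      } else if (!state.has(next)) {
        visit(next);
      }
    }
    stack.pop();
    state.set(node, "done");
  }

  for (const node of graph.keys()) {
    if (!state.has(node)) visit(node);
  }
  return cycles;
}
```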
Agent 2B: External URL Checker
Task: HTTP HEAD requests to external URLs
# Use the new script for external link checking
npm run docs:external-links -- --output ${AUDIT_DIR}/stage-2-external-links.jsonl
Or manually check each URL from stage-1-links.json with:
- 10-second timeout
- Rate limiting (100ms between same domain)
- Cache results
- Flag: 404, 403, 5xx, timeouts, redirects
Output: ${AUDIT_DIR}/stage-2-external-links.jsonl
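For the manual path, a minimal sketch of the per-URL check might look like the following. It assumes Node 18+ (global `fetch` and `AbortSignal.timeout`); `classifyUrlResult`, `checkUrl`, and the flag names are hypothetical, and the 100ms same-domain rate limit is left out for brevity.

```javascript
// Hypothetical helper: maps a result to one of the flagged conditions
// (404, 403, 5xx, timeout, redirect) or null when nothing is flagged.
function classifyUrlResult(result) {
  if (result.timedOut) return "timeout";
  const { status } = result;
  if (status >= 500) return "server-error";
  if (status === 404) return "not-found";
  if (status === 403) return "forbidden";
  if (status >= 300 && status < 400) return "redirect";
  return null;
}

// Hypothetical helper: HEAD request with a 10-second timeout and a result cache.
// `cache` is a Map keyed by URL; `fetchImpl` can be swapped out for testing.
async function checkUrl(url, cache, fetchImpl = fetch) {
  if (cache.has(url)) return cache.get(url);
  let result;
  try {
    const response = await fetchImpl(url, {
      method: "HEAD",
      redirect: "manual", // keep the 3xx status visible instead of following it
      signal: AbortSignal.timeout(10_000),
    });
    result = { status: response.status };
  } catch (error) {
    if (error.name !== "TimeoutError") throw error; // other failures are left to the caller
    result = { timedOut: true };
  }
  const checked = { url, ...result, flag: classifyUrlResult(result) };
  cache.set(url, checked);
  return checked;
}
```

Keeping the classification separate from the network call makes the flagging rules easy to test without touching the network.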
Agent 2C: Cross-Reference Validator
Task: Verify references to project artifacts
Check documentation references:
1. ROADMAP item references (P1.2, Phase 3, etc.) - do they exist?
2. PR/Issue references (#123) - format valid?
3. SESSION_CONTEXT references - files mentioned exist?
4. Skill/hook path references - paths valid?

Output: ${AUDIT_DIR}/stage-2-cross-refs.jsonl
Agent 2D: Orphan & Connectivity
Task: Find disconnected documents
From stage-1-links.json, identify:
1. Docs with zero inbound links (orphans)
2. Docs with only broken outbound links
3. Isolated clusters (group of docs only linking to each other)

Exclude from orphan detection:
- README.md (entry point)
- Root-level canonical docs
- Archive docs

Output: ${AUDIT_DIR}/stage-2-orphans.jsonl
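Orphan detection reduces to counting inbound links per document. The sketch below shows that idea under a few assumptions: `findOrphans` is a hypothetical helper, paths are repo-relative and already normalized, and the exclusion rules are passed in as predicates because the exact definition of "canonical" and "archive" docs lives elsewhere.

```javascript
// Hypothetical helper: returns docs with zero inbound internal links,
// skipping any doc matched by one of the exclusion predicates.
function findOrphans(allDocs, internalLinks, excluded = []) {
  const inbound = new Map(allDocs.map((doc) => [doc, 0]));
  for (const { source, target } of internalLinks) {
    // Self-links do not count as inbound links.
    if (source !== target && inbound.has(target)) {
      inbound.set(target, inbound.get(target) + 1);
    }
  }
  return allDocs.filter(
    (doc) => inbound.get(doc) === 0 && !excluded.some((rule) => rule(doc))
  );
}

// Example exclusion rules (assumed path conventions, adjust to the repo).
const orphanExclusions = [
  (doc) => doc.endsWith("README.md"), // entry point
  (doc) => !doc.includes("/"), // root-level docs
  (doc) => doc.startsWith("docs/archive/"), // archive docs
];
```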
Stage 2 Completion Audit
Before proceeding to Stage 3, verify:
- All 4 JSONL files exist
- Run schema validation: node scripts/debt/validate-schema.js ${AUDIT_DIR}/stage-2-*.jsonl
- Display summary: "Stage 2 Complete: X link issues found"
Stage 3: Content Quality (4 Parallel Agents)
Launch these 4 agents in parallel:
Agent 3A: Accuracy Checker
Task: Verify content matches codebase
# Use the new script for accuracy checking
node scripts/check-content-accuracy.js --output ${AUDIT_DIR}/stage-3-accuracy.jsonl
Checks:
- Version numbers match package.json
- File paths mentioned exist
- npm script references valid
- Code snippet syntax (basic validation)
Output: ${AUDIT_DIR}/stage-3-accuracy.jsonl
Agent 3B: Completeness Checker
Task: Check for missing/incomplete content
For each document, check:
1. Required sections present per tier:
   - Tier 1: Purpose, Version History
   - Tier 2: Purpose, Version History, AI Instructions
   - Tier 3+: Purpose, Status, Version History
2. No TODO/TBD/FIXME placeholders
3. No empty sections (heading with no content)
4. No stub documents (< 100 words, excluding code blocks)

Output: ${AUDIT_DIR}/stage-3-completeness.jsonl
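Checks 2 through 4 are mechanical enough to sketch in code. The helper below is hypothetical and deliberately simple: it strips fenced code blocks before counting words, treats a heading as empty when the next non-blank line is a heading of the same or a higher level, and does not cover the per-tier required sections.

```javascript
// Returns 1-6 for an ATX heading line, 0 for anything else.
function headingLevel(line) {
  const match = /^(#{1,6})\s/.exec(line);
  return match ? match[1].length : 0;
}

// Hypothetical helper: returns a list of completeness issues for one document.
function checkCompleteness(markdown) {
  const issues = [];
  const fencePattern = /```[\s\S]*?```/g;

  // Word count and placeholder scan ignore fenced code blocks.
  const prose = markdown.replace(fencePattern, "");
  const wordCount = prose.split(/\s+/).filter(Boolean).length;
  if (wordCount < 100) issues.push(`stub document (${wordCount} words)`);
  if (/\b(TODO|TBD|FIXME)\b/.test(prose)) issues.push("placeholder text");

  // For the empty-section check a code block still counts as content.
  const lines = markdown
    .replace(fencePattern, "[code]")
    .split("\n")
    .map((line) => line.trim())
    .filter(Boolean);
  lines.forEach((line, index) => {
    const level = headingLevel(line);
    if (level === 0) return;
    const nextLevel =
      index + 1 < lines.length ? headingLevel(lines[index + 1]) : level;
    if (nextLevel !== 0 && nextLevel <= level) {
      issues.push(`empty section: ${line}`);
    }
  });
  return issues;
}
```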
Agent 3C: Coherence Checker
Task: Check terminology and duplication
Analyze across all documents:
1. Terminology inconsistency:
   - "skill" vs "command" vs "slash command"
   - "agent" vs "subagent" vs "worker"
   - Collect all term usages, flag inconsistencies
2. Duplicate content:
   - Exact match: identical content blocks (>50 words)
   - Fuzzy match: 80%+ similarity (same topic, minor rewording)
3. Contradictory information (conflicting guidance for same task)

Output: ${AUDIT_DIR}/stage-3-coherence.jsonl
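The audit does not say how the 80% similarity is measured. One plausible choice, shown here purely as an assumption, is Jaccard similarity over word trigrams; `shingles` and `similarity` are hypothetical helpers. Trigram overlap is fairly strict, so the 0.8 cut-off may need tuning if this metric is used.

```javascript
// Hypothetical helper: set of lower-cased word n-grams ("shingles") in a text.
function shingles(text, size = 3) {
  const words = text.toLowerCase().match(/[a-z0-9]+/g) || [];
  const set = new Set();
  for (let i = 0; i + size <= words.length; i++) {
    set.add(words.slice(i, i + size).join(" "));
  }
  return set;
}

// Jaccard similarity of two texts: |A ∩ B| / |A ∪ B|, in the range 0..1.
function similarity(a, b) {
  const setA = shingles(a);
  const setB = shingles(b);
  if (setA.size === 0 && setB.size === 0) return 0; // too short to compare
  let shared = 0;
  for (const shingle of setA) {
    if (setB.has(shingle)) shared++;
  }
  return shared / (setA.size + setB.size - shared);
}

// A pair of blocks would count as a fuzzy duplicate when similarity(a, b) >= 0.8.
```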
Agent 3D: Freshness Checker
Task: Check for stale content
# Use the new script for placement/staleness
npm run docs:placement -- --output ${AUDIT_DIR}/stage-3-freshness.jsonl
Tier-specific staleness thresholds:
- Tier 1 (Canonical): >60 days
- Tier 2 (Foundation): >90 days
- Tier 3+ (Planning/Reference/Guides): >120 days
Additional checks:
- Outdated version references
- Deprecated terminology still used
Output: ${AUDIT_DIR}/stage-3-freshness.jsonl
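The tier thresholds translate directly into a small age check. The sketch below assumes each document exposes a numeric tier and an ISO "Last Updated" date; `isStale` is a hypothetical helper and is not taken from the docs:placement script.

```javascript
// Staleness limits in days from the tier list; Tier 3 and above share 120.
const STALE_AFTER_DAYS = { 1: 60, 2: 90 };

// Hypothetical helper: true when the document is older than its tier allows.
// A missing or unparseable date yields false, so handle that case separately.
function isStale(tier, lastUpdated, today = new Date()) {
  const limit = STALE_AFTER_DAYS[tier] ?? 120;
  const ageDays = (today - new Date(lastUpdated)) / 86_400_000;
  return ageDays > limit;
}
```

For example, a Tier 1 document last updated on 2026-01-01 is 73 days old on 2026-03-15 and would be flagged, while a Tier 2 document of the same age would not.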
Stage 3 Completion Audit
Before proceeding to Stage 4, verify:
- All 4 JSONL files exist
- Schema validation passes
- Display summary: "Stage 3 Complete: X content quality issues"
Stage 4: Format & Structure (3 Parallel Agents)
Launch these 3 agents in parallel:
Agent 4A: Markdown Lint
Task: Run markdownlint on all docs
# Note: docs:lint should lint all markdown locations:
# "*.md" "docs/**/*.md" ".claude/**/*.md"
npm run docs:lint > ${AUDIT_DIR}/markdownlint-raw.txt 2>&1

# Parse output into JSONL findings
# Each markdownlint violation becomes a finding

Convert violations to JSONL format in ${AUDIT_DIR}/stage-4-markdownlint.jsonl
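The conversion might be sketched as below. It assumes the raw output uses the common markdownlint line format `file:line[:column] [severity] MD###/alias message`; what `npm run docs:lint` actually emits depends on the project's configuration. `parseMarkdownlint` is hypothetical, and the severity, effort, and confidence values are placeholders rather than values prescribed by JSONL_SCHEMA_STANDARD.md.

```javascript
// Assumed line format: "docs/a.md:12:81 MD013/line-length Line length [...]"
const VIOLATION =
  /^(.+?):(\d+)(?::\d+)?\s+(?:(?:error|warning)\s+)?(MD\d+)\/(\S+)\s+(.*)$/;

// Hypothetical helper: turns raw markdownlint output into finding objects.
function parseMarkdownlint(rawOutput) {
  return rawOutput
    .split("\n")
    .map((line) => VIOLATION.exec(line.trim()))
    .filter(Boolean) // drop npm banners and other non-violation lines
    .map(([, file, line, rule, alias, message]) => ({
      category: "documentation",
      title: `${rule} (${alias}): ${message}`,
      fingerprint: `documentation::${file}::${rule.toLowerCase()}-line-${line}`,
      severity: "S3", // placeholder
      effort: "E0", // placeholder
      confidence: 90, // placeholder
      files: [`${file}:${line}`],
    }));
}

// One JSON object per line gives the JSONL body.
function toJsonl(findings) {
  return findings.map((finding) => JSON.stringify(finding)).join("\n");
}
```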
Agent 4B: Prettier Compliance
Task: Check Prettier formatting
npm run format:check -- docs/ > ${AUDIT_DIR}/prettier-raw.txt 2>&1
# Parse output for files that need formatting

Convert violations to JSONL format in ${AUDIT_DIR}/stage-4-prettier.jsonl
Agent 4C: Structure Standards
Task: Check document structure conventions
For each document, verify:
1. Frontmatter present and valid (for skill docs)
2. Required headers per tier
3. Version history format (table with Version|Date|Description)
4. Table formatting consistency (aligned pipes)
5. Code block language tags (all ``` blocks have language)
6. Heading uniqueness (no duplicate headings in same doc)

Output: ${AUDIT_DIR}/stage-4-structure.jsonl
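Checks 5 and 6 can be done in a single pass over the lines, as in this sketch. `checkStructure` is a hypothetical helper; it only recognizes backtick fences and ATX headings, and it compares headings case-insensitively, which is an assumption.

```javascript
// Hypothetical helper: flags untagged code fences and duplicate headings.
function checkStructure(markdown) {
  const issues = [];
  const seenHeadings = new Set();
  let inFence = false;

  markdown.split("\n").forEach((text, index) => {
    const line = index + 1;
    const fence = /^\s*```(.*)$/.exec(text);
    if (fence) {
      // Only an opening fence needs a language tag.
      if (!inFence && fence[1].trim() === "") {
        issues.push({ line, issue: "code block without language tag" });
      }
      inFence = !inFence;
      return;
    }
    if (inFence) return; // "#" lines inside code are comments, not headings

    const heading = /^#{1,6}\s+(.+?)\s*$/.exec(text);
    if (heading) {
      const key = heading[1].toLowerCase();
      if (seenHeadings.has(key)) {
        issues.push({ line, issue: `duplicate heading: ${heading[1]}` });
      }
      seenHeadings.add(key);
    }
  });
  return issues;
}
```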
Stage 4 Completion Audit
Before proceeding to Stage 5, verify:
- All 3 JSONL files exist
- Schema validation passes
- Display summary: "Stage 4 Complete: X format issues"
Stage 5: Placement & Lifecycle (4 Parallel Agents)
Launch Agents 5A, 5B, 5C in parallel, then 5D sequentially after 5B completes:
Agent 5A: Location Validator
Task: Check documents in correct directories
Verify placement rules:
- Plans → docs/plans/ or .planning/
- Archives → docs/archive/
- Templates → docs/templates/
- Audits → docs/audits/
- Tier 1 → root level
- Tier 2 → docs/ or root

Output: ${AUDIT_DIR}/stage-5-location.jsonl
Agent 5B: Archive Candidate Finder (Surface-Level)
Task: Quick scan for archive candidates
Surface-level detection:
1. Completed plans not archived (status: completed)
2. Session handoffs > 30 days old
3. Old audit results (> 60 days, likely in MASTER_DEBT.jsonl already)
4. Plans not referenced in current ROADMAP.md

Output: ${AUDIT_DIR}/stage-5-archive-candidates-raw.jsonl
Agent 5C: Cleanup Candidate Finder
Task: Find files that should be deleted/merged
Identify:
1. Exact duplicate files (same content hash)
2. Near-empty files (< 50 words)
3. Draft files > 60 days old
4. Temp/test files (names starting with temp, test, scratch)
5. Merge candidates (fragmented docs on same topic)

Output: ${AUDIT_DIR}/stage-5-cleanup-candidates.jsonl
Agent 5D: Deep Lifecycle Analysis (Runs After 5B)
Sequential dependency: Read 5B output first
Task: Detailed analysis of archive candidates
For each candidate from stage-5-archive-candidates-raw.jsonl:
1. Read the actual document content
2. Determine original purpose
3. Assess current status:
   - Purpose met? (completed successfully)
   - Overtaken? (superseded by other work)
   - Deprecated? (no longer relevant)
4. Check if content was consumed:
   - Audit findings → in MASTER_DEBT.jsonl?
   - Plan outcomes → documented elsewhere?
5. Provide recommendation with justification

Output: ${AUDIT_DIR}/stage-5-lifecycle-analysis.jsonl

Extended schema:
{
  ...standard fields...,
  "purpose": "Original intent of the document",
  "status_reason": "Why marked for archive",
  "consumed_by": "Where content lives now (if applicable)",
  "recommendation": "ARCHIVE|DELETE|KEEP|MERGE_INTO:<target>"
}
Stage 5 Completion Audit
Before proceeding to Stage 6, verify:
- All 4 JSONL files exist (5A, 5B raw, 5C, 5D analysis)
- Schema validation passes
- Display summary: "Stage 5 Complete: X lifecycle issues, Y archive candidates"
Stage 6: Synthesis & Prioritization (Sequential)
This stage runs sequentially after all parallel stages complete.
Step 6.1: Merge All Findings
# Combine all stage outputs
cat ${AUDIT_DIR}/stage-2-*.jsonl \
  ${AUDIT_DIR}/stage-3-*.jsonl \
  ${AUDIT_DIR}/stage-4-*.jsonl \
  ${AUDIT_DIR}/stage-5-location.jsonl \
  ${AUDIT_DIR}/stage-5-archive-candidates-raw.jsonl \
  ${AUDIT_DIR}/stage-5-cleanup-candidates.jsonl \
  ${AUDIT_DIR}/stage-5-lifecycle-analysis.jsonl > ${AUDIT_DIR}/all-findings-raw.jsonl
Step 6.2: Deduplicate
Input: ${AUDIT_DIR}/all-findings-raw.jsonl
Output: ${AUDIT_DIR}/all-findings-deduped.jsonl

Remove duplicates where the same file:line appears from multiple agents. Keep the finding with:
1. Higher severity
2. Higher confidence
3. More evidence items
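A minimal sketch of this step follows, assuming the three criteria are applied in order as tie-breakers and that `confidence` is numeric as in the Stage 2 schema. `dedupe` and `isBetter` are hypothetical helpers; findings are keyed on their full `files` list, and findings without a location are passed through untouched.

```javascript
const SEVERITY_RANK = { S0: 3, S1: 2, S2: 1, S3: 0 };

// True when `candidate` should replace `current` for the same location.
function isBetter(candidate, current) {
  const bySeverity =
    SEVERITY_RANK[candidate.severity] - SEVERITY_RANK[current.severity];
  if (bySeverity !== 0) return bySeverity > 0;
  if (candidate.confidence !== current.confidence) {
    return candidate.confidence > current.confidence;
  }
  return (candidate.evidence || []).length > (current.evidence || []).length;
}

// Hypothetical helper: keeps one finding per file:line key.
function dedupe(findings) {
  const best = new Map();
  const unlocated = [];
  for (const finding of findings) {
    if (!finding.files || finding.files.length === 0) {
      unlocated.push(finding); // nothing to compare against
      continue;
    }
    const key = finding.files.join("|");
    const current = best.get(key);
    if (!current || isBetter(finding, current)) best.set(key, finding);
  }
  return [...best.values(), ...unlocated];
}
```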
Step 6.3: Cross-Reference FALSE_POSITIVES.jsonl
Input: ${AUDIT_DIR}/all-findings-deduped.jsonl
Output: ${AUDIT_DIR}/all-findings.jsonl (final file for TDMS intake)

Filter out findings matching patterns in docs/audits/FALSE_POSITIVES.jsonl:
- Match by file pattern
- Match by title pattern
- Check expiration dates
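The filter could be sketched as follows. The layout of FALSE_POSITIVES.jsonl is not shown here, so the entry fields `category`, `filePattern`, `titlePattern`, and `expires` are assumed names, as is the rule that an entry suppresses a finding only when every pattern it defines matches.

```javascript
// Hypothetical helper: true when a false-positive entry covers a finding.
function matchesFalsePositive(entry, finding) {
  const checks = [];
  if (entry.filePattern) {
    const pattern = new RegExp(entry.filePattern);
    checks.push((finding.files || []).some((file) => pattern.test(file)));
  }
  if (entry.titlePattern) {
    checks.push(new RegExp(entry.titlePattern).test(finding.title || ""));
  }
  return checks.length > 0 && checks.every(Boolean);
}

// Hypothetical helper: drops findings covered by an unexpired
// documentation-category entry.
function filterFalsePositives(findings, entries, today = new Date()) {
  const active = entries.filter(
    (entry) =>
      entry.category === "documentation" &&
      (!entry.expires || new Date(entry.expires) >= today)
  );
  return findings.filter(
    (finding) => !active.some((entry) => matchesFalsePositive(entry, finding))
  );
}
```

The number of findings removed here is what the final report lists as "False positives filtered".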
Step 6.4: Priority Scoring
For each finding, calculate priority:

priority = (severityWeight × categoryMultiplier × confidenceWeight) / effortWeight

Where:
- severityWeight: S0=100, S1=50, S2=20, S3=5
- categoryMultiplier: links=1.5, accuracy=1.3, freshness=1.0, format=0.8
- confidenceWeight: HIGH=1.0, MEDIUM=0.7, LOW=0.4
- effortWeight: E0=1, E1=2, E2=4, E3=8

Sort findings by priority descending.
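The formula maps onto a few lookup tables, as in this sketch. Two points are assumptions: because every finding's `category` field is "documentation", the multiplier is looked up from a hypothetical `group` field (links, accuracy, freshness, format) with 1.0 for anything unlisted; and `confidence` is expected as a HIGH/MEDIUM/LOW label, since no mapping from the numeric confidence in the finding schema is given.

```javascript
const SEVERITY_WEIGHT = { S0: 100, S1: 50, S2: 20, S3: 5 };
const CATEGORY_MULTIPLIER = { links: 1.5, accuracy: 1.3, freshness: 1.0, format: 0.8 };
const CONFIDENCE_WEIGHT = { HIGH: 1.0, MEDIUM: 0.7, LOW: 0.4 };
const EFFORT_WEIGHT = { E0: 1, E1: 2, E2: 4, E3: 8 };

// priority = (severityWeight × categoryMultiplier × confidenceWeight) / effortWeight
function priority({ severity, group, confidence, effort }) {
  const multiplier = CATEGORY_MULTIPLIER[group] ?? 1.0; // assumed default
  return (
    (SEVERITY_WEIGHT[severity] * multiplier * CONFIDENCE_WEIGHT[confidence]) /
    EFFORT_WEIGHT[effort]
  );
}

// Returns a new array sorted by priority, highest first.
function sortByPriority(findings) {
  return [...findings].sort((a, b) => priority(b) - priority(a));
}
```

For instance, an S1 link finding with HIGH confidence and E0 effort scores (50 × 1.5 × 1.0) / 1 = 75, while an S2 format finding with MEDIUM confidence and E1 effort scores (20 × 0.8 × 0.7) / 2 = 5.6.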
Step 6.5: Generate Action Plan
Create three queues:
1. IMMEDIATE FIXES (S0/S1, E0/E1):
   - List with specific file:line and fix command
2. ARCHIVE QUEUE:
   - node scripts/archive-doc.js commands for each candidate
3. DELETE/MERGE QUEUE:
   - Justification for each deletion
   - Merge target for consolidations
Step 6.6: Generate Final Report
Output: ${AUDIT_DIR}/FINAL_REPORT.md
# Documentation Audit Report - [DATE]

## Executive Summary

- **Total findings:** X
- **By severity:** S0: X, S1: X, S2: X, S3: X
- **By category:** Links: X, Content: X, Format: X, Lifecycle: X
- **False positives filtered:** X

## Baseline Comparison

| Metric               | Before | After Fixes |
| -------------------- | ------ | ----------- |
| docs:check errors    | X      | -           |
| docs:sync issues     | X      | -           |
| Orphaned docs        | X      | -           |
| Stale docs (>90 day) | X      | -           |

## Top 20 Priority Items

| #   | Severity | File | Issue | Effort |
| --- | -------- | ---- | ----- | ------ |
| 1   | S1       | ...  | ...   | E0     |

## Stage-by-Stage Breakdown

### Stage 2: Link Validation

- Internal link errors: X
- External link errors: X
- Orphaned documents: X

### Stage 3: Content Quality

- Accuracy issues: X
- Completeness issues: X
- Coherence issues: X
- Freshness issues: X

### Stage 4: Format & Structure

- Markdownlint violations: X
- Prettier violations: X
- Structure issues: X

### Stage 5: Lifecycle

- Location issues: X
- Archive candidates: X
- Cleanup candidates: X

## Action Plan

### Immediate Fixes (Do Now)

1. `file.md:line` - Fix description

### Archive Queue

```bash
node scripts/archive-doc.js "path/to/doc.md"
```

### Cleanup Queue

- DELETE: `path/to/temp-file.md` (reason)
- MERGE: `fragmented.md` → `main-doc.md`

## Recommendations

- ...
- ...
Post-Audit Actions
1. Save Outputs
Verify all files saved to ${AUDIT_DIR}/:
- stage-1-*.md, stage-1-links.json
- stage-2-*.jsonl
- stage-3-*.jsonl
- stage-4-*.jsonl
- stage-5-*.jsonl
- all-findings.jsonl (merged, deduplicated)
- FINAL_REPORT.md
2. TDMS Integration
node scripts/debt/intake-audit.js ${AUDIT_DIR}/all-findings.jsonl --source "audit-documentation-$(date +%Y-%m-%d)"
3. Update AUDIT_TRACKER.md
Add entry to "Documentation Audits" table:
| Date | Session | Commits | Files | Findings | Confidence | Validation |
|---|---|---|---|---|---|---|
| [today] | [#] | [X] | [Y] | [summary] | HIGH | PASSED |
4. Reset Threshold
Single-session audits reset the documentation category threshold.
5. Offer Fixes
Ask user: "Would you like me to fix any immediate items now?"
Category Mapping for TDMS
| Stage | Category ID Prefix | TDMS Category |
|---|---|---|
| 2 - Links | DOC-LINK-* | documentation |
| 3 - Content | DOC-CONTENT-* | documentation |
| 4 - Format | DOC-FORMAT-* | documentation |
| 5 - Lifecycle | DOC-LIFECYCLE-* | documentation |
Recovery Procedures
If Stage Fails
- Missing output file: Re-run specific agent with explicit file write
- Empty output file: Check agent for errors, re-run with verbose
- Schema validation fails: Parse errors line-by-line, fix malformed lines
- Context compaction: Verify AUDIT_DIR path, re-run from last checkpoint
If Context Compacts Mid-Audit
Read the partial outputs already saved to ${AUDIT_DIR}/ and resume from the last completed stage.
Multi-AI Escalation
After 3 single-session documentation audits, a full multi-AI Documentation Audit is recommended. Track in AUDIT_TRACKER.md "Single audits completed" counter.
Update Dependencies
When modifying this skill, also update:
| Document | Section |
|---|---|
|  | Sync category list |
|  | /audit-documentation reference |
Version History
| Version | Date | Description |
|---|---|---|
| 2.0 | 2026-02-02 | Complete rewrite: 6-stage parallel audit with 18 agents |
| 1.0 | 2025-xx-xx | Original single-session sequential audit |
Documentation References
Before running this audit, review:
TDMS Integration (Required)
- PROCEDURE.md - Full TDMS workflow
- MASTER_DEBT.jsonl - Canonical debt store
- Intake command: node scripts/debt/intake-audit.js <output.jsonl> --source "audit-documentation-<date>"
Documentation Standards (Critical for This Audit)
- JSONL_SCHEMA_STANDARD.md - Output format requirements and TDMS field mapping
- DOCUMENTATION_STANDARDS.md - The canonical guide this audit validates against (5-tier hierarchy, metadata requirements, quality protocols)