Aiwg doc-consolidate

Crawl repository for scattered docs and consolidate into categorized reference index in .aiwg/docs/

install
source · Clone the upstream repo
git clone https://github.com/jmagly/aiwg
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/jmagly/aiwg "$T" && mkdir -p ~/.claude/skills && cp -r "$T/agentic/code/frameworks/sdlc-complete/skills/doc-consolidate" ~/.claude/skills/jmagly-aiwg-doc-consolidate-f71f03 && rm -rf "$T"
manifest: agentic/code/frameworks/sdlc-complete/skills/doc-consolidate/SKILL.md
source content

Doc Consolidate

You are the Doc Consolidate Orchestrator — crawling a repository for documentation scattered across directories and building a consolidated reference index in

.aiwg/docs/
for planning, ops, semantic memory, and release-associated documentation.

Core Philosophy

Repos accumulate docs everywhere: README files in every package, deployment guides in

ops/
, release notes at root, API docs in
docs/api/
, troubleshooting buried in wiki-style folders. This skill answers: "what docs exist, where are they, and what are they for?"

No file duplication. Copying docs creates parallel drift. Instead, build a reference manifest and lightweight stubs that point to originals.

Natural Language Triggers

Users may say:

  • "consolidate docs"
  • "doc consolidate"
  • "find all docs"
  • "catalog docs"
  • "inventory docs"
  • "doc inventory"
  • "where are all the docs"
  • "what docs do we have"
  • "gather documentation"
  • "index all documentation"
  • "consolidate documentation for planning"

Parameters

--dry-run (optional)

Preview all discovered docs and their classifications without writing any files. Produces the same report as a live run but with no mutations.

--scope
<path>
(optional)

Limit discovery to a subtree. Useful for large monorepos.

/doc-consolidate --scope docs/
/doc-consolidate --scope packages/auth/

--incremental (optional)

Only process files changed since the last successful run. Reads timestamp from

.aiwg/reports/doc-consolidate-last-run.json
. Falls back to full scan if no prior run exists.

--prefix
<dir>
(optional)

Target a different project directory instead of the current working directory.

Categories

CategoryWhat belongs herePath heuristics
release
Changelogs, release notes, version announcements
CHANGELOG*
,
docs/releases/*
,
*release-note*
user
User guides, tutorials, quickstarts, how-tos
README*
,
docs/getting-started*
,
docs/quickstart*
,
docs/guide*
api
API references, SDK docs, integration guides
docs/api/*
,
*-reference.md
,
*-api.md
,
*cli-reference*
deployment
Deploy guides, runbooks, infra docs
docs/deploy*
,
ops/*
,
*runbook*
,
*infrastructure*
planning
Roadmaps, RFCs, proposals, ADRs
*roadmap*
,
*rfc*
,
*proposal*
,
ADR-*
,
docs/planning/*
communications
Announcements, blog posts, marketing
*announcement*
,
*blog*
,
*press*
,
*marketing*
help
Troubleshooting, FAQ, support docs
*troubleshoot*
,
FAQ*
,
*support*
,
*error-reference*
development
Contributing guides, dev setup, coding standards
CONTRIBUTING*
,
docs/development/*
,
docs/contributing*
,
*coding-standard*

Each doc gets a

primary
category and optional
tags
for secondary categorization.

Output Files

FilePurpose
.aiwg/docs/_manifest.yaml
Master index: path, category, title, summary, confidence, tags
.aiwg/docs/{category}/
Stub files referencing originals via
@
-mentions
.aiwg/reports/doc-consolidate-{timestamp}.md
Run report with stats and low-confidence flags
.aiwg/reports/doc-consolidate-last-run.json
Incremental state (timestamp, file list, checksums)

Execution Flow

Phase 1: Discovery

  1. Parse flags:
    --dry-run
    ,
    --scope
    ,
    --incremental
    ,
    --prefix
  2. If
    --incremental
    : load
    .aiwg/reports/doc-consolidate-last-run.json
  3. Walk repository recursively using Glob, collecting doc-like files:

Include patterns:

**/*.md
**/*.txt (only in docs/, doc/, documentation/ directories)
**/*.rst
**/*.adoc
**/README*
**/CHANGELOG*
**/CONTRIBUTING*
**/LICENSE*

Exclude patterns:

node_modules/**
.git/**
vendor/**
dist/**
build/**
.aiwg/docs/**       (output directory — avoid self-reference)
**/*.min.*
**/package-lock.json
  1. For each file, extract:

    • path
      — relative to project root
    • title
      — first H1 heading, or filename if no heading
    • summary
      — first non-empty paragraph (max 200 chars)
    • size
      — file size in bytes
    • lastModified
      — from git log or file mtime
  2. If

    --incremental
    : filter to files with mtime newer than last run, or use
    git diff --name-only --since={lastRun}
    for precision

  3. Report discovery results:

Discovery complete: {N} doc-like files found
  Scope: {scope or "full repo"}
  New since last run: {M} (if incremental)

Phase 2: Classification

Classify each discovered file into a category using a two-pass strategy:

Pass 1: Path heuristics (fast, deterministic)

Apply pattern matching on the file path. Rules evaluated in order, first match wins:

release:
  - path matches: CHANGELOG*, */CHANGELOG*
  - path matches: docs/releases/*, */releases/*
  - path matches: *release-note*, *release_note*
  - filename matches: RELEASES*, HISTORY*

user:
  - path matches: README*, */README*
  - path matches: docs/getting-started*, docs/quickstart*
  - path matches: docs/guide*, docs/tutorial*
  - path matches: docs/usage*, docs/install*

api:
  - path matches: docs/api/*, */api-docs/*
  - path matches: *-reference.md, *-api.md
  - path matches: *cli-reference*, *sdk-*
  - path matches: docs/integrations/*

deployment:
  - path matches: docs/deploy*, */deploy/*
  - path matches: ops/*, */ops/*
  - path matches: *runbook*, *infrastructure*
  - path matches: *docker*, *kubernetes*, *k8s*
  - path matches: docs/install/non-interactive*

planning:
  - path matches: *roadmap*, docs/roadmap*
  - path matches: *rfc*, docs/rfc/*
  - path matches: *proposal*, docs/proposals/*
  - path matches: ADR-*, */ADR-*, *adr-*
  - path matches: docs/planning/*

communications:
  - path matches: *announcement*, */announcements/*
  - path matches: docs/releases/*-announcement*
  - path matches: *blog*, *press*, *marketing*
  - path matches: *newsletter*

help:
  - path matches: *troubleshoot*, */troubleshooting/*
  - path matches: FAQ*, */FAQ*
  - path matches: *support*, docs/support/*
  - path matches: *error-reference*, *known-issues*

development:
  - path matches: CONTRIBUTING*, */CONTRIBUTING*
  - path matches: docs/development/*, docs/contributing/*
  - path matches: *coding-standard*, *style-guide*
  - path matches: docs/architecture/*, *dev-guide*

Confidence:

high
for path match.

Pass 2: Content analysis (for unmatched files)

For files that didn't match any path heuristic, read the first 500 characters and classify by keyword presence:

KeywordsCategory
version, release, changelog, breaking change, migration
release
getting started, tutorial, how to, step by step, quickstart
user
endpoint, API, request, response, parameter, authentication
api
deploy, server, infrastructure, docker, kubernetes, production
deployment
roadmap, milestone, planned, RFC, proposal, decision
planning
announcement, launch, update, community
communications
troubleshoot, FAQ, error, fix, solution, workaround
help
contributing, development, build, test, lint, setup
development

Confidence:

medium
for single-keyword match,
low
for no clear match (defaults to
user
).

Output: Each file gets

{ path, category, confidence, title, summary, tags }

Report classification results:

Classification complete:
  release: {N}    user: {N}    api: {N}
  deployment: {N} planning: {N} communications: {N}
  help: {N}       development: {N}
  Low confidence: {N} (review recommended)

Phase 3: Manifest Generation

If

--dry-run
: skip writing, proceed to report only.

Write

.aiwg/docs/_manifest.yaml
:

version: 1
generated: "2026-04-06T04:00:00Z"
project: /path/to/repo
total: 47
categories:
  release:
    - path: CHANGELOG.md
      title: Changelog
      summary: "All notable changes to this project..."
      confidence: high
      tags: [release, history]
      lastModified: "2026-04-05"
      size: 24500
    - path: docs/releases/v2026.4.0-announcement.md
      title: "v2026.4.0 Release"
      summary: "AIWG v2026.4.0 brings the web dashboard..."
      confidence: high
      tags: [release, communications]
      lastModified: "2026-04-04"
      size: 8200
  user:
    - path: README.md
      title: AIWG
      summary: "Framework for improving AI-generated content..."
      confidence: high
      tags: [user, overview]
      lastModified: "2026-04-05"
      size: 12000
  # ... remaining categories
stats:
  total: 47
  by_category:
    release: 5
    user: 12
    api: 8
    deployment: 3
    planning: 4
    communications: 5
    help: 3
    development: 7
  by_confidence:
    high: 38
    medium: 6
    low: 3

Phase 4: Stub Generation + Report

If

--dry-run
: skip stub creation, write report only.

Stub generation:

For each category with entries, ensure

.aiwg/docs/{category}/
exists. For each doc in the category, write a stub file:

<!-- Auto-generated by doc-consolidate. Do not edit — edit the source file instead. -->
# {title}

> {summary}

**Source**: @{path}
**Category**: {category}
**Confidence**: {confidence}
**Last modified**: {lastModified}
**Tags**: {tags joined by comma}

Stub filename: sanitized version of the source path (e.g.,

docs/cli-reference.md
becomes
docs--cli-reference.md
).

Run report: Write

.aiwg/reports/doc-consolidate-{timestamp}.md
:

# Doc Consolidate Report — {timestamp}

## Summary
- **Total docs discovered**: {N}
- **Categories**: {breakdown}
- **Low confidence**: {N} (listed below for review)
- **Mode**: {dry-run | live}
- **Scope**: {scope or full repo}

## By Category

### release ({N})
| File | Title | Confidence |
|------|-------|------------|
| CHANGELOG.md | Changelog | high |
| ... | ... | ... |

### user ({N})
...

## Low Confidence (Review Recommended)
| File | Assigned Category | Why |
|------|-------------------|-----|
| docs/misc/notes.md | user | No heuristic match, keyword: "how to" |

## Coverage Gaps
- No docs found in: {categories with 0 entries}

Incremental state: Write

.aiwg/reports/doc-consolidate-last-run.json
:

{
  "timestamp": "2026-04-06T04:00:00Z",
  "totalFiles": 47,
  "byCategory": { "release": 5, "user": 12, ... },
  "files": {
    "CHANGELOG.md": { "checksum": "sha256:abc...", "category": "release" },
    "README.md": { "checksum": "sha256:def...", "category": "user" }
  }
}

Integration Points

SystemHow
doc-syncAfter consolidation,
doc-sync
can read
_manifest.yaml
to locate all docs for drift detection instead of re-scanning
build-artifact-indexStubs in
.aiwg/docs/
appear in
aiwg index query
results
ralph-memoryWrite a memory entry with consolidation stats so future agent loops know doc coverage without re-scanning
fortemiManifest entries can be ingested as structured notes for semantic search across session history
intake-from-codebaseCan consume prior consolidation results from
_manifest.yaml
instead of re-analyzing docs

Examples

Full consolidation

/doc-consolidate

Preview without writing

/doc-consolidate --dry-run

Scope to docs directory

/doc-consolidate --scope docs/

Incremental update

/doc-consolidate --incremental

Target a different project

/doc-consolidate --prefix /path/to/other-project

Guided consolidation

User: "consolidate all the docs, focus on release and deployment docs"
→ Orchestrator runs full consolidation, highlights release and deployment categories in report

Safety and Guardrails

  1. Never modify source docs — only write to
    .aiwg/docs/
    and
    .aiwg/reports/
  2. Stubs are disposable — safe to delete
    .aiwg/docs/
    and re-run
  3. Low-confidence files flagged — never silently misclassify; report for human review
  4. Dry-run first — recommend
    --dry-run
    before first live run on a new repo
  5. Incremental is safe — merges with existing manifest, doesn't drop previous entries
  6. No git operations — does not commit, push, or modify git state

Completion Criteria

completion:
  required:
    - manifest_written: true          # .aiwg/docs/_manifest.yaml exists and is valid YAML
    - stubs_generated: true           # .aiwg/docs/{category}/ directories populated
    - report_written: true            # .aiwg/reports/doc-consolidate-{ts}.md
    - incremental_state_saved: true   # .aiwg/reports/doc-consolidate-last-run.json
  optional:
    - low_confidence_zero: false      # Not required — low confidence files are flagged, not blocked
    - all_categories_populated: false # Some repos won't have all 8 categories