Claude-Skills doc-drift-detector

Clone the full collection:

```bash
git clone https://github.com/borghei/Claude-Skills
```

Or install only this skill:

```bash
T=$(mktemp -d) && git clone --depth=1 https://github.com/borghei/Claude-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/engineering/doc-drift-detector" ~/.claude/skills/borghei-claude-skills-doc-drift-detector && rm -rf "$T"
```

Source: `engineering/doc-drift-detector/SKILL.md`

Documentation Drift Detector
The agent detects documentation drift by mapping code directories to their docs, comparing git modification histories, extracting Python function signatures via AST, validating every markdown link and anchor, and scoring freshness on a weighted 0-100 scale. All four CLI tools use the Python standard library only.
Quick Start
```bash
# 1. Run full drift analysis on a repository
python scripts/drift_analyzer.py /path/to/repo

# 2. Score documentation freshness
python scripts/doc_staleness_scorer.py /path/to/repo

# 3. Validate API docs against Python source
python scripts/api_doc_validator.py /path/to/repo/src /path/to/repo/docs/api.md

# 4. Check all markdown links
python scripts/link_checker.py /path/to/repo

# JSON output for any tool
python scripts/drift_analyzer.py /path/to/repo --json

# Set failure threshold for CI
python scripts/doc_staleness_scorer.py /path/to/repo --threshold 60
```
All tools support `--help` for full usage details.
Core Workflows
Workflow 1: Full Drift Analysis
Scan all documentation against code changes since each doc was last updated. This is the primary entry point for understanding the overall drift state of a repository.
```bash
# Basic analysis
python scripts/drift_analyzer.py /path/to/repo

# Analyze with custom doc patterns
python scripts/drift_analyzer.py /path/to/repo --doc-patterns "*.md,*.rst,*.txt"

# JSON output for tooling
python scripts/drift_analyzer.py /path/to/repo --json

# Only show high-severity drift
python scripts/drift_analyzer.py /path/to/repo --min-severity high

# Analyze specific directory
python scripts/drift_analyzer.py /path/to/repo --scope src/
```
What it does:
- Discovers all documentation files in the repo
- For each doc, identifies the code directories it describes (via path proximity and content references)
- Compares the doc's last-modified date against the git history of its associated code
- Identifies specific changes (renamed files, moved directories, changed function signatures)
- Classifies each drift instance by category and severity
- Generates an actionable report with specific file:line references
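The git-history comparison at the heart of this workflow reduces to one question: how many code commits landed after the doc's last update? A minimal sketch of that check (the dates would come from `git log` in the real tool; `count_changes_since` is a hypothetical helper, not the tool's actual API):

```python
from datetime import date

def count_changes_since(doc_updated: date, code_commit_dates: list) -> int:
    """Count code commits made after the doc's last update.

    The real analyzer pulls these dates from `git log`; here they are
    passed in directly so the comparison logic stands alone.
    """
    return sum(1 for d in code_commit_dates if d > doc_updated)

# A doc updated in January vs. code commits through March:
doc_date = date(2026, 1, 15)
commits = [date(2026, 1, 10), date(2026, 2, 3), date(2026, 3, 1)]
changed = count_changes_since(doc_date, commits)
print(changed)  # 2
```

The real analyzer layers rename detection, severity classification, and code-to-doc mapping on top of this raw count.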
Output example:
```
Documentation Drift Report
==========================
Repository: /path/to/repo
Scan date: 2026-03-18
Docs found: 12
Drifted: 5

HIGH SEVERITY:
docs/api.md (last updated: 2026-01-15)
  - 23 code files changed since doc update
  - 4 functions renamed in src/handlers/
  - 2 new modules undocumented
  Category: Factual + Structural
  Recommendation: Manual update required

MEDIUM SEVERITY:
README.md (last updated: 2026-02-28)
  - Installation section references removed dependency
  - Version string outdated (says 1.8.0, current 2.0.0)
  Category: Factual + Temporal
  Recommendation: Auto-fixable (version), Manual (installation)
```
Workflow 2: API Documentation Validation
Check that API documentation accurately reflects the actual function signatures, class definitions, and module structure in your Python source code.
```bash
# Validate API docs against source
python scripts/api_doc_validator.py /path/to/src /path/to/docs/api.md

# Scan entire docs directory
python scripts/api_doc_validator.py /path/to/src /path/to/docs/ --recursive

# JSON output
python scripts/api_doc_validator.py /path/to/src /path/to/docs/api.md --json

# Include private methods in validation
python scripts/api_doc_validator.py /path/to/src /path/to/docs/ --include-private
```
What it detects:
- Functions/classes present in code but missing from docs
- Functions/classes documented but no longer in code (removed or renamed)
- Parameter mismatches (missing params, wrong types, wrong defaults)
- Deprecated items still documented as current
- Return type mismatches
- Module-level docstring drift
How it works:
The tool uses Python's `ast` module to parse source files and extract function signatures, class definitions, decorators, and docstrings. It then parses the markdown documentation looking for function/class references, parameter lists, and code blocks. Mismatches are reported with exact locations in both source and documentation.
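The AST extraction step can be illustrated in a few lines (a simplified sketch, not the validator's actual code; it collects only parameter names, while the real tool also records defaults, decorators, and return annotations):

```python
import ast

SOURCE = '''
def fetch(url, timeout=30):
    """Fetch a URL."""

class Client:
    def get(self, path):
        pass
'''

def extract_signatures(source: str) -> dict:
    """Map each function/method name to its positional parameter names."""
    sigs = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            sigs[node.name] = [a.arg for a in node.args.args]
    return sigs

sigs = extract_signatures(SOURCE)
print(sigs)  # {'fetch': ['url', 'timeout'], 'get': ['self', 'path']}
```

Comparing these extracted names and parameters against what the markdown claims is what surfaces phantom docs and signature mismatches.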
Workflow 3: README Health Check
Validate README sections against the actual project state. This combines drift analysis, link checking, and completeness scoring into a single README-focused report.
```bash
# Check README health
python scripts/doc_staleness_scorer.py /path/to/repo --readme-focus

# Check with custom sections
python scripts/doc_staleness_scorer.py /path/to/repo --required-sections "Installation,Usage,API,Contributing,License"
```
Validates:
- Required sections are present (Installation, Usage, API Reference, Contributing, License)
- Version strings match package version (package.json, setup.py, pyproject.toml)
- File references in README actually exist
- Badge URLs are well-formed
- Code examples reference existing files/functions
- Table of contents matches actual headings
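The file-reference check in that list is easy to picture: collect local link targets and test them against the filesystem. A minimal sketch (`missing_file_refs` is an illustrative helper, not part of the shipped tools):

```python
import re
import tempfile
from pathlib import Path

def missing_file_refs(markdown: str, repo_root: Path) -> list:
    """Return local link targets that don't exist on disk (skips URLs and anchors)."""
    targets = re.findall(r"\]\(([^)#]+)\)", markdown)
    return [t for t in targets
            if not t.startswith(("http://", "https://"))
            and not (repo_root / t).exists()]

# Demo against a throwaway repo root containing only a LICENSE file:
with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    (root / "LICENSE").write_text("MIT")
    readme = "[License](LICENSE) and [Guide](docs/guide.md)"
    missing = missing_file_refs(readme, root)
print(missing)  # ['docs/guide.md']
```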
Workflow 4: Link Integrity Audit
Check every link in every markdown file -- local file references, anchors, cross-document links, and optionally external URLs.
```bash
# Check all markdown links
python scripts/link_checker.py /path/to/repo

# Include external URL checks (slower, makes HTTP requests)
python scripts/link_checker.py /path/to/repo --check-external

# Check specific file
python scripts/link_checker.py /path/to/repo/README.md

# JSON output
python scripts/link_checker.py /path/to/repo --json

# Only show broken links
python scripts/link_checker.py /path/to/repo --broken-only
```
What it checks:
- Local file references (`[link](path/to/file.md)`) -- does the file exist?
- Anchor references (`[link](#section-name)`) -- does the heading exist?
- Cross-document anchors (`[link](other.md#section)`) -- do the file and heading exist?
- Relative path correctness (catches `../` errors)
- Case sensitivity issues (common on Linux but silent on macOS)
- Image references -- do referenced images exist?
- Duplicate anchors that would cause ambiguous links
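Anchor validation depends on reproducing the heading-to-slug transformation. A rough sketch of the idea (simplified rules; real slug generation has more edge cases, which is exactly why renamed headings break anchors silently):

```python
import re

def slugify(heading: str) -> str:
    """Approximate markdown anchor slugs: lowercase, strip punctuation, hyphenate."""
    slug = re.sub(r"[^\w\s-]", "", heading.strip().lower())
    return re.sub(r"\s+", "-", slug)

def anchor_exists(anchor: str, headings: list) -> bool:
    """Does `#anchor` resolve to one of the document's headings?"""
    return anchor in {slugify(h) for h in headings}

headings = ["Quick Start", "Installation"]
print(anchor_exists("quick-start", headings))  # True
print(anchor_exists("quickstart", headings))   # False
```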
Workflow 5: Continuous Doc Monitoring
Integrate documentation drift detection into your CI/CD pipeline for ongoing monitoring.
GitHub Actions example:
```yaml
name: Documentation Drift Check
on:
  pull_request:
    branches: [main, dev]
  push:
    branches: [main]
jobs:
  doc-drift:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # Full history for git log analysis
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Run drift analysis
        run: python engineering/doc-drift-detector/scripts/drift_analyzer.py . --json > drift-report.json
      - name: Check staleness score
        run: python engineering/doc-drift-detector/scripts/doc_staleness_scorer.py . --threshold 50
      - name: Validate API docs
        run: python engineering/doc-drift-detector/scripts/api_doc_validator.py src/ docs/api.md
      - name: Check links
        run: python engineering/doc-drift-detector/scripts/link_checker.py .
      - name: Upload drift report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: drift-report
          path: drift-report.json
```
Pre-commit hook:
```bash
#!/bin/bash
# .git/hooks/pre-commit
# Fail commit if docs are severely stale
python engineering/doc-drift-detector/scripts/doc_staleness_scorer.py . --threshold 30 --quiet
if [ $? -ne 0 ]; then
  echo "Documentation is critically stale. Update docs before committing."
  exit 1
fi
```
Tools
| Tool | Purpose | Lines | Key Feature |
|---|---|---|---|
| `drift_analyzer.py` | Full drift analysis between code and docs | ~550 | Git history comparison with code-to-doc mapping |
| `doc_staleness_scorer.py` | Score documentation freshness 0-100 | ~450 | Weighted multi-dimensional scoring |
| `api_doc_validator.py` | Validate API docs against Python source | ~400 | AST-based signature extraction and comparison |
| `link_checker.py` | Audit all markdown links and anchors | ~400 | Local file, anchor, and cross-document validation |
All tools:
- Python 3.8+ standard library only
- Support `--json` for machine-readable output
- Support `--help` for usage details
- Use non-zero exit codes on failure (CI/CD compatible)
- Work on any OS (Windows, macOS, Linux)
Staleness Scoring
Documentation freshness is scored on a 0-100 scale where 100 = perfectly current. The score is a weighted combination of five dimensions:
| Dimension | Weight | What It Measures |
|---|---|---|
| Last Updated | 20% | How recently the doc file was modified relative to its associated code |
| Code-Doc Alignment | 30% | Whether documented items (functions, classes, files) still exist and match |
| Link Health | 15% | Percentage of links that resolve correctly |
| Completeness | 20% | Whether expected sections are present and non-empty |
| Accuracy | 15% | Whether version strings, file paths, and other verifiable facts are correct |
Score interpretation:
| Score | Label | Action |
|---|---|---|
| 90-100 | Excellent | No action needed |
| 70-89 | Good | Minor updates recommended |
| 50-69 | Stale | Updates needed before next release |
| 30-49 | Critical | Immediate attention required |
| 0-29 | Abandoned | Full rewrite likely needed |
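The aggregate score is a plain weighted sum mapped to a label. A sketch of that arithmetic using the default weights from the table above (`staleness_score` is illustrative, not the scorer's actual API):

```python
WEIGHTS = {"updated": 0.20, "alignment": 0.30, "links": 0.15,
           "completeness": 0.20, "accuracy": 0.15}

LABELS = [(90, "Excellent"), (70, "Good"), (50, "Stale"),
          (30, "Critical"), (0, "Abandoned")]

def staleness_score(dimensions: dict) -> tuple:
    """Combine per-dimension scores (0-100) into a weighted aggregate and label."""
    score = sum(WEIGHTS[k] * v for k, v in dimensions.items())
    label = next(name for floor, name in LABELS if score >= floor)
    return round(score, 1), label

score, label = staleness_score({"updated": 40, "alignment": 80, "links": 100,
                                "completeness": 90, "accuracy": 60})
print(score, label)  # 74.0 Good
```

Note how a heavily weighted dimension dominates: perfect link health (15%) cannot rescue poor code-doc alignment (30%).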
Customization:
```bash
# Override default weights
python scripts/doc_staleness_scorer.py /path/to/repo \
  --weight-updated 0.25 \
  --weight-alignment 0.25 \
  --weight-links 0.15 \
  --weight-completeness 0.20 \
  --weight-accuracy 0.15

# Set staleness thresholds
python scripts/doc_staleness_scorer.py /path/to/repo --threshold 60
```
Drift Categories
Every detected drift instance is classified into one or more categories:
Structural Drift
Missing or misorganized sections. A README lacks an Installation section. An API doc is missing an entire module. A CHANGELOG has no entries for the latest version.
Detection: Compare actual document headings against expected headings for that document type.
Factual Drift
Incorrect information. A function signature in the docs has the wrong parameters. An installation command references a removed package. A configuration example uses deprecated options.
Detection: Cross-reference documented facts against code analysis (AST parsing, file existence, git tags).
Referential Drift
Broken references. A link points to a file that was moved. An anchor references a heading that was renamed. An image path is wrong.
Detection: Link checker validates every reference against the filesystem and document structure.
Temporal Drift
Outdated time-sensitive content. Version strings are old. "Last updated" dates are stale. "Coming soon" items that shipped months ago. Roadmap items past their target date.
Detection: Extract version strings and dates, compare against git tags, package manifests, and current date.
Semantic Drift
Technically accurate but misleading. A description says "simple REST API" when the project now has GraphQL, gRPC, and WebSocket endpoints. The architecture overview omits a major new subsystem.
Detection: Compare document topic coverage against code directory structure and file counts. Flag when code complexity has grown significantly but documentation scope has not.
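One cheap proxy for this comparison: which top-level source directories are never even mentioned in the document? A minimal sketch (`undocumented_dirs` is an illustrative substring check, far cruder than real topic analysis):

```python
import tempfile
from pathlib import Path

def undocumented_dirs(doc_text: str, src_root: Path) -> set:
    """Top-level source directories the doc never mentions."""
    actual = {p.name for p in src_root.iterdir() if p.is_dir()}
    return {d for d in actual if d not in doc_text}

# Demo: a "simple REST API" description that omits two newer subsystems.
with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    for sub in ("rest", "graphql", "websocket"):
        (root / sub).mkdir()
    overview = "The service exposes a simple REST API under rest/."
    gaps = undocumented_dirs(overview, root)
print(gaps)  # {'graphql', 'websocket'} in some order
```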
Auto-Fix vs Manual-Fix Classification
Not all drift can be fixed programmatically. The tools classify each issue:
Auto-Fixable (safe to automate)
- Version string updates -- replace old version with current from package manifest
- Date updates -- update "last modified" timestamps
- Broken local links -- suggest correct path when file was moved (git log tracks renames)
- Missing table of contents entries -- generate from actual headings
- Removed file references -- flag for deletion or suggest replacement
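The version-string fix is representative of why these are safe to automate: a narrow pattern match with a mechanical replacement. A hedged sketch (`bump_version_strings` is illustrative; the real auto-fix reads the current version from the package manifest):

```python
import re

def bump_version_strings(markdown: str, current: str) -> str:
    """Rewrite semver strings prefixed with 'v' or the word 'version'.

    Kept narrow on purpose so unrelated numbers are left alone.
    """
    pattern = re.compile(r"(?i)(v|version\s+)\d+\.\d+\.\d+")
    return pattern.sub(lambda m: m.group(1) + current, markdown)

doc = "Install version 1.8.0 or newer; tested against v1.8.0."
fixed = bump_version_strings(doc, "2.0.0")
print(fixed)  # Install version 2.0.0 or newer; tested against v2.0.0.
```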
Manual-Fix Required (needs human judgment)
- Architectural description changes -- requires understanding intent
- API usage examples -- new examples need domain context
- Migration guides -- require understanding of breaking changes
- Getting started rewrites -- narrative flow needs human touch
- Security documentation updates -- compliance implications require review
Semi-Automated (template + human review)
- New function documentation -- generate skeleton from AST, human fills description
- Changelog entries -- generate from git commits, human edits for clarity
- README section additions -- provide template, human adds content
The drift report marks each issue with `[AUTO]`, `[MANUAL]`, or `[SEMI]` tags.
Integration Points
With CI/CD Pipelines
All tools return non-zero exit codes when issues are found:
- Exit 0: No issues (or all within threshold)
- Exit 1: Issues found exceeding threshold
- Exit 2: Tool error (invalid arguments, missing files)
With Code Review
Add drift analysis to PR checks. When a PR modifies code in `src/`, automatically check whether docs in `docs/` need updates. The drift analyzer can scope its analysis to only changed directories.
With Documentation Generators
Pair with tools like Sphinx, MkDocs, or mdBook. Run API validation after doc generation to ensure the generated docs match source. Run link checker on the built output.
With Release Processes
Add staleness scoring to release checklists. Block releases if documentation score falls below threshold. Generate drift reports as release artifacts.
With Other Skills
- code-reviewer -- include doc drift in PR review reports
- senior-devops -- integrate into deployment pipelines
- senior-qa -- documentation quality as part of QA checklist
Reference Guides
| Guide | Description |
|---|---|
| Documentation Standards | README structure, API docs, changelogs, ADRs, docs-as-code |
| Drift Prevention Guide | Coupling strategies, CI gates, review checklists, prevention patterns |
Assets
| Asset | Description |
|---|---|
| Drift Report Template | Template for drift analysis reports |
| Sample Drift Data | Sample JSON for testing and demonstration |
Anti-Patterns
- Ignoring drift until release -- run drift analysis in CI on every PR, not as a release-day scramble
- Treating all drift as equal -- factual drift (wrong function signatures) is critical; temporal drift (stale dates) is cosmetic; prioritize by category
- Manual-only doc updates -- use `[AUTO]` fixes for version strings and broken links; reserve human effort for semantic and architectural drift
- Shallow clone in CI -- `fetch-depth: 1` breaks git history comparison; always use `fetch-depth: 0` for drift analysis
- Skipping link checks on internal docs -- cross-document anchor references break silently on refactors; run `link_checker.py` on every markdown change
Troubleshooting
| Problem | Cause | Solution |
|---|---|---|
| `drift_analyzer.py` reports zero docs found | Repository uses non-standard doc extensions or docs live in ignored directories | Use `--doc-patterns` to explicitly specify extensions |
| Staleness scores are unexpectedly low | Docs reference files that were reorganized or moved to new directories | Run `link_checker.py` first to identify broken references, fix them, then re-score |
| API validator finds no source signatures | Source path points to a non-Python directory or all functions are `_`-prefixed private | Verify the source path contains `.py` files; add `--include-private` if the API surface uses private names |
| Link checker flags valid anchors as broken | Heading text contains special characters, inline code, or emoji that alter the slug | Compare the expected slug (lowercase, special chars stripped, spaces to hyphens) against the actual heading text |
| Git history comparison shows no changes | Shallow clone lacks full commit history (common in CI) | Clone with `fetch-depth: 0` or pass `--scope` to narrow the analysis |
| External URL checks hang or time out | Target servers are slow or block automated HEAD requests | Omit `--check-external` for local-only validation, or run external checks in a separate non-blocking job |
| Drift report marks everything as `[MANUAL]` | Most detected drift is semantic or architectural, not auto-fixable | This is expected for large refactors; focus on `[AUTO]` and `[SEMI]` items first, then triage `[MANUAL]` items by severity |
Success Criteria
- Zero stale docs older than 90 days -- every documentation file has been updated within the last 90 days relative to its associated code changes
- Aggregate staleness score above 80/100 -- the repository-wide freshness score stays in the "Good" or "Excellent" range
- Link integrity above 99% -- fewer than 1% of internal links (file references, anchors, cross-document links) are broken
- API doc coverage above 95% -- at least 95% of public functions and classes have corresponding entries in API documentation
- Zero high-severity drift issues in CI -- pull requests with high or critical drift are blocked before merge
- Version string accuracy at 100% -- every version reference in documentation matches the current release tag or package manifest
- Drift report turnaround under 60 seconds -- full drift analysis completes in under one minute for repositories with up to 500 documentation files
Scope & Limitations
Covers:
- Detection of documentation drift against git history for any git repository
- AST-based validation of Python API documentation (function signatures, class definitions, parameters, return types)
- Internal link validation including local files, markdown anchors, cross-document anchors, images, and case-sensitivity checks
- Multi-dimensional staleness scoring with configurable weights and CI/CD threshold enforcement
Does NOT cover:
- Non-Python source code API validation -- the AST-based validator only parses Python; for TypeScript, Go, Rust, or Java APIs, use language-specific doc generators and pair with the link checker
- External URL uptime monitoring -- `--check-external` performs one-shot HEAD requests but does not provide continuous monitoring; use the senior-devops skill for uptime dashboards
- Automatic documentation rewriting -- tools classify issues as `[AUTO]`, `[SEMI]`, or `[MANUAL]` but do not generate replacement text; use the code-reviewer skill for AI-assisted doc suggestions
- Content quality or readability assessment -- staleness scoring measures freshness and structural completeness, not prose quality; see the standards/communication library for writing guidelines
Integration Points
| Skill | Integration | Data Flow |
|---|---|---|
| code-reviewer | Include drift report in PR review comments | `drift_analyzer.py --json` output feeds into review checklists as a documentation health section |
| senior-devops | Add staleness gate to CI/CD pipelines | `doc_staleness_scorer.py --threshold` returns exit code 1 on failure, blocking deploys |
| senior-qa | Documentation quality as part of QA acceptance | `doc_staleness_scorer.py --json` output merges into QA dashboards alongside test coverage metrics |
| senior-fullstack | Validate generated project docs post-scaffold | Run `api_doc_validator.py` against the scaffolded directory to confirm generated API docs match source |
| senior-secops | Audit security documentation currency | `drift_analyzer.py` detects when security docs fall behind policy changes |
| senior-architect | Architecture decision record (ADR) freshness | `doc_staleness_scorer.py --required-sections` validates ADR completeness |
Tool Reference
drift_analyzer.py
Purpose: Scan a git repository for documentation that has fallen out of sync with code. Maps documentation files to their associated code directories, compares git modification dates, detects renamed files, version string drift, broken references, and structural gaps. Classifies every issue by category, severity, and fix type.
Usage:
```bash
python scripts/drift_analyzer.py <repo_path> [options]
```
Parameters:
| Flag | Type | Default | Description |
|---|---|---|---|
| `repo_path` | positional | (required) | Path to the git repository to analyze |
| `--json` | flag | off | Output the full drift report as JSON |
| `--min-severity` | choice | | Minimum severity to include in report |
| `--scope` | string | (all) | Limit code analysis to a subdirectory (e.g., `src/`) |
| `--doc-patterns` | string | | Comma-separated file patterns for documentation discovery |
Example:
```bash
python scripts/drift_analyzer.py /path/to/repo --min-severity medium --scope src/ --json
```
Output Formats:
- Human-readable (default): Grouped by severity with `[AUTO]`/`[SEMI]`/`[MANUAL]` fix-type tags, category labels, and a fix-type summary
- JSON (`--json`): Structured object with `repository`, `scan_date`, `summary` (counts by severity, category, fix type), and `issues` array
Exit Codes: 0 = no high/critical issues, 1 = high or critical issues found, 2 = tool error (invalid path, not a git repo)
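In CI it is common to post-process the JSON report rather than rely on exit codes alone. A sketch of consuming it (top-level field names follow the Output Formats list above; per-issue fields in this synthetic sample are assumptions, and the exact schema may differ):

```python
import json

# Synthetic report shaped like the documented --json output.
report = json.loads("""
{
  "repository": "/path/to/repo",
  "scan_date": "2026-03-18",
  "summary": {"high": 1, "medium": 1, "low": 0},
  "issues": [
    {"file": "docs/api.md", "severity": "high", "fix_type": "MANUAL"},
    {"file": "README.md", "severity": "medium", "fix_type": "AUTO"}
  ]
}
""")

# Mirror the tool's exit-code policy: fail only on high/critical issues.
high = [i for i in report["issues"] if i["severity"] in ("high", "critical")]
print(len(high))  # 1
```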
doc_staleness_scorer.py
Purpose: Score documentation freshness on a weighted 0-100 scale across five dimensions: last updated, code-doc alignment, link health, completeness, and accuracy. Supports CI/CD threshold gates and README-focused analysis.
Usage:
```bash
python scripts/doc_staleness_scorer.py <repo_path> [options]
```
Parameters:
| Flag | Type | Default | Description |
|---|---|---|---|
| `repo_path` | positional | (required) | Path to the git repository to score |
| `--json` | flag | off | Output the full scoring report as JSON |
| `--threshold` | float | (none) | Fail with exit code 1 if aggregate score falls below this value |
| `--readme-focus` | flag | off | Only score README files (filenames starting with `README`) |
| `--required-sections` | string | | Comma-separated section names for completeness scoring |
| `--quiet` | flag | off | Only print the aggregate score number (no report) |
| `--weight-updated` | float | `0.20` | Weight for the "last updated" dimension |
| `--weight-alignment` | float | `0.30` | Weight for the "code-doc alignment" dimension |
| `--weight-links` | float | `0.15` | Weight for the "link health" dimension |
| `--weight-completeness` | float | `0.20` | Weight for the "completeness" dimension |
| `--weight-accuracy` | float | `0.15` | Weight for the "accuracy" dimension |
Example:
```bash
python scripts/doc_staleness_scorer.py /path/to/repo --threshold 60 --readme-focus --quiet
```
Output Formats:
- Human-readable (default): Aggregate score with label, per-file score table sorted worst-first, and dimension breakdown with ASCII bars for the bottom 5 files
- JSON (`--json`): Structured object with `aggregate_score`, `aggregate_label`, `total_documents`, and `documents` array (each with `total_score`, `label`, and per-dimension scores/details)
- Quiet (`--quiet`): Single line with the aggregate score (e.g., `72.3`)
Exit Codes: 0 = score above threshold (or no threshold set), 1 = score below threshold, 2 = tool error
api_doc_validator.py
Purpose: Extract function and class signatures from Python source files using the `ast` module and compare them against API documentation in markdown files. Detects undocumented items, phantom documentation for removed code, parameter mismatches, and deprecated items.
Usage:
```bash
python scripts/api_doc_validator.py <source_path> <doc_path> [options]
```
Parameters:
| Flag | Type | Default | Description |
|---|---|---|---|
| `source_path` | positional | (required) | Path to a Python source file or directory |
| `doc_path` | positional | (required) | Path to an API documentation file (`.md`) or directory |
| `--json` | flag | off | Output the validation report as JSON |
| `--recursive` | flag | off | Recursively scan the doc directory for markdown files |
| `--include-private` | flag | off | Include `_`-prefixed private functions and classes in validation |
Example:
```bash
python scripts/api_doc_validator.py /path/to/src /path/to/docs/ --recursive --include-private --json
```
Output Formats:
- Human-readable (default): Summary counts (source signatures, documented items, issues), then issues grouped by severity with type tags, source/doc file locations, and a summary-by-type table
- JSON (`--json`): Structured object with `summary` (counts by type and severity) and `issues` array (each with `type`, `severity`, `name`, file/line references, and `description`)
Exit Codes: 0 = no high-severity issues, 1 = high-severity issues found (e.g., documented items missing from source), 2 = tool error
link_checker.py
Purpose: Scan markdown files for every link type (local files, anchors, cross-document anchors, images, HTML links, reference-style links) and validate them against the filesystem and document headings. Optionally validates external URLs via HTTP HEAD requests. Also detects duplicate heading anchors.
Usage:
```bash
python scripts/link_checker.py <path> [options]
```
Parameters:
| Flag | Type | Default | Description |
|---|---|---|---|
| `path` | positional | (required) | File or directory to check (single file or recursive directory scan) |
| `--json` | flag | off | Output the link check report as JSON |
| `--broken-only` | flag | off | Only show broken links in the report (omit valid links from output) |
| `--check-external` | flag | off | Also validate external URLs via HTTP HEAD requests (slower, makes network requests) |
Example:
```bash
python scripts/link_checker.py /path/to/repo --broken-only --json
```
Output Formats:
- Human-readable (default): Summary counts (total, valid, broken, skipped, duplicate anchors), broken links grouped by source file with line numbers and error messages, duplicate anchor list, and link-type breakdown table
- JSON (`--json`): Structured object with `summary` (counts), `broken_links` array (each with source file, line, text, target, type, error), `duplicate_anchors` map, and optionally `all_links` (when `--broken-only` is not set)
Exit Codes: 0 = no broken links and no duplicate anchors, 1 = broken links or duplicate anchors found, 2 = tool error
- Last Updated: 2026-03-18
- Version: 2.0.0
- Tools: 4 Python CLI tools, 0 external dependencies
- Compatibility: Python 3.8+, any OS, any git repository