# Claude-Skills doc-drift-detector

## Install

Source -- clone the upstream repo:

```bash
git clone https://github.com/borghei/Claude-Skills
```

Claude Code -- install into `~/.claude/skills/`:

```bash
T=$(mktemp -d) && git clone --depth=1 https://github.com/borghei/Claude-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/engineering/doc-drift-detector" ~/.claude/skills/borghei-claude-skills-doc-drift-detector && rm -rf "$T"
```

Manifest: `engineering/doc-drift-detector/SKILL.md`

# Documentation Drift Detector

The agent detects documentation drift by mapping code directories to their docs, comparing git modification histories, extracting Python function signatures via AST, validating every markdown link and anchor, and scoring freshness on a weighted 0-100 scale. All four CLI tools use the Python standard library only.


## Quick Start

```bash
# 1. Run full drift analysis on a repository
python scripts/drift_analyzer.py /path/to/repo

# 2. Score documentation freshness
python scripts/doc_staleness_scorer.py /path/to/repo

# 3. Validate API docs against Python source
python scripts/api_doc_validator.py /path/to/repo/src /path/to/repo/docs/api.md

# 4. Check all markdown links
python scripts/link_checker.py /path/to/repo

# JSON output for any tool
python scripts/drift_analyzer.py /path/to/repo --json

# Set failure threshold for CI
python scripts/doc_staleness_scorer.py /path/to/repo --threshold 60
```

All tools support `--help` for full usage details.


## Core Workflows

### Workflow 1: Full Drift Analysis

Scan all documentation against code changes since each doc was last updated. This is the primary entry point for understanding the overall drift state of a repository.

```bash
# Basic analysis
python scripts/drift_analyzer.py /path/to/repo

# Analyze with custom doc patterns
python scripts/drift_analyzer.py /path/to/repo --doc-patterns "*.md,*.rst,*.txt"

# JSON output for tooling
python scripts/drift_analyzer.py /path/to/repo --json

# Only show high-severity drift
python scripts/drift_analyzer.py /path/to/repo --min-severity high

# Analyze specific directory
python scripts/drift_analyzer.py /path/to/repo --scope src/
```

What it does:

  1. Discovers all documentation files in the repo
  2. For each doc, identifies the code directories it describes (via path proximity and content references)
  3. Compares the doc's last-modified date against the git history of its associated code (see the sketch after this list)
  4. Identifies specific changes (renamed files, moved directories, changed function signatures)
  5. Classifies each drift instance by category and severity
  6. Generates an actionable report with specific file:line references
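
Step 3 is the heart of the analysis: a doc has drifted when its associated code has newer commits than the doc itself. A minimal sketch of that comparison, assuming `git` is on PATH (the function names here are illustrative, not the tool's internal API):

```python
import subprocess

def last_commit_epoch(repo, path):
    """Unix timestamp of the most recent commit touching `path` (0 if none)."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "-1", "--format=%ct", "--", path],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    return int(out) if out else 0

def doc_has_drifted(repo, doc_path, code_dir):
    # Drift signal: the code directory changed after the doc's last update.
    return last_commit_epoch(repo, code_dir) > last_commit_epoch(repo, doc_path)
```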

Output example:

```
Documentation Drift Report
==========================
Repository: /path/to/repo
Scan date:  2026-03-18
Docs found: 12
Drifted:    5

HIGH SEVERITY:
  docs/api.md (last updated: 2026-01-15)
    - 23 code files changed since doc update
    - 4 functions renamed in src/handlers/
    - 2 new modules undocumented
    Category: Factual + Structural
    Recommendation: Manual update required

MEDIUM SEVERITY:
  README.md (last updated: 2026-02-28)
    - Installation section references removed dependency
    - Version string outdated (says 1.8.0, current 2.0.0)
    Category: Factual + Temporal
    Recommendation: Auto-fixable (version), Manual (installation)
```

### Workflow 2: API Documentation Validation

Check that API documentation accurately reflects the actual function signatures, class definitions, and module structure in your Python source code.

```bash
# Validate API docs against source
python scripts/api_doc_validator.py /path/to/src /path/to/docs/api.md

# Scan entire docs directory
python scripts/api_doc_validator.py /path/to/src /path/to/docs/ --recursive

# JSON output
python scripts/api_doc_validator.py /path/to/src /path/to/docs/api.md --json

# Include private methods in validation
python scripts/api_doc_validator.py /path/to/src /path/to/docs/ --include-private
```

What it detects:

- Functions/classes present in code but missing from docs
- Functions/classes documented but no longer in code (removed or renamed)
- Parameter mismatches (missing params, wrong types, wrong defaults)
- Deprecated items still documented as current
- Return type mismatches
- Module-level docstring drift

How it works:

The tool uses Python's `ast` module to parse source files and extract function signatures, class definitions, decorators, and docstrings. It then parses the markdown documentation looking for function/class references, parameter lists, and code blocks. Mismatches are reported with exact locations in both source and documentation.
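
A minimal sketch of the signature-extraction side, built on the same standard-library `ast` module (the validator's actual internals may differ):

```python
import ast

def extract_signatures(source_file):
    """Map every function defined in a module (including methods) to its parameter names."""
    with open(source_file, encoding="utf-8") as f:
        tree = ast.parse(f.read(), filename=source_file)
    sigs = {}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            sigs[node.name] = [a.arg for a in node.args.args]
    return sigs
```

Comparing two such maps -- one extracted from source, one reconstructed from the markdown's documented parameter lists -- yields the missing, phantom, and mismatched entries listed above.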

### Workflow 3: README Health Check

Validate README sections against the actual project state. This combines drift analysis, link checking, and completeness scoring into a single README-focused report.

```bash
# Check README health
python scripts/doc_staleness_scorer.py /path/to/repo --readme-focus

# Check with custom sections
python scripts/doc_staleness_scorer.py /path/to/repo --required-sections "Installation,Usage,API,Contributing,License"
```

Validates:

- Required sections are present (Installation, Usage, API Reference, Contributing, License)
- Version strings match package version (`package.json`, `setup.py`, `pyproject.toml`)
- File references in README actually exist
- Badge URLs are well-formed
- Code examples reference existing files/functions
- Table of contents matches actual headings

### Workflow 4: Link Integrity Audit

Check every link in every markdown file -- local file references, anchors, cross-document links, and optionally external URLs.

```bash
# Check all markdown links
python scripts/link_checker.py /path/to/repo

# Include external URL checks (slower, makes HTTP requests)
python scripts/link_checker.py /path/to/repo --check-external

# Check specific file
python scripts/link_checker.py /path/to/repo/README.md

# JSON output
python scripts/link_checker.py /path/to/repo --json

# Only show broken links
python scripts/link_checker.py /path/to/repo --broken-only
```

What it checks:

- Local file references (`[link](path/to/file.md)`) -- does the file exist?
- Anchor references (`[link](#section-name)`) -- does the heading exist? (see the slug sketch after this list)
- Cross-document anchors (`[link](other.md#section)`) -- do the file and heading exist?
- Relative path correctness (catches `../` errors)
- Case sensitivity issues (common on Linux but silent on macOS)
- Image references -- do referenced images exist?
- Duplicate anchors that would cause ambiguous links
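
Anchor validation hinges on reproducing the heading-to-slug transform. A GitHub-flavored approximation, assuming the common lowercase/strip/hyphenate rules (edge cases like emoji and inline code differ between renderers):

```python
import re

def heading_to_slug(heading):
    """Approximate GitHub's heading-to-anchor transform."""
    slug = heading.strip().lower()
    slug = re.sub(r"[^\w\s-]", "", slug)  # strip punctuation and symbols
    slug = re.sub(r"\s+", "-", slug)      # collapse whitespace to hyphens
    return slug

assert heading_to_slug("Workflow 4: Link Integrity Audit") == "workflow-4-link-integrity-audit"
```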

### Workflow 5: Continuous Doc Monitoring

Integrate documentation drift detection into your CI/CD pipeline for ongoing monitoring.

GitHub Actions example:

```yaml
name: Documentation Drift Check
on:
  pull_request:
    branches: [main, dev]
  push:
    branches: [main]

jobs:
  doc-drift:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # Full history for git log analysis

      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Run drift analysis
        run: python engineering/doc-drift-detector/scripts/drift_analyzer.py . --json > drift-report.json

      - name: Check staleness score
        run: python engineering/doc-drift-detector/scripts/doc_staleness_scorer.py . --threshold 50

      - name: Validate API docs
        run: python engineering/doc-drift-detector/scripts/api_doc_validator.py src/ docs/api.md

      - name: Check links
        run: python engineering/doc-drift-detector/scripts/link_checker.py .

      - name: Upload drift report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: drift-report
          path: drift-report.json
```

Pre-commit hook:

```bash
#!/bin/bash
# .git/hooks/pre-commit
# Fail commit if docs are severely stale
if ! python engineering/doc-drift-detector/scripts/doc_staleness_scorer.py . --threshold 30 --quiet; then
    echo "Documentation is critically stale. Update docs before committing."
    exit 1
fi
```

## Tools

| Tool | Purpose | Lines | Key Feature |
|------|---------|-------|-------------|
| `drift_analyzer.py` | Full drift analysis between code and docs | ~550 | Git history comparison with code-to-doc mapping |
| `doc_staleness_scorer.py` | Score documentation freshness 0-100 | ~450 | Weighted multi-dimensional scoring |
| `api_doc_validator.py` | Validate API docs against Python source | ~400 | AST-based signature extraction and comparison |
| `link_checker.py` | Audit all markdown links and anchors | ~400 | Local file, anchor, and cross-document validation |

All tools:

- Python 3.8+ standard library only
- Support `--json` for machine-readable output
- Support `--help` for usage details
- Use non-zero exit codes on failure (CI/CD compatible)
- Work on any OS (Windows, macOS, Linux)

## Staleness Scoring

Documentation freshness is scored on a 0-100 scale where 100 = perfectly current. The score is a weighted combination of five dimensions (the arithmetic is sketched after the table):

| Dimension | Weight | What It Measures |
|-----------|--------|------------------|
| Last Updated | 20% | How recently the doc file was modified relative to its associated code |
| Code-Doc Alignment | 30% | Whether documented items (functions, classes, files) still exist and match |
| Link Health | 15% | Percentage of links that resolve correctly |
| Completeness | 20% | Whether expected sections are present and non-empty |
| Accuracy | 15% | Whether version strings, file paths, and other verifiable facts are correct |
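
The aggregate is a plain weighted sum of per-dimension scores. A sketch of the arithmetic with the default weights above (per-dimension scores assumed to be 0-100 floats):

```python
DEFAULT_WEIGHTS = {
    "updated": 0.20, "alignment": 0.30, "links": 0.15,
    "completeness": 0.20, "accuracy": 0.15,
}

def aggregate_score(dimension_scores, weights=DEFAULT_WEIGHTS):
    """Weighted 0-100 freshness score; weights are expected to sum to 1.0."""
    return sum(weights[name] * dimension_scores[name] for name in weights)

# aggregate_score({"updated": 40, "alignment": 90, "links": 100,
#                  "completeness": 75, "accuracy": 80})  ->  77.0 (modulo float rounding)
```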

Score interpretation:

| Score | Label | Action |
|-------|-------|--------|
| 90-100 | Excellent | No action needed |
| 70-89 | Good | Minor updates recommended |
| 50-69 | Stale | Updates needed before next release |
| 30-49 | Critical | Immediate attention required |
| 0-29 | Abandoned | Full rewrite likely needed |

Customization:

```bash
# Override default weights
python scripts/doc_staleness_scorer.py /path/to/repo \
  --weight-updated 0.25 \
  --weight-alignment 0.25 \
  --weight-links 0.15 \
  --weight-completeness 0.20 \
  --weight-accuracy 0.15

# Set staleness thresholds
python scripts/doc_staleness_scorer.py /path/to/repo --threshold 60
```

## Drift Categories

Every detected drift instance is classified into one or more categories:

### Structural Drift

Missing or misorganized sections. A README lacks an Installation section. An API doc is missing an entire module. A CHANGELOG has no entries for the latest version.

Detection: Compare actual document headings against expected headings for that document type.

### Factual Drift

Incorrect information. A function signature in the docs has the wrong parameters. An installation command references a removed package. A configuration example uses deprecated options.

Detection: Cross-reference documented facts against code analysis (AST parsing, file existence, git tags).

### Referential Drift

Broken references. A link points to a file that was moved. An anchor references a heading that was renamed. An image path is wrong.

Detection: Link checker validates every reference against the filesystem and document structure.

### Temporal Drift

Outdated time-sensitive content. Version strings are old. "Last updated" dates are stale. "Coming soon" items that shipped months ago. Roadmap items past their target date.

Detection: Extract version strings and dates, compare against git tags, package manifests, and current date.
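
One way to picture the version-string side of this check, assuming a `pyproject.toml` manifest and plain `X.Y.Z` semver strings (a sketch, not the tool's exact logic):

```python
import re
from pathlib import Path

SEMVER = re.compile(r"\b(\d+\.\d+\.\d+)\b")

def manifest_version(repo):
    """Naive line scan for `version = "X.Y.Z"` in pyproject.toml."""
    for line in Path(repo, "pyproject.toml").read_text().splitlines():
        if line.strip().startswith("version"):
            match = SEMVER.search(line)
            if match:
                return match.group(1)
    return None

def stale_versions(doc_text, current):
    """Version strings in a doc that no longer match the current release."""
    return sorted({v for v in SEMVER.findall(doc_text) if v != current})
```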

### Semantic Drift

Technically accurate but misleading. A description says "simple REST API" when the project now has GraphQL, gRPC, and WebSocket endpoints. The architecture overview omits a major new subsystem.

Detection: Compare document topic coverage against code directory structure and file counts. Flag when code complexity has grown significantly but documentation scope has not.
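
A crude form of that coverage heuristic, assuming top-level directories under `src/` each deserve a mention in an architecture doc (purely illustrative; the `src` layout is an assumption):

```python
from pathlib import Path

def unmentioned_subsystems(repo, doc_text, src_root="src"):
    """Top-level code directories never named in the document."""
    root = Path(repo, src_root)
    subsystems = [p.name for p in root.iterdir() if p.is_dir()]
    return [name for name in subsystems if name.lower() not in doc_text.lower()]
```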


## Auto-Fix vs Manual-Fix Classification

Not all drift can be fixed programmatically. The tools classify each issue:

### Auto-Fixable (safe to automate)

- Version string updates -- replace old version with current from package manifest
- Date updates -- update "last modified" timestamps
- Broken local links -- suggest correct path when file was moved (git log tracks renames)
- Missing table of contents entries -- generate from actual headings
- Removed file references -- flag for deletion or suggest replacement

### Manual-Fix Required (needs human judgment)

- Architectural description changes -- requires understanding intent
- API usage examples -- new examples need domain context
- Migration guides -- require understanding of breaking changes
- Getting started rewrites -- narrative flow needs human touch
- Security documentation updates -- compliance implications require review

### Semi-Automated (template + human review)

- New function documentation -- generate skeleton from AST, human fills description
- Changelog entries -- generate from git commits, human edits for clarity
- README section additions -- provide template, human adds content

The drift report marks each issue with `[AUTO]`, `[MANUAL]`, or `[SEMI]` tags.


## Integration Points

### With CI/CD Pipelines

All tools return non-zero exit codes when issues are found; a gating sketch follows the list:

- Exit `0`: No issues (or all within threshold)
- Exit `1`: Issues found exceeding threshold
- Exit `2`: Tool error (invalid arguments, missing files)
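
A sketch of gating on those codes from a wrapper script (the path and threshold value here are examples):

```python
import subprocess
import sys

result = subprocess.run(
    ["python", "scripts/doc_staleness_scorer.py", ".", "--threshold", "50"]
)
if result.returncode == 2:
    sys.exit("tool error: check arguments and repository path")
if result.returncode == 1:
    sys.exit("documentation below freshness threshold; failing the build")
```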

### With Code Review

Add drift analysis to PR checks. When a PR modifies code in `src/`, automatically check whether docs in `docs/` need updates. The drift analyzer can scope its analysis to only changed directories.

### With Documentation Generators

Pair with tools like Sphinx, MkDocs, or mdBook. Run API validation after doc generation to ensure the generated docs match source. Run link checker on the built output.

### With Release Processes

Add staleness scoring to release checklists. Block releases if documentation score falls below threshold. Generate drift reports as release artifacts.

### With Other Skills

- `code-reviewer` -- include doc drift in PR review reports
- `senior-devops` -- integrate into deployment pipelines
- `senior-qa` -- documentation quality as part of QA checklist

## Reference Guides

| Guide | Description |
|-------|-------------|
| Documentation Standards | README structure, API docs, changelogs, ADRs, docs-as-code |
| Drift Prevention Guide | Coupling strategies, CI gates, review checklists, prevention patterns |

## Assets

| Asset | Description |
|-------|-------------|
| Drift Report Template | Template for drift analysis reports |
| Sample Drift Data | Sample JSON for testing and demonstration |

## Anti-Patterns

- **Ignoring drift until release** -- run drift analysis in CI on every PR, not as a release-day scramble
- **Treating all drift as equal** -- factual drift (wrong function signatures) is critical; temporal drift (stale dates) is cosmetic; prioritize by category
- **Manual-only doc updates** -- use `[AUTO]` fixes for version strings and broken links; reserve human effort for semantic and architectural drift
- **Shallow clone in CI** -- `fetch-depth: 1` breaks git history comparison; always use `fetch-depth: 0` for drift analysis
- **Skipping link checks on internal docs** -- cross-document anchor references break silently on refactors; run `link_checker.py` on every markdown change

## Troubleshooting

| Problem | Cause | Solution |
|---------|-------|----------|
| `drift_analyzer.py` reports zero docs found | Repository has non-standard doc extensions or docs are in ignored directories (e.g., `node_modules`, `dist`) | Use `--doc-patterns "*.md,*.rst,*.txt"` to explicitly specify extensions |
| Staleness scores are unexpectedly low | Docs reference files that were reorganized or moved to new directories | Run `link_checker.py` first to identify broken references, fix them, then re-score |
| API validator finds no source signatures | Source path points to a non-Python directory or all functions are `_`-prefixed private | Verify `source_path` contains `.py` files; add `--include-private` if the API surface uses private names |
| Link checker flags valid anchors as broken | Heading text contains special characters, inline code, or emoji that alter the slug | Compare the expected slug (lowercase, special chars stripped, spaces to hyphens) against the actual heading text |
| Git history comparison shows no changes | Shallow clone lacks full commit history (common in CI) | Clone with `fetch-depth: 0` or pass `--scope` to narrow the analysis window |
| External URL checks hang or time out | Target servers are slow or block automated HEAD requests | Omit `--check-external` for local-only validation, or run external checks in a separate non-blocking job |
| Drift report marks everything as `[MANUAL]` | Most detected drift is semantic or architectural, not auto-fixable | This is expected for large refactors; focus on `[AUTO]` and `[SEMI]` items first, then triage `[MANUAL]` items by severity |

## Success Criteria

- **Zero stale docs older than 90 days** -- every documentation file has been updated within the last 90 days relative to its associated code changes
- **Aggregate staleness score above 80/100** -- the repository-wide freshness score stays in the "Good" or "Excellent" range
- **Link integrity above 99%** -- fewer than 1% of internal links (file references, anchors, cross-document links) are broken
- **API doc coverage above 95%** -- at least 95% of public functions and classes have corresponding entries in API documentation
- **Zero high-severity drift issues in CI** -- pull requests with high or critical drift are blocked before merge
- **Version string accuracy at 100%** -- every version reference in documentation matches the current release tag or package manifest
- **Drift report turnaround under 60 seconds** -- full drift analysis completes in under one minute for repositories with up to 500 documentation files

## Scope & Limitations

Covers:

- Detection of documentation drift against git history for any git repository
- AST-based validation of Python API documentation (function signatures, class definitions, parameters, return types)
- Internal link validation including local files, markdown anchors, cross-document anchors, images, and case-sensitivity checks
- Multi-dimensional staleness scoring with configurable weights and CI/CD threshold enforcement

Does NOT cover:

- Non-Python source code API validation -- the AST-based validator only parses Python; for TypeScript, Go, Rust, or Java APIs, use language-specific doc generators and pair with the link checker
- External URL uptime monitoring -- `--check-external` performs one-shot HEAD requests but does not provide continuous monitoring; use the senior-devops skill for uptime dashboards
- Automatic documentation rewriting -- tools classify issues as `[AUTO]`, `[SEMI]`, or `[MANUAL]` but do not generate replacement text; use the code-reviewer skill for AI-assisted doc suggestions
- Content quality or readability assessment -- staleness scoring measures freshness and structural completeness, not prose quality; see the standards/communication library for writing guidelines

## Skill Integrations

| Skill | Integration | Data Flow |
|-------|-------------|-----------|
| code-reviewer | Include drift report in PR review comments | `drift_analyzer.py --json` output feeds into review checklists as a documentation health section |
| senior-devops | Add staleness gate to CI/CD pipelines | `doc_staleness_scorer.py --threshold 50` returns exit code 1 on failure, blocking deploys |
| senior-qa | Documentation quality as part of QA acceptance | `link_checker.py --json` output merges into QA dashboards alongside test coverage metrics |
| senior-fullstack | Validate generated project docs post-scaffold | Run `api_doc_validator.py` against the scaffolded `docs/` directory to confirm generated API docs match source |
| senior-secops | Audit security documentation currency | `drift_analyzer.py --scope security/` detects when security docs fall behind policy changes |
| senior-architect | Architecture decision record (ADR) freshness | `doc_staleness_scorer.py --required-sections "Status,Context,Decision,Consequences"` validates ADR completeness |

## Tool Reference

### drift_analyzer.py

Purpose: Scan a git repository for documentation that has fallen out of sync with code. Maps documentation files to their associated code directories, compares git modification dates, and detects renamed files, version string drift, broken references, and structural gaps. Classifies every issue by category, severity, and fix type.

Usage:

```bash
python scripts/drift_analyzer.py <repo_path> [options]
```

Parameters:

| Flag | Type | Default | Description |
|------|------|---------|-------------|
| `repo_path` | positional | (required) | Path to the git repository to analyze |
| `--json` | flag | off | Output the full drift report as JSON |
| `--min-severity` | choice | `low` | Minimum severity to include in report. Choices: `critical`, `high`, `medium`, `low`, `info` |
| `--scope` | string | `""` (all) | Limit code analysis to a subdirectory (e.g., `src/`) |
| `--doc-patterns` | string | `*.md,*.rst,*.txt,*.adoc` | Comma-separated file patterns for documentation discovery |

Example:

```bash
python scripts/drift_analyzer.py /path/to/repo --min-severity medium --scope src/ --json
```

Output Formats:

- Human-readable (default): Grouped by severity with `[AUTO]`/`[SEMI]`/`[MANUAL]` fix-type tags, category labels, and a fix-type summary
- JSON (`--json`): Structured object with `repository`, `scan_date`, `summary` (counts by severity, category, fix type), and `issues` array

Exit Codes: 0 = no high/critical issues, 1 = high or critical issues found, 2 = tool error (invalid path, not a git repo)
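
Downstream tooling can consume the JSON form directly. A filtering sketch, assuming each entry in the `issues` array carries a `severity` field (the per-issue schema is not fully specified above):

```python
import json

with open("drift-report.json", encoding="utf-8") as f:
    report = json.load(f)

# Keep only the issues that would fail CI (exit code 1).
blocking = [i for i in report["issues"]
            if i.get("severity") in ("critical", "high")]
print(f"{report['repository']}: {len(blocking)} blocking issues")
```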


### doc_staleness_scorer.py

Purpose: Score documentation freshness on a weighted 0-100 scale across five dimensions: last updated, code-doc alignment, link health, completeness, and accuracy. Supports CI/CD threshold gates and README-focused analysis.

Usage:

```bash
python scripts/doc_staleness_scorer.py <repo_path> [options]
```

Parameters:

| Flag | Type | Default | Description |
|------|------|---------|-------------|
| `repo_path` | positional | (required) | Path to the git repository to score |
| `--json` | flag | off | Output the full scoring report as JSON |
| `--threshold` | float | (none) | Fail with exit code 1 if aggregate score falls below this value |
| `--readme-focus` | flag | off | Only score README files (filenames starting with `readme`) |
| `--required-sections` | string | `Installation,Usage,API,Contributing,License` | Comma-separated section names for completeness scoring |
| `--quiet` | flag | off | Only print the aggregate score number (no report) |
| `--weight-updated` | float | `0.20` | Weight for the "last updated" dimension |
| `--weight-alignment` | float | `0.30` | Weight for the "code-doc alignment" dimension |
| `--weight-links` | float | `0.15` | Weight for the "link health" dimension |
| `--weight-completeness` | float | `0.20` | Weight for the "completeness" dimension |
| `--weight-accuracy` | float | `0.15` | Weight for the "accuracy" dimension |

Example:

```bash
python scripts/doc_staleness_scorer.py /path/to/repo --threshold 60 --readme-focus --quiet
```

Output Formats:

- Human-readable (default): Aggregate score with label, per-file score table sorted worst-first, and dimension breakdown with ASCII bars for the bottom 5 files
- JSON (`--json`): Structured object with `aggregate_score`, `aggregate_label`, `total_documents`, and `documents` array (each with `total_score`, `label`, and per-dimension scores/details)
- Quiet (`--quiet`): Single line with the aggregate score (e.g., `72.3`)

Exit Codes: 0 = score above threshold (or no threshold set), 1 = score below threshold, 2 = tool error
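
With `--quiet`, the single-line score is easy to capture programmatically (a sketch; the repository path is an example):

```python
import subprocess

proc = subprocess.run(
    ["python", "scripts/doc_staleness_scorer.py", ".", "--quiet"],
    capture_output=True, text=True, check=False,
)
score = float(proc.stdout.strip())  # e.g. 72.3
print(f"aggregate staleness score: {score:.1f}")
```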


### api_doc_validator.py

Purpose: Extract function and class signatures from Python source files using the `ast` module and compare them against API documentation in markdown files. Detects undocumented items, phantom documentation for removed code, parameter mismatches, and deprecated items.

Usage:

```bash
python scripts/api_doc_validator.py <source_path> <doc_path> [options]
```

Parameters:

| Flag | Type | Default | Description |
|------|------|---------|-------------|
| `source_path` | positional | (required) | Path to a Python source file or directory |
| `doc_path` | positional | (required) | Path to API documentation file (`.md`) or directory |
| `--json` | flag | off | Output the validation report as JSON |
| `--recursive` | flag | off | Recursively scan the doc directory for markdown files |
| `--include-private` | flag | off | Include `_`-prefixed private functions and classes in validation |

Example:

```bash
python scripts/api_doc_validator.py /path/to/src /path/to/docs/ --recursive --include-private --json
```

Output Formats:

- Human-readable (default): Summary counts (source signatures, documented items, issues), then issues grouped by severity with type tags, source/doc file locations, and a summary-by-type table
- JSON (`--json`): Structured object with `summary` (counts by type and severity) and `issues` array (each with `type`, `severity`, `name`, file/line references, and `description`)

Exit Codes: 0 = no high-severity issues, 1 = high-severity issues found (e.g., documented items missing from source), 2 = tool error


### link_checker.py

Purpose: Scan markdown files for every link type (local files, anchors, cross-document anchors, images, HTML links, reference-style links) and validate them against the filesystem and document headings. Optionally validates external URLs via HTTP HEAD requests. Also detects duplicate heading anchors.

Usage:

```bash
python scripts/link_checker.py <path> [options]
```

Parameters:

| Flag | Type | Default | Description |
|------|------|---------|-------------|
| `path` | positional | (required) | File or directory to check (single `.md` file or directory for recursive scan) |
| `--json` | flag | off | Output the link check report as JSON |
| `--broken-only` | flag | off | Only show broken links in the report (omit valid links from output) |
| `--check-external` | flag | off | Also validate external URLs via HTTP HEAD requests (slower, makes network requests) |

Example:

```bash
python scripts/link_checker.py /path/to/repo --broken-only --json
```

Output Formats:

- Human-readable (default): Summary counts (total, valid, broken, skipped, duplicate anchors), broken links grouped by source file with line numbers and error messages, duplicate anchor list, and link-type breakdown table
- JSON (`--json`): Structured object with `summary` (counts), `broken_links` array (each with source file, line, text, target, type, error), `duplicate_anchors` map, and optionally `all_links` (when `--broken-only` is not set)

Exit Codes: 0 = no broken links and no duplicate anchors, 1 = broken links or duplicate anchors found, 2 = tool error


Last Updated: 2026-03-18 · Version: 2.0.0 · Tools: 4 Python CLI tools, 0 external dependencies · Compatibility: Python 3.8+, any OS, any git repository