Awesome-omni-skill markdown-consolidator

Intelligent consolidation and synthesis of multiple markdown files with overlapping content and different update dates. Use when: (1) Multiple AI-generated markdown files need merging, (2) Knowledge bases have fragmented or duplicate content, (3) Documentation requires recency-aware synthesis, (4) Supporting documents need re-synthesis after AI task completion, (5) Project documentation has semantic overlap across files, (6) Periodic knowledge base maintenance and deduplication is needed.

install
source · Clone the upstream repo
git clone https://github.com/diegosouzapw/awesome-omni-skill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/diegosouzapw/awesome-omni-skill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/documentation/markdown-consolidator" ~/.claude/skills/diegosouzapw-awesome-omni-skill-markdown-consolidator && rm -rf "$T"
manifest: skills/documentation/markdown-consolidator/SKILL.md

Markdown Consolidator

Consolidate and synthesize multiple markdown files with intelligent handling of overlapping content, different update dates, and semantic deduplication.

Core Problem

AI-assisted workflows generate fragmented documentation:

  • Each AI session creates task-specific markdown files
  • AI references supporting docs but doesn't update them post-task
  • Knowledge becomes scattered across files with overlapping content
  • Different timestamps make version reconciliation complex

Workflow Overview

1. ANALYZE  → Inventory files, extract metadata, identify relationships
2. CLUSTER  → Group semantically related files using content analysis
3. PLAN     → Create merge strategy based on recency, overlap, authority
4. SYNTHESIZE → Merge content with intelligent conflict resolution
5. VALIDATE → Verify completeness and coherence of output

Analysis Phase

Step 1: File Inventory

Run the inventory script to analyze all markdown files:

python scripts/inventory.py <directory> --output inventory.json

The script extracts:

  • File paths and sizes
  • Modification timestamps (file system and YAML frontmatter)
  • Section headers (H1-H6 structure)
  • Word/token counts per section
  • Internal links ([[wikilinks]] and [markdown](links))
  • YAML frontmatter metadata
  • Content fingerprints for similarity detection
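
The contents of scripts/inventory.py are not shown here, so the following is only a minimal sketch of how a few of the listed fields could be extracted with the standard library. The function names, the regular expressions, and the choice of SHA-256 over whitespace-normalized text as the fingerprint are all assumptions made for illustration.

```python
import hashlib
import re
from pathlib import Path

# Hypothetical patterns; a real parser would also skip fenced code blocks.
HEADER_RE = re.compile(r"^(#{1,6})[ \t]+(.*?)[ \t]*$", re.MULTILINE)
WIKILINK_RE = re.compile(r"\[\[([^\]|#]+)")
MDLINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def inventory_text(text: str) -> dict:
    """Collect headers, links, a word count, and a fingerprint for one document."""
    normalized = " ".join(text.lower().split())
    return {
        "headers": [(len(m.group(1)), m.group(2)) for m in HEADER_RE.finditer(text)],
        "wikilinks": WIKILINK_RE.findall(text),
        "links": MDLINK_RE.findall(text),
        "words": len(text.split()),
        # Assumed fingerprint: hash of lowercased, whitespace-collapsed text.
        "fingerprint": hashlib.sha256(normalized.encode("utf-8")).hexdigest(),
    }


def inventory_file(path: Path) -> dict:
    """Add file-system metadata (path, size, mtime) to the text-level inventory."""
    stat = path.stat()
    entry = inventory_text(path.read_text(encoding="utf-8"))
    entry.update(path=str(path), size=stat.st_size, mtime=stat.st_mtime)
    return entry
```

A hash like this only detects exact duplicates after normalization; similarity detection would need a fuzzier fingerprint.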

Step 2: Relationship Mapping

python scripts/analyze_relationships.py inventory.json --output relationships.json

Identifies:

  • Semantic clusters: Files covering similar topics (via TF-IDF/embedding similarity)
  • Temporal chains: Files that evolved from each other (via timestamp + similarity)
  • Reference graphs: Which files reference which (via link analysis)
  • Conflict zones: Sections with contradictory or overlapping content
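
To make the TF-IDF similarity mentioned above concrete, here is a small standard-library sketch. It assumes a simple alphanumeric tokenizer and a smoothed IDF term; the actual weighting used by analyze_relationships.py is not documented here and may differ.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split into alphanumeric tokens (an assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: dict[str, str]) -> dict[str, dict[str, float]]:
    """Build one sparse TF-IDF vector per document, using a smoothed IDF."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    doc_freq = Counter(term for c in counts.values() for term in c)
    n = len(docs)
    vectors = {}
    for name, c in counts.items():
        total = sum(c.values()) or 1
        vectors[name] = {
            term: (freq / total) * (math.log((1 + n) / (1 + doc_freq[term])) + 1)
            for term, freq in c.items()
        }
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm else 0.0
```

Pairs whose cosine score exceeds a chosen threshold would then be candidates for the same semantic cluster.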

Clustering Phase

Clustering Strategies

Choose based on your consolidation goal:

Topic-based clustering (default): Groups files by semantic similarity of content.

python scripts/cluster.py relationships.json --method topic --threshold 0.6

Temporal clustering: Groups files by modification date ranges.

python scripts/cluster.py relationships.json --method temporal --window 7d

Hierarchical clustering: Groups by directory structure + content similarity.

python scripts/cluster.py relationships.json --method hierarchical
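
As an illustration of the first two strategies, the sketch below groups files either by a similarity threshold (connected components via union-find) or by a time window. Both functions are hypothetical and reflect one plausible reading of --threshold and --window; cluster.py may define its groups differently.

```python
from datetime import datetime, timedelta


def cluster_by_threshold(names, similarity, threshold=0.6):
    """Group names into connected components of pairs scoring >= threshold.

    similarity maps (name_a, name_b) pairs to a score in [0, 1].
    """
    parent = {name: name for name in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for (a, b), score in similarity.items():
        if score >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


def cluster_by_window(files, window=timedelta(days=7)):
    """Group files by modification time.

    files maps name -> datetime. Assumed rule: a new group starts when a file
    is more than `window` later than the first file of the current group.
    """
    groups = []
    group_start = None
    for name, modified in sorted(files.items(), key=lambda kv: kv[1]):
        if group_start is None or modified - group_start > window:
            groups.append([])
            group_start = modified
        groups[-1].append(name)
    return groups
```

Note that connected components are transitive: two files can share a cluster without being directly similar, as long as a chain of similar files links them.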

Cluster Output

Creates clusters.json with the following structure:

{
  "clusters": [
    {
      "id": "cluster_001",
      "theme": "API Authentication",
      "files": ["auth-design.md", "oauth-notes.md", "token-handling.md"],
      "primary_file": "auth-design.md",
      "overlap_score": 0.72,
      "conflicts": ["token-handling.md:L45 vs oauth-notes.md:L23"]
    }
  ]
}
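
A downstream script might read this structure as follows. This is a sketch that assumes every conflict entry follows the "file:Lnn vs file:Lnn" shape seen in the example above; the helper names are hypothetical.

```python
import json
import re

# Assumed conflict format, taken from the single example in clusters.json.
CONFLICT_RE = re.compile(
    r"^(?P<a>.+?):L(?P<a_line>\d+) vs (?P<b>.+?):L(?P<b_line>\d+)$"
)


def parse_conflict(ref: str):
    """Split 'x.md:L45 vs y.md:L23' into ((file, line), (file, line))."""
    m = CONFLICT_RE.match(ref)
    if m is None:
        raise ValueError(f"unrecognized conflict reference: {ref!r}")
    return (m["a"], int(m["a_line"])), (m["b"], int(m["b_line"]))


def conflicts_by_cluster(raw: str) -> dict:
    """Map each cluster id to its parsed conflict locations."""
    data = json.loads(raw)
    return {
        c["id"]: [parse_conflict(r) for r in c.get("conflicts", [])]
        for c in data["clusters"]
    }
```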

Planning Phase

Merge Strategy Selection

Authority-based (recommended for documentation)

  • Most recent file is authoritative for conflicts
  • Older unique content is preserved with attribution
  • Use when files represent evolving understanding

Comprehensive (for knowledge bases)

  • Union of all unique information
  • Conflicts flagged for manual review
  • Use when completeness matters more than consistency

Canonical (for specifications)

  • Designate one file as canonical
  • Others provide supplementary/historical context
  • Use when single source of truth is required

Create Merge Plan

python scripts/plan_merge.py clusters.json --strategy authority --output merge_plan.json

Generates an actionable merge plan:

{
  "cluster_id": "cluster_001",
  "output_file": "consolidated/authentication.md",
  "sections": [
    {
      "heading": "## Overview",
      "sources": [{"file": "auth-design.md", "lines": "1-25", "action": "primary"}],
      "conflicts": []
    },
    {
      "heading": "## Token Handling",
      "sources": [
        {"file": "token-handling.md", "lines": "10-45", "action": "primary"},
        {"file": "oauth-notes.md", "lines": "20-35", "action": "supplement"}
      ],
      "conflicts": [
        {
          "description": "Token expiry differs: 24h vs 1h",
          "resolution": "Use most recent (token-handling.md: 24h)"
        }
      ]
    }
  ]
}
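
One way to read a plan section is sketched below: each source contributes the line range named in "lines", in the order listed. The sketch assumes ranges are 1-indexed and inclusive, which the plan format does not state explicitly, and the helper names are hypothetical.

```python
def slice_lines(text: str, spec: str) -> str:
    """Return the lines named by a 'start-end' spec (assumed 1-indexed, inclusive)."""
    start, end = (int(part) for part in spec.split("-"))
    if start < 1 or end < start:
        raise ValueError(f"invalid line range: {spec!r}")
    return "\n".join(text.splitlines()[start - 1:end])


def assemble_section(section: dict, file_texts: dict) -> str:
    """Concatenate the heading and each source excerpt, in plan order."""
    parts = [section["heading"]]
    for source in section["sources"]:
        parts.append(slice_lines(file_texts[source["file"]], source["lines"]))
    return "\n\n".join(parts)
```

This only concatenates excerpts; deduplication and conflict handling would run on top of it.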

Synthesis Phase

Execute Merge

python scripts/synthesize.py merge_plan.json --output consolidated/

The synthesizer:

  1. Creates section-by-section merged content
  2. Preserves original attribution via HTML comments
  3. Resolves conflicts per strategy
  4. Maintains internal link consistency
  5. Updates frontmatter with merge metadata

Synthesis Rules

Content Deduplication

  • Exact duplicates: Remove, keep first occurrence
  • Near duplicates (>80% similarity): Merge, note sources
  • Partial overlap: Keep both with clear section breaks
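
The three deduplication cases could be distinguished roughly as follows. This sketch uses difflib.SequenceMatcher from the standard library as the similarity measure, which is an assumption; the synthesizer may rely on a different metric for its 80% threshold.

```python
from difflib import SequenceMatcher


def classify_overlap(a: str, b: str, near_threshold: float = 0.8) -> str:
    """Classify two passages as 'exact', 'near', or 'other'.

    Whitespace is collapsed before comparing, so reflowed text still counts
    as an exact duplicate. 'other' covers partial overlap and unrelated text.
    """
    norm_a, norm_b = " ".join(a.split()), " ".join(b.split())
    if norm_a == norm_b:
        return "exact"
    ratio = SequenceMatcher(None, norm_a, norm_b).ratio()
    return "near" if ratio > near_threshold else "other"
```

For example, two sentences that differ only in their final punctuation mark are classified as "near", while unrelated sentences fall into "other".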

Conflict Resolution

Authority strategy:
  1. Prefer most recently modified source
  2. Prefer explicitly dated content over undated
  3. Prefer longer/more detailed explanations
  4. Flag unresolvable conflicts for review

Comprehensive strategy:
  1. Include all non-contradictory content
  2. Present conflicts as "Version A / Version B" blocks
  3. Add TODO markers for manual resolution
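
The authority rules can be read as ordered tie-breakers. The sketch below encodes that reading as a sort key and returns None when the top candidates tie on every rule, which corresponds to flagging the conflict for review. The Candidate type and its fields are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Candidate:
    file: str
    text: str
    modified: datetime
    dated_in_text: bool = False  # True if the passage carries an explicit date


def resolve_authority(candidates: list) -> Optional[Candidate]:
    """Pick a winner by recency, then explicit dating, then length.

    Returns None when the best two candidates are indistinguishable under
    all three rules, so the caller can flag the conflict for manual review.
    """

    def key(c: Candidate):
        return (c.modified, c.dated_in_text, len(c.text))

    ranked = sorted(candidates, key=key, reverse=True)
    if len(ranked) > 1 and key(ranked[0]) == key(ranked[1]):
        return None
    return ranked[0] if ranked else None
```

With the timestamps from the authentication example, token-handling.md (modified 2024-12-02) would win over oauth-notes.md (modified 2024-11-28).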

Link Handling

  • Internal links updated to point to consolidated files
  • Broken links flagged with <!-- BROKEN: original-target.md -->
  • External links preserved as-is
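
A rough sketch of these three link rules follows. It assumes internal links are relative markdown links ending in .md and that the caller supplies a mapping from old targets to consolidated files; both the regular expression and the function signature are illustrative only.

```python
import re

# Matches [label](target.md) with an optional #anchor. Assumed link shape.
LINK_RE = re.compile(r"\[([^\]]*)\]\(([^)\s#]+\.md)(#[^)\s]*)?\)")


def rewrite_links(text: str, moved: dict, existing: set) -> str:
    """Repoint moved targets, flag unknown ones, leave external links alone.

    moved:    old target -> consolidated target
    existing: targets that still exist unchanged
    """

    def replace(m):
        label, target, anchor = m.group(1), m.group(2), m.group(3) or ""
        if "://" in target:  # external URL that happens to end in .md
            return m.group(0)
        if target in moved:
            return f"[{label}]({moved[target]}{anchor})"
        if target in existing:
            return m.group(0)
        return f"{m.group(0)}<!-- BROKEN: {target} -->"

    return LINK_RE.sub(replace, text)
```

Anchors are carried over unchanged, so a heading that was renamed during the merge would still need a separate check.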

Output Format

Consolidated files include:

---
title: Authentication System
consolidated_from:
  - file: auth-design.md
    modified: 2024-12-01T10:30:00
  - file: oauth-notes.md
    modified: 2024-11-28T15:45:00
  - file: token-handling.md
    modified: 2024-12-02T09:00:00
consolidated_at: 2024-12-03T14:00:00
strategy: authority
---

# Authentication System

<!-- SOURCE: auth-design.md:1-25 -->
## Overview
...

<!-- SOURCE: token-handling.md:10-45, SUPPLEMENTED: oauth-notes.md:20-35 -->
## Token Handling
...

<!-- CONFLICT RESOLVED: Used token-handling.md (most recent) -->
Token expiry is set to 24 hours...
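
The frontmatter and source comments shown above could be produced with plain string formatting, as in this sketch. The function names are hypothetical, and values are written without YAML quoting, so a title containing a colon would need extra handling.

```python
from datetime import datetime


def render_frontmatter(title, sources, consolidated_at, strategy):
    """Render merge metadata as a YAML frontmatter block.

    sources is a list of (file_name, modified_datetime) pairs.
    Values are emitted unquoted (assumes they contain no YAML special chars).
    """
    lines = ["---", f"title: {title}", "consolidated_from:"]
    for name, modified in sources:
        lines.append(f"  - file: {name}")
        lines.append(f"    modified: {modified.isoformat()}")
    lines += [
        f"consolidated_at: {consolidated_at.isoformat()}",
        f"strategy: {strategy}",
        "---",
    ]
    return "\n".join(lines)


def source_comment(primary, supplements=()):
    """Build the HTML attribution comment placed above a merged section."""
    parts = [f"SOURCE: {primary}"]
    if supplements:
        parts.append("SUPPLEMENTED: " + ", ".join(supplements))
    return "<!-- " + ", ".join(parts) + " -->"
```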

Validation Phase

python scripts/validate.py consolidated/ --original <source_dir>

Validates:

  • Completeness: All source content represented or explicitly excluded
  • Link integrity: All internal links resolve
  • Coherence: No contradictions in final output
  • Metadata: Proper attribution and timestamps

Generates validation_report.md:

## Consolidation Validation Report

### Coverage
- 47/47 source files processed
- 3 files excluded (empty/invalid)
- 12 clusters created
- 8 consolidated files produced

### Content Coverage
- 98.3% of source content preserved
- 1.7% deduplicated (exact matches)
- 5 conflicts resolved automatically
- 2 conflicts flagged for review

### Issues
- [ ] REVIEW: consolidated/auth.md:L145 - conflicting token formats
- [ ] REVIEW: consolidated/api.md:L67 - unclear which version is correct
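
Of the checks listed under "Validates", link integrity is the easiest to illustrate. The sketch below works on an in-memory mapping of POSIX-style paths to file contents so that it stays self-contained; validate.py presumably reads from disk, and its actual rules are not documented here.

```python
import posixpath
import re

LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def unresolved_links(files: dict) -> list:
    """Return (source_path, target) pairs whose target is not in `files`.

    files maps POSIX-style relative paths to markdown text. External URLs,
    mailto links, and same-page anchors are skipped.
    """
    problems = []
    for path, text in sorted(files.items()):
        for target in LINK_RE.findall(text):
            if "://" in target or target.startswith(("#", "mailto:")):
                continue
            rel = target.split("#", 1)[0]  # ignore the anchor when resolving
            resolved = posixpath.normpath(
                posixpath.join(posixpath.dirname(path), rel)
            )
            if resolved not in files:
                problems.append((path, target))
    return problems
```

Each returned pair could be turned into a REVIEW item like the ones in the report above.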

Quick Start

For immediate consolidation of a directory:

# Full pipeline
python scripts/consolidate.py <source_dir> <output_dir> --strategy authority

# This runs: inventory → analyze → cluster → plan → synthesize → validate
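
For reference, the six stages map onto the individual commands documented in the earlier sections. The sketch below only assembles those command lines and does not run them; it assumes cluster.py writes clusters.json to the working directory, since no output flag is documented for it, and the function name is hypothetical.

```python
import sys


def pipeline_commands(source_dir, output_dir, strategy="authority"):
    """List the per-stage commands in the order the full pipeline runs them."""
    py = sys.executable
    return [
        [py, "scripts/inventory.py", source_dir, "--output", "inventory.json"],
        [py, "scripts/analyze_relationships.py", "inventory.json",
         "--output", "relationships.json"],
        [py, "scripts/cluster.py", "relationships.json",
         "--method", "topic", "--threshold", "0.6"],
        [py, "scripts/plan_merge.py", "clusters.json",
         "--strategy", strategy, "--output", "merge_plan.json"],
        [py, "scripts/synthesize.py", "merge_plan.json", "--output", output_dir],
        [py, "scripts/validate.py", output_dir, "--original", source_dir],
    ]
```

Each entry could be passed to subprocess.run(cmd, check=True) so that a failing stage stops the pipeline.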

Advanced: Incremental Updates

For ongoing maintenance:

# Detect changes since last consolidation
python scripts/detect_changes.py <source_dir> --since "2024-12-01"

# Re-consolidate only affected clusters
python scripts/consolidate.py <source_dir> <output_dir> --incremental
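
Conceptually, an incremental run needs two things: the files modified after a cutoff and the clusters that contain them. The following is a sketch under the assumption that modification times are already available as datetimes and that clusters follow the clusters.json shape; detect_changes.py may work differently.

```python
from datetime import datetime


def changed_since(modified: dict, since: str) -> list:
    """Return names modified strictly after the ISO date or datetime `since`."""
    cutoff = datetime.fromisoformat(since)
    return sorted(name for name, ts in modified.items() if ts > cutoff)


def affected_clusters(clusters: list, changed: list) -> list:
    """Return ids of clusters that contain at least one changed file."""
    changed_set = set(changed)
    return [c["id"] for c in clusters if changed_set.intersection(c["files"])]
```

A date-only cutoff such as "2024-12-01" is interpreted as midnight at the start of that day.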

Configuration

Create .consolidator.yaml in the project root:

# Files/directories to exclude
exclude:
  - "**/archive/**"
  - "**/.obsidian/**"
  - "**/templates/**"

# Similarity threshold for clustering (0-1)
similarity_threshold: 0.6

# Default merge strategy
default_strategy: authority

# Preserve original files
keep_originals: true
archive_path: .consolidated-archive/

# Frontmatter fields to preserve
preserve_frontmatter:
  - tags
  - aliases
  - created

# Output format
output:
  add_source_comments: true
  add_merge_frontmatter: true
  update_internal_links: true
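
How the exclude patterns are matched is not specified. As one possible interpretation, the sketch below applies them with fnmatch from the standard library, prefixing the path with "/" so that a pattern such as "**/archive/**" also matches a top-level archive directory. The function name is hypothetical.

```python
from fnmatch import fnmatchcase


def is_excluded(rel_path: str, patterns: list) -> bool:
    """Check a relative path against glob-style exclude patterns.

    fnmatch lets '*' match across '/', so '**' behaves like a single '*'.
    The leading '/' is added so '**/name/**' matches at the root as well.
    """
    candidate = "/" + rel_path.replace("\\", "/").lstrip("/")
    return any(fnmatchcase(candidate, pattern) for pattern in patterns)
```

Under this reading, "archive/old.md" and "docs/.obsidian/workspace.json" are excluded by the patterns above, while "docs/archived-notes.md" is kept.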

Integration Patterns

With Claude Code Sessions

Add to your CLAUDE.md:

## Post-Task Consolidation

After completing any task that creates or modifies markdown files:
1. Run `/project:consolidate` to update knowledge base
2. Review flagged conflicts in validation report
3. Archive original files if consolidation successful

With Basic Memory MCP

The consolidator can output in Basic Memory format:

python scripts/synthesize.py merge_plan.json --format basic-memory

Outputs files with observation/relation syntax compatible with Basic Memory's knowledge graph.

Reference Documentation