Awesome-omni-skill markdown-consolidator
Intelligent consolidation and synthesis of multiple markdown files with overlapping content and different update dates. Use when: (1) Multiple AI-generated markdown files need merging, (2) Knowledge bases have fragmented or duplicate content, (3) Documentation requires recency-aware synthesis, (4) Supporting documents need re-synthesis after AI task completion, (5) Project documentation has semantic overlap across files, (6) Periodic knowledge base maintenance and deduplication is needed.
```shell
git clone https://github.com/diegosouzapw/awesome-omni-skill
T=$(mktemp -d) && git clone --depth=1 https://github.com/diegosouzapw/awesome-omni-skill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/documentation/markdown-consolidator" ~/.claude/skills/diegosouzapw-awesome-omni-skill-markdown-consolidator && rm -rf "$T"
```
skills/documentation/markdown-consolidator/SKILL.md

Markdown Consolidator
Consolidate and synthesize multiple markdown files with intelligent handling of overlapping content, different update dates, and semantic deduplication.
Core Problem
AI-assisted workflows generate fragmented documentation:
- Each AI session creates task-specific markdown files
- AI references supporting docs but doesn't update them post-task
- Knowledge becomes scattered across files with overlapping content
- Different timestamps make version reconciliation complex
Workflow Overview
1. ANALYZE → Inventory files, extract metadata, identify relationships
2. CLUSTER → Group semantically related files using content analysis
3. PLAN → Create a merge strategy based on recency, overlap, and authority
4. SYNTHESIZE → Merge content with intelligent conflict resolution
5. VALIDATE → Verify completeness and coherence of the output
Analysis Phase
Step 1: File Inventory
Run the inventory script to analyze all markdown files:
```shell
python scripts/inventory.py <directory> --output inventory.json
```
The script extracts:
- File paths and sizes
- Modification timestamps (file system and YAML frontmatter)
- Section headers (H1-H6 structure)
- Word/token counts per section
- Internal links ([markdown](links) and [[wikilinks]])
- YAML frontmatter metadata
- Content fingerprints for similarity detection
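The inventory script itself is not reproduced here, but the kind of metadata extraction it describes can be sketched in a few lines of Python. The function name and return shape below are illustrative, not the script's actual API:

```python
import hashlib
import re

def inventory_markdown(text: str) -> dict:
    """Illustrative metadata extraction for one markdown document."""
    # Section headers: lines starting with 1-6 '#' characters (H1-H6)
    headers = [(len(m.group(1)), m.group(2).strip())
               for m in re.finditer(r"^(#{1,6})\s+(.*)$", text, re.MULTILINE)]
    # Internal links: [markdown](links) and [[wikilinks]]
    md_links = re.findall(r"\[([^\]]*)\]\(([^)]+)\)", text)
    wikilinks = re.findall(r"\[\[([^\]]+)\]\]", text)
    # Content fingerprint: hash of whitespace-normalized text, so that
    # trivially reformatted duplicates still collide
    normalized = " ".join(text.split()).lower()
    fingerprint = hashlib.sha256(normalized.encode()).hexdigest()[:16]
    return {
        "headers": headers,
        "word_count": len(text.split()),
        "links": [target for _, target in md_links] + wikilinks,
        "fingerprint": fingerprint,
    }
```

Hashing a normalized form rather than the raw bytes is what lets fingerprints detect duplicates that differ only in whitespace or line wrapping.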
Step 2: Relationship Mapping
```shell
python scripts/analyze_relationships.py inventory.json --output relationships.json
```
Identifies:
- Semantic clusters: Files covering similar topics (via TF-IDF/embedding similarity)
- Temporal chains: Files that evolved from each other (via timestamp + similarity)
- Reference graphs: Which files reference which (via link analysis)
- Conflict zones: Sections with contradictory or overlapping content
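The TF-IDF similarity used for semantic clustering can be sketched with the standard library alone. This is a minimal illustration, not the script's implementation (which may use embeddings instead):

```python
import math
from collections import Counter

def tfidf_cosine(doc_a: str, doc_b: str, corpus: list[str]) -> float:
    """Cosine similarity between two documents under TF-IDF weighting."""
    docs = [set(d.lower().split()) for d in corpus]
    n = len(corpus)

    def idf(term: str) -> float:
        df = sum(term in d for d in docs)
        return math.log((1 + n) / (1 + df)) + 1  # smoothed IDF

    def vector(text: str) -> dict:
        tf = Counter(text.lower().split())
        return {t: c * idf(t) for t, c in tf.items()}

    va, vb = vector(doc_a), vector(doc_b)
    dot = sum(w * vb.get(t, 0.0) for t, w in va.items())
    norm = (math.sqrt(sum(w * w for w in va.values()))
            * math.sqrt(sum(w * w for w in vb.values())))
    return dot / norm if norm else 0.0
```

Files sharing distinctive (low-document-frequency) terms score high even when they share common words with everything else, which is what makes the clusters topical rather than stylistic.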
Clustering Phase
Clustering Strategies
Choose based on your consolidation goal:
Topic-based clustering (default): groups files by semantic similarity of content.

```shell
python scripts/cluster.py relationships.json --method topic --threshold 0.6
```
Temporal clustering: groups files by modification date ranges.

```shell
python scripts/cluster.py relationships.json --method temporal --window 7d
```
Hierarchical clustering: groups by directory structure plus content similarity.

```shell
python scripts/cluster.py relationships.json --method hierarchical
```
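Threshold-based clustering of this kind can be as simple as union-find over high-similarity pairs. A hypothetical sketch (the function and its inputs are illustrative, not the script's interface):

```python
def cluster_by_similarity(files: list, sim: dict, threshold: float = 0.6) -> list:
    """Group files whose pairwise similarity meets the threshold.

    `sim` maps frozenset({a, b}) -> similarity score in [0, 1].
    Uses union-find so transitively linked files land in one cluster.
    """
    parent = {f: f for f in files}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for pair, score in sim.items():
        if score >= threshold:
            a, b = pair
            parent[find(a)] = find(b)  # union the two clusters

    clusters = {}
    for f in files:
        clusters.setdefault(find(f), []).append(f)
    return sorted(clusters.values(), key=len, reverse=True)
```

Note the transitivity: if A~B and B~C both clear the threshold, A and C end up together even if their direct similarity is low, which is usually what you want for topic clusters.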
Cluster Output
Creates `clusters.json` with the following structure:
```json
{
  "clusters": [
    {
      "id": "cluster_001",
      "theme": "API Authentication",
      "files": ["auth-design.md", "oauth-notes.md", "token-handling.md"],
      "primary_file": "auth-design.md",
      "overlap_score": 0.72,
      "conflicts": ["token-handling.md:L45 vs oauth-notes.md:L23"]
    }
  ]
}
```
Planning Phase
Merge Strategy Selection
Authority-based (recommended for documentation)
- Most recent file is authoritative for conflicts
- Older unique content is preserved with attribution
- Use when files represent evolving understanding
Comprehensive (for knowledge bases)
- Union of all unique information
- Conflicts flagged for manual review
- Use when completeness matters more than consistency
Canonical (for specifications)
- Designate one file as canonical
- Others provide supplementary/historical context
- Use when single source of truth is required
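The core rule of the authority strategy (most recent source wins, ties broken by length of explanation) can be sketched as follows. The `resolve_authority` helper is hypothetical, not part of the scripts:

```python
from datetime import datetime

def resolve_authority(versions: list) -> tuple:
    """Pick the authoritative version of a conflicting section.

    `versions` is a list of dicts: {"file", "modified" (ISO 8601), "text"}.
    The most recently modified source wins; ties fall back to the
    longer, more detailed text.
    """
    winner = max(
        versions,
        key=lambda v: (datetime.fromisoformat(v["modified"]), len(v["text"])),
    )
    losers = [v["file"] for v in versions if v is not winner]
    # Attribution comment in the style the synthesizer emits
    note = f"<!-- CONFLICT RESOLVED: Used {winner['file']} (most recent) -->"
    return note + "\n" + winner["text"], losers
```

Returning the losing files alongside the merged text lets the caller preserve older unique content with attribution instead of silently dropping it.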
Create Merge Plan
```shell
python scripts/plan_merge.py clusters.json --strategy authority --output merge_plan.json
```
Generates an actionable merge plan:
```json
{
  "cluster_id": "cluster_001",
  "output_file": "consolidated/authentication.md",
  "sections": [
    {
      "heading": "## Overview",
      "sources": [{"file": "auth-design.md", "lines": "1-25", "action": "primary"}],
      "conflicts": []
    },
    {
      "heading": "## Token Handling",
      "sources": [
        {"file": "token-handling.md", "lines": "10-45", "action": "primary"},
        {"file": "oauth-notes.md", "lines": "20-35", "action": "supplement"}
      ],
      "conflicts": [
        {
          "description": "Token expiry differs: 24h vs 1h",
          "resolution": "Use most recent (token-handling.md: 24h)"
        }
      ]
    }
  ]
}
```
Synthesis Phase
Execute Merge
```shell
python scripts/synthesize.py merge_plan.json --output consolidated/
```
The synthesizer:
- Creates section-by-section merged content
- Preserves original attribution via HTML comments
- Resolves conflicts per strategy
- Maintains internal link consistency
- Updates frontmatter with merge metadata
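The merge-metadata frontmatter (see the Output Format section) could be assembled by a helper along these lines. This is a hedged sketch with a hypothetical function name, not the synthesizer's actual code:

```python
def merge_frontmatter(title: str, sources: list, consolidated_at: str,
                      strategy: str = "authority") -> str:
    """Assemble YAML frontmatter for a consolidated file (illustrative)."""
    lines = ["---", f"title: {title}", "consolidated_from:"]
    for s in sources:  # each source: {"file": ..., "modified": ...}
        lines.append(f"  - file: {s['file']}")
        lines.append(f"    modified: {s['modified']}")
    lines += [f"consolidated_at: {consolidated_at}",
              f"strategy: {strategy}",
              "---"]
    return "\n".join(lines)
```

Recording every source file with its modification time in the output is what makes later incremental re-consolidation and conflict audits possible.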
Synthesis Rules
Content Deduplication
- Exact duplicates: Remove, keep first occurrence
- Near duplicates (>80% similarity): Merge, note sources
- Partial overlap: Keep both with clear section breaks
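These three rules map naturally onto `difflib.SequenceMatcher`; a minimal sketch classifying a pair of sections (the function name and action labels are illustrative):

```python
from difflib import SequenceMatcher

def dedup_action(a: str, b: str, near_threshold: float = 0.8) -> str:
    """Classify a pair of sections per the deduplication rules above."""
    # Normalize whitespace so reflowed text still counts as an exact match
    norm_a, norm_b = " ".join(a.split()), " ".join(b.split())
    if norm_a == norm_b:
        return "remove-duplicate"      # exact: keep first occurrence only
    ratio = SequenceMatcher(None, norm_a, norm_b).ratio()
    if ratio > near_threshold:
        return "merge-note-sources"    # near duplicate (>80% similar)
    return "keep-both"                 # partial overlap: clear section breaks
```

`SequenceMatcher.ratio()` is quadratic in the worst case, so a real pipeline would typically gate it behind a cheap fingerprint comparison first.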
Conflict Resolution
Authority strategy:
1. Prefer the most recently modified source
2. Prefer explicitly dated content over undated
3. Prefer longer/more detailed explanations
4. Flag unresolvable conflicts for review

Comprehensive strategy:
1. Include all non-contradictory content
2. Present conflicts as "Version A / Version B" blocks
3. Add TODO markers for manual resolution
Link Handling
- Internal links updated to point to consolidated files
- Broken links flagged with `<!-- BROKEN: original-target.md -->`
- External links preserved as-is
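A sketch of this link-rewriting pass; `rewrite_links` and its parameters are hypothetical names for illustration:

```python
import re

def rewrite_links(text: str, moved: dict, existing: set) -> str:
    """Update internal markdown links after consolidation.

    `moved` maps old targets to consolidated paths; targets found in
    neither `moved` nor `existing` are flagged as broken. External
    links pass through untouched.
    """
    def repl(m: re.Match) -> str:
        label, target = m.group(1), m.group(2)
        if target.startswith(("http://", "https://")):
            return m.group(0)                      # external: preserved as-is
        if target in moved:
            return f"[{label}]({moved[target]})"   # internal: repointed
        if target in existing:
            return m.group(0)                      # still valid: untouched
        return f"[{label}]({target}) <!-- BROKEN: {target} -->"

    return re.sub(r"\[([^\]]*)\]\(([^)]+)\)", repl, text)
```

Keeping the original link next to the `BROKEN` comment, rather than deleting it, preserves the reader's trail back to the source even when the target is gone.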
Output Format
Consolidated files include:
```markdown
---
title: Authentication System
consolidated_from:
  - file: auth-design.md
    modified: 2024-12-01T10:30:00
  - file: oauth-notes.md
    modified: 2024-11-28T15:45:00
  - file: token-handling.md
    modified: 2024-12-02T09:00:00
consolidated_at: 2024-12-03T14:00:00
strategy: authority
---

# Authentication System

<!-- SOURCE: auth-design.md:1-25 -->
## Overview
...

<!-- SOURCE: token-handling.md:10-45, SUPPLEMENTED: oauth-notes.md:20-35 -->
## Token Handling
...

<!-- CONFLICT RESOLVED: Used token-handling.md (most recent) -->
Token expiry is set to 24 hours...
```
Validation Phase
```shell
python scripts/validate.py consolidated/ --original <source_dir>
```
Validates:
- Completeness: All source content represented or explicitly excluded
- Link integrity: All internal links resolve
- Coherence: No contradictions in final output
- Metadata: Proper attribution and timestamps
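The link-integrity check, for example, amounts to resolving every relative link against the file's own directory. A minimal sketch under that assumption (function name is illustrative):

```python
import re
from pathlib import Path

def check_link_integrity(output_dir: str) -> list:
    """Return unresolved internal links in the consolidated output."""
    problems = []
    for md in Path(output_dir).rglob("*.md"):
        for m in re.finditer(r"\[[^\]]*\]\(([^)]+)\)", md.read_text()):
            target = m.group(1)
            if target.startswith(("http://", "https://", "#")):
                continue  # external links and in-page anchors: out of scope
            # Resolve relative to the linking file, ignoring any #fragment
            if not (md.parent / target.split("#")[0]).exists():
                problems.append(f"{md}: unresolved link -> {target}")
    return problems
```

An empty return list means every internal link resolves; anything else feeds the REVIEW items in the validation report.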
Generates `validation_report.md`:
```markdown
## Consolidation Validation Report

### Coverage
- 47/47 source files processed
- 3 files excluded (empty/invalid)
- 12 clusters created
- 8 consolidated files produced

### Content Coverage
- 98.3% of source content preserved
- 1.7% deduplicated (exact matches)
- 5 conflicts resolved automatically
- 2 conflicts flagged for review

### Issues
- [ ] REVIEW: consolidated/auth.md:L145 - conflicting token formats
- [ ] REVIEW: consolidated/api.md:L67 - unclear which version is correct
```
Quick Start
For immediate consolidation of a directory:
```shell
# Full pipeline
python scripts/consolidate.py <source_dir> <output_dir> --strategy authority

# This runs: inventory → analyze → cluster → plan → synthesize → validate
```
Advanced: Incremental Updates
For ongoing maintenance:
```shell
# Detect changes since last consolidation
python scripts/detect_changes.py <source_dir> --since "2024-12-01"

# Re-consolidate only affected clusters
python scripts/consolidate.py <source_dir> <output_dir> --incremental
```
Configuration
Create `.consolidator.yaml` in the project root:
```yaml
# Files/directories to exclude
exclude:
  - "**/archive/**"
  - "**/.obsidian/**"
  - "**/templates/**"

# Similarity threshold for clustering (0-1)
similarity_threshold: 0.6

# Default merge strategy
default_strategy: authority

# Preserve original files
keep_originals: true
archive_path: .consolidated-archive/

# Frontmatter fields to preserve
preserve_frontmatter:
  - tags
  - aliases
  - created

# Output format
output:
  add_source_comments: true
  add_merge_frontmatter: true
  update_internal_links: true
```
Integration Patterns
With Claude Code Sessions
Add to your CLAUDE.md:
```markdown
## Post-Task Consolidation

After completing any task that creates or modifies markdown files:
1. Run `/project:consolidate` to update the knowledge base
2. Review flagged conflicts in the validation report
3. Archive original files if consolidation succeeded
```
With Basic Memory MCP
The consolidator can output in Basic Memory format:
```shell
python scripts/synthesize.py merge_plan.json --format basic-memory
```
Outputs files with observation/relation syntax compatible with Basic Memory's knowledge graph.
Reference Documentation
- ALGORITHMS.md - Detailed similarity/clustering algorithms
- CONFLICT-RESOLUTION.md - Conflict handling patterns
- INTEGRATION.md - Integration with other tools