Aiwg source-unifier
Merge multiple documentation sources (docs, GitHub, PDF) with conflict detection. Use when combining docs + code for complete skill coverage.
git clone https://github.com/jmagly/aiwg
T=$(mktemp -d) && git clone --depth=1 https://github.com/jmagly/aiwg "$T" && mkdir -p ~/.claude/skills && cp -r "$T/agentic/code/addons/doc-intelligence/skills/source-unifier" ~/.claude/skills/jmagly-aiwg-source-unifier-3a55f4 && rm -rf "$T"
agentic/code/addons/doc-intelligence/skills/source-unifier/SKILL.mdSource Unifier Skill
Purpose
Single responsibility: Intelligently merge documentation from multiple sources (websites, GitHub repos, PDFs) while detecting and transparently reporting conflicts between documented and implemented behavior. (BP-4)
Grounding Checkpoint (Archetype 1 Mitigation)
Before executing, VERIFY:
- All source URLs/paths are accessible
- Each source type is correctly identified (docs, github, pdf)
- Output directory is writable
- Merge mode is specified (rule-based or AI-enhanced)
- Conflict resolution strategy is defined
DO NOT merge without inspecting each source first.
Uncertainty Escalation (Archetype 2 Mitigation)
ASK USER instead of guessing when:
- Conflict severity unclear (is doc or code authoritative?)
- Multiple valid interpretations of API signature
- Source versions don't match (v2 docs vs v3 code)
- Merge strategy produces ambiguous results
NEVER silently resolve conflicts. Always report discrepancies.
Context Scope (Archetype 3 Mitigation)
| Context Type | Included | Excluded |
|---|---|---|
| RELEVANT | All specified sources, merge config | Unrelated documentation |
| PERIPHERAL | Version history for context | Other projects |
| DISTRACTOR | Previous merge attempts | Unrelated codebases |
Conflict Types
| Type | Severity | Description | Example |
|---|---|---|---|
| Missing in code | HIGH | Documented but not implemented | API endpoint in docs, not in code |
| Missing in docs | MEDIUM | Implemented but not documented | Hidden feature in code |
| Signature mismatch | MEDIUM | Different parameters/types | vs |
| Description mismatch | LOW | Different explanations | Wording differences |
Workflow Steps
Step 1: Verify Sources (Grounding)
# Test documentation URL curl -I https://docs.example.com/ # Test GitHub repo gh repo view owner/repo --json name,description # Test PDF file file manual.pdf && pdfinfo manual.pdf
Step 2: Create Unified Configuration
{ "name": "myframework", "description": "Complete framework knowledge from docs + code", "merge_mode": "rule-based", "conflict_resolution": { "missing_in_code": "warn", "missing_in_docs": "include", "signature_mismatch": "show_both", "description_mismatch": "prefer_docs" }, "sources": [ { "type": "documentation", "base_url": "https://docs.example.com/", "extract_api": true, "max_pages": 200 }, { "type": "github", "repo": "owner/myframework", "include_code": true, "code_analysis_depth": "surface", "max_issues": 100 }, { "type": "pdf", "path": "docs/manual.pdf", "extract_tables": true } ] }
Step 3: Execute Unified Scraping
Option A: With skill-seekers
skill-seekers unified --config unified-config.json
Option B: Manual merge workflow
- Scrape each source independently
- Extract API signatures from each
- Compare and detect conflicts
- Generate merged output with conflict annotations
Step 4: Review Conflict Report
The unifier generates a conflict report:
# Conflict Report: myframework ## Summary - Total APIs analyzed: 245 - Conflicts detected: 18 - Missing in code: 3 (HIGH) - Missing in docs: 8 (MEDIUM) - Signature mismatches: 5 (MEDIUM) - Description mismatches: 2 (LOW) ## HIGH Severity Conflicts ### `deprecated_function()` - **Status**: Documented but not found in code - **Documentation**: "Use this function to..." - **Code**: NOT FOUND - **Recommendation**: Remove from docs or implement ## MEDIUM Severity Conflicts ### `process_data(input: str)` - **Status**: Signature mismatch - **Documentation**: `process_data(input: str)` - **Code**: `process_data(input: str, validate: bool = True)` - **Recommendation**: Update documentation to include `validate` parameter
Step 5: Validate Merged Output
# Check merged skill structure ls -la output/myframework/ # Verify conflict annotations grep -r "⚠️\|Conflict\|WARNING" output/myframework/references/ # Count conflict markers grep -c "Conflict" output/myframework/references/*.md
Recovery Protocol (Archetype 4 Mitigation)
On error:
- PAUSE - Preserve partial merge state
- DIAGNOSE - Check error type:
→ Skip source, note in reportSource unavailable
→ Check source format, retry with different parserParse error
→ Process sources sequentiallyMemory error
→ Increase conflict threshold or filter by severityConflict overflow
- ADAPT - Adjust merge strategy
- RETRY - Resume merge (max 3 attempts)
- ESCALATE - Present partial results, ask user for conflict resolution
Checkpoint Support
State saved to:
.aiwg/working/checkpoints/source-unifier/
checkpoints/source-unifier/ ├── source_1_docs.json # Processed docs ├── source_2_github.json # Processed GitHub ├── source_3_pdf.json # Processed PDF ├── conflicts.json # Detected conflicts └── merge_progress.json # Current merge state
Resume:
skill-seekers unified --config config.json --resume
Output Structure
output/myframework/ ├── SKILL.md # Main skill with conflict summary ├── references/ │ ├── index.md # Unified index │ ├── api_reference.md # Merged API docs (with conflict markers) │ ├── guides.md # Merged guides │ └── conflicts.md # Detailed conflict report ├── sources/ │ ├── documentation.md # Original docs content │ ├── github.md # GitHub-extracted content │ └── pdf.md # PDF-extracted content └── metadata/ ├── sources.json # Source metadata └── conflict_summary.json # Machine-readable conflicts
Conflict Markers in Output
Merged content includes inline conflict markers:
#### `process_data(input: str, validate: bool = True)` ⚠️ **Conflict**: Documentation signature differs from implementation **Documentation says:** ```python def process_data(input: str) -> dict: """Process input data and return results."""
Code implementation:
def process_data(input: str, validate: bool = True) -> dict: """Process input data with optional validation."""
Resolution: Documentation should be updated to include the
validate parameter added in v2.3.
## Merge Modes | Mode | Description | Use Case | |------|-------------|----------| | `rule-based` | Apply predefined rules for conflict resolution | Fast, deterministic | | `ai-enhanced` | Use AI to intelligently merge conflicting content | Better quality, slower | | `manual` | Generate conflicts only, user resolves | Full control | ## Troubleshooting | Issue | Diagnosis | Solution | |-------|-----------|----------| | Too many conflicts | Sources very different | Filter by severity, merge incrementally | | False positives | Parser differences | Normalize API extraction | | Missing content | Source incomplete | Add supplementary source | | Merge too slow | Large sources | Use parallel processing | ## References - Skill Seekers Unified Scraping: https://github.com/jmagly/Skill_Seekers/blob/main/docs/UNIFIED_SCRAPING.md - REF-001: Production-Grade Agentic Workflows (BP-4, BP-6 model consortium parallel) - REF-002: LLM Failure Modes (Archetype 1-4 mitigations)