Aiwg check-completeness
Analyze collection completeness against canonical discography and generate prioritized gap report
git clone https://github.com/jmagly/aiwg
T=$(mktemp -d) && git clone --depth=1 https://github.com/jmagly/aiwg "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.agents/skills/check-completeness" ~/.claude/skills/jmagly-aiwg-check-completeness && rm -rf "$T"
.agents/skills/check-completeness/SKILL.mdcheck-completeness
Analyze collection completeness by comparing actual inventory against canonical discography, identify gaps, score coverage, and generate prioritized acquisition plans.
Purpose
Answer collection completeness questions:
- "How complete is my [artist] collection?"
- "What am I missing?"
- "What should I acquire next?"
- "What legendary performances am I missing?"
- "Where should I upgrade quality?"
Produces comprehensive gap reports with actionable acquisition strategies prioritized by importance, availability, and cost.
Parameters
Required
- Artist name (quoted if contains spaces)<artist>
- Matched against collection directory structure
- Used to fetch canonical discography from MusicBrainz
- Example:
or"Pink Floyd""The Beatles"
Optional
- Collection root directory--collection <path>
- Default: Current working directory
- Must contain artist subdirectories
- Example:
/mnt/media/Music
- Show only missing items, skip completeness scoring--gaps-only
- Faster execution when you just need gap list
- Useful for quick "what's missing?" queries
- Filter gaps by priority level--priority <level>
- Values:
,high
,medium
,low
(default:all
)all - Example:
shows only critical gaps--priority high
- Write report to file instead of stdout--output <file>
- Format determined by extension:
,.yaml
,.json.md - Default: stdout (terminal display)
- Example:
--output gaps-report.yaml
- Canonical discography source--canonical-source <source>
- Values:
(default),musicbrainz
,discogsallmusic - Falls back to secondary sources if primary fails
- Example:
--canonical-source discogs
- Include legendary/bootleg content in analysis--include-legendary
- Default: Official releases only
- Adds section for high-priority unofficial recordings
- Requires legendary catalog file or web search
- Minimum quality for "complete" status--quality-threshold <level>
- Values:
,lossless
,320kbps
,256kbps
(default:any
)any - Example:
treats 320kbps as needing upgrade--quality-threshold lossless
- Generate GAP-NOTE.md files for missing items--create-gap-notes
- Creates placeholder files in artist directory structure
- Each contains acquisition strategy and priority
- Enables parallel gap-fill workflow
- Incremental update mode--incremental
- Loads previous state file if exists
- Only re-fetches canonical if >7 days old
- Faster for routine monitoring
- Include cost estimates in report--cost-estimate
- Estimates based on typical pricing for format/availability
- Adds budget planning section to output
- Example range: "$15-25 for used CD"
Workflow
1. Load Collection Inventory
# Use existing scan results if available SCAN_FILE=".media-curator/scans/${ARTIST_SLUG}/scan-latest.json" if [[ -f "$SCAN_FILE" ]]; then INVENTORY=$(cat "$SCAN_FILE") else # Trigger fresh scan /scan-collection "$ARTIST" --format json --output "$SCAN_FILE" INVENTORY=$(cat "$SCAN_FILE") fi
Inventory provides:
- All albums/releases present in collection
- Track counts and file formats
- Quality levels (lossless/lossy bitrates)
- File sizes and directory structure
2. Fetch Canonical Discography
# Check for cached canonical data CANONICAL_FILE=".media-curator/completeness/${ARTIST_SLUG}/canonical.yaml" CACHE_AGE_DAYS=$((( $(date +%s) - $(stat -c %Y "$CANONICAL_FILE" 2>/dev/null || echo 0) ) / 86400)) if [[ $CACHE_AGE_DAYS -gt 7 ]] || [[ ! -f "$CANONICAL_FILE" ]]; then # Fetch fresh canonical discography # MusicBrainz API example: MBID=$(musicbrainz-lookup "$ARTIST" --type artist --format json | jq -r '.artists[0].id') RELEASES=$(musicbrainz-releases "$MBID" --format json) # Parse into canonical.yaml structure echo "$RELEASES" | parse-canonical > "$CANONICAL_FILE" fi
Canonical discography includes:
- Studio albums with track counts and years
- EPs and singles with A/B sides
- Official live albums
- Compilations (filtered by novelty)
- Official box sets
MusicBrainz advantages:
- Comprehensive and accurate
- Includes release dates and track counts
- Distinguishes editions (original, remaster, deluxe)
- Free API with rate limiting
Discogs alternative:
- More complete for rare/indie releases
- Includes pressing details and variants
- Better for vinyl collectors
- Requires API key
3. Build Extended Content Catalog
If
--include-legendary flag set:
# Load legendary performances catalog LEGENDARY_FILE=".media-curator/completeness/${ARTIST_SLUG}/legendary.yaml" if [[ ! -f "$LEGENDARY_FILE" ]]; then # Web search for "artist legendary performances bootleg" # Or load from community-maintained catalog # Or use private tracker tags (if integrated) fetch-legendary-catalog "$ARTIST" > "$LEGENDARY_FILE" fi
Extended catalog includes:
- Notable covers (tribute albums, sessions)
- Soundtrack contributions
- Collaboration tracks (appears on other artists' albums)
- Legendary live performances (soundboard, historical significance)
- Unreleased demos and outtakes
4. Compare Inventory to Canonical
# Comparison logic pseudocode for release in canonical_releases: match = find_in_inventory(release.title, release.year) if match: if match.track_count == release.track_count: release.status = "complete" else: release.status = "partial" release.missing_tracks = release.track_count - match.track_count release.quality = match.quality release.source = match.source else: release.status = "missing"
Fuzzy matching rules:
- Tolerate minor title differences ("The Album" vs "Album, The")
- Match by year if title ambiguous
- Prefer exact track count match over year match
- Flag multiple matches as potential duplicates
5. Score Completeness
# Weighted scoring formula weights = { 'studio_albums': 0.40, 'eps': 0.15, 'singles': 0.15, 'live_albums': 0.10, 'notable_covers': 0.10, 'legendary': 0.10 } component_scores = {} for category, weight in weights.items(): total = len(canonical[category]) complete = sum(1 for item in canonical[category] if item.status == 'complete') partial = sum(0.5 for item in canonical[category] if item.status == 'partial') component_scores[category] = ((complete + partial) / total * 100) if total > 0 else 100 overall_score = sum(component_scores[cat] * weights[cat] for cat in weights) # Quality bonus lossless_pct = (lossless_count / total_count) * 100 if lossless_pct > 80: overall_score += 5
6. Identify and Prioritize Gaps
# Gap prioritization logic for release in canonical_releases: if release.status == "missing" or release.status == "partial": priority = calculate_priority(release) availability = check_availability(release) cost = estimate_cost(release, availability) gaps.append({ 'item': release.title, 'type': release.type, 'year': release.year, 'priority': priority, 'availability': availability, 'estimated_cost': cost, 'reason': priority_reason(release) }) gaps.sort(key=lambda x: (priority_order[x['priority']], x['year']))
Priority calculation:
def calculate_priority(release): score = 0 # Type importance if release.type == 'studio_album': score += 40 elif release.type in ['ep', 'single']: score += 15 elif release.type == 'live_album': score += 10 # Critical acclaim (if available) if release.rating and release.rating > 8.0: score += 20 # Classic period (artist-specific) if release.year in artist_classic_years: score += 15 # Completes a set if is_part_of_trilogy(release) and have_other_parts(release): score += 10 # Historical significance if release.legendary or release.historically_significant: score += 25 # Assign tier if score >= 60: return "HIGH" elif score >= 30: return "MEDIUM" else: return "LOW"
7. Check Availability
def check_availability(release): sources = [] # Streaming services (easy, legal, lossy) if available_on_streaming(release): sources.append({ 'type': 'streaming', 'services': ['Apple Music', 'Spotify', 'Tidal'], 'quality': '256kbps AAC', 'cost': 'subscription', 'difficulty': 'easy' }) # Digital purchase (legal, often lossless) if available_on_bandcamp(release): sources.append({ 'type': 'digital_purchase', 'platform': 'Bandcamp', 'quality': 'FLAC / 320kbps MP3', 'cost': '$7-12', 'difficulty': 'easy' }) # Physical (CD/vinyl) discogs_listings = check_discogs(release) if discogs_listings: sources.append({ 'type': 'physical', 'format': 'CD', 'availability': f"{len(discogs_listings)} listings", 'price_range': f"${min(discogs_listings)} - ${max(discogs_listings)}", 'difficulty': 'moderate' }) # Private trackers (torrents, requires ratio) if user_has_tracker_access(): tracker_results = search_trackers(release) if tracker_results: sources.append({ 'type': 'private_tracker', 'sites': tracker_results.sites, 'quality': 'FLAC', 'cost': 'ratio', 'difficulty': 'moderate' if tracker_results.seeded else 'hard' }) return sources
8. Generate Report
Output format depends on
--output extension or defaults to rich terminal display:
Terminal output (default):
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Pink Floyd Collection Completeness Report ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 📊 Overall Score: 92.3% (Excellent Collection) 🎵 Canonical Releases: 94.1% ✅ Studio Albums: 16/17 (94.1%) ✅ EPs: 4/4 (100%) ✅ Singles: 24/30 (80.0%) ✅ Live Albums: 12/15 (80.0%) 🎸 Extended Content: 78.5% ✅ Notable Covers: 8/12 (66.7%) ✅ Soundtracks: 3/5 (60.0%) ⚠️ Legendary: 12/47 (25.5%) 💿 Quality Distribution: 🟢 Lossless (FLAC): 87% (298 files) 🟡 High (320kbps): 11% (38 files) 🔴 Medium (256kbps): 2% (6 files) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Top 5 Acquisition Priorities ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 🔴 HIGH: The Endless River (2014) Missing: Complete studio album Reason: Only missing album from official discography Sources: • Apple Music (stream rip, 256kbps) - FREE with subscription • Bandcamp (FLAC) - $9.99 • Discogs (used CD) - $12-18 Recommended: Bandcamp FLAC purchase 🔴 HIGH: Live at Pompeii (1972) - Soundboard Missing: Legendary performance Reason: Exceptional quality, historically significant Sources: • Private tracker (FLAC soundboard) - Ratio cost • YouTube rip (256kbps, unofficial) - FREE but low quality Recommended: Private tracker if available 🟡 MEDIUM: Obscured by Clouds B-sides Missing: 3 tracks from singles Reason: Completes album era, unique content Sources: • Compilation "Works" (1983) - Discogs $15-20 • Individual singles (rare) - eBay ~$30 each Recommended: Works compilation 🟡 MEDIUM: The Wall (24bit/96kHz remaster) Upgrade: Current 320kbps → Lossless remaster Reason: Significant audio improvement, bonus tracks Sources: • HDtracks (24bit FLAC) - $19.99 • Qobuz (24bit streaming) - Subscription Recommended: Wait for HDtracks sale (<$15) 🟢 LOW: Relics (1971 compilation) Missing: Compilation of earlier singles Reason: Low priority, all tracks owned on original albums Sources: • Streaming (256kbps) - FREE • CD (out of print) - eBay $20-30 Recommended: Stream rip if desired for convenience ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Acquisition Plan Summary ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Next 5 acquisitions: 1. The Endless River - Bandcamp FLAC ($9.99) 2. Live at Pompeii - Private tracker (ratio) 3. Obscured by Clouds B-sides - Works CD ($18) 4. The Wall remaster - Wait for sale (~$15) 5. (Optional) Relics - Stream rip (FREE) 💰 Estimated cost: $43-60 (excluding sale wait items) ⏱️ Estimated time: 1-2 weeks 🎯 Next score: ~96.5% (Completionist tier) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Report saved to: .media-curator/completeness/pink-floyd/gap-report-2026-02-14.yaml GAP-NOTE files created: 5 items
YAML output (--output report.yaml):
Uses the full gap report structure defined in the Completeness Tracker agent documentation.
Markdown output (--output report.md):
Human-readable report with sections, tables, and acquisition instructions.
9. Create GAP-NOTE Files (if --create-gap-notes)
for gap in high_priority_gaps: NOTE_PATH="${COLLECTION}/${ARTIST}/${gap.year} - ${gap.title}/GAP-NOTE.md" mkdir -p "$(dirname "$NOTE_PATH")" cat > "$NOTE_PATH" << EOF # GAP-NOTE: ${gap.title} (${gap.year}) ## Expected Content - ${gap.type}: ${gap.track_count} tracks - ${gap.description} - Rating: ${gap.rating}/10 ## Acquisition Strategy ${format_acquisition_steps(gap.sources)} ## Priority **${gap.priority}** - ${gap.reason} ## Sources ${format_source_links(gap)} - Last searched: Never EOF done
Examples
Example 1: Basic completeness check
/check-completeness "Pink Floyd" --collection /mnt/media/Music
Output: Full terminal report with scores and top 5 priorities
Example 2: Just show high-priority gaps
/check-completeness "The Beatles" --gaps-only --priority high
Output:
High-Priority Gaps for The Beatles: 1. Let It Be Sessions (1970) - Legendary unreleased sessions 2. Live at Hollywood Bowl (original 1977) - Out of print, reissue available 3. Anthology outtakes (various) - 12 tracks not on official release
Example 3: Create acquisition workflow
/check-completeness "Led Zeppelin" \ --include-legendary \ --create-gap-notes \ --output gaps-report.yaml
Creates:
- Full structured reportgaps-report.yaml- GAP-NOTE.md files in each missing album directory
- Updated completeness state file
Example 4: Incremental monitoring
# Weekly cron job /check-completeness "Grateful Dead" \ --incremental \ --quality-threshold lossless \ --output /reports/weekly-$(date +%F).yaml
Fast execution, only re-fetches canonical if stale.
Integration with Other Commands
Before completeness check:
/scan-collection "Artist" --format json # Generates inventory
After completeness check:
/acquisition-plan "Artist" --from-gaps gaps-report.yaml # Plan downloads /quality-audit "Artist" --upgrades-only # Find quality improvement targets
Parallel gap filling:
/completeness-fill "Artist" --priority high --source streaming & /completeness-fill "Artist" --priority high --source physical & wait /check-completeness "Artist" --incremental # Re-check after acquisition
Error Handling
MusicBrainz API failure:
- Fall back to Discogs API
- If both fail, use cached canonical (warn if >30 days old)
- Suggest manual canonical.yaml creation
Artist not found:
- Suggest fuzzy matches from collection directory names
- Offer to use directory name as canonical source
No collection inventory:
- Auto-trigger
if not found/scan-collection - Error if collection path invalid
Performance
Expected execution time:
- Small collection (<500 files): 5-10 seconds
- Medium collection (500-2000 files): 15-30 seconds
- Large collection (>2000 files): 30-60 seconds
- Incremental mode: 50% faster (uses cached canonical)
Rate limiting:
- MusicBrainz: 1 request/second (built-in delay)
- Discogs: 60 requests/minute (with API key)
- Wait and retry on 429 responses
Output Files
All output written to
.media-curator/completeness/{artist-slug}/:
.media-curator/completeness/pink-floyd/ ├── canonical.yaml # Cached canonical discography ├── legendary.yaml # Extended content catalog ├── state.json # Completeness tracking state ├── gap-report-2026-02-14.yaml ├── gap-report-2026-01-15.yaml # Historical reports └── history.json # Completeness score over time
Success Criteria
- Completeness scores accurate within ±2%
- Zero false positives (claiming missing when present)
- Priority assignments align with collector consensus >90%
- Availability checks current within 7 days
- Report generation <60 seconds for any collection size
References
- @$AIWG_ROOT/agentic/code/addons/aiwg-utils/rules/research-before-decision.md — Research canonical discography (MusicBrainz, Discogs) before determining completeness
- @$AIWG_ROOT/agentic/code/frameworks/media-curator/skills/analyze-artist/SKILL.md — Artist analysis that establishes the canonical discography used as completeness baseline
- @$AIWG_ROOT/agentic/code/frameworks/media-curator/skills/gap-documentation/SKILL.md — Gap documentation skill used to record identified missing items
- @$AIWG_ROOT/agentic/code/frameworks/media-curator/skills/find-sources/SKILL.md — Source discovery skill used to locate missing releases identified by completeness check