Aiwg gap-documentation

Document missing content in media archives with standardized GAP-NOTE markers that guide acquisition and make completeness measurable.

install
source · Clone the upstream repo
git clone https://github.com/jmagly/aiwg
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/jmagly/aiwg "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.agents/skills/gap-documentation" ~/.claude/skills/jmagly-aiwg-gap-documentation && rm -rf "$T"
manifest: .agents/skills/gap-documentation/SKILL.md
source content

GAP-NOTE Documentation Skill

A systematic approach to documenting missing content in media archives using standardized gap markers that guide acquisition, measure completeness, and enable parallel collection workflows.

Overview

GAP-NOTE.md files serve as placeholders in directories where content is expected but not yet acquired. They transform empty directories from ambiguous voids into actionable instructions, creating a clear separation between "not yet collected" and "intentionally excluded."

This pattern emerged from field testing during music archive curation (issue #81), where agents needed explicit guidance on acquisition priorities and strategies without requiring central coordination.

The GAP-NOTE.md Template

Standard Structure

# GAP-NOTE: {Directory Purpose}

## Expected Content
{What should be here}

## Acquisition Strategy
{How to obtain it}

## Priority
{HIGH | MEDIUM | LOW}

## Sources
- {Source 1}
- {Source 2}

Field Descriptions

Title Line (# GAP-NOTE: ...)

  • Brief description of directory purpose
  • Helps agents understand context at a glance
  • Example:
    # GAP-NOTE: BBC Radio Sessions (1990-1995)

Expected Content

  • Specific description of missing content
  • Include formats, quality expectations, metadata requirements
  • Be concrete enough to validate completion
  • Example: "Lossless audio (FLAC/ALAC) of all BBC Radio 1 sessions, with broadcast date, presenter name, and tracklisting"

Acquisition Strategy

  • Step-by-step instructions for obtaining content
  • Include API endpoints, web scrapers, manual download URLs
  • Specify required credentials, rate limits, legal considerations
  • Example: "Use BBC Sounds API → filter by artist → download via yt-dlp → transcode to FLAC"

Priority

  • Standardized levels: HIGH, MEDIUM, LOW
  • Determines resource allocation for parallel workflows
  • See Priority Scoring section below

Sources

  • URLs, API endpoints, contact information
  • Structured as bulleted list for easy parsing
  • Include backup sources where available
  • Example:
    - https://www.bbc.co.uk/sounds/brand/b006wkqb
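
Because the fields above are fixed, a gap note can be checked mechanically before agents rely on it. The following is a minimal sketch, not part of the skill itself: `validate_gap_note` is a hypothetical helper name, and it assumes the title sits on the first line and that every section heading is written exactly as in the template.

```shell
#!/usr/bin/env bash
# Hypothetical helper (an assumption of this sketch): check that a
# GAP-NOTE.md follows the standard structure described above.
validate_gap_note() {
  local file="$1" section priority status=0

  # Assumption: the title line is the first line of the file.
  if ! head -n1 "$file" | grep -q '^# GAP-NOTE: .'; then
    echo "$file: missing '# GAP-NOTE: ...' title line" >&2
    status=1
  fi

  # Each required section heading must appear on a line of its own.
  for section in "Expected Content" "Acquisition Strategy" "Priority" "Sources"; do
    if ! grep -qx "## $section" "$file"; then
      echo "$file: missing '## $section' section" >&2
      status=1
    fi
  done

  # The first non-blank line after '## Priority' must be a standard level.
  priority=$(awk 'found && NF { print $1; exit } /^## Priority$/ { found = 1 }' "$file")
  case "$priority" in
    HIGH|MEDIUM|LOW) ;;
    *) echo "$file: priority must be HIGH, MEDIUM, or LOW (got '$priority')" >&2
       status=1 ;;
  esac

  return "$status"
}
```

Calling `validate_gap_note "path/to/GAP-NOTE.md"` returns a non-zero status and prints one message per problem, so it can gate a commit hook or a pre-acquisition check.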

Lifecycle Management

1. Creation

Create a GAP-NOTE.md when you establish the directory structure, before any content is acquired:

# Create directory and gap marker
mkdir -p "Artist Name/Radio Sessions/BBC Radio 1"
cat > "Artist Name/Radio Sessions/BBC Radio 1/GAP-NOTE.md" <<'INNER_EOF'
# GAP-NOTE: BBC Radio 1 Sessions

## Expected Content
Lossless audio recordings of all BBC Radio 1 sessions (1990-1995)
with metadata: broadcast date, presenter, tracklist, duration

## Acquisition Strategy
1. Query BBC Sounds API for artist sessions
2. Download via yt-dlp with metadata extraction
3. Transcode to FLAC if necessary
4. Extract and verify tracklist from audio

## Priority
HIGH

## Sources
- https://www.bbc.co.uk/sounds/brand/b006wkqb
- https://archive.org/details/bbcradio1sessions
INNER_EOF

2. Active Use

Agents read GAP-NOTE.md as instructions during acquisition:

# Find high-priority gaps (NUL-delimited so paths with spaces survive)
find . -name "GAP-NOTE.md" -print0 | while IFS= read -r -d '' file; do
  grep -A1 "^## Priority$" "$file" | grep -q "HIGH" && printf '%s\n' "$file"
done

# Read acquisition strategy
grep -A20 "^## Acquisition Strategy$" "path/to/GAP-NOTE.md"
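
A fixed window such as `-A20` can spill into the following sections or cut off a long strategy. As a hedged alternative, the sketch below prints only the body of one named section; `gap_section` is a hypothetical helper name, and it assumes sections are delimited by level-2 (`## `) headings as in the template.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the body of one "## Heading" section
# of a GAP-NOTE.md, without the heading and without blank lines.
gap_section() {
  local file="$1" heading="$2"
  awk -v want="## $heading" '
    $0 == want { in_section = 1; next }  # start after the requested heading
    /^## /     { in_section = 0 }        # any other level-2 heading ends it
    in_section && NF                     # print non-blank lines inside the section
  ' "$file"
}
```

For example, `gap_section "path/to/GAP-NOTE.md" "Acquisition Strategy"` yields the numbered steps, and `gap_section "path/to/GAP-NOTE.md" "Sources"` yields the bulleted source list.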

3. Completion and Removal

Once the content is acquired, GAP-NOTE.md is removed or replaced:

# Option 1: Remove when content present
rm "Artist Name/Radio Sessions/BBC Radio 1/GAP-NOTE.md"

# Option 2: Rename to CATALOG.md, then rewrite its contents as a catalog
mv "Artist Name/Radio Sessions/BBC Radio 1/GAP-NOTE.md" \
   "Artist Name/Radio Sessions/BBC Radio 1/CATALOG.md"

Priority Scoring

HIGH Priority

Criteria:

  • Core content essential to collection purpose
  • Readily available from reliable sources
  • Legal and ethical to acquire
  • Low technical complexity

Examples:

  • Radio sessions from major broadcasters (BBC, NPR)
  • Album artwork from official label websites
  • Press photos from artist official sites
  • Lyrics from licensed databases

Expected timeframe: Hours to days

MEDIUM Priority

Criteria:

  • Supplementary content enhancing collection
  • Requires moderate effort or coordination
  • May involve multiple sources or processing steps
  • Quality/completeness trade-offs acceptable

Examples:

  • Concert bootlegs from trading communities
  • Artist photos from Wikimedia Commons requiring attribution
  • Interviews transcribed from audio/video
  • Fan-created setlists cross-referenced with recordings

Expected timeframe: Days to weeks

LOW Priority

Criteria:

  • Nice-to-have content, non-essential
  • Difficult to source or verify
  • May require manual intervention
  • Uncertain availability or legal status

Examples:

  • Pre-label era demo recordings
  • Rare promotional materials
  • Personal photos from fan archives
  • Unverified session information

Expected timeframe: Weeks to months (or indefinite)
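
Since the three levels drive resource allocation, agents need the gaps in priority order. The sketch below is one way to build such a work queue; `gaps_by_priority` is a hypothetical helper name, and `sort -z` is assumed to be available (GNU and recent BSD sort).

```shell
#!/usr/bin/env bash
# Hypothetical helper: list GAP-NOTE.md paths under a root directory,
# HIGH first, then MEDIUM, then LOW, as "LEVEL<TAB>path" lines.
gaps_by_priority() {
  local root="${1:-.}" level file
  for level in HIGH MEDIUM LOW; do
    find "$root" -name "GAP-NOTE.md" -print0 | sort -z |
      while IFS= read -r -d '' file; do
        # The priority is the first non-blank line after "## Priority".
        if [ "$(awk 'found && NF { print $1; exit } /^## Priority$/ { found = 1 }' "$file")" = "$level" ]; then
          printf '%s\t%s\n' "$level" "$file"
        fi
      done
  done
}
```

Running `gaps_by_priority /path/to/archive | head -n 5` shows the five gaps to work on next. Notes with a missing or non-standard priority are skipped rather than guessed at.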

Completeness Measurement

Gap Inventory

Count remaining gaps to measure collection progress:

# Total gap count
find . -name "GAP-NOTE.md" | wc -l

# Gaps by priority
# -h drops the filename prefix so the line after each heading can be matched directly
echo "=== Collection Gaps by Priority ==="
echo -n "HIGH:   "
grep -rh -A1 --include="GAP-NOTE.md" "^## Priority$" . | grep -c "^HIGH"
echo -n "MEDIUM: "
grep -rh -A1 --include="GAP-NOTE.md" "^## Priority$" . | grep -c "^MEDIUM"
echo -n "LOW:    "
grep -rh -A1 --include="GAP-NOTE.md" "^## Priority$" . | grep -c "^LOW"

Completeness Percentage

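Estimate completion as the share of directories that contain no GAP-NOTE.md. Every directory under the root is counted, including parent and hidden directories, so treat the figure as a rough indicator: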

#!/bin/bash
# Calculate archive completeness

TOTAL_DIRS=$(find . -type d | wc -l)
GAP_DIRS=$(find . -name "GAP-NOTE.md" | wc -l)
COMPLETE_DIRS=$((TOTAL_DIRS - GAP_DIRS))
PERCENT=$((COMPLETE_DIRS * 100 / TOTAL_DIRS))

echo "Archive Completeness: $PERCENT%"
echo "  Complete: $COMPLETE_DIRS/$TOTAL_DIRS directories"
echo "  Gaps remaining: $GAP_DIRS"
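
Counting every directory lets parent folders, which never hold content themselves, inflate the percentage. The sketch below is one possible refinement under a stated assumption: gaps are tracked on leaf directories (those without visible subdirectories), and hidden directories such as `.archive-logs` are ignored. `archive_completeness` is a hypothetical name, and the `-quit`, `-mindepth`, and `-maxdepth` options assume GNU or BSD find.

```shell
#!/usr/bin/env bash
# Hypothetical helper: completeness over leaf directories only.
archive_completeness() {
  local root="${1:-.}" dir total=0 gaps=0

  while IFS= read -r -d '' dir; do
    # Skip directories that have visible subdirectories: only leaves count.
    if [ -n "$(find "$dir" -mindepth 1 -maxdepth 1 -type d ! -name '.*' -print -quit)" ]; then
      continue
    fi
    total=$((total + 1))
    if [ -f "$dir/GAP-NOTE.md" ]; then
      gaps=$((gaps + 1))
    fi
  done < <(find "$root" -name '.?*' -prune -o -type d -print0)  # prune hidden dirs

  if [ "$total" -eq 0 ]; then
    echo "No leaf directories found under $root"
    return 1
  fi

  echo "Archive Completeness: $(( (total - gaps) * 100 / total ))%"
  echo "  Complete: $((total - gaps))/$total leaf directories"
  echo "  Gaps remaining: $gaps"
}
```

With three leaf directories of which one still holds a GAP-NOTE.md, this reports 66% (integer division), whereas the all-directories count would report a higher figure because of the parent folders.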

Gap Report

Generate a structured report for review:

#!/bin/bash
# Generate gap report with locations and priorities

echo "# Archive Gap Report"
echo "Generated: $(date -I)"
echo ""

for priority in HIGH MEDIUM LOW; do
  echo "## $priority Priority"
  find . -name "GAP-NOTE.md" -print0 | while IFS= read -r -d '' file; do
    if grep -A1 "^## Priority$" "$file" | grep -q "$priority"; then
      dir=$(dirname "$file")
      title=$(grep "^# GAP-NOTE:" "$file" | sed 's/# GAP-NOTE: //')
      echo "- **$dir**"
      echo "  $title"
    fi
  done
  echo ""
done

Parallel Gap Fill Pattern

Multiple agents can work on gaps simultaneously by claiming GAP-NOTE files:

1. Claim a Gap

# Add agent claim to gap note
echo "" >> "path/to/GAP-NOTE.md"
echo "## Claimed By" >> "path/to/GAP-NOTE.md"
echo "Agent: $AGENT_ID" >> "path/to/GAP-NOTE.md"
echo "Timestamp: $(date -Iseconds)" >> "path/to/GAP-NOTE.md"
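
Appending to the file does not by itself stop two agents from claiming the same gap at the same moment. One common way to make the claim exclusive is a lock directory, since `mkdir` either creates it or fails. The sketch below assumes a local filesystem; the function names and the `.gap-claim` directory name are hypothetical and not part of the GAP-NOTE format.

```shell
#!/usr/bin/env bash
# Hypothetical helper: claim a gap so that only one agent succeeds.
claim_gap() {
  local gap_file="$1" agent_id="$2" lock_dir
  lock_dir="$(dirname "$gap_file")/.gap-claim"

  # mkdir fails if the directory already exists, so exactly one of
  # several competing agents gets past this point.
  if ! mkdir "$lock_dir" 2>/dev/null; then
    echo "Gap already claimed: $gap_file" >&2
    return 1
  fi

  {
    echo ""
    echo "## Claimed By"
    echo "Agent: $agent_id"
    echo "Timestamp: $(date -Iseconds)"
  } >> "$gap_file"
}

# Hypothetical helper: give a claimed gap back, for example after a
# failed attempt, by removing the claim section and the lock.
release_gap() {
  local gap_file="$1" tmp
  tmp=$(mktemp) || return 1
  awk '
    /^## Claimed By$/ { skipping = 1; next }  # drop the claim heading
    /^## /            { skipping = 0 }        # the next heading ends the claim
    !skipping                                 # keep everything else
  ' "$gap_file" > "$tmp" && cat "$tmp" > "$gap_file"
  rm -f "$tmp"
  rmdir "$(dirname "$gap_file")/.gap-claim" 2>/dev/null || true
}
```

An agent would call `claim_gap "path/to/GAP-NOTE.md" "$AGENT_ID"` and only proceed on a zero exit status. An empty lock directory contains no files, so file-based content checks on the gap directory are unaffected.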

2. Execute Acquisition

Follow the acquisition strategy documented in the GAP-NOTE:

# Read strategy section (drop the headings and blank lines)
STRATEGY=$(sed -n '/^## Acquisition Strategy$/,/^##/p' "path/to/GAP-NOTE.md" | \
           grep -v -e "^##" -e "^$")

# Execute steps (example)
while IFS= read -r step; do
  echo "Executing: $step"
  # ... implement step execution ...
done <<< "$STRATEGY"

3. Validate and Close

# Verify content acquired
if [ -f "expected-content-file.flac" ]; then
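  # Move GAP-NOTE to completion log, named after its directory so that
  # gaps filled on the same day do not overwrite each other
  mkdir -p .archive-logs/completed-gaps
  slug=$(dirname "path/to/GAP-NOTE.md" | sed 's|[/ ]|-|g')
  mv "path/to/GAP-NOTE.md" ".archive-logs/completed-gaps/$(date -I)-$slug.md"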
  echo "Gap filled successfully"
else
  # Update GAP-NOTE with failure info
  echo "" >> "path/to/GAP-NOTE.md"
  echo "## Acquisition Attempt" >> "path/to/GAP-NOTE.md"
  echo "Failed: $(date -Iseconds)" >> "path/to/GAP-NOTE.md"
  echo "Reason: Content not available from listed sources" >> "path/to/GAP-NOTE.md"
fi
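
The single hard-coded file test can be generalized so that validation matches what the Expected Content section asks for. The following is a minimal sketch under assumptions: `has_content` is a hypothetical helper, and "enough files with the right extension" is only a stand-in for whatever completion criteria a given gap note defines.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed when DIR (searched recursively) holds at
# least MIN files with extension EXT. MIN defaults to 1.
has_content() {
  local dir="$1" ext="$2" min="${3:-1}" count
  # -iname matches the extension case-insensitively (flac, FLAC, ...).
  count=$(find "$dir" -type f -iname "*.${ext}" 2>/dev/null | wc -l | tr -d ' ')
  [ "$count" -ge "$min" ]
}
```

For instance, `has_content "Artist Name/Radio Sessions/BBC Radio 1" flac` could replace the `[ -f ... ]` condition, with a third argument when a known number of sessions is expected.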

Real-World Examples

Example 1: Radio Sessions

# GAP-NOTE: BBC Radio 1 Sessions (1990-1995)

## Expected Content
Lossless audio recordings (FLAC/ALAC) of all BBC Radio 1 sessions
between 1990-1995, including:
- Full session audio
- Broadcast date and presenter name
- Complete tracklist with song titles
- Duration and audio quality metadata

## Acquisition Strategy
1. Query BBC Sounds API: `https://api.bbc.co.uk/sounds/search?q={artist}`
2. Filter results for Radio 1 sessions in date range
3. Download via yt-dlp: `yt-dlp --extract-audio --audio-format flac {url}`
4. Extract metadata using `ffprobe` and validate against expected format
5. Organize into subdirectories by broadcast date

## Priority
HIGH

## Sources
- https://www.bbc.co.uk/sounds/brand/b006wkqb
- https://archive.org/details/bbcradio1sessions
- https://musicbrainz.org (for session metadata validation)

Example 2: Album Artwork

# GAP-NOTE: Album Artwork Collection

## Expected Content
High-resolution album covers (minimum 1400x1400px, JPEG or PNG)
for all studio albums, EPs, and major singles:
- Front cover (required)
- Back cover (optional)
- CD/vinyl labels (optional)
- Embedded artwork extracted from audio files

## Acquisition Strategy
1. Check MusicBrainz Cover Art Archive API
2. Query Discogs API for releases: `https://api.discogs.com/artists/{id}/releases`
3. Download largest available image from each release
4. Fallback: extract embedded artwork from FLAC files using `metaflac --export-picture-to=cover.jpg {file}`
5. Validate dimensions and format

## Priority
MEDIUM

## Sources
- https://coverartarchive.org/release/{mbid}
- https://api.discogs.com/releases/{id}
- Embedded in existing FLAC files

Example 3: Interview Transcripts

# GAP-NOTE: Interview Archive

## Expected Content
Transcripts of all major interviews (print, radio, video) from 1988-2000:
- Full text transcription
- Interview date and publication/broadcast source
- Interviewer name
- Topic tags (e.g., "album recording", "tour", "influences")

## Acquisition Strategy
1. Search online archives: Rock's Backpages, music magazine databases
2. For video/audio interviews: use Whisper API for transcription
3. Manual transcription for short clips (<5 minutes)
4. Cross-reference dates with tour/album timeline for validation
5. Format as markdown with YAML frontmatter

## Priority
LOW

## Sources
- https://www.rocksbackpages.com
- https://archive.org/details/audio (radio interviews)
- YouTube channels (fan uploads, use Whisper for transcription)

Management Commands

Create Gap Template

# Function to scaffold new gap note
create_gap_note() {
  local dir="$1"
  local title="$2"
  local priority="${3:-MEDIUM}"

  mkdir -p "$dir"
  cat > "$dir/GAP-NOTE.md" <<INNER_EOF
# GAP-NOTE: $title

## Expected Content
{Describe what content should be here}

## Acquisition Strategy
1. {Step 1}
2. {Step 2}

## Priority
$priority

## Sources
- {Source URL 1}
INNER_EOF

  echo "Created gap note: $dir/GAP-NOTE.md"
}

# Usage
create_gap_note "Artist/Radio Sessions/NPR Tiny Desk" \
                "NPR Tiny Desk Concert" \
                "HIGH"

Bulk Status Check

# Check all gaps and report status
check_gaps() {
  echo "=== Archive Gap Status ==="
  echo "Generated: $(date)"
  echo ""

  find . -name "GAP-NOTE.md" -print0 | sort -z | while IFS= read -r -d '' file; do
    dir=$(dirname "$file")
    title=$(grep "^# GAP-NOTE:" "$file" | sed 's/# GAP-NOTE: //')
    priority=$(grep -A1 "^## Priority$" "$file" | tail -n1 | tr -d ' ')
    claimed=$(grep -q "^## Claimed By$" "$file" && echo "[CLAIMED]" || echo "")

    printf "%-60s [%s] %s %s\n" "$dir" "$priority" "$title" "$claimed"
  done
}

Clean Completed Gaps

# Archive completed gap notes
archive_completed_gaps() {
  mkdir -p .archive-logs/completed-gaps

  find . -name "GAP-NOTE.md" -print0 | while IFS= read -r -d '' file; do
    dir=$(dirname "$file")

    # Check if directory has actual content now
    content_count=$(find "$dir" -type f ! -name "GAP-NOTE.md" | wc -l)

    if [ "$content_count" -gt 0 ]; then
      echo "Gap filled: $dir"
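      # Use the full relative path in the name so equally named directories do not collide
      mv "$file" ".archive-logs/completed-gaps/$(date -I)-$(printf '%s\n' "${dir#./}" | sed 's|[/ ]|-|g').md"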
    fi
  done
}

Integration with Media Curator Framework

Gap Detection

The Media Curator framework can automatically detect gaps:

# Detect expected-but-missing directories
aiwg media-curator detect-gaps --archive /path/to/archive

Gap Filling Workflow

# Fill high-priority gaps
aiwg media-curator fill-gaps --priority HIGH --parallel 3

Gap Reporting

# Generate gap report
aiwg media-curator gap-report --format markdown > GAPS.md

Best Practices

  1. Be specific in Expected Content - Vague descriptions lead to incomplete acquisitions
  2. Test acquisition strategies - Verify each step works before documenting
  3. Update priority as context changes - Source availability affects urgency
  4. Use consistent formatting - Enables automated parsing and reporting
  5. Archive completed gaps - Maintain history of what was filled and when
  6. Review gaps periodically - Sources and priorities evolve over time
  7. Claim gaps before starting - Prevents duplicate effort in parallel workflows

See Also

  • @$AIWG_ROOT/agentic/code/frameworks/media-curator/skills/catalog-generation.md
    - Replacing GAP-NOTEs with catalogs
  • @$AIWG_ROOT/agentic/code/frameworks/media-curator/skills/archive-organization.md
    - Directory structure conventions
  • Issue #81 (Gitea) - Original field testing that discovered this pattern

References

  • @$AIWG_ROOT/agentic/code/frameworks/media-curator/skills/check-completeness/SKILL.md — Completeness analysis that identifies gaps which this skill documents
  • @$AIWG_ROOT/agentic/code/frameworks/media-curator/skills/analyze-artist/SKILL.md — Artist analysis that establishes the canonical discography used to detect gaps
  • @$AIWG_ROOT/agentic/code/frameworks/media-curator/skills/find-sources/SKILL.md — Source discovery used to locate items identified as gaps
  • @$AIWG_ROOT/agentic/code/frameworks/media-curator/skills/provenance-tracking/SKILL.md — Provenance tracking applied when gaps are resolved and acquisitions recorded