Aiwg gap-documentation

Document missing content in media archives with standardized GAP-NOTE markers that guide acquisition and measure completeness

install

source · Clone the upstream repo

git clone https://github.com/jmagly/aiwg

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/jmagly/aiwg "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.agents/skills/gap-documentation" ~/.claude/skills/jmagly-aiwg-gap-documentation && rm -rf "$T"

manifest: .agents/skills/gap-documentation/SKILL.md

source content

GAP-NOTE Documentation Skill

A systematic approach to documenting missing content in media archives using standardized gap markers that guide acquisition, measure completeness, and enable parallel collection workflows.

Overview

GAP-NOTE.md files serve as placeholders in directories where content is expected but not yet acquired. They transform empty directories from ambiguous voids into actionable instructions, creating a clear separation between "not yet collected" and "intentionally excluded."

This pattern emerged from field testing during music archive curation (issue #81), where agents needed explicit guidance on acquisition priorities and strategies without requiring central coordination.

The GAP-NOTE.md Template

Standard Structure

# GAP-NOTE: {Directory Purpose}

## Expected Content
{What should be here}

## Acquisition Strategy
{How to obtain it}

## Priority
{HIGH | MEDIUM | LOW}

## Sources
- {Source 1}
- {Source 2}

Field Descriptions

Title Line (

# GAP-NOTE: ...

)

Brief description of directory purpose
Helps agents understand context at a glance

Example:

# GAP-NOTE: BBC Radio Sessions (1990-1995)

Expected Content

Specific description of missing content
Include formats, quality expectations, metadata requirements
Be concrete enough to validate completion
Example: "Lossless audio (FLAC/ALAC) of all BBC Radio 1 sessions, with broadcast date, presenter name, and tracklisting"

Acquisition Strategy

Step-by-step instructions for obtaining content
Include API endpoints, web scrapers, manual download URLs
Specify required credentials, rate limits, legal considerations
Example: "Use BBC Sounds API → filter by artist → download via yt-dlp → transcode to FLAC"

Priority

Standardized levels: HIGH, MEDIUM, LOW
Determines resource allocation for parallel workflows
See Priority Scoring section below

Sources

URLs, API endpoints, contact information
Structured as bulleted list for easy parsing
Include backup sources where available

Example:

- https://www.bbc.co.uk/sounds/brand/b006wkqb

Lifecycle Management

1. Creation

Created when establishing directory structure but before content acquisition:

# Create directory and gap marker
mkdir -p "Artist Name/Radio Sessions/BBC Radio 1"
cat > "Artist Name/Radio Sessions/BBC Radio 1/GAP-NOTE.md" <<'INNER_EOF'
# GAP-NOTE: BBC Radio 1 Sessions

## Expected Content
Lossless audio recordings of all BBC Radio 1 sessions (1990-1995)
with metadata: broadcast date, presenter, tracklist, duration

## Acquisition Strategy
1. Query BBC Sounds API for artist sessions
2. Download via yt-dlp with metadata extraction
3. Transcode to FLAC if necessary
4. Extract and verify tracklist from audio

## Priority
HIGH

## Sources
- https://www.bbc.co.uk/sounds/brand/b006wkqb
- https://archive.org/details/bbcradio1sessions
INNER_EOF

2. Active Use

Agents read GAP-NOTE.md as instructions during acquisition:

# Find high-priority gaps
find . -name "GAP-NOTE.md" -exec grep -l "^## Priority$" {} \; | \
  xargs grep -A1 "^## Priority$" | grep "HIGH" | cut -d: -f1

# Read acquisition strategy
grep -A20 "^## Acquisition Strategy$" "path/to/GAP-NOTE.md"

3. Completion and Removal

Once content acquired, GAP-NOTE.md is removed or replaced:

# Option 1: Remove when content present
rm "Artist Name/Radio Sessions/BBC Radio 1/GAP-NOTE.md"

# Option 2: Replace with catalog
mv "Artist Name/Radio Sessions/BBC Radio 1/GAP-NOTE.md" \
   "Artist Name/Radio Sessions/BBC Radio 1/CATALOG.md"

Priority Scoring

HIGH Priority

Criteria:

Core content essential to collection purpose
Readily available from reliable sources
Legal and ethical to acquire
Low technical complexity

Examples:

Radio sessions from major broadcasters (BBC, NPR)
Album artwork from official label websites
Press photos from artist official sites
Lyrics from licensed databases

Expected timeframe: Hours to days

MEDIUM Priority

Criteria:

Supplementary content enhancing collection
Requires moderate effort or coordination
May involve multiple sources or processing steps
Quality/completeness trade-offs acceptable

Examples:

Concert bootlegs from trading communities
Artist photos from Wikimedia Commons requiring attribution
Interviews transcribed from audio/video
Fan-created setlists cross-referenced with recordings

Expected timeframe: Days to weeks

LOW Priority

Criteria:

Nice-to-have content, non-essential
Difficult to source or verify
May require manual intervention
Uncertain availability or legal status

Examples:

Pre-label era demo recordings
Rare promotional materials
Personal photos from fan archives
Unverified session information

Expected timeframe: Weeks to months (or indefinite)

Completeness Measurement

Gap Inventory

Count remaining gaps to measure collection progress:

# Total gap count
find . -name "GAP-NOTE.md" | wc -l

# Gaps by priority
echo "=== Collection Gaps by Priority ==="
echo -n "HIGH:   "
grep -r "^## Priority" --include="GAP-NOTE.md" . | grep -A1 "^## Priority$" | grep -c "HIGH"
echo -n "MEDIUM: "
grep -r "^## Priority" --include="GAP-NOTE.md" . | grep -A1 "^## Priority$" | grep -c "MEDIUM"
echo -n "LOW:    "
grep -r "^## Priority" --include="GAP-NOTE.md" . | grep -A1 "^## Priority$" | grep -c "LOW"

Completeness Percentage

Calculate completion based on gap ratio:

#!/bin/bash
# Calculate archive completeness

TOTAL_DIRS=$(find . -type d | wc -l)
GAP_DIRS=$(find . -name "GAP-NOTE.md" | wc -l)
COMPLETE_DIRS=$((TOTAL_DIRS - GAP_DIRS))
PERCENT=$((COMPLETE_DIRS * 100 / TOTAL_DIRS))

echo "Archive Completeness: $PERCENT%"
echo "  Complete: $COMPLETE_DIRS/$TOTAL_DIRS directories"
echo "  Gaps remaining: $GAP_DIRS"

Gap Report

Generate structured report for review:

#!/bin/bash
# Generate gap report with locations and priorities

echo "# Archive Gap Report"
echo "Generated: $(date -I)"
echo ""

for priority in HIGH MEDIUM LOW; do
  echo "## $priority Priority"
  find . -name "GAP-NOTE.md" -print0 | while IFS= read -r -d '' file; do
    if grep -A1 "^## Priority$" "$file" | grep -q "$priority"; then
      dir=$(dirname "$file")
      title=$(grep "^# GAP-NOTE:" "$file" | sed 's/# GAP-NOTE: //')
      echo "- **$dir**"
      echo "  $title"
    fi
  done
  echo ""
done

Parallel Gap Fill Pattern

Multiple agents can work on gaps simultaneously by claiming GAP-NOTE files:

1. Claim a Gap

# Add agent claim to gap note
echo "" >> "path/to/GAP-NOTE.md"
echo "## Claimed By" >> "path/to/GAP-NOTE.md"
echo "Agent: $AGENT_ID" >> "path/to/GAP-NOTE.md"
echo "Timestamp: $(date -Iseconds)" >> "path/to/GAP-NOTE.md"

2. Execute Acquisition

Follow the acquisition strategy documented in the GAP-NOTE:

# Read strategy section
STRATEGY=$(sed -n '/^## Acquisition Strategy$/,/^##/p' "path/to/GAP-NOTE.md" | \
           grep -v "^##")

# Execute steps (example)
while IFS= read -r step; do
  echo "Executing: $step"
  # ... implement step execution ...
done <<< "$STRATEGY"

3. Validate and Close

# Verify content acquired
if [ -f "expected-content-file.flac" ]; then
  # Move GAP-NOTE to completion log
  mkdir -p .archive-logs/completed-gaps
  mv "path/to/GAP-NOTE.md" ".archive-logs/completed-gaps/$(date -I)-gap-filled.md"
  echo "Gap filled successfully"
else
  # Update GAP-NOTE with failure info
  echo "" >> "path/to/GAP-NOTE.md"
  echo "## Acquisition Attempt" >> "path/to/GAP-NOTE.md"
  echo "Failed: $(date -Iseconds)" >> "path/to/GAP-NOTE.md"
  echo "Reason: Content not available from listed sources" >> "path/to/GAP-NOTE.md"
fi

Real-World Examples

Example 1: Radio Sessions

# GAP-NOTE: BBC Radio 1 Sessions (1990-1995)

## Expected Content
Lossless audio recordings (FLAC/ALAC) of all BBC Radio 1 sessions
between 1990-1995, including:
- Full session audio
- Broadcast date and presenter name
- Complete tracklist with song titles
- Duration and audio quality metadata

## Acquisition Strategy
1. Query BBC Sounds API: `https://api.bbc.co.uk/sounds/search?q={artist}`
2. Filter results for Radio 1 sessions in date range
3. Download via yt-dlp: `yt-dlp --extract-audio --audio-format flac {url}`
4. Extract metadata using `ffprobe` and validate against expected format
5. Organize into subdirectories by broadcast date

## Priority
HIGH

## Sources
- https://www.bbc.co.uk/sounds/brand/b006wkqb
- https://archive.org/details/bbcradio1sessions
- https://musicbrainz.org (for session metadata validation)

Example 2: Album Artwork

# GAP-NOTE: Album Artwork Collection

## Expected Content
High-resolution album covers (minimum 1400x1400px, JPEG or PNG)
for all studio albums, EPs, and major singles:
- Front cover (required)
- Back cover (optional)
- CD/vinyl labels (optional)
- Embedded artwork extracted from audio files

## Acquisition Strategy
1. Check MusicBrainz Cover Art Archive API
2. Query Discogs API for releases: `https://api.discogs.com/artists/{id}/releases`
3. Download largest available image from each release
4. Fallback: extract embedded artwork from FLAC files using `metaflac --export-picture-to=cover.jpg`
5. Validate dimensions and format

## Priority
MEDIUM

## Sources
- https://coverartarchive.org/release/{mbid}
- https://api.discogs.com/releases/{id}
- Embedded in existing FLAC files

Example 3: Interview Transcripts

# GAP-NOTE: Interview Archive

## Expected Content
Transcripts of all major interviews (print, radio, video) from 1988-2000:
- Full text transcription
- Interview date and publication/broadcast source
- Interviewer name
- Topic tags (e.g., "album recording", "tour", "influences")

## Acquisition Strategy
1. Search online archives: Rock's Backpages, music magazine databases
2. For video/audio interviews: use Whisper API for transcription
3. Manual transcription for short clips (<5 minutes)
4. Cross-reference dates with tour/album timeline for validation
5. Format as markdown with YAML frontmatter

## Priority
LOW

## Sources
- https://www.rocksbackpages.com
- https://archive.org/details/audio (radio interviews)
- YouTube channels (fan uploads, use Whisper for transcription)

Management Commands

Create Gap Template

# Function to scaffold new gap note
create_gap_note() {
  local dir="$1"
  local title="$2"
  local priority="${3:-MEDIUM}"

  mkdir -p "$dir"
  cat > "$dir/GAP-NOTE.md" <<INNER_EOF
# GAP-NOTE: $title

## Expected Content
{Describe what content should be here}

## Acquisition Strategy
1. {Step 1}
2. {Step 2}

## Priority
$priority

## Sources
- {Source URL 1}
INNER_EOF

  echo "Created gap note: $dir/GAP-NOTE.md"
}

# Usage
create_gap_note "Artist/Radio Sessions/NPR Tiny Desk" \
                "NPR Tiny Desk Concert" \
                "HIGH"

Bulk Status Check

# Check all gaps and report status
check_gaps() {
  echo "=== Archive Gap Status ==="
  echo "Generated: $(date)"
  echo ""

  find . -name "GAP-NOTE.md" -print0 | sort -z | while IFS= read -r -d '' file; do
    dir=$(dirname "$file")
    title=$(grep "^# GAP-NOTE:" "$file" | sed 's/# GAP-NOTE: //')
    priority=$(grep -A1 "^## Priority$" "$file" | tail -n1 | tr -d ' ')
    claimed=$(grep -q "^## Claimed By$" "$file" && echo "[CLAIMED]" || echo "")

    printf "%-60s [%s] %s\n" "$dir" "$priority" "$claimed"
  done
}

Clean Completed Gaps

# Archive completed gap notes
archive_completed_gaps() {
  mkdir -p .archive-logs/completed-gaps

  find . -name "GAP-NOTE.md" -print0 | while IFS= read -r -d '' file; do
    dir=$(dirname "$file")

    # Check if directory has actual content now
    content_count=$(find "$dir" -type f ! -name "GAP-NOTE.md" | wc -l)

    if [ "$content_count" -gt 0 ]; then
      echo "Gap filled: $dir"
      mv "$file" ".archive-logs/completed-gaps/$(date -I)-$(basename "$dir" | tr ' ' '-').md"
    fi
  done
}

Integration with Media Curator Framework

Gap Detection

The Media Curator framework can automatically detect gaps:

# Detect expected-but-missing directories
aiwg media-curator detect-gaps --archive /path/to/archive

Gap Filling Workflow

# Fill high-priority gaps
aiwg media-curator fill-gaps --priority HIGH --parallel 3

Gap Reporting

# Generate gap report
aiwg media-curator gap-report --format markdown > GAPS.md

Best Practices

Be specific in Expected Content - Vague descriptions lead to incomplete acquisitions
Test acquisition strategies - Verify each step works before documenting
Update priority as context changes - Source availability affects urgency
Use consistent formatting - Enables automated parsing and reporting
Archive completed gaps - Maintain history of what was filled and when
Review gaps periodically - Sources and priorities evolve over time
Claim gaps before starting - Prevents duplicate effort in parallel workflows

References

@$AIWG_ROOT/agentic/code/frameworks/media-curator/skills/check-completeness/SKILL.md — Completeness analysis that identifies gaps which this skill documents
@$AIWG_ROOT/agentic/code/frameworks/media-curator/skills/analyze-artist/SKILL.md — Artist analysis that establishes the canonical discography used to detect gaps
@$AIWG_ROOT/agentic/code/frameworks/media-curator/skills/find-sources/SKILL.md — Source discovery used to locate items identified as gaps
@$AIWG_ROOT/agentic/code/frameworks/media-curator/skills/provenance-tracking/SKILL.md — Provenance tracking applied when gaps are resolved and acquisitions recorded

Aiwg gap-documentation

GAP-NOTE Documentation Skill

Overview

The GAP-NOTE.md Template

Standard Structure

Field Descriptions

Lifecycle Management

1. Creation

2. Active Use

3. Completion and Removal

Priority Scoring

HIGH Priority

MEDIUM Priority

LOW Priority

Completeness Measurement

Gap Inventory

Completeness Percentage

Gap Report

Parallel Gap Fill Pattern

1. Claim a Gap

2. Execute Acquisition

3. Validate and Close

Real-World Examples

Example 1: Radio Sessions

Example 2: Album Artwork

Example 3: Interview Transcripts

Management Commands

Create Gap Template

Bulk Status Check

Clean Completed Gaps

Integration with Media Curator Framework

Gap Detection

Gap Filling Workflow

Gap Reporting

Best Practices

See Also

References