aiwg acquire

Download media from discovered sources with format selection and progress tracking

install
source · Clone the upstream repo
git clone https://github.com/jmagly/aiwg
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/jmagly/aiwg "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.agents/skills/acquire" ~/.claude/skills/jmagly-aiwg-acquire && rm -rf "$T"
manifest: .agents/skills/acquire/SKILL.md
source content

/acquire

Download media from discovered sources with intelligent format selection, parallel execution, and comprehensive progress tracking.

Purpose

The /acquire command orchestrates media downloads from various sources (YouTube, Internet Archive, Bandcamp, direct links) using the Acquisition Manager agent. It handles:

  • Single URL or batch downloads from source plans
  • Automatic format selection based on content type
  • Parallel download execution with resource limits
  • Real-time progress tracking and status reporting
  • Error recovery with retry strategies
  • Quality verification and metadata recording
  • Network mount awareness for optimal performance

Parameters

Required (one of)

--plan <sources.yaml>

  • Source plan file generated by the /find-sources command
  • YAML format containing URLs, metadata, and acquisition strategy
  • Example: .curator/sources/plan-001.yaml

--url <URL>

  • Single URL for one-off downloads
  • Supports: YouTube, Internet Archive, Bandcamp, SoundCloud, Vimeo, direct links
  • Example: https://youtube.com/watch?v=abc123

Optional

--format <audio|video|best>

  • Format preference for downloads
  • audio: Extract audio only (Opus 128K default)
  • video: Best video up to 1080p with audio
  • best: Let agent decide based on content type (default)
  • Example: --format audio

--output <directory>

  • Base output directory for downloads
  • Default: ./downloads/<timestamp>
  • Structure: <output>/<artist>/<era>/audio/ and <output>/<artist>/<era>/video/
  • Example: --output /mnt/archive/music

--parallel <N>

  • Maximum concurrent downloads
  • Range: 1-5 (default: 3)
  • Lower for slow networks, higher for fast connections
  • Example: --parallel 5

--local-work <directory>

  • Local working directory (required for network mount outputs)
  • Downloads happen here first, then copied to output
  • Default: /tmp/curator-work-$$
  • Example: --local-work /fast-local-disk/tmp

--verify-after

  • Enable post-download verification
  • Checks file integrity, size, media validity
  • Adds ~5% overhead but prevents corruption
  • Example: --verify-after

--extract-audio

  • Automatically extract audio from downloaded videos
  • Creates parallel audio files in audio/ directory
  • Uses Opus 128K by default
  • Example: --extract-audio

--resume <session-id>

  • Resume interrupted download session
  • Loads state from .curator/sessions/<session-id>/state.json
  • Skips completed downloads, retries failed
  • Example: --resume 20260214-143022
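A skill implementation has to turn the flags above into variables before anything else runs. A minimal sketch in bash; the variable names (`PLAN_FILE`, `SINGLE_URL`, etc.) are assumptions chosen to mirror the snippets later in this document, not a published interface:

```shell
# Parse the /acquire flags into shell variables (illustrative defaults).
parse_acquire_args() {
  PLAN_FILE="" SINGLE_URL="" FORMAT="best" OUTPUT_DIR=""
  PARALLEL=3 LOCAL_WORK="" VERIFY_AFTER=false EXTRACT_AUDIO=false RESUME_ID=""
  while [[ $# -gt 0 ]]; do
    case "$1" in
      --plan)          PLAN_FILE="$2"; shift 2 ;;
      --url)           SINGLE_URL="$2"; shift 2 ;;
      --format)        FORMAT="$2"; shift 2 ;;
      --output)        OUTPUT_DIR="$2"; shift 2 ;;
      --parallel)      PARALLEL="$2"; shift 2 ;;
      --local-work)    LOCAL_WORK="$2"; shift 2 ;;
      --verify-after)  VERIFY_AFTER=true; shift ;;
      --extract-audio) EXTRACT_AUDIO=true; shift ;;
      --resume)        RESUME_ID="$2"; shift 2 ;;
      *) echo "Unknown flag: $1" >&2; return 1 ;;
    esac
  done
  # One of --plan / --url / --resume is required
  if [[ -z "$PLAN_FILE" && -z "$SINGLE_URL" && -z "$RESUME_ID" ]]; then
    echo "ERROR: one of --plan, --url, or --resume is required" >&2
    return 1
  fi
}
```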

Usage Examples

Single URL Download

# Download single YouTube video (audio only)
/acquire --url "https://youtube.com/watch?v=abc123" --format audio

# Download single video with metadata
/acquire --url "https://youtube.com/watch?v=abc123" \
  --format video \
  --output /mnt/archive/concerts \
  --verify-after

Batch Download from Plan

# Download entire source plan (generated by /find-sources)
/acquire --plan .curator/sources/plan-001.yaml

# Download to network mount (uses local working dir)
/acquire --plan .curator/sources/plan-001.yaml \
  --output /mnt/network-storage/archive \
  --local-work /fast-ssd/curator-work \
  --parallel 3

Audio Extraction Workflow

# Download videos and auto-extract audio
/acquire --plan .curator/sources/plan-001.yaml \
  --format video \
  --extract-audio \
  --verify-after

# Result structure:
# downloads/artist/era/video/concert.mkv
# downloads/artist/era/audio/concert.opus

Resume Interrupted Session

# Resume after network failure
/acquire --resume 20260214-143022

# Resume with different output (move completed files)
/acquire --resume 20260214-143022 \
  --output /new/location

Workflow

1. Initialization

# Validate parameters
- Check --plan exists OR --url provided
- Verify output directory is writable
- Check required tools (yt-dlp, wget, curl, ffmpeg)
- Test network mount performance if applicable
- Create session directory structure

# Session setup
SESSION_ID=$(date +%Y%m%d-%H%M%S)
SESSION_DIR=".curator/sessions/$SESSION_ID"
mkdir -p "$SESSION_DIR"/{logs,metadata}

# Initialize state file
cat > "$SESSION_DIR/state.json" <<EOF
{
  "session_id": "$SESSION_ID",
  "started_at": "$(date -Iseconds)",
  "status": "initializing",
  "parameters": {
    "plan": "$PLAN_FILE",
    "format": "$FORMAT",
    "output": "$OUTPUT_DIR",
    "parallel": $PARALLEL
  },
  "downloads": []
}
EOF

2. Source Validation

# If --plan provided, parse YAML
if [[ -n "$PLAN_FILE" ]]; then
  # Extract URLs and metadata
  SOURCES=$(yq eval '.sources[] | .url' "$PLAN_FILE")
  TOTAL_SOURCES=$(echo "$SOURCES" | wc -l)

  # Validate each URL is accessible
  while IFS= read -r url; do
    if yt-dlp --dump-json "$url" >/dev/null 2>&1; then
      echo "VALID: $url"
    else
      echo "WARNING: Cannot access $url"
    fi
  done <<< "$SOURCES"
fi

# If --url provided, validate single URL
if [[ -n "$SINGLE_URL" ]]; then
  if ! yt-dlp --dump-json "$SINGLE_URL" >/dev/null 2>&1; then
    echo "ERROR: Cannot access URL: $SINGLE_URL"
    exit 1
  fi
fi

3. Directory Structure Creation

create_acquisition_structure() {
  local output_base="$1"
  local artist="$2"
  local era="$3"

  # Sanitize directory names
  local safe_artist=$(echo "$artist" | sed 's/[^a-zA-Z0-9_-]/_/g')
  local safe_era=$(echo "$era" | sed 's/[^a-zA-Z0-9_-]/_/g')

  # Create directory tree
  local audio_dir="$output_base/$safe_artist/$safe_era/audio"
  local video_dir="$output_base/$safe_artist/$safe_era/video"

  mkdir -p "$audio_dir/.curator"
  mkdir -p "$video_dir/.curator"
  mkdir -p "$output_base/$safe_artist/.curator"

  # Write artist metadata
  cat > "$output_base/$safe_artist/.curator/artist-info.json" <<EOF
{
  "name": "$artist",
  "era": "$era",
  "created_at": "$(date -Iseconds)",
  "session_id": "$SESSION_ID"
}
EOF

  echo "$audio_dir:$video_dir"
}
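The sed-based sanitization above is worth calling out: every character outside [A-Za-z0-9_-] becomes an underscore, so distinct names can collide in the directory tree. A quick illustration of the same substitution:

```shell
# Same substitution create_acquisition_structure applies to path components.
sanitize() { echo "$1" | sed 's/[^a-zA-Z0-9_-]/_/g'; }

sanitize "Pink Floyd"        # -> Pink_Floyd
sanitize "1970s (Live/EU)"   # -> 1970s__Live_EU_
```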

4. Launch Downloads

# Download orchestration with concurrency control
ACTIVE_DOWNLOADS=()
MAX_CONCURRENT=${PARALLEL:-3}

launch_download() {
  local url="$1"
  local output_dir="$2"
  local format="$3"
  local download_id="dl-$SESSION_ID-$(date +%s)"

  # Wait for slot availability
  while [[ ${#ACTIVE_DOWNLOADS[@]} -ge $MAX_CONCURRENT ]]; do
    check_completed_downloads
    sleep 2
  done

  # Determine target directory (audio vs video)
  local target_dir
  if [[ "$format" == "audio" || "$format" == "bestaudio" ]]; then
    target_dir="$output_dir/audio"
  else
    target_dir="$output_dir/video"
  fi

  # Launch download in background
  (
    download_with_retry "$url" "$target_dir" "$format" "$download_id"
    echo "COMPLETED:$download_id:$?" >> "$SESSION_DIR/completion.log"
  ) &

  local pid=$!
  ACTIVE_DOWNLOADS+=("$download_id:$pid")

  # Update state file
  update_state_file "add" "$download_id" "$url" "in_progress"
}

download_with_retry() {
  local url="$1"
  local target_dir="$2"
  local format="$3"
  local download_id="$4"

  local attempt=0
  local max_attempts=3

  # Map the user-facing preference onto a real yt-dlp format selector:
  # "audio"/"video"/"best" are not valid selectors by themselves.
  local selector
  case "$format" in
    audio) selector="bestaudio" ;;
    video) selector="bestvideo[height<=1080]+bestaudio/best" ;;
    best)  selector="bestvideo+bestaudio/best" ;;
    *)     selector="$format" ;;
  esac

  while [[ $attempt -lt $max_attempts ]]; do
    attempt=$((attempt + 1))

    echo "[$download_id] Attempt $attempt/$max_attempts"

    if yt-dlp -f "$selector" \
        --output "$target_dir/%(title)s.%(ext)s" \
        --write-info-json \
        --write-thumbnail \
        --no-playlist \
        "$url" \
        2>&1 | tee "$SESSION_DIR/logs/$download_id.log"; then

      update_state_file "complete" "$download_id" "" "completed"
      return 0
    fi

    # Check error type
    local error_type=$(classify_error "$SESSION_DIR/logs/$download_id.log")

    if [[ "$error_type" == "disk_full" || "$error_type" == "permission_denied" ]]; then
      echo "FATAL: $error_type - aborting all downloads"
      pkill -P $$  # Kill all child processes
      exit 2
    fi

    if [[ $attempt -lt $max_attempts ]]; then
      local wait_time=$((5 * attempt))
      echo "Waiting ${wait_time}s before retry..."
      sleep "$wait_time"
    fi
  done

  update_state_file "fail" "$download_id" "" "failed"
  return 1
}

check_completed_downloads() {
  local new_active=()

  for entry in "${ACTIVE_DOWNLOADS[@]}"; do
    local pid="${entry#*:}"
    if kill -0 "$pid" 2>/dev/null; then
      new_active+=("$entry")
    fi
  done

  ACTIVE_DOWNLOADS=("${new_active[@]}")
}
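`update_state_file` is called throughout step 4 but never defined in this document. One plausible jq-based sketch, using the `downloads` array layout from the step-1 state file (the `id` field name is an assumption):

```shell
# Append or mutate one download entry in state.json (atomic via temp file).
update_state_file() {
  local action="$1" download_id="$2" url="$3" status="$4"
  local state_file="$SESSION_DIR/state.json"
  case "$action" in
    add)
      jq --arg id "$download_id" --arg url "$url" --arg st "$status" \
         '.downloads += [{id: $id, url: $url, status: $st}]' \
         "$state_file" > "$state_file.tmp" && mv "$state_file.tmp" "$state_file"
      ;;
    complete|fail)
      jq --arg id "$download_id" --arg st "$status" \
         '(.downloads[] | select(.id == $id)).status = $st' \
         "$state_file" > "$state_file.tmp" && mv "$state_file.tmp" "$state_file"
      ;;
  esac
}
```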

5. Track Progress

# Real-time progress monitoring
monitor_progress() {
  local session_dir="$1"

  while true; do
    # Load current state
    local state_file="$session_dir/state.json"

    local total=$(jq '.downloads | length' "$state_file")
    local completed=$(jq '[.downloads[] | select(.status == "completed")] | length' "$state_file")
    local in_progress=$(jq '[.downloads[] | select(.status == "in_progress")] | length' "$state_file")
    local failed=$(jq '[.downloads[] | select(.status == "failed")] | length' "$state_file")

    # Clear screen and display status
    clear
    cat <<EOF
ACQUISITION PROGRESS
====================
Session: $SESSION_ID
Started: $(jq -r '.started_at' "$state_file")

Status: $completed/$total completed ($failed failed, $in_progress in progress)

Active Downloads:
EOF

    # Show active download details
    jq -r '.downloads[] | select(.status == "in_progress") | "  [\(.progress_percent)%] \(.filename) @ \(.speed_mbps)MB/s (ETA: \(.eta_seconds)s)"' "$state_file"

    # Exit if all done
    if [[ $((completed + failed)) -eq $total ]]; then
      echo ""
      echo "All downloads completed."
      break
    fi

    sleep 5
  done
}

# Run in background
monitor_progress "$SESSION_DIR" &
MONITOR_PID=$!

6. Extract Audio (if requested)

if [[ "$EXTRACT_AUDIO" == "true" ]]; then
  echo "Extracting audio from video files..."

  find "$OUTPUT_DIR" -type d -name "video" | while read -r video_dir; do
    # `local` is only valid inside functions, so plain assignments are used here
    audio_dir="${video_dir%/video}/audio"

    find "$video_dir" -type f \( -name "*.mkv" -o -name "*.mp4" -o -name "*.webm" \) | while read -r video_file; do
      base=$(basename "$video_file" | sed 's/\.[^.]*$//')
      audio_file="$audio_dir/${base}.opus"

      echo "  $video_file -> $audio_file"

      ffmpeg -i "$video_file" \
        -vn \
        -acodec libopus \
        -b:a 128K \
        "$audio_file" \
        2>&1 | tee -a "$SESSION_DIR/logs/audio-extraction.log"

      if [[ ${PIPESTATUS[0]} -eq 0 ]]; then
        echo "SUCCESS: $audio_file"
      else
        echo "FAILED: $video_file"
      fi
    done
  done
fi

7. Verify Downloads (if requested)

if [[ "$VERIFY_AFTER" == "true" ]]; then
  echo "Verifying downloaded files..."

  verification_results="$SESSION_DIR/verification.json"
  echo '{"verified": [], "failed": []}' > "$verification_results"

  find "$OUTPUT_DIR" -type f \( -name "*.opus" -o -name "*.mkv" -o -name "*.mp4" -o -name "*.flac" \) | while read -r file; do
    echo "Verifying: $file"

    # Check file size
    size=$(stat -c%s "$file" 2>/dev/null || stat -f%z "$file")
    if [[ $size -eq 0 ]]; then
      echo "FAILED: Zero-byte file"
      jq --arg f "$file" '.failed += [$f]' "$verification_results" > "$verification_results.tmp"
      mv "$verification_results.tmp" "$verification_results"
      continue
    fi

    # Check media integrity with ffmpeg
    # With -v error, any output at all indicates a problem
    if [[ -n "$(ffmpeg -v error -i "$file" -f null - 2>&1)" ]]; then
      echo "FAILED: Corrupted media file"
      jq --arg f "$file" '.failed += [$f]' "$verification_results" > "$verification_results.tmp"
      mv "$verification_results.tmp" "$verification_results"
      continue
    fi

    echo "VERIFIED: $file ($size bytes)"
    jq --arg f "$file" '.verified += [$f]' "$verification_results" > "$verification_results.tmp"
    mv "$verification_results.tmp" "$verification_results"
  done

  verified_count=$(jq '.verified | length' "$verification_results")
  failed_count=$(jq '.failed | length' "$verification_results")

  echo ""
  echo "Verification complete: $verified_count verified, $failed_count failed"
fi

8. Copy to Network Mount (if applicable)

if [[ -n "$LOCAL_WORK" && "$OUTPUT_DIR" != "$LOCAL_WORK"* ]]; then
  echo "Copying from local work directory to network mount..."

  # Batch copy with progress
  rsync -av --progress \
    --exclude='.DS_Store' \
    --exclude='Thumbs.db' \
    "$LOCAL_WORK/" "$OUTPUT_DIR/"

  # Verify checksums
  echo "Verifying network copy..."
  (cd "$LOCAL_WORK" && find . -type f -exec sha256sum {} \; | sort) > /tmp/local-checksums
  (cd "$OUTPUT_DIR" && find . -type f -exec sha256sum {} \; | sort) > /tmp/remote-checksums

  if diff /tmp/local-checksums /tmp/remote-checksums; then
    echo "Verification SUCCESS - removing local working copy"
    rm -rf "$LOCAL_WORK"
  else
    echo "WARNING: Checksum mismatch - keeping local copy at $LOCAL_WORK"
  fi
fi

Progress Display Format

Real-time console output during acquisition:

ACQUISITION PROGRESS
====================
Session: 20260214-143022
Started: 2026-02-14T14:30:22Z

Status: 5/12 completed (1 failed, 3 in progress)

Active Downloads:
  [67%] concert-1.mkv @ 12.5MB/s (ETA: 245s)
  [23%] album.flac @ 8.2MB/s (ETA: 680s)
  [91%] podcast-ep5.opus @ 2.1MB/s (ETA: 45s)

Recent Completions:
  [✓] documentary.mp4 (1.2GB, 8m 34s)
  [✓] live-set.opus (85MB, 1m 12s)

Recent Failures:
  [✗] unavailable-video.mkv (Video unavailable - marked for alternate source)

Total Downloaded: 4.8GB / ~12.5GB estimated
Average Speed: 8.9MB/s
Estimated Completion: 14:58 (28 minutes remaining)

Error Handling

Common Errors and Responses

| Error | Detection | Response | User Action Required |
|---|---|---|---|
| URL inaccessible | Pre-download validation fails | Skip URL, mark for manual review | Check URL, update source plan |
| Network timeout | Download stalls for 60s | Retry with exponential backoff (3x) | Check network connection |
| Rate limited | HTTP 429 response | Wait 60s, retry (2x) | Reduce parallel count |
| Disk full | Write fails with ENOSPC | STOP all downloads, escalate | Free disk space |
| Format unavailable | yt-dlp format error | Try fallback formats | Accept lower quality or skip |
| Corrupted download | ffmpeg verification fails | Delete and retry (2x) | Report to source |
| Permission denied | Write fails with EACCES | STOP, escalate | Fix permissions |
| Mount failure | Network mount timeout | Fall back to local-only mode | Check mount health |

Error Log Structure

# Per-download error log: .curator/sessions/<session-id>/logs/<download-id>.log
[2026-02-14 14:35:22] Starting download: https://youtube.com/watch?v=abc123
[2026-02-14 14:35:23] Format selected: bestvideo[height<=1080]+bestaudio
[2026-02-14 14:37:45] ERROR: HTTP Error 429: Too Many Requests
[2026-02-14 14:37:45] Classified as: rate_limited
[2026-02-14 14:37:45] Applying retry strategy: wait 60s (attempt 1/2)
[2026-02-14 14:38:45] Retrying download...
[2026-02-14 14:42:10] Download completed: concert-1.mkv (1.2GB)
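`classify_error`, referenced from `download_with_retry`, is also left undefined. A minimal sketch that greps the log for the signatures in the table above; the exact message strings are assumptions about common yt-dlp/OS output, so treat the patterns as illustrative:

```shell
# Map the contents of a download log to one of the error classes above.
classify_error() {
  local log_file="$1"
  if grep -qi "No space left on device" "$log_file"; then
    echo "disk_full"
  elif grep -qi "Permission denied" "$log_file"; then
    echo "permission_denied"
  elif grep -qi "HTTP Error 429" "$log_file"; then
    echo "rate_limited"
  elif grep -qiE "timed? out|timeout" "$log_file"; then
    echo "network_timeout"
  elif grep -qi "Requested format is not available" "$log_file"; then
    echo "format_unavailable"
  else
    echo "unknown"
  fi
}
```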

Session Failure Recovery

# Detect stale sessions and offer recovery
detect_incomplete_sessions() {
  find .curator/sessions -name "state.json" -mtime -7 | while read -r state_file; do
    local status=$(jq -r '.status' "$state_file")
    local session_id=$(jq -r '.session_id' "$state_file")

    if [[ "$status" == "in_progress" ]]; then
      local completed=$(jq '[.downloads[] | select(.status == "completed")] | length' "$state_file")
      local total=$(jq '.downloads | length' "$state_file")

      echo "Incomplete session detected: $session_id ($completed/$total completed)"
      echo "Resume with: /acquire --resume $session_id"
    fi
  done
}
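Resuming (the `--resume` flow) then amounts to re-reading `state.json` and re-queuing anything not completed. A jq sketch, following the same field layout used above:

```shell
# Given a session id, list the URLs that still need downloading
# (failed, or left in_progress when the session died).
pending_urls() {
  local state_file=".curator/sessions/$1/state.json"
  jq -r '.downloads[] | select(.status != "completed") | .url' "$state_file"
}
```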

Report Generation

Final Session Report

generate_acquisition_report() {
  local session_dir="$1"
  local state_file="$session_dir/state.json"
  local report_file="$session_dir/report.md"

  cat > "$report_file" <<EOF
# Acquisition Session Report

**Session ID**: $(jq -r '.session_id' "$state_file")
**Started**: $(jq -r '.started_at' "$state_file")
**Completed**: $(date -Iseconds)
**Duration**: $(calculate_duration "$(jq -r '.started_at' "$state_file")" "$(date -Iseconds)")

## Summary

- **Total Downloads**: $(jq '.downloads | length' "$state_file")
- **Completed**: $(jq '[.downloads[] | select(.status == "completed")] | length' "$state_file")
- **Failed**: $(jq '[.downloads[] | select(.status == "failed")] | length' "$state_file")
- **Total Size**: $(jq '[.downloads[] | select(.status == "completed") | .filesize_bytes] | add | . / 1073741824' "$state_file") GB

## Successful Downloads

$(jq -r '.downloads[] | select(.status == "completed") | "- \(.filename) (\(.filesize_bytes | tonumber / 1048576 | floor)MB)"' "$state_file")

## Failed Downloads

$(jq -r '.downloads[] | select(.status == "failed") | "- \(.url)\n  Error: \(.error)"' "$state_file")

## Parameters

- **Source Plan**: $(jq -r '.parameters.plan // "N/A"' "$state_file")
- **Format Preference**: $(jq -r '.parameters.format' "$state_file")
- **Output Directory**: $(jq -r '.parameters.output' "$state_file")
- **Parallel Downloads**: $(jq -r '.parameters.parallel' "$state_file")

## File Locations

- **Session Directory**: $session_dir
- **Download Logs**: $session_dir/logs/
- **Metadata**: $session_dir/metadata/
- **Verification Results**: $session_dir/verification.json

EOF

  echo "Report generated: $report_file"
  cat "$report_file"
}
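`calculate_duration`, used in the report template above, is not defined in this document. A sketch that assumes GNU `date -d` (coreutils) is available:

```shell
# Human-readable wall-clock delta between two ISO-8601 timestamps.
calculate_duration() {
  local start_epoch end_epoch delta
  start_epoch=$(date -d "$1" +%s)
  end_epoch=$(date -d "$2" +%s)
  delta=$((end_epoch - start_epoch))
  printf '%dh %dm %ds\n' $((delta / 3600)) $((delta % 3600 / 60)) $((delta % 60))
}
```

On BSD/macOS, `date -j -f` would be needed instead of `-d`.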

Metadata Export

# Export metadata for external tools
export_metadata() {
  local session_dir="$1"
  local export_file="$session_dir/metadata-export.json"

  jq '{
    session_id: .session_id,
    started_at: .started_at,
    downloads: [
      .downloads[] | select(.status == "completed") | {
        url: .url,
        filename: .filename,
        format: .format,
        filesize_bytes: .filesize_bytes,
        duration_seconds: .duration_seconds,
        checksum_sha256: .checksum_sha256
      }
    ]
  }' "$session_dir/state.json" > "$export_file"

  echo "Metadata exported: $export_file"
}

Integration with Other Commands

Typical Workflow

# 1. Discover sources
/find-sources --artist "Pink Floyd" --era "1970s" --sources youtube,archive

# 2. Acquire media from discovered sources
/acquire --plan .curator/sources/plan-001.yaml \
  --format video \
  --extract-audio \
  --verify-after \
  --output /mnt/archive/pink-floyd

# 3. Extract metadata (runs automatically or manually)
/extract-metadata --source /mnt/archive/pink-floyd/The_Wall/audio

# 4. Organize final collection
/organize --source /mnt/archive/pink-floyd

Performance Considerations

Network Mount Optimization

  • Download to local first: Always use --local-work when the output is a network mount
  • Batch copy: Single rsync after all downloads complete
  • Verify checksums: Before deleting the local copy
  • Concurrent limit: Reduce --parallel for network mounts (2 is recommended)

Disk Space Management

  • Pre-flight check: Estimate total size from source plan
  • Monitor during: Watch for disk space warnings
  • Cleanup strategy: Remove verified local copies after network copy
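The pre-flight check can be as simple as summing size estimates and comparing against `df`. A sketch that reads one estimated byte count per line on stdin, as a stand-in for extracting a hypothetical `estimated_bytes` field from the source plan:

```shell
# Abort early if the plan's estimated total exceeds free space on the target.
preflight_disk_check() {
  local output_dir="$1" needed=0 size free_kb
  while IFS= read -r size; do
    needed=$((needed + size))
  done
  # Free 1K blocks on the filesystem holding $output_dir (POSIX df -P).
  free_kb=$(df -Pk "$output_dir" | awk 'NR==2 {print $4}')
  if (( needed > free_kb * 1024 )); then
    echo "ERROR: need ${needed} bytes, only $((free_kb * 1024)) free" >&2
    return 1
  fi
  echo "OK: ${needed} bytes needed, $((free_kb * 1024)) free"
}
```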

Network Bandwidth

  • Parallel tuning: More concurrent downloads for high bandwidth
  • Rate limiting: Add --limit-rate to yt-dlp for shared networks
  • Peak hours: Schedule large downloads for off-peak times
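Rate limiting can also be made persistent through yt-dlp's per-user configuration file rather than passed on every invocation; the values below are illustrative:

```
# ~/.config/yt-dlp/config — options here apply to every yt-dlp run
--limit-rate 2M
--retries 3
```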

Security Considerations

  • URL validation: Never execute arbitrary commands from URLs
  • Directory traversal: Sanitize all path components
  • Disk quota: Respect user/system quotas
  • Network access: Only connect to explicitly allowed domains
  • Credential handling: Never log or expose API keys/tokens

References

  • @$AIWG_ROOT/agentic/code/addons/aiwg-utils/rules/human-authorization.md — Seek explicit authorization before irreversible actions (overwrites, deletions)
  • @$AIWG_ROOT/agentic/code/addons/aiwg-utils/rules/research-before-decision.md — Research sources before deciding on acquisition strategy
  • @$AIWG_ROOT/agentic/code/frameworks/media-curator/skills/find-sources/SKILL.md — Source discovery skill used before acquisition
  • @$AIWG_ROOT/agentic/code/frameworks/media-curator/skills/integrity-verification/SKILL.md — Verify downloaded files after acquisition
  • @$AIWG_ROOT/agentic/code/frameworks/media-curator/skills/provenance-tracking/SKILL.md — Track origin and derivation of acquired media