# Aiwg acquire

Download media from discovered sources with format selection and progress tracking.

## Install

**Source** · Clone the upstream repo:

```bash
git clone https://github.com/jmagly/aiwg
```

**Claude Code** · Install into `~/.claude/skills/`:

```bash
T=$(mktemp -d) && git clone --depth=1 https://github.com/jmagly/aiwg "$T" && mkdir -p ~/.claude/skills && cp -r "$T/agentic/code/frameworks/media-curator/skills/acquire" ~/.claude/skills/jmagly-aiwg-acquire-ea91cb && rm -rf "$T"
```

Manifest: `agentic/code/frameworks/media-curator/skills/acquire/SKILL.md`
# /acquire
Download media from discovered sources with intelligent format selection, parallel execution, and comprehensive progress tracking.
## Purpose

The `/acquire` command orchestrates media downloads from various sources (YouTube, Internet Archive, Bandcamp, direct links) using the Acquisition Manager agent. It handles:
- Single URL or batch downloads from source plans
- Automatic format selection based on content type
- Parallel download execution with resource limits
- Real-time progress tracking and status reporting
- Error recovery with retry strategies
- Quality verification and metadata recording
- Network mount awareness for optimal performance
## Parameters

### Required (one of)

`--plan <sources.yaml>`
- Source plan file generated by the `/find-sources` command
- YAML format containing URLs, metadata, and acquisition strategy
- Example: `.curator/sources/plan-001.yaml` (a sketch of the plan format appears after this list)

`--url <URL>`
- Single URL for one-off downloads
- Supports: YouTube, Internet Archive, Bandcamp, SoundCloud, Vimeo, direct links
- Example: `https://youtube.com/watch?v=abc123`

### Optional

`--format <audio|video|best>`
- Format preference for downloads:
  - `audio`: Extract audio only (Opus 128K default)
  - `video`: Best video up to 1080p with audio
  - `best`: Let the agent decide based on content type (default)
- Example: `--format audio`

`--output <directory>`
- Base output directory for downloads
- Default: `./downloads/<timestamp>`
- Structure: `<output>/<artist>/<era>/audio/` and `/video/`
- Example: `--output /mnt/archive/music`

`--parallel <N>`
- Maximum concurrent downloads
- Range: 1-5 (default: 3)
- Lower for slow networks, higher for fast connections
- Example: `--parallel 5`

`--local-work <directory>`
- Local working directory (required for network mount outputs)
- Downloads happen here first, then are copied to the output
- Default: `/tmp/curator-work-$$`
- Example: `--local-work /fast-local-disk/tmp`

`--verify-after`
- Enable post-download verification
- Checks file integrity, size, and media validity
- Adds ~5% overhead but prevents corruption
- Example: `--verify-after`

`--extract-audio`
- Automatically extract audio from downloaded videos
- Creates parallel audio files in the `audio/` directory
- Uses Opus 128K by default
- Example: `--extract-audio`

`--resume <session-id>`
- Resume an interrupted download session
- Loads state from `.curator/sessions/<session-id>/state.json`
- Skips completed downloads, retries failed ones
- Example: `--resume 20260214-143022`
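For orientation, a plan consumed by `--plan` might look like the following. This is a hypothetical sketch: the only field the workflow below actually reads is `.sources[].url`, and the real schema is defined by `/find-sources`.

```yaml
# Hypothetical plan-001.yaml (illustrative; actual schema comes from /find-sources)
sources:
  - url: https://youtube.com/watch?v=abc123
    title: "Live at Pompeii (full concert)"   # assumed field
    artist: "Pink Floyd"                      # assumed field
    era: "1970s"                              # assumed field
    format_hint: video                        # assumed field: audio | video | best
  - url: https://archive.org/details/example-item
    format_hint: audio
```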
## Usage Examples

### Single URL Download

```bash
# Download single YouTube video (audio only)
/acquire --url "https://youtube.com/watch?v=abc123" --format audio

# Download single video with metadata
/acquire --url "https://youtube.com/watch?v=abc123" \
  --format video \
  --output /mnt/archive/concerts \
  --verify-after
```
### Batch Download from Plan

```bash
# Download entire source plan (generated by /find-sources)
/acquire --plan .curator/sources/plan-001.yaml

# Download to network mount (uses local working dir)
/acquire --plan .curator/sources/plan-001.yaml \
  --output /mnt/network-storage/archive \
  --local-work /fast-ssd/curator-work \
  --parallel 3
```
### Audio Extraction Workflow

```bash
# Download videos and auto-extract audio
/acquire --plan .curator/sources/plan-001.yaml \
  --format video \
  --extract-audio \
  --verify-after

# Result structure:
# downloads/artist/era/video/concert.mkv
# downloads/artist/era/audio/concert.opus
```
### Resume Interrupted Session

```bash
# Resume after network failure
/acquire --resume 20260214-143022

# Resume with different output (move completed files)
/acquire --resume 20260214-143022 \
  --output /new/location
```
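The skip-completed behavior of `--resume` is not spelled out in the workflow below; a minimal sketch, assuming the `state.json` layout created in step 1 and a hypothetical `RESUME_ID` variable holding the `--resume` argument, might be:

```bash
# Minimal resume sketch; RESUME_ID is a hypothetical variable, not a documented one
SESSION_DIR=".curator/sessions/$RESUME_ID"
state_file="$SESSION_DIR/state.json"

# Re-queue everything that is not already completed (failed and in_progress alike)
jq -r '.downloads[] | select(.status != "completed") | .url' "$state_file" |
while IFS= read -r url; do
  echo "Re-queueing: $url"
  # hand off to launch_download from step 4 of the workflow
done
```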
## Workflow

### 1. Initialization

```bash
# Validate parameters:
# - Check --plan exists OR --url provided
# - Verify output directory is writable
# - Check required tools (yt-dlp, wget, curl, ffmpeg)
# - Test network mount performance if applicable
# - Create session directory structure

# Session setup
SESSION_ID=$(date +%Y%m%d-%H%M%S)
SESSION_DIR=".curator/sessions/$SESSION_ID"
mkdir -p "$SESSION_DIR"/{logs,metadata}

# Initialize state file
cat > "$SESSION_DIR/state.json" <<EOF
{
  "session_id": "$SESSION_ID",
  "started_at": "$(date -Iseconds)",
  "status": "initializing",
  "parameters": {
    "plan": "$PLAN_FILE",
    "format": "$FORMAT",
    "output": "$OUTPUT_DIR",
    "parallel": $PARALLEL
  },
  "downloads": []
}
EOF
```
### 2. Source Validation

```bash
# If --plan provided, parse YAML
if [[ -n "$PLAN_FILE" ]]; then
  # Extract URLs and metadata
  SOURCES=$(yq eval '.sources[] | .url' "$PLAN_FILE")
  TOTAL_SOURCES=$(echo "$SOURCES" | wc -l)

  # Validate each URL is accessible
  while IFS= read -r url; do
    if yt-dlp --dump-json "$url" >/dev/null 2>&1; then
      echo "VALID: $url"
    else
      echo "WARNING: Cannot access $url"
    fi
  done <<< "$SOURCES"
fi

# If --url provided, validate single URL
if [[ -n "$SINGLE_URL" ]]; then
  if ! yt-dlp --dump-json "$SINGLE_URL" >/dev/null 2>&1; then
    echo "ERROR: Cannot access URL: $SINGLE_URL"
    exit 1
  fi
fi
```
### 3. Directory Structure Creation

```bash
create_acquisition_structure() {
  local output_base="$1"
  local artist="$2"
  local era="$3"

  # Sanitize directory names
  local safe_artist=$(echo "$artist" | sed 's/[^a-zA-Z0-9_-]/_/g')
  local safe_era=$(echo "$era" | sed 's/[^a-zA-Z0-9_-]/_/g')

  # Create directory tree
  local audio_dir="$output_base/$safe_artist/$safe_era/audio"
  local video_dir="$output_base/$safe_artist/$safe_era/video"

  mkdir -p "$audio_dir/.curator"
  mkdir -p "$video_dir/.curator"
  mkdir -p "$output_base/$safe_artist/.curator"

  # Write artist metadata
  cat > "$output_base/$safe_artist/.curator/artist-info.json" <<EOF
{
  "name": "$artist",
  "era": "$era",
  "created_at": "$(date -Iseconds)",
  "session_id": "$SESSION_ID"
}
EOF

  echo "$audio_dir:$video_dir"
}
```
### 4. Launch Downloads

```bash
# Download orchestration with concurrency control
set -o pipefail  # so the `yt-dlp ... | tee` test below reflects yt-dlp's status, not tee's
ACTIVE_DOWNLOADS=()
MAX_CONCURRENT=${PARALLEL:-3}

launch_download() {
  local url="$1"
  local output_dir="$2"
  local format="$3"
  local download_id="dl-$SESSION_ID-$(date +%s)"

  # Wait for slot availability
  while [[ ${#ACTIVE_DOWNLOADS[@]} -ge $MAX_CONCURRENT ]]; do
    check_completed_downloads
    sleep 2
  done

  # Determine target directory (audio vs video)
  local target_dir
  if [[ "$format" == "audio" || "$format" == "bestaudio" ]]; then
    target_dir="$output_dir/audio"
  else
    target_dir="$output_dir/video"
  fi

  # Launch download in background
  (
    download_with_retry "$url" "$target_dir" "$format" "$download_id"
    echo "COMPLETED:$download_id:$?" >> "$SESSION_DIR/completion.log"
  ) &
  local pid=$!
  ACTIVE_DOWNLOADS+=("$download_id:$pid")

  # Update state file
  update_state_file "add" "$download_id" "$url" "in_progress"
}

download_with_retry() {
  local url="$1"
  local target_dir="$2"
  local format="$3"
  local download_id="$4"
  local attempt=0
  local max_attempts=3

  while [[ $attempt -lt $max_attempts ]]; do
    attempt=$((attempt + 1))
    echo "[$download_id] Attempt $attempt/$max_attempts"

    if yt-dlp -f "$format" \
        --output "$target_dir/%(title)s.%(ext)s" \
        --write-info-json \
        --write-thumbnail \
        --no-playlist \
        "$url" \
        2>&1 | tee "$SESSION_DIR/logs/$download_id.log"; then
      update_state_file "complete" "$download_id" "" "completed"
      return 0
    fi

    # Check error type
    local error_type=$(classify_error "$SESSION_DIR/logs/$download_id.log")
    if [[ "$error_type" == "disk_full" || "$error_type" == "permission_denied" ]]; then
      echo "FATAL: $error_type - aborting all downloads"
      pkill -P $$  # Kill all child processes
      exit 2
    fi

    if [[ $attempt -lt $max_attempts ]]; then
      local wait_time=$((5 * attempt))
      echo "Waiting ${wait_time}s before retry..."
      sleep "$wait_time"
    fi
  done

  update_state_file "fail" "$download_id" "" "failed"
  return 1
}

check_completed_downloads() {
  local new_active=()
  for entry in "${ACTIVE_DOWNLOADS[@]}"; do
    local pid="${entry#*:}"
    if kill -0 "$pid" 2>/dev/null; then
      new_active+=("$entry")
    fi
  done
  ACTIVE_DOWNLOADS=("${new_active[@]}")
}
```
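`download_with_retry` calls `classify_error`, which is not defined in this skill. A minimal sketch that maps log contents onto the categories used above and in the Common Errors table below might be (the grep patterns are assumptions about yt-dlp/OS wording):

```bash
# Hypothetical classify_error: pattern-match a download log to an error category
classify_error() {
  local log_file="$1"

  if grep -qi "No space left on device" "$log_file"; then
    echo "disk_full"
  elif grep -qi "Permission denied" "$log_file"; then
    echo "permission_denied"
  elif grep -qi "HTTP Error 429" "$log_file"; then
    echo "rate_limited"
  elif grep -qiE "timed? ?out" "$log_file"; then
    echo "network_timeout"
  elif grep -qi "Requested format is not available" "$log_file"; then
    echo "format_unavailable"
  else
    echo "unknown"
  fi
}
```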
### 5. Track Progress

```bash
# Real-time progress monitoring
monitor_progress() {
  local session_dir="$1"

  while true; do
    # Load current state
    local state_file="$session_dir/state.json"
    local total=$(jq '.downloads | length' "$state_file")
    local completed=$(jq '[.downloads[] | select(.status == "completed")] | length' "$state_file")
    local in_progress=$(jq '[.downloads[] | select(.status == "in_progress")] | length' "$state_file")
    local failed=$(jq '[.downloads[] | select(.status == "failed")] | length' "$state_file")

    # Clear screen and display status
    clear
    cat <<EOF
ACQUISITION PROGRESS
====================
Session: $SESSION_ID
Started: $(jq -r '.started_at' "$state_file")
Status: $completed/$total completed ($failed failed, $in_progress in progress)

Active Downloads:
EOF

    # Show active download details
    jq -r '.downloads[] | select(.status == "in_progress") | "  [\(.progress_percent)%] \(.filename) @ \(.speed_mbps)MB/s (ETA: \(.eta_seconds)s)"' "$state_file"

    # Exit if all done
    if [[ $((completed + failed)) -eq $total ]]; then
      echo ""
      echo "All downloads completed."
      break
    fi

    sleep 5
  done
}

# Run in background
monitor_progress "$SESSION_DIR" &
MONITOR_PID=$!
```
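`update_state_file`, used throughout steps 4 and 5, is likewise not shown. A minimal jq-based sketch consistent with the `add`/`complete`/`fail` calls above might be:

```bash
# Hypothetical update_state_file: rewrite state.json with jq.
# NOTE: the background downloads in step 4 write concurrently, so a real
# implementation would need locking (e.g. flock) around this read-modify-write.
update_state_file() {
  local action="$1" download_id="$2" url="$3" status="$4"
  local state_file="$SESSION_DIR/state.json"

  case "$action" in
    add)
      jq --arg id "$download_id" --arg url "$url" --arg st "$status" \
        '.downloads += [{id: $id, url: $url, status: $st}]' \
        "$state_file" > "$state_file.tmp"
      ;;
    complete|fail)
      jq --arg id "$download_id" --arg st "$status" \
        '(.downloads[] | select(.id == $id)).status = $st' \
        "$state_file" > "$state_file.tmp"
      ;;
  esac
  mv "$state_file.tmp" "$state_file"
}
```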
### 6. Extract Audio (if requested)

```bash
if [[ "$EXTRACT_AUDIO" == "true" ]]; then
  echo "Extracting audio from video files..."

  find "$OUTPUT_DIR" -type d -name "video" | while read -r video_dir; do
    # note: `local` only works inside functions, so plain variables here
    audio_dir="${video_dir%/video}/audio"

    find "$video_dir" -type f \( -name "*.mkv" -o -name "*.mp4" -o -name "*.webm" \) |
    while read -r video_file; do
      base=$(basename "$video_file" | sed 's/\.[^.]*$//')
      audio_file="$audio_dir/${base}.opus"

      echo "  $video_file -> $audio_file"
      ffmpeg -i "$video_file" \
        -vn \
        -acodec libopus \
        -b:a 128K \
        "$audio_file" \
        2>&1 | tee -a "$SESSION_DIR/logs/audio-extraction.log"

      if [[ ${PIPESTATUS[0]} -eq 0 ]]; then
        echo "SUCCESS: $audio_file"
      else
        echo "FAILED: $video_file"
      fi
    done
  done
fi
```
### 7. Verify Downloads (if requested)

```bash
if [[ "$VERIFY_AFTER" == "true" ]]; then
  echo "Verifying downloaded files..."

  verification_results="$SESSION_DIR/verification.json"
  echo '{"verified": [], "failed": []}' > "$verification_results"

  find "$OUTPUT_DIR" -type f \( -name "*.opus" -o -name "*.mkv" -o -name "*.mp4" -o -name "*.flac" \) |
  while read -r file; do
    echo "Verifying: $file"

    # Check file size (GNU stat first, BSD stat as fallback)
    size=$(stat -c%s "$file" 2>/dev/null || stat -f%z "$file")
    if [[ $size -eq 0 ]]; then
      echo "FAILED: Zero-byte file"
      jq ".failed += [\"$file\"]" "$verification_results" > "$verification_results.tmp"
      mv "$verification_results.tmp" "$verification_results"
      continue
    fi

    # Check media integrity with ffmpeg
    if ffmpeg -v error -i "$file" -f null - 2>&1 | grep -q "error"; then
      echo "FAILED: Corrupted media file"
      jq ".failed += [\"$file\"]" "$verification_results" > "$verification_results.tmp"
      mv "$verification_results.tmp" "$verification_results"
      continue
    fi

    echo "VERIFIED: $file ($size bytes)"
    jq ".verified += [\"$file\"]" "$verification_results" > "$verification_results.tmp"
    mv "$verification_results.tmp" "$verification_results"
  done

  verified_count=$(jq '.verified | length' "$verification_results")
  failed_count=$(jq '.failed | length' "$verification_results")
  echo ""
  echo "Verification complete: $verified_count verified, $failed_count failed"
fi
```
### 8. Copy to Network Mount (if applicable)

```bash
if [[ -n "$LOCAL_WORK" && "$OUTPUT_DIR" != "$LOCAL_WORK"* ]]; then
  echo "Copying from local work directory to network mount..."

  # Batch copy with progress
  rsync -av --progress \
    --exclude='.DS_Store' \
    --exclude='Thumbs.db' \
    "$LOCAL_WORK/" "$OUTPUT_DIR/"

  # Verify checksums
  echo "Verifying network copy..."
  (cd "$LOCAL_WORK" && find . -type f -exec sha256sum {} \; | sort) > /tmp/local-checksums
  (cd "$OUTPUT_DIR" && find . -type f -exec sha256sum {} \; | sort) > /tmp/remote-checksums

  if diff /tmp/local-checksums /tmp/remote-checksums; then
    echo "Verification SUCCESS - removing local working copy"
    rm -rf "$LOCAL_WORK"
  else
    echo "WARNING: Checksum mismatch - keeping local copy at $LOCAL_WORK"
  fi
fi
```
## Progress Display Format

Real-time console output during acquisition:

```
ACQUISITION PROGRESS
====================
Session: 20260214-143022
Started: 2026-02-14T14:30:22Z
Status: 5/12 completed (1 failed, 3 in progress)

Active Downloads:
  [67%] concert-1.mkv @ 12.5MB/s (ETA: 245s)
  [23%] album.flac @ 8.2MB/s (ETA: 680s)
  [91%] podcast-ep5.opus @ 2.1MB/s (ETA: 45s)

Recent Completions:
  [✓] documentary.mp4 (1.2GB, 8m 34s)
  [✓] live-set.opus (85MB, 1m 12s)

Recent Failures:
  [✗] unavailable-video.mkv (Video unavailable - marked for alternate source)

Total Downloaded: 4.8GB / ~12.5GB estimated
Average Speed: 8.9MB/s
Estimated Completion: 14:58 (28 minutes remaining)
```
## Error Handling

### Common Errors and Responses
| Error | Detection | Response | User Action Required |
|---|---|---|---|
| URL inaccessible | Pre-download validation fails | Skip URL, mark for manual review | Check URL, update source plan |
| Network timeout | Download stalls for 60s | Retry with exponential backoff (3x) | Check network connection |
| Rate limited | HTTP 429 response | Wait 60s, retry (2x) | Reduce parallel count |
| Disk full | Write fails with ENOSPC | STOP all downloads, escalate | Free disk space |
| Format unavailable | yt-dlp format error | Try fallback formats | Accept lower quality or skip |
| Corrupted download | ffmpeg verification fails | Delete and retry (2x) | Report to source |
| Permission denied | Write fails with EACCES | STOP, escalate | Fix permissions |
| Mount failure | Network mount timeout | Fall back to local-only mode | Check mount health |
### Error Log Structure

```
# Per-download error log: .curator/sessions/<session-id>/logs/<download-id>.log
[2026-02-14 14:35:22] Starting download: https://youtube.com/watch?v=abc123
[2026-02-14 14:35:23] Format selected: bestvideo[height<=1080]+bestaudio
[2026-02-14 14:37:45] ERROR: HTTP Error 429: Too Many Requests
[2026-02-14 14:37:45] Classified as: rate_limited
[2026-02-14 14:37:45] Applying retry strategy: wait 60s (attempt 1/2)
[2026-02-14 14:38:45] Retrying download...
[2026-02-14 14:42:10] Download completed: concert-1.mkv (1.2GB)
```
### Session Failure Recovery

```bash
# Detect stale sessions and offer recovery
detect_incomplete_sessions() {
  find .curator/sessions -name "state.json" -mtime -7 | while read -r state_file; do
    local status=$(jq -r '.status' "$state_file")
    local session_id=$(jq -r '.session_id' "$state_file")

    if [[ "$status" == "in_progress" ]]; then
      local completed=$(jq '[.downloads[] | select(.status == "completed")] | length' "$state_file")
      local total=$(jq '.downloads | length' "$state_file")
      echo "Incomplete session detected: $session_id ($completed/$total completed)"
      echo "Resume with: /acquire --resume $session_id"
    fi
  done
}
```
## Report Generation

### Final Session Report

```bash
generate_acquisition_report() {
  local session_dir="$1"
  local state_file="$session_dir/state.json"
  local report_file="$session_dir/report.md"

  cat > "$report_file" <<EOF
# Acquisition Session Report

**Session ID**: $(jq -r '.session_id' "$state_file")
**Started**: $(jq -r '.started_at' "$state_file")
**Completed**: $(date -Iseconds)
**Duration**: $(calculate_duration "$(jq -r '.started_at' "$state_file")" "$(date -Iseconds)")

## Summary
- **Total Downloads**: $(jq '.downloads | length' "$state_file")
- **Completed**: $(jq '[.downloads[] | select(.status == "completed")] | length' "$state_file")
- **Failed**: $(jq '[.downloads[] | select(.status == "failed")] | length' "$state_file")
- **Total Size**: $(jq '[.downloads[] | select(.status == "completed") | .filesize_bytes] | add | . / 1073741824' "$state_file") GB

## Successful Downloads
$(jq -r '.downloads[] | select(.status == "completed") | "- \(.filename) (\(.filesize_bytes | tonumber / 1048576 | floor)MB)"' "$state_file")

## Failed Downloads
$(jq -r '.downloads[] | select(.status == "failed") | "- \(.url)\n  Error: \(.error)"' "$state_file")

## Parameters
- **Source Plan**: $(jq -r '.parameters.plan // "N/A"' "$state_file")
- **Format Preference**: $(jq -r '.parameters.format' "$state_file")
- **Output Directory**: $(jq -r '.parameters.output' "$state_file")
- **Parallel Downloads**: $(jq -r '.parameters.parallel' "$state_file")

## File Locations
- **Session Directory**: $session_dir
- **Download Logs**: $session_dir/logs/
- **Metadata**: $session_dir/metadata/
- **Verification Results**: $session_dir/verification.json
EOF

  echo "Report generated: $report_file"
  cat "$report_file"
}
```
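The report template calls `calculate_duration`, which is not defined in this skill. A minimal sketch using GNU `date` (BSD `date` would need `-j -f` instead of `-d`):

```bash
# Hypothetical calculate_duration: human-readable delta between two ISO-8601 timestamps
calculate_duration() {
  local start_epoch end_epoch delta
  start_epoch=$(date -d "$1" +%s)   # GNU date parses ISO-8601 with -d
  end_epoch=$(date -d "$2" +%s)
  delta=$((end_epoch - start_epoch))
  printf '%dh %dm %ds' $((delta / 3600)) $((delta % 3600 / 60)) $((delta % 60))
}
```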
### Metadata Export

```bash
# Export metadata for external tools
export_metadata() {
  local session_dir="$1"
  local export_file="$session_dir/metadata-export.json"

  jq '{
    session_id: .session_id,
    started_at: .started_at,
    downloads: [
      .downloads[] | select(.status == "completed") | {
        url: .url,
        filename: .filename,
        format: .format,
        filesize_bytes: .filesize_bytes,
        duration_seconds: .duration_seconds,
        checksum_sha256: .checksum_sha256
      }
    ]
  }' "$session_dir/state.json" > "$export_file"

  echo "Metadata exported: $export_file"
}
```
## Integration with Other Commands

### Typical Workflow

```bash
# 1. Discover sources
/find-sources --artist "Pink Floyd" --era "1970s" --sources youtube,archive

# 2. Acquire media from discovered sources
/acquire --plan .curator/sources/plan-001.yaml \
  --format video \
  --extract-audio \
  --verify-after \
  --output /mnt/archive/pink-floyd

# 3. Extract metadata (runs automatically or manually)
/extract-metadata --source /mnt/archive/pink-floyd/The_Wall/audio

# 4. Organize final collection
/organize --source /mnt/archive/pink-floyd
```
## Performance Considerations

### Network Mount Optimization
- **Download to local first**: Always use `--local-work` when the output is a network mount
- **Batch copy**: Single rsync pass after all downloads complete
- **Verify checksums**: Before deleting the local copy
- **Concurrent limit**: Reduce `--parallel` for network mounts (2 recommended)
### Disk Space Management
- **Pre-flight check**: Estimate total size from the source plan (see the sketch below)
- **Monitor during**: Watch for disk space warnings while downloads run
- **Cleanup strategy**: Remove verified local copies after the network copy completes
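A pre-flight check might compare the plan's estimated total against free space. The `estimated_size_bytes` field is an assumption about the plan schema, and `df --output` is GNU-specific:

```bash
# Sketch: abort early if the plan is estimated to exceed free space
# (estimated_size_bytes is a hypothetical plan field; adjust to the real schema)
estimated=$(yq eval '[.sources[].estimated_size_bytes] | add // 0' "$PLAN_FILE")
available=$(df --output=avail -B1 "$OUTPUT_DIR" | tail -1)

if [[ "$estimated" -gt "$available" ]]; then
  echo "ERROR: plan needs ~$((estimated / 1073741824))GB, only $((available / 1073741824))GB free"
  exit 1
fi
```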
### Network Bandwidth
- **Parallel tuning**: More concurrent downloads for high-bandwidth connections
- **Rate limiting**: Add `--limit-rate` to yt-dlp invocations on shared networks (example below)
- **Peak hours**: Schedule large downloads for off-peak times
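For example, capping a single download at 4 MB/s (`--limit-rate` is a standard yt-dlp flag):

```bash
# Cap per-download bandwidth on a shared network
yt-dlp --limit-rate 4M -f bestaudio "https://youtube.com/watch?v=abc123"
```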
## Security Considerations
- **URL validation**: Never execute arbitrary commands derived from URLs
- **Directory traversal**: Sanitize all path components (see the sketch below)
- **Disk quota**: Respect user and system quotas
- **Network access**: Only connect to explicitly allowed domains
- **Credential handling**: Never log or expose API keys or tokens
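Beyond the character whitelist in workflow step 3, a component-level check can reject traversal outright; a minimal sketch:

```bash
# Sketch: refuse path components that could escape the output tree
is_safe_component() {
  local c="$1"
  # no "..", no slashes, not empty
  [[ "$c" != *..* && "$c" != */* && -n "$c" ]]
}

is_safe_component "../etc" || echo "rejected: ../etc"
```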
## References
- @$AIWG_ROOT/agentic/code/addons/aiwg-utils/rules/human-authorization.md — Seek explicit authorization before irreversible actions (overwrites, deletions)
- @$AIWG_ROOT/agentic/code/addons/aiwg-utils/rules/research-before-decision.md — Research sources before deciding on acquisition strategy
- @$AIWG_ROOT/agentic/code/frameworks/media-curator/skills/find-sources/SKILL.md — Source discovery skill used before acquisition
- @$AIWG_ROOT/agentic/code/frameworks/media-curator/skills/integrity-verification/SKILL.md — Verify downloaded files after acquisition
- @$AIWG_ROOT/agentic/code/frameworks/media-curator/skills/provenance-tracking/SKILL.md — Track origin and derivation of acquired media