Aiwg Provenance Tracking
W3C PROV-O patterns for tracking media derivation chains and production history
install
source · Clone the upstream repo
git clone https://github.com/jmagly/aiwg
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/jmagly/aiwg "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.agents/skills/Provenance Tracking" ~/.claude/skills/jmagly-aiwg-provenance-tracking && rm -rf "$T"
manifest:
.agents/skills/Provenance Tracking/SKILL.mdsource content
Provenance Tracking
W3C PROV-compliant provenance tracking for media archives using the Entity-Activity-Agent model in JSON-LD format.
PROV Model for Media
Entity (prov:Entity)
A media file — the thing being tracked.
Activity (prov:Activity)
A transformation — download, transcode, tag, extract audio.
Agent (prov:Agent)
A tool or person — yt-dlp, ffmpeg, opustags, the curator.
Derivation Chains
Media files often go through multiple transformations:
YouTube video (Entity) → downloaded by yt-dlp (Activity, Agent: yt-dlp) → raw .mkv file (Entity) → audio extracted by ffmpeg (Activity, Agent: ffmpeg) → raw .opus file (Entity) → tagged by opustags (Activity, Agent: opustags) → final .opus file (Entity)
Each step is recorded with timestamps, tool versions, and parameters.
PROVENANCE.jsonld Template
{ "@context": { "prov": "http://www.w3.org/ns/prov#", "schema": "http://schema.org/", "premis": "http://www.loc.gov/premis/rdf/v3/", "dc": "http://purl.org/dc/terms/", "xsd": "http://www.w3.org/2001/XMLSchema#" }, "@graph": [ { "@id": "urn:archive:entity:collection", "@type": "prov:Collection", "dc:title": "Artist Name - Complete Collection", "prov:generatedAtTime": "2026-01-26T01:33:51.302755851Z", "schema:numberOfItems": 1109, "schema:size": "94GB", "premis:hasFixity": { "@type": "premis:Fixity", "premis:messageDigestAlgorithm": { "@id": "http://id.loc.gov/vocabulary/preservation/cryptographicHashFunctions/sha256" }, "premis:messageDigestOriginator": "sha256sum (GNU coreutils)", "schema:url": "CHECKSUMS.sha256", "schema:numberOfItems": 1109 } }, { "@id": "urn:archive:activity:acquisition", "@type": "prov:Activity", "prov:startedAtTime": "2026-01-24T15:00:00.000000000Z", "prov:endedAtTime": "2026-01-26T01:33:51.302755851Z", "prov:wasAssociatedWith": [ {"@id": "urn:archive:agent:yt-dlp"}, {"@id": "urn:archive:agent:ffmpeg"}, {"@id": "urn:archive:agent:opustags"} ], "prov:generated": {"@id": "urn:archive:entity:collection"} }, { "@id": "urn:archive:agent:yt-dlp", "@type": ["prov:SoftwareAgent", "prov:Agent"], "schema:name": "yt-dlp", "schema:softwareVersion": "2024.12.23", "schema:description": "Video/audio downloader" }, { "@id": "urn:archive:agent:ffmpeg", "@type": ["prov:SoftwareAgent", "prov:Agent"], "schema:name": "ffmpeg", "schema:softwareVersion": "7.1", "schema:description": "Audio/video transcoder" }, { "@id": "urn:archive:agent:opustags", "@type": ["prov:SoftwareAgent", "prov:Agent"], "schema:name": "opustags", "schema:softwareVersion": "1.9.0", "schema:description": "Opus metadata tagger" } ] }
Per-File Provenance
For detailed tracking, record provenance per file:
{ "@id": "urn:archive:entity:car-radio-opus", "@type": "prov:Entity", "dc:title": "Car Radio", "schema:encodingFormat": "audio/opus", "prov:wasDerivedFrom": { "@id": "urn:archive:entity:car-radio-mkv" }, "prov:wasGeneratedBy": { "@id": "urn:archive:activity:extract-audio-001", "@type": "prov:Activity", "prov:used": {"@id": "urn:archive:entity:car-radio-mkv"}, "prov:wasAssociatedWith": {"@id": "urn:archive:agent:ffmpeg"}, "prov:startedAtTime": "2026-01-25T10:15:30.000000000Z", "schema:description": "ffmpeg -i car-radio.mkv -vn -acodec libopus -b:a 128k car-radio.opus" } }
Timestamp Standard
All timestamps MUST use ISO 8601 UTC with nanosecond precision:
Format: YYYY-MM-DDTHH:MM:SS.NNNNNNNNNZ Example: 2026-01-26T01:33:51.302755851Z Command: date -u +%Y-%m-%dT%H:%M:%S.%NZ
- Always UTC (Z suffix), never local timezone
- 9 decimal places (nanosecond precision)
Generation Commands
Get Tool Versions
yt-dlp --version ffmpeg -version | head -1 | awk '{print $3}' opustags --version 2>&1 | head -1
Generate Timestamp
date -u +%Y-%m-%dT%H:%M:%S.%NZ
Count Files
find . -type f ! -name "CHECKSUMS.sha256" ! -name "PROVENANCE.jsonld" | wc -l
Calculate Total Size
du -sh . | cut -f1
Standards Reference
| Standard | Version | Usage |
|---|---|---|
| W3C PROV-O | 2013-04-30 | Provenance ontology |
| PREMIS | 3.0 | Preservation fixity metadata |
| Dublin Core | 1.1 | Descriptive metadata |
| Schema.org | Latest | Structured data |
| JSON-LD | 1.1 | Serialization format |
| ISO 8601 | 2019 | Timestamps |
References
- @$AIWG_ROOT/agentic/code/frameworks/media-curator/skills/integrity-verification/SKILL.md — Integrity verification that generates fixity records tracked by provenance
- @$AIWG_ROOT/agentic/code/frameworks/media-curator/skills/acquire/SKILL.md — Acquisition skill whose download events are recorded as provenance activities
- @$AIWG_ROOT/agentic/code/frameworks/media-curator/skills/audio-extraction/SKILL.md — Audio extraction that produces derivation relationships tracked in provenance
- @$AIWG_ROOT/agentic/code/frameworks/media-curator/skills/verify-archive/SKILL.md — Archive verification skill that can generate provenance records via --provenance flag