Aiwg verify-archive
Verify archive integrity with self-verifying SHA-256 checksums, generate VERIFY.md, and optionally create W3C PROV provenance
git clone https://github.com/jmagly/aiwg
T=$(mktemp -d) && git clone --depth=1 https://github.com/jmagly/aiwg "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.agents/skills/verify-archive" ~/.claude/skills/jmagly-aiwg-verify-archive && rm -rf "$T"
.agents/skills/verify-archive/SKILL.mdverify-archive
Detect bit rot, tampering, and transfer errors in media archives through cryptographic checksum verification. Maintain provenance records and fixity information for long-term preservation.
Purpose
Media archives degrade over time through:
- Bit rot: Silent data corruption in storage media
- Transfer errors: Corruption during network transfer or backup
- Tampering: Unauthorized modifications
- Media failure: Gradual deterioration of physical storage
This command generates self-verifying checksum manifests, performs integrity verification, and optionally creates W3C PROV provenance records with PREMIS fixity metadata.
Parameters
| Parameter | Required | Description |
|---|---|---|
| Yes | Path to media archive directory |
| No | Generate new CHECKSUMS.sha256 manifest |
| No | Verify existing manifest against files |
| No | Generate PROVENANCE.jsonld with PREMIS fixity |
| No | Regenerate manifest after archive changes |
Mutually exclusive modes: Use
--generate for initial setup, --verify for routine checks, --fix after making changes.
CHECKSUMS.sha256 Format
Self-verifying manifest with cryptographic integrity protection:
# MANIFEST_HASH: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 # Generated: 2026-02-14T18:45:22.387654321Z # Verify with: tail -n +4 CHECKSUMS.sha256 | sha256sum 1a2b3c4d... ./audio/episode-001.opus 5e6f7g8h... ./audio/episode-002.opus 9i0j1k2l... ./video/recording-2026-02-14.mp4 3m4n5o6p... ./playlists/favorites.m3u 7q8r9s0t... ./README.md
Format specification:
- Line 1:
- SHA-256 hash of manifest content (lines 4+)# MANIFEST_HASH: <sha256> - Line 2:
- ISO 8601 UTC timestamp with nanosecond precision# Generated: <timestamp> - Line 3:
- Self-verification command# Verify with: tail -n +4 CHECKSUMS.sha256 | sha256sum - Lines 4+:
- File checksums in sha256sum format<hash> <path>
Self-verification property: Modifying any hash entry invalidates the manifest hash. The manifest is tamper-evident.
Coverage: ALL files in the archive are checksummed:
- Audio files (opus, mp3, flac, m4a, aac)
- Video files (mp4, mkv, webm, avi)
- Images (jpg, png, webp, svg)
- Text files (md, txt, srt, vtt)
- Playlists (m3u, m3u8, pls)
- Scripts and metadata (json, yaml, sh)
Exclusions: Only
CHECKSUMS.sha256 itself is excluded to prevent circular dependency.
Verification Procedures
Quick Verification (Manifest Integrity)
Verify manifest has not been tampered with (sub-second):
cd /path/to/archive # Extract manifest hash from header EXPECTED=$(grep '^# MANIFEST_HASH:' CHECKSUMS.sha256 | awk '{print $3}') # Compute hash of manifest content (lines 4+) ACTUAL=$(tail -n +4 CHECKSUMS.sha256 | sha256sum | awk '{print $1}') # Compare if [ "$EXPECTED" = "$ACTUAL" ]; then echo "✓ Manifest integrity verified" else echo "✗ Manifest has been tampered with" exit 1 fi
Use case: Daily automated checks to detect manifest corruption without verifying all files.
Full Verification (All Files)
Verify all files match their checksums (minutes to hours depending on archive size):
cd /path/to/archive # First verify manifest integrity EXPECTED=$(grep '^# MANIFEST_HASH:' CHECKSUMS.sha256 | awk '{print $3}') ACTUAL=$(tail -n +4 CHECKSUMS.sha256 | sha256sum | awk '{print $1}') if [ "$EXPECTED" != "$ACTUAL" ]; then echo "✗ Manifest integrity check failed - stopping" exit 1 fi echo "✓ Manifest integrity verified" # Then verify all files tail -n +4 CHECKSUMS.sha256 | sha256sum -c
Use case: Weekly/monthly deep verification to detect bit rot.
Quiet Mode (Failures Only)
Show only files that failed verification:
cd /path/to/archive tail -n +4 CHECKSUMS.sha256 | sha256sum -c --quiet
Exit codes:
- All files verified successfully0
- One or more files failed verification1
Use case: Cron jobs and automated monitoring.
Timestamp Standard
All timestamps use ISO 8601 UTC with nanosecond precision:
Format:
YYYY-MM-DDTHH:MM:SS.NNNNNNNNNZ
Example:
2026-02-14T18:45:22.387654321Z
Generation command:
date -u +%Y-%m-%dT%H:%M:%S.%NZ
Requirements:
- Always UTC (trailing
), never local timezoneZ - Nanosecond precision (9 decimal places)
- ISO 8601 compliant
- Monotonic (later timestamps are lexicographically greater)
Workflow
Initial Setup (--generate)
aiwg verify-archive /path/to/archive --generate
Steps:
- Find all files recursively (excluding
)CHECKSUMS.sha256 - Compute SHA-256 hash for each file
- Sort results by path (deterministic order)
- Generate timestamp in ISO 8601 UTC format
- Write checksums to temporary file
- Compute SHA-256 of checksum content
- Add self-verifying header with manifest hash
- Write final
CHECKSUMS.sha256 - Generate
with human-readable instructionsVERIFY.md
Bash implementation:
ARCHIVE_PATH="/path/to/archive" TIMESTAMP=$(date -u +%Y-%m-%dT%H:%M:%S.%NZ) cd "$ARCHIVE_PATH" # Generate checksums (exclude manifest itself) find . -type f ! -name "CHECKSUMS.sha256" -print0 | \ sort -z | \ xargs -0 sha256sum > /tmp/checksums.tmp # Compute manifest hash MANIFEST_HASH=$(sha256sum /tmp/checksums.tmp | awk '{print $1}') # Write final manifest with self-verifying header { echo "# MANIFEST_HASH: $MANIFEST_HASH" echo "# Generated: $TIMESTAMP" echo "# Verify with: tail -n +4 CHECKSUMS.sha256 | sha256sum" cat /tmp/checksums.tmp } > CHECKSUMS.sha256 rm /tmp/checksums.tmp echo "✓ Generated CHECKSUMS.sha256 with $MANIFEST_HASH"
Routine Verification (--verify)
aiwg verify-archive /path/to/archive --verify
Steps:
- Read
from headerMANIFEST_HASH - Compute hash of manifest content (lines 4+)
- Compare hashes (quick integrity check)
- If manifest is intact, verify all files against checksums
- Report any mismatches or missing files
Regeneration After Changes (--fix)
aiwg verify-archive /path/to/archive --fix
Use case: After adding/removing/modifying files in the archive.
Steps:
- Backup existing
toCHECKSUMS.sha256CHECKSUMS.sha256.bak - Regenerate manifest (same as
)--generate - Report changes: added files, removed files, modified files
- Keep backup for comparison
Example output:
Backed up existing manifest to CHECKSUMS.sha256.bak Regenerating checksums... Changes detected: Added: ./audio/episode-003.opus Modified: ./README.md (hash changed) Removed: ./audio/episode-001-draft.opus ✓ Generated new CHECKSUMS.sha256
VERIFY.md Template
Human-readable verification instructions placed in archive root:
# Archive Integrity Verification This archive contains a self-verifying checksum manifest for detecting corruption, tampering, or transfer errors. ## Quick Verification (Manifest Integrity) Verify the manifest has not been tampered with (sub-second): \`\`\`bash cd "$(dirname "$0")" EXPECTED=$(grep '^# MANIFEST_HASH:' CHECKSUMS.sha256 | awk '{print $3}') ACTUAL=$(tail -n +4 CHECKSUMS.sha256 | sha256sum | awk '{print $1}') [ "$EXPECTED" = "$ACTUAL" ] && echo "✓ Manifest integrity verified" || echo "✗ Manifest tampered" \`\`\` ## Full Verification (All Files) Verify all files match their checksums: \`\`\`bash cd "$(dirname "$0")" tail -n +4 CHECKSUMS.sha256 | sha256sum -c \`\`\` ## Archive Information - **Generated**: {TIMESTAMP} - **File count**: {FILE_COUNT} - **Total size**: {TOTAL_SIZE} - **Manifest hash**: {MANIFEST_HASH} ## Recommended Verification Schedule - **Daily**: Quick manifest integrity check - **Weekly**: Full file verification - **After transfer**: Full verification immediately after copying/downloading - **Before backup**: Verify source integrity before creating backup ## If Verification Fails 1. **Manifest integrity failure**: Manifest has been tampered with or corrupted. Regenerate from source. 2. **File verification failure**: File has been corrupted or modified. Restore from backup or regenerate. ## Regeneration After making changes to the archive, regenerate the manifest: \`\`\`bash aiwg verify-archive . --fix \`\`\` --- *Self-verifying checksums using SHA-256 (NIST FIPS 180-4)*
Template variables:
- ISO 8601 generation timestamp{TIMESTAMP}
- Number of files in manifest{FILE_COUNT}
- Total archive size in human-readable format{TOTAL_SIZE}
- SHA-256 hash of manifest content{MANIFEST_HASH}
PROVENANCE.jsonld Integration
W3C PROV-O format with PREMIS fixity metadata:
{ "@context": { "prov": "http://www.w3.org/ns/prov#", "premis": "http://www.loc.gov/premis/rdf/v3/", "schema": "http://schema.org/", "xsd": "http://www.w3.org/2001/XMLSchema#" }, "@graph": [ { "@id": "urn:uuid:ARCHIVE_UUID", "@type": ["prov:Entity", "premis:IntellectualEntity"], "prov:generatedAtTime": { "@type": "xsd:dateTime", "@value": "2026-02-14T18:45:22.387654321Z" }, "premis:hasFixity": { "@type": "premis:Fixity", "premis:hasMessageDigest": "sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855", "premis:hasMessageDigestAlgorithm": "SHA-256", "premis:hasMessageDigestOriginator": "aiwg verify-archive v2026.2.6", "premis:fixityCheckDateTime": { "@type": "xsd:dateTime", "@value": "2026-02-14T18:45:22.387654321Z" } }, "schema:contentUrl": "./CHECKSUMS.sha256" }, { "@id": "urn:uuid:ACTIVITY_UUID", "@type": "prov:Activity", "prov:startedAtTime": { "@type": "xsd:dateTime", "@value": "2026-02-14T18:45:20.123456789Z" }, "prov:endedAtTime": { "@type": "xsd:dateTime", "@value": "2026-02-14T18:45:22.387654321Z" }, "prov:wasAssociatedWith": { "@id": "urn:aiwg:agent:media-curator" } }, { "@id": "urn:aiwg:agent:media-curator", "@type": "prov:SoftwareAgent", "schema:name": "AIWG Media Curator", "schema:softwareVersion": "2026.2.6" } ] }
Key relationships:
- Links entity to fixity informationpremis:hasFixity
- SHA-256 hash withpremis:hasMessageDigest
prefixsha256:
- When verification was performedpremis:fixityCheckDateTime
- Agent that performed verificationprov:wasAssociatedWith
Use case: Long-term digital preservation requiring audit trails and provenance chains.
Standards Compliance
| Standard | Purpose | Reference |
|---|---|---|
| SHA-256 | Cryptographic hash function | NIST FIPS 180-4 |
| ISO 8601 | Timestamp format | ISO 8601:2019 |
| PREMIS 3.0 | Preservation metadata | Library of Congress |
| W3C PROV-O | Provenance ontology | W3C Recommendation 2013 |
| JSON-LD 1.1 | Linked data format | W3C Recommendation 2020 |
Error Handling
Missing archive: Exit with error if archive path does not exist.
No CHECKSUMS.sha256: For
--verify, exit with error. For --generate, create new manifest.
Hash mismatch: Report which files failed, exit code 1.
Permission errors: Report files that cannot be read, continue verification for accessible files.
Corrupt manifest: If manifest cannot be parsed, exit with error and suggest regeneration.
Deliverables
When running
--generate or --fix:
- CHECKSUMS.sha256 - Self-verifying checksum manifest
- VERIFY.md - Human-readable verification instructions
When running
--provenance:
- PROVENANCE.jsonld - W3C PROV-O provenance record with PREMIS fixity
Examples
Initial setup:
aiwg verify-archive ~/media/podcast-archive --generate
Routine verification:
aiwg verify-archive ~/media/podcast-archive --verify
After adding new episodes:
aiwg verify-archive ~/media/podcast-archive --fix
Generate with provenance:
aiwg verify-archive ~/media/podcast-archive --generate --provenance
Automated monitoring (cron):
0 2 * * * aiwg verify-archive /media/archives/podcast --verify --quiet || echo "Archive verification failed" | mail -s "Alert" admin@example.com
References
- @$AIWG_ROOT/agentic/code/addons/aiwg-utils/rules/human-authorization.md — Seek explicit authorization before regenerating or fixing checksums (--generate, --fix)
- @$AIWG_ROOT/agentic/code/frameworks/media-curator/skills/integrity-verification/SKILL.md — Core integrity verification patterns used by this skill
- @$AIWG_ROOT/agentic/code/frameworks/media-curator/skills/provenance-tracking/SKILL.md — Provenance records generated via --provenance flag
- @$AIWG_ROOT/agentic/code/frameworks/media-curator/skills/curate/SKILL.md — Orchestration skill that invokes verify-archive as a curation phase