Aiwg Integrity Verification
SHA-256 checksum manifest generation, self-verification, and PREMIS fixity patterns
git clone https://github.com/jmagly/aiwg
T=$(mktemp -d) && git clone --depth=1 https://github.com/jmagly/aiwg "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.agents/skills/Integrity Verification" ~/.claude/skills/jmagly-aiwg-integrity-verification && rm -rf "$T"
.agents/skills/Integrity Verification/SKILL.mdIntegrity Verification
Cryptographic checksum verification patterns for detecting bit rot, tampering, and transfer errors in media archives. Implements self-verifying manifests with PREMIS fixity metadata.
Manifest Generation Script
Complete bash implementation for generating self-verifying checksum manifests:
#!/bin/bash set -euo pipefail # Archive Checksum Manifest Generator # Generates self-verifying SHA-256 checksum manifest # Usage: ./generate-checksums.sh /path/to/archive ARCHIVE_PATH="${1:-.}" CHECKSUM_FILE="CHECKSUMS.sha256" TEMP_FILE="/tmp/checksums-$$.tmp" # Validate archive exists if [ ! -d "$ARCHIVE_PATH" ]; then echo "Error: Archive directory not found: $ARCHIVE_PATH" >&2 exit 1 fi cd "$ARCHIVE_PATH" echo "Generating checksums for: $ARCHIVE_PATH" # Find all files, exclude checksum manifest itself # Use null-terminated strings for handling filenames with spaces find . -type f ! -name "$CHECKSUM_FILE" -print0 | \ sort -z | \ xargs -0 sha256sum > "$TEMP_FILE" # Count files FILE_COUNT=$(wc -l < "$TEMP_FILE") echo "Found $FILE_COUNT files" # Compute manifest hash (hash of the checksum content) MANIFEST_HASH=$(sha256sum "$TEMP_FILE" | awk '{print $1}') # Generate timestamp (ISO 8601 UTC with nanosecond precision) TIMESTAMP=$(date -u +%Y-%m-%dT%H:%M:%S.%NZ) # Write final manifest with self-verifying header { echo "# MANIFEST_HASH: $MANIFEST_HASH" echo "# Generated: $TIMESTAMP" echo "# Verify with: tail -n +4 $CHECKSUM_FILE | sha256sum" cat "$TEMP_FILE" } > "$CHECKSUM_FILE" # Clean up rm "$TEMP_FILE" echo "✓ Generated $CHECKSUM_FILE" echo " Manifest hash: $MANIFEST_HASH" echo " Timestamp: $TIMESTAMP" echo " Files: $FILE_COUNT"
Key features:
- Handles filenames with spaces via null-terminated strings (
,-print0
,-z
)-0 - Deterministic output (sorted by path)
- Self-verifying header with manifest hash
- ISO 8601 UTC timestamps with nanosecond precision
- Exit on error (
)set -euo pipefail
Verification Commands
Quick Manifest Integrity Check
Verify manifest has not been tampered with (sub-second):
#!/bin/bash # Quick verification - manifest integrity only CHECKSUM_FILE="CHECKSUMS.sha256" if [ ! -f "$CHECKSUM_FILE" ]; then echo "✗ Checksum manifest not found: $CHECKSUM_FILE" >&2 exit 1 fi # Extract expected hash from header EXPECTED=$(grep '^# MANIFEST_HASH:' "$CHECKSUM_FILE" | awk '{print $3}') if [ -z "$EXPECTED" ]; then echo "✗ Manifest header missing or malformed" >&2 exit 1 fi # Compute actual hash of manifest content (lines 4+) ACTUAL=$(tail -n +4 "$CHECKSUM_FILE" | sha256sum | awk '{print $1}') # Compare if [ "$EXPECTED" = "$ACTUAL" ]; then echo "✓ Manifest integrity verified" echo " Hash: $EXPECTED" exit 0 else echo "✗ Manifest has been tampered with" >&2 echo " Expected: $EXPECTED" >&2 echo " Actual: $ACTUAL" >&2 exit 1 fi
Use case: Daily automated checks. Fast execution regardless of archive size.
Exit codes:
- Manifest integrity verified0
- Manifest corrupted or tampered1
Full File Verification
Verify all files match their checksums:
#!/bin/bash # Full verification - manifest integrity + all files CHECKSUM_FILE="CHECKSUMS.sha256" # Step 1: Verify manifest integrity echo "Step 1: Verifying manifest integrity..." EXPECTED=$(grep '^# MANIFEST_HASH:' "$CHECKSUM_FILE" | awk '{print $3}') ACTUAL=$(tail -n +4 "$CHECKSUM_FILE" | sha256sum | awk '{print $1}') if [ "$EXPECTED" != "$ACTUAL" ]; then echo "✗ Manifest integrity check failed - stopping" >&2 exit 1 fi echo "✓ Manifest integrity verified" # Step 2: Verify all files echo "Step 2: Verifying all files..." if tail -n +4 "$CHECKSUM_FILE" | sha256sum -c; then echo "✓ All files verified successfully" exit 0 else echo "✗ One or more files failed verification" >&2 exit 1 fi
Output format (from
sha256sum -c):
./audio/episode-001.opus: OK ./audio/episode-002.opus: OK ./video/recording.mp4: FAILED sha256sum: WARNING: 1 computed checksum did NOT match
Exit codes:
- All files verified successfully0
- Verification failed (manifest or files)1
Quiet Mode
Show only failures:
#!/bin/bash # Quiet verification - only show failures CHECKSUM_FILE="CHECKSUMS.sha256" # Quick manifest check (silent) EXPECTED=$(grep '^# MANIFEST_HASH:' "$CHECKSUM_FILE" | awk '{print $3}') ACTUAL=$(tail -n +4 "$CHECKSUM_FILE" | sha256sum | awk '{print $1}') if [ "$EXPECTED" != "$ACTUAL" ]; then echo "MANIFEST: FAILED" >&2 exit 1 fi # Verify files (quiet mode - only show failures) tail -n +4 "$CHECKSUM_FILE" | sha256sum -c --quiet EXIT_CODE=$? if [ $EXIT_CODE -eq 0 ]; then # Silent success exit 0 else # sha256sum already printed failures to stderr exit 1 fi
Use case: Cron jobs, automated monitoring, CI/CD pipelines.
Output: Nothing on success, only failed files on failure.
Regeneration After Changes
Script for regenerating manifest after archive modifications:
#!/bin/bash set -euo pipefail # Regenerate checksum manifest after archive changes # Usage: ./fix-checksums.sh /path/to/archive ARCHIVE_PATH="${1:-.}" CHECKSUM_FILE="CHECKSUMS.sha256" BACKUP_FILE="CHECKSUMS.sha256.bak" TEMP_FILE="/tmp/checksums-$$.tmp" cd "$ARCHIVE_PATH" # Backup existing manifest if [ -f "$CHECKSUM_FILE" ]; then cp "$CHECKSUM_FILE" "$BACKUP_FILE" echo "Backed up existing manifest to $BACKUP_FILE" fi # Generate new manifest echo "Regenerating checksums..." find . -type f ! -name "$CHECKSUM_FILE" ! -name "$BACKUP_FILE" -print0 | \ sort -z | \ xargs -0 sha256sum > "$TEMP_FILE" FILE_COUNT=$(wc -l < "$TEMP_FILE") MANIFEST_HASH=$(sha256sum "$TEMP_FILE" | awk '{print $1}') TIMESTAMP=$(date -u +%Y-%m-%dT%H:%M:%S.%NZ) { echo "# MANIFEST_HASH: $MANIFEST_HASH" echo "# Generated: $TIMESTAMP" echo "# Verify with: tail -n +4 $CHECKSUM_FILE | sha256sum" cat "$TEMP_FILE" } > "$CHECKSUM_FILE" rm "$TEMP_FILE" # Detect changes if [ -f "$BACKUP_FILE" ]; then echo "" echo "Changes detected:" # Extract file paths from old and new manifests tail -n +4 "$BACKUP_FILE" | awk '{print $2}' | sort > /tmp/old-files-$$.txt tail -n +4 "$CHECKSUM_FILE" | awk '{print $2}' | sort > /tmp/new-files-$$.txt # Added files ADDED=$(comm -13 /tmp/old-files-$$.txt /tmp/new-files-$$.txt) if [ -n "$ADDED" ]; then echo " Added:" echo "$ADDED" | sed 's/^/ /' fi # Removed files REMOVED=$(comm -23 /tmp/old-files-$$.txt /tmp/new-files-$$.txt) if [ -n "$REMOVED" ]; then echo " Removed:" echo "$REMOVED" | sed 's/^/ /' fi # Modified files (different hash for same path) # This requires comparing hashes, not just paths COMMON_FILES=$(comm -12 /tmp/old-files-$$.txt /tmp/new-files-$$.txt) if [ -n "$COMMON_FILES" ]; then while IFS= read -r file; do OLD_HASH=$(grep -F "$file" "$BACKUP_FILE" | awk '{print $1}') NEW_HASH=$(grep -F "$file" "$CHECKSUM_FILE" | awk '{print $1}') if [ "$OLD_HASH" != "$NEW_HASH" ]; then echo " Modified: $file" fi done <<< "$COMMON_FILES" fi rm /tmp/old-files-$$.txt /tmp/new-files-$$.txt fi echo "" echo "✓ Generated new $CHECKSUM_FILE" echo " Manifest hash: $MANIFEST_HASH" echo " Files: $FILE_COUNT"
Features:
- Backs up existing manifest to
file.bak - Regenerates checksums for all current files
- Reports added, removed, and modified files
- Preserves backup for comparison
VERIFY.md Template
Human-readable instructions placed in archive root:
# Archive Integrity Verification This archive contains a self-verifying checksum manifest (`CHECKSUMS.sha256`) for detecting corruption, tampering, or transfer errors. ## Archive Information - **Generated**: {TIMESTAMP} - **Total files**: {FILE_COUNT} - **Total size**: {TOTAL_SIZE} - **Manifest hash**: {MANIFEST_HASH} ## Quick Verification (30 seconds) Verify the manifest has not been tampered with: \`\`\`bash EXPECTED=$(grep '^# MANIFEST_HASH:' CHECKSUMS.sha256 | awk '{print $3}') ACTUAL=$(tail -n +4 CHECKSUMS.sha256 | sha256sum | awk '{print $1}') [ "$EXPECTED" = "$ACTUAL" ] && echo "✓ Verified" || echo "✗ Tampered" \`\`\` ## Full Verification (10-60 minutes) Verify all files match their checksums: \`\`\`bash tail -n +4 CHECKSUMS.sha256 | sha256sum -c \`\`\` ## Recommended Schedule | Frequency | Verification Type | Purpose | |-----------|-------------------|---------| | Daily | Quick (manifest only) | Detect tampering | | Weekly | Full (all files) | Detect bit rot | | After transfer | Full | Verify transfer integrity | | Before backup | Full | Ensure source integrity | ## Automated Monitoring Add to crontab for daily verification: \`\`\`cron # Daily quick check at 2am 0 2 * * * cd /path/to/archive && tail -n +4 CHECKSUMS.sha256 | sha256sum -c --quiet || echo "Verification failed" | mail -s "Archive Alert" admin@example.com \`\`\` ## If Verification Fails ### Manifest Integrity Failure The manifest itself has been corrupted or tampered with. **Recovery**: 1. Restore `CHECKSUMS.sha256` from backup 2. If no backup, regenerate manifest (see below) ### File Verification Failure One or more files have been corrupted or modified. **Identify failures**: \`\`\`bash tail -n +4 CHECKSUMS.sha256 | sha256sum -c 2>&1 | grep FAILED \`\`\` **Recovery**: 1. Restore failed files from backup 2. If intentional modification, regenerate manifest ## Regenerate Manifest After making changes to the archive (add/remove/modify files): \`\`\`bash # Backup existing manifest cp CHECKSUMS.sha256 CHECKSUMS.sha256.bak # Regenerate find . -type f ! -name "CHECKSUMS.sha256" -print0 | \\ sort -z | \\ xargs -0 sha256sum > /tmp/checksums.tmp MANIFEST_HASH=$(sha256sum /tmp/checksums.tmp | awk '{print $1}') TIMESTAMP=$(date -u +%Y-%m-%dT%H:%M:%S.%NZ) { echo "# MANIFEST_HASH: $MANIFEST_HASH" echo "# Generated: $TIMESTAMP" echo "# Verify with: tail -n +4 CHECKSUMS.sha256 | sha256sum" cat /tmp/checksums.tmp } > CHECKSUMS.sha256 rm /tmp/checksums.tmp \`\`\` ## Technical Details - **Hash algorithm**: SHA-256 (NIST FIPS 180-4) - **Timestamp format**: ISO 8601 UTC with nanosecond precision - **Self-verification**: Manifest hash prevents undetected tampering - **Coverage**: All files except `CHECKSUMS.sha256` itself ## Support For questions or issues: - Documentation: https://aiwg.io/media-curator - Issues: https://github.com/jmagly/aiwg/issues - Command reference: `aiwg verify-archive --help` --- *Generated by AIWG Media Curator v{VERSION}*
Template variables:
- ISO 8601 generation timestamp{TIMESTAMP}
- Number of files in manifest{FILE_COUNT}
- Archive size (e.g., "4.2 GB"){TOTAL_SIZE}
- SHA-256 hash of manifest{MANIFEST_HASH}
- AIWG version{VERSION}
Scheduled Verification
Cron Pattern for Daily Checks
# Quick verification daily at 2am 0 2 * * * cd /media/archives/podcast && /usr/local/bin/aiwg verify-archive . --verify --quiet || echo "Archive verification failed: $(pwd)" | mail -s "Archive Alert" admin@example.com # Full verification weekly on Sunday at 3am 0 3 * * 0 cd /media/archives/podcast && /usr/local/bin/aiwg verify-archive . --verify 2>&1 | mail -s "Weekly Archive Verification" admin@example.com
Systemd Timer
Service file (
/etc/systemd/system/archive-verify.service):
[Unit] Description=Verify media archive checksums After=network.target [Service] Type=oneshot User=media WorkingDirectory=/media/archives/podcast ExecStart=/usr/local/bin/aiwg verify-archive . --verify --quiet StandardOutput=journal StandardError=journal
Timer file (
/etc/systemd/system/archive-verify.timer):
[Unit] Description=Daily archive verification [Timer] OnCalendar=daily Persistent=true [Install] WantedBy=timers.target
Enable timer:
sudo systemctl enable archive-verify.timer sudo systemctl start archive-verify.timer
Monitoring Integration
Prometheus exporter pattern:
#!/bin/bash # Export verification status as Prometheus metrics CHECKSUM_FILE="CHECKSUMS.sha256" METRICS_FILE="/var/lib/prometheus/node_exporter/archive_integrity.prom" # Quick verification EXPECTED=$(grep '^# MANIFEST_HASH:' "$CHECKSUM_FILE" | awk '{print $3}') ACTUAL=$(tail -n +4 "$CHECKSUM_FILE" | sha256sum | awk '{print $1}') if [ "$EXPECTED" = "$ACTUAL" ]; then MANIFEST_OK=1 else MANIFEST_OK=0 fi # Write metrics cat > "$METRICS_FILE" <<EOF # HELP archive_manifest_integrity Archive manifest integrity status (1=ok, 0=failed) # TYPE archive_manifest_integrity gauge archive_manifest_integrity{path="$PWD"} $MANIFEST_OK # HELP archive_manifest_check_timestamp Unix timestamp of last verification # TYPE archive_manifest_check_timestamp gauge archive_manifest_check_timestamp{path="$PWD"} $(date +%s) EOF
Grafana alert rule:
groups: - name: archive_integrity interval: 5m rules: - alert: ArchiveManifestCorrupted expr: archive_manifest_integrity == 0 for: 5m labels: severity: critical annotations: summary: "Archive manifest integrity check failed" description: "Archive at {{ $labels.path }} has corrupted or tampered manifest"
Standards Reference
| Standard | Specification | Purpose |
|---|---|---|
| SHA-256 | NIST FIPS 180-4 | Cryptographic hash function for file integrity |
| PREMIS 3.0 | Library of Congress | Preservation metadata for digital objects |
| W3C PROV-O | W3C Recommendation 2013 | Provenance ontology for tracking derivation |
| ISO 8601 | ISO 8601:2019 | Date and time format (UTC timestamps) |
| JSON-LD 1.1 | W3C Recommendation 2020 | Linked data format for provenance records |
SHA-256 (NIST FIPS 180-4)
Properties:
- 256-bit hash output (64 hexadecimal characters)
- Collision resistance: 2^128 operations
- Pre-image resistance: 2^256 operations
- Deterministic: same input always produces same output
Command:
sha256sum <file>
Format:
<64-char-hex> <path>
PREMIS 3.0 Fixity
Purpose: Record fixity information for digital preservation.
Key elements:
- Hash value with algorithm prefix (e.g.,messageDigest
)sha256:abc123...
- Algorithm name (messageDigestAlgorithm
)SHA-256
- Software that computed hashmessageDigestOriginator
- When fixity was verifiedfixityCheckDateTime
Use case: Long-term digital preservation requiring audit trails.
W3C PROV-O
Entity-Activity-Agent model:
- Entity: Archive or checksum manifest
- Activity: Checksum generation or verification
- Agent: Software (AIWG Media Curator) that performed activity
Key relationships:
- Entity generated by activitywasGeneratedBy
- Activity used entityused
- Activity performed by agentwasAssociatedWith
- Entity attributed to agentwasAttributedTo
ISO 8601 Timestamps
Format:
YYYY-MM-DDTHH:MM:SS.NNNNNNNNNZ
Requirements:
- Always UTC (trailing
)Z - Nanosecond precision (9 decimal places)
- Monotonic (lexicographically sortable)
Bash command:
date -u +%Y-%m-%dT%H:%M:%S.%NZ
Example:
2026-02-14T18:45:22.387654321Z
Performance Characteristics
| Operation | Time Complexity | Example Duration (100 GB archive) |
|---|---|---|
| Quick verify (manifest only) | O(1) | < 1 second |
| Full verify (all files) | O(n × file_size) | 10-60 minutes (disk-bound) |
| Generate manifest | O(n × file_size) | 10-60 minutes (disk-bound) |
Optimization tips:
- Use SSD storage for faster checksumming
- Enable parallel hashing for multi-core systems (GNU parallel)
- Quick verification detects 99% of tampering instantly
- Schedule full verification during off-peak hours
Error Scenarios
| Error | Cause | Recovery |
|---|---|---|
| Manifest integrity failure | Manifest file corrupted/tampered | Restore from backup or regenerate |
| File verification failure | File corrupted or modified | Restore file from backup |
| Missing manifest | New archive or manifest deleted | Generate new manifest |
| Permission denied | Cannot read files | Fix permissions, run as appropriate user |
| Disk full | Cannot write manifest | Free disk space |
| Hash mismatch | File changed since manifest generated | Regenerate manifest if intentional |
Integration Examples
Git Pre-Commit Hook
Verify archive integrity before committing:
#!/bin/bash # .git/hooks/pre-commit if [ -f CHECKSUMS.sha256 ]; then echo "Verifying archive integrity..." if ! aiwg verify-archive . --verify --quiet; then echo "Error: Archive verification failed" >&2 echo "Run 'aiwg verify-archive . --fix' to regenerate checksums" >&2 exit 1 fi echo "✓ Archive integrity verified" fi
CI/CD Pipeline
GitHub Actions workflow:
name: Verify Archive Integrity on: [push, pull_request] jobs: verify: runs-on: ubuntu-latest steps: - uses: actions/checkout@v3 - name: Install AIWG run: npm install -g aiwg - name: Verify archive run: aiwg verify-archive media/archives/podcast --verify
Backup Verification
Verify backup integrity after rsync:
#!/bin/bash # Backup and verify script SOURCE="/media/source/podcast" DEST="/media/backup/podcast" # Sync files rsync -av --delete "$SOURCE/" "$DEST/" # Verify destination cd "$DEST" if aiwg verify-archive . --verify --quiet; then echo "✓ Backup verified successfully" else echo "✗ Backup verification failed" >&2 exit 1 fi
References
- @$AIWG_ROOT/agentic/code/addons/aiwg-utils/rules/human-authorization.md — Seek explicit authorization before regenerating checksums (overwrites existing CHECKSUMS.sha256)
- @$AIWG_ROOT/agentic/code/frameworks/media-curator/skills/acquire/SKILL.md — Acquisition skill that invokes integrity verification via --verify-after flag
- @$AIWG_ROOT/agentic/code/frameworks/media-curator/skills/provenance-tracking/SKILL.md — Provenance tracking that records fixity events alongside checksums
- @$AIWG_ROOT/agentic/code/frameworks/media-curator/skills/verify-archive/SKILL.md — High-level verify-archive skill that uses these integrity patterns