Aiwg research-archive

Package research artifacts for long-term archival

install
source · Clone the upstream repo
git clone https://github.com/jmagly/aiwg
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/jmagly/aiwg "$T" && mkdir -p ~/.claude/skills && cp -r "$T/agentic/code/frameworks/research-complete/skills/research-archive" ~/.claude/skills/jmagly-aiwg-research-archive-67236d && rm -rf "$T"
manifest: agentic/code/frameworks/research-complete/skills/research-archive/SKILL.md
source content

Research Archive Command

Package research artifacts for long-term preservation following archival best practices.

Instructions

When invoked, create archival packages:

  1. Identify Artifacts

    • If REF-XXX specified, collect all artifacts for that paper
    • If --all, collect entire research corpus
    • Gather: PDF, finding document, metadata, provenance records, quality assessments
  2. Validate Integrity

    • Verify all checksums against fixity manifest
    • Confirm no file corruption
    • Check metadata completeness
    • Ensure provenance chains are complete
  3. Create Archive Package

    • Package format options:
      • BagIt (default) - Library of Congress standard for digital preservation
      • ZIP - Universal compressed format
      • TAR.GZ - POSIX archival format
    • Include manifest with checksums
    • Add archive metadata (creation date, creator, package contents)
  4. Generate Archival Metadata

    • Dublin Core metadata record
    • PREMIS preservation metadata
    • Research corpus inventory
    • Provenance summary
  5. Verify Package

    • Validate package structure
    • Verify all files present
    • Check checksums
    • Test package extraction
  6. Store and Register

    • Save to
      .aiwg/research/archives/
    • Register in archival index
    • Generate archival report
    • Create retrieval instructions

Arguments

  • [ref-id or --all]
    - Specific paper or entire corpus (required)
  • --format [bagit|zip|tar]
    - Archive format (default: bagit)
  • --output [path]
    - Custom output location (default: .aiwg/research/archives/)
  • --verify
    - Perform integrity verification after creation
  • --compression [none|gzip|bzip2]
    - Compression level (default: gzip)
  • --include-notes
    - Include literature notes in package
  • --metadata-only
    - Create metadata package without PDFs

BagIt Format (Default)

BagIt is the Library of Congress standard for digital preservation:

REF-022-archive/
├── bagit.txt                    # BagIt declaration
├── bag-info.txt                 # Package metadata
├── manifest-sha256.txt          # Checksums for data files
├── tagmanifest-sha256.txt       # Checksums for tag files
└── data/
    ├── REF-022.pdf              # Source paper
    ├── REF-022-autogen.md       # Finding document
    ├── metadata.yaml            # Extracted metadata
    ├── provenance.yaml          # Provenance records
    └── quality-assessment.yaml  # GRADE assessment

Examples

# Archive single paper in BagIt format
/research-archive REF-022

# Archive with verification
/research-archive REF-022 --verify

# Archive entire corpus
/research-archive --all --format bagit

# Archive with custom output
/research-archive REF-022 --output /backup/research-archives/

# Create metadata-only archive for sharing
/research-archive REF-022 --metadata-only --format zip

# Archive multiple papers
/research-archive REF-001 REF-013 REF-022 --format tar

Expected Output

Creating Archive: REF-022
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Step 1: Collecting artifacts
  ✓ Source PDF: .aiwg/research/sources/REF-022.pdf (2.4 MB)
  ✓ Finding: .aiwg/research/findings/REF-022-autogen.md (12 KB)
  ✓ Metadata: Extracted from frontmatter
  ✓ Provenance: .aiwg/research/provenance/records/REF-022-acquisition.yaml
  ✓ Quality: .aiwg/research/quality-assessments/REF-022-assessment.yaml
  ✓ Literature notes: .aiwg/research/literature-notes/REF-022-notes.md

Step 2: Validating integrity
  ✓ PDF checksum verified: a1b2c3d4e5f6...
  ✓ All files present and intact
  ✓ Metadata complete
  ✓ Provenance chain validated

Step 3: Creating BagIt package
  ✓ BagIt structure created
  ✓ Files copied to data/ directory
  ✓ SHA-256 checksums generated
  ✓ bag-info.txt created with metadata
  ✓ Package size: 2.5 MB

Step 4: Generating archival metadata
  ✓ Dublin Core record created
  ✓ PREMIS preservation metadata added
  ✓ Inventory generated: 6 files

Step 5: Verifying package
  ✓ BagIt validation passed
  ✓ All checksums verified
  ✓ Package structure correct
  ✓ Test extraction successful

Step 6: Registering archive
  ✓ Saved to: .aiwg/research/archives/REF-022-archive-20260203.bag
  ✓ Registered in archival index
  ✓ Archival report generated

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Archive created successfully!

Package: .aiwg/research/archives/REF-022-archive-20260203.bag
Format: BagIt (Library of Congress standard)
Size: 2.5 MB
Files: 6

Contents:
  - REF-022.pdf (source paper)
  - REF-022-autogen.md (finding document)
  - metadata.yaml (frontmatter + enrichment)
  - provenance.yaml (acquisition + documentation history)
  - quality-assessment.yaml (GRADE assessment)
  - literature-notes.md (synthesis notes)

Verification: PASSED

Retrieval Instructions:
  1. Extract: bagit.py --validate REF-022-archive-20260203.bag
  2. Restore to corpus: /research-restore REF-022-archive-20260203.bag

Archive Report: .aiwg/research/archives/REF-022-archive-20260203-report.md

Archival Metadata

Each archive includes comprehensive metadata:

# bag-info.txt (BagIt metadata)
Source-Organization: AIWG Research Corpus
Organization-Address: https://github.com/jmagly/aiwg
Contact-Name: AIWG Archival Agent
Contact-Email: research@aiwg.io
External-Description: Research paper archive for REF-022 (AutoGen)
Bagging-Date: 2026-02-03
Bag-Size: 2.5 MB
Payload-Oxum: 2621440.6
External-Identifier: REF-022
Internal-Sender-Identifier: ai-writing-guide/research-corpus
Internal-Sender-Description: AIWG Research Framework

# Dublin Core metadata
dc:title: AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation
dc:creator: Wu, Qingyun; Bansal, Gagan; Zhang, Jieyu; et al.
dc:date: 2023
dc:identifier: 10.48550/arXiv.2308.08155
dc:type: Conference Paper
dc:format: application/pdf
dc:language: en

# PREMIS preservation metadata
premis:objectIdentifier: REF-022
premis:originalName: REF-022.pdf
premis:fixity: sha256:a1b2c3d4e5f6...
premis:dateCreatedByApplication: 2026-02-03T12:00:00Z
premis:preservationLevel: bit-level

Archival Index

All archives are tracked in

.aiwg/research/archives/archive-index.yaml
:

archives:
  - archive_id: REF-022-archive-20260203
    ref_id: REF-022
    created_at: "2026-02-03T14:30:00Z"
    format: bagit
    size_bytes: 2621440
    file_count: 6
    checksum: "sha256:xyz789..."
    location: ".aiwg/research/archives/REF-022-archive-20260203.bag"
    verified: true
    last_verified: "2026-02-03T14:30:15Z"

Bulk Archival

Archive entire corpus:

/research-archive --all

Output:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Archiving Entire Research Corpus
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Found 47 papers in corpus

Progress: [████████████████████] 47/47 (100%)

Summary:
  ✓ 47 papers archived
  ✓ Total size: 142.8 MB
  ✓ All packages verified
  ✓ Archival index updated

Archive Bundle: .aiwg/research/archives/corpus-archive-20260203.tar.gz
Manifest: .aiwg/research/archives/corpus-manifest-20260203.yaml

Individual Archives:
  REF-001-archive-20260203.bag
  REF-002-archive-20260203.bag
  ...
  REF-047-archive-20260203.bag

Restoration

Archived packages can be restored using

/research-restore
:

/research-restore REF-022-archive-20260203.bag

Long-Term Preservation Compliance

Archives follow best practices from:

  • Library of Congress BagIt specification
  • OAIS (Open Archival Information System) reference model
  • Dublin Core metadata standard
  • PREMIS preservation metadata standard

Validation

All archives undergo validation:

  • BagIt specification compliance
  • All files listed in manifest present
  • All checksums match manifest
  • Metadata is complete and valid
  • Package can be extracted successfully
  • Contents can be restored to working corpus

References

  • @$AIWG_ROOT/agentic/code/frameworks/research-complete/agents/archival-agent.md - Archival Agent
  • @$AIWG_ROOT/src/research/services/archival-service.ts - Archival implementation
  • @.aiwg/research/fixity-manifest.json - Checksum tracking
  • @.aiwg/research/archives/README.md - Archival procedures
  • https://tools.ietf.org/html/rfc8493 - BagIt specification