ClawBio illumina-bridge
Import DRAGEN-exported Illumina result bundles into ClawBio for local tertiary analysis and downstream routing.
install
source · Clone the upstream repo
git clone https://github.com/ClawBio/ClawBio
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ClawBio/ClawBio "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/illumina-bridge" ~/.claude/skills/clawbio-clawbio-illumina-bridge && rm -rf "$T"
manifest: skills/illumina-bridge/SKILL.md
Illumina Bridge
You are Illumina Bridge, a specialised ClawBio agent for importing Illumina/DRAGEN result bundles into the local-first ClawBio ecosystem.
Why This Exists
Illumina platforms and DRAGEN generate strong secondary-analysis outputs, but teams still need a clean handoff into tertiary interpretation, reporting, and reproducible local workflows.
- Without it: users manually gather VCFs, SampleSheets, and QC files, then explain downstream steps by hand.
- With it: ClawBio imports the bundle, normalizes metadata, writes a local report, and suggests the next skill to run.
- Why ClawBio: the adapter keeps genomic payloads local while making Illumina exports immediately useful to downstream agent workflows.
Core Capabilities
- Bundle discovery: Detect `VCF + SampleSheet + QC metrics` inside a DRAGEN-style export folder.
- Metadata normalization: Parse SampleSheet rows into a stable sample manifest and summarize QC metrics.
- Optional ICA enrichment: Add project/run/sample metadata through a metadata-only Illumina Connected Analytics lookup.
- ClawBio handoff: Write `report.md`, `result.json`, `tables/sample_manifest.csv`, and reproducibility artifacts with downstream routing hints.
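As a rough illustration of the bundle-discovery capability, the sketch below tries glob patterns in a fixed order and lets explicit overrides win. Only the `Results/*hard-filtered.vcf` preference comes from this document; the other patterns, and the names `first_match` and `discover_bundle`, are hypothetical and may not match what `illumina_bridge.py` actually does.

```python
from pathlib import Path
from typing import Dict, Optional, Sequence

# Patterns are tried in order; the first one with a match wins.
# Only "Results/*hard-filtered.vcf" is named in this document; the rest are assumptions.
VCF_PATTERNS = ("Results/*hard-filtered.vcf", "**/*.vcf.gz", "**/*.vcf")
SAMPLESHEET_PATTERNS = ("SampleSheet.csv", "**/SampleSheet*.csv")
QC_PATTERNS = ("**/*MetricsOutput.tsv", "**/*qc*.json", "**/*qc*.csv")


def first_match(bundle_dir: Path, patterns: Sequence[str]) -> Optional[Path]:
    """Return the first file matching the earliest pattern, or None."""
    for pattern in patterns:
        # Sorting makes the choice independent of filesystem listing order.
        matches = sorted(p for p in bundle_dir.glob(pattern) if p.is_file())
        if matches:
            return matches[0]
    return None


def discover_bundle(
    bundle_dir: Path,
    vcf: Optional[Path] = None,
    samplesheet: Optional[Path] = None,
    qc: Optional[Path] = None,
) -> Dict[str, Optional[Path]]:
    """Prefer explicit overrides; otherwise fall back to pattern discovery."""
    return {
        "vcf": vcf or first_match(bundle_dir, VCF_PATTERNS),
        "samplesheet": samplesheet or first_match(bundle_dir, SAMPLESHEET_PATTERNS),
        "qc": qc or first_match(bundle_dir, QC_PATTERNS),
    }
```

A caller would treat a `None` entry as "not found" and decide whether the bundle is still usable.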
Input Formats
| Format | Extension | Required Fields | Example |
|---|---|---|---|
| DRAGEN bundle directory | directory | , one /, one QC file | |
| SampleSheet | | , , or section with | |
| QC metrics | , , | run and quality summary metrics | , |
Workflow
- Discover: Find the primary VCF, SampleSheet, and QC metrics inside the bundle.
- Parse: Normalize sample rows and QC metrics into stable report-friendly shapes.
- Enrich: Optionally request metadata-only ICA context using project and run IDs.
- Emit: Write the local ClawBio import report, machine-readable manifest, sample table, and reproducibility bundle.
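The Parse step's QC normalization can be pictured as an alias lookup. In this minimal sketch the stable keys (`run_id`, `analysis_software`, `workflow_version`, `yield_gb`, `percent_q30`) are the ones this document names, while every alias spelling and the `normalize_qc` helper are assumptions made for illustration.

```python
from typing import Any, Dict

# Stable keys come from this document; the alias spellings are illustrative guesses.
ALIASES = {
    "run_id": ("run_id", "RunId", "Run ID"),
    "analysis_software": ("analysis_software", "AnalysisSoftware", "Software"),
    "workflow_version": ("workflow_version", "WorkflowVersion", "Software Version"),
    "yield_gb": ("yield_gb", "Yield (Gb)", "YieldGb"),
    "percent_q30": ("percent_q30", "PercentQ30", "% >= Q30"),
}
NUMERIC_KEYS = {"yield_gb", "percent_q30"}


def _canon(name: str) -> str:
    """Lowercase and drop punctuation so 'Yield (Gb)' and 'yield_gb' compare equal."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


_LOOKUP = {_canon(alias): key for key, aliases in ALIASES.items() for alias in aliases}


def normalize_qc(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Map known metric aliases onto stable keys; silently skip unknown metrics."""
    out: Dict[str, Any] = {}
    for name, value in raw.items():
        key = _LOOKUP.get(_canon(name))
        if key is None:
            continue
        if key in NUMERIC_KEYS:
            try:
                value = float(str(value).strip().rstrip("%"))
            except ValueError:
                continue  # leave unparseable numbers out rather than guess
        out.setdefault(key, value)  # first alias seen wins
    return out
```

The same function can sit behind JSON, CSV, or TSV readers, as long as each reader produces a flat name-to-value mapping first.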
CLI Reference
    # Standard usage
    python skills/illumina-bridge/illumina_bridge.py \
      --input <bundle_dir> --output <report_dir>

    # With optional ICA metadata enrichment
    python skills/illumina-bridge/illumina_bridge.py \
      --input <bundle_dir> \
      --metadata-provider ica \
      --ica-project-id <project_id> \
      --ica-run-id <run_id> \
      --output <report_dir>

    # Demo mode
    python skills/illumina-bridge/illumina_bridge.py --demo --output /tmp/illumina_demo

    # Via ClawBio runner
    python clawbio.py run illumina --input <bundle_dir> --output <dir>
    python clawbio.py run illumina --demo
Demo
python clawbio.py run illumina --demo
Expected output: a synthetic DRAGEN import with sample manifest, QC summary, result envelope, and recommended downstream ClawBio steps.
Algorithm / Methodology
- Directory scan: Prefer explicit overrides when present; otherwise auto-discover the primary result VCF, SampleSheet, and QC file using deterministic pattern order and a preference for `Results/*hard-filtered.vcf`.
- SampleSheet parsing: Read and merge sample rows from `[Data]`, `[BCLConvert_Data]`, and `[Cloud_TSO500S_Data]` when present, normalizing `Sample_ID`, `Sample_Name`, `Sample_Project`, `Sample_Type`, `Lane`, `index`, and `index2`.
- QC normalization: Accept JSON, CSV, or DRAGEN `MetricsOutput.tsv` files and map common Illumina/DRAGEN metric aliases into stable report keys such as `run_id`, `analysis_software`, `workflow_version`, `yield_gb`, and `percent_q30`.
- Metadata-only enrichment: If ICA is enabled, request project and analysis metadata using the API key from the environment and merge sample-level metadata when available.
- Output contract: Emit report, manifest, and reproducibility artifacts without launching downstream skills automatically.
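To make the SampleSheet parsing step concrete, here is a minimal sketch that reads sample rows from the three data sections and projects them onto the normalized columns. The section and column names are those listed above. The function name `parse_samplesheet` is hypothetical, and "merge" is implemented as simple concatenation, which is an assumption; the actual skill may combine rows for the same sample differently.

```python
import csv
from typing import Dict, List

SAMPLE_SECTIONS = {"[data]", "[bclconvert_data]", "[cloud_tso500s_data]"}
FIELDS = ("Sample_ID", "Sample_Name", "Sample_Project", "Sample_Type",
          "Lane", "index", "index2")


def parse_samplesheet(text: str) -> List[Dict[str, str]]:
    """Collect rows from every sample-bearing section of a SampleSheet."""
    rows: List[Dict[str, str]] = []
    in_section = False
    header = None
    for cells in csv.reader(text.splitlines()):
        cells = [c.strip() for c in cells]
        if not any(cells):
            continue  # blank or comma-only line
        if cells[0].startswith("["):
            # A new section starts; remember whether it carries sample rows.
            in_section = cells[0].lower() in SAMPLE_SECTIONS
            header = None
            continue
        if not in_section:
            continue
        if header is None:
            header = cells  # first non-blank row of a section is its header
            continue
        record = dict(zip(header, cells))
        # Missing columns become empty strings so every row has the same shape.
        rows.append({field: record.get(field, "") for field in FIELDS})
    return rows
```

Giving every row the same keys keeps a table such as `tables/sample_manifest.csv` stable even when a run's SampleSheet omits optional columns.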
Example Queries
- "Import this DRAGEN export from Illumina and tell me what I can do next"
- "Read this SampleSheet and VCF bundle from DRAGEN"
- "Add ICA project metadata to this Illumina bundle"
Output Structure
    output_directory/
    ├── report.md
    ├── result.json
    ├── tables/
    │   └── sample_manifest.csv
    └── reproducibility/
        ├── commands.sh
        ├── environment.yml
        └── checksums.sha256
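For readers wondering what `checksums.sha256` might contain, the sketch below writes one SHA-256 line per output file in the `sha256sum`-compatible `<digest>  <path>` layout. Which files the skill actually hashes, and the helper name `write_checksums`, are assumptions.

```python
import hashlib
from pathlib import Path


def write_checksums(output_dir: Path) -> Path:
    """Write reproducibility/checksums.sha256 covering every other file in output_dir."""
    target = output_dir / "reproducibility" / "checksums.sha256"
    target.parent.mkdir(parents=True, exist_ok=True)
    lines = []
    for path in sorted(output_dir.rglob("*")):
        if not path.is_file() or path == target:
            continue  # skip directories and the checksum file itself
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        lines.append(f"{digest}  {path.relative_to(output_dir).as_posix()}")
    target.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return target
```

With this layout, running `sha256sum -c reproducibility/checksums.sha256` from inside the output directory would verify the files.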
Dependencies
Required:
- `requests`: used for the optional ICA metadata lookup
Optional:
- `ILLUMINA_ICA_API_KEY`: enables metadata-only ICA enrichment
- `ILLUMINA_ICA_BASE_URL`: override the ICA API root with a trusted `https://*.illumina.com` endpoint if needed
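One way to enforce the "trusted `https://*.illumina.com` endpoint" constraint is to validate the override before any request is made. This is a sketch under assumptions: the helper names are hypothetical, the default API root is left for the caller to supply, and the skill's own check may be stricter or looser.

```python
import os
from urllib.parse import urlparse


def is_trusted_ica_url(url: str) -> bool:
    """Accept only HTTPS URLs whose host is a subdomain of illumina.com."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    return parsed.scheme == "https" and host.endswith(".illumina.com")


def resolve_ica_base_url(default: str) -> str:
    """Use ILLUMINA_ICA_BASE_URL when set, rejecting untrusted overrides."""
    url = os.environ.get("ILLUMINA_ICA_BASE_URL") or default
    if not is_trusted_ica_url(url):
        raise ValueError(f"Refusing untrusted ICA base URL: {url!r}")
    return url.rstrip("/")
```

Matching on the parsed hostname rather than on the raw string avoids accepting look-alikes such as `https://illumina.com.example.org`.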
Safety
- Local-first: genomic files are read locally; the skill never uploads VCF payloads
- Metadata-only cloud access: ICA enrichment is opt-in and limited to project/run metadata
- Disclaimer: every report includes the ClawBio medical disclaimer
- Reproducibility: commands, environment context, and checksums are always written
Integration with Bio Orchestrator
Trigger conditions:
- queries mentioning Illumina, DRAGEN, ICA, BaseSpace, SampleSheet, or sample sheet
- directories that contain a recognizable Illumina bundle (`SampleSheet + VCF`)
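The query-based trigger could be approximated with a case-insensitive keyword match. The keywords are the ones listed above; the function `mentions_illumina` and the word-boundary matching are illustrative assumptions, not the orchestrator's actual routing logic.

```python
import re

# Keywords taken from the trigger list; \b keeps "ICA" from matching inside "clinical".
_TRIGGER = re.compile(
    r"\b(illumina|dragen|ica|basespace|samplesheet|sample sheet)\b",
    re.IGNORECASE,
)


def mentions_illumina(query: str) -> bool:
    """Return True when the query contains one of the trigger keywords."""
    return _TRIGGER.search(query) is not None
```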
Chaining partners:
- `equity-scorer`: cohort-level follow-up on imported VCFs
- `clinpgx`: targeted gene-drug follow-up after DRAGEN review
- `gwas-lookup`: per-variant external lookup from imported findings