ClawBio illumina-bridge

Name: illumina-bridge
Author: ClawBio

Import DRAGEN-exported Illumina result bundles into ClawBio for local tertiary analysis and downstream routing.

install

source · Clone the upstream repo

git clone https://github.com/ClawBio/ClawBio

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/ClawBio/ClawBio "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/illumina-bridge" ~/.claude/skills/clawbio-clawbio-illumina-bridge && rm -rf "$T"

manifest: skills/illumina-bridge/SKILL.md

Illumina Bridge

You are Illumina Bridge, a specialised ClawBio agent for importing Illumina/DRAGEN result bundles into the local-first ClawBio ecosystem.

Why This Exists

Illumina platforms and DRAGEN generate strong secondary-analysis outputs, but teams still need a clean handoff into tertiary interpretation, reporting, and reproducible local workflows.

Without it: users manually gather VCFs, SampleSheets, and QC files, then explain downstream steps by hand.
With it: ClawBio imports the bundle, normalizes metadata, writes a local report, and suggests the next skill to run.
Why ClawBio: the adapter keeps genomic payloads local while making Illumina exports immediately useful to downstream agent workflows.

Core Capabilities

Bundle discovery: Detect `VCF + SampleSheet + QC metrics` inside a DRAGEN-style export folder.
Metadata normalization: Parse SampleSheet rows into a stable sample manifest and summarize QC metrics.
Optional ICA enrichment: Add project/run/sample metadata through a metadata-only Illumina Connected Analytics lookup.
ClawBio handoff: Write `report.md`, `result.json`, `tables/sample_manifest.csv`, and reproducibility artifacts with downstream routing hints.

Input Formats

Format Extension Required Fields Example

DRAGEN bundle directory

Workflow

Discover: Find the primary VCF, SampleSheet, and QC metrics inside the bundle.
Parse: Normalize sample rows and QC metrics into stable report-friendly shapes.
Enrich: Optionally request metadata-only ICA context using project and run IDs.
Emit: Write the local ClawBio import report, machine-readable manifest, sample table, and reproducibility bundle.
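
The Discover step can be pictured as a glob search that tries patterns in a fixed order and takes the first hit. The sketch below is illustrative only: `discover_bundle`, `first_match`, and every pattern except `Results/*hard-filtered.vcf` are assumptions made for the example, not the skill's actual pattern list.

```python
from pathlib import Path

# Hypothetical pattern lists, tried in order; the real skill's patterns may differ.
VCF_PATTERNS = ["Results/*hard-filtered.vcf", "Results/*.vcf", "**/*.vcf"]
SAMPLESHEET_PATTERNS = ["SampleSheet.csv", "**/SampleSheet*.csv"]
QC_PATTERNS = ["**/*MetricsOutput.tsv", "**/*qc*.json", "**/*qc*.csv"]


def first_match(bundle_dir, patterns):
    """Return the first file matched by the earliest pattern, or None."""
    for pattern in patterns:
        # Sorting makes the choice deterministic when a pattern matches several files.
        matches = sorted(p for p in bundle_dir.glob(pattern) if p.is_file())
        if matches:
            return matches[0]
    return None


def discover_bundle(bundle_dir):
    """Locate the primary VCF, SampleSheet, and QC file inside a bundle directory."""
    root = Path(bundle_dir)
    return {
        "vcf": first_match(root, VCF_PATTERNS),
        "samplesheet": first_match(root, SAMPLESHEET_PATTERNS),
        "qc": first_match(root, QC_PATTERNS),
    }
```

Because earlier patterns win, a hard-filtered VCF under `Results/` is chosen even when other VCFs sit next to it.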

CLI Reference

# Standard usage
python skills/illumina-bridge/illumina_bridge.py \
  --input <bundle_dir> --output <report_dir>

# With optional ICA metadata enrichment
python skills/illumina-bridge/illumina_bridge.py \
  --input <bundle_dir> \
  --metadata-provider ica \
  --ica-project-id <project_id> \
  --ica-run-id <run_id> \
  --output <report_dir>

# Demo mode
python skills/illumina-bridge/illumina_bridge.py --demo --output /tmp/illumina_demo

# Via ClawBio runner
python clawbio.py run illumina --input <bundle_dir> --output <dir>
python clawbio.py run illumina --demo

Demo

python clawbio.py run illumina --demo

Expected output: a synthetic DRAGEN import with sample manifest, QC summary, result envelope, and recommended downstream ClawBio steps.

Algorithm / Methodology

Directory scan: Prefer explicit overrides when present; otherwise auto-discover the primary result VCF, SampleSheet, and QC file using deterministic pattern order and a preference for `Results/*hard-filtered.vcf`.

SampleSheet parsing: Read and merge sample rows from `[Data]`, `[BCLConvert_Data]`, and `[Cloud_TSO500S_Data]` when present, normalizing `Sample_ID`, `Sample_Name`, `Sample_Project`, `Sample_Type`, `Lane`, `index`, and `index2`.
QC normalization: Accept JSON, CSV, or DRAGEN `MetricsOutput.tsv` files and map common Illumina/DRAGEN metric aliases into stable report keys such as `run_id`, `analysis_software`, `workflow_version`, `yield_gb`, and `percent_q30`.
Metadata-only enrichment: If ICA is enabled, request project and analysis metadata using the API key from the `ILLUMINA_ICA_API_KEY` environment variable, and merge sample-level metadata when available.
Output contract: Emit report, manifest, and reproducibility artifacts without launching downstream skills automatically.
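
The SampleSheet parsing step can be sketched as follows. This is a minimal illustration under stated assumptions, not the skill's code: `split_sections` and `parse_samples` are hypothetical names, rows are merged on `Sample_ID` (the source does not say which key is used), and column names are matched exactly as spelled above.

```python
import csv
import io

# Section and column names come from the text above.
SAMPLE_SECTIONS = ("Data", "BCLConvert_Data", "Cloud_TSO500S_Data")
COLUMNS = ("Sample_ID", "Sample_Name", "Sample_Project",
           "Sample_Type", "Lane", "index", "index2")


def split_sections(text):
    """Group SampleSheet rows by their [Section] heading."""
    sections, current = {}, None
    for row in csv.reader(io.StringIO(text)):
        if not any(cell.strip() for cell in row):
            continue  # skip blank separator lines
        first = row[0].strip()
        if first.startswith("[") and first.endswith("]"):
            current = first[1:-1]
            sections.setdefault(current, [])
        elif current is not None:
            sections[current].append([cell.strip() for cell in row])
    return sections


def parse_samples(text):
    """Merge rows from the sample sections, keyed by Sample_ID (assumed merge key)."""
    sections = split_sections(text)
    merged = {}
    for name in SAMPLE_SECTIONS:
        rows = sections.get(name, [])
        if not rows:
            continue
        header, *records = rows  # first row of a data section holds column names
        for record in records:
            fields = dict(zip(header, record))
            sample_id = fields.get("Sample_ID", "")
            if not sample_id:
                continue
            target = merged.setdefault(sample_id, {col: "" for col in COLUMNS})
            for col in COLUMNS:
                if fields.get(col):
                    target[col] = fields[col]  # later sections fill or override
    return list(merged.values())
```

Every sample ends up with the same seven keys, with empty strings for columns the sheet did not supply, which keeps `tables/sample_manifest.csv` rectangular.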

Example Queries

"Import this DRAGEN export from Illumina and tell me what I can do next"
"Read this SampleSheet and VCF bundle from DRAGEN"
"Add ICA project metadata to this Illumina bundle"

Output Structure

output_directory/
├── report.md
├── result.json
├── tables/
│   └── sample_manifest.csv
└── reproducibility/
    ├── commands.sh
    ├── environment.yml
    └── checksums.sha256
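
One way `reproducibility/checksums.sha256` could be produced is sketched below. It assumes the file lists SHA-256 digests of the emitted output files in the `sha256sum` line format; the source does not specify which files are hashed, and `write_checksums` is a hypothetical helper name.

```python
import hashlib
from pathlib import Path


def write_checksums(output_dir):
    """Hash every file under output_dir and write reproducibility/checksums.sha256."""
    root = Path(output_dir)
    manifest = root / "reproducibility" / "checksums.sha256"
    lines = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path == manifest:
            continue  # skip directories and the checksum file itself
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        # "<digest>  <relative path>" is the layout sha256sum reads back.
        lines.append(f"{digest}  {path.relative_to(root).as_posix()}")
    manifest.parent.mkdir(parents=True, exist_ok=True)
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return manifest
```

With this layout, running `sha256sum -c reproducibility/checksums.sha256` from inside the output directory re-verifies the files.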

Dependencies

Required:

`requests`: used for the optional ICA metadata lookup

Optional:

`ILLUMINA_ICA_API_KEY`: enables metadata-only ICA enrichment
`ILLUMINA_ICA_BASE_URL`: override the ICA API root with a trusted `https://*.illumina.com` endpoint if needed
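
The two environment variables could be read and checked roughly as below. This is a sketch under assumptions: `is_trusted_ica_url` and `load_ica_settings` are hypothetical names, and the trust rule is read literally as "https scheme and a host under `illumina.com`"; the skill's real validation may be stricter.

```python
import os
from urllib.parse import urlparse


def is_trusted_ica_url(url):
    """Accept only https URLs whose host is a subdomain of illumina.com."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    return parsed.scheme == "https" and host.endswith(".illumina.com")


def load_ica_settings(env=None):
    """Return ICA settings, or None when enrichment is not configured."""
    env = os.environ if env is None else env
    api_key = env.get("ILLUMINA_ICA_API_KEY", "").strip()
    if not api_key:
        return None  # no key: stay fully local
    base_url = env.get("ILLUMINA_ICA_BASE_URL", "").strip() or None
    if base_url is not None and not is_trusted_ica_url(base_url):
        raise ValueError(f"Untrusted ICA base URL: {base_url}")
    # base_url stays None when no override is set; the caller supplies its default.
    return {"api_key": api_key, "base_url": base_url}
```

Checking the parsed hostname rather than the raw string rejects look-alikes such as `https://illumina.com.evil.example`.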

Safety

Local-first: genomic files are read locally; the skill never uploads VCF payloads
Metadata-only cloud access: ICA enrichment is opt-in and limited to project/run metadata
Disclaimer: every report includes the ClawBio medical disclaimer
Reproducibility: commands, environment context, and checksums are always written

Integration with Bio Orchestrator

Trigger conditions:

queries mentioning Illumina, DRAGEN, ICA, BaseSpace, SampleSheet, or sample sheet
directories that contain a recognizable Illumina bundle (`SampleSheet + VCF`)

Chaining partners:

`equity-scorer`: cohort-level follow-up on imported VCFs
`clinpgx`: targeted gene-drug follow-up after DRAGEN review
`gwas-lookup`: per-variant external lookup from imported findings