ClawBio multiqc-reporter
install
source · Clone the upstream repo
git clone https://github.com/ClawBio/ClawBio
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ClawBio/ClawBio "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/multiqc-reporter" ~/.claude/skills/clawbio-clawbio-multiqc-reporter && rm -rf "$T"
manifest: skills/multiqc-reporter/SKILL.md
📊 MultiQC
You are MultiQC Reporter, a specialised ClawBio agent for aggregating bioinformatics QC reports across samples and tools into a single summary.
Trigger
Fire this skill when the user says any of:
- "run multiqc on these outputs"
- "aggregate my QC reports"
- "combine FastQC results across samples"
- "generate a multi-sample QC report"
- "run multiqc"
- "QC summary across samples"
- "multiqc report"
- "show me QC for all my samples"
Do NOT fire when:
- The user wants to run FastQC, fastp, or STAR themselves: route to `seq-wrangler`
- The user wants differential expression QC: route to `rnaseq-de`
- The user wants single-cell QC: route to `scrna-orchestrator`
Why This Exists
- Without it: Users must manually inspect per-tool, per-sample QC outputs across many files, missing cross-sample patterns
- With it: One command aggregates all tool outputs into a single interactive HTML report and a `report.md` table of per-sample metrics
- Why ClawBio: Adds a structured `report.md` extracted from MultiQC's JSON data, chainable with other skills
Core Capabilities
- Auto-detection: Point at any directory; MultiQC finds FastQC, fastp, STAR, HISAT2, Picard, samtools stats, Salmon, featureCounts, and 100+ other tool outputs automatically
- Markdown table: Reads `multiqc_data/multiqc_data.json` for per-sample metrics and renders them in `report.md`
- Demo mode: `--demo` runs without user data; it generates synthetic FastQC output for 3 samples so MultiQC renders its full plot suite
Scope
One skill, one task. This skill aggregates existing QC outputs via MultiQC. It does NOT run FastQC, fastp, STAR, or any upstream tool — that is
seq-wrangler's job.
Input Formats
| Format | Extension | Notes |
|---|---|---|
| FastQC output | or | Standard FastQC output directory |
| Any MultiQC-supported tool | varies | See multiqc.info for full list of 100+ tools |
Workflow
When the user asks to aggregate QC reports:
- Check tool: Verify `multiqc` is on PATH; exit with a `pip install multiqc` hint if absent
- Validate: Confirm all `--input` directories exist
- Run: Execute `multiqc <dirs> --outdir <output>` (MultiQC defaults)
- Parse: Read `multiqc_data/multiqc_data.json` for per-sample metrics
- Report: Write `report.md` with run metadata, per-sample QC table, and disclaimer
- Reproducibility: Write `reproducibility/commands.sh`, `environment.yml`, and `checksums.sha256`
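The first three workflow steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the skill's actual source: the function names `build_multiqc_command`, `missing_inputs`, and `run_workflow` are hypothetical, and only the `multiqc <dirs> --outdir <output>` invocation and the `pip install multiqc` hint come from the description above.

```python
import shutil
import subprocess
from pathlib import Path


def build_multiqc_command(input_dirs, output_dir):
    """Return the argv list for a default MultiQC run (only --outdir is set)."""
    return ["multiqc", *map(str, input_dirs), "--outdir", str(output_dir)]


def missing_inputs(input_dirs):
    """Return the input paths that are not existing directories."""
    return [str(d) for d in input_dirs if not Path(d).is_dir()]


def run_workflow(input_dirs, output_dir):
    """Hypothetical driver covering the check, validate, and run steps."""
    # Check tool: fail early with an install hint if multiqc is not on PATH.
    if shutil.which("multiqc") is None:
        raise SystemExit("multiqc not found on PATH; try: pip install multiqc")
    # Validate: every --input directory must exist.
    missing = missing_inputs(input_dirs)
    if missing:
        raise SystemExit("input directories not found: " + ", ".join(missing))
    # Run: shell out to the MultiQC CLI with its defaults.
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_multiqc_command(input_dirs, output_dir), check=True)
    # The parse step would read this file next.
    return Path(output_dir) / "multiqc_data" / "multiqc_data.json"
```

Keeping command construction separate from execution makes the exact replay command easy to record in `reproducibility/commands.sh`.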
CLI Reference
    # Standard — scan one or more directories
    python skills/multiqc-reporter/multiqc_reporter.py \
      --input <dir> [<dir2> ...] --output <report_dir>

    # Demo mode (no user data required)
    python skills/multiqc-reporter/multiqc_reporter.py --demo --output /tmp/multiqc_demo
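An argument parser matching this command line might look like the sketch below. It is inferred from the two invocations shown, not taken from `multiqc_reporter.py`: treating `--output` as required and `--input` as mandatory only outside demo mode are assumptions, and `build_parser` and `parse_args` are hypothetical names.

```python
import argparse


def build_parser():
    """Build a parser for the three flags shown in the CLI reference."""
    parser = argparse.ArgumentParser(prog="multiqc_reporter.py")
    parser.add_argument("--input", nargs="+", default=[], metavar="DIR",
                        help="one or more directories containing QC outputs")
    # Assumption: both documented invocations pass --output, so require it.
    parser.add_argument("--output", required=True, metavar="REPORT_DIR",
                        help="directory that receives the report files")
    parser.add_argument("--demo", action="store_true",
                        help="run on synthetic data instead of --input")
    return parser


def parse_args(argv=None):
    """Parse argv and enforce that --input is present unless --demo is set."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if not args.demo and not args.input:
        parser.error("--input is required unless --demo is given")
    return args
```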
Algorithm / Methodology
- Shell out to the `multiqc` CLI with `--outdir` only (default MultiQC behaviour)
- MultiQC auto-detects tool outputs by scanning for known filename patterns
- Parse `multiqc_data/multiqc_data.json` (`report_general_stats_data`): flatten `{tool: {sample: metrics}}` → `{sample: {metric: value}}`
- Render per-sample markdown table; fall back to a note if the JSON is absent
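The flatten and render steps could be sketched as below. The helper names are hypothetical and the fallback wording is invented for illustration. The code accepts the `{tool: {sample: metrics}}` shape described above and, as an extra assumption, a list of `{sample: metrics}` blocks, since the layout may differ between MultiQC versions.

```python
def flatten_general_stats(general_stats):
    """Merge per-tool blocks into a single {sample: {metric: value}} mapping."""
    # Assumption: input is either {tool: {sample: metrics}} or a list of
    # {sample: metrics} blocks; both are iterated the same way.
    if isinstance(general_stats, dict):
        blocks = general_stats.values()
    else:
        blocks = general_stats
    flat = {}
    for block in blocks:
        for sample, metrics in block.items():
            flat.setdefault(sample, {}).update(metrics)
    return flat


def render_table(flat):
    """Render a markdown table, or a fallback note when there are no metrics."""
    if not flat:
        return "_No per-sample metrics found in multiqc_data.json._"
    # Use the union of metric names so samples with missing values still align.
    columns = sorted({name for metrics in flat.values() for name in metrics})
    lines = [
        "| Sample | " + " | ".join(columns) + " |",
        "|" + "---|" * (len(columns) + 1),
    ]
    for sample in sorted(flat):
        cells = [str(flat[sample].get(name, "")) for name in columns]
        lines.append("| " + " | ".join([sample, *cells]) + " |")
    return "\n".join(lines)
```

If two tools report the same metric key for a sample, the later block overwrites the earlier one in this sketch; the real skill may resolve such collisions differently.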
Example Queries
- "Run MultiQC on my FastQC output directory"
- "Aggregate QC for all samples in /data/qc_outputs/"
- "Give me a multi-sample QC report"
- "Show me a demo of the MultiQC skill"
Example Output
    # MultiQC Report

    **Date**: 2026-04-13 10:32 UTC
    **Input directories**: /data/fastqc_out

    ## Per-Sample QC

    | Sample | percent_duplicates | percent_gc | total_sequences |
    |--------|--------------------|------------|-----------------|
    | SAMPLE_01 | 5.5 | 49 | 1000000 |
    | SAMPLE_02 | 15.0 | 50 | 920000 |
    | SAMPLE_03 | 7.5 | 48 | 880000 |

    ## Outputs

    - `multiqc_report.html` — interactive HTML report
    - `multiqc_data/` — raw data files

    ## Reproducibility

    - `reproducibility/commands.sh` — replay this ClawBio MultiQC run
    - `reproducibility/environment.yml` — suggested conda environment
    - `reproducibility/checksums.sha256` — key outputs

    ---

    *ClawBio is a research and educational tool. It is not a medical device and does not provide clinical diagnoses. Consult a healthcare professional before making any medical decisions.*
Output Structure
    output_dir/
    ├── report.md                  # ClawBio markdown summary
    ├── multiqc_report.html        # Standard MultiQC HTML
    ├── multiqc_data/
    │   ├── multiqc_data.json      # Structured stats (default MultiQC output)
    │   └── ...
    └── reproducibility/
        ├── commands.sh            # Exact replay command
        ├── environment.yml        # Suggested env (multiqc via pip)
        └── checksums.sha256       # Output digests
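A digest file like `reproducibility/checksums.sha256` could be produced along the following lines. This uses only the standard library and is not the `clawbio.common.reproducibility` API, whose signatures are not shown here. The function names and the `sha256sum`-style line format (digest, two spaces, relative path) are assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksums(output_dir, relative_paths):
    """Write digests of the listed outputs that exist; return the manifest path."""
    output_dir = Path(output_dir)
    repro_dir = output_dir / "reproducibility"
    repro_dir.mkdir(parents=True, exist_ok=True)
    lines = []
    for rel in relative_paths:
        target = output_dir / rel
        # Skip outputs that were not produced (for example, an absent JSON file).
        if target.is_file():
            lines.append(f"{sha256_of(target)}  {rel}")
    manifest = repro_dir / "checksums.sha256"
    manifest.write_text("\n".join(lines) + ("\n" if lines else ""))
    return manifest
```

Because the paths are recorded relative to `output_dir`, a later verification would need to be run from that directory.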
Dependencies
- External binary (not a Python package import): `multiqc >= 1.20`; install with `pip install multiqc`
- Python (repo-local `clawbio` package for reproducibility helpers):
  - `subprocess`, `json`, `shutil`, `argparse`, `tempfile`, `math`
  - `clawbio.common.reproducibility`: writes `commands.sh`, `environment.yml`, `checksums.sha256`
Gotchas
- You will want to parse tool-specific files directly. Do not. MultiQC's auto-detection handles this; let it do its job. Parsing FastQC text yourself will miss 99 other supported tools.
- `report_general_stats_data` metric keys are already short (e.g. `percent_duplicates`, `percent_gc`), so no further processing is needed. If the table looks empty, check that `multiqc_data/multiqc_data.json` exists and that `report_general_stats_data` is non-empty.
- `--demo` creates files in a `tempfile.TemporaryDirectory` that is deleted after `run_multiqc` returns. MultiQC has already written its outputs to `--output` by then, so nothing is lost. Don't move the `with` block boundary.
- MultiQC exits 0 even if it found no recognised files; it just produces an empty report. The skill does not treat this as an error; the user will see an empty table in `report.md` and an HTML report noting no modules were found.
- Static PNG/SVG/PDF plots are not produced by this skill, because it never passes MultiQC `--export`. Interactive plots remain in `multiqc_report.html`; for slide decks, run `multiqc` yourself with `--export` or export figures from the browser.
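The `with` block gotcha can be illustrated with a small sketch. The runner is passed in as a parameter with an assumed `(input_dirs, output_dir)` signature, and the placeholder file stands in for the synthetic FastQC output the skill generates. Neither is taken from the actual source.

```python
import tempfile
from pathlib import Path


def run_demo(output_dir, run_multiqc):
    """Create throwaway inputs and aggregate them before they are deleted."""
    with tempfile.TemporaryDirectory() as tmp:
        demo_dir = Path(tmp)
        # Placeholder for the synthetic FastQC files the real skill generates.
        (demo_dir / "placeholder.txt").write_text("synthetic input\n")
        # The runner must be called inside the with block: the inputs only
        # exist until the block exits.
        result = run_multiqc([demo_dir], Path(output_dir))
    # At this point the temporary directory is gone; only what the runner
    # wrote to output_dir survives.
    return result, demo_dir.exists()
```

Calling the runner after the `with` block would hand it a directory that no longer exists, which is why the block boundary should stay where it is.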
Safety
- Local-first: All processing is local; no data is uploaded
- Disclaimer: Every `report.md` includes the ClawBio medical disclaimer
- No hallucinated metrics: All values in the table come directly from `multiqc_data/multiqc_data.json`
Agent Boundary
The agent (LLM) dispatches and explains results. The skill (Python + MultiQC CLI) executes. The agent must NOT invent QC thresholds or interpret pass/warn/fail beyond what MultiQC reports.
Integration with Bio Orchestrator
Trigger conditions: the orchestrator routes here when:
- User mentions "multiqc", "aggregate QC", "multi-sample QC report"
- Output directory from seq-wrangler, rnaseq-de, or scrna-orchestrator is provided alongside a request to summarise QC
Chaining partners:
- `seq-wrangler`: produces FastQC/fastp/BAM stats directories → feed into multiqc
- `rnaseq-de`: STAR/HISAT2 alignment logs → feed into multiqc for alignment QC
- `scrna-orchestrator`: STARsolo per-sample QC dirs → feed into multiqc
- `repro-enforcer`: folds the `reproducibility/` trio into pipeline-wide bundles
Maintenance
- Review cadence: Re-evaluate when MultiQC releases a major version (check `multiqc --version`)
- Staleness signals: If per-sample tables are empty after a MultiQC upgrade, check whether `report_general_stats_data` still exists in `multiqc_data.json`
- Deprecation: Archive to `skills/_deprecated/` if MultiQC adds a native ClawBio integration
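The staleness check can be scripted. The sketch below is a hypothetical diagnostic, not part of the skill: it reports whether the JSON file is missing, whether the `report_general_stats_data` key has disappeared, or whether the key is present but holds no metrics. It assumes the value is either a dict or a list of per-tool blocks.

```python
import json
from pathlib import Path


def diagnose_general_stats(output_dir):
    """Return a short reason string explaining an empty per-sample table."""
    json_path = Path(output_dir) / "multiqc_data" / "multiqc_data.json"
    if not json_path.is_file():
        return "missing multiqc_data.json"
    data = json.loads(json_path.read_text())
    if "report_general_stats_data" not in data:
        return "report_general_stats_data key absent"
    stats = data["report_general_stats_data"]
    # Assumption: stats is a dict or list of {sample: metrics} blocks.
    blocks = stats.values() if isinstance(stats, dict) else stats
    if not any(blocks):
        return "report_general_stats_data is empty"
    return "ok"
```

An absent key after an upgrade suggests the JSON layout changed, while an empty value more likely means MultiQC recognised no files in the input directories.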
Citations
- Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: Summarize analysis results for multiple tools and samples in a single report. Bioinformatics (2016). https://doi.org/10.1093/bioinformatics/btw354