SciAgent-Skills multiqc-qc-reports
Aggregates QC outputs from 150+ bioinformatics tools into a single interactive HTML report. Scans directories for FastQC, samtools, STAR, HISAT2, Trim Galore, featureCounts, Kallisto, Salmon, Picard, and GATK logs; merges statistics across samples with interactive plots. Essential for NGS pipeline QC review. Use FastQC directly instead for single-sample initial assessment; MultiQC is for multi-sample pipeline-wide reporting.
git clone https://github.com/jaechang-hits/SciAgent-Skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/jaechang-hits/SciAgent-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/genomics-bioinformatics/multiqc-qc-reports" ~/.claude/skills/jaechang-hits-sciagent-skills-multiqc-qc-reports && rm -rf "$T"
MultiQC: Multi-Sample QC Report Aggregator
Overview
MultiQC automatically searches directories for QC log files from 150+ bioinformatics tools and aggregates statistics across all samples into a single interactive HTML report. It parses outputs from FastQC, samtools flagstat, STAR, HISAT2, Trim Galore, Salmon, Kallisto, featureCounts, Picard, GATK, and many more — eliminating the need to manually review per-sample QC files. Reports include interactive bar plots, scatter plots, heatmaps, and tables with configurable warnings and pass/fail thresholds.
When to Use
- Reviewing QC metrics across 10+ samples at once after FastQC, alignment, or quantification
- Final QC checkpoint before differential expression or variant analysis
- Sharing QC summaries with collaborators or including in publications
- Identifying batch effects, outlier samples, or failed sequencing runs
- Combining QC from multi-step pipelines (trimming → alignment → quantification) into one view
- Use FastQC directly instead for initial single-sample QC exploration
- For custom QC metrics not from standard tools, use Python/R directly; MultiQC parses tool outputs only
Prerequisites
- Python packages: `multiqc`
- Input requirements: Output files from bioinformatics tools (FastQC `.zip`, samtools `.flagstat`, STAR `Log.final.out`, etc.); MultiQC finds them automatically
- Environment: Python 3.8+
Check before installing: The tool may already be available in the current environment (e.g., inside a `conda`/`pixi` env). Run `command -v multiqc` first and skip the install commands below if it returns a path. When running inside a pixi project, invoke the tool via `pixi run multiqc` rather than bare `multiqc`.
    pip install multiqc
    # Verify
    multiqc --version
    # MultiQC v1.25.0
    # With conda (recommended for bioinformatics)
    conda install -c bioconda multiqc
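The availability check described above can also be scripted. The following is a minimal Python sketch, not part of MultiQC: `multiqc_command` is a hypothetical helper, and recognising a pixi project by the presence of a `pixi.toml` file is an assumption.

```python
import shutil
from pathlib import Path


def multiqc_command(project_dir="."):
    """Return the argv prefix for running MultiQC, or None if it is not found."""
    project = Path(project_dir)
    # Assumption: a pixi project is recognised by a pixi.toml file in its root.
    if (project / "pixi.toml").is_file() and shutil.which("pixi"):
        return ["pixi", "run", "multiqc"]
    # Same idea as `command -v multiqc`: look the executable up on PATH.
    if shutil.which("multiqc"):
        return ["multiqc"]
    return None


command = multiqc_command()
if command is None:
    print("multiqc not found: install it with pip or conda")
else:
    print("Invoke MultiQC as:", " ".join(command))
```

The returned list can be used as the start of a `subprocess.run` call, so the rest of a script does not need to know which environment it is running in.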
Workflow
Step 1: Generate Tool-Specific QC Files
MultiQC aggregates existing output — first run your QC tools.
    # FastQC on all FASTQ files
    mkdir -p qc/fastqc
    fastqc data/*.fastq.gz -o qc/fastqc/ -t 8
    # samtools flagstat on all BAM files (paths quoted so unusual file names are safe)
    for bam in results/*.bam; do
        samtools flagstat "$bam" > "qc/$(basename "$bam" .bam).flagstat"
    done
    echo "QC files generated: $(ls qc/ | wc -l)"
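To make concrete what kind of numbers are later collected from these files, here is a small sketch that pulls the total and mapped read counts out of flagstat-style text. The example text and its values are invented, and `parse_flagstat` is a hypothetical helper, not MultiQC's own parser.

```python
import re

# Example text in the layout that `samtools flagstat` prints (values invented).
EXAMPLE_FLAGSTAT = """\
1000 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
950 + 0 mapped (95.00% : N/A)
"""

# Matches "<passed> + <failed> in total (" and "<passed> + <failed> mapped (".
LINE_RE = re.compile(r"^(\d+) \+ (\d+) (in total|mapped) \(", re.MULTILINE)


def parse_flagstat(text):
    """Return QC-passed total and mapped read counts plus the mapped percentage."""
    counts = {}
    for passed, _failed, label in LINE_RE.findall(text):
        counts[label] = int(passed)
    total = counts.get("in total", 0)
    mapped = counts.get("mapped", 0)
    pct = 100.0 * mapped / total if total else 0.0
    return {"total": total, "mapped": mapped, "mapped_pct": pct}


stats = parse_flagstat(EXAMPLE_FLAGSTAT)
print(stats)
```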
Step 2: Run MultiQC on a Directory
MultiQC recursively scans for recognized QC files.
    # Basic run: scan current directory recursively
    multiqc .
    # Specify output directory and report name
    multiqc . -o reports/ -n project_qc_report
    # Scan specific subdirectories only
    multiqc qc/fastqc/ results/star/ logs/trimming/ -o reports/
    # Output: reports/project_qc_report.html
    echo "Report: reports/project_qc_report.html"
Step 3: Configure Report Behavior
Use `multiqc_config.yaml` to set custom thresholds, sample naming, and module order.
    # multiqc_config.yaml: place in working directory
    title: "RNA-seq QC Report - Project X"
    subtitle: "Analysis date: 2026-02"
    intro_text: "Quality control summary for all 48 samples."
    # Sample name cleaning: remove path prefixes and suffixes
    fn_clean_exts:
      - ".fastq.gz"
      - "_R1"
      - ".sorted"
    # Thresholds for warn/fail coloring of table cells
    # (column ID assumed; confirm the ID shown for the column in your own report)
    table_cond_formatting_rules:
      percent_duplicates:
        warn:
          - gt: 30
        fail:
          - gt: 40
    # Module run order (Trim Galore logs are parsed by the cutadapt module)
    module_order:
      - fastqc
      - cutadapt
      - star
      - featurecounts
      - samtools

    # Run with config file
    multiqc . --config multiqc_config.yaml -o reports/
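As an illustration of what `fn_clean_exts` is meant to achieve, the sketch below cuts each file name at the first occurrence of a listed string. This is a simplified assumption about the default truncation behaviour; MultiQC's real cleaning logic supports more rule types, and `clean_sample_name` is a hypothetical helper.

```python
import os

# Same patterns as in the example config.
CLEAN_EXTS = [".fastq.gz", "_R1", ".sorted"]


def clean_sample_name(path, clean_exts=CLEAN_EXTS):
    """Drop the directory, then cut the name at the first match of each pattern."""
    name = os.path.basename(path)
    for ext in clean_exts:
        name = name.split(ext, 1)[0]
    return name


for path in ["data/ctrl_rep1_R1.fastq.gz", "results/ctrl_rep1.sorted.bam"]:
    print(path, "->", clean_sample_name(path))
```

Both example paths reduce to the same name, which is the point of the cleaning step: files produced at different pipeline stages for one sample should share one label.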
Step 4: Use MultiQC Modules and Filters
Control which tools and samples are included.
    # Run only specific modules
    multiqc . --module fastqc --module samtools
    # Exclude specific modules
    multiqc . --exclude fastqc
    # Restrict the input by passing only the directories to scan
    # (-n/--filename sets the report name; it is not an input filter)
    multiqc qc/fastqc/ qc/flagstat/
    # Ignore specific directories or files
    multiqc . --ignore "tmp/" --ignore "*.bam"
    # Rename samples from a two-column TSV of search strings and replacements
    # (rename.tsv is an example file name)
    multiqc . --replace-names rename.tsv
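MultiQC's `--replace-names` option reads a tab-separated file with a search string in the first column and its replacement in the second. A small sketch for generating such a file follows; the file name `rename.tsv`, the sample IDs, and `write_rename_table` are illustrative assumptions.

```python
import csv
import tempfile
from pathlib import Path


def write_rename_table(mapping, path):
    """Write search/replacement pairs as a two-column TSV and return the path."""
    path = Path(path)
    with path.open("w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for old, new in mapping.items():
            writer.writerow([old, new])
    return path


# Hypothetical sequencing-facility IDs mapped to readable sample names.
renames = {"sample_01": "ctrl_rep1", "sample_02": "treat_rep1"}
with tempfile.TemporaryDirectory() as tmp:
    table = write_rename_table(renames, Path(tmp) / "rename.tsv")
    content = table.read_text()
print(content, end="")
```

In a real project the file would be written next to the data and passed as `multiqc . --replace-names rename.tsv`.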
Step 5: Export Data for Downstream Analysis
Extract machine-readable statistics from the MultiQC report.
    # Default run: per-tool tab-delimited tables are written to multiqc_data/
    multiqc . -o reports/
    ls reports/multiqc_data/
    # multiqc_fastqc.txt, multiqc_samtools_stats.txt, ...
    # Re-run with JSON data tables (--force overwrites the previous report)
    multiqc . -o reports/ --data-format json --force
    # Generates: reports/multiqc_data/multiqc_data.json
    # Extract general stats as pandas DataFrame
    python3 - << 'EOF'
    import json
    import pandas as pd

    with open("reports/multiqc_data/multiqc_general_stats.json") as f:
        data = json.load(f)
    df = pd.DataFrame(data).T  # rows become samples
    print(df.head())
    print(f"Shape: {df.shape}")
    EOF
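When pandas is not available, the tab-delimited export can be read with the standard library alone. The sketch below parses an in-memory example; the column names and values are invented, and it assumes the first column is headed `Sample`, so check that against your own `multiqc_general_stats.txt`.

```python
import csv
import io

# Invented example in the shape of a tab-delimited general stats table.
EXAMPLE_TSV = (
    "Sample\tpercent_duplicates\tpercent_gc\n"
    "ctrl_rep1\t12.5\t48.0\n"
    "treat_rep1\t35.0\t51.0\n"
)


def to_number(value):
    """Convert a cell to float when possible, otherwise return it unchanged."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return value


def load_general_stats(handle):
    """Return {sample: {metric: value}} from a tab-delimited table."""
    table = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        sample = row.pop("Sample")
        table[sample] = {key: to_number(val) for key, val in row.items()}
    return table


general_stats = load_general_stats(io.StringIO(EXAMPLE_TSV))
for sample, metrics in general_stats.items():
    print(sample, metrics)
```

For a real report, replace the `io.StringIO` object with an open file handle on the exported table.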
Step 6: Automate in Pipeline Scripts
Integrate MultiQC as the final step of any QC pipeline.
    #!/bin/bash
    # Complete RNA-seq QC pipeline → MultiQC summary
    SAMPLES=(ctrl_rep1 ctrl_rep2 treat_rep1 treat_rep2)
    OUTDIR="pipeline_output"
    mkdir -p "$OUTDIR"/{fastqc,star,featurecounts,flagstat}
    for sample in "${SAMPLES[@]}"; do
        # FastQC
        fastqc "data/${sample}.fastq.gz" -o "$OUTDIR/fastqc/" -t 4
        # STAR alignment (zcat is needed because the reads are gzipped)
        mkdir -p "$OUTDIR/star/${sample}"
        STAR --runThreadN 8 --genomeDir refs/star_index \
            --readFilesIn "data/${sample}.fastq.gz" \
            --readFilesCommand zcat \
            --outSAMtype BAM SortedByCoordinate \
            --outFileNamePrefix "$OUTDIR/star/${sample}/"
        # samtools flagstat
        samtools flagstat "$OUTDIR/star/${sample}/Aligned.sortedByCoord.out.bam" \
            > "$OUTDIR/flagstat/${sample}.flagstat"
    done
    # Final MultiQC report
    multiqc "$OUTDIR/" -o "$OUTDIR/qc_report/" -n "full_pipeline_qc"
    echo "Report ready: $OUTDIR/qc_report/full_pipeline_qc.html"
Key Parameters
| Parameter | Default | Range/Options | Effect |
|---|---|---|---|
| `-o` | `.` | directory path | Output directory for report and data |
| `-n` | `multiqc_report` | any string | Report filename (without extension) |
| `--module` | all | tool name | Run only specified module(s) |
| `--ignore` | none | glob pattern | Ignore matching files or directories |
| `--export` | False | flag | Export plots as static image files alongside the report |
| `--data-format` | `tsv` | `tsv`, `json`, `yaml` | Format for exported data files |
| `--config` | auto-detected | YAML file path | Custom config file with thresholds and naming |
| `--replace-names` | none | TSV file path | Rename samples using search/replacement pairs |
| `fn_clean_exts` | (built-in) | list in config | File extensions to strip from sample names |
| `--profile-runtime` | False | flag | Show per-module runtime profiling |
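When MultiQC is driven from a script, it can help to assemble the command line from these parameters in one place. The sketch below only builds the argument list and does not run anything; `build_multiqc_args` and the example paths are hypothetical.

```python
import shlex


def build_multiqc_args(search_paths, outdir=None, name=None,
                       modules=(), ignore=(), config=None):
    """Assemble a MultiQC command line as an argv list (nothing is executed)."""
    args = ["multiqc", *search_paths]
    if outdir:
        args += ["-o", outdir]
    if name:
        args += ["-n", name]
    for module in modules:
        args += ["--module", module]
    for pattern in ignore:
        args += ["--ignore", pattern]
    if config:
        args += ["--config", config]
    return args


args = build_multiqc_args(
    ["qc/fastqc", "results/star"],
    outdir="reports",
    name="project_qc_report",
    modules=["fastqc", "star"],
    ignore=["tmp/"],
)
print(shlex.join(args))
```

Keeping the arguments as a list avoids shell quoting problems; the list can be handed to `subprocess.run(args, check=True)` once MultiQC is installed.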
Common Recipes
Recipe: Add MultiQC to a Snakemake Pipeline
    # In Snakefile: collect all QC outputs, then run MultiQC
    # The default report and data names match the declared outputs (a custom -n
    # would also rename the data directory); --force overwrites on re-runs
    rule multiqc:
        input:
            expand("qc/fastqc/{sample}_fastqc.zip", sample=SAMPLES),
            expand("qc/flagstat/{sample}.flagstat", sample=SAMPLES)
        output:
            html="reports/multiqc_report.html",
            data=directory("reports/multiqc_data")
        shell:
            "multiqc qc/ -o reports/ --force"
Recipe: Parse MultiQC Output in Python
    import json
    import pandas as pd

    # Load general stats from JSON export
    with open("reports/multiqc_data/multiqc_general_stats.json") as f:
        stats = json.load(f)
    df = pd.DataFrame(stats).T
    print(f"Samples: {len(df)}")
    print(f"Metrics: {list(df.columns[:5])}")
    # Flag samples with low mapping rate
    col = "STAR_mqc-generalstats-star-uniquely_mapped_percent"
    if col in df.columns:
        low_mapping = df[df[col] < 70]
        print(f"Samples with <70% mapping: {list(low_mapping.index)}")
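A fixed cut-off such as 70% does not suit every dataset. As one possible alternative, the sketch below flags samples that sit far from the group median, measured in median absolute deviations (MADs). The threshold of 3 MADs, the sample values, and `flag_outliers` are illustrative assumptions, not a MultiQC feature.

```python
import statistics


def flag_outliers(values, n_mads=3.0):
    """Return names whose value is more than n_mads MADs away from the median."""
    median = statistics.median(values.values())
    mad = statistics.median(abs(v - median) for v in values.values())
    if mad == 0:
        return []  # no spread around the median: skip flagging
    return sorted(name for name, v in values.items()
                  if abs(v - median) / mad > n_mads)


# Invented uniquely-mapped percentages for four samples.
mapping_pct = {"ctrl_rep1": 91.0, "ctrl_rep2": 90.0,
               "treat_rep1": 92.0, "treat_rep2": 55.0}
outliers = flag_outliers(mapping_pct)
print("Outlier samples:", outliers)
```

A median-based rule is used here because a single failed sample would distort a mean and standard deviation computed over a small cohort.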
Recipe: Compare QC Before and After Trimming
    # Run FastQC on raw and trimmed reads, then combine in one report
    mkdir -p qc/{raw,trimmed}
    fastqc data/*.fastq.gz -o qc/raw/ -t 8
    trim_galore --paired -o trimmed/ data/*.fastq.gz
    # Paired-end Trim Galore output is named *_val_1.fq.gz and *_val_2.fq.gz
    fastqc trimmed/*_val_*.fq.gz -o qc/trimmed/ -t 8
    multiqc qc/raw/ qc/trimmed/ \
        -o reports/ -n raw_vs_trimmed \
        --dirs --dirs-depth 1  # use directory names in sample labels
Expected Outputs
| Output | Format | Description |
|---|---|---|
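| `multiqc_report.html` | HTML | Interactive report with all plots and tables |
| `multiqc_data/multiqc_general_stats.txt` | TSV | Per-sample summary statistics (all tools) |
| `multiqc_data/multiqc_<tool>.txt` | TSV | Per-tool detailed statistics tables |
| `multiqc_data/multiqc_data.json` | JSON | Full data (if `--data-format json`) |
| `multiqc_data/multiqc_sources.txt` | TSV | Mapping of source files to samples |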
Troubleshooting
| Problem | Cause | Solution |
|---|---|---|
| Empty report (no modules found) | QC files not in scanned directories | Specify directories explicitly: `multiqc qc/fastqc/ results/star/` |
| Wrong sample names in report | File extensions or paths not cleaned | Add `fn_clean_exts` to config or use `--replace-names` |
| Module missing from report | Log file format changed in tool version | Update MultiQC: `pip install --upgrade multiqc`; check GitHub issues |
| Duplicate sample names | Multiple files map to same sample name | Use `--dirs` or fix `fn_clean_exts` in config |
| Report very slow to open | Too many samples (>500) in one report | Split by project or condition; use `--flat` for simpler rendering |
| FastQC data not parsed | FastQC ZIP not in expected location | Run MultiQC from root of project; ensure `*_fastqc.zip` files exist |
| Import error for an optional module | Missing optional module dependencies | Reinstall MultiQC with all optional extras |
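For the empty-report case, a quick first check is whether the scanned tree contains any files that look like QC output at all. The sketch below counts files by name pattern; the patterns come from the file names used in this guide and are assumptions, since MultiQC itself recognises many files by their content rather than their name. `count_candidate_files` is a hypothetical helper.

```python
import tempfile
from pathlib import Path

# Patterns taken from the file names used in this guide; real naming may differ.
CANDIDATE_PATTERNS = {
    "fastqc": "*_fastqc.zip",
    "flagstat": "*.flagstat",
    "star": "Log.final.out",
}


def count_candidate_files(root):
    """Count files under root (recursively) that match each pattern."""
    root = Path(root)
    return {tool: sum(1 for _ in root.rglob(pattern))
            for tool, pattern in CANDIDATE_PATTERNS.items()}


# Demo on a throwaway directory tree with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "qc" / "fastqc").mkdir(parents=True)
    (base / "qc" / "fastqc" / "ctrl_rep1_fastqc.zip").touch()
    (base / "qc" / "ctrl_rep1.flagstat").touch()
    counts = count_candidate_files(base)
print(counts)
```

A count of zero for a tool that should have run suggests that MultiQC is being pointed at the wrong directory, or that the upstream step wrote its output somewhere else.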
References
- MultiQC documentation — official user guide and module list
- GitHub: ewels/MultiQC — source code and community modules
- Ewels et al. (2016) "MultiQC: summarize analysis results for multiple tools and samples in a single report" — Bioinformatics 32(19)
- MultiQC supported tools — full list of 150+ supported modules