SciAgent-Skills multiqc-qc-reports

Aggregates QC outputs from 150+ bioinformatics tools into a single interactive HTML report. Scans directories for FastQC, samtools, STAR, HISAT2, Trim Galore, featureCounts, Kallisto, Salmon, Picard, and GATK logs; merges statistics across samples with interactive plots. Essential for NGS pipeline QC review. Use FastQC directly instead for single-sample initial assessment; MultiQC is for multi-sample pipeline-wide reporting.

install
source · Clone the upstream repo
git clone https://github.com/jaechang-hits/SciAgent-Skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/jaechang-hits/SciAgent-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/genomics-bioinformatics/multiqc-qc-reports" ~/.claude/skills/jaechang-hits-sciagent-skills-multiqc-qc-reports && rm -rf "$T"
manifest: skills/genomics-bioinformatics/multiqc-qc-reports/SKILL.md
source content

MultiQC — Multi-Sample QC Report Aggregator

Overview

MultiQC automatically searches directories for QC log files from 150+ bioinformatics tools and aggregates statistics across all samples into a single interactive HTML report. It parses outputs from FastQC, samtools flagstat, STAR, HISAT2, Trim Galore, Salmon, Kallisto, featureCounts, Picard, GATK, and many more — eliminating the need to manually review per-sample QC files. Reports include interactive bar plots, scatter plots, heatmaps, and tables with configurable warnings and pass/fail thresholds.

When to Use

  • Reviewing QC metrics across 10+ samples at once after FastQC, alignment, or quantification
  • Final QC checkpoint before differential expression or variant analysis
  • Sharing QC summaries with collaborators or including in publications
  • Identifying batch effects, outlier samples, or failed sequencing runs
  • Combining QC from multi-step pipelines (trimming → alignment → quantification) into one view
  • Use FastQC directly instead for initial single-sample QC exploration
  • For custom QC metrics not from standard tools, use Python/R directly; MultiQC parses tool outputs only

Prerequisites

  • Python packages:
    multiqc
  • Input requirements: Output files from bioinformatics tools (FastQC
    .zip
    , samtools
    .flagstat
    , STAR
    Log.final.out
    , etc.) — MultiQC finds them automatically
  • Environment: Python 3.8+

Check before installing: The tool may already be available in the current environment (e.g., inside a

pixi
/
conda
env). Run
command -v multiqc
first and skip the install commands below if it returns a path. When running inside a pixi project, invoke the tool via
pixi run multiqc
rather than bare
multiqc
.

pip install multiqc

# Verify
multiqc --version
# MultiQC v1.25.0

# With conda (recommended for bioinformatics)
conda install -c bioconda multiqc

Workflow

Step 1: Generate Tool-Specific QC Files

MultiQC aggregates existing output — first run your QC tools.

# FastQC on all FASTQ files
mkdir -p qc/fastqc
fastqc data/*.fastq.gz -o qc/fastqc/ -t 8

# samtools flagstat on all BAM files
for bam in results/*.bam; do
    samtools flagstat $bam > qc/$(basename $bam .bam).flagstat
done
echo "QC files generated: $(ls qc/ | wc -l)"

Step 2: Run MultiQC on a Directory

MultiQC recursively scans for recognized QC files.

# Basic run: scan current directory recursively
multiqc .

# Specify output directory and report name
multiqc . -o reports/ -n project_qc_report

# Scan specific subdirectories only
multiqc qc/fastqc/ results/star/ logs/trimming/ -o reports/

# Output: reports/project_qc_report.html
echo "Report: reports/project_qc_report.html"

Step 3: Configure Report Behavior

Use

multiqc_config.yaml
to set custom thresholds, sample naming, and module order.

# multiqc_config.yaml — place in working directory
title: "RNA-seq QC Report — Project X"
subtitle: "Analysis date: 2026-02"
intro_text: "Quality control summary for all 48 samples."

# Sample name cleaning: remove path prefixes and suffixes
fn_clean_exts:
  - ".fastq.gz"
  - "_R1"
  - ".sorted"

# Thresholds for pass/warn/fail coloring
general_stats_addcols:
  FastQC:
    pct_duplication:
      max: 40
      warn: 30

# Module run order
module_order:
  - fastqc
  - trimgalore
  - star
  - featurecounts
  - samtools
# Run with config file
multiqc . --config multiqc_config.yaml -o reports/

Step 4: Use MultiQC Modules and Filters

Control which tools and samples are included.

# Run only specific modules
multiqc . --module fastqc --module samtools

# Exclude specific modules
multiqc . --exclude fastqc

# Include only files matching a pattern
multiqc . --filename "*.flagstat" --filename "*_fastqc.zip"

# Ignore specific directories or files
multiqc . --ignore "tmp/" --ignore "*.bam"

# Add sample name regex substitution
multiqc . --replace-names "sample_" ""

Step 5: Export Data for Downstream Analysis

Extract machine-readable statistics from the MultiQC report.

# Export data tables (CSV, JSON, YAML, TSV)
multiqc . -o reports/ --data-format json
# Generates: reports/multiqc_data/multiqc_data.json

# Export flat CSV tables per tool
multiqc . -o reports/ --export
ls reports/multiqc_data/
# multiqc_fastqc.txt, multiqc_samtools_stats.txt, ...

# Extract general stats as pandas DataFrame
python3 - << 'EOF'
import json
import pandas as pd
with open("reports/multiqc_data/multiqc_general_stats.json") as f:
    data = json.load(f)
df = pd.DataFrame(data).T
print(df.head())
print(f"Shape: {df.shape}")
EOF

Step 6: Automate in Pipeline Scripts

Integrate MultiQC as the final step of any QC pipeline.

#!/bin/bash
# Complete RNA-seq QC pipeline → MultiQC summary
SAMPLES=(ctrl_rep1 ctrl_rep2 treat_rep1 treat_rep2)
OUTDIR="pipeline_output"
mkdir -p $OUTDIR/{fastqc,star,featurecounts,flagstat}

for sample in "${SAMPLES[@]}"; do
    # FastQC
    fastqc data/${sample}.fastq.gz -o $OUTDIR/fastqc/ -t 4
    # STAR alignment
    STAR --runThreadN 8 --genomeDir refs/star_index \
         --readFilesIn data/${sample}.fastq.gz \
         --outSAMtype BAM SortedByCoordinate \
         --outFileNamePrefix $OUTDIR/star/${sample}/
    # samtools flagstat
    samtools flagstat $OUTDIR/star/${sample}/Aligned.sortedByCoord.out.bam \
        > $OUTDIR/flagstat/${sample}.flagstat
done

# Final MultiQC report
multiqc $OUTDIR/ -o $OUTDIR/qc_report/ -n "full_pipeline_qc"
echo "Report ready: $OUTDIR/qc_report/full_pipeline_qc.html"

Key Parameters

ParameterDefaultRange/OptionsEffect
-o, --outdir
.
directory pathOutput directory for report and data
-n, --filename
multiqc_report
any stringReport filename (without extension)
-m, --module
alltool nameRun only specified module(s)
--ignore
glob patternIgnore matching files or directories
--export
FalseflagExport flat tab-delimited data files
--data-format
tsv
tsv
,
json
,
yaml
Format for exported data files
--config
auto-detectedYAML file pathCustom config file with thresholds and naming
--replace-names
regex, replacementClean sample names in report
--fn_clean_exts
(built-in)list in configFile extensions to strip from sample names
--profile-runtime
FalseflagShow per-module runtime profiling

Common Recipes

Recipe: Add MultiQC to a Snakemake Pipeline

# In Snakefile: collect all QC outputs, then run MultiQC
rule multiqc:
    input:
        expand("qc/fastqc/{sample}_fastqc.zip", sample=SAMPLES),
        expand("qc/flagstat/{sample}.flagstat", sample=SAMPLES)
    output:
        html="reports/multiqc_report.html",
        data=directory("reports/multiqc_data")
    shell:
        "multiqc qc/ -o reports/ -n multiqc_report"

Recipe: Parse MultiQC Output in Python

import json
import pandas as pd

# Load general stats from JSON export
with open("reports/multiqc_data/multiqc_general_stats.json") as f:
    stats = json.load(f)

df = pd.DataFrame(stats).T
print(f"Samples: {len(df)}")
print(f"Metrics: {list(df.columns[:5])}")

# Flag samples with low mapping rate
if "STAR_mqc-generalstats-star-uniquely_mapped_percent" in df.columns:
    low_mapping = df[df["STAR_mqc-generalstats-star-uniquely_mapped_percent"] < 70]
    print(f"Samples with <70% mapping: {list(low_mapping.index)}")

Recipe: Compare QC Before and After Trimming

# Run FastQC on raw and trimmed reads, then combine in one report
mkdir -p qc/{raw,trimmed}

fastqc data/*.fastq.gz -o qc/raw/ -t 8
trim_galore data/*.fastq.gz --paired -o trimmed/
fastqc trimmed/*_trimmed.fastq.gz -o qc/trimmed/ -t 8

multiqc qc/raw/ qc/trimmed/ \
    -o reports/ -n raw_vs_trimmed \
    --dirs --dirs-depth 1  # use directory names in sample labels

Expected Outputs

OutputFormatDescription
multiqc_report.html
HTMLInteractive report with all plots and tables
multiqc_data/multiqc_general_stats.txt
TSVPer-sample summary statistics (all tools)
multiqc_data/multiqc_*.txt
TSVPer-tool detailed statistics tables
multiqc_data/multiqc_data.json
JSONFull data (if
--data-format json
)
multiqc_data/multiqc_sources.txt
TSVMapping of source files to samples

Troubleshooting

ProblemCauseSolution
Empty report (no modules found)QC files not in scanned directoriesSpecify directories explicitly:
multiqc qc/ logs/ results/
Wrong sample names in reportFile extensions or paths not cleanedAdd
fn_clean_exts
to config or use
--replace-names
Module missing from reportLog file format changed in tool versionUpdate MultiQC:
pip install --upgrade multiqc
; check GitHub issues
Duplicate sample namesMultiple files map to same sample nameUse
--sample-names
or fix
fn_clean_exts
in config
Report very slow to openToo many samples (>500) in one reportSplit by project or condition; use
--flat
for simpler rendering
FastQC data not parsedFastQC ZIP not in expected locationRun MultiQC from root of project; ensure
*_fastqc.zip
files exist
ModuleNotFoundError
Missing optional module dependencies
pip install multiqc[all]
for all extras

References