Awesome-omni-skill bio-read-qc-quality-reports

Generate and interpret quality reports from FASTQ files using FastQC and MultiQC. Assess per-base quality, adapter content, GC bias, duplication levels, and overrepresented sequences. Use when performing initial QC on raw sequencing data or validating preprocessing results.

install
Clone the upstream repo:
git clone https://github.com/diegosouzapw/awesome-omni-skill
Install into ~/.claude/skills/ for Claude Code:
T=$(mktemp -d) && git clone --depth=1 https://github.com/diegosouzapw/awesome-omni-skill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/development/bio-read-qc-quality-reports" ~/.claude/skills/diegosouzapw-awesome-omni-skill-bio-read-qc-quality-reports-46bbda && rm -rf "$T"
manifest: skills/development/bio-read-qc-quality-reports/SKILL.md

Quality Reports

Generate quality reports for FASTQ files using FastQC and aggregate multiple reports with MultiQC.

FastQC - Single Sample Reports

Basic Usage

# Single file
fastqc sample.fastq.gz

# Multiple files
fastqc *.fastq.gz

# Specify output directory (FastQC does not create it, so make it first)
mkdir -p qc_reports
fastqc -o qc_reports/ sample_R1.fastq.gz sample_R2.fastq.gz

# Process up to 4 files simultaneously (-t/--threads; one thread per file)
fastqc -t 4 *.fastq.gz

Output Files

FastQC produces two files per input (shown here for sample.fastq.gz):

  • sample_fastqc.html
    - HTML report, viewable in any browser
  • sample_fastqc.zip
    - Raw data (fastqc_data.txt, summary.txt) and plot images

Key Modules

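| Module | What it shows | Warning signs |
|---|---|---|
| Per base sequence quality | Quality score distribution at each read position | Median below Q25 (warn) or Q20 (fail), usually toward the 3' end |
| Per sequence quality scores | Distribution of mean quality per read | A secondary peak of low-quality reads |
| Per base sequence content | A/C/G/T proportions at each position | Imbalance beyond the first ~12 bases (bias at the start is normal for random-primed libraries) |
| Per sequence GC content | Distribution of GC content per read | Secondary peak (possible contamination) |
| Per base N content | Proportion of uncalled (N) bases per position | N content above 5% at any position |
| Sequence length distribution | Read lengths | Unexpected variation (variable lengths are expected after trimming) |
| Sequence duplication levels | Proportion of duplicated sequences | High duplication (PCR over-amplification, low complexity) |
| Overrepresented sequences | Sequences making up more than 0.1% of reads | Adapter, primer, or other contaminant hits |
| Adapter content | Cumulative percentage of reads containing each adapter | Adapter present in more than 5% (warn) or 10% (fail) of reads |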
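FastQC's default call for Per base sequence quality is driven by the lower quartile and median at each position: it warns if any lower quartile is below 10 or any median below 25, and fails if any lower quartile is below 5 or any median below 20. A minimal sketch of that rule, assuming you already have (lower quartile, median) pairs per position, for example from the Per base sequence quality section of fastqc_data.txt; the function name is hypothetical.

```python
from typing import Iterable, Tuple

# FastQC's documented default limits for "Per base sequence quality"
LOWER_QUARTILE_WARN, LOWER_QUARTILE_FAIL = 10, 5
MEDIAN_WARN, MEDIAN_FAIL = 25, 20


def per_base_quality_status(positions: Iterable[Tuple[float, float]]) -> str:
    """Classify (lower_quartile, median) pairs, one per read position.

    Returns 'pass', 'warn' or 'fail'; the worst position decides the call.
    """
    status = "pass"
    for lower_quartile, median in positions:
        if lower_quartile < LOWER_QUARTILE_FAIL or median < MEDIAN_FAIL:
            return "fail"  # a single failing position fails the module
        if lower_quartile < LOWER_QUARTILE_WARN or median < MEDIAN_WARN:
            status = "warn"
    return status


# Good early cycles with a weaker 3' end: the median of 24 triggers a warning
print(per_base_quality_status([(32, 36), (30, 35), (12, 24)]))  # warn
```

Returning early on the first failing position mirrors the "worst position wins" behaviour without scanning the rest of the read.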

Extract Data from ZIP

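# Unzip to access raw data
unzip -q sample_fastqc.zip

# View the PASS/WARN/FAIL status of each module
cat sample_fastqc/summary.txt

# Print the whole Per base sequence quality section (header line to >>END_MODULE)
sed -n '/^>>Per base sequence quality/,/^>>END_MODULE/p' sample_fastqc/fastqc_data.txt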
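The same files can be read without unzipping. In fastqc_data.txt each module starts with a `>>Module name<TAB>status` line and ends with `>>END_MODULE`, while summary.txt holds one `STATUS<TAB>Module<TAB>filename` line per module. A minimal Python sketch using only the standard library, assuming that layout; the function name is hypothetical.

```python
import zipfile
from pathlib import PurePosixPath
from typing import Dict, List, Tuple


def read_fastqc_zip(zip_path: str) -> Tuple[Dict[str, str], Dict[str, List[List[str]]]]:
    """Parse summary.txt and fastqc_data.txt straight from a *_fastqc.zip.

    Returns (summary, modules):
      summary: module name -> 'PASS' / 'WARN' / 'FAIL' (from summary.txt)
      modules: module name -> data rows split on tabs (from fastqc_data.txt);
               header lines starting with '#' are skipped.
    """
    with zipfile.ZipFile(zip_path) as zf:
        # Members live under a folder such as sample_fastqc/; index them by
        # file name so the folder name does not have to be known in advance.
        members = {PurePosixPath(name).name: name for name in zf.namelist()}
        summary_text = zf.read(members["summary.txt"]).decode("utf-8")
        data_text = zf.read(members["fastqc_data.txt"]).decode("utf-8")

    summary = {}
    for line in summary_text.splitlines():
        if line.strip():
            status, module = line.split("\t")[:2]
            summary[module] = status

    modules: Dict[str, List[List[str]]] = {}
    current = None
    for line in data_text.splitlines():
        if line.startswith(">>END_MODULE"):
            current = None
        elif line.startswith(">>"):
            # Module header, e.g. ">>Per base sequence quality<TAB>pass"
            current = line[2:].split("\t")[0]
            modules[current] = []
        elif current is not None and line and not line.startswith("#"):
            modules[current].append(line.split("\t"))
    return summary, modules


# Usage:
# summary, modules = read_fastqc_zip("sample_fastqc.zip")
# print(summary["Per base sequence quality"])
# for row in modules["Per base sequence quality"][:5]:
#     print(row)  # Base, Mean, Median, Lower Quartile, Upper Quartile, 10th, 90th
```

Indexing members by file name avoids hard-coding the `sample_fastqc/` folder, which changes with every input.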

MultiQC - Aggregate Reports

Basic Usage

# Aggregate all supported reports (e.g. FastQC) found under the current directory
multiqc .

# Specify input and output
multiqc qc_reports/ -o multiqc_output/

# Custom report name
multiqc . -n my_project_qc

# Force overwrite
multiqc . -f

Common Options

# Static (flat) image plots instead of interactive ones
multiqc --flat .

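# Also save plots as static images (in multiqc_plots/)
multiqc . --export

# Data tables are written as TSV by default; pick another format with -k/--data-format
multiqc . -k json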

# Only specific modules
multiqc . -m fastqc

# Skip files and directories whose paths match a glob
multiqc . --ignore '*_trimmed*'

# Exclude samples whose names match a glob
multiqc . --ignore-samples '*negative*'

Output Files

  • multiqc_report.html
    - Interactive HTML report
  • multiqc_data/
    - Directory with data tables
    • multiqc_fastqc.txt
      - FastQC metrics
    • multiqc_general_stats.txt
      - Summary statistics
    • multiqc_sources.txt
      - Source files used

Extract Data Programmatically

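import pandas as pd

# MultiQC writes tab-separated tables into multiqc_data/
general_stats = pd.read_csv('multiqc_data/multiqc_general_stats.txt', sep='\t')
print(general_stats.columns.tolist())  # column names include the module that produced them

# One row per FastQC report; values such as 'Total Sequences' and '%GC' come from Basic Statistics
fastqc_data = pd.read_csv('multiqc_data/multiqc_fastqc.txt', sep='\t')
print(fastqc_data[['Sample', 'Total Sequences', '%GC']])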
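multiqc_fastqc.txt also records each FastQC module's pass/warn/fail status per sample. A minimal sketch that lists, for each sample, the columns whose value is `fail`, using only the standard library `csv` module instead of pandas. It finds status columns by value rather than by name and assumes the sample column is called `Sample`; the function name is hypothetical.

```python
import csv
from typing import Dict, List, TextIO


def failing_modules(handle: TextIO) -> Dict[str, List[str]]:
    """Map each sample to the columns of a MultiQC table whose value is 'fail'.

    Status columns are found by their values rather than by name, so the
    sketch does not depend on how a given MultiQC version names them.
    Assumes the sample name column is called 'Sample'.
    """
    failures = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        failed = [column for column, value in row.items() if value == "fail"]
        if failed:
            failures[row["Sample"]] = failed
    return failures


# Usage:
# with open("multiqc_data/multiqc_fastqc.txt", newline="") as fh:
#     for sample, columns in failing_modules(fh).items():
#         print(sample, ", ".join(columns))
```

Samples with no failing modules are left out of the result, so an empty dict means every module passed or only warned.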

Batch Processing

Process Multiple Samples

# Up to 8 files processed in parallel; the output directory must exist
mkdir -p qc_reports
fastqc -t 8 -o qc_reports/ raw_data/*.fastq.gz

# Then aggregate
multiqc qc_reports/ -o multiqc_output/

Before and After Trimming

# Create separate directories
mkdir -p qc_reports/raw qc_reports/trimmed

# QC raw reads
fastqc -o qc_reports/raw/ raw_data/*.fastq.gz

# After trimming (using fastp, cutadapt, etc.)
fastqc -o qc_reports/trimmed/ trimmed_data/*.fastq.gz

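# Compare with MultiQC; -d prepends the directory name (raw, trimmed) to sample names
# so raw and trimmed files with identical names are not merged
multiqc -d qc_reports/ -o qc_comparison/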

Interpretation Guide

Quality Scores

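| Phred score | Error rate | Base-call accuracy | Interpretation |
|---|---|---|---|
| Q40 | 0.0001 (1 in 10,000) | 99.99% | Excellent |
| Q30 | 0.001 (1 in 1,000) | 99.9% | Good (Illumina target) |
| Q20 | 0.01 (1 in 100) | 99% | Acceptable |
| Q10 | 0.1 (1 in 10) | 90% | Poor |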
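The rows follow the Phred definition Q = -10 log10(P), where P is the probability that a base call is wrong. A small sketch converting in both directions and decoding a FASTQ quality line, assuming Phred+33 encoding (used by current Illumina pipelines); the function names are illustrative.

```python
import math
from typing import List


def phred_to_error(q: float) -> float:
    """Error probability for a Phred score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Phred score for an error probability: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


def decode_qualities(qual: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality line (Phred+33 by default) into integer scores."""
    return [ord(ch) - offset for ch in qual]


for q in (40, 30, 20, 10):
    print(f"Q{q}: error rate {phred_to_error(q):g}")
print(decode_qualities("II5+"))  # [40, 40, 20, 10]
```

Each 10-point step in Q is a tenfold change in error rate, which is why Q30 and Q20 differ so much in practice.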

Common Issues

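| Issue | Likely cause | Action |
|---|---|---|
| Low quality at 3' end | Expected signal decay over later sequencing cycles | Quality-trim the 3' end |
| Adapter contamination | Inserts shorter than the read length | Trim adapters |
| GC bias | Library prep (e.g. PCR), contamination | Check for contamination; consider GC-bias correction downstream |
| High duplication | Low library complexity, PCR over-amplification | Mark/remove duplicates (some duplication is expected, e.g. highly expressed genes in RNA-seq) |
| Overrepresented sequences | Adapters, primers, contaminants | Check FastQC's "Possible Source" column; BLAST sequences with no match |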
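For low quality at the 3' end, trimming is normally left to a dedicated tool (see adapter-trimming and fastp-workflow under Related Skills). To show what such trimming does, here is a sketch of the BWA quality-trimming algorithm, which cutadapt also uses for its quality cutoff: walk in from the 3' end adding (cutoff - quality), stop once the running sum turns negative, and cut where the sum was largest. Phred+33 input is assumed and the function name is hypothetical; this illustrates the idea and does not replace those tools.

```python
def quality_trim_3prime(qual: str, cutoff: int = 20, offset: int = 33) -> int:
    """Return how many leading bases to keep after BWA-style 3' quality trimming."""
    running = 0
    best = 0
    keep = len(qual)
    for i in range(len(qual) - 1, -1, -1):
        running += cutoff - (ord(qual[i]) - offset)
        if running < 0:
            break  # good-quality bases outweigh the tail seen so far; stop scanning
        if running > best:
            best, keep = running, i
    return keep


seq, qual = "ACGTACGTAC", "IIIIIII+&#"  # last three bases are Q10, Q5, Q2
n = quality_trim_3prime(qual, cutoff=20)
print(seq[:n], qual[:n])  # ACGTACG IIIIIII
```

Using a running sum rather than cutting at the first low base lets a single poor base inside a good stretch survive.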

Configuration

Custom Adapters

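FastQC ships its default adapter_list.txt in the Configuration folder of the FastQC installation. To search for your own adapters, write a file with one adapter per line as a name and a sequence separated by a tab (lines starting with # are ignored; start from a copy of the default file to keep the built-in adapters), then pass it with --adapters:

Custom_Adapter_Name	ACGTACGTACGT

fastqc --adapters adapter_list.txt sample.fastq.gz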
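FastQC searches reads for each adapter in the list, and the Adapter content module plots, for each position, the cumulative percentage of reads in which that adapter has been seen at or before that position (once seen, a read counts through to its end). A simplified sketch of that calculation for one adapter, assuming exact substring matches and using the longest read as the plot length; FastQC's own matching and handling of shorter reads may differ, and the function name is hypothetical.

```python
from typing import List, Sequence


def adapter_content(reads: Sequence[str], adapter: str) -> List[float]:
    """Cumulative % of reads containing `adapter` at or before each position."""
    if not reads:
        return []
    max_len = max(len(r) for r in reads)
    first_hit = [0] * max_len  # number of reads whose first match starts here
    for read in reads:
        pos = read.find(adapter)  # exact match only (simplification)
        if pos != -1:
            first_hit[pos] += 1

    percentages, seen = [], 0
    for count in first_hit:
        seen += count  # cumulative: a read stays counted after its first hit
        percentages.append(100 * seen / len(reads))
    return percentages


reads = ["AAAGATC", "GATCAAA", "CCCCCCC", "TTTTTTT"]
print(adapter_content(reads, "GATC"))  # [25.0, 25.0, 25.0, 50.0, 50.0, 50.0, 50.0]
```

Because the curve only rises, adapter read-through from short inserts shows up as a climb toward the 3' end.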

Custom Limits

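Copy the default limits.txt from the FastQC Configuration folder (a custom limits file must mirror its tab-separated format), edit the thresholds, and pass it with --limits:

# Per sequence quality scores: warn if the most common mean read quality is below 25, fail below 20
quality_sequence	warn	25
quality_sequence	error	20

fastqc --limits limits.txt sample.fastq.gz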
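The quality_sequence limits are compared with the most frequently observed mean quality across reads (FastQC's defaults are 27 for a warning and 20 for an error). A sketch of that statistic, assuming Phred+33 qualities and truncating each read's mean to an integer (an assumption about FastQC's binning); the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable


def most_common_mean_quality(quality_strings: Iterable[str], offset: int = 33) -> int:
    """Most frequent per-read mean Phred score, each mean truncated to an integer."""
    counts = Counter()
    for qual in quality_strings:
        if qual:  # skip empty quality lines
            scores = [ord(ch) - offset for ch in qual]
            counts[sum(scores) // len(scores)] += 1
    if not counts:
        raise ValueError("no quality strings given")
    return counts.most_common(1)[0][0]


def quality_sequence_status(quality_strings: Iterable[str], warn: int = 27, error: int = 20) -> str:
    """Compare the modal mean quality with warn/error limits."""
    mode = most_common_mean_quality(quality_strings)
    if mode < error:
        return "fail"
    if mode < warn:
        return "warn"
    return "pass"


# With limits of warn 25 / error 20, a modal mean of 20 only warns
print(quality_sequence_status(["IIII", "5555", "5555"], warn=25, error=20))  # warn
```

Using the mode rather than the overall mean means a small tail of poor reads does not change the call on its own.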

Related Skills

  • adapter-trimming - Remove adapters detected by FastQC
  • fastp-workflow - All-in-one QC and trimming
  • sequence-io/read-sequences - FASTQ file reading/writing