LLMs-Universal-Life-Science-and-Clinical-Skills- fastp-workflow

<!--

install

source · Clone the upstream repo

git clone https://github.com/mdbabumiamssm/LLMs-Universal-Life-Science-and-Clinical-Skills-

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/mdbabumiamssm/LLMs-Universal-Life-Science-and-Clinical-Skills- "$T" && mkdir -p ~/.claude/skills && cp -r "$T/Skills/NGS_QC/read-qc/fastp-workflow" ~/.claude/skills/mdbabumiamssm-llms-universal-life-science-and-clinical-skills-fastp-workflow && rm -rf "$T"

manifest: Skills/NGS_QC/read-qc/fastp-workflow/SKILL.md

source content

name: bio-read-qc-fastp-workflow description: All-in-one read preprocessing with fastp including adapter trimming, quality filtering, deduplication, base correction, and HTML report generation. Use when preprocessing Illumina data and wanting a single fast tool instead of separate Cutadapt, Trimmomatic, and FastQC steps. tool_type: cli primary_tool: fastp measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes. allowed-tools:

read_file
run_shell_command

fastp Workflow

All-in-one preprocessing tool that handles adapter trimming, quality filtering, deduplication, and report generation in a single pass.

Basic Usage

Single-End

fastp -i input.fastq.gz -o output.fastq.gz

Paired-End

fastp -i R1.fastq.gz -I R2.fastq.gz -o R1_clean.fastq.gz -O R2_clean.fastq.gz

With Custom HTML/JSON Reports

fastp -i R1.fq.gz -I R2.fq.gz \
      -o R1_clean.fq.gz -O R2_clean.fq.gz \
      -h sample_report.html \
      -j sample_report.json

Adapter Trimming

fastp auto-detects Illumina adapters by default.

# Auto-detect (default)
fastp -i in.fq -o out.fq

# Specify adapters manually
fastp -i in.fq -o out.fq \
      --adapter_sequence AGATCGGAAGAGCACACGTCTGAACTCCAGTCA

# Paired-end with manual adapters
fastp -i R1.fq -I R2.fq -o R1.out.fq -O R2.out.fq \
      --adapter_sequence AGATCGGAAGAGCACACGTCTGAACTCCAGTCA \
      --adapter_sequence_r2 AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT

# Disable adapter trimming
fastp -i in.fq -o out.fq --disable_adapter_trimming

# Adapter FASTA file
fastp -i in.fq -o out.fq --adapter_fasta adapters.fa

Quality Filtering

# Per-base quality threshold (default Q15)
fastp -i in.fq -o out.fq -q 20

# Mean read quality threshold
fastp -i in.fq -o out.fq -e 25

# Max unqualified bases percent (default 40)
fastp -i in.fq -o out.fq -q 20 --unqualified_percent_limit 30

# Disable quality filtering
fastp -i in.fq -o out.fq --disable_quality_filtering

Quality Trimming

# Sliding window from 3' end (recommended)
fastp -i in.fq -o out.fq \
      --cut_right \
      --cut_right_window_size 4 \
      --cut_right_mean_quality 20

# Sliding window from 5' end
fastp -i in.fq -o out.fq \
      --cut_front \
      --cut_front_window_size 4 \
      --cut_front_mean_quality 20

# Both ends
fastp -i in.fq -o out.fq \
      --cut_front --cut_tail \
      --cut_front_window_size 4 \
      --cut_front_mean_quality 20 \
      --cut_tail_window_size 4 \
      --cut_tail_mean_quality 20

Length Filtering

# Minimum length (default 15)
fastp -i in.fq -o out.fq -l 36

# Maximum length
fastp -i in.fq -o out.fq --length_limit 150

# Required length (discard shorter AND longer)
fastp -i in.fq -o out.fq -l 100 --length_limit 100

Poly-X Trimming

# Trim poly-G (NovaSeq/NextSeq artifacts) - auto-enabled for these platforms
fastp -i in.fq -o out.fq --trim_poly_g

# Disable poly-G trimming
fastp -i in.fq -o out.fq --disable_trim_poly_g

# Trim poly-X (any homopolymer)
fastp -i in.fq -o out.fq --trim_poly_x

# Custom poly-G minimum length (default 10)
fastp -i in.fq -o out.fq --trim_poly_g --poly_g_min_len 5

N Base Handling

# Max N bases (default 5)
fastp -i in.fq -o out.fq -n 3

# Disable N filtering
fastp -i in.fq -o out.fq --n_base_limit 50

Deduplication

# Enable deduplication
fastp -i in.fq -o out.fq --dedup

# Accuracy level (1-6, higher = more memory, default 3)
fastp -i in.fq -o out.fq --dedup --dup_calc_accuracy 4

Base Correction (Paired-End Only)

# Enable overlap-based correction
fastp -i R1.fq -I R2.fq -o R1.out.fq -O R2.out.fq --correction

# Required overlap length (default 30)
fastp -i R1.fq -I R2.fq -o R1.out.fq -O R2.out.fq \
      --correction --overlap_len_require 20

Paired-End Merge

# Merge overlapping paired reads
fastp -i R1.fq -I R2.fq \
      --merge --merged_out merged.fq \
      -o R1_unmerged.fq -O R2_unmerged.fq

UMI Processing

# UMI in read (extract to header)
fastp -i in.fq -o out.fq \
      --umi --umi_loc read1 --umi_len 8

# UMI in separate read
fastp -i R1.fq -I R2.fq -o R1.out.fq -O R2.out.fq \
      --umi --umi_loc index1

# UMI locations: index1, index2, read1, read2, per_index, per_read

Complete Workflow Example

Standard Illumina Pipeline

fastp \
    -i raw_R1.fastq.gz -I raw_R2.fastq.gz \
    -o clean_R1.fastq.gz -O clean_R2.fastq.gz \
    --detect_adapter_for_pe \
    --cut_right --cut_right_window_size 4 --cut_right_mean_quality 20 \
    -q 20 -l 36 \
    --thread 8 \
    -h sample_fastp.html -j sample_fastp.json

NovaSeq/NextSeq Pipeline

fastp \
    -i raw_R1.fastq.gz -I raw_R2.fastq.gz \
    -o clean_R1.fastq.gz -O clean_R2.fastq.gz \
    --detect_adapter_for_pe \
    --trim_poly_g \
    --cut_right --cut_right_window_size 4 --cut_right_mean_quality 20 \
    -q 20 -l 36 \
    --thread 8 \
    -h sample_fastp.html -j sample_fastp.json

RNA-seq Pipeline

fastp \
    -i raw_R1.fastq.gz -I raw_R2.fastq.gz \
    -o clean_R1.fastq.gz -O clean_R2.fastq.gz \
    --detect_adapter_for_pe \
    --cut_right --cut_right_window_size 4 --cut_right_mean_quality 20 \
    -q 20 -l 50 \
    --thread 8 \
    -h sample_fastp.html -j sample_fastp.json

Output Files

File	Description
`*.html`	Interactive HTML report
`*.json`	Machine-readable statistics
Output FASTQ	Processed reads

JSON Report Structure

import json

with open('sample_fastp.json') as f:
    report = json.load(f)

summary = report['summary']
print(f"Total reads: {summary['before_filtering']['total_reads']}")
print(f"Passed reads: {summary['after_filtering']['total_reads']}")
print(f"Q20 rate: {summary['after_filtering']['q20_rate']:.2%}")
print(f"Q30 rate: {summary['after_filtering']['q30_rate']:.2%}")

Performance

# Set threads (default 3)
fastp -i in.fq -o out.fq --thread 8

# Disable HTML report (faster)
fastp -i in.fq -o out.fq --html /dev/null

# Process from stdin
zcat in.fq.gz | fastp --stdin -o out.fq

Related Skills

quality-reports - MultiQC can aggregate fastp JSON reports
adapter-trimming - Cutadapt for complex adapter scenarios
quality-filtering - Trimmomatic alternative