BioSkills bio-workflows-methylation-pipeline
End-to-end bisulfite sequencing workflow from FASTQ to differentially methylated regions. Covers Bismark alignment, methylation calling, and DMR detection with methylKit. Use when analyzing bisulfite sequencing data.
git clone https://github.com/GPTomics/bioSkills
T=$(mktemp -d) && git clone --depth=1 https://github.com/GPTomics/bioSkills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/workflows/methylation-pipeline" ~/.claude/skills/gptomics-bioskills-bio-workflows-methylation-pipeline && rm -rf "$T"
workflows/methylation-pipeline/SKILL.mdVersion Compatibility
Reference examples tested with: Bismark 0.24+, Bowtie2 2.5.3+, FastQC 0.12+, Trim Galore 0.6.10+, fastp 0.23+, methylKit 1.28+
Before using code patterns, verify installed versions match. If versions differ:
- R:
thenpackageVersion('<pkg>')
to verify parameters?function_name - CLI:
then<tool> --version
to confirm flags<tool> --help
If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.
Methylation Pipeline
"Analyze my bisulfite sequencing data from FASTQ to DMRs" → Orchestrate Bismark alignment, methylation calling, methylKit analysis, DMR detection, annotation with genomic features, and visualization of methylation patterns.
Complete workflow from bisulfite sequencing FASTQ to differentially methylated regions.
Workflow Overview
FASTQ files | v [1. QC & Trimming] -----> fastp/Trim Galore | v [2. Alignment] ---------> Bismark | v [3. Deduplication] -----> deduplicate_bismark | v [4. Methylation Calling] -> bismark_methylation_extractor | v [5. Per-CpG Analysis] ---> methylKit (R) or scipy (Python) | v [6. DMR Detection] ------> methylKit/DSS | v Differentially methylated regions
Primary Path: Bismark + methylKit
Step 1: Quality Control
# Trim Galore recommended for bisulfite data (handles adapter bias) trim_galore --paired --fastqc \ -o trimmed/ \ sample_R1.fastq.gz sample_R2.fastq.gz # Or fastp with conservative settings fastp -i sample_R1.fastq.gz -I sample_R2.fastq.gz \ -o trimmed/sample_R1.fq.gz -O trimmed/sample_R2.fq.gz \ --detect_adapter_for_pe \ --qualified_quality_phred 20 \ --length_required 35 \ --html qc/sample_fastp.html
Step 2: Bismark Alignment
# Prepare genome (once) bismark_genome_preparation --bowtie2 genome/ # Align bismark --genome genome/ \ -1 trimmed/sample_R1_val_1.fq.gz \ -2 trimmed/sample_R2_val_2.fq.gz \ -o aligned/ \ --parallel 4 \ --temp_dir tmp/ # Output: sample_R1_val_1_bismark_bt2_pe.bam
QC Checkpoint: Check Bismark report
- Mapping efficiency >50% (BS-seq has lower rates)
- Bisulfite conversion rate >99%
Step 3: Deduplication
deduplicate_bismark \ --bam \ -p \ -o deduplicated/ \ aligned/sample_R1_val_1_bismark_bt2_pe.bam
Step 4: Methylation Calling
bismark_methylation_extractor \ --paired-end \ --comprehensive \ --bedGraph \ --cytosine_report \ --genome_folder genome/ \ -o methylation/ \ deduplicated/sample_R1_val_1_bismark_bt2_pe.deduplicated.bam # Generate summary report bismark2report bismark2summary
Step 5: Analysis with methylKit
library(methylKit) # Read methylation calls files <- list( 'methylation/control_1.CpG_report.txt', 'methylation/control_2.CpG_report.txt', 'methylation/treated_1.CpG_report.txt', 'methylation/treated_2.CpG_report.txt' ) sample_ids <- c('control_1', 'control_2', 'treated_1', 'treated_2') treatment <- c(0, 0, 1, 1) # Read cytosine reports meth_obj <- methRead( location = as.list(files), sample.id = as.list(sample_ids), assembly = 'hg38', treatment = treatment, context = 'CpG', pipeline = 'bismarkCytosineReport' ) # Filter by coverage meth_filtered <- filterByCoverage(meth_obj, lo.count = 10, hi.perc = 99.9) # Normalize coverage meth_norm <- normalizeCoverage(meth_filtered) # Merge samples (keep sites covered in all) meth_merged <- unite(meth_norm, destrand = TRUE) # Sample statistics getMethylationStats(meth_obj[[1]], plot = TRUE) getCoverageStats(meth_obj[[1]], plot = TRUE)
Step 5b: Python Alternative for Per-CpG Testing
When methylKit is unavailable or a Python-only workflow is preferred, per-CpG testing can be performed with scipy and statsmodels on beta values computed from the coverage files.
import pandas as pd from scipy.stats import ttest_ind from statsmodels.stats.multitest import multipletests import numpy as np # Read Bismark coverage files and compute beta values # beta = count_methylated / (count_methylated + count_unmethylated) # Filter CpGs with < 10x coverage in any sample # Run per-CpG Welch's t-test between groups # Apply BH FDR correction: multipletests(pvals, method='fdr_bh') # See methylation-analysis/differential-cpg-testing for full pipeline
This approach is appropriate for large sample sizes (>10 per group). For small sample sizes (3-5 per group), use limma on M-values instead (also covered in the differential-cpg-testing skill).
Step 6: DMR Detection
# Calculate differential methylation (per CpG) diff_meth <- calculateDiffMeth(meth_merged) # Get significant DMCs dmc <- getMethylDiff(diff_meth, difference = 25, qvalue = 0.01) # Tile into regions (DMRs) tiles <- tileMethylCounts(meth_merged, win.size = 1000, step.size = 1000) diff_tiles <- calculateDiffMeth(tiles) dmr <- getMethylDiff(diff_tiles, difference = 25, qvalue = 0.01) # Export write.csv(as.data.frame(dmc), 'dmc_results.csv') write.csv(as.data.frame(dmr), 'dmr_results.csv') # Annotate with genomic features library(genomation) gene_obj <- readTranscriptFeatures('genes.bed') annotateWithGeneParts(as(dmr, 'GRanges'), gene_obj)
Parameter Recommendations
| Step | Parameter | Value |
|---|---|---|
| Trim Galore | default | Recommended for BS-seq |
| Bismark | --parallel | 4 (per sample parallelization) |
| methylKit | lo.count | 10 (minimum coverage) |
| methylKit | difference | 25 (% methylation difference) |
| methylKit | qvalue | 0.01 |
| DMR tiles | win.size | 500-1000 bp |
Troubleshooting
| Issue | Likely Cause | Solution |
|---|---|---|
| Low mapping rate | Normal for BS-seq | Expect 40-70% |
| Low conversion | Failed bisulfite treatment | Check spike-in controls |
| Few DMRs | Low coverage, small differences | Increase sequencing, relax thresholds |
| Biased positions | M-bias | Trim 10bp from read ends |
Complete Pipeline Script
#!/bin/bash set -e THREADS=4 GENOME="genome/" SAMPLES="control_1 control_2 treated_1 treated_2" OUTDIR="methylation_results" mkdir -p ${OUTDIR}/{trimmed,aligned,deduplicated,methylation,qc} # Step 1: QC for sample in $SAMPLES; do trim_galore --paired --fastqc -o ${OUTDIR}/trimmed/ \ ${sample}_R1.fastq.gz ${sample}_R2.fastq.gz done # Step 2: Alignment for sample in $SAMPLES; do bismark --genome ${GENOME} \ -1 ${OUTDIR}/trimmed/${sample}_R1_val_1.fq.gz \ -2 ${OUTDIR}/trimmed/${sample}_R2_val_2.fq.gz \ -o ${OUTDIR}/aligned/ \ --parallel ${THREADS} --temp_dir tmp/ done # Step 3: Deduplication for sample in $SAMPLES; do deduplicate_bismark --bam -p \ -o ${OUTDIR}/deduplicated/ \ ${OUTDIR}/aligned/${sample}_R1_val_1_bismark_bt2_pe.bam done # Step 4: Methylation calling for sample in $SAMPLES; do bismark_methylation_extractor --paired-end --comprehensive \ --bedGraph --cytosine_report \ --genome_folder ${GENOME} \ -o ${OUTDIR}/methylation/ \ ${OUTDIR}/deduplicated/${sample}_R1_val_1_bismark_bt2_pe.deduplicated.bam done bismark2report echo "Pipeline complete. Run R script for DMR analysis."
Related Skills
- methylation-analysis/bismark-alignment - Bismark parameters
- methylation-analysis/methylation-calling - Calling details
- methylation-analysis/methylkit-analysis - methylKit functions
- methylation-analysis/differential-cpg-testing - Per-CpG testing (Python/R alternatives)
- methylation-analysis/dmr-detection - DMR algorithms