Claude-skill-registry methylation-variability-analysis
This skill provides a complete and streamlined workflow for performing methylation variability and epigenetic heterogeneity analysis from whole-genome bisulfite sequencing (WGBS) data. It is designed for researchers who want to quantify CpG-level variability across biological samples or conditions, identify highly variable CpGs (HVCs), and explore epigenetic heterogeneity.
install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/25-methylation-variability" ~/.claude/skills/majiayu000-claude-skill-registry-methylation-variability-analysis && rm -rf "$T"
manifest:
skills/data/25-methylation-variability/SKILL.mdsource content
SKILL: Methylation Variability & Heterogeneity Analysis
Overview
Main steps include:
- Refer to the Inputs & Outputs section to check available inputs and design the output structure.
- Always prompt user for genome assembly used.
- Always prompt user for which columns in the BED files are methylation fraction/percent and coverage and strand.
- Building a multi-sample CpG methylation matrix from WGBS coverage files.
- Computing between-sample variability at CpG level (variance, MAD, CV).
When to use this skill
Use this methylKit-based variability pipeline when you want to:
- Quantify between-sample variability at CpG level (e.g., across replicates, cell types, conditions).
- Identify highly variable CpGs (HVCs) as candidate epigenetically heterogeneous loci.
- Explore epigenetic heterogeneity between groups (e.g., GM12878 vs K562, disease vs control).
Inputs & Outputs
Inputs
<sample1>.bed
<sample2>.bed
Outputs
methylation_variability/ stats/ top_variable_CpGs.tsv CpG_variability_stats.tsv plots/ heatmap_top_variable_CpGs.pdf distribution_CpG_variance.pdf mean_vs_variance_scatter.pdf temp/
Decision Tree
Step 1: Prepare the sample meta data
library(methylKit) file.list <- list( "sample1.cov", "sample2.cov", "sample3.cov" ) sample.id <- list("S1", "S2", "S3") treatment <- c(0, 1, 1) # e.g. 0 = control, 1 = treated # Read methylation data myobj <- methRead( location = file.list, sample.id = sample.id, assembly = "hg38", # provided by user treatment = treatment, context = "CpG", pipeline = list( fraction = FALSE, # percMeth is 0–100, fraction is 0-1, depend on inputs chr.col = 1, start.col = 2, end.col = 3, strand.col = 6, # provided by user coverage.col = 10, # provided by user freqC.col = 11 # provided by user ) ) # Optional filtering: remove low / extremely high coverage CpGs filtered.myobj <- filterByCoverage( myobj, lo.count = 10, lo.perc = NULL, hi.count = 99.9, hi.perc = TRUE ) # Unite CpGs across samples (common CpG sites) meth <- unite(filtered.myobj, destrand = TRUE)
Step 2: Statistical analysis
d <- getData(meth.united) numCs.cols <- grep("numCs", colnames(d), value = TRUE) cov.cols <- grep("coverage", colnames(d), value = TRUE) pmat01 <- d[, numCs.cols] / d[, cov.cols] pmat01 <- as.matrix(data.frame(pmat01)) var.cpg <- rowVars(pmat01, na.rm = TRUE) # Variance across samples mad.cpg <- rowMads(pmat01, na.rm = TRUE) # Median absolute deviation (MAD) # Coefficient of variation (CV = sd / mean) mean.cpg <- rowMeans(pmat01, na.rm = TRUE) sd.cpg <- sqrt(var.cpg) cv.cpg <- sd.cpg / (mean.cpg + 1e-6) # add small constant to avoid division by zero # Assemble statistics table var.stats <- data.frame( chr = d$chr, start = d$start, end = d$end, mean = mean.cpg, variance = var.cpg, MAD = mad.cpg, CV = cv.cpg, stringsAsFactors = FALSE ) var.stats <- var.stats[order(-var.stats$variance), ] # Sort by variance (descending) # Save full table write.table( var.stats, file = "CpG_variability_stats.tsv", sep = " ", quote = FALSE, row.names = FALSE )
Step 3: high variable CpG selection
topN <- 1000 top.idx <- head(order(-var.cpg), topN) pmat.top <- pmat01[top.idx, , drop = FALSE] # Save top-variable CpGs table write.table( var.stats[match(rownames(pmat.top), rownames(var.stats)), ], file = "top_variable_CpGs.tsv", sep = " ", quote = FALSE, row.names = FALSE )
Step 4: Visualization
group.factor <- factor(ifelse(treatment == 0, "GM12878", "K562")) ha <- HeatmapAnnotation(Group = group.factor) Heatmap( pmat.top, name = "methylation", show_row_names = FALSE, show_column_names = TRUE, top_annotation = ha, cluster_rows = TRUE, cluster_columns = TRUE ) # Distribution of the CpG variability var.df <- data.frame( variance = var.cpg, log10_variance = log10(var.cpg + 1e-8) ) ggplot(var.df, aes(x = log10_variance)) + geom_histogram(bins = 50) + theme_minimal() + labs( title = "CpG-wise methylation variance (log10 scale)", x = "log10(variance + 1e-8)", y = "Count of CpGs" ) # 3. Mean vs Variance scatter plot ggplot(var.stats, aes(x = mean_methylation, y = variance)) + geom_hex(bins = 50) + scale_fill_viridis_c(trans = "log10") + theme_minimal() + labs( title = "Mean Methylation vs Variance", x = "Mean Methylation", y = "Variance", fill = "Count (log10)" ) + theme( plot.title = element_text(hjust = 0.5, size = 14, face = "bold") )
Recommended Extensions
- You can change 'lo.count', 'hi.perc', and 'topN' depending on coverage and dataset size.
- If you want group-wise differential variability (e.g., GM12878 vs K562),
- you can apply variance/Bartlett/Levene tests per CpG using 'pmat01' and 'treatment'.
- Add region-level annotation (promoters, gene bodies, CpG islands) using
and TxDb annotations, then compute variability at region level by aggregating CpG variability.GenomicRanges - Implement differential variability tests between groups (e.g., variance comparison between GM12878 and K562).
- Combine this variability pipeline with DMR analysis from methylKit to simultaneously look at mean shifts and heterogeneity.