Claude-skill-registry chromatin-state-inference
This skill should be used when users need to infer chromatin states from histone modification ChIP-seq data using ChromHMM. It provides workflows for chromatin state segmentation, model training, and state annotation.
git clone https://github.com/majiayu000/claude-skill-registry
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/15-chromatin-state-inference" ~/.claude/skills/majiayu000-claude-skill-registry-chromatin-state-inference && rm -rf "$T"
skills/data/15-chromatin-state-inference/SKILL.md
ChromHMM Chromatin State Inference
Overview
This skill enables chromatin state analysis of histone modification ChIP-seq data using ChromHMM. ChromHMM uses a multivariate Hidden Markov Model to segment the genome into discrete chromatin states based on combinatorial patterns of histone modifications.
Main steps include:
- Refer to Inputs & Outputs to verify necessary files.
- Always prompt user if required files are missing.
- Always prompt user for genome assembly used.
- Always prompt user for the bin size for generating binarized files.
- Always prompt user for the number of states the ChromHMM model should target.
- Run ChromHMM workflow: Binarization → Learning.
When to use this skill
Use this skill when you need to infer chromatin states from histone modification ChIP-seq data using ChromHMM.
Inputs & Outputs
Inputs
(1) Option 1: BED files of aligned reads
<mark1>.bed <mark2>.bed ... # Other marks
(1) Option 2: BAM files of aligned reads
<mark1>.bam <mark2>.bam ... # Other marks
Outputs
chromhmm_output/
  binarized/
    *.txt
  model/
    *.txt
    ... # other files output by ChromHMM
Decision Tree
Step 0: Initialize Project
Call:
mcp__project-init-tools__project_init
with:
- sample: all
- task: chromhmm
Step 1: Prepare the cellmarkfile
(skip this step if signal files are provided)
Prepare a tab-delimited .txt file (the cellmarkfile, without header) containing the following columns, of which the first three are required:
- sample name
- marker name
- name of the BED/BAM file
- control file of the sample (optional; only provided if the input/control file is available)
Example of the cellmark.txt file (columns are tab-separated):
cell1 mark1 cell1_mark1.bam cell1_control.bam
cell1 mark2 cell1_mark2.bam cell1_control.bam
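The table layout described above can be generated programmatically. The following is a minimal sketch, assuming a tab-delimited file with three or four columns per row; the helper name `write_cellmark_table` and the file names are illustrative and not part of the skill.

```python
import csv
import io


def write_cellmark_table(rows, handle):
    """Write (cell, mark, file[, control]) rows as a headerless tab-delimited table."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for row in rows:
        # The control column is optional, so a row has 3 or 4 fields.
        if len(row) not in (3, 4):
            raise ValueError(f"expected 3 or 4 columns, got {len(row)}: {row!r}")
        writer.writerow(row)


# Hypothetical sample and file names, mirroring the example table.
rows = [
    ("cell1", "mark1", "cell1_mark1.bam", "cell1_control.bam"),
    ("cell1", "mark2", "cell1_mark2.bam", "cell1_control.bam"),
]

buffer = io.StringIO()
write_cellmark_table(rows, buffer)
cellmark_text = buffer.getvalue()
print(cellmark_text, end="")
```

To write to disk instead, pass a file opened with `open("cellmark.txt", "w", newline="")` in place of the in-memory buffer.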
Step 2: Data Binarization
- For BAM inputs:
  Call mcp__chromhmm-tools__binarize_bam with:
  - path_chrom_sized: Provided by user or detected from the working directory
  - input_dir: Directory containing BAM files
  - cellmarkfile: Cell mark file defining histone modifications
  - output_dir: Output directory (e.g. binarized/)
  - bin_size: Provided by user
- For BED inputs:
  Call mcp__chromhmm-tools__binarize_bed instead.
- For Signal inputs:
  Call mcp__chromhmm-tools__binarize_signal with:
  - input_dir: Directory of signals
  - output_dir: Output directory (e.g. binarized/)
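To make the idea of binarization concrete, here is a simplified sketch: reads are counted per fixed-size bin, and a bin is called present (1) when its count reaches a threshold derived from a Poisson background. This is an illustration under stated assumptions, not ChromHMM's actual implementation: it ignores control data, read shifting, and the partial last bin, and the function names and the tail probability of 1e-4 are choices made for this example.

```python
import math


def poisson_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i  # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def binarize(read_positions, chrom_length, bin_size=200, tail_prob=1e-4):
    """Return (presence calls per bin, count threshold) for one mark on one chromosome."""
    n_bins = chrom_length // bin_size  # assumption: drop the partial last bin
    counts = [0] * n_bins
    for pos in read_positions:
        idx = pos // bin_size
        if idx < n_bins:
            counts[idx] += 1
    lam = sum(counts) / n_bins  # background rate: mean reads per bin
    threshold = 1
    # Smallest count whose Poisson tail probability falls below tail_prob.
    while poisson_tail(threshold, lam) >= tail_prob:
        threshold += 1
    return [1 if c >= threshold else 0 for c in counts], threshold


# Toy data: 12 reads piled into one bin plus 3 scattered background reads.
reads = [10, 1010, 1650] + [600 + 10 * i for i in range(12)]
calls, threshold = binarize(reads, chrom_length=2000, bin_size=200)
print(calls, threshold)
```

Only the bin containing the read pile-up is called present; the isolated background reads stay below the threshold.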
Step 3: Model Learning
Call mcp__chromhmm-tools__learn_model with:
- binarized_dir: Directory where the binarized files are located
- num_states: Provided by user (e.g. 15)
- output_model_dir: Output directory for the model (e.g. model_15_states/)
- genome: Provided by user (e.g. hg38)
- threads: Provided by user (e.g. 16)
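For orientation, the two steps correspond to ChromHMM's `BinarizeBam` and `LearnModel` commands. The sketch below builds those command lines in Python. It assumes the MCP tools wrap the ChromHMM jar, which the skill does not state; the jar path, the `hg38.chrom.sizes` and `bam/` names, and the helper functions are hypothetical.

```python
import shlex

CHROMHMM_JAR = "ChromHMM.jar"  # assumed jar location; adjust to your installation


def binarize_bam_command(chrom_sizes, input_dir, cellmarkfile, output_dir, bin_size=200):
    """Build the argument list for ChromHMM BinarizeBam."""
    return [
        "java", "-jar", CHROMHMM_JAR, "BinarizeBam",
        "-b", str(bin_size),
        chrom_sizes, input_dir, cellmarkfile, output_dir,
    ]


def learn_model_command(binarized_dir, output_model_dir, num_states, genome,
                        threads=1, bin_size=200):
    """Build the argument list for ChromHMM LearnModel."""
    return [
        "java", "-jar", CHROMHMM_JAR, "LearnModel",
        "-b", str(bin_size),   # should match the bin size used for binarization
        "-p", str(threads),    # maximum number of processors
        binarized_dir, output_model_dir, str(num_states), genome,
    ]


binarize_cmd = binarize_bam_command("hg38.chrom.sizes", "bam/", "cellmark.txt", "binarized/")
learn_cmd = learn_model_command("binarized/", "model_15_states/", 15, "hg38", threads=16)
print(shlex.join(binarize_cmd))
print(shlex.join(learn_cmd))
```

Returning argument lists rather than shell strings lets the commands be passed directly to `subprocess.run` without quoting problems.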
Parameter Optimization
Number of States
- 8 states: Basic chromatin states
- 15 states: Standard comprehensive states
- 25 states: High-resolution states
- Optimization: Use Bayesian Information Criterion (BIC)
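One way to apply BIC when comparing state counts is sketched below. It uses the standard definition BIC = k ln(n) - 2 ln(L), where lower is better. The parameter count assumes a K-state HMM with one independent Bernoulli emission per mark, and the log-likelihood values are placeholders rather than real results; in practice they would come from each trained model.

```python
import math


def hmm_free_parameters(num_states, num_marks):
    """Count free parameters of an HMM with independent Bernoulli emissions per mark."""
    emissions = num_states * num_marks            # one probability per state and mark
    transitions = num_states * (num_states - 1)   # each row of the matrix sums to 1
    initial = num_states - 1                      # initial distribution sums to 1
    return emissions + transitions + initial


def bic(log_likelihood, num_states, num_marks, num_bins):
    """Bayesian Information Criterion; lower values indicate a better trade-off."""
    k = hmm_free_parameters(num_states, num_marks)
    return k * math.log(num_bins) - 2.0 * log_likelihood


# Placeholder log-likelihoods keyed by number of states (NOT real results).
log_likelihoods = {8: -1.20e6, 15: -1.15e6, 25: -1.14e6}
num_marks, num_bins = 5, 1_000_000

scores = {k: bic(ll, k, num_marks, num_bins) for k, ll in log_likelihoods.items()}
best = min(scores, key=scores.get)
print(scores, best)
```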
Bin Size
- 200bp: Standard resolution
- 100bp: High resolution (requires more memory)
- 500bp: Low resolution (faster computation)
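The memory and runtime trade-off follows from the number of bins, which grows as the bin size shrinks. A small sketch of that arithmetic, using hypothetical chromosome lengths and assuming the partial bin at each chromosome end is dropped:

```python
def count_bins(chrom_sizes, bin_size):
    """Total number of full bins across all chromosomes."""
    return sum(length // bin_size for length in chrom_sizes.values())


# Hypothetical chromosome lengths in base pairs, for illustration only.
chrom_sizes = {"chrA": 1_000_000, "chrB": 500_500}

bins_by_size = {size: count_bins(chrom_sizes, size) for size in (100, 200, 500)}
print(bins_by_size)
```

Halving the bin size roughly doubles the number of bins, and therefore the size of the binarized input the model has to process.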
State Interpretation
Common Chromatin States
- Active Promoter: H3K4me3, H3K27ac
- Weak Promoter: H3K4me3
- Poised Promoter: H3K4me3, H3K27me3
- Strong Enhancer: H3K27ac, H3K4me1
- Weak Enhancer: H3K4me1
- Insulator: CTCF
- Transcribed: H3K36me3
- Repressed: H3K27me3
- Heterochromatin: Low signal across marks
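The list above can be turned into a simple rule-based labeler for learned states. The sketch below is one possible approach, not part of ChromHMM: it assumes each state is summarized by its per-mark emission probabilities, treats a mark as present above a hypothetical cutoff of 0.5, and checks the more specific mark combinations first.

```python
# Ordered rules: the first rule whose required marks are all present wins.
STATE_RULES = [
    ("Active Promoter", {"H3K4me3", "H3K27ac"}),
    ("Poised Promoter", {"H3K4me3", "H3K27me3"}),
    ("Strong Enhancer", {"H3K27ac", "H3K4me1"}),
    ("Weak Promoter", {"H3K4me3"}),
    ("Weak Enhancer", {"H3K4me1"}),
    ("Insulator", {"CTCF"}),
    ("Transcribed", {"H3K36me3"}),
    ("Repressed", {"H3K27me3"}),
]


def label_state(emissions, cutoff=0.5):
    """Assign a candidate label to one state from its mark emission probabilities."""
    present = {mark for mark, prob in emissions.items() if prob >= cutoff}
    if not present:
        return "Heterochromatin"  # low signal across all marks
    for label, required in STATE_RULES:
        if required <= present:
            return label
    return "Unassigned"  # present marks match no rule; inspect manually


# Hypothetical emission probabilities for three states.
examples = [
    {"H3K4me3": 0.9, "H3K27ac": 0.8, "H3K4me1": 0.1},
    {"H3K27me3": 0.7, "H3K4me3": 0.05},
    {"H3K4me3": 0.02, "H3K27ac": 0.01},
]
labels = [label_state(e) for e in examples]
print(labels)
```

Labels produced this way are starting points; states should still be checked against genomic annotations such as TSS and gene-body enrichment before being named.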
Troubleshooting
- Memory errors: Increase the bin size (fewer bins) or reduce the number of states
- Convergence problems: Increase the maximum number of iterations or adjust the convergence threshold
- Uninterpretable states: Check input data quality and mark combinations
- Missing chromosomes: Verify chromosome naming consistency
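A quick way to check chromosome naming consistency is to compare the names used in an input file against those in the chromosome sizes file. The sketch below assumes whitespace-delimited text with the chromosome name in the first column; the function name and sample lines are illustrative.

```python
def chromosome_mismatches(chrom_sizes_lines, bed_lines):
    """Return chromosome names seen in the BED lines but absent from the sizes file."""
    known = {line.split()[0] for line in chrom_sizes_lines if line.strip()}
    seen = {
        line.split()[0]
        for line in bed_lines
        # Skip blank lines and BED header lines.
        if line.strip() and not line.startswith(("#", "track", "browser"))
    }
    return sorted(seen - known)


# Hypothetical inputs: the sizes file uses "chr" prefixes, one BED record does not.
sizes = ["chr1\t1000", "chr2\t2000"]
bed = ["track name=example", "chr1\t0\t100", "2\t5\t50"]

missing = chromosome_mismatches(sizes, bed)
print(missing)
```

A non-empty result usually points to a prefix mismatch (for example `2` versus `chr2`) or to contigs that are missing from the sizes file.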