Claude-skill-registry chromatin-state-inference

This skill should be used when users need to infer chromatin states from histone modification ChIP-seq data using ChromHMM. It provides workflows for chromatin state segmentation, model training, and state annotation.

install

source · Clone the upstream repo

git clone https://github.com/majiayu000/claude-skill-registry

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/15-chromatin-state-inference" ~/.claude/skills/majiayu000-claude-skill-registry-chromatin-state-inference && rm -rf "$T"

manifest: skills/data/15-chromatin-state-inference/SKILL.md

ChromHMM Chromatin State Inference

Overview

This skill enables comprehensive chromatin state analysis of histone modification ChIP-seq data using ChromHMM. ChromHMM uses a multivariate Hidden Markov Model to segment the genome into discrete chromatin states based on combinatorial patterns of histone modifications.

Main steps include:

Refer to Inputs & Outputs to verify that the necessary files are present.
Always prompt the user if required files are missing.
Always prompt the user for the genome assembly used.
Always prompt the user for the bin size to use when generating binarized files.
Always prompt the user for the number of states the ChromHMM model should target.
Run the ChromHMM workflow: Binarization → Learning.

When to use this skill

Use this skill when you need to infer chromatin states from histone modification ChIP-seq data using ChromHMM.

Inputs & Outputs

Inputs

(1) Option 1: BED files of aligned reads

<mark1>.bed
<mark2>.bed
... # Other marks

(1) Option 2: BAM files of aligned reads

<mark1>.bam
<mark2>.bam
... # Other marks

Outputs

chromhmm_output/
  binarized/
    *.txt 
  model/
    *.txt
    ... # other files output by ChromHMM

Decision Tree

Step 0: Initialize Project

Call `mcp__project-init-tools__project_init` with:

- `sample`: all
- `task`: chromhmm

Step 1: Prepare the cellmarkfile (skip this step if signal files are provided)

Prepare a tab-delimited .txt file (without a header) containing the following columns:
- sample name
- marker name
- name of the BED/BAM file
- control file of the sample (optional; include only if an input/control file is available)

Example of the cellmark.txt file:

cell1    mark1    cell1_mark1.bam    cell1_control.bam
cell1    mark2    cell1_mark2.bam    cell1_control.bam
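The file layout described above can be generated programmatically. The following is a minimal sketch, not part of the skill itself: the helper names `build_cellmark_rows` and `write_cellmark` and the sample file names are hypothetical, and it assumes the tab-delimited layout with an optional fourth control column.

```python
import csv
import io

def build_cellmark_rows(samples):
    """Turn (cell, mark, reads_file, control_or_None) tuples into table rows."""
    rows = []
    for cell, mark, reads_file, control in samples:
        row = [cell, mark, reads_file]
        if control:  # the control column is optional
            row.append(control)
        rows.append(row)
    return rows

def write_cellmark(rows, handle):
    """Write rows as tab-delimited text without a header."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)

# Hypothetical sample sheet: one cell type, two marks, one without a control.
rows = build_cellmark_rows([
    ("cell1", "mark1", "cell1_mark1.bam", "cell1_control.bam"),
    ("cell1", "mark2", "cell1_mark2.bam", None),
])
buf = io.StringIO()
write_cellmark(rows, buf)
cellmark_text = buf.getvalue()
```

To write to disk instead, pass an open file handle (for example `open("cellmark.txt", "w", newline="")`) in place of the `StringIO` buffer.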

Step 2: Data Binarization

For BAM inputs:
Call `mcp__chromhmm-tools__binarize_bam` with:
- `path_chrom_sized`: Provided by the user or detected from the working directory
- `input_dir`: Directory containing BAM files
- `cellmarkfile`: Cell mark file defining histone modifications
- `output_dir`: (e.g. `binarized/`)
- `bin_size`: Provided by the user

For BED inputs:
Call `mcp__chromhmm-tools__binarize_bed` instead.
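
For orientation, the parameters above map naturally onto ChromHMM's own `BinarizeBam` and `BinarizeBed` commands. The sketch below only assembles such a command line and does not run it. Whether the MCP tools invoke ChromHMM this way is an assumption; the function name `binarize_command`, the jar path, and the file names are hypothetical.

```python
def binarize_command(kind, chrom_sizes, input_dir, cellmarkfile, output_dir,
                     bin_size=200, jar="ChromHMM.jar"):
    """Build (but do not run) a ChromHMM binarization command line."""
    subcommands = {"bam": "BinarizeBam", "bed": "BinarizeBed"}
    if kind not in subcommands:
        raise ValueError(f"unsupported input type: {kind!r}")
    return [
        "java", "-jar", jar, subcommands[kind],
        "-b", str(bin_size),  # bin size in base pairs
        chrom_sizes, input_dir, cellmarkfile, output_dir,
    ]

# Hypothetical paths; replace with the user's actual files.
cmd = binarize_command("bam", "hg38.chrom.sizes", "bams/", "cellmark.txt", "binarized/")
```

Returning an argument list rather than a string keeps the command safe to pass to `subprocess.run` without shell quoting.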

For Signal inputs:
Call `mcp__chromhmm-tools__binarize_signal` with:
- `input_dir`: Directory of signal files
- `output_dir`: (e.g. `binarized/`)

Step 3: Model Learning

Call `mcp__chromhmm-tools__learn_model` with:

- `binarized_dir`: Directory containing the binarized files
- `num_states`: Provided by the user (e.g. 15)
- `output_model_dir`: (e.g. `model_15_states/`)
- `genome`: Provided by the user (e.g. `hg38`)
- `threads`: Provided by the user (e.g. 16)
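
As a rough guide to what this step corresponds to, the sketch below assembles a ChromHMM `LearnModel` command line from the same parameters without running it. That the MCP tool wraps ChromHMM in this way is an assumption; the function name `learn_model_command`, the heap size default, and the jar path are hypothetical.

```python
def learn_model_command(binarized_dir, output_model_dir, num_states, genome,
                        threads=1, bin_size=200, heap_mb=4000,
                        jar="ChromHMM.jar"):
    """Build (but do not run) a ChromHMM LearnModel command line."""
    if num_states < 2:
        raise ValueError("num_states must be at least 2")
    return [
        "java", f"-Xmx{heap_mb}m", "-jar", jar, "LearnModel",
        "-b", str(bin_size),  # must match the bin size used for binarization
        "-p", str(threads),   # maximum number of processors
        binarized_dir, output_model_dir, str(num_states), genome,
    ]

cmd = learn_model_command("binarized/", "model_15_states/", 15, "hg38", threads=16)
```

The bin size is passed again here because the learning step needs the same value that was used when the binarized files were generated.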

Parameter Optimization

Number of States

8 states: Basic chromatin states
15 states: Standard comprehensive states
25 states: High-resolution states
Optimization: Use Bayesian Information Criterion (BIC)
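
One way to apply BIC when comparing models with different numbers of states is sketched below. It assumes the standard definition, BIC = p·ln(n) − 2·ln(L), with the parameter count of a k-state HMM over m binary marks taken as k·m emission, k·(k−1) transition, and k−1 initial probabilities. The function names are hypothetical, and the log-likelihood of each trained model has to be supplied by the user.

```python
import math

def num_free_parameters(num_states, num_marks):
    """Free parameters of an HMM with independent Bernoulli emissions."""
    emissions = num_states * num_marks
    transitions = num_states * (num_states - 1)
    initial = num_states - 1
    return emissions + transitions + initial

def bic(log_likelihood, num_states, num_marks, num_bins):
    """Bayesian Information Criterion; lower values are preferred."""
    penalty = num_free_parameters(num_states, num_marks) * math.log(num_bins)
    return penalty - 2.0 * log_likelihood

def best_num_states(log_likelihoods, num_marks, num_bins):
    """Pick the state count with the lowest BIC.

    log_likelihoods maps num_states -> log-likelihood of that trained model.
    """
    return min(log_likelihoods,
               key=lambda k: bic(log_likelihoods[k], k, num_marks, num_bins))
```

BIC is only one heuristic; the resulting state count should still be checked for biological interpretability.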

Bin Size

200bp: Standard resolution
100bp: High resolution (requires more memory)
500bp: Low resolution (faster computation)
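
The memory trade-off follows from the number of bins, which grows as the bin size shrinks. A small sketch of that arithmetic is shown below; the function name `count_bins` and the toy chromosome lengths are hypothetical, and ceiling division is used as an upper bound because the handling of a trailing partial bin is not modeled here.

```python
def count_bins(chrom_sizes, bin_size):
    """Upper bound on the number of genome bins for a given bin size."""
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    # -(-a // b) is ceiling division for positive integers
    return sum(-(-length // bin_size) for length in chrom_sizes.values())

# Toy chromosome lengths for illustration only.
toy_sizes = {"chr1": 1000, "chr2": 450}
bins_by_size = {size: count_bins(toy_sizes, size) for size in (100, 200, 500)}
```

Halving the bin size roughly doubles the number of bins, and with it the size of the binarized data held in memory during learning.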

State Interpretation

Common Chromatin States

Active Promoter: H3K4me3, H3K27ac
Weak Promoter: H3K4me3
Poised Promoter: H3K4me3, H3K27me3
Strong Enhancer: H3K27ac, H3K4me1
Weak Enhancer: H3K4me1
Insulator: CTCF
Transcribed: H3K36me3
Repressed: H3K27me3
Heterochromatin: Low signal across marks
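
The table above can be turned into a first-pass labeling rule for learned states. The sketch below is illustrative only: it assumes each state's emission probabilities are available as a mark-to-probability mapping, treats a mark as present above a hypothetical threshold of 0.5, and checks the more specific combinations first. Real annotation usually also considers genomic enrichment.

```python
# Ordered from most to least specific so combinations win over single marks.
RULES = [
    ("Poised Promoter", {"H3K4me3", "H3K27me3"}),
    ("Active Promoter", {"H3K4me3", "H3K27ac"}),
    ("Strong Enhancer", {"H3K27ac", "H3K4me1"}),
    ("Weak Promoter", {"H3K4me3"}),
    ("Weak Enhancer", {"H3K4me1"}),
    ("Insulator", {"CTCF"}),
    ("Transcribed", {"H3K36me3"}),
    ("Repressed", {"H3K27me3"}),
]

def label_state(emissions, threshold=0.5):
    """Suggest a label for one state from its emission probabilities."""
    present = {mark for mark, prob in emissions.items() if prob >= threshold}
    if not present:
        return "Heterochromatin"  # low signal across all marks
    for label, required in RULES:
        if required <= present:   # all required marks are present
            return label
    return "Unlabeled"
```

States that come back as "Unlabeled" carry marks outside the table and need manual inspection.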

Troubleshooting

Memory errors: Increase the bin size or reduce the number of states
Convergence problems: Increase the maximum number of iterations or adjust the convergence threshold
Uninterpretable states: Check input data quality and mark combinations
Missing chromosomes: Verify that chromosome names are consistent between the chromosome sizes file and the input files (e.g. `chr1` vs `1`)
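
A quick way to check chromosome naming is to compare the names in the chromosome sizes file with those seen in an input BED file. The sketch below assumes both are available as lists of text lines; the function name `chromosomes_missing_from_sizes` is hypothetical.

```python
def chromosomes_missing_from_sizes(chrom_sizes_lines, bed_lines):
    """Return chromosome names used in the BED lines but absent from the sizes file."""
    known = {line.split()[0] for line in chrom_sizes_lines if line.strip()}
    seen = set()
    for line in bed_lines:
        # Skip blank lines and common BED header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        seen.add(line.split()[0])
    return sorted(seen - known)

# Toy inputs illustrating a "chr2" vs "2" mismatch.
missing = chromosomes_missing_from_sizes(
    ["chr1\t1000", "chr2\t500"],
    ["track name=example", "chr1\t0\t50", "2\t10\t60"],
)
```

A non-empty result usually points to a prefix mismatch or to contigs that are absent from the chromosome sizes file.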