Claude-skill-registry chromatin-state-inference

This skill should be used when users need to infer chromatin states from histone modification ChIP-seq data using chromHMM. It provides workflows for chromatin state segmentation, model training, state annotation.

install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/15-chromatin-state-inference" ~/.claude/skills/majiayu000-claude-skill-registry-chromatin-state-inference && rm -rf "$T"
manifest: skills/data/15-chromatin-state-inference/SKILL.md
source content

ChromHMM Chromatin State Inference

Overview

This skill enables comprehensive chromatin state analysis using chromHMM for histone modification ChIP-seq data. ChromHMM uses a multivariate Hidden Markov Model to segment the genome into discrete chromatin states based on combinatorial patterns of histone modifications.

Main steps include:

  • Refer to Inputs & Outputs to verify necessary files.
  • Always prompt user if required files are missing.
  • Always prompt user for genome assembly used.
  • Always prompt user for the bin size for generating binarized files.
  • Always prompt user for the bin size for the number of states the ChromHMM target.
  • Run chromHMM workflow: Binarization → Learning.

When to use this skill

Use this skill when you need to infer chromatin states from histone modification ChIP-seq data using chromHMM.


Inputs & Outputs

Inputs

(1) Option 1: BED files of aligned reads

<mark1>.bed
<mark2>.bed
... # Other marks

(1) Option 2: BAM files of aligned reads

<mark1>.bam
<mark2>.bam
... # Other marks

Outputs

chromhmm_output/
  binarized/
    *.txt 
  model/
    *.txt
    ... # other files output by the ChromHMM

Decision Tree

Step 0: Initialize Project

Call:

  • mcp__project-init-tools__project_init

with:

  • sample
    : all
  • task
    : chromhmm

Step 1: Prepare the
cellmarkfile
(skip this step if signal files are provided)

  • Prepare a .txt file (without header) containing following three columns:

    • sample name
    • marker name
    • name of the BED/BAM file
    • control file of the sample (only provided if the input/control file is available)
  • example of the cellmark.txt file

cell1    mark1    cell1_mark2.bam    cell1_control.bam
cell1   mark2    cell1_mark2.bam    cell1/control.bam

Step 2: Data Binarization

  • For BAM inputs:
    Call:

    • mcp__chromhmm-tools__binarize_bam
      with:
    • path_chrom_sized
      : Provide by user or detect from the working directory
    • input_dir
      : Directory containing BAM files
    • cellmarkfile
      : Cell mark file defining histone modifications
    • output_dir
      : (e.g.
      binarized/
      )
    • bin_size
      : Provided by user
  • For BED inputs:
    Call

    mcp__chromhmm-tools__binarize_bed
    instead.

  • For Signal inputs:
    Call:

    mcp__chromhmm-tools__binarize_signal
    with:

    • input_dir
      : Directory of signals
    • output_dir
      : (e.g.
      binarized/
      )

Step 3: Model Learning

Call

  • mcp__chromhmm-tools__learn_model

with:

  • binarized_dir
    : Directory binarized file located in
  • num_states
    : Provide by user (e.g. 15)
  • output_model_dir
    : (e.g.
    model_15_states/
    )
  • genome
    : Provide by user (e.g.
    hg38
    )
  • threads
    : Provide by user (e.g. 16)

Parameter Optimization

Number of States

  • 8 states: Basic chromatin states
  • 15 states: Standard comprehensive states
  • 25 states: High-resolution states
  • Optimization: Use Bayesian Information Criterion (BIC)

Bin Size

  • 200bp: Standard resolution
  • 100bp: High resolution (requires more memory)
  • 500bp: Low resolution (faster computation)

State Interpretation

Common Chromatin States

  1. Active Promoter: H3K4me3, H3K27ac
  2. Weak Promoter: H3K4me3
  3. Poised Promoter: H3K4me3, H3K27me3
  4. Strong Enhancer: H3K27ac, H3K4me1
  5. Weak Enhancer: H3K4me1
  6. Insulator: CTCF
  7. Transcribed: H3K36me3
  8. Repressed: H3K27me3
  9. Heterochromatin: Low signal across marks

Troubleshooting

  • Memory errors: Reduce bin size or number of states
  • Convergence problems: Increase iterations or adjust learning rate
  • Uninterpretable states: Check input data quality and mark combinations
  • Missing chromosomes: Verify chromosome naming consistency