Claude-skill-registry genomic-feature-annotation
This skill annotates and visualizes genomic features for any file containing genomic region information, using HOMER (Hypergeometric Optimization of Motif EnRichment). It assigns each region to a feature class (promoter, exon, intron, intergenic) and reports its proximity to the TSS, then generates visual summaries of the feature distribution. ChIPseeker mode is also supported when required.
git clone https://github.com/majiayu000/claude-skill-registry
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/10-toolbased-genomic-feature-annotation" ~/.claude/skills/majiayu000-claude-skill-registry-genomic-feature-annotation && rm -rf "$T"
skills/data/10-toolbased-genomic-feature-annotation/SKILL.md
Genomic Feature Annotation and Visualization with HOMER
Overview
- Prepare genomic region files in BED or another supported format. Ensure the input regions are in valid BED format (chromosome, start, end); if they are not, extract the required columns to create a valid BED file.
- Identify and specify the correct genome assembly for annotation.
- Always ask the user which tool to use: ChIPseeker or HOMER.
- If the user chooses HOMER, then:
  - Annotate the genomic regions using HOMER's `annotatePeaks.pl`.
  - Generate annotation statistics and feature distribution summaries.
  - Visualize annotation results (e.g., pie charts, barplots).
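The BED preparation mentioned in the overview can be sketched as follows. This is a minimal illustration, not part of the skill's tools: the function name `to_bed3` and the column positions are assumptions, and the example rows are made up.

```python
import csv
import io

def to_bed3(lines, chrom_col=0, start_col=1, end_col=2, delimiter="\t"):
    """Yield 'chrom<TAB>start<TAB>end' rows from delimited text lines.

    Column positions are assumptions; adjust them to the input file.
    Rows that are too short, whose start/end are not non-negative
    integers (e.g. a header row), or where start >= end are skipped.
    """
    for row in csv.reader(lines, delimiter=delimiter, quoting=csv.QUOTE_NONE):
        if len(row) <= max(chrom_col, start_col, end_col):
            continue
        chrom, start, end = row[chrom_col], row[start_col], row[end_col]
        if not (start.isdigit() and end.isdigit()):
            continue
        if int(start) >= int(end):
            continue
        yield f"{chrom}\t{start}\t{end}"

# Toy input: a header row, one valid region, one region with start >= end.
example = io.StringIO(
    "peak_id\tchr\tstart\tend\n"
    "p1\tchr1\t100\t200\n"
    "p2\tchr2\t500\t450\n"
)
bed_rows = list(to_bed3(example, chrom_col=1, start_col=2, end_col=3))
print(bed_rows)
```

Skipping malformed rows silently is a design choice for the sketch; in practice it may be preferable to report how many rows were dropped.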
When to use this skill
- Find candidate target genes of a transcription factor (TF). The skill returns an annotated peak file listing the genes near each TF peak; genes whose promoters are annotated to TF peaks are candidate target genes of the TF.
- Annotate genomic regions such as TF peaks, histone modification peaks, and ATAC-seq peaks.
- Generate annotation statistics and feature distribution summaries.
- Visualize annotation results (e.g., pie charts, barplots).
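For the target-gene use case, a post-processing step on the annotated peak file might look like the sketch below. It assumes the file is tab-delimited with `Annotation` and `Gene Name` columns and that promoter peaks are labelled `promoter-TSS`, as `annotatePeaks.pl` output typically is; the function name and the toy rows (including the accession placeholders) are hypothetical.

```python
import csv
import io

def promoter_target_genes(handle):
    """Return sorted gene names for peaks annotated as promoter-TSS."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    ann_idx = header.index("Annotation")   # assumed column name
    gene_idx = header.index("Gene Name")   # assumed column name
    genes = set()
    for row in reader:
        if len(row) <= max(ann_idx, gene_idx):
            continue  # skip truncated rows
        if row[ann_idx].startswith("promoter-TSS") and row[gene_idx]:
            genes.add(row[gene_idx])
    return sorted(genes)

# Toy table with a reduced set of columns; values are made up.
toy = io.StringIO(
    "PeakID\tChr\tStart\tEnd\tAnnotation\tGene Name\n"
    "p1\tchr1\t100\t200\tpromoter-TSS (NM_000001)\tMYC\n"
    "p2\tchr2\t300\t400\tintron (NM_000002, intron 1 of 4)\tTP53\n"
    "p3\tchr3\t500\t600\tpromoter-TSS (NM_000003)\tGATA1\n"
    "p4\tchr1\t150\t250\tpromoter-TSS (NM_000001)\tMYC\n"
)
candidates = promoter_target_genes(toy)
print(candidates)
```

Columns are looked up by header name rather than position so the sketch does not depend on the exact column order.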
Inputs & Outputs
Inputs
Genomic region formats supported:
- BED files: Standard genomic interval format
- narrowPeak: narrow peak format
- broadPeak: broad peak format
Outputs
genomic_feature_annotation/
  results/
    ${sample}.anno_genomic_features.txt
    ${sample}.anno_genomic_features_stats.txt
  logs/
    ${sample}.anno_genomic_features.log
  plots/
    ${sample}.anno_genomic_features.pdf
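The output layout can be expressed as a small path helper, for example to check that all expected files exist after a run. This is a sketch only: `expected_outputs` and the example project directory are hypothetical names, not part of the skill.

```python
from pathlib import PurePosixPath

def expected_outputs(proj_dir, sample):
    """Map each output kind to its expected path under proj_dir."""
    base = PurePosixPath(proj_dir)
    stem = f"{sample}.anno_genomic_features"
    return {
        "annotation": base / "results" / f"{stem}.txt",
        "stats": base / "results" / f"{stem}_stats.txt",
        "log": base / "logs" / f"{stem}.log",
        "plot": base / "plots" / f"{stem}.pdf",
    }

# Hypothetical project directory and sample name.
paths = expected_outputs("/data/K562_genomic_feature_annotation", "K562")
for kind, path in paths.items():
    print(kind, path)
```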
Decision Tree
Step 0: Gather Required Information from the User
Before calling any tool, ask the user:
- Sample name (`sample`): used as the output prefix and in the output directory name `${sample}_genomic_feature_annotation`.
- Genome assembly (`genome`): e.g. `hg38`, `mm10`, `danRer11`.
  - Never guess or auto-detect.
Step 1: Initialize Project
- Make a directory for this project:
Call:
mcp__project-init-tools__project_init
with:
- `sample`: the user-provided sample name
- `task`: genomic_feature_annotation
The tool will:
- Create the `${sample}_genomic_feature_annotation` directory.
- Get the full path of the `${sample}_genomic_feature_annotation` directory, which will be used as `${proj_dir}`.
Step 2 (Optional): Standardize chromosome names for BED files
This step is optional. Only perform this step if the input file is a BED file. If the input file is a gene list, skip this step.
- From `1` format to `chr1` format
- From `MT` format to `chrM` format
Call:
mcp__file-format-tools__standardize_bed_chrom_names
with:
- `input_bed`: the user-provided BED file
- `output_bed`: the path to save the standardized BED file
The tool will:
- Standardize the chromosome names in the BED file.
- Return the path of the standardized BED file.
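The two renaming rules can be illustrated with a short sketch. It is not the implementation of `mcp__file-format-tools__standardize_bed_chrom_names`; the function names are hypothetical, and the handling of comment, `track`, and `browser` lines is an assumption.

```python
def standardize_chrom(name):
    """Apply the two rules from this step: MT -> chrM, 1 -> chr1."""
    if name.startswith("chr"):
        return name  # already in chr-prefixed form
    if name == "MT":
        return "chrM"
    return "chr" + name

def standardize_bed_line(line):
    """Rewrite the first tab-separated field; keep everything else as-is."""
    if not line or line.startswith(("#", "track", "browser")):
        return line  # assumed: leave comments and track/browser lines alone
    chrom, sep, rest = line.partition("\t")
    return standardize_chrom(chrom) + sep + rest

lines = ["1\t100\t200\n", "MT\t5\t50\n", "chrX\t7\t9\n", "# comment\n"]
fixed = [standardize_bed_line(l) for l in lines]
print(fixed)
```

Note that this naive prefixing would also rename unplaced scaffolds (for example `GL000192.1` becomes `chrGL000192.1`), which may not match the target assembly's naming; such contigs need a proper mapping table.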
Step 3: Genomic Feature Annotation
- (Option 1) HOMER mode
Call:
mcp__homer-tools__annotate_genomic_features
With:
- `sample`: the user-provided sample name
- `proj_dir`: directory to save the genomic feature annotation results. In this skill, it is the full path of the `${sample}_genomic_feature_annotation` directory returned by `mcp__project-init-tools__project_init`
- `regions_bed`: the user-provided regions file in BED format. May end with `.bed`, `.narrowPeak`, `.broadPeak`, etc.
- `genome`: the user-provided genome assembly, e.g. `hg38`, `mm10`, `danRer11`
- `ann`: custom HOMER annotation file (created by `assignGenomeAnnotation.pl`) (default: None)
- `size_given`: keep original region sizes (default: True)
- `cpg`: include CpG information (default: False)
The tool will:
- Annotate the genomic regions using HOMER's `annotatePeaks.pl`.
- Return the path of the annotated regions file under the `${proj_dir}/results/` directory, and the path to the log file under the `${proj_dir}/logs/` directory:
  - `${proj_dir}/results/${sample}.anno_genomic_features.txt`
  - `${proj_dir}/results/${sample}.anno_genomic_features_stats.txt`
  - `${proj_dir}/logs/${sample}.anno_genomic_features.log`
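For orientation, the tool parameters above plausibly correspond to `annotatePeaks.pl` options as sketched below. The options `-size given`, `-CpG`, `-ann`, and `-annStats` are HOMER command-line options, but how the MCP tool actually maps its parameters onto them is an assumption, and `build_annotate_command` and the file names are hypothetical.

```python
import shlex

def build_annotate_command(regions_bed, genome, stats_path,
                           ann=None, size_given=True, cpg=False):
    """Assemble an annotatePeaks.pl argument list (assumed mapping)."""
    cmd = ["annotatePeaks.pl", regions_bed, genome]
    if size_given:
        cmd += ["-size", "given"]    # keep original region sizes
    if cpg:
        cmd.append("-CpG")           # add CpG/GC content columns
    if ann is not None:
        cmd += ["-ann", ann]         # custom annotation file
    cmd += ["-annStats", stats_path] # write annotation statistics here
    return cmd

cmd = build_annotate_command(
    "K562.bed", "hg38", "results/K562.anno_genomic_features_stats.txt"
)
# annotatePeaks.pl writes the annotation table to stdout, so a caller would
# redirect stdout to the .txt file and stderr to the .log file.
print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell-quoting problems when file names contain spaces.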
- (Option 2) ChIPseeker mode

    library(ChIPseeker)
    library(TxDb.Mmusculus.UCSC.mm10.knownGene)  # adjust to the species and assembly
    library(org.Mm.eg.db)                        # adjust to the species

    sample <- "sample"  # placeholder: replace with the user-provided sample name
    txdb <- TxDb.Mmusculus.UCSC.mm10.knownGene   # adjust to the species and assembly
    peak_file <- paste0(sample, ".narrowPeak")

    peak_anno <- annotatePeak(
      peak_file,
      TxDb = txdb,
      tssRegion = c(-3000, 3000),  # define the "promoter" window around the TSS
      annoDb = "org.Mm.eg.db"      # adds SYMBOL, GENENAME, etc.
    )

    # R does not expand ${sample} inside strings, so build the path explicitly.
    pdf(file.path("plots", paste0(sample, "_anno_ChIPseeker.pdf")), width = 6, height = 5)
    plotAnnoPie(peak_anno)
    print(plotAnnoBar(peak_anno))     # print() so the ggplot is drawn when sourced
    print(plotDistToTSS(peak_anno))
    dev.off()
Step 4: Visualize the annotation results (executed only in HOMER mode)
Call:
mcp__plot-anno-tools__visualize_annotation_results
With:
- `sample`: the user-provided sample name
- `proj_dir`: directory to save the annotation results. In this skill, it is the full path of the `${sample}_genomic_feature_annotation` directory returned by `mcp__project-init-tools__project_init`
- `chart_type`: type of plot: 'pie' for pie chart, 'bar' for barplot. Default: 'pie'.
The tool will:
- Visualize the annotation results.
- Return the path of the plot file, which is under the `${proj_dir}/plots/` directory and ends with `.pdf`.
Step 5: Interpretation of Results
Typical annotation categories:
- Promoter: -1 kb to +100 bp from TSS
- 5' UTR, Exon, Intron, 3' UTR, Intergenic, TTS
Quality indicators:
- Annotation rate: % of peaks successfully annotated.
- Promoter fraction: Often high in TF ChIP-seq.
- Intergenic fraction: Reflects enhancer-rich or noncoding regions.
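The quality indicators above can be computed from the annotation labels with a few lines of code. The sketch assumes labels of the form `intron (NM_..., intron 2 of 5)` and treats empty or `NA` labels as unannotated; both are assumptions about the annotated file, the function names are hypothetical, and the labels in the example are made up.

```python
from collections import Counter

def annotation_category(annotation):
    """Reduce 'intron (NM_000001, intron 2 of 5)' to 'intron'."""
    label = annotation.split(" (", 1)[0].strip()
    # Assumed convention: empty or 'NA' means the peak was not annotated.
    return label if label and label != "NA" else "unannotated"

def category_fractions(annotations):
    """Return the fraction of peaks in each annotation category."""
    counts = Counter(annotation_category(a) for a in annotations)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.items()}

# Toy labels for four peaks.
labels = [
    "promoter-TSS (NM_000001)",
    "intron (NM_000002, intron 1 of 3)",
    "Intergenic",
    "NA",
]
fractions = category_fractions(labels)
annotation_rate = 1 - fractions.get("unannotated", 0)
print(fractions, annotation_rate)
```

Comparing these fractions across samples is more informative than reading any single value in isolation.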
Best Practices
- Use high-confidence regions (e.g., IDR-filtered peaks).
- Ensure the chromosome naming convention of the genome (e.g., `chr1` vs. `1`) matches the input files.
- Use visualization to assess annotation patterns across datasets.
- Save annotation parameters and plots for reproducibility.