Claude-skill-registry bio-phylogenomics

Build marker gene alignments and phylogenetic trees.

install

source · Clone the upstream repo

git clone https://github.com/majiayu000/claude-skill-registry

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/bio-phylogenomics" ~/.claude/skills/majiayu000-claude-skill-registry-bio-phylogenomics && rm -rf "$T"

manifest: skills/data/bio-phylogenomics/SKILL.md

Bio Phylogenomics

When to use

Build marker gene alignments and phylogenetic trees.

Prerequisites

Tools installed via pixi (see pixi.toml).
Marker gene set or alignments available.

Inputs

markers.faa (marker genes) or alignments.fasta

Outputs

results/bio-phylogenomics/alignments/
results/bio-phylogenomics/trees/
results/bio-phylogenomics/phylo_report.md
results/bio-phylogenomics/logs/

Steps

Extract marker genes or SSU rRNA sequences.
Align and trim sequences using project-standard workflows.
Build ML trees with bootstraps:
- Standard accuracy: Use IQ-TREE (comprehensive model selection, publication-quality)
- Fast mode: Use IQ-TREE -fast (exploratory analysis, large datasets >10K sequences)
- Very large datasets: Use VeryFastTree (>100K sequences, ultra-fast)
Post-process trees with ETE Toolkit:
- Calculate tree statistics (branch lengths, distances, topology metrics)
- Root, prune, or collapse nodes as needed
- Filter by bootstrap support values
- Add taxonomic or trait annotations
- Generate publication-quality visualizations

QC gates

Alignment length and missingness meet project thresholds.
Bootstrap support summary meets project thresholds.
On failure: retry with alternative parameters; if still failing, record in report and exit non-zero.

Validation

Verify markers.faa is non-empty and aligned sequences are consistent.

Tools

iqtree v3.0.1 (standard mode and -fast mode for rapid inference)
veryfasttree v4.0+ (ultra-fast tree inference for massive datasets)
ete3/ete4 (ETE Toolkit for tree processing, statistics, and visualization)

Tool Selection Workflow

Tree Inference

Choose tree inference tool based on dataset size and accuracy requirements:

Dataset Size	Accuracy Required	Recommended Tool
<1K sequences	High (publication)	IQ-TREE standard
<1K sequences	Medium (exploratory)	IQ-TREE -fast
1K-10K sequences	High (publication)	IQ-TREE standard
1K-10K sequences	Medium (exploratory)	IQ-TREE -fast
10K-100K sequences	High	IQ-TREE standard
10K-100K sequences	Medium/Fast	IQ-TREE -fast
>100K sequences	Any	VeryFastTree

Post-Tree Processing

Use ETE Toolkit for all tree post-processing tasks:

Tree statistics and metrics
Tree manipulation (rooting, pruning, filtering)
Taxonomic annotation
Visualization and figure generation

Paper summaries (2023-2025)

summaries/ (include example use cases and tool settings used)

Tool documentation

IQ-TREE - Maximum likelihood phylogenetic inference with model selection
VeryFastTree - Ultra-fast approximate maximum likelihood trees
ETE Toolkit - Tree manipulation, statistics, and visualization

References

See ../bio-skills-references.md