Claude-skill-registry bio-phylogenomics
Build marker gene alignments and phylogenetic trees.
install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/bio-phylogenomics" ~/.claude/skills/majiayu000-claude-skill-registry-bio-phylogenomics && rm -rf "$T"
manifest:
skills/data/bio-phylogenomics/SKILL.mdsource content
Bio Phylogenomics
When to use
- Build marker gene alignments and phylogenetic trees.
Prerequisites
- Tools installed via pixi (see pixi.toml).
- Marker gene set or alignments available.
Inputs
- markers.faa (marker genes) or alignments.fasta
Outputs
- results/bio-phylogenomics/alignments/
- results/bio-phylogenomics/trees/
- results/bio-phylogenomics/phylo_report.md
- results/bio-phylogenomics/logs/
Steps
- Extract marker genes or SSU rRNA sequences.
- Align and trim sequences using project-standard workflows.
- Build ML trees with bootstraps:
- Standard accuracy: Use IQ-TREE (comprehensive model selection, publication-quality)
- Fast mode: Use IQ-TREE -fast (exploratory analysis, large datasets >10K sequences)
- Very large datasets: Use VeryFastTree (>100K sequences, ultra-fast)
- Post-process trees with ETE Toolkit:
- Calculate tree statistics (branch lengths, distances, topology metrics)
- Root, prune, or collapse nodes as needed
- Filter by bootstrap support values
- Add taxonomic or trait annotations
- Generate publication-quality visualizations
QC gates
- Alignment length and missingness meet project thresholds.
- Bootstrap support summary meets project thresholds.
- On failure: retry with alternative parameters; if still failing, record in report and exit non-zero.
Validation
- Verify markers.faa is non-empty and aligned sequences are consistent.
Tools
- iqtree v3.0.1 (standard mode and -fast mode for rapid inference)
- veryfasttree v4.0+ (ultra-fast tree inference for massive datasets)
- ete3/ete4 (ETE Toolkit for tree processing, statistics, and visualization)
Tool Selection Workflow
Tree Inference
Choose tree inference tool based on dataset size and accuracy requirements:
| Dataset Size | Accuracy Required | Recommended Tool |
|---|---|---|
| <1K sequences | High (publication) | IQ-TREE standard |
| <1K sequences | Medium (exploratory) | IQ-TREE -fast |
| 1K-10K sequences | High (publication) | IQ-TREE standard |
| 1K-10K sequences | Medium (exploratory) | IQ-TREE -fast |
| 10K-100K sequences | High | IQ-TREE standard |
| 10K-100K sequences | Medium/Fast | IQ-TREE -fast |
| >100K sequences | Any | VeryFastTree |
Post-Tree Processing
Use ETE Toolkit for all tree post-processing tasks:
- Tree statistics and metrics
- Tree manipulation (rooting, pruning, filtering)
- Taxonomic annotation
- Visualization and figure generation
Paper summaries (2023-2025)
- summaries/ (include example use cases and tool settings used)
Tool documentation
- IQ-TREE - Maximum likelihood phylogenetic inference with model selection
- VeryFastTree - Ultra-fast approximate maximum likelihood trees
- ETE Toolkit - Tree manipulation, statistics, and visualization
References
- See ../bio-skills-references.md