Claude-skill-registry bio-phylogenomics

Build marker gene alignments and phylogenetic trees.

install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/bio-phylogenomics" ~/.claude/skills/majiayu000-claude-skill-registry-bio-phylogenomics && rm -rf "$T"
manifest: skills/data/bio-phylogenomics/SKILL.md
source content

Bio Phylogenomics

When to use

  • Build marker gene alignments and phylogenetic trees.

Prerequisites

  • Tools installed via pixi (see pixi.toml).
  • Marker gene set or alignments available.

Inputs

  • markers.faa (marker genes) or alignments.fasta

Outputs

  • results/bio-phylogenomics/alignments/
  • results/bio-phylogenomics/trees/
  • results/bio-phylogenomics/phylo_report.md
  • results/bio-phylogenomics/logs/

Steps

  1. Extract marker genes or SSU rRNA sequences.
  2. Align and trim sequences using project-standard workflows.
  3. Build ML trees with bootstraps:
    • Standard accuracy: Use IQ-TREE (comprehensive model selection, publication-quality)
    • Fast mode: Use IQ-TREE -fast (exploratory analysis, large datasets >10K sequences)
    • Very large datasets: Use VeryFastTree (>100K sequences, ultra-fast)
  4. Post-process trees with ETE Toolkit:
    • Calculate tree statistics (branch lengths, distances, topology metrics)
    • Root, prune, or collapse nodes as needed
    • Filter by bootstrap support values
    • Add taxonomic or trait annotations
    • Generate publication-quality visualizations

QC gates

  • Alignment length and missingness meet project thresholds.
  • Bootstrap support summary meets project thresholds.
  • On failure: retry with alternative parameters; if still failing, record in report and exit non-zero.

Validation

  • Verify markers.faa is non-empty and aligned sequences are consistent.

Tools

  • iqtree v3.0.1 (standard mode and -fast mode for rapid inference)
  • veryfasttree v4.0+ (ultra-fast tree inference for massive datasets)
  • ete3/ete4 (ETE Toolkit for tree processing, statistics, and visualization)

Tool Selection Workflow

Tree Inference

Choose tree inference tool based on dataset size and accuracy requirements:

Dataset SizeAccuracy RequiredRecommended Tool
<1K sequencesHigh (publication)IQ-TREE standard
<1K sequencesMedium (exploratory)IQ-TREE -fast
1K-10K sequencesHigh (publication)IQ-TREE standard
1K-10K sequencesMedium (exploratory)IQ-TREE -fast
10K-100K sequencesHighIQ-TREE standard
10K-100K sequencesMedium/FastIQ-TREE -fast
>100K sequencesAnyVeryFastTree

Post-Tree Processing

Use ETE Toolkit for all tree post-processing tasks:

  • Tree statistics and metrics
  • Tree manipulation (rooting, pruning, filtering)
  • Taxonomic annotation
  • Visualization and figure generation

Paper summaries (2023-2025)

  • summaries/ (include example use cases and tool settings used)

Tool documentation

  • IQ-TREE - Maximum likelihood phylogenetic inference with model selection
  • VeryFastTree - Ultra-fast approximate maximum likelihood trees
  • ETE Toolkit - Tree manipulation, statistics, and visualization

References

  • See ../bio-skills-references.md