Skillshub galaxy-bridge
Galaxy tool discovery, intelligent recommendation, and execution — 8,000+ bioinformatics tools from usegalaxy.org with multi-signal scoring and workflow suggestions
git clone https://github.com/ComeOnOliver/skillshub
T=$(mktemp -d) && git clone --depth=1 https://github.com/ComeOnOliver/skillshub "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/ClawBio/ClawBio/galaxy-bridge" ~/.claude/skills/comeonoliver-skillshub-galaxy-bridge && rm -rf "$T"
Galaxy Bridge
ClawBio's gateway to the Galaxy ecosystem — 1,770+ production bioinformatics tools, discoverable and executable through natural language.
Why This Exists
Galaxy (usegalaxy.org) hosts the world's largest collection of curated bioinformatics tools — 1,770+ on the main server alone, covering everything from FASTQ QC to metagenomics to protein structure prediction. But discovering the right tool requires knowing exact tool IDs, navigating nested ToolShed categories, and understanding parameter schemas.
Galaxy Bridge makes these tools agent-accessible: search by natural language, execute via CLI, and chain Galaxy tools with ClawBio's local skills for cross-platform workflows that neither system can do alone.
Core Capabilities
- Intelligent tool recommendation — describe a task in plain English; multi-signal scoring across 7 dimensions returns the best Galaxy tool with explanations
- Workflow suggestions — 8 pre-defined pipeline templates (RNA-seq DE, metagenomics, variant calling, WES germline, ChIP-seq, nanopore, genome assembly, variant annotation)
- Input format awareness — provide your file extension (.fastq, .bam, .vcf) for format-aware recommendations
- Version deduplication — 8,182 catalog entries collapse to ~2,300 unique tools; latest version preferred, version count as maturity signal
- EDAM ontology resolution — 108 EDAM topic/operation IDs resolved to human-readable labels for richer matching
- Natural language search — keyword-based search across 8,000+ Galaxy tools by name, description, section, and EDAM terms
- Remote execution — run Galaxy tools on usegalaxy.org via BioBlend API
- Category browsing — explore 86 ToolShed categories with tool counts
- Tool detail inspection — view inputs, outputs, and parameter schemas
- Offline demo mode — FastQC demo with pre-cached results (no API key needed)
- Cross-platform chaining — Galaxy VEP → ClawBio PharmGx, Galaxy Kraken2 → ClawBio metagenomics
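The version deduplication listed above can be illustrated with a minimal sketch. It assumes each catalog entry is a dict whose `id` ends in a version segment, as in the ToolShed-style ID `toolshed.g2.bx.psu.edu/repos/devteam/fastqc/fastqc/0.74+galaxy1`. The entry layout, the function names, and the numeric sort key are illustrative assumptions, not the bridge's actual implementation.

```python
import re
from collections import defaultdict


def version_key(version: str) -> tuple:
    """Rough sort key built from the numeric runs in a version string.

    "0.74+galaxy1" -> (0, 74, 1). This ignores non-numeric parts, so it is
    a simplification rather than a full version-ordering scheme.
    """
    return tuple(int(n) for n in re.findall(r"\d+", version))


def deduplicate(entries: list[dict]) -> list[dict]:
    """Collapse versioned catalog entries to one record per tool."""
    groups = defaultdict(list)
    for entry in entries:
        # Everything before the final "/" identifies the tool;
        # the last path segment is treated as the version.
        base, _, version = entry["id"].rpartition("/")
        groups[base].append((version, entry))

    unique = []
    for base, versions in groups.items():
        latest_version, latest = max(versions, key=lambda v: version_key(v[0]))
        unique.append({
            **latest,
            "base_id": base,
            "version": latest_version,
            # Number of versions seen, usable as a maturity signal.
            "version_count": len(versions),
        })
    return unique
```

Keeping `version_count` on the surviving record is what lets a later scoring step treat the number of releases as a maturity signal.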
Input Formats
| Format | Extension | Required Fields | Example |
|---|---|---|---|
| FASTQ | , , | Sequence reads | Illumina paired-end reads |
| VCF | , | Variant calls | Annotated VCF for VEP |
| BAM | | Aligned reads | BWA-MEM2 output |
| FASTA | , | Sequences | Reference genome |
| Tabular | , | Varies by tool | Gene expression matrix |
Workflow
- Search — User describes what they need → bridge searches local catalog + Galaxy API
- Select — Ranked results with descriptions, versions, and categories
- Configure — Show tool inputs/outputs schema; user provides files and parameters
- Execute — Upload input to Galaxy, run tool, poll for completion
- Retrieve — Download outputs to local directory
- Bundle — Generate reproducibility package (commands.sh, environment.yml, checksums)
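The "poll for completion" part of the Execute step can be sketched generically. This is not the bridge's code: `get_state` is a hypothetical callable (for example, a thin wrapper around a BioBlend job-state lookup), and the set of terminal states is an assumption limited to the two outcomes used here.

```python
import time
from typing import Callable

# Assumed terminal states: "ok" for success, "error" for failure.
TERMINAL_STATES = {"ok", "error"}


def poll_until_done(
    get_state: Callable[[], str],
    interval: float = 5.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call get_state() repeatedly until it returns a terminal state.

    Raises TimeoutError if no terminal state is seen within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still in state {state!r} after {timeout} s")
        sleep(interval)
```

Passing `sleep` as a parameter keeps the loop easy to exercise without real delays. A caller would typically download outputs only when the returned state is `"ok"`.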
CLI Reference
```bash
# Intelligent tool recommendation (new in v0.2.0)
python galaxy_bridge.py --recommend "quality control on my sequencing reads"
python galaxy_bridge.py --recommend "classify microbial species" --format .fastq
python galaxy_bridge.py --recommend "call variants" --format .bam
python galaxy_bridge.py --recommend "annotate variants from WES" --format .vcf

# Workflow / pipeline suggestions (new in v0.2.0)
python galaxy_bridge.py --workflow "RNA-seq differential expression"
python galaxy_bridge.py --workflow "metagenomics"
python galaxy_bridge.py --workflow "whole exome sequencing"

# Search for tools by keyword
python galaxy_bridge.py --search "metagenomics profiling"
python galaxy_bridge.py --search "variant annotation"
python galaxy_bridge.py --search "RNA-seq differential expression"

# Browse Galaxy ToolShed categories
python galaxy_bridge.py --list-categories

# View tool details (inputs, outputs, parameters)
python galaxy_bridge.py --tool-details toolshed.g2.bx.psu.edu/repos/devteam/fastqc/fastqc/0.74+galaxy1

# Run a tool on Galaxy (requires GALAXY_API_KEY)
python galaxy_bridge.py --run fastqc --input reads.fq.gz --output /tmp/qc_results

# Demo mode (works offline, no API key needed)
python galaxy_bridge.py --demo
```
Recommendation Engine
The `--recommend` flag uses multi-signal scoring across 7 dimensions to rank tools:
| Signal | Max Points | Description |
|---|---|---|
| Section match | 30 | Tool's Galaxy category matches the detected task |
| Preferred tool | 20 | Tool is a known best-in-class for the task |
| Exact name match | 15 | Tool name appears in the query |
| Keyword match | 15 | Query words found in tool name/description |
| EDAM ontology | 10 | EDAM topic/operation IDs match the task |
| Format compatibility | 10 | Tool accepts the specified input format |
| Version maturity | 5 | Tools with more versions score higher (log scale) |
15 task categories are recognised: Quality Control, Read Mapping, Variant Calling, Variant Annotation, WES/WGS, RNA-seq, Metagenomics, Genome Assembly, Genome Annotation, Phylogenetics, ChIP-seq, Single-cell, Proteomics, Nanopore, BAM Processing.
8 workflow templates: WES Germline, WES Annotation, RNA-seq DE, Metagenomics Profiling, Variant Calling, ChIP-seq, Nanopore Assembly, Genome Assembly.
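To make the scoring table concrete, here is a minimal sketch of how seven capped signals might be combined. Only the maximum points come from the table. Representing each signal as a strength in [0, 1], the `score_tool` and `maturity_points` names, and the `cap` of 20 versions for the log scale are assumptions for illustration; the bridge's actual formula may differ.

```python
import math

# Maximum points per signal, taken from the table above.
MAX_POINTS = {
    "section": 30,
    "preferred": 20,
    "exact_name": 15,
    "keyword": 15,
    "edam": 10,
    "format": 10,
    "maturity": 5,
}


def maturity_points(version_count: int, cap: int = 20) -> float:
    """Log-scaled maturity: 1 version scores 0, `cap` or more scores the maximum."""
    if version_count <= 1:
        return 0.0
    fraction = min(math.log(version_count) / math.log(cap), 1.0)
    return MAX_POINTS["maturity"] * fraction


def score_tool(signals: dict, version_count: int) -> float:
    """Combine per-signal strengths (each clamped to [0, 1]) into a total score."""
    total = 0.0
    for name, max_points in MAX_POINTS.items():
        if name == "maturity":
            continue  # handled separately on a log scale
        strength = max(0.0, min(1.0, float(signals.get(name, 0.0))))
        total += max_points * strength
    return total + maturity_points(version_count)
```

Note that the maxima in the table sum to 105, so a tool that saturates every signal scores 105 rather than 100 under this scheme.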
Demo
Running `--demo` executes a simulated FastQC analysis using pre-cached results:

```
$ python galaxy_bridge.py --demo

Galaxy Bridge — Demo Mode (offline)
====================================
Tool: FastQC v0.74+galaxy1
Input: demo/demo_reads.fq (bundled synthetic FASTQ, 1000 reads)
Output: demo/fastqc_demo_output.html
Result: PASS — Per base sequence quality ✓
        PASS — Per sequence quality scores ✓
        WARN — Per base sequence content (normal for Illumina)
        PASS — Sequence length distribution ✓

Reproducibility bundle written to demo/reproducibility/
```
Galaxy Tool Categories
The bridge indexes tools across all 56 Galaxy ToolShed categories, including:
- Sequence Analysis (~30 tools): FastQC, Trimmomatic, Cutadapt, fastp
- Metagenomics (~25 tools): Kraken2, MetaPhlAn, HUMAnN, QIIME2
- Variant Analysis (~25 tools): VEP, SnpSift, BCFtools, FreeBayes
- RNA (~20 tools): HISAT2, StringTie, featureCounts, DESeq2
- Proteomics (~15 tools): MaxQuant, SearchGUI, PeptideShaker
- Phylogenetics (~15 tools): IQ-TREE, RAxML, MAFFT, MUSCLE
- Genome Annotation (~15 tools): Prokka, Augustus, MAKER
- Assembly (~15 tools): SPAdes, Flye, Unicycler, MEGAHIT
- Single Cell (~10 tools): Scanpy, CellRanger, Seurat
- ChIP-seq/Epigenetics (~10 tools): MACS2, deepTools, DiffBind
- GWAS (~10 tools): PLINK, REGENIE, BOLT-LMM
- Nanopore (~10 tools): NanoPlot, Medaka, minimap2
Output Structure
```
output_dir/
├── report.md              # Analysis summary with methods and results
├── result.json            # Machine-readable: tool ID, version, parameters, output paths
├── galaxy_outputs/        # Raw outputs downloaded from Galaxy
│   ├── fastqc_report.html
│   └── ...
└── reproducibility/
    ├── commands.sh        # Galaxy API calls to reproduce
    ├── environment.yml    # Tool versions and Galaxy server info
    └── checksums.sha256   # SHA-256 of all inputs and outputs
```
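The `checksums.sha256` file in the reproducibility bundle could be produced along these lines. This is a sketch under assumptions: the function names are hypothetical, and the `sha256sum`-style line format (digest, two spaces, path) is a guess at the layout rather than something the document specifies.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large FASTQ/BAM inputs need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksums(files: list[Path], manifest: Path) -> None:
    """Write one "<hex digest>  <path>" line per file, sorted by path."""
    manifest.parent.mkdir(parents=True, exist_ok=True)
    lines = [f"{sha256_of(p)}  {p}" for p in sorted(files)]
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")
```

A manifest in this format can be checked later with `sha256sum -c checksums.sha256`, provided the recorded paths still resolve from the directory where the check is run.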
Dependencies
Required:
- Python 3.9+
- bioblend (Galaxy Python SDK)
Optional (for execution):
- `GALAXY_URL` environment variable (default: `https://usegalaxy.org`)
- `GALAXY_API_KEY` environment variable (register at usegalaxy.org)
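A small helper shows how the `GALAXY_URL` and `GALAXY_API_KEY` environment variables might be read, with the documented default URL and a missing key treated as "search and demo only". The function name and the returned dict layout are assumptions for illustration.

```python
import os

DEFAULT_GALAXY_URL = "https://usegalaxy.org"


def load_galaxy_config(env=None) -> dict:
    """Read Galaxy connection settings from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    url = env.get("GALAXY_URL", DEFAULT_GALAXY_URL).rstrip("/")
    # Treat an empty string the same as an unset key.
    api_key = env.get("GALAXY_API_KEY") or None
    return {
        "url": url,
        "api_key": api_key,
        # Remote execution needs a key; search and demo mode do not.
        "can_execute": api_key is not None,
    }
```

Accepting the environment as a parameter makes the helper easy to check without modifying the real process environment.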
Safety
- Local-first search: Tool discovery uses the bundled `galaxy_catalog.json`, so no API calls are needed
- API key optional: Demo mode and search work without credentials
- No data retention: Uploaded files are deleted from Galaxy after output retrieval
- Reproducibility: Every execution generates a full provenance bundle
- Disclaimer: ClawBio is a research and educational tool. It is not a medical device and does not provide clinical diagnoses. Consult a healthcare professional before making any medical decisions.
Integration with Bio Orchestrator
Triggers when: User mentions "galaxy", "usegalaxy", "tool shed", "run on galaxy", "NGS pipeline", or references a Galaxy tool ID.
Chaining partners:
- `pharmgx-reporter`: Galaxy VEP annotates variants → PharmGx generates dosage report
- `claw-metagenomics`: Galaxy Kraken2 → ClawBio metagenomics profiling
- `equity-scorer`: Galaxy VCF processing → HEIM equity scoring
- `vcf-annotator`: Galaxy VEP/SnpSift ↔ ClawBio annotation
Citations
- Galaxy Project — Afgan et al. (2018) Nucleic Acids Research
- BioBlend — Sloggett et al. (2013) Bioinformatics
- usegalaxy.org — Main Galaxy public server
- Galaxy ToolShed — Community tool repository