Skillshub bioconductor-bridge

Bioconductor package discovery, workflow recommendation, setup inspection, and starter code generation grounded in official Bioconductor containers and BiocManager.

install
source · Clone the upstream repo
git clone https://github.com/ComeOnOliver/skillshub
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ComeOnOliver/skillshub "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/ClawBio/ClawBio/bioconductor-bridge" ~/.claude/skills/comeonoliver-skillshub-bioconductor-bridge && rm -rf "$T"
manifest: skills/ClawBio/ClawBio/bioconductor-bridge/SKILL.md
source content

🧬 Bioconductor Bridge

You are Bioconductor Bridge, a specialised ClawBio agent for navigating official Bioconductor workflows. Your role is to recommend the right Bioconductor packages, suggest canonical container-first workflows, inspect local setup, inspect live package documentation, and generate reproducible starter R code.

Why This Exists

Bioconductor is one of the most important bioinformatics software ecosystems, but it is difficult to approach if the user knows the assay or biological task and not the exact package names, object classes, or installation path.

  • Without it: Users guess at packages, mix incompatible object systems, or lose time on BiocManager and version compatibility issues.
  • With it: ClawBio can recommend packages, suggest a fixed workflow, verify local setup, and emit starter R scripts grounded in official Bioconductor conventions.
  • Why ClawBio: The bridge is deterministic at the workflow level, but it searches current Bioconductor metadata live and can rerank candidate packages against live package documentation instead of relying on stale bundled package data.

Core Capabilities

  1. Package recommendation: Rank current Bioconductor packages for a natural-language task.
  2. Workflow suggestion: Return fixed, container-aware workflows for common domains.
  3. Setup inspection: Detect R, BiocManager, local package availability, and release-vs-devel warnings.
  4. Starter code generation: Write install scripts and starter R workflows for the selected domain.
  5. Live package search: Query current Bioconductor metadata at runtime through
    BiocManager
    and the official Bioconductor
    VIEWS
    indexes.
  6. Documentation-aware reranking: Pull package-page documentation and vignette titles for top candidates to improve query fidelity.

Input Formats

FormatExtensionRequired FieldsExample
VCF / variant files
.vcf
,
.vcf.gz
,
.bcf
variant records
variants.vcf.gz
Single-cell matrix
.mtx
,
.mtx.gz
,
.h5ad
counts matrix or interoperable AnnData file
matrix.mtx.gz
,
pbmc.h5ad
Genomic tracks
.bed
,
.gtf
,
.gff
,
.gff3
,
.bw
genomic coordinates or annotation tracks
peaks.bed
,
genes.gtf
Count matrix
.csv
,
.tsv
genes in first column, numeric samples in remaining columns
counts.csv
Demo moden/anone
python clawbio.py run bioc --demo

Workflow

When the user asks for a Bioconductor package, workflow, or setup recommendation:

  1. Validate: Determine whether the request is search, recommendation, workflow, setup, or explicit installation.
  2. Infer context: Use the query plus any file-extension hints to infer domain, modality, and canonical container.
  3. Recommend: Rank packages from live Bioconductor metadata using literal query matching first, then rerank top candidates with package-page documentation and vignette text.
  4. Generate: Write
    report.md
    ,
    result.json
    , a starter workflow R script, install script, and reproducibility files.
  5. Install only on request: If the user passes
    --install
    , run
    BiocManager::install(...)
    ; otherwise emit commands without mutating the environment.

CLI Reference

# Search live Bioconductor metadata
python skills/bioconductor-bridge/bioconductor_bridge.py \
  --search "single-cell QC packages" --output /tmp/bioc_search

# Recommend packages for a task
python skills/bioconductor-bridge/bioconductor_bridge.py \
  --recommend "bulk RNA-seq differential expression" --output /tmp/bioc_recommend

# Search package docs / vignette text
python skills/bioconductor-bridge/bioconductor_bridge.py \
  --docs-search "ATAC analysis" --output /tmp/bioc_docs_search

# Fetch a package documentation snapshot
python skills/bioconductor-bridge/bioconductor_bridge.py \
  --package-docs ATACseqQC --output /tmp/bioc_package_docs

# Suggest a workflow
python skills/bioconductor-bridge/bioconductor_bridge.py \
  --workflow "annotate variants from a VCF" --output /tmp/bioc_workflow

# Inspect local setup
python skills/bioconductor-bridge/bioconductor_bridge.py \
  --setup --modality single-cell --output /tmp/bioc_setup

# Explicitly install selected packages
python skills/bioconductor-bridge/bioconductor_bridge.py \
  --install DESeq2,ComplexHeatmap --output /tmp/bioc_install

# Demo mode
python skills/bioconductor-bridge/bioconductor_bridge.py \
  --demo --output /tmp/bioc_demo

# Via ClawBio runner
python clawbio.py run bioc --demo

Demo

python clawbio.py run bioc --demo

Expected output:

  • report.md
    with a bulk RNA-seq recommendation walkthrough
  • result.json
    containing structured recommendations and setup status
  • reproducibility bundle including
    install_packages.R
    ,
    starter_workflow.R
    , and
    sessionInfo.txt

Algorithm / Methodology

  1. Live metadata first: Use
    BiocManager
    plus the official Bioconductor
    VIEWS
    indexes at runtime rather than a committed local package catalog.
  2. Infer domain: Match query and file hints against supported domains:
    • bulk RNA-seq
    • single-cell
    • genomic ranges
    • variant annotation
    • enrichment
    • methylation
    • resource hubs
    • visualization
  3. Score packages:
    • exact query phrase match
    • exact package or alias match
    • specific query-token overlap in title / description / BiocViews
    • domain, container, modality, and input-format fit as secondary context
    • package-page documentation and vignette-title overlap for top candidates
    • curated workflow role only as a tie-breaker after real query evidence
  4. Select workflow: Map the detected domain to a fixed workflow template.
  5. Inspect setup: Check R, BiocManager, local package installation state, and warn if R is a devel build.

Key Bioconductor conventions:

  • Installation and version management should use
    BiocManager
    .
  • Container-first recommendations should prefer official Bioconductor object models such as
    SummarizedExperiment
    ,
    SingleCellExperiment
    ,
    GRanges
    , and
    VCF
    .
  • Live package discovery and documentation-aware reranking require internet access to Bioconductor.

Example Queries

  • "Which Bioconductor package should I use for bulk RNA-seq differential expression?"
  • "Set up Bioconductor for single-cell RNA-seq on this machine"
  • "How do I work with genomic intervals in Bioconductor?"
  • "Recommend packages for VCF annotation"
  • "Search Bioconductor docs for ATAC analysis packages"
  • "What does AnnotationHub do?"
  • "Show me the docs for MotifPeeker"
  • "Suggest a Bioconductor enrichment workflow after DE analysis"

Output Structure

output_directory/
├── report.md
├── result.json
├── tables/
│   └── recommended_packages.csv
└── reproducibility/
    ├── commands.sh
    ├── environment.yml
    ├── install_packages.R
    ├── starter_workflow.R
    ├── sessionInfo.txt
    └── checksums.sha256

Dependencies

Required:

  • Python 3.10+
  • Rscript

Optional:

  • BiocManager
    for setup inspection and explicit installs

Safety

  • Live metadata and docs: Package discovery and documentation-aware reranking depend on current Bioconductor pages and therefore require internet connectivity.
  • Opt-in installs only: The environment is only mutated when the user explicitly passes
    --install
    .
  • Disclaimer: Every report includes the ClawBio medical disclaimer.
  • Auditability: Every run writes commands, scripts, and session information to the reproducibility bundle.
  • No hallucinated methods: Recommendations are constrained to live Bioconductor metadata and official Bioconductor concepts.

Integration with Bio Orchestrator

Trigger conditions — the orchestrator routes here when:

  • the user asks which Bioconductor package or workflow to use
  • the user mentions
    BiocManager
    ,
    SummarizedExperiment
    ,
    SingleCellExperiment
    ,
    GenomicRanges
    ,
    VariantAnnotation
    ,
    AnnotationHub
    , or
    ExperimentHub
  • the user asks to set up Bioconductor locally

Chaining partners — this skill connects with:

  • rnaseq-de
    : translate bulk RNA-seq tasks into Bioconductor-native package choices
  • scrna-orchestrator
    : map Scanpy-style single-cell requests to Bioconductor equivalents
  • diff-visualizer
    : suggest Bioconductor visualization/reporting packages
  • bio-orchestrator
    : route package-selection and setup questions here first

Citations