Medical-research-skills clinvar-database
Utilities for querying the NCBI ClinVar database to retrieve variant records, clinical significance, and phenotype relationships; use when searching variants by gene/condition/significance, interpreting Pathogenic/Benign/VUS classifications, or annotating VCF files with ClinVar annotations.
install
source · Clone the upstream repo
git clone https://github.com/aipoch/medical-research-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/aipoch/medical-research-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/scientific-skills/Evidence Insight/clinvar-database" ~/.claude/skills/aipoch-medical-research-skills-clinvar-database && rm -rf "$T"
manifest:
scientific-skills/Evidence Insight/clinvar-database/SKILL.md
When to Use
- You need to find ClinVar variant records by gene, condition/phenotype, or clinical significance (e.g., BRCA1 + pathogenic).
- You want to interpret a variant’s clinical significance (Pathogenic/Benign/VUS) and review status for reporting or triage.
- You need to annotate a VCF with ClinVar identifiers and interpretation fields as part of a variant annotation pipeline.
- You want to perform bulk retrieval of ClinVar datasets for offline analysis or periodic database refresh.
- You are building a workflow that relies on NCBI E-utilities to programmatically query ClinVar.
Key Features
- ClinVar search via NCBI E-utilities using flexible query terms (gene/condition/significance).
- Clinical interpretation retrieval, including clinical significance categories and review status.
- VCF annotation workflow integration (leveraging bcftools) to enrich variants with ClinVar data.
- Bulk data access through ClinVar FTP downloads for large-scale processing.
- Reference documentation:
  - API details: references/api_reference.md
  - Clinical significance definitions: references/clinical_significance.md
Dependencies
- Python >=3.8
- requests (Python package)
- bcftools (system dependency; required for VCF annotation)
- pandas (Python package; optional for downstream data processing)
Example Usage
1) Search ClinVar for pathogenic variants in a gene
python scripts/search.py --term "BRCA1[gene] AND pathogenic[CLNSIG]"
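
The internals of scripts/search.py are not shown here, so the following is only a minimal sketch of how such a query could be sent to the E-utilities ESearch endpoint. The helper names (build_esearch_url, parse_esearch, search_clinvar) are hypothetical, and the sketch uses the standard library instead of requests so that it runs without extra packages.

```python
import json
import urllib.parse
import urllib.request

# Public E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, retmax=20):
    """Build an ESearch URL that queries the ClinVar database for `term`."""
    params = {"db": "clinvar", "term": term, "retmode": "json", "retmax": retmax}
    return ESEARCH_URL + "?" + urllib.parse.urlencode(params)


def parse_esearch(payload):
    """Extract the total hit count and the returned record IDs from ESearch JSON."""
    result = json.loads(payload)["esearchresult"]
    return int(result["count"]), list(result["idlist"])


def search_clinvar(term, retmax=20):
    """Run the query over the network (hypothetical helper, not the skill's own code)."""
    url = build_esearch_url(term, retmax)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return parse_esearch(resp.read().decode("utf-8"))


# Example call (performs a live request):
#   count, ids = search_clinvar("BRCA1[gene] AND pathogenic[CLNSIG]")
```

The returned IDs can then be passed to ESummary or EFetch for record details. NCBI asks clients without an API key to stay at or below three requests per second, so a batch workflow would need to throttle its calls.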
2) Annotate a VCF with ClinVar data
python scripts/annotate.py --input input.vcf --output annotated.vcf
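
How scripts/annotate.py calls bcftools is not documented here. One plausible shape, sketched under that assumption, is a thin wrapper around bcftools annotate that copies selected INFO tags from a ClinVar VCF. The function names, the clinvar_vcf parameter, and the default tag list (CLNSIG, CLNREVSTAT) are illustrative choices, not the script's confirmed behavior.

```python
import shutil
import subprocess


def build_annotate_cmd(input_vcf, clinvar_vcf, output_vcf,
                       fields=("CLNSIG", "CLNREVSTAT")):
    """Return a `bcftools annotate` argument list copying INFO tags from ClinVar."""
    columns = ",".join(f"INFO/{name}" for name in fields)
    return [
        "bcftools", "annotate",
        "-a", clinvar_vcf,   # annotation source; must be bgzip-compressed and indexed
        "-c", columns,       # INFO tags to transfer to matching records
        "-O", "v",           # write uncompressed VCF
        "-o", output_vcf,
        input_vcf,
    ]


def annotate(input_vcf, clinvar_vcf, output_vcf):
    """Run bcftools, failing early if it is not available on PATH."""
    if shutil.which("bcftools") is None:
        raise RuntimeError("bcftools not found on PATH")
    subprocess.run(build_annotate_cmd(input_vcf, clinvar_vcf, output_vcf), check=True)
```

Building the argument list separately from running it keeps the command easy to log and test. Two practical points: compressing and indexing the input VCF as well is a safe default, and ClinVar's VCF names chromosomes without a "chr" prefix, so an input that uses "chr1" style names will match nothing until the contigs are renamed.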
Implementation Details
- Search (scripts/search.py)
  - Uses NCBI E-utilities to query ClinVar with a user-provided --term.
  - The query term supports ClinVar/Entrez syntax (e.g., BRCA1[gene], pathogenic[CLNSIG]) to filter by gene and clinical significance.
  - Output is expected to include matching ClinVar records/identifiers suitable for follow-up interpretation or annotation.
- Interpretation fields
  - Clinical significance values (e.g., Pathogenic/Benign/VUS) and related interpretation guidance follow ClinVar conventions; see references/clinical_significance.md.
  - Review status (e.g., level of evidence/review) is retrieved alongside significance where available.
- VCF annotation (scripts/annotate.py)
  - Takes an input VCF (--input) and produces an annotated VCF (--output).
  - Integrates with bcftools to add ClinVar-derived annotations to variant records (requires bcftools installed and available on PATH).
  - Designed for pipeline use: deterministic input/output files and command-line parameters.
- Bulk downloads
  - Supports obtaining ClinVar datasets via FTP for offline indexing/annotation workflows.
  - Recommended when you need reproducible, high-throughput annotation without repeated API calls.
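
As an illustration of working with a bulk download offline, the sketch below reads significance and review status from a single ClinVar VCF data line. It assumes the dataset is ClinVar's VCF release (for example clinvar.vcf.gz from the ClinVar FTP site), where the ID column holds the Variation ID and the INFO column carries CLNSIG and CLNREVSTAT. The function names and the star-rating table are illustrative and do not come from this skill's scripts.

```python
# Review-status strings as they appear in CLNREVSTAT (underscores replace spaces),
# mapped to ClinVar's star rating. Wording has changed between releases, so
# unrecognized strings map to None rather than a guessed rating.
REVIEW_STARS = {
    "practice_guideline": 4,
    "reviewed_by_expert_panel": 3,
    "criteria_provided,_multiple_submitters,_no_conflicts": 2,
    "criteria_provided,_conflicting_classifications": 1,
    "criteria_provided,_conflicting_interpretations": 1,
    "criteria_provided,_single_submitter": 1,
    "no_assertion_criteria_provided": 0,
    "no_classification_provided": 0,
    "no_assertion_provided": 0,
}


def parse_info(info):
    """Split a VCF INFO column into a dict; flag-only keys map to True."""
    fields = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def summarize_record(line):
    """Return Variation ID, significance, and star rating for one VCF data line."""
    cols = line.rstrip("\n").split("\t")
    info = parse_info(cols[7])
    revstat = info.get("CLNREVSTAT")
    return {
        "variation_id": cols[2],
        "significance": info.get("CLNSIG"),
        "review_status": revstat,
        "stars": REVIEW_STARS.get(revstat),
    }
```

To process a whole file, open it with gzip.open(path, "rt"), skip lines that start with "#", and pass each remaining line to summarize_record. Filtering on the star rating (for example, keeping only records with two or more stars) is a common way to restrict a report to better-reviewed classifications.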