Medical-research-skills cosmic-database
Access COSMIC to download mutation datasets, query Cancer Gene Census, and retrieve mutational signatures when your genomic analysis requires curated somatic mutation resources.
install
source · Clone the upstream repo
git clone https://github.com/aipoch/medical-research-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/aipoch/medical-research-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/scientific-skills/Evidence Insight/cosmic-database" ~/.claude/skills/aipoch-medical-research-skills-cosmic-database && rm -rf "$T"
manifest:
scientific-skills/Evidence Insight/cosmic-database/SKILL.mdsource content
COSMIC Database Skill
When to Use
- Use this skill when you need access cosmic to download mutation datasets, query cancer gene census, and retrieve mutational signatures when your genomic analysis requires curated somatic mutation resources in a reproducible workflow.
- Use this skill when a evidence insight task needs a packaged method instead of ad-hoc freeform output.
- Use this skill when the user expects a concrete deliverable, validation step, or file-based result.
- Use this skill when
is the most direct path to complete the request.scripts/download_cosmic.py - Use this skill when you need the
package behavior rather than a generic answer.cosmic-database
Key Features
- Scope-focused workflow aligned to: Access COSMIC to download mutation datasets, query Cancer Gene Census, and retrieve mutational signatures when your genomic analysis requires curated somatic mutation resources.
- Packaged executable path(s):
.scripts/download_cosmic.py - Reference material available in
for task-specific guidance.references/ - Structured execution path designed to keep outputs consistent and reviewable.
Dependencies
:Python
. Repository baseline for current packaged skills.3.10+
:Third-party packages
. Add pinned versions if this skill needs stricter environment control.not explicitly version-pinned in this skill package
Example Usage
cd "20260316/scientific-skills/Evidence Insight/cosmic-database" python -m py_compile scripts/download_cosmic.py python scripts/download_cosmic.py --help
Example run plan:
- Confirm the user input, output path, and any required config values.
- Edit the in-file
block or documented parameters if the script uses fixed settings.CONFIG - Run
with the validated inputs.python scripts/download_cosmic.py - Review the generated output and return the final artifact with any assumptions called out.
Implementation Details
- Execution model: validate the request, choose the packaged workflow, and produce a bounded deliverable.
- Input controls: confirm the source files, scope limits, output format, and acceptance criteria before running any script.
- Primary implementation surface:
.scripts/download_cosmic.py - Reference guidance:
contains supporting rules, prompts, or checklists.references/ - Parameters to clarify first: input path, output path, scope filters, thresholds, and any domain-specific constraints.
- Output discipline: keep results reproducible, identify assumptions explicitly, and avoid undocumented side effects.
1. When to Use
Use this skill when you need COSMIC data for tasks such as:
- Downloading COSMIC mutation exports (TSV/VCF) for cohort or sample-level variant analysis.
- Retrieving Cancer Gene Census (CGC) gene lists for oncogene/tumor suppressor annotation and prioritization.
- Working with COSMIC mutational signatures (SBS/DBS/ID) for signature attribution or comparative studies.
- Accessing additional COSMIC genomics datasets (e.g., copy number, fusions, expression) for multi-omics integration.
- Building reproducible pipelines that programmatically fetch the latest COSMIC releases.
2. Key Features
- Authenticated downloads of COSMIC files (e.g., TSV/VCF; often GZIP-compressed).
- Cancer Gene Census access for curated cancer gene information.
- Mutational signature retrieval including SBS, DBS, and ID signatures.
- Support for multiple COSMIC dataset types, such as mutation, copy number, fusion, and expression resources.
- Pandas-friendly workflow for loading and filtering downloaded tables.
3. Dependencies
- Python 3.9+
>= 1.5pandas
>= 2.28requests
External requirements:
- A registered COSMIC account at https://cancer.sanger.ac.uk/cosmic
- Valid COSMIC login credentials (email + password)
4. Example Usage
The following example downloads a COSMIC file and loads it into a pandas DataFrame.
from scripts.download_cosmic import download_cosmic_file import pandas as pd # 1) Download a COSMIC dataset (example path; adjust to your target release/build) download_cosmic_file( email="user@email.com", password="pwd", filepath="GRCh38/cosmic/latest/CosmicMutantExport.tsv.gz" ) # 2) Load the downloaded GZIP-compressed TSV df = pd.read_csv( "CosmicMutantExport.tsv.gz", sep="\t", compression="gzip" ) # 3) Example analysis: filter by gene symbol (column name depends on the dataset) # df_gene = df[df["Gene name"] == "TP53"]
For dataset field definitions and COSMIC file specifics, see:
references/cosmic_data_reference.md.
5. Implementation Details
- Authentication: Downloads require COSMIC account credentials (email/password) and are performed via an authenticated HTTP session.
- File targeting: The
parameter specifies the COSMIC resource path (e.g., genome build such asfilepath
, release channel such asGRCh38
, and the target filename).latest - Data format: Many COSMIC exports are distributed as GZIP-compressed TSV (and sometimes VCF). Use
for TSVpandas.read_csv(..., sep="\t", compression="gzip")
files..gz - Typical workflow:
- Download the desired COSMIC export.
- Load into a DataFrame (or parse VCF with an appropriate library if needed).
- Filter/aggregate by gene, tumor type, sample, or signature depending on the analysis goal.