Claude-skill-registry curate-genome-assembly
Process genome assembly datasets for VEuPathDB resources
git clone https://github.com/majiayu000/claude-skill-registry
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/curate-genome-assembly" ~/.claude/skills/majiayu000-claude-skill-registry-curate-genome-assembly && rm -rf "$T"
skills/data/curate-genome-assembly/SKILL.md
Genome Assembly Dataset Curation
This skill guides processing of genome assembly datasets for VEuPathDB resources.
Prerequisites Check
This workflow requires the following repositories in veupathdb-repos/:
- ApiCommonPresenters
- EbrcModelCommon
First, run the repository status check to verify that the repositories are present (note: this script is located in the skill directory):
bash scripts/check-repos.sh ApiCommonPresenters EbrcModelCommon
If repositories are missing, the script will provide clone instructions.
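The check can be pictured as a simple presence test. The sketch below is a hypothetical Node.js illustration, not the contents of scripts/check-repos.sh. It assumes a repository counts as present when veupathdb-repos/<name>/.git exists, and the function name findMissingRepos is invented for this example.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical illustration only; scripts/check-repos.sh is the real check.
// A repository counts as present when <reposDir>/<name>/.git exists
// (a directory for a normal clone, a file for a worktree or submodule).
function findMissingRepos(reposDir, names) {
  return names.filter((name) => !fs.existsSync(path.join(reposDir, name, ".git")));
}

// Usage, run from the curation workspace directory:
//   const missing = findMissingRepos("veupathdb-repos", ["ApiCommonPresenters", "EbrcModelCommon"]);
//   if (missing.length > 0) console.error(`Missing repositories: ${missing.join(", ")}`);
```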
Branch Confirmation: After verifying that the repositories exist, check their current branches and status using git -C <path>, then confirm with the user before proceeding. Users typically create dataset-specific branches (see Curator Branching Guidelines).
Example:
git -C veupathdb-repos/ApiCommonPresenters branch --show-current
git -C veupathdb-repos/ApiCommonPresenters status -sb
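You can compare branches across both repositories without reading the output by eye. To do that, parse the first line of git status -sb, which has the form "## branch...upstream [ahead N, behind M]". The following is a minimal sketch that assumes that header format. parseStatusHeader is a hypothetical name, and it returns null for headers it does not recognize, such as a detached HEAD.

```javascript
// Parses the "## ..." header line printed first by `git status -sb`.
// Handles "## <branch>", "## <branch>...<upstream>", and an optional
// "[ahead N]", "[behind M]" or "[ahead N, behind M]" suffix.
// Returns null for other headers (e.g. "## HEAD (no branch)").
function parseStatusHeader(line) {
  const m = /^## (\S+?)(?:\.\.\.(\S+))?(?: \[(.*)\])?$/.exec(line.trim());
  if (!m) return null;
  const [, branch, upstream = null, tracking = ""] = m;
  const ahead = Number((/ahead (\d+)/.exec(tracking) || [])[1] || 0);
  const behind = Number((/behind (\d+)/.exec(tracking) || [])[1] || 0);
  return { branch, upstream, ahead, behind };
}

// Usage (requires git on PATH; uses -C instead of cd, as this workflow asks):
//   const { execFileSync } = require("node:child_process");
//   const out = execFileSync("git", ["-C", "veupathdb-repos/ApiCommonPresenters", "status", "-sb"], { encoding: "utf8" });
//   console.log(parseStatusHeader(out.split("\n")[0]));
```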
Working Directory (Curation Workspace Directory)
IMPORTANT: All commands in this workflow must be run from your curation workspace directory (the directory that contains veupathdb-repos/ as a subdirectory).
For Claude Code:
- DO NOT use cd commands to change into veupathdb-repos/ subdirectories
- Use git -C <path> for git operations in subdirectories
- Use absolute paths or relative paths from the curation workspace directory
- Example: git -C veupathdb-repos/ApiCommonPresenters status instead of cd veupathdb-repos/ApiCommonPresenters && git status
The workflow will create a tmp/ subdirectory in the curation workspace directory for intermediate files.
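Intermediate files in this workflow follow the pattern tmp/<ACCESSION>_<kind>.json. A helper that creates tmp/ if needed and builds those paths might look like the following. The function name tmpPaths and the workspaceDir parameter are illustrative assumptions.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical helper: builds the intermediate-file paths used in Steps 1-3,
// relative to the curation workspace directory, and ensures tmp/ exists.
function tmpPaths(assemblyAccession, bioprojectAccession, workspaceDir = ".") {
  const tmpDir = path.join(workspaceDir, "tmp");
  fs.mkdirSync(tmpDir, { recursive: true }); // no-op if tmp/ already exists
  return {
    datasetReport: path.join(tmpDir, `${assemblyAccession}_dataset_report.json`),
    bioproject: path.join(tmpDir, `${bioprojectAccession}_bioproject.json`),
    pubmed: path.join(tmpDir, `${assemblyAccession}_pubmed.json`),
  };
}
```

Creating the directory up front also avoids a failed shell redirect such as > tmp/... when tmp/ does not exist yet.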
Required Information
Gather the following before starting:
- VEuPathDB project - Valid projects listed in resources/valid-projects.json
- Assembly GenBank accession, including version (e.g., GCA_000988875.2)
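You can sanity-check both inputs before making any network call. This sketch assumes that GenBank assembly accessions have the form GCA_, then nine digits, then a version suffix. It also assumes that the project list from resources/valid-projects.json has already been loaded into an array of names; the file's exact format is not described here. validateInputs is a hypothetical name.

```javascript
// GenBank assembly accession: "GCA_" + 9 digits + "." + version number.
const GENBANK_ASSEMBLY_RE = /^GCA_\d{9}\.\d+$/;

// Hypothetical check; validProjects is assumed to be an array of project
// names taken from resources/valid-projects.json. Returns a list of errors.
function validateInputs(project, accession, validProjects) {
  const errors = [];
  if (!validProjects.includes(project)) {
    errors.push(`Unknown VEuPathDB project "${project}"`);
  }
  if (!GENBANK_ASSEMBLY_RE.test(accession)) {
    errors.push(`"${accession}" is not a versioned GenBank assembly accession (e.g. GCA_000988875.2)`);
  }
  return errors;
}
```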
Workflow Overview
Step 1: Fetch Assembly Metadata from NCBI
Fetch assembly metadata from NCBI using the GenBank accession.
Command:
curl -X GET "https://api.ncbi.nlm.nih.gov/datasets/v2/genome/accession/<ASSEMBLY_ACCESSION>/dataset_report" \
  -H "Accept: application/json" > tmp/<ASSEMBLY_ACCESSION>_dataset_report.json
Detailed instructions: Step 1 - Fetch NCBI Metadata
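You can also make the same request as the curl command from Node.js. This is a minimal sketch with a few assumptions:

- It requires Node 18 or later for the global fetch.
- It assumes the response lists assemblies in a top-level reports array.
- datasetReportUrl and fetchDatasetReport are hypothetical names.

```javascript
const fs = require("node:fs");

// Same NCBI Datasets v2 endpoint as the curl command.
function datasetReportUrl(accession) {
  return `https://api.ncbi.nlm.nih.gov/datasets/v2/genome/accession/${encodeURIComponent(accession)}/dataset_report`;
}

// Fetches the report and writes it to outFile (Node 18+ global fetch).
async function fetchDatasetReport(accession, outFile) {
  const res = await fetch(datasetReportUrl(accession), {
    headers: { Accept: "application/json" },
  });
  if (!res.ok) {
    throw new Error(`NCBI Datasets returned HTTP ${res.status} for ${accession}`);
  }
  const report = await res.json();
  // Assumption: assemblies are listed in a top-level "reports" array.
  if (!Array.isArray(report.reports) || report.reports.length === 0) {
    throw new Error(`No assembly report returned for ${accession}`);
  }
  fs.writeFileSync(outFile, JSON.stringify(report, null, 2));
  return report;
}

// Usage:
//   await fetchDatasetReport("GCA_000988875.2", "tmp/GCA_000988875.2_dataset_report.json");
```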
Step 2: Fetch BioProject Metadata
Extract the BioProject accession from the assembly report and fetch additional details.
Command:
node scripts/fetch-bioproject.js <BIOPROJECT_ACCESSION>
This retrieves the BioProject title and description and saves them to tmp/<BIOPROJECT>_bioproject.json.
Detailed instructions: Step 2 - Fetch BioProject
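You can read the BioProject accession for this step from the dataset report saved in Step 1. The field path used below, reports[].assembly_info.bioproject_lineage[].bioprojects[].accession, is an assumption about the NCBI Datasets v2 schema, so check it against a real report. The lineage may list more than one BioProject. For that reason the sketch returns all distinct accessions and leaves the choice to the curator. extractBioprojects and bioprojectsFromFile are hypothetical names.

```javascript
const fs = require("node:fs");

// Assumption: BioProjects are nested under
// reports[].assembly_info.bioproject_lineage[].bioprojects[].accession.
// Returns distinct accessions in the order first seen.
function extractBioprojects(report) {
  const found = new Set();
  for (const r of report.reports ?? []) {
    for (const lineage of r.assembly_info?.bioproject_lineage ?? []) {
      for (const bp of lineage.bioprojects ?? []) {
        if (bp.accession) found.add(bp.accession);
      }
    }
  }
  return [...found];
}

// Usage: bioprojectsFromFile("tmp/GCA_000988875.2_dataset_report.json")
function bioprojectsFromFile(file) {
  return extractBioprojects(JSON.parse(fs.readFileSync(file, "utf8")));
}
```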
Step 3: Fetch PubMed Data
Find and fetch publications for the genome assembly.
Command:
node scripts/fetch-pubmed.js <ASSEMBLY_ACCESSION>
Results are saved to tmp/<ASSEMBLY_ACCESSION>_pubmed.json.
Detailed instructions: Step 3 - Fetch PubMed
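The Scripts list describes the PubMed lookup as elink followed by esummary. Below is a hypothetical sketch of those E-utilities requests; it is not the code in scripts/fetch-pubmed.js. It assumes a numeric BioProject UID is already known, because elink takes database UIDs rather than accession strings. It also assumes each esummary record carries title, source, pubdate and lastauthor fields. The helper names are illustrative.

```javascript
// Hypothetical sketch of the E-utilities requests; not scripts/fetch-pubmed.js.
const EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils";

// Builds an E-utilities URL that asks for JSON output.
function eutilsUrl(tool, params) {
  const qs = new URLSearchParams({ ...params, retmode: "json" });
  return `${EUTILS}/${tool}.fcgi?${qs}`;
}

// elink: BioProject UID -> linked PubMed IDs.
const elinkUrl = (bioprojectUid) =>
  eutilsUrl("elink", { dbfrom: "bioproject", db: "pubmed", id: String(bioprojectUid) });

// esummary: PubMed IDs -> summary records (comma-separated id list).
const esummaryUrl = (pmids) => eutilsUrl("esummary", { db: "pubmed", id: pmids.join(",") });

// Assumption: the esummary JSON has result.uids plus one record per UID
// carrying title, source (journal), pubdate and lastauthor fields.
function summarizePubmed(esummaryJson) {
  const result = esummaryJson.result ?? {};
  return (result.uids ?? []).map((uid) => ({
    pmid: uid,
    title: result[uid]?.title,
    journal: result[uid]?.source,
    pubdate: result[uid]?.pubdate,
    lastAuthor: result[uid]?.lastauthor,
  }));
}
```

NCBI asks clients without an API key to stay under three E-utilities requests per second, so space out any loop over these URLs.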
Step 4: Curate Contacts
Identify and curate contact entries for the genome submission.
Contact identification priority:
- Named submitter from assembly metadata
- Senior/last author from PubMed publications (if available)
- Curator judgment for additional contacts
Actions:
- Search existing contacts in veupathdb-repos/EbrcModelCommon/Model/lib/xml/datasetPresenters/contacts/allContacts.xml
- Create new contact entries if needed
- Present choices to curator for review
Detailed instructions: Step 4 - Curate Contacts
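You can make the contact priority order explicit with a small ranking helper. A schema-agnostic text search of allContacts.xml complements it. Both functions are hypothetical, and a few caveats apply:

- publications is assumed to be an array of { pmid, lastAuthor } objects.
- The submitter string may name an institution rather than a person.
- The case-insensitive de-duplication will not match names written in different formats (for example "Doe J" versus "Jane Doe").

For these reasons the curator still reviews the result.

```javascript
// Hypothetical helper: orders candidates by the priority listed above
// (assembly submitter first, then last authors of linked publications),
// dropping case-insensitive duplicates. Curator judgment decides the rest.
function rankContactCandidates(submitter, publications) {
  const candidates = [];
  const seen = new Set();
  const add = (name, source) => {
    const key = name?.trim().toLowerCase();
    if (!key || seen.has(key)) return;
    seen.add(key);
    candidates.push({ name: name.trim(), source });
  };
  add(submitter, "assembly submitter");
  for (const pub of publications) add(pub.lastAuthor, `last author, PMID ${pub.pmid}`);
  return candidates;
}

// Naive text search: 1-based line numbers in allContacts.xml that mention
// the surname, without assuming anything about the XML schema.
function grepContacts(xmlText, surname) {
  const needle = surname.toLowerCase();
  return xmlText
    .split("\n")
    .map((line, i) => (line.toLowerCase().includes(needle) ? i + 1 : null))
    .filter((n) => n !== null);
}
```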
Step 5: Generate and Insert Presenter XML
Generate the datasetPresenter XML and insert it into the appropriate presenter file.
Command:
node scripts/generate-presenter-xml.js <ASSEMBLY_ACCESSION> <PROJECT> <PRIMARY_CONTACT_ID> [ADDITIONAL_CONTACT_IDS...]
Target file: veupathdb-repos/ApiCommonPresenters/Model/lib/xml/datasetPresenters/<PROJECT>.xml
Detailed instructions: Step 5 - Update Presenter Files
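The generator takes positional arguments, so a wrapper can enforce the documented order and derive the target file from the project name. This is a minimal sketch, and generatorArgs, presenterFile and runGenerator are hypothetical. This section does not say whether the script prints the XML or edits the presenter file itself. For that reason runGenerator simply returns whatever the script writes to stdout.

```javascript
const { execFileSync } = require("node:child_process");
const path = require("node:path");

// Builds argv in the documented order:
// <ASSEMBLY_ACCESSION> <PROJECT> <PRIMARY_CONTACT_ID> [ADDITIONAL_CONTACT_IDS...]
function generatorArgs(accession, project, primaryContactId, additionalContactIds = []) {
  if (!primaryContactId) throw new Error("A primary contact ID is required");
  return ["scripts/generate-presenter-xml.js", accession, project, primaryContactId, ...additionalContactIds];
}

// Target presenter file for a project, relative to the curation workspace.
function presenterFile(project) {
  return path.join("veupathdb-repos", "ApiCommonPresenters", "Model", "lib", "xml", "datasetPresenters", `${project}.xml`);
}

// Runs the generator (requires node on PATH) and returns its stdout.
function runGenerator(accession, project, primaryContactId, additionalContactIds = []) {
  return execFileSync("node", generatorArgs(accession, project, primaryContactId, additionalContactIds), { encoding: "utf8" });
}
```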
Next Steps
After completing this workflow:
- Review generated XML for TODO fields that require curator input
- Commit changes to dataset branch (curator handles git operations)
- Create pull request for review (curator handles PR creation)
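Before committing, you can list any remaining placeholders. This sketch assumes the generator marks unfinished fields with the literal text TODO. findTodos and findTodosInFile are hypothetical names.

```javascript
const fs = require("node:fs");

// Lists every line that still contains "TODO", with its 1-based line number,
// so no placeholder needing curator input is missed.
function findTodos(xmlText) {
  return xmlText.split("\n").flatMap((line, i) =>
    line.includes("TODO") ? [{ line: i + 1, text: line.trim() }] : []
  );
}

// Usage: findTodosInFile("veupathdb-repos/ApiCommonPresenters/Model/lib/xml/datasetPresenters/<PROJECT>.xml")
function findTodosInFile(file) {
  return findTodos(fs.readFileSync(file, "utf8"));
}
```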
Resources
- Step 1 - Fetch NCBI Metadata
- Step 2 - Fetch BioProject
- Step 3 - Fetch PubMed
- Step 4 - Curate Contacts
- Step 5 - Update Presenter Files
- Curator Branching Guidelines
- Valid VEuPathDB Projects
Scripts
- scripts/fetch-bioproject.js - Fetches BioProject metadata from NCBI (esearch + esummary)
- scripts/fetch-pubmed.js - Fetches PubMed records linked to a BioProject (elink + esummary)
- scripts/generate-presenter-xml.js - Generates datasetPresenter XML from fetched metadata
- scripts/check-repos.sh - Validates veupathdb-repos/ repository setup (synced from shared/)