install
source · Clone the upstream repo
git clone https://github.com/mdbabumiamssm/LLMs-Universal-Life-Science-and-Clinical-Skills-
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/mdbabumiamssm/LLMs-Universal-Life-Science-and-Clinical-Skills- "$T" && mkdir -p ~/.claude/skills && cp -r "$T/Skills/Genomics/long-read-sequencing/medaka-polishing" ~/.claude/skills/mdbabumiamssm-llms-universal-life-science-and-clinical-skills-medaka-polishing && rm -rf "$T"
manifest:
Skills/Genomics/long-read-sequencing/medaka-polishing/SKILL.mdsource content
<!--
# COPYRIGHT NOTICE
# This file is part of the "Universal Biomedical Skills" project.
# Copyright (c) 2026 MD BABU MIA, PhD <md.babu.mia@mssm.edu>
# All Rights Reserved.
#
# This code is proprietary and confidential.
# Unauthorized copying of this file, via any medium is strictly prohibited.
#
# Provenance: Authenticated by MD BABU MIA
-->
name: bio-longread-medaka description: Polish assemblies and call variants from Oxford Nanopore data using medaka. Uses neural networks trained on specific basecaller versions. Use when improving ONT-only assemblies or calling variants from Nanopore data without short-read polishing. tool_type: cli primary_tool: medaka measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes. allowed-tools:
- read_file
- run_shell_command
Medaka Polishing and Variant Calling
Basic Consensus Polishing
# Polish assembly with medaka medaka_consensus -i reads.fastq.gz \ -d draft_assembly.fa \ -o medaka_output \ -t 4 \ -m r1041_e82_400bps_sup_v5.0.0
Variant Calling (Haploid)
# Call variants against reference (medaka v2.0+) medaka_variant \ -i reads.fastq.gz \ -r reference.fa \ -o output_dir \ -m r1041_e82_400bps_sup_v5.0.0
Note: Diploid variant calling has been deprecated in medaka v2.0. For diploid samples, use Clair3 instead.
Step-by-Step Workflow (medaka v2.0+)
# 1. Align reads to reference/draft minimap2 -ax map-ont reference.fa reads.fastq.gz | \ samtools sort -o aligned.bam samtools index aligned.bam # 2. Run neural network inference medaka inference aligned.bam consensus.hdf \ --model r1041_e82_400bps_sup_v5.0.0 \ --threads 2 # >2 threads has poor scaling # 3. Create consensus sequence from probabilities medaka sequence consensus.hdf reference.fa polished.fa # 4. Call variants from probabilities medaka vcf reference.fa consensus.hdf variants.vcf
List Available Models
# See all available models medaka tools list_models # Models are named: # r{pore}_{chemistry}_{speed}bps_{accuracy}_{version} # e.g., r1041_e82_400bps_sup_v5.0.0
Common Models
| Model | Description |
|---|---|
| r1041_e82_400bps_sup_v5.0.0 | R10.4.1, E8.2, SUP basecalling |
| r1041_e82_400bps_hac_v5.0.0 | R10.4.1, E8.2, HAC basecalling |
| r941_min_sup_g507 | R9.4.1, MinION, SUP |
| r941_min_hac_g507 | R9.4.1, MinION, HAC |
Choose Model Based on Basecaller
# Check which basecaller was used in your data # Then select matching model # For Guppy/Dorado SUP basecalling on R10.4.1 medaka_consensus -m r1041_e82_400bps_sup_v5.0.0 ... # For HAC basecalling medaka_consensus -m r1041_e82_400bps_hac_v5.0.0 ...
Polish Region Only
# Polish specific region (medaka v2.0+) medaka inference aligned.bam consensus.hdf \ --model r1041_e82_400bps_sup_v5.0.0 \ --region chr1:1000000-2000000
Multiple Rounds of Polishing
# First round medaka_consensus -i reads.fastq.gz -d draft.fa -o round1 -m model # Second round (diminishing returns, usually not needed) medaka_consensus -i reads.fastq.gz -d round1/consensus.fasta -o round2 -m model
Call Variants from Existing BAM
# If you already have aligned BAM (medaka v2.0+) medaka inference aligned.bam consensus.hdf --model r1041_e82_400bps_sup_v5.0.0 medaka vcf reference.fa consensus.hdf variants.vcf
Filter VCF Output
# Filter by quality bcftools filter -i 'QUAL>20' variants.vcf > variants.filtered.vcf # Get high-confidence calls bcftools view -i 'FILTER="PASS"' variants.vcf > variants.pass.vcf
Output Files
| File | Description |
|---|---|
| consensus.fasta | Polished sequence |
| consensus.hdf | Neural network outputs |
| variants.vcf | Variant calls |
| calls_to_draft.bam | Alignments used |
Key Parameters
| Parameter | Description |
|---|---|
| -i | Input reads (FASTQ) |
| -d | Draft assembly/reference |
| -o | Output directory |
| -m | Model name |
| -t | Threads |
| -b | Batch size (GPU memory) |
| --region | Specific region to process |
GPU Acceleration
# Enable GPU (if available) medaka_consensus -i reads.fastq.gz -d draft.fa -o output \ -m r1041_e82_400bps_sup_v5.0.0 \ -b 100 \ # Increase batch size for GPU -t 4
Related Skills
- long-read-alignment - Generate input alignments
- structural-variants - Find SVs from polished assembly
- variant-calling/variant-calling - Short-read variant calling comparison