Claude-skill-registry long-read-sequencing-agent

name: long-read-sequencing-agent

install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/long-read-sequencing-agent" ~/.claude/skills/majiayu000-claude-skill-registry-long-read-sequencing-agent && rm -rf "$T"
manifest: skills/data/long-read-sequencing-agent/SKILL.md
source content

---
name: long-read-sequencing-agent
description: AI-powered analysis of long-read sequencing data (PacBio, ONT) for structural variant detection, isoform discovery, epigenetic modifications, and de novo assembly.
license: MIT
metadata:
  author: AI Group
  version: "1.0.0"
  created: "2026-01-19"
compatibility:
  - system: Python 3.10+
allowed-tools:
  - run_shell_command
  - read_file
  - write_file
keywords:
  - long-read-sequencing-agent
  - automation
  - biomedical
measurable_outcome: execute task with >95% success rate.
---

Long-Read Sequencing Agent

The Long-Read Sequencing Agent provides comprehensive AI-driven analysis of long-read sequencing data from PacBio (HiFi) and Oxford Nanopore (ONT) platforms. It enables structural variant detection, full-length isoform discovery, base modification calling, and de novo genome assembly.

When to Use This Skill

  • To detect structural variants (SVs) that short-read sequencing misses.
  • To characterize full-length transcript isoforms and alternative splicing.
  • To detect DNA base modifications (5mC, 6mA) directly from native reads.
  • To assemble complex genomic regions de novo.
  • To phase variants and generate fully resolved haplotypes.

Core Capabilities

  1. Structural Variant Detection: AI-enhanced SV calling for deletions, insertions, inversions, translocations, and complex rearrangements.

  2. Isoform Discovery: Full-length transcript sequencing for novel isoform and fusion detection.

  3. Base Modification Calling: Direct detection of DNA methylation (5mC, 5hmC, 6mA) from native sequencing.

  4. Haplotype Phasing: Phase-resolved assemblies and variant calling.

  5. De Novo Assembly: Assemble complex genomic regions (centromeres, telomeres, HLA).

  6. Error Correction: AI-based error correction for long-read data.

Platform Comparison

| Feature | PacBio HiFi | ONT (R10+) |
|---|---|---|
| Read length | 15-25 kb | >100 kb possible |
| Accuracy | >99.9% (HiFi) | >99% (Q20+) |
| Base mods | 5mC, 6mA | 5mC, 5hmC, 6mA, more |
| Throughput | 20-40 Gb/run | 100+ Gb/run |
| Cost | Higher | Lower |
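The "Q20+" label in the accuracy row uses the Phred scale, where per-base accuracy is 1 - 10^(-Q/10). The short sketch below converts between the two so the table's figures can be checked; the function names are illustrative and not part of this skill.

```python
import math


def phred_to_accuracy(q: float) -> float:
    """Convert a Phred quality score to per-base accuracy."""
    return 1.0 - 10.0 ** (-q / 10.0)


def accuracy_to_phred(accuracy: float) -> float:
    """Convert per-base accuracy (0 <= accuracy < 1) to a Phred score."""
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in [0, 1)")
    return -10.0 * math.log10(1.0 - accuracy)


if __name__ == "__main__":
    # Q20 corresponds to 99% accuracy and Q30 to 99.9%.
    for q in (20, 30):
        print(f"Q{q}: {phred_to_accuracy(q):.4%} per-base accuracy")
```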

Workflow

  1. Input: Long-read FASTQ/BAM files from PacBio or ONT sequencing.

  2. QC & Alignment: Filter reads by quality, align to reference genome.

  3. SV Calling: Detect structural variants using Sniffles2, PBSV, or CuteSV.

  4. Isoform Analysis: Identify full-length isoforms with IsoSeq3 or FLAIR.

  5. Modification Calling: Extract base modifications from ONT raw signal or PacBio polymerase kinetics; in BAM files these calls are typically carried in the MM/ML tags.

  6. Phasing: Generate haplotype-resolved variant calls.

  7. Output: SV calls, isoform annotations, modification maps, phased assemblies.
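The alignment and SV-calling steps of this workflow can be sketched as a list of command lines. This is a minimal illustration that assumes FASTQ input and that minimap2, samtools, and Sniffles2 are on the PATH; `build_commands`, the output file names, and the `ont` platform key are hypothetical, and the skill's actual script may arrange these steps differently.

```python
import shlex
from pathlib import Path

# minimap2 presets per platform. The presets are real minimap2 options;
# the dictionary keys are labels assumed for this sketch.
MINIMAP2_PRESETS = {"pacbio_hifi": "map-hifi", "ont": "map-ont"}


def build_commands(reads: str, reference: str, platform: str,
                   outdir: str, threads: int = 8) -> list[list[str]]:
    """Return align -> sort -> index -> SV-call commands as argv lists."""
    if platform not in MINIMAP2_PRESETS:
        raise ValueError(f"unsupported platform: {platform}")
    out = Path(outdir)
    sam = str(out / "aligned.sam")
    bam = str(out / "aligned.sorted.bam")
    vcf = str(out / "svs.vcf")
    return [
        # Align reads to the reference and write SAM.
        ["minimap2", "-a", "-x", MINIMAP2_PRESETS[platform],
         "-t", str(threads), "-o", sam, reference, reads],
        # Coordinate-sort and index the alignments.
        ["samtools", "sort", "-@", str(threads), "-o", bam, sam],
        ["samtools", "index", bam],
        # Call structural variants with Sniffles2.
        ["sniffles", "--input", bam, "--vcf", vcf,
         "--reference", reference, "--threads", str(threads)],
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so the plan can be reviewed.
    for cmd in build_commands("reads.fastq", "GRCh38.fa", "pacbio_hifi", "longread_results"):
        print(shlex.join(cmd))
```

Returning argv lists rather than shell strings keeps quoting out of the picture; each list can be passed to `subprocess.run(cmd, check=True)` once the plan looks right.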

Example Usage

User: "Analyze this PacBio HiFi dataset for structural variants and DNA methylation in a cancer sample."

Agent Action:

python3 Skills/Genomics/Long_Read_Sequencing_Agent/longread_analyzer.py \
    --input cancer_hifi.bam \
    --platform pacbio_hifi \
    --reference GRCh38.fa \
    --sv_calling sniffles2 \
    --methylation true \
    --phasing true \
    --output longread_results/
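The source of `longread_analyzer.py` is not shown here, so the following is only a guess at how its command-line interface could be declared with `argparse`. The flag names come from the command above; the defaults, the `choices` lists, and the `ont` platform value are assumptions.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Parse 'true'/'false' style flag values such as `--methylation true`."""
    lowered = value.lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected true or false, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the flags used in the example command."""
    parser = argparse.ArgumentParser(
        description="Long-read analysis (illustrative CLI sketch)")
    parser.add_argument("--input", required=True, help="long-read FASTQ or BAM")
    # "ont" is an assumed value; only "pacbio_hifi" appears in the example.
    parser.add_argument("--platform", required=True, choices=["pacbio_hifi", "ont"])
    parser.add_argument("--reference", required=True, help="reference FASTA")
    # Choices other than "sniffles2" are assumptions based on the tool list.
    parser.add_argument("--sv_calling", default="sniffles2",
                        choices=["sniffles2", "pbsv", "cutesv"])
    parser.add_argument("--methylation", type=str_to_bool, default=False)
    parser.add_argument("--phasing", type=str_to_bool, default=False)
    parser.add_argument("--output", required=True, help="output directory")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(vars(args))
```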

Structural Variant Detection

| Tool | Platform | SV Types | Strengths |
|---|---|---|---|
| Sniffles2 | Both | All SV types | Speed, accuracy |
| PBSV | PacBio | All SV types | HiFi optimized |
| CuteSV | Both | All SV types | Sensitivity |
| SAVANA | Both | Somatic SVs | Cancer-specific |
| Jasmine | Both | Population SV | Multi-sample |

SV Size Spectrum:

  • Small SVs: 50-500 bp (often missed by short-read)
  • Medium SVs: 500 bp - 10 kb
  • Large SVs: >10 kb
  • Complex SVs: Multi-breakpoint events
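The size bins above can be expressed as a small helper. This is a sketch: the function name is illustrative, and because the listed ranges share the 500 bp boundary, assigning exactly 500 bp to the medium bin is an assumption. Complex SVs are defined by breakpoint structure rather than length, so they are not covered here.

```python
def classify_sv_size(svlen: int) -> str:
    """Bin an SV by absolute length using the size spectrum listed above."""
    length = abs(svlen)  # deletions are often reported with a negative SVLEN
    if length < 50:
        return "below_sv_threshold"  # conventionally an indel, not an SV
    if length < 500:
        return "small"
    if length <= 10_000:
        return "medium"  # assumption: exactly 500 bp counts as medium
    return "large"


if __name__ == "__main__":
    for svlen in (-120, 500, 2_500, 45_000):
        print(svlen, classify_sv_size(svlen))
```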

Isoform Analysis

Full-Length Transcript Sequencing:

  • Capture full gene structures (5' to 3')
  • Detect novel exons and splice junctions
  • Identify gene fusions
  • Quantify isoform expression

Tools:

  • IsoSeq3 (PacBio): Clustering and polishing
  • FLAIR (Both): Isoform discovery and quantification
  • StringTie2 (Both): Guided assembly
  • SQANTI3: Isoform classification and QC

Base Modification Detection

| Modification | Detection | Biological Role |
|---|---|---|
| 5mC | Both platforms | Gene silencing |
| 5hmC | ONT primarily | Active demethylation |
| 6mA | Both platforms | Bacterial/mitochondrial |
| BrdU | ONT | Replication timing |

Resolution: Single-base, single-molecule, strand-specific
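Single-base, single-molecule calls are commonly stored per read in the SAM/BAM `MM` and `ML` tags. The sketch below decodes them in pure Python to show how the skip counts map to read positions. It assumes a forward-strand read, `+` strand entries, and one modification code per entry; `decode_mm_tag` is an illustrative name, and production code would normally use a library such as pysam instead.

```python
def decode_mm_tag(seq: str, mm: str, ml: list[int]) -> list[tuple[int, str, float]]:
    """Decode MM/ML tag values into (read_position, mod_code, probability).

    Assumes a forward-strand read, '+' strand entries only, and a single
    modification code per MM entry (for example 'C+m').
    """
    calls = []
    ml_index = 0
    for entry in filter(None, mm.split(";")):
        fields = entry.split(",")
        header = fields[0]
        deltas = [int(x) for x in fields[1:]]
        base, strand = header[0], header[1]
        mod_code = header[2:].rstrip("?.")  # drop the optional '?' or '.' flag
        if strand != "+":
            raise NotImplementedError("only '+' strand entries are handled")
        # 0-based read positions of the canonical base, in order.
        base_positions = [i for i, b in enumerate(seq) if b == base]
        cursor = 0
        for delta in deltas:
            cursor += delta  # skip this many occurrences of the base
            # ML value N covers probabilities [N/256, (N+1)/256); use the midpoint.
            probability = (ml[ml_index] + 0.5) / 256
            calls.append((base_positions[cursor], mod_code, probability))
            ml_index += 1
            cursor += 1  # move past the base that was just reported
    return calls


if __name__ == "__main__":
    # Toy read, made up for illustration: C occurs at positions 1, 3, 4, 6.
    for call in decode_mm_tag("ACGCCTCG", "C+m,1,1;", [255, 0]):
        print(call)
```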

AI/ML Components

Error Correction:

  • DeepConsensus (PacBio): Transformer-based correction of HiFi (CCS) reads
  • Medaka (ONT): Neural network polishing
  • PEPPER-Margin-DeepVariant: Deep-learning small-variant calling with haplotype-aware phasing

SV Classification:

  • Deep learning for complex SV characterization
  • ML filters for false positive reduction
  • Multi-sample joint calling

Clinical Applications

  1. Cancer Genomics: Detect SVs driving oncogene activation
  2. Rare Disease: Resolve variants in complex regions
  3. Pharmacogenomics: Phase CYP450 star alleles
  4. HLA Typing: Full-resolution typing for transplant
  5. Repeat Expansions: Size tandem repeats in repeat-expansion disorders

Prerequisites

  • Python 3.10+
  • Sniffles2, PBSV, CuteSV for SV calling
  • minimap2/pbmm2 for alignment
  • High-memory system (64GB+ recommended)

Related Skills

  • Long_Read_SV_Caller - For specialized SV analysis
  • Variant_Interpretation - For variant annotation
  • Epigenomics_MethylGPT_Agent - For methylation analysis

Output Files

| Output | Format | Content |
|---|---|---|
| SVs | VCF | Structural variants |
| Methylation | BED/bigWig | Modification calls |
| Isoforms | GTF | Transcript annotations |
| Phased | VCF | Haplotype-resolved variants |
| Assembly | FASTA | Assembled contigs |
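As a quick sanity check on the SV output, the VCF can be summarized by SV type with the standard library alone. This is a minimal sketch that assumes an uncompressed VCF whose INFO column carries an `SVTYPE` key, as SV callers conventionally write; the helper names are illustrative and the records in the demo are made up.

```python
from collections import Counter


def parse_info(info: str) -> dict[str, str]:
    """Split a VCF INFO column into a dict; flag-only keys map to ''."""
    pairs = (item.partition("=") for item in info.split(";") if item)
    return {key: value for key, _, value in pairs}


def count_sv_types(vcf_lines) -> Counter:
    """Count records per SVTYPE, skipping header and blank lines."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        columns = line.rstrip("\n").split("\t")
        info = parse_info(columns[7])  # INFO is the eighth VCF column
        counts[info.get("SVTYPE", "UNKNOWN")] += 1
    return counts


if __name__ == "__main__":
    # Made-up records for illustration only.
    demo = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "chr1\t1000\tsv1\tN\t<DEL>\t60\tPASS\tPRECISE;SVTYPE=DEL;SVLEN=-300",
        "chr2\t5000\tsv2\tN\t<INS>\t55\tPASS\tSVTYPE=INS;SVLEN=820",
    ]
    print(count_sv_types(demo))
```

The same function accepts an open file handle, so a real VCF can be passed with `with open(path) as fh: count_sv_types(fh)`.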

Author

AI Group - Biomedical AI Platform