OpenClaw-Medical-Skills bio-read-alignment-hisat2-alignment

<!--

install
source · Clone the upstream repo
git clone https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/bio-read-alignment-hisat2-alignment" ~/.claude/skills/freedomintelligence-openclaw-medical-skills-bio-read-alignment-hisat2-alignment && rm -rf "$T"
OpenClaw · Install into ~/.openclaw/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/bio-read-alignment-hisat2-alignment" ~/.openclaw/skills/freedomintelligence-openclaw-medical-skills-bio-read-alignment-hisat2-alignment && rm -rf "$T"
manifest: skills/bio-read-alignment-hisat2-alignment/SKILL.md
source content
<!-- # COPYRIGHT NOTICE # This file is part of the "Universal Biomedical Skills" project. # Copyright (c) 2026 MD BABU MIA, PhD <md.babu.mia@mssm.edu> # All Rights Reserved. # # This code is proprietary and confidential. # Unauthorized copying of this file, via any medium is strictly prohibited. # # Provenance: Authenticated by MD BABU MIA -->

name: bio-read-alignment-hisat2-alignment description: Align RNA-seq reads with HISAT2, a memory-efficient splice-aware aligner. Use when STAR's memory requirements are too high or for general RNA-seq alignment. tool_type: cli primary_tool: HISAT2 measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes. allowed-tools:

  • read_file
  • run_shell_command

HISAT2 RNA-seq Alignment

Build Index

# Basic index (no annotation)
hisat2-build -p 8 reference.fa hisat2_index

# Index with splice sites and exons (recommended)
hisat2_extract_splice_sites.py annotation.gtf > splice_sites.txt
hisat2_extract_exons.py annotation.gtf > exons.txt

hisat2-build -p 8 \
    --ss splice_sites.txt \
    --exon exons.txt \
    reference.fa hisat2_index

Basic Alignment

# Paired-end reads
hisat2 -p 8 -x hisat2_index \
    -1 reads_1.fq.gz -2 reads_2.fq.gz \
    -S aligned.sam

# Single-end reads
hisat2 -p 8 -x hisat2_index \
    -U reads.fq.gz \
    -S aligned.sam

Direct to Sorted BAM

# Pipe to samtools
hisat2 -p 8 -x hisat2_index \
    -1 r1.fq.gz -2 r2.fq.gz | \
    samtools sort -@ 4 -o aligned.sorted.bam -

samtools index aligned.sorted.bam

Stranded Libraries

# Forward stranded (e.g., Ligation)
hisat2 -p 8 -x hisat2_index \
    --rna-strandness FR \
    -1 r1.fq.gz -2 r2.fq.gz -S aligned.sam

# Reverse stranded (e.g., dUTP, TruSeq - most common)
hisat2 -p 8 -x hisat2_index \
    --rna-strandness RF \
    -1 r1.fq.gz -2 r2.fq.gz -S aligned.sam

# Single-end stranded
hisat2 -p 8 -x hisat2_index \
    --rna-strandness F \    # or R for reverse
    -U reads.fq.gz -S aligned.sam

Novel Splice Junction Discovery

# Output novel splice junctions
hisat2 -p 8 -x hisat2_index \
    --novel-splicesite-outfile novel_splices.txt \
    -1 r1.fq.gz -2 r2.fq.gz -S aligned.sam

# Use known + novel junctions for subsequent alignments
hisat2 -p 8 -x hisat2_index \
    --novel-splicesite-infile novel_splices.txt \
    -1 r1.fq.gz -2 r2.fq.gz -S aligned.sam

Two-Pass Alignment (Manual)

# Pass 1: Discover junctions from all samples
for r1 in *_R1.fq.gz; do
    r2=${r1/_R1/_R2}
    base=$(basename $r1 _R1.fq.gz)
    hisat2 -p 8 -x hisat2_index \
        --novel-splicesite-outfile ${base}_splices.txt \
        -1 $r1 -2 $r2 -S /dev/null
done

# Combine and filter junctions
cat *_splices.txt | sort -u > combined_splices.txt

# Pass 2: Realign with all junctions
for r1 in *_R1.fq.gz; do
    r2=${r1/_R1/_R2}
    base=$(basename $r1 _R1.fq.gz)
    hisat2 -p 8 -x hisat2_index \
        --novel-splicesite-infile combined_splices.txt \
        -1 $r1 -2 $r2 | \
        samtools sort -@ 4 -o ${base}.sorted.bam -
done

Read Group Information

hisat2 -p 8 -x hisat2_index \
    --rg-id sample1 \
    --rg SM:sample1 \
    --rg PL:ILLUMINA \
    --rg LB:lib1 \
    -1 r1.fq.gz -2 r2.fq.gz -S aligned.sam

Downstream Quantification

# Output name-sorted BAM for htseq-count
hisat2 -p 8 -x hisat2_index -1 r1.fq.gz -2 r2.fq.gz | \
    samtools sort -n -@ 4 -o aligned.namesorted.bam -

# Or coordinate-sorted for featureCounts
hisat2 -p 8 -x hisat2_index -1 r1.fq.gz -2 r2.fq.gz | \
    samtools sort -@ 4 -o aligned.sorted.bam -

Key Parameters

ParameterDefaultDescription
-p1Number of threads
-x-Index basename
--rna-strandnessunstrandedFR/RF/F/R
--dtaoffDownstream transcriptome assembly
--dta-cufflinksoffFor Cufflinks
--min-intronlen20Minimum intron length
--max-intronlen500000Maximum intron length
-k5Max alignments to report

For StringTie/Cufflinks

# Use --dta for StringTie
hisat2 -p 8 -x hisat2_index \
    --dta \
    -1 r1.fq.gz -2 r2.fq.gz | \
    samtools sort -@ 4 -o aligned.sorted.bam -

Alignment Summary

# HISAT2 prints summary to stderr
hisat2 -p 8 -x hisat2_index -1 r1.fq.gz -2 r2.fq.gz -S aligned.sam 2> summary.txt

Example:

50000000 reads; of these:
  50000000 (100.00%) were paired; of these:
    2500000 (5.00%) aligned concordantly 0 times
    45000000 (90.00%) aligned concordantly exactly 1 time
    2500000 (5.00%) aligned concordantly >1 times
95.00% overall alignment rate

Memory Comparison

AlignerHuman Genome Memory
STAR~30GB
HISAT2~8GB

Related Skills

  • read-alignment/star-alignment - Alternative with more features
  • rna-quantification/featurecounts-counting - Count aligned reads
  • rna-quantification/alignment-free-quant - Skip alignment entirely
  • differential-expression/deseq2-basics - Downstream DE analysis
<!-- AUTHOR_SIGNATURE: 9a7f3c2e-MD-BABU-MIA-2026-MSSM-SECURE -->