OpenClaw-Medical-Skills bio-read-alignment-star-alignment

<!--

install
source · Clone the upstream repo
git clone https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/bio-read-alignment-star-alignment" ~/.claude/skills/freedomintelligence-openclaw-medical-skills-bio-read-alignment-star-alignment && rm -rf "$T"
OpenClaw · Install into ~/.openclaw/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/bio-read-alignment-star-alignment" ~/.openclaw/skills/freedomintelligence-openclaw-medical-skills-bio-read-alignment-star-alignment && rm -rf "$T"
manifest: skills/bio-read-alignment-star-alignment/SKILL.md
source content
<!-- # COPYRIGHT NOTICE # This file is part of the "Universal Biomedical Skills" project. # Copyright (c) 2026 MD BABU MIA, PhD <md.babu.mia@mssm.edu> # All Rights Reserved. # # This code is proprietary and confidential. # Unauthorized copying of this file, via any medium is strictly prohibited. # # Provenance: Authenticated by MD BABU MIA -->

name: bio-read-alignment-star-alignment description: Align RNA-seq reads with STAR (Spliced Transcripts Alignment to a Reference). Supports two-pass mode for novel splice junction discovery. Use when aligning RNA-seq data requiring splice-aware alignment. tool_type: cli primary_tool: STAR measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes. allowed-tools:

  • read_file
  • run_shell_command

STAR RNA-seq Alignment

Generate Genome Index

# Basic index generation
STAR --runMode genomeGenerate \
    --runThreadN 8 \
    --genomeDir star_index/ \
    --genomeFastaFiles reference.fa \
    --sjdbGTFfile annotation.gtf \
    --sjdbOverhang 100    # Read length - 1

Index with Specific Read Length

# For 150bp reads, use sjdbOverhang=149
STAR --runMode genomeGenerate \
    --runThreadN 8 \
    --genomeDir star_index_150/ \
    --genomeFastaFiles reference.fa \
    --sjdbGTFfile annotation.gtf \
    --sjdbOverhang 149

Basic Alignment

# Paired-end alignment
STAR --runThreadN 8 \
    --genomeDir star_index/ \
    --readFilesIn reads_1.fq.gz reads_2.fq.gz \
    --readFilesCommand zcat \
    --outFileNamePrefix sample_ \
    --outSAMtype BAM SortedByCoordinate

Single-End Alignment

STAR --runThreadN 8 \
    --genomeDir star_index/ \
    --readFilesIn reads.fq.gz \
    --readFilesCommand zcat \
    --outFileNamePrefix sample_ \
    --outSAMtype BAM SortedByCoordinate

Two-Pass Mode

# Two-pass mode for better novel junction detection
STAR --runThreadN 8 \
    --genomeDir star_index/ \
    --readFilesIn r1.fq.gz r2.fq.gz \
    --readFilesCommand zcat \
    --outFileNamePrefix sample_ \
    --outSAMtype BAM SortedByCoordinate \
    --twopassMode Basic

Quantification Mode

# Output gene counts (like featureCounts)
STAR --runThreadN 8 \
    --genomeDir star_index/ \
    --readFilesIn r1.fq.gz r2.fq.gz \
    --readFilesCommand zcat \
    --outFileNamePrefix sample_ \
    --outSAMtype BAM SortedByCoordinate \
    --quantMode GeneCounts

Output:

sample_ReadsPerGene.out.tab
with columns:

  1. Gene ID
  2. Unstranded counts
  3. Forward strand counts
  4. Reverse strand counts

ENCODE Options

# ENCODE recommended settings
STAR --runThreadN 8 \
    --genomeDir star_index/ \
    --readFilesIn r1.fq.gz r2.fq.gz \
    --readFilesCommand zcat \
    --outFileNamePrefix sample_ \
    --outSAMtype BAM SortedByCoordinate \
    --outSAMunmapped Within \
    --outSAMattributes NH HI AS NM MD \
    --outFilterType BySJout \
    --outFilterMultimapNmax 20 \
    --outFilterMismatchNmax 999 \
    --outFilterMismatchNoverReadLmax 0.04 \
    --alignIntronMin 20 \
    --alignIntronMax 1000000 \
    --alignMatesGapMax 1000000 \
    --alignSJoverhangMin 8 \
    --alignSJDBoverhangMin 1

Fusion Detection

# For chimeric/fusion detection
STAR --runThreadN 8 \
    --genomeDir star_index/ \
    --readFilesIn r1.fq.gz r2.fq.gz \
    --readFilesCommand zcat \
    --outFileNamePrefix sample_ \
    --outSAMtype BAM SortedByCoordinate \
    --chimSegmentMin 12 \
    --chimJunctionOverhangMin 8 \
    --chimOutType Junctions WithinBAM SoftClip \
    --chimMainSegmentMultNmax 1

Output Files

FileDescription
*Aligned.sortedByCoord.out.bamSorted BAM file
*Log.final.outAlignment summary statistics
*Log.outDetailed log
*SJ.out.tabSplice junctions
*ReadsPerGene.out.tabGene counts (if --quantMode)
*Chimeric.out.junctionFusion candidates (if chimeric)

Memory Requirements

# Reduce memory for limited systems
STAR --genomeLoad NoSharedMemory \
    --limitBAMsortRAM 10000000000 \  # 10GB for sorting
    ...

# For very large genomes, limit during index generation
STAR --runMode genomeGenerate \
    --limitGenomeGenerateRAM 31000000000 \  # 31GB
    ...

Shared Memory Mode

# Load genome into shared memory (for multiple samples)
STAR --genomeLoad LoadAndExit --genomeDir star_index/

# Run alignments (faster startup)
STAR --genomeLoad LoadAndKeep --genomeDir star_index/ ...

# Remove from memory when done
STAR --genomeLoad Remove --genomeDir star_index/

Key Parameters

ParameterDefaultDescription
--runThreadN1Number of threads
--sjdbOverhang100Read length - 1
--outFilterMultimapNmax10Max multi-mapping
--alignIntronMax0Max intron size
--outFilterMismatchNmax10Max mismatches
--outSAMtypeSAMOutput format
--quantMode-GeneCounts for counting
--twopassModeNoneBasic for two-pass

Related Skills

  • rna-quantification/featurecounts-counting - Alternative counting
  • rna-quantification/alignment-free-quant - Salmon/kallisto alternative
  • differential-expression/deseq2-basics - Downstream DE analysis
  • read-qc/fastp-workflow - Preprocess reads
<!-- AUTHOR_SIGNATURE: 9a7f3c2e-MD-BABU-MIA-2026-MSSM-SECURE -->