install
source · Clone the upstream repo
git clone https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/bio-small-rna-seq-smrna-preprocessing" ~/.claude/skills/freedomintelligence-openclaw-medical-skills-bio-small-rna-seq-smrna-preprocessin && rm -rf "$T"
OpenClaw · Install into ~/.openclaw/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/bio-small-rna-seq-smrna-preprocessing" ~/.openclaw/skills/freedomintelligence-openclaw-medical-skills-bio-small-rna-seq-smrna-preprocessin && rm -rf "$T"
manifest:
skills/bio-small-rna-seq-smrna-preprocessing/SKILL.mdsource content
<!--
# COPYRIGHT NOTICE
# This file is part of the "Universal Biomedical Skills" project.
# Copyright (c) 2026 MD BABU MIA, PhD <md.babu.mia@mssm.edu>
# All Rights Reserved.
#
# This code is proprietary and confidential.
# Unauthorized copying of this file, via any medium is strictly prohibited.
#
# Provenance: Authenticated by MD BABU MIA
-->
name: bio-small-rna-seq-smrna-preprocessing description: Preprocess small RNA sequencing data with adapter trimming and size selection optimized for miRNA, piRNA, and other small RNAs. Use when preparing small RNA-seq reads for downstream quantification or discovery analysis. tool_type: cli primary_tool: cutadapt measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes. allowed-tools:
- read_file
- run_shell_command
Small RNA Preprocessing
Adapter Trimming with Cutadapt
Small RNA libraries have specific 3' adapters that must be removed:
# Standard Illumina TruSeq small RNA adapter cutadapt \ -a TGGAATTCTCGGGTGCCAAGG \ -m 18 \ -M 30 \ --discard-untrimmed \ -o trimmed.fastq.gz \ input.fastq.gz # -a: 3' adapter sequence # -m 18: Minimum length (miRNAs are 18-25 nt) # -M 30: Maximum length (exclude longer fragments) # --discard-untrimmed: Remove reads without adapter (likely not small RNA)
Common Small RNA Adapters
| Kit | 3' Adapter Sequence |
|---|---|
| Illumina TruSeq | TGGAATTCTCGGGTGCCAAGG |
| NEBNext | AGATCGGAAGAGCACACGTCT |
| QIAseq | AACTGTAGGCACCATCAAT |
| Lexogen | TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC |
Size Selection
# Filter by length after trimming cutadapt \ -a TGGAATTCTCGGGTGCCAAGG \ -m 18 -M 26 \ -o mirna_length.fastq.gz \ input.fastq.gz # miRNA: 18-26 nt (typically 21-23 nt) # piRNA: 26-32 nt # snoRNA: variable, typically longer
Quality Trimming
# Trim low-quality bases from 3' end before adapter removal cutadapt \ -q 20 \ -a TGGAATTCTCGGGTGCCAAGG \ -m 18 \ -o trimmed.fastq.gz \ input.fastq.gz
Using fastp for Small RNA
# fastp with small RNA settings fastp \ --in1 input.fastq.gz \ --out1 trimmed.fastq.gz \ --adapter_sequence TGGAATTCTCGGGTGCCAAGG \ --length_required 18 \ --length_limit 30 \ --html report.html # Note: fastp auto-detects adapters but specifying is more reliable
Collapse Identical Reads
For small RNAs, collapsing identical sequences reduces computation:
# Using seqkit seqkit rmdup -s trimmed.fastq.gz -o collapsed.fasta # Using fastx_toolkit (legacy) fastx_collapser -i trimmed.fastq -o collapsed.fasta
Python Preprocessing
import gzip from collections import Counter def collapse_reads(fastq_path): '''Collapse identical sequences and count occurrences''' counts = Counter() with gzip.open(fastq_path, 'rt') as f: while True: header = f.readline() if not header: break seq = f.readline().strip() f.readline() # + f.readline() # qual # Only keep reads in miRNA size range if 18 <= len(seq) <= 26: counts[seq] += 1 return counts # Write collapsed FASTA def write_collapsed_fasta(counts, output_path): with open(output_path, 'w') as f: for i, (seq, count) in enumerate(counts.most_common()): f.write(f'>seq_{i}_x{count}\n{seq}\n')
QC Metrics for Small RNA
Key metrics to check:
- Read length distribution (should peak at 21-23 nt for miRNA)
- Adapter content (high if library is good)
- Percentage of reads in target size range
import matplotlib.pyplot as plt from collections import Counter def plot_length_distribution(fastq_path): lengths = Counter() with gzip.open(fastq_path, 'rt') as f: for i, line in enumerate(f): if i % 4 == 1: # Sequence line lengths[len(line.strip())] += 1 plt.bar(lengths.keys(), lengths.values()) plt.xlabel('Read Length') plt.ylabel('Count') plt.title('Small RNA Length Distribution') plt.savefig('length_dist.png')
Related Skills
- mirdeep2-analysis - Novel miRNA discovery
- mirge3-analysis - Fast miRNA quantification
- read-qc/adapter-trimming - General adapter trimming