install
source · Clone the upstream repo
git clone https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/bio-clip-seq-clip-preprocessing" ~/.claude/skills/freedomintelligence-openclaw-medical-skills-bio-clip-seq-clip-preprocessing && rm -rf "$T"
OpenClaw · Install into ~/.openclaw/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/bio-clip-seq-clip-preprocessing" ~/.openclaw/skills/freedomintelligence-openclaw-medical-skills-bio-clip-seq-clip-preprocessing && rm -rf "$T"
manifest:
skills/bio-clip-seq-clip-preprocessing/SKILL.mdsource content
<!--
# COPYRIGHT NOTICE
# This file is part of the "Universal Biomedical Skills" project.
# Copyright (c) 2026 MD BABU MIA, PhD <md.babu.mia@mssm.edu>
# All Rights Reserved.
#
# This code is proprietary and confidential.
# Unauthorized copying of this file, via any medium is strictly prohibited.
#
# Provenance: Authenticated by MD BABU MIA
-->
name: bio-clip-seq-clip-preprocessing description: Preprocess CLIP-seq data including adapter trimming, UMI extraction, and PCR duplicate removal. Use when preparing raw CLIP, iCLIP, or eCLIP reads for peak calling. tool_type: cli primary_tool: umi_tools measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes. allowed-tools:
- read_file
- run_shell_command
CLIP-seq Preprocessing
UMI Extraction (eCLIP/iCLIP)
# Extract UMI from read 1 umi_tools extract \ --stdin=reads_R1.fastq.gz \ --read2-in=reads_R2.fastq.gz \ --bc-pattern=NNNNNNNNNN \ --stdout=R1_umi.fastq.gz \ --read2-out=R2_umi.fastq.gz # bc-pattern: UMI barcode pattern # N = UMI base # For eCLIP: typically 10-nt UMI in read 1
Adapter Trimming
# Trim adapters after UMI extraction cutadapt \ -a AGATCGGAAGAGCACACGTCT \ -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT \ -m 18 \ -o trimmed_R1.fastq.gz \ -p trimmed_R2.fastq.gz \ R1_umi.fastq.gz R2_umi.fastq.gz
Two-Pass Trimming (eCLIP)
# eCLIP protocol has inline adapters # First pass: trim 3' adapter cutadapt -a AGATCGGAAGAGC -m 18 -o pass1.fq.gz input.fq.gz # Second pass: trim 5' adapter (read-through) cutadapt -g AGATCGGAAGAGC -m 18 -o pass2.fq.gz pass1.fq.gz
PCR Duplicate Removal
# After alignment, deduplicate using UMIs umi_tools dedup \ --stdin=aligned.bam \ --stdout=deduped.bam \ --paired \ --method=unique # Methods: # unique: Exact UMI match # cluster: Allow UMI mismatches (default) # adjacency: Network-based clustering
Python Preprocessing
from umi_tools import UMIClusterer import pysam def count_umis_per_position(bam_path): '''Count unique UMIs at each genomic position''' from collections import defaultdict position_umis = defaultdict(set) with pysam.AlignmentFile(bam_path, 'rb') as bam: for read in bam: if read.is_unmapped: continue # Extract UMI from read name (added by umi_tools extract) umi = read.query_name.split('_')[-1] pos = (read.reference_name, read.reference_start) position_umis[pos].add(umi) return {pos: len(umis) for pos, umis in position_umis.items()}
Quality Control
def clip_qc(bam_path): '''CLIP-seq specific QC metrics''' import pysam total = 0 unique_positions = set() read_lengths = [] with pysam.AlignmentFile(bam_path, 'rb') as bam: for read in bam: if read.is_unmapped: continue total += 1 unique_positions.add((read.reference_name, read.reference_start)) read_lengths.append(read.query_length) return { 'total_reads': total, 'unique_positions': len(unique_positions), 'mean_read_length': sum(read_lengths) / len(read_lengths), 'complexity': len(unique_positions) / total }
Related Skills
- clip-alignment - Align preprocessed reads
- read-qc/umi-processing - General UMI handling
- clip-peak-calling - Call peaks from aligned reads