OpenClaw-Medical-Skills bio-rna-quantification-featurecounts-counting
<!--
install
source · Clone the upstream repo
git clone https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/bio-rna-quantification-featurecounts-counting" ~/.claude/skills/freedomintelligence-openclaw-medical-skills-bio-rna-quantification-featurecounts && rm -rf "$T"
OpenClaw · Install into ~/.openclaw/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/bio-rna-quantification-featurecounts-counting" ~/.openclaw/skills/freedomintelligence-openclaw-medical-skills-bio-rna-quantification-featurecounts && rm -rf "$T"
manifest:
skills/bio-rna-quantification-featurecounts-counting/SKILL.mdsource content
<!--
# COPYRIGHT NOTICE
# This file is part of the "Universal Biomedical Skills" project.
# Copyright (c) 2026 MD BABU MIA, PhD <md.babu.mia@mssm.edu>
# All Rights Reserved.
#
# This code is proprietary and confidential.
# Unauthorized copying of this file, via any medium is strictly prohibited.
#
# Provenance: Authenticated by MD BABU MIA
-->
name: bio-rna-quantification-featurecounts-counting description: Count reads per gene from aligned BAM files using Subread featureCounts. Use when processing BAM files from STAR/HISAT2 to generate gene-level counts for DESeq2/edgeR. tool_type: cli primary_tool: featureCounts measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes. allowed-tools:
- read_file
- run_shell_command
featureCounts Counting
Count reads mapping to genomic features (genes, exons) from BAM files.
Basic Usage
# Single sample featureCounts -a annotation.gtf -o counts.txt aligned.bam # Multiple samples (recommended - single matrix output) featureCounts -a annotation.gtf -o counts.txt sample1.bam sample2.bam sample3.bam # All BAMs in directory featureCounts -a annotation.gtf -o counts.txt *.bam
Paired-End Data
# Count fragments, not reads (required for paired-end) featureCounts -p --countReadPairs -a annotation.gtf -o counts.txt *.bam # Check proper pairs only featureCounts -p --countReadPairs -B -C -a annotation.gtf -o counts.txt *.bam
Flags:
- Input is paired-end-p
- Count fragments instead of reads--countReadPairs
- Only count properly paired reads-B
- Don't count chimeric fragments-C
Strand-Specific Libraries
# Unstranded (default) featureCounts -s 0 -a annotation.gtf -o counts.txt *.bam # Forward stranded (e.g., dUTP, NSR) featureCounts -s 1 -a annotation.gtf -o counts.txt *.bam # Reverse stranded (e.g., Illumina TruSeq, most common) featureCounts -s 2 -a annotation.gtf -o counts.txt *.bam
Determining strandedness: Use
infer_experiment.py from RSeQC or check library prep protocol.
Feature Types
# Count at gene level (default) featureCounts -t exon -g gene_id -a annotation.gtf -o counts.txt *.bam # Count at transcript level featureCounts -t exon -g transcript_id -a annotation.gtf -o counts.txt *.bam # Count CDS only featureCounts -t CDS -g gene_id -a annotation.gtf -o counts.txt *.bam
Flags:
- Feature type in GTF (default: exon)-t
- Meta-feature attribute (default: gene_id)-g
Multi-Mapping Reads
# Discard multi-mappers (default, recommended for DE) featureCounts -a annotation.gtf -o counts.txt *.bam # Count multi-mappers (fractional) featureCounts -M --fraction -a annotation.gtf -o counts.txt *.bam # Count multi-mappers (full count to each location) featureCounts -M -a annotation.gtf -o counts.txt *.bam
Overlapping Features
# Discard reads overlapping multiple features (default) featureCounts -a annotation.gtf -o counts.txt *.bam # Count reads overlapping multiple features featureCounts -O -a annotation.gtf -o counts.txt *.bam # Fractional count for overlaps featureCounts -O --fraction -a annotation.gtf -o counts.txt *.bam
Performance Options
# Use multiple threads featureCounts -T 8 -a annotation.gtf -o counts.txt *.bam # Use less memory (slower) featureCounts --largeBAM -a annotation.gtf -o counts.txt *.bam
Output Files
featureCounts produces two files:
- counts.txt - Main count matrix
Geneid Chr Start End Strand Length sample1.bam sample2.bam GENE1 chr1 100 500 + 400 1523 1891 GENE2 chr1 1000 2000 - 1000 892 756
- counts.txt.summary - Assignment statistics
Status sample1.bam sample2.bam Assigned 1523456 1678234 Unassigned_Unmapped 12345 11234 Unassigned_NoFeatures 234567 245678
Extract Count Matrix
# Remove first 6 columns (metadata) to get just counts cut -f1,7- counts.txt | tail -n +2 > count_matrix.txt
Python Processing
import pandas as pd counts = pd.read_csv('counts.txt', sep='\t', comment='#') count_matrix = counts.set_index('Geneid').iloc[:, 5:] # Skip metadata columns count_matrix.columns = [c.replace('.bam', '') for c in count_matrix.columns] count_matrix.to_csv('count_matrix.csv')
R Processing
counts <- read.table('counts.txt', header=TRUE, row.names=1, skip=1) count_matrix <- counts[, 6:ncol(counts)] # Skip metadata columns colnames(count_matrix) <- gsub('.bam', '', colnames(count_matrix))
Common Issues
Low assignment rate:
- Check strandedness setting (
)-s - Verify GTF matches reference genome version
- Check BAM alignment quality
Zero counts for known expressed genes:
- Ensure feature type matches GTF (
vs-t exon
)-t gene - Check gene_id attribute name in GTF
Related Skills
- alignment-files/sam-bam-basics - Input BAM file handling
- genome-intervals/gtf-gff-handling - GTF annotation files
- differential-expression/deseq2-basics - Downstream analysis with counts
- rna-quantification/count-matrix-qc - QC of count data