BioSkills bio-rna-quantification-featurecounts-counting

Count reads per gene from aligned BAM files using Subread featureCounts. Use when processing BAM files from STAR/HISAT2 to generate gene-level counts for DESeq2/edgeR.

install

source · Clone the upstream repo

git clone https://github.com/GPTomics/bioSkills

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/GPTomics/bioSkills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/rna-quantification/featurecounts-counting" ~/.claude/skills/gptomics-bioskills-bio-rna-quantification-featurecounts-counting && rm -rf "$T"

manifest: rna-quantification/featurecounts-counting/SKILL.md

source content

Version Compatibility

Reference examples tested with: DESeq2 1.42+, HISAT2 2.2.1+, STAR 2.7.11+, Subread 2.0+, edgeR 4.0+, pandas 2.2+, scanpy 1.10+

Before using code patterns, verify installed versions match. If versions differ:

Python:
```
pip show <package>
```
then
```
help(module.function)
```
to check signatures
R:
```
packageVersion('<pkg>')
```
then
```
?function_name
```
to verify parameters
CLI:
```
<tool> --version
```
then
```
<tool> --help
```
to confirm flags

If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.

featureCounts Counting

"Count reads per gene from my BAM files" → Assign aligned reads to genomic features using a GTF annotation to produce a gene-by-sample count matrix for DE analysis.

CLI:

featureCounts -a genes.gtf -o counts.txt sample1.bam sample2.bam

Count reads mapping to genomic features (genes, exons) from BAM files.

Basic Usage

# Single sample
featureCounts -a annotation.gtf -o counts.txt aligned.bam

# Multiple samples (recommended - single matrix output)
featureCounts -a annotation.gtf -o counts.txt sample1.bam sample2.bam sample3.bam

# All BAMs in directory
featureCounts -a annotation.gtf -o counts.txt *.bam

Paired-End Data

# Count fragments, not reads (required for paired-end)
featureCounts -p --countReadPairs -a annotation.gtf -o counts.txt *.bam

# Check proper pairs only
featureCounts -p --countReadPairs -B -C -a annotation.gtf -o counts.txt *.bam

Flags:

```
-p
```
- Input is paired-end
```
--countReadPairs
```
- Count fragments instead of reads
```
-B
```
- Only count properly paired reads
```
-C
```
- Don't count chimeric fragments

Strand-Specific Libraries

# Unstranded (default)
featureCounts -s 0 -a annotation.gtf -o counts.txt *.bam

# Forward stranded (e.g., dUTP, NSR)
featureCounts -s 1 -a annotation.gtf -o counts.txt *.bam

# Reverse stranded (e.g., Illumina TruSeq, most common)
featureCounts -s 2 -a annotation.gtf -o counts.txt *.bam

Determining strandedness: Use

infer_experiment.py

from RSeQC or check library prep protocol.

Feature Types

# Count at gene level (default)
featureCounts -t exon -g gene_id -a annotation.gtf -o counts.txt *.bam

# Count at transcript level
featureCounts -t exon -g transcript_id -a annotation.gtf -o counts.txt *.bam

# Count CDS only
featureCounts -t CDS -g gene_id -a annotation.gtf -o counts.txt *.bam

Flags:

```
-t
```
- Feature type in GTF (default: exon)
```
-g
```
- Meta-feature attribute (default: gene_id)

Multi-Mapping Reads

# Discard multi-mappers (default, recommended for DE)
featureCounts -a annotation.gtf -o counts.txt *.bam

# Count multi-mappers (fractional)
featureCounts -M --fraction -a annotation.gtf -o counts.txt *.bam

# Count multi-mappers (full count to each location)
featureCounts -M -a annotation.gtf -o counts.txt *.bam

Overlapping Features

# Discard reads overlapping multiple features (default)
featureCounts -a annotation.gtf -o counts.txt *.bam

# Count reads overlapping multiple features
featureCounts -O -a annotation.gtf -o counts.txt *.bam

# Fractional count for overlaps
featureCounts -O --fraction -a annotation.gtf -o counts.txt *.bam

Performance Options

# Use multiple threads
featureCounts -T 8 -a annotation.gtf -o counts.txt *.bam

# Use less memory (slower)
featureCounts --largeBAM -a annotation.gtf -o counts.txt *.bam

Output Files

featureCounts produces two files:

counts.txt - Main count matrix

Geneid  Chr     Start   End     Strand  Length  sample1.bam  sample2.bam
GENE1   chr1    100     500     +       400     1523         1891
GENE2   chr1    1000    2000    -       1000    892          756

counts.txt.summary - Assignment statistics

Status                  sample1.bam  sample2.bam
Assigned                1523456      1678234
Unassigned_Unmapped     12345        11234
Unassigned_NoFeatures   234567       245678

Extract Count Matrix

# Remove first 6 columns (metadata) to get just counts
cut -f1,7- counts.txt | tail -n +2 > count_matrix.txt

Python Processing

import pandas as pd

counts = pd.read_csv('counts.txt', sep='\t', comment='#')
count_matrix = counts.set_index('Geneid').iloc[:, 5:]  # Skip metadata columns
count_matrix.columns = [c.replace('.bam', '') for c in count_matrix.columns]
count_matrix.to_csv('count_matrix.csv')

R Processing

counts <- read.table('counts.txt', header=TRUE, row.names=1, skip=1)
count_matrix <- counts[, 6:ncol(counts)]  # Skip metadata columns
colnames(count_matrix) <- gsub('.bam', '', colnames(count_matrix))

Common Issues

Low assignment rate:

Check strandedness setting (
```
-s
```
)
Verify GTF matches reference genome version
Check BAM alignment quality

Zero counts for known expressed genes:

Ensure feature type matches GTF (
```
-t exon
```
vs
```
-t gene
```
)
Check gene_id attribute name in GTF

Related Skills

alignment-files/sam-bam-basics - Input BAM file handling
genome-intervals/gtf-gff-handling - GTF annotation files
differential-expression/deseq2-basics - Downstream analysis with counts
rna-quantification/count-matrix-qc - QC of count data