BioSkills bio-rna-structure-secondary-structure-prediction

Predicts RNA secondary structures using minimum free energy folding and partition function analysis with ViennaRNA (RNAfold, RNAalifold, RNAcofold). Computes base-pair probabilities, centroid structures, and consensus structures from alignments. Use when predicting RNA folding, evaluating structural stability, or comparing structures across homologs.

install
source · Clone the upstream repo
git clone https://github.com/GPTomics/bioSkills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/GPTomics/bioSkills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/rna-structure/secondary-structure-prediction" ~/.claude/skills/gptomics-bioskills-bio-rna-structure-secondary-structure-prediction && rm -rf "$T"
manifest: rna-structure/secondary-structure-prediction/SKILL.md
source content

Version Compatibility

Reference examples tested with: Infernal 1.1+, matplotlib 3.8+, numpy 1.26+

Before using code patterns, verify installed versions match. If versions differ:

  • Python:
    pip show <package>
    then
    help(module.function)
    to check signatures
  • CLI:
    <tool> --version
    then
    <tool> --help
    to confirm flags

If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.

Secondary Structure Prediction

"Predict the secondary structure of my RNA sequence" → Compute minimum free energy (MFE) folding, base-pair probabilities via partition function, and consensus structures from alignments using thermodynamic models.

  • CLI:
    RNAfold
    for single-sequence MFE/partition folding
  • CLI:
    RNAalifold
    for consensus structure from alignment
  • CLI:
    RNAcofold
    for RNA-RNA interaction structure

Predict RNA secondary structures using thermodynamic models. ViennaRNA provides MFE folding, partition function analysis, consensus structure prediction from alignments, and RNA-RNA interaction prediction.

RNAfold: Single Sequence Folding

MFE Structure

# Basic MFE folding (reads sequence from stdin or file)
echo "GGGAAACCC" | RNAfold

# With partition function (-p) and base-pair probabilities
echo "GGGAAACCC" | RNAfold -p

# Output PostScript dot plot and structure plot
echo ">myRNA" > input.fa
echo "GGGCUAUUAGCUCAGUUGGUUAGAGCGCACCCCUGAUAAGGGUGAGGUCGCUGAUUCGAAUUCAGCAUAGCCCA" >> input.fa
RNAfold -p --noPS < input.fa  # Suppress PostScript files

Key RNAfold Options

OptionDescription
-p
Compute partition function and base-pair probabilities
--MEA
Compute maximum expected accuracy structure
-d2
Dangling end energies on both sides of helices (default)
-T 37
Temperature in Celsius (default: 37)
--noLP
No lonely pairs (isolated base pairs)
--noPS
Suppress PostScript output files
-C
Read structure constraints from input
--shape
Incorporate SHAPE reactivity data

Constrained Folding

# Force specific positions paired/unpaired
# Constraint notation: '.' = unconstrained, 'x' = unpaired, '(' ')' = forced pair
echo -e ">constrained\nGGGCUAUUAGCUCAGUUGGUUAGAGCGCACC\n...xxxx.........................." | RNAfold -C

RNAalifold: Consensus Structure from Alignment

Predicts a consensus structure from a multiple sequence alignment, combining thermodynamic stability with covariation evidence.

# Input: Stockholm or ClustalW alignment format
RNAalifold --aln alignment.sto

# With covariation weighting and partition function
RNAalifold --cfactor 0.6 --nfactor 0.5 -p alignment.sto

# RIBOSUM scoring for better covariation detection
RNAalifold --ribosum_scoring alignment.sto
OptionDescription
--cfactor
Covariation weight (default: 1.0, lower = more thermodynamic)
--nfactor
Non-compatible penalty (default: 1.0)
--ribosum_scoring
Use RIBOSUM matrices for covariation
-p
Partition function for consensus

RNAcofold: RNA-RNA Interaction

Predicts the hybridization structure of two RNA molecules.

# Two sequences separated by '&'
echo "GCGCGC&GCGCGC" | RNAcofold

# With partition function
echo "GCGCGC&GCGCGC" | RNAcofold -p

LinearFold: Fast Folding for Long Sequences

For sequences longer than ~5,000 nt, LinearFold provides O(n) time complexity instead of O(n^3).

# LinearFold (if installed separately)
echo "GGGAAACCC" | linearfold

# ViennaRNA also supports --maxBPspan for long sequences
RNAfold --maxBPspan 300 < long_sequence.fa

ViennaRNA Python API

import RNA

sequence = 'GGGCUAUUAGCUCAGUUGGUUAGAGCGCACCCCUGAUAAGGGUGAGGUCGCUGAUUCGAAUUCAGCAUAGCCCA'

# MFE folding
structure, mfe = RNA.fold(sequence)
print(f'Structure: {structure}')
print(f'MFE: {mfe:.2f} kcal/mol')

# Partition function and base-pair probabilities
fc = RNA.fold_compound(sequence)
structure_pf, pf_energy = fc.pf()
print(f'Ensemble energy: {pf_energy:.2f} kcal/mol')

# Base-pair probability matrix
bpp = fc.bpp()

# Centroid structure (most representative of the ensemble)
centroid, centroid_dist = fc.centroid()
print(f'Centroid: {centroid}')
print(f'Distance to ensemble: {centroid_dist:.2f}')

# MEA structure (maximum expected accuracy)
mea_struct, mea_val = fc.MEA()
print(f'MEA structure: {mea_struct}')
print(f'MEA value: {mea_val:.2f}')

Folding with Constraints (Python)

import RNA

sequence = 'GGGCUAUUAGCUCAGUUGGUUAGAGCGCACCCCUGAUAAGGGUGAGGUCGCUGAUUCGAAUUCAGCAUAGCCCA'

md = RNA.md()
md.uniq_ML = 1  # Unique multiloop decomposition

fc = RNA.fold_compound(sequence, md)

# Force position 10 unpaired (0-indexed)
fc.hc_add_up(10, RNA.CONSTRAINT_CONTEXT_ALL_LOOPS)

# Force positions 1-3 paired with 70-72
fc.hc_add_bp(1, 72, RNA.CONSTRAINT_CONTEXT_ALL_LOOPS)

structure, mfe = fc.mfe()
print(f'Constrained: {structure} ({mfe:.2f} kcal/mol)')

SHAPE-Constrained Folding (Python)

import RNA

sequence = 'GGGCUAUUAGCUCAGUUGGUUAGAGCGCACCCCUGAUAAGGGUGAGGUCGCUGAUUCGAAUUCAGCAUAGCCCA'

fc = RNA.fold_compound(sequence)

# SHAPE reactivities: negative values = no data
# Deigan et al. (2009) parameters: m=1.8, b=-0.6 (default for SHAPE)
reactivities = [-999] + [0.1, 0.05, 0.8, 0.9, 0.2, 0.1, 0.3]  # 1-indexed, -999 = missing
fc.sc_add_SHAPE_deigan(reactivities, 1.8, -0.6)

structure, mfe = fc.mfe()
print(f'SHAPE-guided: {structure} ({mfe:.2f} kcal/mol)')

Structure Comparison

import RNA

struct1 = '(((....)))'
struct2 = '(((....).))'

# Base-pair distance
bp_dist = RNA.bp_distance(struct1, struct2)
print(f'Base-pair distance: {bp_dist}')

# Tree edit distance (more sophisticated)
tree1 = RNA.make_tree(RNA.expand_Full(struct1))
tree2 = RNA.make_tree(RNA.expand_Full(struct2))
tree_dist = RNA.tree_edit_distance(tree1, tree2)
print(f'Tree edit distance: {tree_dist}')

Structure Formats

FormatDescriptionExample
Dot-bracketParentheses for pairs, dots for unpaired
(((...)))
CT (connect)Tab-delimited: index, base, prev, next, pair, indexStandard for mfold
BPSEQThree columns: position, nucleotide, pair partner (0=unpaired)Used by comparative databases
WUSSExtended dot-bracket with pseudoknot notation
<<..AA..>>..aa

Format Conversion

import RNA

sequence = 'GGGAAACCC'
structure = '(((...)))'

# Dot-bracket to base-pair list
pt = RNA.ptable(structure)
pairs = [(i, pt[i]) for i in range(1, len(pt)) if pt[i] > i]
print(f'Base pairs: {pairs}')

# Dot-bracket to BPSEQ
for i in range(1, len(sequence) + 1):
    print(f'{i} {sequence[i-1]} {pt[i]}')

Visualization

Forna (web-based)

# Generate JSON for forna viewer (http://rna.tbi.univie.ac.at/forna/)
import json

forna_data = {
    'sequence': 'GGGCUAUUAGCUCAGUUGGUUAGAGCGCACC',
    'structure': '((((....((((......))))....))))'
}
print(json.dumps(forna_data))

R2DT (standardized 2D layouts)

# R2DT provides template-based 2D layouts for known RNA families
# Requires Docker
docker run -v $(pwd):/data rnacentral/r2dt draw /data/input.fa /data/output/

Matplotlib Dot Plot

import RNA
import matplotlib.pyplot as plt
import numpy as np

sequence = 'GGGCUAUUAGCUCAGUUGGUUAGAGCGCACCCCUGAUAAGGGUGAGGUCGCUGAUUCGAAUUCAGCAUAGCCCA'
fc = RNA.fold_compound(sequence)
fc.pf()
bpp = fc.bpp()

n = len(sequence)
matrix = np.zeros((n, n))
for i in range(1, n + 1):
    for j in range(i + 1, n + 1):
        matrix[i-1][j-1] = bpp[i][j]

fig, ax = plt.subplots(figsize=(8, 8))
ax.imshow(matrix, cmap='YlOrRd', origin='lower', vmin=0, vmax=1)
ax.set_xlabel('Position')
ax.set_ylabel('Position')
ax.set_title('Base-pair probability matrix')
plt.tight_layout()
plt.savefig('bpp_dotplot.png', dpi=150)

Quality Thresholds

MetricThresholdRationale
MFE z-score< -2.0Sequence folds significantly better than shuffled controls
Ensemble diversity< 5.0Low diversity indicates a well-defined structure
Base-pair probability> 0.9High confidence for individual pairs
Covariation score> 0.0Positive covariation supports predicted pair

Related Skills

  • ncrna-search - Classify structured RNAs by family using Infernal/Rfam
  • structure-probing - Use experimental SHAPE/DMS data to constrain predictions
  • genome-annotation/ncrna-annotation - Genome-wide ncRNA annotation
  • sequence-manipulation/sequence-properties - Sequence composition analysis