BioSkills bio-rna-structure-structure-probing

Analyzes experimental RNA structure probing data from SHAPE-MaP and DMS-MaPseq experiments using ShapeMapper2. Converts mutation rates to per-nucleotide reactivity profiles that constrain structure prediction. Use when processing SHAPE-MaP or DMS-MaPseq sequencing data to obtain experimental RNA structure information.

install
source · Clone the upstream repo
git clone https://github.com/GPTomics/bioSkills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/GPTomics/bioSkills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/rna-structure/structure-probing" ~/.claude/skills/gptomics-bioskills-bio-rna-structure-structure-probing && rm -rf "$T"
manifest: rna-structure/structure-probing/SKILL.md
source content

Version Compatibility

Reference examples tested with: STAR 2.7.11+, eggNOG-mapper 2.1+, matplotlib 3.8+, numpy 1.26+, pandas 2.2+

Before using code patterns, verify installed versions match. If versions differ:

  • Python:
    pip show <package>
    then
    help(module.function)
    to check signatures
  • CLI:
    <tool> --version
    then
    <tool> --help
    to confirm flags

If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.

Structure Probing

"Process my SHAPE-MaP experiment to get RNA reactivity profiles" → Convert mutation rates from SHAPE-MaP or DMS-MaPseq sequencing data into per-nucleotide reactivity profiles, then use reactivities as constraints for thermodynamic structure prediction.

  • CLI:
    shapemapper
    (ShapeMapper2) for end-to-end SHAPE-MaP processing
  • CLI:
    RNAfold --shape
    (ViennaRNA) for SHAPE-constrained folding

Analyze experimental RNA structure probing data (SHAPE-MaP, DMS-MaPseq) to obtain per-nucleotide reactivity profiles. High reactivity indicates flexible/single-stranded nucleotides; low reactivity indicates base-paired/structured positions. Reactivities constrain thermodynamic folding for more accurate structure prediction.

Platform Note

ShapeMapper2 is Linux-only. On macOS, use Docker or Singularity:

# Docker
docker pull shapemapper2/shapemapper2
docker run -v $(pwd):/data shapemapper2/shapemapper2 shapemapper \
    --target /data/target.fa --modified --R1 /data/mod_R1.fq.gz --R2 /data/mod_R2.fq.gz \
    --untreated --R1 /data/unmod_R1.fq.gz --R2 /data/unmod_R2.fq.gz \
    --out /data/results

# Singularity
singularity pull shapemapper2.sif docker://shapemapper2/shapemapper2
singularity exec -B $(pwd):/data shapemapper2.sif shapemapper \
    --target /data/target.fa --modified --R1 /data/mod_R1.fq.gz ...

ShapeMapper2 Pipeline

Basic SHAPE-MaP Analysis

shapemapper \
    --target target_rna.fa \
    --name my_rna \
    --modified --R1 modified_R1.fastq.gz --R2 modified_R2.fastq.gz \
    --untreated --R1 untreated_R1.fastq.gz --R2 untreated_R2.fastq.gz \
    --out results/ \
    --nproc 8 \
    --min-depth 5000

With Denatured Control

A denatured control normalizes for sequence-dependent mutation biases.

shapemapper \
    --target target_rna.fa \
    --name my_rna \
    --modified --R1 modified_R1.fastq.gz --R2 modified_R2.fastq.gz \
    --untreated --R1 untreated_R1.fastq.gz --R2 untreated_R2.fastq.gz \
    --denatured --R1 denatured_R1.fastq.gz --R2 denatured_R2.fastq.gz \
    --out results/ \
    --nproc 8

Amplicon Mode (Targeted)

# For PCR-amplified targets
shapemapper \
    --target target_rna.fa \
    --name my_rna \
    --amplicon \
    --modified --R1 modified_R1.fastq.gz --R2 modified_R2.fastq.gz \
    --untreated --R1 untreated_R1.fastq.gz --R2 untreated_R2.fastq.gz \
    --out results/ \
    --nproc 8

Key ShapeMapper2 Options

OptionDescription
--target
FASTA file with target RNA sequence(s)
--name
Name for output files
--modified
Modified sample FASTQ files
--untreated
Untreated control FASTQ files
--denatured
Denatured control FASTQ files
--amplicon
Amplicon sequencing mode
--nproc
Number of processors
--min-depth
Minimum read depth per nucleotide (default: 5000)
--min-qual-to-count
Minimum base quality (default: 20)
--overwrite
Overwrite existing output
--star-aligner
Use STAR instead of Bowtie2 for alignment

ShapeMapper2 Output Files

results/
├── my_rna_profile.txt          # Per-nucleotide reactivity profile
├── my_rna_shapemapper_log.txt  # Run log and QC metrics
├── my_rna_histograms/          # Mutation rate histograms
├── my_rna_profile.pdf          # Reactivity bar plot
└── my_rna_map.shape            # SHAPE reactivities for RNAfold

Reactivity Interpretation

SHAPE Reactivity Scale

ValueInterpretation
< 0.3Low reactivity, likely base-paired
0.3 - 0.7Moderate, partially flexible or dynamic
> 0.7High reactivity, likely single-stranded
-999No data (insufficient read depth)

Quality Filters

MetricThresholdRationale
Read depth>= 5,000Minimum for reliable mutation rate estimation
Effective depth>= 1,000After quality filtering
Modification rate (untreated)< 0.5%High background suggests RNA damage or mapping artifact
Modification rate (modified)1-10%Too low = undermodified; too high = degraded

Structure-Guided Folding with SHAPE Data

RNAfold with SHAPE Constraints

# Use ShapeMapper2 .shape output directly with RNAfold
RNAfold --shape=results/my_rna_map.shape < target_rna.fa

# Adjust SHAPE slope/intercept for different reagents
# Default (1NAI/NMIA): m=1.8, b=-0.6
RNAfold --shape=results/my_rna_map.shape --shapeMethod="Dm1.8b-0.6" < target_rna.fa

# Convert to hard constraints at extreme positions
# Reactivity > 0.7 forced unpaired, < 0.3 allowed to pair
RNAfold --shape=results/my_rna_map.shape --shapeConversion="S" < target_rna.fa

Python: SHAPE-Constrained Folding

import RNA
import pandas as pd

def load_shape_profile(profile_file):
    '''Load ShapeMapper2 reactivity profile.'''
    df = pd.read_csv(profile_file, sep='\t')
    reactivities = df['Reactivity_profile'].tolist()
    sequence = ''.join(df['Nucleotide'].tolist())
    return sequence, reactivities


def fold_with_shape(sequence, reactivities, m=1.8, b=-0.6):
    '''
    Fold RNA constrained by SHAPE reactivities.

    m, b: Deigan et al. (2009) parameters.
    m=1.8, b=-0.6: standard for SHAPE (1M7, NMIA, NAI).
    m=1.1, b=-0.3: suggested for DMS-MaPseq (A/C only).
    '''
    fc = RNA.fold_compound(sequence)

    # Prepend -999 for 0-indexed padding (ViennaRNA expects 1-indexed)
    shape_data = [-999.0] + [r if r != -999 else -999.0 for r in reactivities]
    fc.sc_add_SHAPE_deigan(shape_data, m, b)

    structure, mfe = fc.mfe()

    # Also compute partition function for confidence
    fc2 = RNA.fold_compound(sequence)
    fc2.sc_add_SHAPE_deigan(shape_data, m, b)
    _, pf_energy = fc2.pf()
    centroid, _ = fc2.centroid()

    return {
        'mfe_structure': structure,
        'mfe_energy': mfe,
        'centroid_structure': centroid,
        'ensemble_energy': pf_energy
    }


def shape_agreement(structure, reactivities, low_threshold=0.3, high_threshold=0.7):
    '''
    Assess agreement between SHAPE data and predicted structure.

    Returns fraction of nucleotides where reactivity agrees with structure
    (low reactivity = paired, high reactivity = unpaired).
    '''
    agree, total = 0, 0
    for i, (char, react) in enumerate(zip(structure, reactivities)):
        if react == -999:
            continue
        total += 1
        paired = char in '()'
        if paired and react < low_threshold:
            agree += 1
        elif not paired and react > high_threshold:
            agree += 1
    return agree / total if total > 0 else 0.0

DMS-MaPseq Analysis

DMS methylates unpaired A and C residues. SEISMIC-RNA (successor to DREEM) or rf-count process DMS-MaPseq data.

SEISMIC-RNA

# Install
pip install seismic-rna

# Align reads to target
seismic align target.fa reads_R1.fq.gz reads_R2.fq.gz --out-dir seismic_out

# Relate mutations to reference
seismic relate seismic_out/align --out-dir seismic_out

# Compute mutation rates per position
seismic mask seismic_out/relate --out-dir seismic_out

# Cluster structural conformations (if RNA adopts multiple states)
seismic cluster seismic_out/mask --out-dir seismic_out --max-clusters 3

rf-count (RNAframework)

# Count mutations with rf-count
rf-count -t target.fa -r modified.bam -rc untreated.bam -o rf_results/

# Normalize reactivities with rf-norm
rf-norm -t target.fa -i rf_results/ -o rf_norm/ -sm 3 -nm 2

DMS vs SHAPE Differences

FeatureSHAPE (1M7/NAI)DMS-MaPseq
Reactive nucleotidesAll four (A, C, G, U)Primarily A, C
ResolutionAll positions~50% of positions
In-cell probingNAI-N3 for in vivoStandard DMS works in vivo
Reagent costModerateLow
Analysis toolsShapeMapper2SEISMIC-RNA, rf-count

Visualization

Plot Reactivity Profile

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np


def plot_reactivity_profile(profile_file, output_file='reactivity_profile.png', title=None):
    '''Plot SHAPE/DMS reactivity bar chart with confidence coloring.'''
    df = pd.read_csv(profile_file, sep='\t')
    positions = df.index + 1
    reactivities = df['Reactivity_profile'].values

    valid = reactivities != -999
    colors = np.where(reactivities > 0.7, '#FF0000',   # high = red = unpaired
             np.where(reactivities > 0.3, '#FFA500',   # moderate = orange
                                          '#000000'))   # low = black = paired

    fig, ax = plt.subplots(figsize=(max(8, len(positions) * 0.08), 4))
    ax.bar(positions[valid], reactivities[valid], color=colors[valid], width=1.0, edgecolor='none')
    ax.axhline(y=0.3, color='gray', linestyle='--', linewidth=0.5)
    ax.axhline(y=0.7, color='gray', linestyle='--', linewidth=0.5)
    ax.set_xlabel('Nucleotide position')
    ax.set_ylabel('Reactivity')
    ax.set_title(title or 'SHAPE reactivity profile')
    ax.set_xlim(0, len(positions) + 1)
    plt.tight_layout()
    plt.savefig(output_file, dpi=150)
    plt.close()
    print(f'Saved reactivity profile to {output_file}')

Arc Diagram with Reactivities

import RNA
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import numpy as np


def plot_arc_diagram(sequence, structure, reactivities=None, output_file='arc_diagram.png'):
    '''Draw arc diagram colored by SHAPE reactivity.'''
    n = len(sequence)
    pt = RNA.ptable(structure)

    fig, ax = plt.subplots(figsize=(max(8, n * 0.1), 4))

    if reactivities is not None:
        valid_react = [r for r in reactivities if r != -999 and r >= 0]
        vmax = np.percentile(valid_react, 95) if valid_react else 1.0
        cmap = plt.cm.YlOrRd
        for i in range(n):
            r = reactivities[i]
            color = 'gray' if r == -999 else cmap(min(r / vmax, 1.0))
            ax.plot(i + 1, 0, 'o', color=color, markersize=3)
    else:
        for i in range(n):
            ax.plot(i + 1, 0, 'o', color='black', markersize=3)

    for i in range(1, n + 1):
        j = pt[i]
        if j > i:
            center = (i + j) / 2
            radius = (j - i) / 2
            arc = patches.Arc((center, 0), j - i, j - i, angle=0,
                              theta1=0, theta2=180, color='steelblue', linewidth=0.5)
            ax.add_patch(arc)

    ax.set_xlim(0, n + 1)
    ax.set_ylim(-1, n / 2 + 1)
    ax.set_xlabel('Position')
    ax.set_title('Arc diagram')
    ax.set_aspect('equal')
    plt.tight_layout()
    plt.savefig(output_file, dpi=150)
    plt.close()

Related Skills

  • secondary-structure-prediction - Unconstrained and SHAPE-constrained RNA folding
  • clip-seq/binding-site-annotation - Protein-RNA binding site annotation
  • epitranscriptomics/m6a-peak-calling - RNA modification detection