BioSkills bio-temporal-genomics-periodicity-detection

Discovers periodic signals of unknown period in time-series omics data using Lomb-Scargle periodograms (scipy), autocorrelation, and wavelet time-frequency decomposition (pywt). Identifies dominant frequencies, handles irregularly sampled data, and detects transient periodicity. Use when searching for periodic patterns of unknown period length, analyzing cell cycle oscillations, or processing unevenly spaced time-series. Not for testing known 24-hour rhythms (see temporal-genomics/circadian-rhythms).

install
source · Clone the upstream repo
git clone https://github.com/GPTomics/bioSkills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/GPTomics/bioSkills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/temporal-genomics/periodicity-detection" ~/.claude/skills/gptomics-bioskills-bio-temporal-genomics-periodicity-detection && rm -rf "$T"
manifest: temporal-genomics/periodicity-detection/SKILL.md
source content

Version Compatibility

Reference examples tested with: matplotlib 3.8+, numpy 1.26+, pwr 1.3+, scipy 1.12+, statsmodels 0.14+

Before using code patterns, verify installed versions match. If versions differ:

  • Python:
    pip show <package>
    then
    help(module.function)
    to check signatures

If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.

Periodicity Detection

"Find periodic patterns of unknown period in my time-series data" → Compute frequency spectra using Lomb-Scargle periodograms (handles irregular sampling), identify significant spectral peaks, and detect transient periodicity via continuous wavelet transforms.

  • Python:
    scipy.signal.lombscargle()
    for Lomb-Scargle periodogram
  • Python:
    pywt.cwt()
    for wavelet time-frequency decomposition

Discovers periodic signals of unknown frequency in time-series omics data. Handles irregular sampling, identifies dominant oscillation periods, and detects transient or time-varying periodicity through spectral and time-frequency methods.

Core Workflow

  1. Prepare time-series expression data (handle missing values, detrend if needed)
  2. Compute frequency spectrum (Lomb-Scargle, Welch, or autocorrelation)
  3. Identify significant spectral peaks
  4. Assess statistical significance (false alarm probability, permutation FDR)
  5. For transient periodicity, apply wavelet time-frequency decomposition

Lomb-Scargle Periodogram (scipy)

Goal: Discover periodic signals of unknown frequency in time-series expression data, especially with irregular sampling.

Approach: Compute a Lomb-Scargle periodogram over a frequency grid spanning biologically plausible periods, identify statistically significant spectral peaks using false alarm probabilities, and convert peak frequencies to period estimates.

Handles irregularly sampled time series without interpolation. Standard method for astronomical and biological time-series with uneven sampling.

Basic Lomb-Scargle

import numpy as np
from scipy.signal import lombscargle

# times: observation times (may be unevenly spaced)
# values: expression values at those times
# Subtract mean for proper normalization
values_centered = values - np.mean(values)

# Frequency grid: test periods from 4h to 72h
# min_period=4h: Nyquist limit for ~2h sampling; shorter periods unresolvable
# max_period=72h: longer than this needs more data (at least 2 full cycles)
min_freq = 2 * np.pi / 72.0
max_freq = 2 * np.pi / 4.0
# 1000 frequencies: fine grid for precise period estimation
freqs = np.linspace(min_freq, max_freq, 1000)

# Angular frequencies for scipy.signal.lombscargle
power = lombscargle(times, values_centered, freqs, normalize=True)

# Convert to periods for interpretation
periods = 2 * np.pi / freqs

Peak Detection and Significance

from scipy.signal import find_peaks

# Find spectral peaks
# prominence=0.1: minimum peak prominence above local background
peaks, properties = find_peaks(power, prominence=0.1)

# Dominant period
dominant_idx = np.argmax(power)
dominant_period = periods[dominant_idx]
dominant_power = power[dominant_idx]

False Alarm Probability (FAP)

from astropy.timeseries import LombScargle

# astropy provides built-in FAP calculation
ls = LombScargle(times, values_centered)
frequency, power_astropy = ls.autopower(
    minimum_frequency=1 / 72.0,
    maximum_frequency=1 / 4.0
)

# FAP: probability that noise alone produces a peak this high
# FAP < 0.01: strong evidence of true periodicity
fap = ls.false_alarm_probability(power_astropy.max())

# FAP at multiple significance levels
fap_levels = ls.false_alarm_level([0.01, 0.05, 0.10])

Genome-Wide Periodicity Screening

from statsmodels.stats.multitest import multipletests

# Screen all genes; collect FAP for each
fap_values = []
dominant_periods = []
for gene_idx in range(expr_mat.shape[0]):
    vals = expr_mat[gene_idx] - np.mean(expr_mat[gene_idx])
    power = lombscargle(times, vals, freqs, normalize=True)
    dominant_periods.append(periods[np.argmax(power)])
    # Approximate FAP using Baluev (2008) method via astropy
    ls = LombScargle(times, vals)
    _, pwr = ls.autopower(minimum_frequency=1 / 72.0, maximum_frequency=1 / 4.0)
    fap_values.append(ls.false_alarm_probability(pwr.max()))

# BH FDR correction for genome-wide multiple testing
reject, qvals, _, _ = multipletests(fap_values, method='fdr_bh')
# q < 0.05: standard FDR threshold
n_periodic = np.sum(qvals < 0.05)

Autocorrelation

Detects periodicity by measuring self-similarity at increasing time lags.

from statsmodels.tsa.stattools import acf

# nlags: test up to 2x expected maximum period (in sampling intervals)
# For 4h sampling testing up to 72h: nlags = 72/4 = 18
acf_values, confint = acf(values, nlags=18, alpha=0.05)

# Periodic signal produces peaks at multiples of the period
# Find first significant peak after lag 0
# Peaks above upper confidence bound indicate significant autocorrelation
peak_lags, _ = find_peaks(acf_values[1:])
significant_peaks = peak_lags[acf_values[peak_lags + 1] > confint[peak_lags + 1, 1]]
if len(significant_peaks) > 0:
    # First significant peak lag corresponds to the fundamental period
    fundamental_period = (significant_peaks[0] + 1) * sampling_interval

Wavelet Time-Frequency Decomposition (pywt)

Detects transient or time-varying periodicity that global methods miss.

Continuous Wavelet Transform (CWT)

import pywt
import numpy as np

# Morlet wavelet: standard for biological oscillation detection
# Balances time and frequency resolution
wavelet = 'cmor1.5-1.0'

# Scales correspond to periods being tested
# scales = center_freq * period * sampling_rate
# For 4h sampling, test periods 8-72h
sampling_rate = 1 / 4.0
center_freq = pywt.central_frequency(wavelet)
periods_to_test = np.arange(8, 73, 1)
scales = center_freq * periods_to_test * sampling_rate

# CWT returns complex coefficients; power = |coefficients|^2
coefficients, frequencies = pywt.cwt(values, scales, wavelet, sampling_period=4.0)
power_matrix = np.abs(coefficients) ** 2

Scalogram Visualization

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(10, 6))
im = ax.pcolormesh(times, periods_to_test, power_matrix, shading='auto', cmap='viridis')
ax.set_xlabel('Time (hours)')
ax.set_ylabel('Period (hours)')
ax.set_title('Wavelet scalogram')
ax.invert_yaxis()
plt.colorbar(im, ax=ax, label='Power')

Detect Time-Varying Periodicity

# Ridge extraction: track dominant period over time
dominant_period_over_time = periods_to_test[np.argmax(power_matrix, axis=0)]

# Identify time windows where periodicity is strong
# Threshold: mean + 2*SD of power; captures peaks above background noise
power_threshold = np.mean(power_matrix) + 2 * np.std(power_matrix)
significant_mask = np.max(power_matrix, axis=0) > power_threshold

Welch Periodogram (Evenly Sampled)

For regularly sampled data, Welch's method provides smooth power spectral density estimates.

from scipy.signal import welch

# nperseg: segment length; controls frequency resolution vs variance tradeoff
# Longer segments = finer frequency resolution but more variance
# nperseg = n_timepoints // 2: standard compromise
fs = 1 / sampling_interval
freqs_welch, psd = welch(values, fs=fs, nperseg=len(values) // 2)

# Convert to periods
periods_welch = 1 / freqs_welch[1:]
psd_trimmed = psd[1:]

Permutation-Based FDR

For genome-wide periodicity screening without parametric assumptions.

# Permutation test: shuffle timepoints to generate null distribution
# n_perm=1000: standard for genome-wide testing; increase to 10000 for publication
n_perm = 1000
null_max_powers = np.zeros(n_perm)
for perm in range(n_perm):
    shuffled = np.random.permutation(values)
    null_power = lombscargle(times, shuffled - np.mean(shuffled), freqs, normalize=True)
    null_max_powers[perm] = np.max(null_power)

# Empirical p-value: fraction of null peaks >= observed peak
empirical_p = np.mean(null_max_powers >= dominant_power)

Method Selection

MethodSamplingDetectsLimitations
Lomb-ScargleUneven OKGlobal periodicityNo time localization
AutocorrelationEven preferredFundamental periodLow frequency resolution
Wavelet CWTEven preferredTransient periodicityLower frequency precision
Welch PSDEven requiredGlobal PSDCannot handle gaps

Parameter Guide

ParameterTypical ValueRationale
Min period2x sampling intervalNyquist criterion
Max periodTotal duration / 2Need at least 2 full cycles
FAP threshold< 0.01Conservative for individual genes
FDR thresholdq < 0.05Standard for genome-wide screening
Waveletcmor1.5-1.0 (Morlet)Standard for biological oscillations
Permutations1000-100001000 for screening, 10000 for publication

Related Skills

  • circadian-rhythms - Known-period rhythm testing with cosinor and JTK_CYCLE
  • temporal-clustering - Group genes by periodicity characteristics
  • trajectory-modeling - Non-periodic trajectory fitting with GAMs