Cc-skills backtesting-py-oracle

backtesting.py configuration for SQL oracle validation and range bar pattern backtesting. Use when running backtesting.py against ClickHouse SQL results, configuring Backtest() constructor, handling overlapping trades, multi-position mode, rolling quantile NaN handling, trade sorting, or oracle gate validation. TRIGGERS - backtesting.py, Backtest(), hedging, exclusive_orders, multi-position, overlapping trades, oracle validation, SQL vs Python, trade comparison, entry price mismatch, signal count mismatch, rolling quantile NaN, ExitTime sort, stats._trades, gen600_strategy, champion_strategy, gen300_strategy, barrier setup.

install
source · Clone the upstream repo
git clone https://github.com/terrylica/cc-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/terrylica/cc-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/plugins/quant-research/skills/backtesting-py-oracle" ~/.claude/skills/terrylica-cc-skills-backtesting-py-oracle && rm -rf "$T"
manifest: plugins/quant-research/skills/backtesting-py-oracle/SKILL.md
source content

backtesting.py Oracle Validation for Range Bar Patterns

Configuration and anti-patterns for using backtesting.py to validate ClickHouse SQL sweep results. Ensures bit-atomic replicability between SQL and Python trade evaluation.

Companion skills:

clickhouse-antipatterns
(SQL correctness, AP-16) |
sweep-methodology
(sweep design) |
rangebar-eval-metrics
(evaluation metrics)

Validated: Gen600 oracle verification (2026-02-12) — 3 assets, 5 gates, ALL PASS.


Self-Evolving Skill: This skill improves through use. If instructions are wrong, parameters drifted, or a workaround was needed — fix this file immediately, don't defer. Only update for real, reproducible issues.

Critical Configuration (NEVER omit)

from backtesting import Backtest

bt = Backtest(
    df,
    Strategy,
    cash=100_000,
    commission=0,
    hedging=True,           # REQUIRED: Multiple concurrent positions
    exclusive_orders=False,  # REQUIRED: Don't auto-close on new signal
)

Why: SQL evaluates each signal independently (overlapping trades allowed). Without

hedging=True
, backtesting.py skips signals while a position is open, producing fewer trades than SQL. This was discovered when SOLUSDT produced 105 Python trades vs 121 SQL trades — 16 signals were silently skipped.


Anti-Patterns (Ordered by Severity)

BP-01: Missing Multi-Position Mode (CRITICAL)

Symptom: Python produces fewer trades than SQL. Gate 1 (signal count) fails.

Root Cause: Default

exclusive_orders=True
prevents opening new positions while one is active.

Fix: Always use

hedging=True, exclusive_orders=False
.

BP-02: ExitTime Sort Order (CRITICAL)

Symptom: Entry prices appear mismatched (Gate 3 fails) even though both SQL and Python use the same price source.

Root Cause:

stats._trades
is sorted by ExitTime, not EntryTime. When overlapping trades exit in a different order than they entered, trade[i] no longer maps to signal[i].

Fix:

trades = stats._trades.sort_values("EntryTime").reset_index(drop=True)

BP-03: NaN Poisoning in Rolling Quantile (CRITICAL)

Symptom: Cross-asset tests fail with far fewer Python trades. Feature quantile becomes NaN and propagates forward indefinitely.

Root Cause:

np.percentile
with NaN inputs returns NaN. If even one NaN feature value enters the rolling window, all subsequent quantiles become NaN, making all subsequent filter comparisons fail.

Fix: Skip NaN values when building the signal window:

def _rolling_quantile_on_signals(feature_arr, is_signal_arr, quantile_pct, window=1000):
    result = np.full(len(feature_arr), np.nan)
    signal_values = []
    for i in range(len(feature_arr)):
        if is_signal_arr[i]:
            if len(signal_values) > 0:
                window_data = signal_values[-window:]
                result[i] = np.percentile(window_data, quantile_pct * 100)
            # Only append non-NaN values (matches SQL quantileExactExclusive NULL handling)
            if not np.isnan(feature_arr[i]):
                signal_values.append(feature_arr[i])
    return result

BP-04: Data Range Mismatch (MODERATE)

Symptom: Different signal counts between SQL and Python for assets with early data (BNB, XRP).

Root Cause:

load_range_bars()
defaults to
start='2020-01-01'
but SQL has no lower bound.

Fix: Always pass

start='2017-01-01'
to cover all available data.

BP-05: Margin Exhaustion with Overlapping Positions (MODERATE)

Symptom: Orders canceled with insufficient margin. Fewer trades than expected.

Root Cause: With

hedging=True
and default full-equity sizing, overlapping positions exhaust available margin.

Fix: Use fixed fractional sizing:

self.buy(size=0.01)  # 1% equity per trade

BP-06: Signal Timestamp vs Entry Timestamp (LOW)

Symptom: Gate 2 (timestamp match) fails because SQL uses signal bar timestamps while Python uses entry bar timestamps.

Root Cause: SQL outputs the signal detection bar's

timestamp_ms
. Python's
EntryTime
is the fill bar (next bar after signal). These differ by 1 bar.

Fix: Record signal bar timestamps in the strategy's

next()
method:

# Before calling self.buy()
self._signal_timestamps.append(int(self.data.index[-1].timestamp() * 1000))

5-Gate Oracle Validation Framework

GateMetricThresholdWhat it catches
1Signal Count<5% diffMissing signals, filter misalignment
2Timestamp Match>95%Timing offset, warmup differences
3Entry Price>95%Price source mismatch, sort ordering
4Exit Type>90%Barrier logic differences
5Kelly Fraction<0.02Aggregate outcome alignment

Expected residual: 1-2 exit type mismatches per asset at TIME barrier boundary (bar 50). SQL uses

fwd_closes[max_bars]
, backtesting.py closes at current bar price. Impact on Kelly < 0.006.


Strategy Architecture: Single vs Multi-Position

ModeConstructorUse CasePosition Sizing
Single-position
hedging=False
(default)
Champion 1-bar holdFull equity
Multi-position
hedging=True, exclusive_orders=False
SQL oracle validationFixed fractional (
size=0.01
)

Multi-Position Strategy Template

class Gen600Strategy(Strategy):
    def next(self):
        current_bar = len(self.data) - 1

        # 1. Register newly filled trades and set barriers
        for trade in self.trades:
            tid = id(trade)
            if tid not in self._known_trades:
                self._known_trades.add(tid)
                self._trade_entry_bar[tid] = current_bar
                actual_entry = trade.entry_price
                if self.tp_mult > 0:
                    trade.tp = actual_entry * (1.0 + self.tp_mult * self.threshold_pct)
                if self.sl_mult > 0:
                    trade.sl = actual_entry * (1.0 - self.sl_mult * self.threshold_pct)

        # 2. Check time barrier for each open trade
        for trade in list(self.trades):
            tid = id(trade)
            entry_bar = self._trade_entry_bar.get(tid, current_bar)
            if self.max_bars > 0 and (current_bar - entry_bar) >= self.max_bars:
                trade.close()
                self._trade_entry_bar.pop(tid, None)

        # 3. Check for new signal (no position guard — overlapping allowed)
        if self._is_signal[current_bar]:
            self.buy(size=0.01)

Data Loading

from data_loader import load_range_bars

df = load_range_bars(
    symbol="SOLUSDT",
    threshold=1000,
    start="2017-01-01",      # Cover all available data
    end="2025-02-05",        # Match SQL cutoff
    extra_columns=["volume_per_trade", "lookback_price_range"],  # Gen600 features
)

HTML Equity Plot: Y-Axis Auto-Fit on Zoom (BP-07 to BP-10)

backtesting.py generates Bokeh HTML plots via

bt.plot()
. By default, the y-axis is fixed — zooming on the x-axis does NOT rescale the y-axis to fit visible data. This makes it impossible to inspect zoomed-in regions of equity curves that span 4+ orders of magnitude.

Reference implementation:

scripts/gen800/plotting.py
in opendeviationbar-patterns.

BP-07: NEVER Use LogScale for Equity (CRITICAL)

Symptom: Equity panel y-axis doesn't auto-fit on zoom. JS callbacks fail silently.

Root Cause: Bokeh's

LogScale
breaks
CustomJS
y-range callbacks in multi-panel linked x_range layouts. The
js_on_change
callback fires but
y_range.start
/
y_range.end
assignments are ignored by the LogScale renderer.

Proven via POC: LogScale + JS callback works in single-panel mode but fails when 5 panels share a linked x_range.

Fix: Transform equity data to

log10()
in Python, display on linear scale with a
CustomJSTickFormatter
that shows
10^tick
as readable values:

import numpy as np
from bokeh.models import CustomJSTickFormatter, Range1d

# Transform data
raw = np.asarray(src.data["equity"], dtype=float)
raw = np.where(raw > 0, raw, 1e-10)
src.data["equity"] = np.log10(raw).tolist()

# Custom tick formatter (shows 1%, 100%, 10K%, 1M%)
child.yaxis[0].formatter = CustomJSTickFormatter(code="""
    const v = Math.pow(10, tick);
    if (v >= 1e6) return (v/1e6).toFixed(1) + 'M%';
    if (v >= 1e3) return (v/1e3).toFixed(0) + 'K%';
    if (v >= 1) return v.toFixed(0) + '%';
    return v.toFixed(2) + '%';
""")

# MUST set initial y_range explicitly (backtesting.py leaves it as NaN)
valid = eq[np.isfinite(eq)]
pad = (valid.max() - valid.min()) * 0.05
child.y_range = Range1d(start=valid.min() - pad, end=valid.max() + pad)

BP-08: Panel-Aware Column Matching (CRITICAL)

Symptom: Drawdown/P&L panels go blank when zooming. Shows millions of percent.

Root Cause: Multiple panels share the same

ColumnDataSource
(backtesting.py optimization). The Equity panel's Line renderer and the Drawdown panel's Line renderer both reference a source containing
equity
,
drawdown
,
High
,
Low
, etc. A naive "find first y column" approach picks
equity
for the Drawdown panel.

Fix: Match by

panel_label
FIRST, then by column name:

panel_label = child.yaxis[0].axis_label or ""

# Label-first matching (panels share data sources!)
if "Drawdown" in panel_label and "drawdown" in d:
    hi_col = lo_col = "drawdown"
elif "Equity" in panel_label and "equity" in d:
    hi_col = lo_col = "equity"
elif "High" in d and "Low" in d:  # OHLC
    hi_col, lo_col = "High", "Low"

BP-09: JS Callback Pattern for Y-Axis Auto-Fit (PROVEN)

Attach a

CustomJS
callback to
x_range.js_on_change
for each panel. The callback scans visible data and sets
y_range
directly:

from bokeh.models import CustomJS

cb = CustomJS(
    args={"source": src, "yr": child.y_range,
          "x_col": "index", "y_col": "equity"},
    code="""
    const xs = source.data[x_col];
    const ys = source.data[y_col];
    const x0 = cb_obj.start, x1 = cb_obj.end;
    let lo = Infinity, hi = -Infinity;
    for (let i = 0; i < xs.length; i++) {
        if (xs[i] >= x0 && xs[i] <= x1 && isFinite(ys[i])) {
            if (ys[i] < lo) lo = ys[i];
            if (ys[i] > hi) hi = ys[i];
        }
    }
    if (isFinite(lo) && isFinite(hi) && lo < hi) {
        const pad = (hi - lo) * 0.05 || 0.001;
        yr.start = lo - pad;
        yr.end = hi + pad;
    }
    """,
)
child.x_range.js_on_change("start", cb)
child.x_range.js_on_change("end", cb)

BP-10: DataRange1d(only_visible=True) Is Unreliable

Symptom:

DataRange1d(only_visible=True)
works for simple cases but fails with:

  • LogScale panels (y-range computed in wrong space)
  • VBar renderers (candlesticks — bounds not tracked correctly)
  • Shared data sources across panels

Fix: Always use the JS callback pattern (BP-09) instead of

DataRange1d(only_visible=True)
.

Panel Data Source Reference (backtesting.py internals)

PanelLabelRendererY ColumnSource
Equity
"Equity"
Patch (
equity_dd
), Line (
equity
)
equity
Shared OHLC source
Drawdown
"Drawdown"
Line (
drawdown
), Scatter (peak)
drawdown
Shared OHLC source
P/L
"Profit / Loss"
Scatter (
returns
), MultiLine
y
or
returns
Separate trade source
OHLC
""
(no label)
Segment, VBar
High
/
Low
Shared OHLC source
Volume
"Volume"
VBar
Volume
(top)
Shared OHLC source

Project Artifacts (rangebar-patterns repo)

ArtifactPath
Oracle comparison script
scripts/gen600_oracle_compare.py
Gen600 strategy (reference)
backtest/backtesting_py/gen600_strategy.py
SQL oracle query template
sql/gen600_oracle_trades.sql
Oracle validation findings
findings/2026-02-12-gen600-oracle-validation.md
Backtest CLAUDE.md
backtest/CLAUDE.md
ClickHouse AP-16
.claude/skills/clickhouse-antipatterns/SKILL.md
Fork source
~/fork-tools/backtesting.py/

Post-Execution Reflection

After this skill completes, check before closing:

  1. Did the command succeed? — If not, fix the instruction or error table that caused the failure.
  2. Did parameters or output change? — If the underlying tool's interface drifted, update Usage examples and Parameters table to match.
  3. Was a workaround needed? — If you had to improvise (different flags, extra steps), update this SKILL.md so the next invocation doesn't need the same workaround.

Only update if the issue is real and reproducible — not speculative.