Anofox Forecast DuckDB Extension — Cheat Sheet
Extension:
anofox_forecast
v0.4.6 |
DuckDB: v1.4.x+ |
Dual naming: ts_*
and
anofox_fcst_ts_*
Installation
LOAD anofox_forecast;
-- All functions available as ts_* and anofox_fcst_ts_* (identical)
Critical Gotchas
-
Seasonality is NOT auto-detected. You must pass
seasonal_period
explicitly.
Detect first with ts_detect_periods_by
, then pass to forecasting.
-
DO NOT chain _by
table functions in CTEs. Returns 0 rows silently under parallel execution.
Always
CREATE TABLE
between pipeline steps:
-- BROKEN (0 rows):
WITH step1 AS (SELECT * FROM ts_fill_gaps_by(...))
SELECT * FROM ts_fill_nulls_const_by('step1', ...);
-- CORRECT:
CREATE TABLE step1 AS SELECT * FROM ts_fill_gaps_by(...);
SELECT * FROM ts_fill_nulls_const_by('step1', ...);
-
Model names are case-sensitive.
'AutoETS'
works, 'autoets'
errors.
-
ts_cv_forecast_by
requires pre-created folds. Input table must have
fold_id
and split
columns
(from ts_cv_folds_by
or ts_cv_split_by
). Passing raw data throws a clear error.
-
ts_forecast_by
requires frequency as 7th positional parameter. No default — you must specify it:
-- WRONG: missing frequency
SELECT * FROM ts_forecast_by('sales', id, date, val, 'Naive', 12);
-- CORRECT:
SELECT * FROM ts_forecast_by('sales', id, date, val, 'Naive', 12, '1d');
-
Metric _by
table macros are deprecated. Use scalar functions with
GROUP BY
:
-- Deprecated: SELECT * FROM ts_mae_by(...)
-- Use instead:
SELECT id, ts_mae(LIST(y ORDER BY ds), LIST(yhat ORDER BY ds)) AS mae
FROM results GROUP BY id;
-
Always use ORDER BY
in LIST()
for temporal correctness:
LIST(value ORDER BY date) -- correct
LIST(value) -- wrong: order not guaranteed
Three API Styles
1. Table Macros (primary — use these)
Operate on table names as strings. Handle grouping automatically.
SELECT * FROM ts_forecast_by('sales', product_id, date, revenue, 'AutoETS', 14, '1d',
MAP{'seasonal_period': '7'});
2. Scalar Functions
Operate on arrays. Use with
LIST()
aggregation and
GROUP BY
.
SELECT product_id,
ts_mae(LIST(actual ORDER BY date), LIST(forecast ORDER BY date)) AS mae
FROM results GROUP BY product_id;
3. Aggregate Functions
Return structs. Access fields with
(result).field_name
.
SELECT product_id, (ts_stats(LIST(value ORDER BY date))).*
FROM sales GROUP BY product_id;
Parameter Syntax
STRUCT (recommended)
MAP{'seasonal_period': '7'}
MAP{'seasonal_periods': '[7, 365]'}
MAP{'method': 'autoperiod', 'max_period': '28'}
All param values are strings (even numbers). Arrays use JSON syntax:
'[7, 365]'
.
Frequency Strings
| Format | Examples |
|---|
| Polars style | '1d' , '1h' , '30m' , '1w' , '1mo' , '1q' , '1y'
|
| DuckDB INTERVAL | '1 day' , '1 hour'
|
| Raw integer | '1' , '7' (interpreted as days) |
Common Workflows
1. Basic Forecast
-- Forecast 14 days ahead with weekly seasonality
SELECT * FROM ts_forecast_by(
'sales', product_id, date, revenue,
'HoltWinters', 14, '1d',
MAP{'seasonal_period': '7'}
);
2. Data Preparation Pipeline (CREATE TABLE between steps!)
-- Step 1: Fill gaps
CREATE TABLE gaps_filled AS
SELECT * FROM ts_fill_gaps_by('raw_data', product_id, date, value, '1d');
-- Step 2: Impute NULLs
CREATE TABLE nulls_filled AS
SELECT * FROM ts_fill_nulls_const_by('gaps_filled', product_id, date, value, 0.0);
-- Step 3: Drop short series
CREATE TABLE clean_data AS
SELECT * FROM ts_drop_short_by('nulls_filled', product_id, 20);
3. Detect Seasonality → Forecast
-- Step 1: Detect
SELECT id, (periods).primary_period
FROM ts_detect_periods_by('sales', product_id, date, value, MAP{});
-- Returns e.g. primary_period = 7 (weekly)
-- Step 2: Forecast with detected period
SELECT * FROM ts_forecast_by(
'sales', product_id, date, value,
'AutoETS', 14, '1d', MAP{'seasonal_period': '7'}
);
4. Cross-Validation & Model Comparison
-- Step 1: Create folds
CREATE TABLE cv_folds AS
SELECT * FROM ts_cv_folds_by('data', unique_id, ds, y, 3, 12, MAP{});
-- Step 2: Forecast per fold (for each model)
CREATE TABLE cv_naive AS
SELECT * FROM ts_cv_forecast_by('cv_folds', unique_id, ds, y, 'Naive', MAP{});
CREATE TABLE cv_autoets AS
SELECT * FROM ts_cv_forecast_by('cv_folds', unique_id, ds, y, 'AutoETS',
MAP{'seasonal_period': '7'});
-- Step 3: Compare metrics
SELECT 'Naive' AS model,
ts_mae(LIST(y ORDER BY ds), LIST(yhat ORDER BY ds)) AS mae,
ts_rmse(LIST(y ORDER BY ds), LIST(yhat ORDER BY ds)) AS rmse
FROM cv_naive GROUP BY ALL
UNION ALL
SELECT 'AutoETS',
ts_mae(LIST(y ORDER BY ds), LIST(yhat ORDER BY ds)),
ts_rmse(LIST(y ORDER BY ds), LIST(yhat ORDER BY ds))
FROM cv_autoets GROUP BY ALL;
5. Full Production Pipeline
-- 1. Quality check
SELECT id, (stats).length, (stats).n_nulls, (stats).n_gaps
FROM ts_stats_by('raw', product_id, date, value, '1d');
-- 2. Prep (materialize each step!)
CREATE TABLE step1 AS
SELECT * FROM ts_fill_gaps_by('raw', product_id, date, value, '1d');
CREATE TABLE step2 AS
SELECT * FROM ts_fill_nulls_const_by('step1', product_id, date, value, 0.0);
CREATE TABLE clean AS
SELECT * FROM ts_drop_short_by('step2', product_id, 20);
-- 3. Detect seasonality
SELECT id, (periods).primary_period
FROM ts_detect_periods_by('clean', product_id, date, value, MAP{});
-- 4. Backtest
CREATE TABLE cv_folds AS
SELECT * FROM ts_cv_folds_by('clean', product_id, date, value, 5, 14, MAP{});
CREATE TABLE backtest AS
SELECT * FROM ts_cv_forecast_by('cv_folds', product_id, date, value, 'AutoETS',
MAP{'seasonal_period': '7'});
-- 5. Evaluate
SELECT product_id,
ts_mae(LIST(y ORDER BY date), LIST(yhat ORDER BY date)) AS mae,
ts_rmse(LIST(y ORDER BY date), LIST(yhat ORDER BY date)) AS rmse
FROM backtest GROUP BY product_id;
-- 6. Forecast
CREATE TABLE forecasts AS
SELECT * FROM ts_forecast_by('clean', product_id, date, value,
'AutoETS', 14, '1d', MAP{'seasonal_period': '7'});
-- 7. Conformal intervals
CREATE TABLE calibration AS
SELECT * FROM ts_conformal_calibrate('backtest', value, yhat, {'alpha': 0.1});
SELECT * FROM ts_conformal_apply_by(
'forecasts', product_id, yhat,
(SELECT conformity_score FROM calibration)
);
Model Quick Reference (32 Models)
Automatic Selection (6)
| Model | Optional Params | Best For |
|---|
AutoETS
| seasonal_period
| Unknown patterns (default pick) |
AutoARIMA
| seasonal_period
| Unknown patterns, ARIMA family |
AutoTheta
| seasonal_period
| Unknown patterns, Theta family |
AutoMFLES
| seasonal_periods[]
| Multiple seasonalities |
AutoMSTL
| seasonal_periods[]
| Multiple seasonalities |
AutoTBATS
| seasonal_periods[]
| Multiple seasonalities |
Basic (6)
| Model | Required | Optional | Best For |
|---|
Naive
| — | — | Baseline benchmark |
SMA
| — | window (def: 5) | Smoothed baseline |
SeasonalNaive
| seasonal_period | — | Seasonal baseline |
SES
| — | alpha (def: 0.3) | No trend, no seasonality |
SESOptimized
| — | — | Optimized SES |
RandomWalkDrift
| — | — | Trend without seasonality |
Exponential Smoothing (4)
| Model | Required | Optional |
|---|
Holt
| — | alpha , beta
|
HoltWinters
| seasonal_period | alpha , beta , gamma
|
SeasonalES
| seasonal_period | alpha , gamma
|
SeasonalESOptimized
| seasonal_period | — |
Theta Methods (5)
| Model | Optional |
|---|
Theta
| seasonal_period , theta
|
OptimizedTheta
| seasonal_period
|
DynamicTheta
| seasonal_period , theta
|
DynamicOptimizedTheta
| seasonal_period
|
AutoTheta
| seasonal_period
|
State Space & ARIMA (4)
| Model | Required | Optional |
|---|
ETS
| — | seasonal_period , model
|
AutoETS
| — | seasonal_period
|
ARIMA
| p, d, q | P , D , Q , s
|
AutoARIMA
| — | seasonal_period
|
Multiple Seasonality (6)
| Model | Required | Optional |
|---|
MFLES
| seasonal_periods[] | iterations
|
AutoMFLES
| — | seasonal_periods[]
|
MSTL
| seasonal_periods[] | stl_method
|
AutoMSTL
| — | seasonal_periods[]
|
TBATS
| seasonal_periods[] | use_box_cox
|
AutoTBATS
| — | seasonal_periods[]
|
Intermittent Demand (6)
| Model | Optional | Best For |
|---|
CrostonClassic
| — | Sparse demand |
CrostonOptimized
| — | Sparse demand |
CrostonSBA
| — | Sparse demand (bias-corrected) |
ADIDA
| — | Aggregate-Disaggregate |
IMAPA
| — | Multiple aggregation |
TSB
| alpha_d , alpha_p
| Best intermittent (tunable) |
Model Selection Guide
| Data Characteristics | Recommended Models |
|---|
| No trend, no seasonality | Naive , SES , SESOptimized
|
| Trend, no seasonality | Holt , Theta , RandomWalkDrift
|
| Single seasonal period | SeasonalNaive , HoltWinters , SeasonalES
|
| Multiple seasonalities | MSTL , MFLES , TBATS
|
| Many zeros (intermittent) | CrostonClassic , CrostonSBA , TSB
|
| Unknown characteristics | AutoETS , AutoARIMA , AutoTheta
|
| Short series (< 20 pts) | Naive , SES
|
| Scenario | First Try | Alternative |
|---|
| Daily retail sales | HoltWinters
| MSTL
|
| Weekly financial data | Theta
| AutoETS
|
| Hourly sensor data | MFLES
| MSTL
|
| Spare parts demand | CrostonSBA
| TSB
|
Function Quick Reference
Forecasting
| Function | Purpose |
|---|
ts_forecast_by(table, group, date, value, method, horizon, frequency, params)
| Multi-series forecast |
ts_forecast_exog_by(table, group, date, value, x_cols, future_table, future_date, future_x, model, horizon, params, freq)
| Forecast with exogenous variables |
Data Preparation
| Function | Purpose |
|---|
ts_fill_gaps_by(table, group, date, value, freq)
| Fill missing timestamps with NULL |
ts_fill_forward_by(table, group, date, value, target_date, freq)
| Extend series to target date |
ts_fill_nulls_const_by(table, group, date, value, fill_val)
| Replace NULLs with constant |
ts_fill_nulls_forward_by(table, group, date, value)
| Forward-fill NULLs |
ts_fill_nulls_backward_by(table, group, date, value)
| Backward-fill NULLs |
ts_fill_nulls_mean_by(table, group, date, value)
| Fill NULLs with mean |
ts_drop_constant_by(table, group, value)
| Remove constant series |
ts_drop_short_by(table, group, min_len)
| Remove short series |
ts_drop_gappy_by(table, group, value, max_gap_ratio)
| Remove gappy series |
ts_drop_zeros_by(table, group, value)
| Remove all-zero series |
ts_drop_leading_zeros_by(table, group, date, value)
| Trim leading zeros |
ts_drop_trailing_zeros_by(table, group, date, value)
| Trim trailing zeros |
ts_drop_edge_zeros_by(table, group, date, value)
| Trim both edges |
ts_diff_by(table, group, date, value, order)
| Compute differences |
Statistics & Quality
| Function | Purpose |
|---|
ts_stats_by(table, group, date, value, freq)
| 36 statistics per series |
ts_data_quality_by(table, group, date, value, min_len, freq)
| Quality scores (0-1) |
ts_quality_report(stats_table, min_len)
| Summary quality report |
ts_stats_summary(stats_table)
| Summary across all series |
Period Detection & Decomposition
| Function | Purpose |
|---|
ts_detect_periods_by(table, group, date, value, params)
| Detect seasonal periods |
ts_classify_seasonality_by(table, group, date, value, period)
| Classify seasonality type |
ts_mstl_decomposition_by(table, group, date, value, periods[], params)
| MSTL decomposition |
ts_detrend_by(table, group, date, value, method)
| Remove trend |
ts_detect_peaks_by(table, group, date, value, params)
| Detect peaks |
ts_analyze_peak_timing_by(table, group, date, value, period, params)
| Peak timing analysis |
Cross-Validation
| Function | Purpose |
|---|
ts_cv_folds_by(table, group, date, value, n_folds, horizon, params)
| Create CV folds |
ts_cv_forecast_by(folds_table, group, date, value, method, params)
| Forecast on CV folds |
ts_cv_split_by(table, group, date, value, cutoff_dates[], horizon, params)
| Custom fold boundaries |
ts_cv_hydrate_by(folds, source, group, date, features[], params)
| Add features to folds |
Evaluation Metrics (scalar — use with GROUP BY)
| Function | Signature | Description |
|---|
ts_mae
| (LIST, LIST) → DOUBLE
| Mean Absolute Error |
ts_mse
| (LIST, LIST) → DOUBLE
| Mean Squared Error |
ts_rmse
| (LIST, LIST) → DOUBLE
| Root Mean Squared Error |
ts_mape
| (LIST, LIST) → DOUBLE
| Mean Absolute Percentage Error |
ts_smape
| (LIST, LIST) → DOUBLE
| Symmetric MAPE |
ts_r2
| (LIST, LIST) → DOUBLE
| R-squared |
ts_bias
| (LIST, LIST) → DOUBLE
| Bias (mean error) |
ts_mase
| (LIST, LIST, LIST) → DOUBLE
| Mean Absolute Scaled Error |
ts_rmae
| (LIST, LIST, LIST) → DOUBLE
| Relative MAE |
ts_coverage
| (LIST, LIST, LIST) → DOUBLE
| Interval coverage |
ts_quantile_loss
| (LIST, LIST, DOUBLE) → DOUBLE
| Quantile loss |
Conformal Prediction
| Function | Purpose |
|---|
ts_conformal_by(backtest, group, actual, forecast, point_forecast, params)
| One-step conformal intervals |
ts_conformal_calibrate(backtest, actual, forecast, params)
| Calibrate conformity score |
ts_conformal_apply_by(forecasts, group, forecast_col, score)
| Apply calibrated score |
ts_conformal_predict(residuals[], forecasts[], alpha)
| Array-based conformal |
ts_conformal_predict_asymmetric(residuals[], forecasts[], alpha)
| Asymmetric intervals |
ts_conformal_quantile(residuals[], alpha)
| Compute conformity quantile |
ts_conformal_intervals(forecasts[], score)
| Apply score to array |
ts_conformal_coverage(actuals[], lower[], upper[])
| Empirical coverage |
ts_conformal_evaluate(actuals[], lower[], upper[], alpha)
| Full evaluation |
ts_interval_width_by(table, group, lower, upper)
| Mean interval width |
Feature Extraction
| Function | Purpose |
|---|
ts_features_by(table, group, date, value)
| Extract 117 tsfresh features |
ts_features_list()
| List available features |
Hierarchical
| Function | Purpose |
|---|
ts_combine_keys((SELECT date, val, id1, id2, ...), params)
| Combine ID columns |
ts_aggregate_hierarchy((SELECT date, val, id1, id2, ...), params)
| Aggregate at all levels |
ts_split_keys((SELECT uid, date, val), ...)
| Split combined keys |
ts_validate_separator((SELECT id1, id2, ...), ...)
| Validate separator char |
Changepoint Detection
| Function | Purpose |
|---|
ts_detect_changepoints_by(table, group, date, value, params)
| Detect structural breaks |
Timestamp Validation
| Function | Purpose |
|---|
ts_validate_timestamps_by(table, group, date, expected_dates[])
| Validate timestamps exist |
ts_validate_timestamps_summary_by(table, group, date, expected_dates[])
| Validation summary |
Future Value Handling
| Function | Purpose |
|---|
ts_fill_unknown_by(table, group, date, value, cutoff, params)
| Fill unknown future values |
ts_mark_unknown_by(table, group, date, cutoff)
| Mark known/unknown rows |
Minimum Data Requirements
| Model Category | Minimum Observations |
|---|
| Naive, SMA | 1+ |
| SES, Holt | 3+ |
| HoltWinters, SeasonalES | 2 × seasonal_period |
| MSTL, MFLES, TBATS | 2 × max(seasonal_periods) |
| AutoETS, AutoARIMA | 10+ (more is better) |
| Croston variants, TSB | 4+ |
| Cross-validation | horizon × n_folds + initial_train_size |
Testing Queries
Verify generated SQL works against the actual extension:
bash .claude/skills/anofox-forecast/scripts/test-query.sh "SELECT * FROM ts_features_list() LIMIT 3"
Always test SQL before presenting it to the user. The script runs in-memory with the project's built extension.
Deep-Dive References
For full function signatures, all parameters, return columns, and detailed examples:
- Forecasting Models — All 32 models, exogenous support
- Data Preparation — Filtering, imputation, gap filling
- Statistics & Quality — 36 stats, quality scores
- Period Detection & Decomposition — 12 detection methods, MSTL, detrending
- Cross-Validation — Folds, forecasting, hydration, custom splits
- Evaluation Metrics — All 11 scalar metrics
- Feature Extraction — 117 tsfresh features
- Conformal Prediction — Distribution-free intervals
- Hierarchical — Multi-key combine, aggregate, split
- Changepoint Detection — Structural break detection