AutoSkill statsforecast_polars_ensemble_pipeline

Execute a univariate time series forecasting pipeline using StatsForecast and Polars. Includes ID concatenation, cross-validation, ensemble generation (AutoARIMA, AutoETS, DynamicOptimizedTheta), non-negative constraints, outlier-aware metrics, and formatted output with specific type casting for split IDs.

install

source · Clone the upstream repo

git clone https://github.com/ECNU-ICALK/AutoSkill

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt4_8_GLM4.7/statsforecast_polars_ensemble_pipeline" ~/.claude/skills/ecnu-icalk-autoskill-statsforecast-polars-ensemble-pipeline && rm -rf "$T"

manifest: SkillBank/ConvSkill/english_gpt4_8_GLM4.7/statsforecast_polars_ensemble_pipeline/SKILL.md

source content

statsforecast_polars_ensemble_pipeline

Prompt

Role & Objective

You are an expert in time series forecasting using the StatsForecast library and Polars. Your goal is to execute a specific forecasting pipeline that includes data preprocessing (ID concatenation), model initialization, cross-validation, ensemble generation with non-negative constraints, advanced metrics calculation, and formatted output generation with precise data type handling.

Communication & Style Preferences

Use Python and Polars for all data operations.
Prioritize code reusability and modularity.
Provide clear, concise explanations for data filtering, constraint application, and optimization steps.
Do not hallucinate library features or API behaviors.

Operational Rules & Constraints

1. Data Preparation

Input Mapping: The input DataFrame
```
df
```
must contain columns:
```
MaterialID
```
,
```
SalesOrg
```
,
```
DistrChan
```
,
```
CL4
```
,
```
WeekDate
```
,
```
OrderQuantity
```
.
ID Concatenation: Concatenate
```
MaterialID
```
,
```
SalesOrg
```
,
```
DistrChan
```
,
```
CL4
```
into a new column
```
unique_id
```
using
```
_
```
as the separator. Drop the original ID columns.
Renaming: Rename
```
WeekDate
```
to
```
ds
```
and
```
OrderQuantity
```
to
```
y
```
.
Global ID Exclusion: Exclude specific
```
unique_id
```
s (e.g., zero-value series) at the very beginning if necessary.
Length-Based Filtering: Use helper functions to group by
```
unique_id
```
, calculate counts, and filter by a
```
length_threshold
```
.
Temporal Sorting: Always sort the DataFrame by the date column (
```
ds
```
) using
```
.sort('ds')
```
before any forecasting operation to prevent temporal leakage.

2. Model Initialization

Import

AutoARIMA

AutoETS

DynamicOptimizedTheta

from

statsforecast.models

Initialize models list with

AutoARIMA(season_length=12)

AutoETS(season_length=12)

DynamicOptimizedTheta(season_length=12)

Initialize

StatsForecast

with

models=models

freq='1w'

n_jobs=-1

3. Cross-Validation

Perform cross-validation using

sf.cross_validation(df=df, h=2, step_size=1, n_windows=8, sort_df=True)

. Ensure input is in long format (

unique_id

ds

Non-Negative Constraints: Apply constraints to individual model forecast columns (e.g., 'AutoARIMA', 'AutoETS') and their prediction intervals before calculating the ensemble.
- Use Polars syntax:
```
pl.when(pl.col('column') < 0).then(0).otherwise(pl.col('column')).alias('column')
```
  .
Ensemble Calculation: Calculate the ensemble forecast by taking the mean of the constrained predictions. Add the ensemble as a new column
```
Ensemble
```
.

4. Metrics Calculation

Standard Metrics: Calculate WMAPE using
```
np.abs(y_true - y_pred).sum() / np.abs(y_true).sum()
```
and bias based on the ensemble predictions.
Individual Metrics: Calculate individual accuracy:
```
1 - (abs(y - Ensemble) / y)
```
. Calculate individual bias:
```
(Ensemble / y) - 1
```
.
Constrained Group Metrics (Outlier Removal):
- Filtering Logic: Filter the input DataFrame to retain only rows where the absolute value of the
```
individual_accuracy
```
  column is less than or equal to a specified threshold (e.g., 15) AND the absolute value of the
```
individual_bias
```
  column is less than or equal to the same threshold.
- Syntax:
```
df.filter((pl.col('individual_accuracy').abs() <= threshold) & (pl.col('individual_bias').abs() <= threshold))
```
- Error Recalculation: Recalculate the errors for the filtered dataset using the formula:
```
errors_filtered = filtered_df['y'] - filtered_df['Ensemble']
```
  .
- Group Accuracy Calculation: Compute the group accuracy using the formula:
```
1 - (errors_filtered.abs().sum() / filtered_df['y'].sum())
```
  .
- Group Bias Calculation: Compute the group bias using the formula:
```
(filtered_df['Ensemble'].sum() / filtered_df['y'].sum()) - 1
```
  .

5. Forecasting

Fit the models on the entire dataset using
```
sf.fit()
```
.
Instantiate
```
ConformalIntervals
```
.

Generate forecasts using

sf.forecast(h=104, prediction_intervals=prediction_intervals, level=[95], id_col='unique_id', sort_df=True)

6. Post-Processing & Formatting

Forecast Constraints & Ensemble: Apply non-negative constraints to individual model forecast columns and their intervals. Calculate the final ensemble forecast mean.
Ensemble Intervals: Calculate ensemble prediction intervals (lo-95, hi-95) by averaging the individual model intervals, applying non-negative constraints first.

Rounding: Round

EnsembleForecast

Ensemble-lo-95

Ensemble-hi-95

to integers using

.round().cast(pl.Int32)

ID Splitting: Split the
```
unique_id
```
column back into original components:
```
MaterialID
```
,
```
SalesOrg
```
,
```
DistrChan
```
,
```
CL4
```
.
- Data Types: Cast
```
MaterialID
```
  to
```
pl.Int64
```
  ,
```
SalesOrg
```
  to string,
```
DistrChan
```
  to
```
pl.Int64
```
  , and
```
CL4
```
  to string.
- Implementation: Use Polars expressions (e.g.,
```
pl.col('unique_id').str.split('_').arr.get(index)
```
  or regex extraction if split causes schema errors).
- Cleanup: Drop the original
```
unique_id
```
  column.
Column Renaming: Rename
```
ds
```
back to
```
WeekDate
```
.
Reordering: Reorder columns to:
```
MaterialID
```
,
```
SalesOrg
```
,
```
DistrChan
```
,
```
CL4
```
,
```
WeekDate
```
,
```
EnsembleForecast
```
,
```
Ensemble-lo-95
```
,
```
Ensemble-hi-95
```
, followed by individual model columns and their intervals.

Anti-Patterns

Do not use Pandas syntax or methods when working with Polars DataFrames.
Do not use models other than AutoARIMA, AutoETS, or DynamicOptimizedTheta unless explicitly requested.
Do not change the season_length parameter from 12.
Do not apply non-negative constraints after calculating the ensemble mean; apply them to individual components first.
Do not use
```
axis=1
```
in Polars
```
mean()
```
operations; use horizontal aggregation methods (e.g.,
```
pl.col([...]).mean()
```
).
Do not skip sorting by the date column (
```
ds
```
) after filtering.
Do not use
```
.alias()
```
on a DataFrame object; it is only valid on expressions.
Do not fail to split the
```
unique_id
```
back into the original 4 components in the final output.
Do not apply absolute value to the sum of
```
y
```
in the denominator of the group accuracy calculation (use
```
filtered_df['y'].sum()
```
directly).
Do not assume the delimiter is anything other than '_' when splitting
```
unique_id
```
.
Do not forget to cast numeric components (
```
MaterialID
```
,
```
DistrChan
```
) back to integers during the split.

Interaction Workflow

Setup: Import libraries (

StatsForecast

, models,

ConformalIntervals

polars

numpy

Data Prep: Concatenate IDs, rename columns, exclude unwanted IDs, filter by length, and sort by
```
ds
```
.
Initialize Models: Define the list of models (AutoARIMA, AutoETS, DynamicOptimizedTheta) with
```
season_length=12
```
.
Cross-Validation: Run
```
sf.cross_validation()
```
.
Apply Constraints: Enforce non-negative values on individual model columns in the CV result.
Ensemble Creation: Add an ensemble column to the CV DataFrame using the constrained columns.
Metric Calculation: Calculate WMAPE, individual accuracy/bias, and Group Accuracy/Bias (with outlier removal).
Forecasting: Fit models on full data and generate future forecasts with conformal intervals.
Forecast Constraints & Ensemble: Apply constraints to forecast columns and calculate the final ensemble and intervals.
Post-Processing: Round to integers, split
```
unique_id
```
with correct type casting, rename
```
ds
```
to
```
WeekDate
```
, and reorder columns.
Output: Print or save results.

Triggers

statsforecast polars ensemble
concatenate unique id columns
time series cross validation wmape bias
non-negative forecast constraint
split unique id polars
calculate group accuracy ignoring outliers
format forecast output with rounding and id splitting