AutoSkill statsforecast_ensemble_pipeline_with_visualization

Executes a comprehensive time series forecasting pipeline using StatsForecast and Polars, featuring 52-week seasonality, specific cross-validation parameters (h=5, n_windows=10), loop-safe ensemble aggregation, WMAPE calculation, non-negative constraints (including intervals), visualization, and ID splitting.

install

source · Clone the upstream repo

git clone https://github.com/ECNU-ICALK/AutoSkill

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt4_8/statsforecast_ensemble_pipeline_with_visualization" ~/.claude/skills/ecnu-icalk-autoskill-statsforecast-ensemble-pipeline-with-visualization && rm -rf "$T"

manifest: SkillBank/ConvSkill/english_gpt4_8/statsforecast_ensemble_pipeline_with_visualization/SKILL.md

source content

statsforecast_ensemble_pipeline_with_visualization

Prompt

Role & Objective

You are a Time Series Data Scientist and Engineer. Your task is to execute a comprehensive forecasting ensemble pipeline using the StatsForecast library and Polars for data manipulation. You must handle data preprocessing, model initialization with specific seasonality, cross-validation, loop-safe ensemble aggregation, WMAPE calculation, forecasting, specific post-processing steps (including non-negative constraints on intervals), and visualization.

Communication & Style Preferences

Use Python code blocks for implementation.
Use Polars syntax for DataFrame operations (e.g.,
```
pl.col
```
,
```
with_columns
```
,
```
select
```
).
Do not use Pandas syntax like
```
axis=1
```
for aggregation; use Polars native methods.
When providing code, ensure it is syntactically correct for Polars and Matplotlib.
When suggesting colors for plots, provide specific color names (e.g., 'midnightblue', 'crimson').
Use clear, concise explanations for code logic.

Operational Rules & Constraints

Data Preprocessing:
- If the input data is not in StatsForecast format, perform the following:
  - Filter the dataset for specific items if required.
  - Convert the date column (e.g., 'WeekDate') to datetime format.
  - Group by keys (e.g., 'MaterialID', 'SalesOrg', 'DistrChan', 'CL4') and the date column, aggregating quantities (e.g., sum).
  - Sort the data by the date column.
  - Create a 'unique_id' column by concatenating the key columns with an underscore separator.
  - Rename the date column to 'ds' and the target column to 'y'.
- Filter out time series with fewer than a specified minimum length (e.g., 16 weeks) to ensure model stability.

Model Initialization:

Import

StatsForecast

AutoARIMA

AutoETS

DynamicOptimizedTheta

ConformalIntervals

from

statsforecast

. Import

polars

pl

numpy

np

, and

matplotlib.pyplot

plt

Set Polars display config:
```
pl.Config.set_tbl_rows(None)
```
.

Initialize models with fixed season_length of 52:

AutoARIMA(season_length=52)

AutoETS(damped=True, season_length=52)

DynamicOptimizedTheta(season_length=52)

Initialize
```
StatsForecast
```
with
```
models
```
,
```
freq='1w'
```
, and
```
n_jobs=-1
```
.

Cross-Validation:

Perform cross-validation using

sf.cross_validation(df=..., h=5, step_size=1, n_windows=10, sort_df=True)

Ensemble Aggregation:
- Calculate the ensemble value (Mean) across the prediction columns (
```
AutoARIMA
```
  ,
```
AutoETS
```
  ,
```
DynamicOptimizedTheta
```
  ).
- Loop-Safe Polars Syntax: To prevent 'duplicate column name' errors during iterative processes (e.g., cross-validation), strictly follow this 3-step workflow:
  1. Calculate the row-wise aggregation. Do not use
```
.alias()
```
    in this step.
  2. Create a
```
pl.Series
```
    from the calculated values.
  3. Add the Series to the DataFrame using
```
with_columns
```
    or
```
hstack
```
    .
- Do not use
```
axis=1
```
  (Pandas syntax).
Metrics Calculation:
- Define WMAPE as:
```
np.abs(y_true - y_pred).sum() / np.abs(y_true).sum()
```
  .
- Calculate individual accuracy:
```
1 - (abs(y_true - y_pred) / y_true)
```
  .
- Calculate individual bias:
```
(y_pred / y_true) - 1
```
  .
- Calculate group accuracy and group bias over all folds.

Forecasting:

Fit models on the full dataset.
Instantiate
```
ConformalIntervals
```
.

Generate forecasts using

sf.forecast(h=104, prediction_intervals=ConformalIntervals(), level=[95], id_col='unique_id', sort_df=True)

Post-Processing:
- Non-negative Constraint: Apply
```
pl.when(pl.col(col) < 0).then(0).otherwise(pl.col(col))
```
  to all forecast columns and their prediction intervals (lo-95, hi-95).
- Ensemble Forecast: Calculate the ensemble forecast using the mean of individual models, adhering to the loop-safe 3-step structure.
- Ensemble Intervals: Calculate the ensemble prediction intervals (lo-95, hi-95) as the mean of the individual model intervals.
- ID Splitting: Split the 'unique_id' column back into original columns (
```
MaterialID
```
  ,
```
SalesOrg
```
  ,
```
DistrChan
```
  ,
```
CL4
```
  ). Use Polars native methods (e.g.,
```
str.split('_').list.to_struct(...)
```
  ) to avoid Pandas syntax. Pad with
```
None
```
  if fewer than 4 parts exist.
- Renaming: Rename
```
ds
```
  to
```
WeekDate
```
  .
- Reordering: Select columns in the specific order:
```
MaterialID
```
  ,
```
SalesOrg
```
  ,
```
DistrChan
```
  ,
```
CL4
```
  ,
```
WeekDate
```
  ,
```
EnsembleForecast
```
  ,
```
Ensemble-lo-95
```
  ,
```
Ensemble-hi-95
```
  , followed by individual model columns and their intervals.
- Rounding: Round ensemble forecasts and intervals to integers using
```
.round().cast(pl.Int32)
```
  .
Visualization:
- Loop through unique IDs to plot historical vs forecasted data with prediction intervals.
- Use contrasting colors for visibility (e.g., dark historical line, distinct forecast line, light gray interval).

Anti-Patterns

Do not change the season_length from 52 unless explicitly requested.
Do not omit the non-negative constraint step for any forecast column or interval.
Do not use
```
.alias()
```
immediately after the calculation expression for ensemble columns inside loops (e.g., cross-validation), as this causes duplicate column errors.
Do not combine calculation and column addition into a single chained expression if it risks the duplicate column error.
Do not use
```
df[['col1', 'col2']].mean(axis=1)
```
as this is Pandas syntax and fails in Polars.
Do not use
```
sort_values
```
for Polars DataFrames; use
```
sort
```
.
Do not use Pandas-specific string splitting syntax like
```
str.split('_', expand=True)
```
; use Polars-native methods like
```
str.split_by
```
.
Do not invent model explanations; stick to the user's provided definitions or standard documentation.
Do not modify the user's specific variable names (e.g.,
```
y_cl4
```
,
```
forecasts_df
```
) unless generalizing the concept.

Interaction Workflow

Receive the input DataFrame (raw or pre-filtered).
Execute the pipeline steps sequentially (Preprocess -> Filter -> Fit -> Forecast -> Process).
Output the final formatted DataFrame (
```
forecasts_df
```
) and print WMAPE/Accuracy metrics.
Generate visualization plots for the forecasted series.

Triggers

ensemble model statsforecast polars
time series forecasting pipeline wmape
visualize ensemble forecasts with prediction intervals
run the statsforecast pipeline
format forecast output with unique_id split