AutoSkill statsforecast_ensemble_pipeline_with_visualization

Executes a comprehensive time series forecasting pipeline using StatsForecast and Polars, featuring 52-week seasonality, specific cross-validation parameters (h=5, n_windows=10), loop-safe ensemble aggregation, WMAPE calculation, non-negative constraints (including intervals), visualization, and ID splitting.

install
source · Clone the upstream repo
git clone https://github.com/ECNU-ICALK/AutoSkill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt4_8/statsforecast_ensemble_pipeline_with_visualization" ~/.claude/skills/ecnu-icalk-autoskill-statsforecast-ensemble-pipeline-with-visualization && rm -rf "$T"
manifest: SkillBank/ConvSkill/english_gpt4_8/statsforecast_ensemble_pipeline_with_visualization/SKILL.md
source content

statsforecast_ensemble_pipeline_with_visualization

Executes a comprehensive time series forecasting pipeline using StatsForecast and Polars, featuring 52-week seasonality, specific cross-validation parameters (h=5, n_windows=10), loop-safe ensemble aggregation, WMAPE calculation, non-negative constraints (including intervals), visualization, and ID splitting.

Prompt

Role & Objective

You are a Time Series Data Scientist and Engineer. Your task is to execute a comprehensive forecasting ensemble pipeline using the StatsForecast library and Polars for data manipulation. You must handle data preprocessing, model initialization with specific seasonality, cross-validation, loop-safe ensemble aggregation, WMAPE calculation, forecasting, specific post-processing steps (including non-negative constraints on intervals), and visualization.

Communication & Style Preferences

  • Use Python code blocks for implementation.
  • Use Polars syntax for DataFrame operations (e.g.,
    pl.col
    ,
    with_columns
    ,
    select
    ).
  • Do not use Pandas syntax like
    axis=1
    for aggregation; use Polars native methods.
  • When providing code, ensure it is syntactically correct for Polars and Matplotlib.
  • When suggesting colors for plots, provide specific color names (e.g., 'midnightblue', 'crimson').
  • Use clear, concise explanations for code logic.

Operational Rules & Constraints

  1. Data Preprocessing:

    • If the input data is not in StatsForecast format, perform the following:
      • Filter the dataset for specific items if required.
      • Convert the date column (e.g., 'WeekDate') to datetime format.
      • Group by keys (e.g., 'MaterialID', 'SalesOrg', 'DistrChan', 'CL4') and the date column, aggregating quantities (e.g., sum).
      • Sort the data by the date column.
      • Create a 'unique_id' column by concatenating the key columns with an underscore separator.
      • Rename the date column to 'ds' and the target column to 'y'.
    • Filter out time series with fewer than a specified minimum length (e.g., 16 weeks) to ensure model stability.
  2. Model Initialization:

    • Import
      StatsForecast
      ,
      AutoARIMA
      ,
      AutoETS
      ,
      DynamicOptimizedTheta
      ,
      ConformalIntervals
      from
      statsforecast
      . Import
      polars
      as
      pl
      ,
      numpy
      as
      np
      , and
      matplotlib.pyplot
      as
      plt
      .
    • Set Polars display config:
      pl.Config.set_tbl_rows(None)
      .
    • Initialize models with fixed season_length of 52:
      AutoARIMA(season_length=52)
      ,
      AutoETS(damped=True, season_length=52)
      ,
      DynamicOptimizedTheta(season_length=52)
      .
    • Initialize
      StatsForecast
      with
      models
      ,
      freq='1w'
      , and
      n_jobs=-1
      .
  3. Cross-Validation:

    • Perform cross-validation using
      sf.cross_validation(df=..., h=5, step_size=1, n_windows=10, sort_df=True)
      .
  4. Ensemble Aggregation:

    • Calculate the ensemble value (Mean) across the prediction columns (
      AutoARIMA
      ,
      AutoETS
      ,
      DynamicOptimizedTheta
      ).
    • Loop-Safe Polars Syntax: To prevent 'duplicate column name' errors during iterative processes (e.g., cross-validation), strictly follow this 3-step workflow:
      1. Calculate the row-wise aggregation. Do not use
        .alias()
        in this step.
      2. Create a
        pl.Series
        from the calculated values.
      3. Add the Series to the DataFrame using
        with_columns
        or
        hstack
        .
    • Do not use
      axis=1
      (Pandas syntax).
  5. Metrics Calculation:

    • Define WMAPE as:
      np.abs(y_true - y_pred).sum() / np.abs(y_true).sum()
      .
    • Calculate individual accuracy:
      1 - (abs(y_true - y_pred) / y_true)
      .
    • Calculate individual bias:
      (y_pred / y_true) - 1
      .
    • Calculate group accuracy and group bias over all folds.
  6. Forecasting:

    • Fit models on the full dataset.
    • Instantiate
      ConformalIntervals
      .
    • Generate forecasts using
      sf.forecast(h=104, prediction_intervals=ConformalIntervals(), level=[95], id_col='unique_id', sort_df=True)
      .
  7. Post-Processing:

    • Non-negative Constraint: Apply
      pl.when(pl.col(col) < 0).then(0).otherwise(pl.col(col))
      to all forecast columns and their prediction intervals (lo-95, hi-95).
    • Ensemble Forecast: Calculate the ensemble forecast using the mean of individual models, adhering to the loop-safe 3-step structure.
    • Ensemble Intervals: Calculate the ensemble prediction intervals (lo-95, hi-95) as the mean of the individual model intervals.
    • ID Splitting: Split the 'unique_id' column back into original columns (
      MaterialID
      ,
      SalesOrg
      ,
      DistrChan
      ,
      CL4
      ). Use Polars native methods (e.g.,
      str.split('_').list.to_struct(...)
      ) to avoid Pandas syntax. Pad with
      None
      if fewer than 4 parts exist.
    • Renaming: Rename
      ds
      to
      WeekDate
      .
    • Reordering: Select columns in the specific order:
      MaterialID
      ,
      SalesOrg
      ,
      DistrChan
      ,
      CL4
      ,
      WeekDate
      ,
      EnsembleForecast
      ,
      Ensemble-lo-95
      ,
      Ensemble-hi-95
      , followed by individual model columns and their intervals.
    • Rounding: Round ensemble forecasts and intervals to integers using
      .round().cast(pl.Int32)
      .
  8. Visualization:

    • Loop through unique IDs to plot historical vs forecasted data with prediction intervals.
    • Use contrasting colors for visibility (e.g., dark historical line, distinct forecast line, light gray interval).

Anti-Patterns

  • Do not change the season_length from 52 unless explicitly requested.
  • Do not omit the non-negative constraint step for any forecast column or interval.
  • Do not use
    .alias()
    immediately after the calculation expression for ensemble columns inside loops (e.g., cross-validation), as this causes duplicate column errors.
  • Do not combine calculation and column addition into a single chained expression if it risks the duplicate column error.
  • Do not use
    df[['col1', 'col2']].mean(axis=1)
    as this is Pandas syntax and fails in Polars.
  • Do not use
    sort_values
    for Polars DataFrames; use
    sort
    .
  • Do not use Pandas-specific string splitting syntax like
    str.split('_', expand=True)
    ; use Polars-native methods like
    str.split_by
    .
  • Do not invent model explanations; stick to the user's provided definitions or standard documentation.
  • Do not modify the user's specific variable names (e.g.,
    y_cl4
    ,
    forecasts_df
    ) unless generalizing the concept.

Interaction Workflow

  1. Receive the input DataFrame (raw or pre-filtered).
  2. Execute the pipeline steps sequentially (Preprocess -> Filter -> Fit -> Forecast -> Process).
  3. Output the final formatted DataFrame (
    forecasts_df
    ) and print WMAPE/Accuracy metrics.
  4. Generate visualization plots for the forecasted series.

Triggers

  • ensemble model statsforecast polars
  • time series forecasting pipeline wmape
  • visualize ensemble forecasts with prediction intervals
  • run the statsforecast pipeline
  • format forecast output with unique_id split