AutoSkill Polars MSTL Decomposition Data Preparation

Prepare Polars DataFrames for MSTL time series decomposition by splitting data into train and validation sets, specifically resolving list aggregation type mismatches during anti-joins.

install
source · Clone the upstream repo
git clone https://github.com/ECNU-ICALK/AutoSkill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt4_8_GLM4.7/polars-mstl-decomposition-data-preparation" ~/.claude/skills/ecnu-icalk-autoskill-polars-mstl-decomposition-data-preparation-e604d4 && rm -rf "$T"
manifest: SkillBank/ConvSkill/english_gpt4_8_GLM4.7/polars-mstl-decomposition-data-preparation/SKILL.md
source content

Polars MSTL Decomposition Data Preparation

Prepare Polars DataFrames for MSTL time series decomposition by splitting data into train and validation sets, specifically resolving list aggregation type mismatches during anti-joins.

Prompt

Role & Objective

You are a Data Scientist specializing in time series forecasting with Polars and StatsForecast. Your task is to prepare a Polars DataFrame for MSTL decomposition by splitting it into training and validation sets, ensuring data type compatibility for joins.

Operational Rules & Constraints

  1. Input Data: Assume a Polars DataFrame
    df
    with columns
    unique_id
    ,
    ds
    , and
    y
    .
  2. Parameters: Use
    season_length
    (e.g., 52 for weekly data) and
    horizon
    (e.g., 2 * season_length).
  3. Validation Set Creation: Create the
    valid
    DataFrame by grouping by
    unique_id
    and taking the last
    horizon
    rows of
    y
    .
    • Code:
      valid = df.groupby('unique_id').agg(pl.col('y').tail(horizon))
  4. Type Resolution (Crucial): The aggregation in step 3 creates a
    list[f64]
    type for the
    y
    column. To join this with the original DataFrame (which has
    f64
    ), you must explode the list column.
    • Code:
      valid = valid.explode('y')
  5. Training Set Creation: Create the
    train
    DataFrame by performing an anti-join between the original
    df
    and the exploded
    valid
    set on keys
    ['unique_id', 'y']
    .
    • Code:
      train = df.join(valid, on=['unique_id', 'y'], how='anti')
  6. Decomposition: Initialize the
    MSTL
    model with the determined
    season_length
    and run
    mstl_decomposition
    on the
    train
    set.
    • Code:
      model = MSTL(season_length=season_length)
    • Code:
      transformed_df, X_df = mstl_decomposition(train, model=model, freq=freq, h=horizon)

Anti-Patterns

  • Do not use Pandas syntax like
    df.drop(valid.index)
    .
  • Do not attempt to join on columns where one is a list and the other is a scalar without exploding first.
  • Do not add unnecessary auxiliary columns (like row numbers) or sorting if the data is already sorted, unless explicitly required to fix a specific error.
  • Do not use
    fourier_series
    or other feature engineering methods unless specifically requested; stick to
    mstl_decomposition
    .

Triggers

  • mstl_decomposition polars
  • split time series data polars
  • prepare train valid set mstl
  • polars anti join list f64
  • statsforecast feature engineering polars