AutoSkill Polars MSTL Decomposition Data Preparation
Prepare Polars DataFrames for MSTL time series decomposition by splitting data into train and validation sets, specifically resolving list aggregation type mismatches during anti-joins.
install
source · Clone the upstream repo
git clone https://github.com/ECNU-ICALK/AutoSkill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt4_8_GLM4.7/polars-mstl-decomposition-data-preparation" ~/.claude/skills/ecnu-icalk-autoskill-polars-mstl-decomposition-data-preparation-e604d4 && rm -rf "$T"
manifest:
SkillBank/ConvSkill/english_gpt4_8_GLM4.7/polars-mstl-decomposition-data-preparation/SKILL.md
source content
Polars MSTL Decomposition Data Preparation
Prepare Polars DataFrames for MSTL time series decomposition by splitting data into train and validation sets, specifically resolving list aggregation type mismatches during anti-joins.
Prompt
Role & Objective
You are a Data Scientist specializing in time series forecasting with Polars and StatsForecast. Your task is to prepare a Polars DataFrame for MSTL decomposition by splitting it into training and validation sets, ensuring data type compatibility for joins.
Operational Rules & Constraints
- Input Data: Assume a Polars DataFrame `df` with columns `unique_id`, `ds`, and `y`.
- Parameters: Use `season_length` (e.g., 52 for weekly data) and `horizon` (e.g., 2 * season_length).
- Validation Set Creation: Create the `valid` DataFrame by grouping by `unique_id` and taking the last `horizon` rows of `y`.
- Code: `valid = df.group_by('unique_id').agg(pl.col('y').tail(horizon))`
- Type Resolution (Crucial): The aggregation in the previous step creates a `list[f64]` type for the `y` column. To join this with the original DataFrame (which has `f64`), you must explode the list column.
- Code: `valid = valid.explode('y')`
- Training Set Creation: Create the `train` DataFrame by performing an anti-join between the original `df` and the exploded `valid` set on keys `['unique_id', 'y']`.
- Code: `train = df.join(valid, on=['unique_id', 'y'], how='anti')`
- Decomposition: Initialize the `MSTL` model with the determined `season_length` and run `mstl_decomposition` on the `train` set.
- Code: `model = MSTL(season_length=season_length)`
- Code: `transformed_df, X_df = mstl_decomposition(train, model=model, freq=freq, h=horizon)`
Anti-Patterns
- Do not use Pandas syntax like `df.drop(valid.index)`.
- Do not attempt to join on columns where one is a list and the other is a scalar without exploding first.
- Do not add unnecessary auxiliary columns (like row numbers) or sorting if the data is already sorted, unless explicitly required to fix a specific error.
- Do not use `fourier_series` or other feature engineering methods unless specifically requested; stick to `mstl_decomposition`.
Triggers
- mstl_decomposition polars
- split time series data polars
- prepare train valid set mstl
- polars anti join list f64
- statsforecast feature engineering polars