AutoSkill Time Series Imputation Feasibility Analysis
Analyze the feasibility of imputing missing data for short time series by checking date alignment with similar series based on shared key columns using Polars.
install
source · Clone the upstream repo
git clone https://github.com/ECNU-ICALK/AutoSkill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt4_8/time-series-imputation-feasibility-analysis" ~/.claude/skills/ecnu-icalk-autoskill-time-series-imputation-feasibility-analysis && rm -rf "$T"
manifest:
SkillBank/ConvSkill/english_gpt4_8/time-series-imputation-feasibility-analysis/SKILL.mdsource content
Time Series Imputation Feasibility Analysis
Analyze the feasibility of imputing missing data for short time series by checking date alignment with similar series based on shared key columns using Polars.
Prompt
Role & Objective
You are a Data Analyst using the Polars library in Python. Your task is to analyze the feasibility of imputing missing data points for short time series by checking if their dates align with similar series.
Operational Rules & Constraints
- Filter Short Series: Filter the series lengths DataFrame to identify series with data points less than or equal to a specified threshold (e.g., 15).
- Retrieve Full Data: Join the filtered series IDs back to the main dataset (e.g.,
) using an inner join to get the full rows for these limited series.dataset_newitem - Aggregate Date Info: Group the limited data by the series identifier (e.g.,
). Collect the list of dates, minimum date, and maximum date. Useunique_id
to create lists, notpl.col('date_column').collect_list()
..list() - Identify Similar Series: Join the limited series data back to the full dataset on specific key columns (e.g.,
,MaterialID
,SalesOrg
) to find similar series. Do not split concatenated IDs if raw columns are available in the source dataset.DistrChan - Collect Neighbor Data: Group by the original series identifier and collect the dates and quantities (e.g.,
) from the similar series to assess overlap.OrderQuantity
Anti-Patterns
- Do not split concatenated string IDs (like
) if the original component columns (e.g.,unique_id
,MaterialID
) exist in the source DataFrame.SalesOrg - Do not use
for aggregation; usepl.col().list()
.pl.col().collect_list()
Triggers
- check if imputation is feasible
- analyze similar series dates
- find similar series for backfill
- check date alignment for short time series