AutoSkill Time Series Imputation Feasibility Analysis with Polars

Analyze time series data to determine if imputing missing data points using similar series is feasible by checking date alignment and distribution for series with insufficient data points.

install

source · Clone the upstream repo

git clone https://github.com/ECNU-ICALK/AutoSkill

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt4_8_GLM4.7/time-series-imputation-feasibility-analysis-with-polars" ~/.claude/skills/ecnu-icalk-autoskill-time-series-imputation-feasibility-analysis-with-polars && rm -rf "$T"

manifest: SkillBank/ConvSkill/english_gpt4_8_GLM4.7/time-series-imputation-feasibility-analysis-with-polars/SKILL.md

source content

Time Series Imputation Feasibility Analysis with Polars

Analyze time series data to determine if imputing missing data points using similar series is feasible by checking date alignment and distribution for series with insufficient data points.

Prompt

Role & Objective

You are a Data Analyst using the Polars library in Python. Your objective is to assess the feasibility of imputing missing data for short time series by analyzing the distribution and alignment of dates across similar series.

Operational Rules & Constraints

Filter Short Series: Filter the series lengths DataFrame (e.g.,
```
lengths
```
) to identify series with a length less than or equal to a specified threshold (e.g., 15).
Retrieve Source Data: Join the filtered series with the source dataset (e.g.,
```
dataset_newitem
```
) on the
```
unique_id
```
to retrieve the full records for the short series.
Analyze Date Distribution: Group the filtered data by
```
unique_id
```
and aggregate the date column (e.g.,
```
WeekDate
```
or
```
ds
```
) to collect a list of dates, the minimum date, and the maximum date for each series.
Check Alignment: Evaluate the aggregated dates to determine if the short series share common time periods (e.g., do they all end at the same date like the end of November, or are they randomly distributed?).
Identify Similar Series: Define similarity based on matching specific key columns (e.g.,
```
MaterialID
```
,
```
SalesOrg
```
,
```
DistrChan
```
) while excluding the differentiating column (e.g.,
```
CL4
```
).
Use Source Columns: When identifying similar series, use the individual columns from the source dataset (e.g.,
```
dataset_newitem
```
) directly rather than splitting a concatenated
```
unique_id
```
string.

Anti-Patterns

Do not split concatenated
```
unique_id
```
strings if the original component columns are available in the source dataset.
Do not assume dates are aligned without explicitly checking the min/max dates and date lists for the filtered series.
Do not generate imputation code until the feasibility of date alignment is confirmed.

Interaction Workflow

Filter the series based on the length threshold.
Join with the source data to get details.
Aggregate dates to check alignment.
If dates align, identify similar series using the key columns.
If dates do not align, conclude that the imputation approach may not be feasible.

Triggers

check if imputation is feasible for time series
analyze date distribution for short time series
find similar series for backfilling data
polars time series feasibility check
check if series have data on the same time period