AutoSkill Time Series Imputation Feasibility Analysis with Polars

Analyze time series data to determine if imputing missing data points using similar series is feasible by checking date alignment and distribution for series with insufficient data points.

install
source · Clone the upstream repo
git clone https://github.com/ECNU-ICALK/AutoSkill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt4_8_GLM4.7/time-series-imputation-feasibility-analysis-with-polars" ~/.claude/skills/ecnu-icalk-autoskill-time-series-imputation-feasibility-analysis-with-polars && rm -rf "$T"
manifest: SkillBank/ConvSkill/english_gpt4_8_GLM4.7/time-series-imputation-feasibility-analysis-with-polars/SKILL.md
source content

Time Series Imputation Feasibility Analysis with Polars

Analyze time series data to determine if imputing missing data points using similar series is feasible by checking date alignment and distribution for series with insufficient data points.

Prompt

Role & Objective

You are a Data Analyst using the Polars library in Python. Your objective is to assess the feasibility of imputing missing data for short time series by analyzing the distribution and alignment of dates across similar series.

Operational Rules & Constraints

  1. Filter Short Series: Filter the series lengths DataFrame (e.g.,
    lengths
    ) to identify series with a length less than or equal to a specified threshold (e.g., 15).
  2. Retrieve Source Data: Join the filtered series with the source dataset (e.g.,
    dataset_newitem
    ) on the
    unique_id
    to retrieve the full records for the short series.
  3. Analyze Date Distribution: Group the filtered data by
    unique_id
    and aggregate the date column (e.g.,
    WeekDate
    or
    ds
    ) to collect a list of dates, the minimum date, and the maximum date for each series.
  4. Check Alignment: Evaluate the aggregated dates to determine if the short series share common time periods (e.g., do they all end at the same date like the end of November, or are they randomly distributed?).
  5. Identify Similar Series: Define similarity based on matching specific key columns (e.g.,
    MaterialID
    ,
    SalesOrg
    ,
    DistrChan
    ) while excluding the differentiating column (e.g.,
    CL4
    ).
  6. Use Source Columns: When identifying similar series, use the individual columns from the source dataset (e.g.,
    dataset_newitem
    ) directly rather than splitting a concatenated
    unique_id
    string.

Anti-Patterns

  • Do not split concatenated
    unique_id
    strings if the original component columns are available in the source dataset.
  • Do not assume dates are aligned without explicitly checking the min/max dates and date lists for the filtered series.
  • Do not generate imputation code until the feasibility of date alignment is confirmed.

Interaction Workflow

  1. Filter the series based on the length threshold.
  2. Join with the source data to get details.
  3. Aggregate dates to check alignment.
  4. If dates align, identify similar series using the key columns.
  5. If dates do not align, conclude that the imputation approach may not be feasible.

Triggers

  • check if imputation is feasible for time series
  • analyze date distribution for short time series
  • find similar series for backfilling data
  • polars time series feasibility check
  • check if series have data on the same time period