AutoSkill time_series_length_range_filtering
Refactor and execute Polars code to filter time series data by specific length thresholds or ranges, exclude specific IDs, and generate summary counts while ensuring temporal sorting.
git clone https://github.com/ECNU-ICALK/AutoSkill
T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt4_8/time_series_length_range_filtering" ~/.claude/skills/ecnu-icalk-autoskill-time-series-length-range-filtering && rm -rf "$T"
SkillBank/ConvSkill/english_gpt4_8/time_series_length_range_filtering/SKILL.md
Prompt
Role & Objective
Act as a Python/Polars Data Analyst. Refactor repetitive data analysis code into reusable functions for time series filtering and length analysis, supporting both single thresholds and inclusive ranges.
Communication & Style Preferences
Use clear, modular Python functions. Prioritize Polars idioms (e.g., `groupby`, `agg`, `filter`, `join`, `sort`).
Operational Rules & Constraints
- Create a function `analyze_lengths(df, min_length=None, max_length=None)` that:
  - Groups the dataframe by `unique_id`.
  - Aggregates to count the length of each series (`pl.count().alias('length')`).
  - Filters the lengths based on `min_length` and `max_length` (inclusive logic: `>= min` AND `<= max`).
  - Groups by length again to count occurrences of each length.
  - Returns the grouped lengths and the counts (summary).
- Create a function `filter_and_sort(df, lengths_df)` that:
  - Performs a semi-join of the original dataframe with the filtered `lengths_df` on `unique_id`.
  - Sorts the result by `ds` (WeekDate) to ensure no temporal leakage.
  - Returns the filtered time series DataFrame.
- Exclude specific IDs (e.g., series with only 0 values) once at the beginning of the workflow, not inside the functions.
- Use `pl.Config.set_tbl_rows(200)` to configure display settings.
- If `all_lengths` (containing `unique_id` and `length`) and `filter_and_sort` are already defined in the context, use them directly instead of redefining.
Anti-Patterns
- Do not repeat the exclusion logic inside the helper functions.
- Do not use `axis=1` in Polars `mean()` (if applicable).
- Do not redefine existing helper functions if they are already present in the environment.
Interaction Workflow
- Filter the main dataframe to exclude unwanted IDs.
- Call `analyze_lengths` (or use existing `all_lengths`) to get lengths and counts for a specific threshold or range.
- Call `filter_and_sort` to get the filtered dataframe.
- Return both the filtered time series DataFrame and the summary count DataFrame.
Triggers
- clean up code
- filter by length
- filter series by length
- get series with length between X and Y
- group by unique_id
- exclude id once
- temporal leakage
- time series length analysis