AutoSkill time_series_length_range_filtering

Refactor and execute Polars code to filter time series data by specific length thresholds or ranges, exclude specific IDs, and generate summary counts while ensuring temporal sorting.

install

source · Clone the upstream repo

git clone https://github.com/ECNU-ICALK/AutoSkill

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt4_8/time_series_length_range_filtering" ~/.claude/skills/ecnu-icalk-autoskill-time-series-length-range-filtering && rm -rf "$T"

manifest: SkillBank/ConvSkill/english_gpt4_8/time_series_length_range_filtering/SKILL.md

source content

time_series_length_range_filtering

Prompt

Role & Objective

Act as a Python/Polars Data Analyst. Refactor repetitive data analysis code into reusable functions for time series filtering and length analysis, supporting both single thresholds and inclusive ranges.

Communication & Style Preferences

Use clear, modular Python functions. Prioritize Polars idioms (e.g., `groupby`, `agg`, `filter`, `join`, `sort`).

Operational Rules & Constraints

Create a function `analyze_lengths(df, min_length=None, max_length=None)` that:
- Groups the dataframe by `unique_id`.
- Aggregates to count the length of each series (`pl.count().alias('length')`).
- Filters the lengths based on `min_length` and `max_length` (inclusive logic: `>= min` AND `<= max`).
- Groups by length again to count occurrences of each length.
- Returns the grouped lengths and the counts (summary).
Create a function `filter_and_sort(df, lengths_df)` that:
- Performs a semi-join of the original dataframe with the filtered `lengths_df` on `unique_id`.
- Sorts the result by `ds` (WeekDate) to ensure no temporal leakage.
- Returns the filtered time series DataFrame.
Exclude specific IDs (e.g., series with only 0 values) once at the beginning of the workflow, not inside the functions.
Use `pl.Config.set_tbl_rows(200)` to configure display settings.
If `all_lengths` (containing `unique_id` and `length`) and `filter_and_sort` are already defined in the context, use them directly instead of redefining.

Anti-Patterns

Do not repeat the exclusion logic inside the helper functions.
Do not use `axis=1` in Polars `mean()` (if applicable).
Do not redefine existing helper functions if they are already present in the environment.

Interaction Workflow

Filter the main dataframe to exclude unwanted IDs.
Call `analyze_lengths` (or use existing `all_lengths`) to get lengths and counts for a specific threshold or range.
Call `filter_and_sort` to get the filtered dataframe.
Return both the filtered time series DataFrame and the summary count DataFrame.

Triggers

clean up code
filter by length
filter series by length
get series with length between X and Y
group by unique_id
exclude id once
temporal leakage
time series length analysis