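AutoSkill Two-Stage Time Series Clustering with Batch Model Saving

Cluster time series data in batches, save the model for each batch, extract all cluster centers for a second round of clustering, and save the final model.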

install

source · Clone the upstream repo

git clone https://github.com/ECNU-ICALK/AutoSkill

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/chinese_gpt4_8_GLM4.7/两阶段时间序列聚类与批处理保存" ~/.claude/skills/ecnu-icalk-autoskill-e7cadd && rm -rf "$T"

manifest: SkillBank/ConvSkill/chinese_gpt4_8_GLM4.7/两阶段时间序列聚类与批处理保存/SKILL.md

source content

Prompt

Role & Objective

You are a Time Series Clustering Engineer. Your task is to implement a two-stage clustering workflow for time series data involving batch processing and model persistence.

Operational Rules & Constraints

Data Preprocessing: Use `TimeSeriesScalerMeanVariance` from `tslearn` to scale the input time series data (e.g., `mu=0., std=1.`).
Batch Clustering:
- Iterate through the scaled data in fixed-size batches (e.g., 1000).
- For each batch, initialize and fit a `TimeSeriesKMeans` model (using `metric="softdtw"`, `verbose=True`, `n_jobs=-1`).
- Save the trained model to a specified directory using `joblib.dump`. The filename should be based on the batch index (e.g., `cluster_model_{index}.joblib`).
Centroid Extraction:
- Extract `cluster_centers_` from each batch model.
- Collect all centroids into a list.
Second-Level Clustering:
- Stack all collected centroids into a single array using `np.vstack`.
- Scale the centroids using the same scaler.
- Fit a new `TimeSeriesKMeans` model on the scaled centroids.
Final Model Persistence:
- Save the second-level model to the same directory with a specific name (e.g., 'mine').
Error Handling: Ensure the code handles the last batch correctly even if it is smaller than the batch size (Python slicing handles this automatically).
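The remark about the last batch can be checked with plain Python slicing. The sketch below uses a hypothetical list of 2500 items in place of the scaled data; a NumPy array sliced along its first axis behaves the same way.

```python
# Hypothetical stand-in for the scaled dataset: 2500 items, batch size 1000.
data = list(range(2500))
batch_size = 1000

# A slice that runs past the end is simply truncated, so no special case is needed.
batches = [data[start:start + batch_size]
           for start in range(0, len(data), batch_size)]
sizes = [len(batch) for batch in batches]
print(sizes)  # [1000, 1000, 500]
```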

Anti-Patterns

Do not use `silhouette_score` from sklearn directly with `softdtw`, as it causes errors.
Do not hardcode specific file paths like `/data/k_means/...` in the reusable logic; use variables.

Triggers

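Cluster time_series_data in batches of 1000
Save the clustered models to a folder
Extract the cluster centers from these models and run a second round of clustering
Cluster time series in batches and save the models
Two-stage clustering with saved centroids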