AutoSkill Trim Noisy Data to Linear Part using Manual Linear Regression

Trims a noisy 1D dataset down to its linear portion by iteratively fitting a manual linear regression model (without sklearn) and detecting where the rolling average of the residual standard deviation starts to rise.

install

source · Clone the upstream repo

git clone https://github.com/ECNU-ICALK/AutoSkill

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt4_8/trim-noisy-data-to-linear-part-using-manual-linear-regression" ~/.claude/skills/ecnu-icalk-autoskill-trim-noisy-data-to-linear-part-using-manual-linear-regressi && rm -rf "$T"

manifest: SkillBank/ConvSkill/english_gpt4_8/trim-noisy-data-to-linear-part-using-manual-linear-regression/SKILL.md

source content

Trim Noisy Data to Linear Part using Manual Linear Regression

Prompt

Role & Objective

You are a Python data processing assistant. Your task is to trim a noisy 1D dataset to retain only the linear portion, typically located at the beginning of the series before a sharp rise or non-linear trend.

Operational Rules & Constraints

No Sklearn: Do not use the `sklearn` library. Implement linear regression manually using `numpy`.
Manual Linear Regression: Use the correct mathematical formulas for slope ($B_1$) and intercept ($B_0$):
- $B_1 = \frac{N \sum(x \cdot y) - \sum(x) \sum(y)}{N \sum(x^2) - (\sum(x))^2}$
- $B_0 = \bar{y} - B_1 \bar{x}$
Here $N$ is the number of points, $x$ are the indices, and $y$ are the data values.
Iterative Fitting: Iterate through the data from the start. For each index `i` (starting from 2), fit a linear model to the subset `data[:i]`.
Residual Analysis: Calculate the residuals (actual - predicted) and the standard deviation of these residuals for each subset.
Smoothing: Apply a rolling average (convolution) to the list of standard deviations to smooth out noise and reduce sensitivity.
Cut-off Detection: Identify the cut-off point where the smoothed standard deviation exceeds a threshold (e.g., `median * 1.5`).
Output: Return the trimmed data and the cut-off index.
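The rules above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the function names `fit_line` and `trim_to_linear`, the parameter names `window` and `factor`, and their defaults are hypothetical. It also assumes that the median in the threshold is taken over the smoothed standard deviations, and that the cut-off is placed at the start of the first smoothing window that exceeds the threshold; the rules do not pin down either choice.

```python
import numpy as np


def fit_line(x, y):
    """Closed-form least-squares fit using the sum formulas; returns (B0, B1)."""
    n = len(x)
    denom = n * np.sum(x * x) - np.sum(x) ** 2
    b1 = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / denom
    b0 = np.mean(y) - b1 * np.mean(x)
    return b0, b1


def trim_to_linear(data, window=5, factor=1.5):
    """Return (trimmed_data, cutoff_index). Names and defaults are illustrative."""
    data = np.asarray(data, dtype=float)
    n = len(data)
    if n < 3:
        return data, n  # too short to judge linearity
    x = np.arange(n, dtype=float)

    # stds[j] is the residual standard deviation of the fit to data[:j + 2].
    stds = []
    for i in range(2, n + 1):
        b0, b1 = fit_line(x[:i], data[:i])
        residuals = data[:i] - (b0 + b1 * x[:i])  # actual - predicted
        stds.append(np.std(residuals))
    stds = np.array(stds)

    # Rolling average via convolution; clamp the window to the available length.
    window = max(1, min(int(window), len(stds)))
    smoothed = np.convolve(stds, np.ones(window) / window, mode="valid")

    # Assumption: threshold is factor * median of the smoothed values.
    threshold = factor * np.median(smoothed)
    above = np.nonzero(smoothed > threshold)[0]
    if above.size == 0:
        return data, n  # nothing exceeds the threshold: keep everything

    # smoothed[k] averages the fits to data[:k + 2] .. data[:k + window + 1];
    # cutting at the window start (k + 2) is a conservative assumption.
    cutoff = int(above[0]) + 2
    return data[:cutoff], cutoff


# Synthetic demo: a line with a small deterministic ripple, then a sharp rise.
idx = np.arange(100)
y = 0.5 * idx + 0.05 * (-1.0) ** idx
y[70:] += 0.5 * (idx[70:] - 69) ** 2
trimmed, cutoff = trim_to_linear(y, window=5, factor=1.5)
print(cutoff, len(trimmed))
```

Cutting at the start of the flagged window errs on the side of dropping a few linear points rather than keeping non-linear ones; a caller who prefers the opposite trade-off could shift the index by up to `window - 1`. Both `window` and `factor` stay adjustable, in line with the anti-pattern about hardcoding them.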

Anti-Patterns

Do not use simple derivative thresholds or second derivatives alone.
Do not use `sklearn.linear_model`.
Do not hardcode the window size or threshold; make them adjustable parameters.

Triggers

trim linear part of data
cut data before sharp rise
manual linear regression trimming
remove non-linear tail from noisy data
python data cleaning linear regression