AutoSkill Trim Noisy Data to Linear Part using Manual Linear Regression
Identifies and trims the linear portion of a noisy 1D dataset by iteratively fitting a manual linear regression model (without sklearn) and detecting deviations in the rolling standard deviation of residuals.
install
source · Clone the upstream repo
git clone https://github.com/ECNU-ICALK/AutoSkill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt4_8/trim-noisy-data-to-linear-part-using-manual-linear-regression" ~/.claude/skills/ecnu-icalk-autoskill-trim-noisy-data-to-linear-part-using-manual-linear-regressi && rm -rf "$T"
manifest:
SkillBank/ConvSkill/english_gpt4_8/trim-noisy-data-to-linear-part-using-manual-linear-regression/SKILL.md
source content
Prompt
Role & Objective
You are a Python data processing assistant. Your task is to trim a noisy 1D dataset to retain only the linear portion, typically located at the beginning of the series before a sharp rise or non-linear trend.
Operational Rules & Constraints
- No Sklearn: Do not use the `sklearn` library. Implement linear regression manually using `numpy`.
- Manual Linear Regression: Use the correct mathematical formulas for slope ($B_1$) and intercept ($B_0$):
  - $B_1 = \frac{N \sum(x \cdot y) - \sum(x) \sum(y)}{N \sum(x^2) - (\sum(x))^2}$
  - $B_0 = \bar{y} - B_1 \bar{x}$

  where $N$ is the number of points, $x$ are the indices, and $y$ are the data values.
- Iterative Fitting: Iterate through the data from the start. For each index `i` (starting from 2), fit a linear model to the subset `data[:i]`.
- Residual Analysis: Calculate the residuals (actual - predicted) and the standard deviation of these residuals for each subset.
- Smoothing: Apply a rolling average (convolution) to the list of standard deviations to smooth out noise and reduce sensitivity.
- Cut-off Detection: Identify the cut-off point where the smoothed standard deviation exceeds a threshold (e.g., `median * 1.5`).
- Output: Return the trimmed data and the cut-off index.
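
The rules above can be sketched as follows. This is a minimal illustration under stated assumptions, not the skill's reference implementation: the function names `fit_line` and `trim_to_linear`, the parameters `window` and `factor`, the use of the population standard deviation, taking the median over the smoothed values, and the way the first flagged window is mapped back to a data index are all choices made for this example. The demo series is synthetic.

```python
import numpy as np


def fit_line(x, y):
    """Closed-form least-squares slope (B1) and intercept (B0)."""
    n = len(x)
    denom = n * np.sum(x * x) - np.sum(x) ** 2
    b1 = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / denom
    b0 = np.mean(y) - b1 * np.mean(x)
    return b1, b0


def trim_to_linear(data, window=5, factor=1.5):
    """Return (trimmed_data, cutoff_index) for a 1D series.

    window and factor are hypothetical parameter names; the rules only
    require that both values be adjustable.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    data = np.asarray(data, dtype=float)

    # Residual standard deviation of a line fitted to data[:i], i = 2..len(data).
    stds = []
    for i in range(2, len(data) + 1):
        x = np.arange(i, dtype=float)
        b1, b0 = fit_line(x, data[:i])
        residuals = data[:i] - (b0 + b1 * x)
        stds.append(np.std(residuals))
    stds = np.asarray(stds)

    # Too few points to smooth: keep everything.
    if len(stds) < window:
        return data, len(data)

    # Rolling average via convolution with a box kernel.
    smoothed = np.convolve(stds, np.ones(window) / window, mode="valid")

    # Assumption: the threshold is factor times the median of the smoothed curve.
    threshold = factor * np.median(smoothed)
    above = np.nonzero(smoothed > threshold)[0]
    if above.size == 0:
        return data, len(data)

    # smoothed[k] averages the fits on data[:k + 2] .. data[:k + window + 1].
    # Assumption: report the smallest of those subset lengths (conservative).
    cutoff = int(above[0]) + 2
    return data[:cutoff], cutoff


# Synthetic demo: a noisy line that turns into a sharp rise at index 60.
x = np.arange(100, dtype=float)
signal = 0.5 * x + 1.0 + 0.05 * np.sin(1.7 * x)
signal[60:] += 0.5 * (x[60:] - 59.0) ** 2

trimmed, cutoff = trim_to_linear(signal, window=5, factor=1.5)
print(cutoff, len(trimmed))
```

A larger `window` smooths the standard-deviation curve more, and a larger `factor` raises the threshold, so both make the detector less sensitive. Refitting every prefix costs O(n²) time overall, which is usually acceptable for short series.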
Anti-Patterns
- Do not use simple derivative thresholds or second derivatives alone.
- Do not use `sklearn.linear_model`.
- Do not hardcode the window size or threshold; make them adjustable parameters.
Triggers
- trim linear part of data
- cut data before sharp rise
- manual linear regression trimming
- remove non-linear tail from noisy data
- python data cleaning linear regression