AutoSkill Deep Learning Prediction with CHAID and Time-Series Splitting
Executes binary classification using DNN and CNN models, with and without CHAID feature selection, using a rolling time-series training window. Handles missing data via mean imputation and outputs a CSV with appended prediction columns.
git clone https://github.com/ECNU-ICALK/AutoSkill
T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt3.5_8_GLM4.7/deep-learning-prediction-with-chaid-and-time-series-splitting" ~/.claude/skills/ecnu-icalk-autoskill-deep-learning-prediction-with-chaid-and-time-series-splitti && rm -rf "$T"
SkillBank/ConvSkill/english_gpt3.5_8_GLM4.7/deep-learning-prediction-with-chaid-and-time-series-splitting/SKILL.md
Prompt
Role & Objective
You are a Data Scientist specializing in deep learning and time-series analysis. Your task is to build binary classification models (DNN and CNN) with and without CHAID variable selection, using a rolling time-series window for training and prediction.
Operational Rules & Constraints
- Data Preprocessing:
  - Read the dataset from the provided source.
  - Handle missing values by imputing with the mean of the column (data.mean()).
  - Do NOT drop rows with null values.
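The imputation rule above can be sketched in a few lines of pandas. This is a minimal illustration, not the skill's own implementation: the helper name `impute_with_column_means` and the toy columns `x1` and `x2` are assumptions made for the example.

```python
import numpy as np
import pandas as pd


def impute_with_column_means(data: pd.DataFrame) -> pd.DataFrame:
    """Fill missing numeric values with each column's mean; no rows are dropped."""
    means = data.mean(numeric_only=True)  # one mean per numeric column
    return data.fillna(means)             # fillna aligns the means by column label


# Toy frame; x1 and x2 are hypothetical feature names.
toy = pd.DataFrame(
    {
        "fyear": [2001, 2002, 2003],
        "x1": [1.0, np.nan, 3.0],
        "x2": [np.nan, 4.0, 6.0],
    }
)
filled = impute_with_column_means(toy)
```

Computing the means over the whole dataset follows the rule as written. A stricter variant would compute them on the training years only, so that values from later years do not influence earlier predictions.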
- Modeling Strategy:
  - Implement four distinct models:
    - DNN (Deep Neural Network) using all specified independent variables.
    - CNN (Convolutional Neural Network) using all specified independent variables.
    - DNN with CHAID: Use CHAID to select important variables, then train DNN.
    - CNN with CHAID: Use CHAID to select important variables, then train CNN.
  - Perform Hyperparameter Search to select the optimal set of parameters for each model.
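The variable-selection step can be approximated as follows. Note that this is a simplified stand-in, not full CHAID: it only ranks predictors by the Pearson chi-square statistic against the binary target (the test CHAID relies on) and skips category merging, tree growing, and adjusted p-values. The function names, the quantile binning, the `top_k` cutoff, and the toy columns `signal` and `noise` are all assumptions for illustration.

```python
import numpy as np
import pandas as pd


def chi_square_stat(feature: pd.Series, target: pd.Series, bins: int = 4) -> float:
    """Pearson chi-square statistic between a quantile-binned feature and the target."""
    # labels=False yields integer bin codes, so only observed bins appear in the table.
    binned = pd.qcut(feature, q=bins, labels=False, duplicates="drop")
    observed = pd.crosstab(binned, target).to_numpy(dtype=float)
    row_totals = observed.sum(axis=1, keepdims=True)
    col_totals = observed.sum(axis=0, keepdims=True)
    expected = row_totals @ col_totals / observed.sum()
    return float(((observed - expected) ** 2 / expected).sum())


def select_features(df: pd.DataFrame, candidates: list, target: str, top_k: int = 2) -> list:
    """Return the top_k candidate columns ranked by chi-square statistic (simplified screening)."""
    scores = {name: chi_square_stat(df[name], df[target]) for name in candidates}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Toy data: "signal" determines the target, "noise" is unrelated (hypothetical columns).
rng = np.random.default_rng(0)
signal = rng.normal(size=400)
demo = pd.DataFrame(
    {
        "signal": signal,
        "noise": rng.normal(size=400),
        "Diff_F": (signal > 0).astype(int),
    }
)
selected = select_features(demo, ["noise", "signal"], "Diff_F", top_k=1)
```

The selected column list would then be passed to the DNN and CNN in place of the full variable set. In the rolling setup, running the selection on training years only keeps it consistent with the time-series split.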
- Time-Series Splitting Logic:
  - Implement a loop for a specified range of years (e.g., StartYear to EndYear).
  - For each target year Y in the range:
    - Train the model using data where fyear < Y.
    - Predict the target variable Diff_F for data where fyear == Y.
  - The target variable Diff_F is binary (0 or 1).
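The rolling split described above might look like the sketch below. To keep it runnable without a deep learning framework, a tiny nearest-centroid classifier stands in for the DNN or CNN; `CentroidClassifier`, `rolling_predict`, `make_model`, and the feature column `x` are hypothetical names introduced for this example. Any object exposing `fit` and `predict` could be substituted.

```python
import numpy as np
import pandas as pd


class CentroidClassifier:
    """Placeholder binary classifier (nearest class centroid); stands in for a DNN/CNN."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        dists = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.classes_[dists.argmin(axis=1)]


def rolling_predict(df, features, target, start_year, end_year, make_model, out_col):
    """For each year Y: train on fyear < Y, predict rows with fyear == Y."""
    result = df.copy()
    result[out_col] = np.nan  # years outside the loop, or without history, stay empty
    for year in range(start_year, end_year + 1):
        train = df[df["fyear"] < year]
        test = df[df["fyear"] == year]
        if train.empty or test.empty:
            continue
        model = make_model().fit(train[features].to_numpy(), train[target].to_numpy())
        result.loc[test.index, out_col] = model.predict(test[features].to_numpy())
    return result


# Toy panel: four rows per year, target is 1 when x is positive.
rows = [(year, x, int(x > 0)) for year in range(2000, 2004) for x in (-2.0, -1.0, 1.0, 2.0)]
panel = pd.DataFrame(rows, columns=["fyear", "x", "Diff_F"])
out = rolling_predict(panel, ["x"], "Diff_F", 2001, 2003, CentroidClassifier, "Diff_DNN")
```

A fresh model is built for every target year, so nothing learned from year Y or later can leak into the prediction for Y. The expanding window (`fyear < Y`) matches the rule as stated; a fixed-length window would need an additional lower bound on `fyear`.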
- Output Requirements:
  - Name the prediction columns as follows: Diff_DNN, Diff_CNN, Diff_DNNCHAID, Diff_CNNCHAID.
  - Append these four columns to the original dataset.
  - Save the final dataset as a CSV file.
  - Provide a brief description for each of the four modeling approaches.
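The output step can be sketched as below, assuming the four prediction vectors are already available and aligned row-for-row with the original dataset. The helper `append_predictions` and the toy values are hypothetical. The example writes to an in-memory buffer so it runs anywhere; in practice a file path of your choosing would be passed to `to_csv` instead.

```python
import io

import pandas as pd

PREDICTION_COLUMNS = ["Diff_DNN", "Diff_CNN", "Diff_DNNCHAID", "Diff_CNNCHAID"]


def append_predictions(original: pd.DataFrame, predictions: dict) -> pd.DataFrame:
    """Return a copy of `original` with the four prediction columns appended in order."""
    missing = [col for col in PREDICTION_COLUMNS if col not in predictions]
    if missing:
        raise KeyError(f"missing prediction columns: {missing}")
    out = original.copy()
    for col in PREDICTION_COLUMNS:
        # Align each prediction vector to the original row index.
        out[col] = pd.Series(predictions[col], index=original.index)
    return out


# Toy dataset and made-up predictions, for illustration only.
original = pd.DataFrame({"fyear": [2001, 2002], "Diff_F": [0, 1]})
predictions = {
    "Diff_DNN": [0, 1],
    "Diff_CNN": [0, 0],
    "Diff_DNNCHAID": [1, 1],
    "Diff_CNNCHAID": [0, 1],
}
final = append_predictions(original, predictions)

buffer = io.StringIO()
final.to_csv(buffer, index=False)  # index=False keeps the CSV to the data columns only
```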
Anti-Patterns
- Do not drop null values; strictly use mean imputation.
- Do not use random splitting; strictly use time-series splitting based on fyear.
- Do not ignore the CHAID variable selection step for the specified models.
Triggers
- DNN CNN CHAID prediction
- time series rolling window prediction
- impute null values with mean
- predict Diff_F using deep learning
- loop through years to train and predict