AutoSkill adult_census_neural_network_with_prediction
Builds a binary classification neural network for the Adult Census dataset using robust, dynamic preprocessing. Includes evaluation plots (Confusion Matrix, ROC, Loss/Accuracy) and a user input prediction feature requiring a specific comma-separated format.
git clone https://github.com/ECNU-ICALK/AutoSkill
T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt4_8/adult_census_neural_network_with_prediction" ~/.claude/skills/ecnu-icalk-autoskill-adult-census-neural-network-with-prediction && rm -rf "$T"
SkillBank/ConvSkill/english_gpt4_8/adult_census_neural_network_with_prediction/SKILL.md
Prompt
Role & Objective
You are a Machine Learning Engineer specializing in Python and Keras/TensorFlow. Your task is to create a complete, executable Python script that imports the Adult Census dataset, builds a binary classification Neural Network model using robust dynamic preprocessing, evaluates it with specific visualizations, and implements a user input prediction feature.
Communication & Style Preferences
- Provide clean, runnable Python code.
- Use standard libraries: pandas, numpy, sklearn, matplotlib, seaborn, tensorflow/keras.
- Include comments explaining key steps.
- Ensure the code handles data loading, preprocessing, training, evaluation, and prediction sequentially.
Operational Rules & Constraints
- Data Loading & Preprocessing:
  - Load the dataset from the Adult Census URL.
  - Handle missing values (e.g., ' ?').
  - Map the target 'income' column to binary values (e.g., '>50K' to 1, '<=50K' to 0).
  - Separate features (X) and target (y).
  - Dynamic Column Handling: Identify categorical and numerical columns automatically based on data types rather than hardcoding lists.
  - Use `ColumnTransformer` to apply `SimpleImputer` (mean for numerical, most_frequent for categorical) and `StandardScaler` to numerical columns.
  - Use `OneHotEncoder(handle_unknown='ignore')` for categorical columns.
  - Split data into training, validation, and test sets.
  - Convert sparse matrices to dense arrays if necessary for the model input.
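The preprocessing rules above can be sketched roughly as follows. This is a minimal illustration, not the skill's reference implementation: the helper names `clean_frame`, `build_preprocessor`, and `to_dense` are hypothetical, and a four-row toy frame stands in for the real dataset so the example runs without a download. Stripping whitespace before replacing `?` is an assumption about how the ' ?' markers are handled.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def clean_frame(df):
    """Strip padding from text columns and turn '?' markers into NaN."""
    df = df.copy()
    text_cols = df.select_dtypes(exclude="number").columns
    df[text_cols] = df[text_cols].apply(lambda s: s.str.strip())
    return df.replace("?", np.nan)


def build_preprocessor(X):
    """Pick column groups from dtypes instead of hardcoded name lists."""
    numeric_cols = X.select_dtypes(include="number").columns.tolist()
    categorical_cols = X.select_dtypes(exclude="number").columns.tolist()
    numeric_pipe = Pipeline([
        ("impute", SimpleImputer(strategy="mean")),
        ("scale", StandardScaler()),
    ])
    categorical_pipe = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ])
    return ColumnTransformer([
        ("num", numeric_pipe, numeric_cols),
        ("cat", categorical_pipe, categorical_cols),
    ])


def to_dense(matrix):
    """Return a dense array whether the transformer produced sparse or dense output."""
    return matrix.toarray() if hasattr(matrix, "toarray") else np.asarray(matrix)


# Toy stand-in for the Adult Census data (assumed column subset).
raw = pd.DataFrame({
    "age": [39, 50, 38, 53],
    "workclass": [" State-gov", " ?", " Private", " Private"],
    "hours-per-week": [40, 13, 40, 40],
    "income": [" <=50K", " >50K", " <=50K", " >50K"],
})
df = clean_frame(raw)
y = df["income"].map({">50K": 1, "<=50K": 0})
X = df.drop(columns="income")

preprocessor = build_preprocessor(X)
X_dense = to_dense(preprocessor.fit_transform(X))
```

In a full script the preprocessor would normally be fitted on the training split only and then applied to the validation and test splits, so that imputation and scaling statistics do not leak from held-out data.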
- Model Architecture:
  - Build a Keras `Sequential` model.
  - Architecture: Input Layer -> Dense(64, ReLU) -> Dense(32, ReLU) -> Dense(1, Sigmoid).
  - Compile the model with 'adam' optimizer and 'binary_crossentropy' loss.
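One way to write the architecture described above is sketched here. The function name `build_model` and the `n_features` parameter are hypothetical, and `metrics=["accuracy"]` is an assumption added so that accuracy can be reported later; the layer sizes, optimizer, and loss follow the rules above.

```python
import numpy as np
from tensorflow import keras


def build_model(n_features):
    """Input -> Dense(64, ReLU) -> Dense(32, ReLU) -> Dense(1, Sigmoid)."""
    model = keras.Sequential([
        keras.Input(shape=(n_features,)),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(32, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(
        optimizer="adam",
        loss="binary_crossentropy",
        metrics=["accuracy"],  # assumption: tracked so accuracy can be reported
    )
    return model


# n_features would normally be the column count of the preprocessed matrix.
model = build_model(n_features=10)
probabilities = model.predict(np.zeros((3, 10), dtype="float32"), verbose=0)
```

The sigmoid output is a probability of the positive class, so a threshold (commonly 0.5) is still needed to obtain a class label.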
- Training & Evaluation:
  - Train the model using `model.fit` with validation data.
  - Use `verbose=1` to display epoch progress.
  - Evaluate the model on train and test sets to report accuracy.
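The training and evaluation steps might look like the sketch below. The helper `train_and_evaluate`, the synthetic data, and the epoch and batch-size values are all illustrative assumptions; the model is assumed to have been compiled with an accuracy metric. Explicit `validation_data` is passed rather than a split fraction, and the inputs are plain dense arrays.

```python
import numpy as np
from tensorflow import keras


def train_and_evaluate(model, train, val, test, epochs=20, batch_size=32):
    """Fit with explicit validation data, then report train and test accuracy."""
    X_train, y_train = train
    X_test, y_test = test
    history = model.fit(
        X_train, y_train,
        validation_data=val,  # (X_val, y_val) tuple
        epochs=epochs,
        batch_size=batch_size,
        verbose=1,  # show per-epoch progress
    )
    train_acc = model.evaluate(X_train, y_train, verbose=0, return_dict=True)["accuracy"]
    test_acc = model.evaluate(X_test, y_test, verbose=0, return_dict=True)["accuracy"]
    print(f"Train accuracy: {train_acc:.3f} | Test accuracy: {test_acc:.3f}")
    return history, train_acc, test_acc


# Synthetic stand-in for the preprocessed census features (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8)).astype("float32")
y = (X[:, 0] + X[:, 1] > 0).astype("float32")

model = keras.Sequential([
    keras.Input(shape=(8,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

history, train_acc, test_acc = train_and_evaluate(
    model,
    train=(X[:200], y[:200]),
    val=(X[200:250], y[200:250]),
    test=(X[250:], y[250:]),
    epochs=3,
)
```

`history.history` holds the per-epoch `loss`, `accuracy`, `val_loss`, and `val_accuracy` lists, which is what the loss and accuracy plots are drawn from.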
- Visualization:
  - Generate a Confusion Matrix using a Seaborn heatmap.
  - Generate an ROC Curve with AUC score displayed.
  - Generate plots for Training & Validation Loss and Accuracy over epochs.
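The three required plots could be produced along these lines. The function `plot_evaluation`, its 2x2 layout, and the 0.5 threshold are assumptions made for illustration; `history` is expected to be the dictionary stored in Keras's `History.history`. The `Agg` backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe; remove this line for interactive windows
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from sklearn.metrics import auc, confusion_matrix, roc_curve


def plot_evaluation(y_true, y_prob, history, threshold=0.5):
    """Draw confusion matrix, ROC curve, and loss/accuracy curves on one figure."""
    y_true = np.asarray(y_true).ravel()
    y_prob = np.asarray(y_prob).ravel()
    y_pred = (y_prob >= threshold).astype(int)

    cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
    fpr, tpr, _ = roc_curve(y_true, y_prob)
    roc_auc = auc(fpr, tpr)

    fig, axes = plt.subplots(2, 2, figsize=(12, 10))

    # Confusion matrix as a Seaborn heatmap.
    labels = ["<=50K", ">50K"]
    sns.heatmap(cm, annot=True, fmt="d", cmap="Blues",
                xticklabels=labels, yticklabels=labels, ax=axes[0, 0])
    axes[0, 0].set(title="Confusion Matrix", xlabel="Predicted", ylabel="Actual")

    # ROC curve with the AUC shown in the legend.
    axes[0, 1].plot(fpr, tpr, label=f"AUC = {roc_auc:.3f}")
    axes[0, 1].plot([0, 1], [0, 1], linestyle="--", color="gray")
    axes[0, 1].set(title="ROC Curve", xlabel="False Positive Rate",
                   ylabel="True Positive Rate")
    axes[0, 1].legend(loc="lower right")

    # Training vs. validation curves over epochs.
    for ax, key in [(axes[1, 0], "loss"), (axes[1, 1], "accuracy")]:
        ax.plot(history[key], label=f"Training {key}")
        ax.plot(history[f"val_{key}"], label=f"Validation {key}")
        ax.set(title=f"Training & Validation {key.capitalize()}", xlabel="Epoch")
        ax.legend()

    fig.tight_layout()
    return fig, cm, roc_auc


# Tiny hand-made inputs, for illustration only.
demo_history = {"loss": [0.6, 0.5], "val_loss": [0.65, 0.55],
                "accuracy": [0.7, 0.8], "val_accuracy": [0.68, 0.75]}
fig, cm, roc_auc = plot_evaluation([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8], demo_history)
```

In an interactive script, calling `plt.show()` after the function (with the backend line removed) displays the figure; `fig.savefig(...)` works in either mode.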
- User Input Prediction:
  - Create a function `predict_user_input` that accepts a raw string input from the user.
  - Input Format Constraint: The input must be a comma-separated string strictly following this column order: `Age, Workclass, Fnlwgt, Education, Education-num, Marital-status, Occupation, Relationship, Race, Sex, Capital-gain, Capital-loss, Hours-per-week, Native-country`.
  - Parse the string by splitting on commas and stripping whitespace.
  - Convert the parsed list into a DataFrame with the correct column names (excluding 'income').
  - Use the fitted preprocessor to transform the input.
  - Predict the class and return the result as a string: ">50K" or "<=50K".
  - Print a sample input format for the user before prompting for input.
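A possible shape for the prediction feature is sketched below. `predict_user_input` is the name the rules require; the helper `parse_user_input`, the lowercase `FEATURE_COLUMNS` names, the optional `dtypes` argument, and the 0.5 threshold are assumptions. Casting with the training frame's dtypes (for example `X.dtypes.to_dict()`) is one way to restore numeric columns after string parsing without hardcoding which columns are numeric.

```python
import pandas as pd

# Assumed DataFrame column names, in the required input order.
FEATURE_COLUMNS = [
    "age", "workclass", "fnlwgt", "education", "education-num",
    "marital-status", "occupation", "relationship", "race", "sex",
    "capital-gain", "capital-loss", "hours-per-week", "native-country",
]

SAMPLE_INPUT = (
    "39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, "
    "Not-in-family, White, Male, 2174, 0, 40, United-States"
)


def parse_user_input(raw, columns, dtypes=None):
    """Split on commas, strip whitespace, and build a one-row DataFrame."""
    values = [part.strip() for part in raw.split(",")]
    if len(values) != len(columns):
        raise ValueError(f"Expected {len(columns)} values, got {len(values)}")
    row = pd.DataFrame([values], columns=columns)
    if dtypes is not None:
        # e.g. dtypes = X.dtypes.to_dict() taken from the training features
        row = row.astype(dtypes)
    return row


def predict_user_input(raw, preprocessor, model, columns=FEATURE_COLUMNS,
                       dtypes=None, threshold=0.5):
    """Return '>50K' or '<=50K' for one comma-separated record."""
    row = parse_user_input(raw, columns, dtypes)
    features = preprocessor.transform(row)
    if hasattr(features, "toarray"):  # densify sparse output for the model
        features = features.toarray()
    probability = float(model.predict(features)[0][0])
    return ">50K" if probability >= threshold else "<=50K"


# Show the expected format before prompting, as the rules require.
print("Enter values in this order:", ", ".join(FEATURE_COLUMNS))
print("Example:", SAMPLE_INPUT)
```

With a fitted preprocessor and trained model in scope, a call such as `predict_user_input(input("> "), preprocessor, model, dtypes=X.dtypes.to_dict())` would complete the interactive step.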
Anti-Patterns
- Do not hardcode specific column names (like 'age', 'workclass') into the core preprocessing logic; rely on dynamic column identification.
- Do not change the order of columns in the user input format.
- Do not use regression for the final output unless explicitly requested; default to binary classification.
- Do not omit the specific plots requested (Confusion Matrix, ROC, Loss/Accuracy).
- Do not use `validation_split` if the input is a sparse matrix; convert to dense first or use explicit validation data.
Triggers
- build neural network for adult census
- predict income from census data
- adult census classification with plots
- user input prediction for income
- binary classification neural network