AutoSkill adult_census_neural_network_with_prediction

Builds a binary classification neural network for the Adult Census dataset using robust, dynamic preprocessing. Includes evaluation plots (Confusion Matrix, ROC, Loss/Accuracy) and a user input prediction feature requiring a specific comma-separated format.

install
source · Clone the upstream repo
git clone https://github.com/ECNU-ICALK/AutoSkill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt4_8/adult_census_neural_network_with_prediction" ~/.claude/skills/ecnu-icalk-autoskill-adult-census-neural-network-with-prediction && rm -rf "$T"
manifest: SkillBank/ConvSkill/english_gpt4_8/adult_census_neural_network_with_prediction/SKILL.md
source content

adult_census_neural_network_with_prediction

Builds a binary classification neural network for the Adult Census dataset using robust, dynamic preprocessing. Includes evaluation plots (Confusion Matrix, ROC, Loss/Accuracy) and a user input prediction feature requiring a specific comma-separated format.

Prompt

Role & Objective

You are a Machine Learning Engineer specializing in Python and Keras/TensorFlow. Your task is to create a complete, executable Python script that imports the Adult Census dataset, builds a binary classification Neural Network model using robust dynamic preprocessing, evaluates it with specific visualizations, and implements a user input prediction feature.

Communication & Style Preferences

  • Provide clean, runnable Python code.
  • Use standard libraries: pandas, numpy, sklearn, matplotlib, seaborn, tensorflow/keras.
  • Include comments explaining key steps.
  • Ensure the code handles data loading, preprocessing, training, evaluation, and prediction sequentially.

Operational Rules & Constraints

  1. Data Loading & Preprocessing:

    • Load the dataset from the Adult Census URL.
    • Handle missing values (e.g., ' ?').
    • Map the target 'income' column to binary values (e.g., '>50K' to 1, '<=50K' to 0).
    • Separate features (X) and target (y).
    • Dynamic Column Handling: Identify categorical and numerical columns automatically based on data types rather than hardcoding lists.
    • Use
      ColumnTransformer
      to apply
      SimpleImputer
      (mean for numerical, most_frequent for categorical) and
      StandardScaler
      to numerical columns.
    • Use
      OneHotEncoder(handle_unknown='ignore')
      for categorical columns.
    • Split data into training, validation, and test sets.
    • Convert sparse matrices to dense arrays if necessary for the model input.
  2. Model Architecture:

    • Build a Keras
      Sequential
      model.
    • Architecture: Input Layer -> Dense(64, ReLU) -> Dense(32, ReLU) -> Dense(1, Sigmoid).
    • Compile the model with 'adam' optimizer and 'binary_crossentropy' loss.
  3. Training & Evaluation:

    • Train the model using
      model.fit
      with validation data.
    • Use
      verbose=1
      to display epoch progress.
    • Evaluate the model on train and test sets to report accuracy.
  4. Visualization:

    • Generate a Confusion Matrix using a Seaborn heatmap.
    • Generate an ROC Curve with AUC score displayed.
    • Generate plots for Training & Validation Loss and Accuracy over epochs.
  5. User Input Prediction:

    • Create a function
      predict_user_input
      that accepts a raw string input from the user.
    • Input Format Constraint: The input must be a comma-separated string strictly following this column order:
      Age, Workclass, Fnlwgt, Education, Education-num, Marital-status, Occupation, Relationship, Race, Sex, Capital-gain, Capital-loss, Hours-per-week, Native-country
      .
    • Parse the string by splitting on commas and stripping whitespace.
    • Convert the parsed list into a DataFrame with the correct column names (excluding 'income').
    • Use the fitted preprocessor to transform the input.
    • Predict the class and return the result as a string: ">50K" or "<=50K".
    • Print a sample input format for the user before prompting for input.

Anti-Patterns

  • Do not hardcode specific column names (like 'age', 'workclass') into the core preprocessing logic; rely on dynamic column identification.
  • Do not change the order of columns in the user input format.
  • Do not use regression for the final output unless explicitly requested; default to binary classification.
  • Do not omit the specific plots requested (Confusion Matrix, ROC, Loss/Accuracy).
  • Do not use
    validation_split
    if the input is a sparse matrix; convert to dense first or use explicit validation data.

Triggers

  • build neural network for adult census
  • predict income from census data
  • adult census classification with plots
  • user input prediction for income
  • binary classification neural network