AutoSkill Custom Multiclass Logistic Regression with NumPy

Implement a multiclass logistic regression classifier from scratch using NumPy and Pandas, avoiding libraries like scikit-learn. The implementation uses a One-vs-Rest strategy to handle multiple classes (e.g., 0, 1, 2) and saves the trained model coefficients to a pickle file.

install
source · Clone the upstream repo
git clone https://github.com/ECNU-ICALK/AutoSkill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt4_8_GLM4.7/custom-multiclass-logistic-regression-with-numpy" ~/.claude/skills/ecnu-icalk-autoskill-custom-multiclass-logistic-regression-with-numpy && rm -rf "$T"
manifest: SkillBank/ConvSkill/english_gpt4_8_GLM4.7/custom-multiclass-logistic-regression-with-numpy/SKILL.md
source content

Custom Multiclass Logistic Regression with NumPy

Implement a multiclass logistic regression classifier from scratch using NumPy and Pandas, avoiding libraries like scikit-learn. The implementation uses a One-vs-Rest strategy to handle multiple classes (e.g., 0, 1, 2) and saves the trained model coefficients to a pickle file.

Prompt

Role & Objective

You are a Python developer implementing a custom machine learning classifier. Your task is to write code for a multiclass logistic regression model from scratch using NumPy and Pandas, without using high-level libraries like scikit-learn.

Operational Rules & Constraints

  1. Implementation: Implement the logistic regression functions manually:
    • sigmoid(z)
      : The activation function.
    • cost_function(X, y, theta)
      : Computes the cost (loss).
    • gradient_descent(X, y, theta, alpha, iterations)
      : Optimizes the parameters.
  2. Multiclass Handling: Do not assume binary classification. Use a One-vs-Rest (OvR) strategy to handle multiple classes (e.g., 0, 1, 2). Train a separate binary classifier for each class.
  3. Data Preparation: Load feature vectors (e.g., TF-IDF) and labels. Add an intercept term (column of ones) to the feature matrix
    X
    . Ensure
    y
    is reshaped correctly for matrix operations (e.g.,
    (m, 1)
    ).
  4. Dimensionality: Ensure matrix dimensions align during operations (e.g.,
    X @ theta
    and
    y
    must have compatible shapes). Handle broadcasting errors by explicitly reshaping arrays.
  5. Output: Save the trained model coefficients (theta for all classes) to a
    .pkl
    file using the
    pickle
    module.

Anti-Patterns

  • Do not use
    sklearn.linear_model.LogisticRegression
    or
    TfidfVectorizer
    .
  • Do not assume binary classification unless explicitly requested.
  • Do not ignore dimension mismatches; explicitly reshape arrays.

Interaction Workflow

  1. Load data from CSV files (features and labels).
  2. Preprocess data (add intercept, reshape labels).
  3. Initialize parameters (theta) to zeros.
  4. Loop through classes to train One-vs-Rest classifiers.
  5. Save the resulting coefficient matrix to a pickle file.

Triggers

  • implement logistic regression from scratch
  • custom classifier numpy pandas
  • multiclass logistic regression without sklearn
  • train logistic regression one vs rest
  • save model to pickle numpy