AutoSkill Custom Multiclass Logistic Regression with NumPy
Implement a multiclass logistic regression classifier from scratch using NumPy and Pandas, avoiding libraries like scikit-learn. The implementation uses a One-vs-Rest strategy to handle multiple classes (e.g., 0, 1, 2) and saves the trained model coefficients to a pickle file.
git clone https://github.com/ECNU-ICALK/AutoSkill
T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt4_8_GLM4.7/custom-multiclass-logistic-regression-with-numpy" ~/.claude/skills/ecnu-icalk-autoskill-custom-multiclass-logistic-regression-with-numpy && rm -rf "$T"
SkillBank/ConvSkill/english_gpt4_8_GLM4.7/custom-multiclass-logistic-regression-with-numpy/SKILL.md
Prompt
Role & Objective
You are a Python developer implementing a custom machine learning classifier. Your task is to write code for a multiclass logistic regression model from scratch using NumPy and Pandas, without using high-level libraries like scikit-learn.
Operational Rules & Constraints
- Implementation: Implement the logistic regression functions manually:
  - `sigmoid(z)`: The activation function.
  - `cost_function(X, y, theta)`: Computes the cost (loss).
  - `gradient_descent(X, y, theta, alpha, iterations)`: Optimizes the parameters.
- Multiclass Handling: Do not assume binary classification. Use a One-vs-Rest (OvR) strategy to handle multiple classes (e.g., 0, 1, 2). Train a separate binary classifier for each class.
- Data Preparation: Load feature vectors (e.g., TF-IDF) and labels. Add an intercept term (column of ones) to the feature matrix `X`. Ensure `y` is reshaped correctly for matrix operations (e.g., `(m, 1)`).
- Dimensionality: Ensure matrix dimensions align during operations (e.g., `X @ theta` and `y` must have compatible shapes). Handle broadcasting errors by explicitly reshaping arrays.
- Output: Save the trained model coefficients (theta for all classes) to a `.pkl` file using the `pickle` module.
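The three functions named above could be sketched roughly as follows. This is a minimal illustration, not a reference implementation: it assumes `X` has shape `(m, n)` with the intercept column already added, `y` and `theta` are column vectors, the cost is mean binary cross-entropy, and `gradient_descent` returns both the fitted parameters and a cost history (the return value is not specified in the rules, so that part is an assumption).

```python
import numpy as np


def sigmoid(z):
    """Logistic activation; inputs are clipped to avoid overflow in exp."""
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))


def cost_function(X, y, theta):
    """Mean binary cross-entropy for labels in {0, 1}.

    X: (m, n), y: (m, 1), theta: (n, 1).
    """
    m = X.shape[0]
    # Clip predictions away from 0 and 1 so log() stays finite.
    h = np.clip(sigmoid(X @ theta), 1e-12, 1 - 1e-12)
    total = (y.T @ np.log(h) + (1 - y).T @ np.log(1 - h)).item()
    return -total / m


def gradient_descent(X, y, theta, alpha, iterations):
    """Batch gradient descent; returns (theta, cost_history) by assumption."""
    m = X.shape[0]
    theta = theta.astype(float)  # astype copies, so the caller's array is untouched
    history = []
    for _ in range(iterations):
        error = sigmoid(X @ theta) - y            # shape (m, 1)
        theta -= (alpha / m) * (X.T @ error)      # shape (n, 1)
        history.append(cost_function(X, y, theta))
    return theta, history


# Tiny binary example: first column is the intercept term.
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 3.0], [1.0, 4.0]])
y = np.array([0, 0, 1, 1]).reshape(-1, 1)
theta0 = np.zeros((X.shape[1], 1))
theta, costs = gradient_descent(X, y, theta0, alpha=0.1, iterations=200)
print(theta.shape, costs[0] > costs[-1])
```

With `theta` initialized to zeros every prediction is 0.5, so the starting cost equals `log(2)`; watching the history fall from that value is a quick sanity check that the update has the right sign.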
Anti-Patterns
- Do not use `sklearn.linear_model.LogisticRegression` or `TfidfVectorizer`.
- Do not assume binary classification unless explicitly requested.
- Do not ignore dimension mismatches; explicitly reshape arrays.
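The dimension-mismatch point is easy to underestimate because NumPy often broadcasts instead of raising an error. The short sketch below, using made-up values, shows how subtracting a flat label array from a column of predictions silently produces a square matrix, and how an explicit reshape avoids it.

```python
import numpy as np

m = 4
h = np.full((m, 1), 0.5)          # predictions as a column vector, shape (m, 1)
y_flat = np.array([0, 1, 1, 0])   # labels as read from a single CSV column, shape (m,)

# Silent broadcasting: (m, 1) - (m,) yields an (m, m) matrix, not an error.
bad_error = h - y_flat
print(bad_error.shape)            # (4, 4)

# Reshape the labels explicitly so the subtraction stays element-wise.
y_col = y_flat.reshape(-1, 1)
good_error = h - y_col
print(good_error.shape)           # (4, 1)
```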
Interaction Workflow
- Load data from CSV files (features and labels).
- Preprocess data (add intercept, reshape labels).
- Initialize parameters (theta) to zeros.
- Loop through classes to train One-vs-Rest classifiers.
- Save the resulting coefficient matrix to a pickle file.
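The workflow above might look like the following end-to-end sketch. Everything concrete here is an assumption made for illustration: the file names `features.csv`, `labels.csv`, and `model.pkl`, the `label` column name, the synthetic three-class data written to a temporary directory so the example runs on its own, the hyperparameters, and the choice to pickle a dictionary holding the class labels alongside the coefficient matrix.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))


def gradient_descent(X, y, theta, alpha, iterations):
    """Batch gradient descent for one binary classifier; returns theta."""
    m = X.shape[0]
    theta = theta.astype(float)
    for _ in range(iterations):
        theta -= (alpha / m) * (X.T @ (sigmoid(X @ theta) - y))
    return theta


def train_one_vs_rest(X, y, alpha=0.1, iterations=1000):
    """Train one binary classifier per class; column k of the result is theta for classes[k]."""
    classes = np.unique(y)
    all_theta = np.zeros((X.shape[1], classes.size))
    for k, cls in enumerate(classes):
        y_binary = (y == cls).astype(float)      # 1 for this class, 0 for the rest
        theta = np.zeros((X.shape[1], 1))        # initialize parameters to zeros
        all_theta[:, [k]] = gradient_descent(X, y_binary, theta, alpha, iterations)
    return classes, all_theta


# Hypothetical stand-in data: three well-separated clusters labeled 0, 1, 2.
workdir = Path(tempfile.mkdtemp())
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
labels = np.repeat([0, 1, 2], 30)
features = centers[labels] + rng.normal(scale=0.5, size=(90, 2))
pd.DataFrame(features).to_csv(workdir / "features.csv", index=False)
pd.DataFrame({"label": labels}).to_csv(workdir / "labels.csv", index=False)

# Load data from CSV files (features and labels).
X_raw = pd.read_csv(workdir / "features.csv").to_numpy(dtype=float)
y = pd.read_csv(workdir / "labels.csv")["label"].to_numpy().reshape(-1, 1)

# Preprocess: add the intercept column; y is already reshaped to (m, 1).
X = np.hstack([np.ones((X_raw.shape[0], 1)), X_raw])

# Train the One-vs-Rest classifiers.
classes, all_theta = train_one_vs_rest(X, y)
print(all_theta.shape)  # (3, 3): intercept + 2 features, one column per class

# Save the coefficient matrix to a pickle file.
with open(workdir / "model.pkl", "wb") as f:
    pickle.dump({"classes": classes, "theta": all_theta}, f)
```

Storing the class labels next to the coefficients is a design choice rather than a requirement: it keeps the column order of the matrix unambiguous when the file is loaded later.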
Triggers
- implement logistic regression from scratch
- custom classifier numpy pandas
- multiclass logistic regression without sklearn
- train logistic regression one vs rest
- save model to pickle numpy