Claude-skill-registry evaluate-model
Measure model performance on test datasets. Use when assessing accuracy, precision, recall, and other metrics.
install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/evaluate-model" ~/.claude/skills/majiayu000-claude-skill-registry-evaluate-model && rm -rf "$T"
manifest: skills/data/evaluate-model/SKILL.md
source content
Evaluate Model
Measure machine learning model performance using appropriate metrics for the task (classification, regression, etc.).
When to Use
- Comparing different model architectures
- Assessing performance on test/validation datasets
- Detecting overfitting or underfitting
- Reporting model accuracy for papers and documentation
Quick Reference
# Mojo model evaluation pattern
struct ModelEvaluator:
    fn evaluate_classification(
        mut self,
        predictions: ExTensor,
        ground_truth: ExTensor
    ) -> Tuple[Float32, Float32, Float32]:
        # Returns accuracy, precision, recall
        ...

    fn evaluate_regression(
        mut self,
        predictions: ExTensor,
        ground_truth: ExTensor
    ) -> Tuple[Float32, Float32]:
        # Returns MSE, MAE
        ...
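To make the return values of the pattern above concrete, here is a minimal sketch in plain Python rather than Mojo. It assumes binary integer labels for classification and plain sequences of numbers instead of `ExTensor`; the `positive` parameter and the zero-division fallbacks are illustrative choices, not part of the skill.

```python
from typing import Sequence, Tuple


def evaluate_classification(
    predictions: Sequence[int],
    ground_truth: Sequence[int],
    positive: int = 1,  # assumed label of the positive class
) -> Tuple[float, float, float]:
    """Return (accuracy, precision, recall) for binary labels."""
    if len(predictions) != len(ground_truth) or len(predictions) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(predictions, ground_truth))
    tp = sum(1 for p, t in pairs if p == positive and t == positive)
    fp = sum(1 for p, t in pairs if p == positive and t != positive)
    fn = sum(1 for p, t in pairs if p != positive and t == positive)
    correct = sum(1 for p, t in pairs if p == t)
    accuracy = correct / len(pairs)
    # Fall back to 0.0 when a denominator is zero (an assumed convention).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


def evaluate_regression(
    predictions: Sequence[float],
    ground_truth: Sequence[float],
) -> Tuple[float, float]:
    """Return (MSE, MAE)."""
    if len(predictions) != len(ground_truth) or len(predictions) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - t for p, t in zip(predictions, ground_truth)]
    mse = sum(e * e for e in errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    return mse, mae
```

The tuple order mirrors the comments in the Mojo signatures: accuracy, precision, recall for classification, and MSE, MAE for regression.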
Workflow
- Load test data: Prepare test/validation dataset
- Generate predictions: Run model inference on test set
- Select metrics: Choose metrics that fit the task (accuracy, precision, recall, F1, AUC, MSE, etc.)
- Calculate metrics: Compute the selected metrics from the predictions and ground truth
- Analyze results: Compare to baseline and identify strengths/weaknesses
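The workflow steps above can be sketched end to end as follows. This is an illustrative plain-Python outline, not the skill's Mojo implementation: `run_evaluation`, the callable `model`, and the constant-label baseline are all hypothetical names and choices introduced here.

```python
from typing import Callable, Dict, Sequence


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that match the labels."""
    return sum(1 for p, t in zip(predictions, labels) if p == t) / len(labels)


def f1_score(predictions: Sequence[int], labels: Sequence[int], positive: int = 1) -> float:
    """Binary F1 = 2*TP / (2*TP + FP + FN); 0.0 if the denominator is zero."""
    pairs = list(zip(predictions, labels))
    tp = sum(1 for p, t in pairs if p == positive and t == positive)
    fp = sum(1 for p, t in pairs if p == positive and t != positive)
    fn = sum(1 for p, t in pairs if p != positive and t == positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def run_evaluation(
    model: Callable[[float], int],  # hypothetical: maps one feature to a label
    features: Sequence[float],      # step 1: test data already loaded
    labels: Sequence[int],
    baseline_label: int,            # assumed baseline: always predict this label
) -> Dict[str, float]:
    # Step 2: generate predictions on the test set.
    predictions = [model(x) for x in features]
    baseline = [baseline_label] * len(labels)
    # Steps 3-4: compute the selected metrics for model and baseline.
    report = {
        "accuracy": accuracy(predictions, labels),
        "f1": f1_score(predictions, labels),
        "baseline_accuracy": accuracy(baseline, labels),
        "baseline_f1": f1_score(baseline, labels),
    }
    # Step 5: compare to the baseline.
    report["accuracy_gain"] = report["accuracy"] - report["baseline_accuracy"]
    return report


if __name__ == "__main__":
    # Toy data and a toy threshold model, purely for illustration.
    toy_features = [0.1, 0.4, 0.6, 0.9, 0.7, 0.2]
    toy_labels = [0, 0, 1, 1, 0, 0]
    print(run_evaluation(lambda x: 1 if x >= 0.5 else 0, toy_features, toy_labels, 0))
```

A constant-label baseline is only one option. Comparing against a previously trained model fits the same structure: pass its predictions in place of `baseline`.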
Output Format
Evaluation report:
- Task type (classification, regression, etc.)
- Metrics (accuracy, precision, recall, F1, AUC, etc.)
- Per-class breakdown (if applicable)
- Comparison to baseline model
- Confusion matrix (classification)
- Error analysis
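For the confusion matrix and per-class breakdown items in the report, one possible sketch in plain Python is shown below. The function names and the row/column convention (rows are true classes, columns are predicted classes) are assumptions made for illustration; the skill does not prescribe them.

```python
from typing import Dict, Hashable, List, Sequence


def confusion_matrix(
    predictions: Sequence[Hashable],
    ground_truth: Sequence[Hashable],
    classes: Sequence[Hashable],
) -> List[List[int]]:
    """Build a matrix where matrix[i][j] counts true class i predicted as class j."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for p, t in zip(predictions, ground_truth):
        matrix[index[t]][index[p]] += 1
    return matrix


def per_class_breakdown(
    matrix: List[List[int]],
    classes: Sequence[Hashable],
) -> Dict[Hashable, Dict[str, float]]:
    """Derive precision, recall, and support for each class from the matrix."""
    breakdown = {}
    for i, c in enumerate(classes):
        tp = matrix[i][i]
        predicted = sum(row[i] for row in matrix)  # column sum: predicted as c
        actual = sum(matrix[i])                    # row sum: truly c (support)
        breakdown[c] = {
            "precision": tp / predicted if predicted else 0.0,
            "recall": tp / actual if actual else 0.0,
            "support": actual,
        }
    return breakdown


if __name__ == "__main__":
    # Toy labels, purely for illustration.
    classes = ["cat", "dog", "bird"]
    truth = ["cat", "cat", "dog", "dog", "bird", "bird"]
    preds = ["cat", "dog", "dog", "dog", "bird", "cat"]
    cm = confusion_matrix(preds, truth, classes)
    for name, row in zip(classes, cm):
        print(name, row)
    print(per_class_breakdown(cm, classes))
```

Off-diagonal cells of the matrix are a natural starting point for the error analysis item, since each one names a specific pair of confused classes.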
References
- See CLAUDE.md > Language Preference (Mojo for ML models)
- See train-model skill for model training
- See /notes/review/mojo-ml-patterns.md for Mojo tensor operations