Claude-skill-registry evaluate-model

Measure model performance on test datasets. Use when assessing accuracy, precision, recall, and other metrics.

install

source · Clone the upstream repo

git clone https://github.com/majiayu000/claude-skill-registry

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/evaluate-model" ~/.claude/skills/majiayu000-claude-skill-registry-evaluate-model && rm -rf "$T"

manifest: skills/data/evaluate-model/SKILL.md

source content

Evaluate Model

Measure machine learning model performance using appropriate metrics for the task (classification, regression, etc.).

When to Use

Comparing different model architectures
Assessing performance on test/validation datasets
Detecting overfitting or underfitting
Reporting model accuracy for papers and documentation

Quick Reference

# Mojo model evaluation pattern
struct ModelEvaluator:
    fn evaluate_classification(
        mut self,
        predictions: ExTensor,
        ground_truth: ExTensor
    ) -> Tuple[Float32, Float32, Float32]:
        # Returns accuracy, precision, recall
        ...

    fn evaluate_regression(
        mut self,
        predictions: ExTensor,
        ground_truth: ExTensor
    ) -> Tuple[Float32, Float32]:
        # Returns MSE, MAE
        ...
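
The Mojo signatures above leave the metric definitions implicit. As a language-neutral reference, the following plain-Python sketch shows one way the returned values could be computed. It is not the skill's Mojo implementation: it works on ordinary Python sequences rather than `ExTensor`, it assumes binary labels with `1` as the positive class, and the `positive` parameter and the zero-division convention (return `0.0`) are assumptions made for illustration.

```python
from typing import Sequence, Tuple


def evaluate_classification(
    predictions: Sequence[int],
    ground_truth: Sequence[int],
    positive: int = 1,  # assumption: binary task, label 1 is the positive class
) -> Tuple[float, float, float]:
    """Return (accuracy, precision, recall) for binary labels."""
    if len(predictions) != len(ground_truth) or len(predictions) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    correct = tp = fp = fn = 0
    for pred, truth in zip(predictions, ground_truth):
        if pred == truth:
            correct += 1
        if pred == positive and truth == positive:
            tp += 1  # true positive
        elif pred == positive:
            fp += 1  # false positive
        elif truth == positive:
            fn += 1  # false negative

    accuracy = correct / len(predictions)
    # Assumed convention: an undefined ratio is reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


def evaluate_regression(
    predictions: Sequence[float],
    ground_truth: Sequence[float],
) -> Tuple[float, float]:
    """Return (MSE, MAE)."""
    if len(predictions) != len(ground_truth) or len(predictions) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    errors = [p - t for p, t in zip(predictions, ground_truth)]
    mse = sum(e * e for e in errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    return mse, mae
```

For example, `evaluate_classification([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])` gives an accuracy of 0.6 with precision and recall both equal to 2/3, and `evaluate_regression([2.0, 4.0], [1.0, 6.0])` gives an MSE of 2.5 and an MAE of 1.5.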

Workflow

Load test data: Prepare test/validation dataset
Generate predictions: Run model inference on test set
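Select metrics: Choose metrics appropriate to the task (e.g. accuracy, precision, recall, F1, AUC for classification; MSE, MAE for regression)
Calculate metrics: Compute the selected metrics from the predictions and the ground-truth labels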
Analyze results: Compare to baseline and identify strengths/weaknesses
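
The steps above can be sketched end to end in a few lines of plain Python. This is an illustrative outline only, not the skill's Mojo code: `run_evaluation`, `METRICS_BY_TASK`, and the toy model and baseline are hypothetical names introduced here, and the model is assumed to be a simple callable that maps one input to one prediction.

```python
from typing import Callable, Dict, Sequence


def accuracy(preds: Sequence, truth: Sequence) -> float:
    return sum(p == t for p, t in zip(preds, truth)) / len(truth)


def mse(preds: Sequence[float], truth: Sequence[float]) -> float:
    return sum((p - t) ** 2 for p, t in zip(preds, truth)) / len(truth)


# Step 3 (select metrics): hypothetical mapping from task type to metrics.
METRICS_BY_TASK: Dict[str, Dict[str, Callable]] = {
    "classification": {"accuracy": accuracy},
    "regression": {"mse": mse},
}


def run_evaluation(
    model: Callable,
    baseline: Callable,
    inputs: Sequence,   # step 1: test data is assumed to be loaded already
    labels: Sequence,
    task: str,
) -> Dict[str, Dict[str, float]]:
    # Step 2: generate predictions on the test set.
    model_preds = [model(x) for x in inputs]
    baseline_preds = [baseline(x) for x in inputs]

    # Steps 3 and 4: pick the metrics for this task and compute them.
    report = {}
    for name, metric in METRICS_BY_TASK[task].items():
        report[name] = {
            "model": metric(model_preds, labels),
            "baseline": metric(baseline_preds, labels),
        }
    # Step 5: the caller compares the "model" and "baseline" entries.
    return report


# Toy usage: a threshold classifier against an always-predict-0 baseline.
test_inputs = [0.1, 0.9, 0.4, 0.8, 0.7, 0.2]
test_labels = [0, 1, 0, 1, 0, 0]
report = run_evaluation(
    model=lambda x: int(x > 0.5),
    baseline=lambda x: 0,
    inputs=test_inputs,
    labels=test_labels,
    task="classification",
)
print(report)
```

Here the toy model gets 5 of 6 test examples right while the constant baseline gets 4 of 6, so the report shows the model ahead of the baseline on accuracy. Whether higher or lower is better depends on the metric (higher for accuracy, lower for MSE), which is why the sketch reports both values rather than a single signed difference.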

Output Format

Evaluation report:

Task type (classification, regression, etc.)
Metrics (accuracy, precision, recall, F1, AUC, etc.)
Per-class breakdown (if applicable)
Comparison to baseline model
Confusion matrix (classification)
Error analysis
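
As a rough illustration of the confusion matrix and per-class breakdown items, the plain-Python sketch below builds both from integer class labels. The function names, the row/column orientation (rows are true classes, columns are predicted classes), and the choice to report `0.0` for undefined ratios are assumptions made for this example, not part of the skill.

```python
from typing import Dict, List, Sequence


def confusion_matrix(
    predictions: Sequence[int],
    ground_truth: Sequence[int],
    num_classes: int,
) -> List[List[int]]:
    """matrix[t][p] counts examples of true class t predicted as class p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for pred, truth in zip(predictions, ground_truth):
        matrix[truth][pred] += 1
    return matrix


def per_class_report(matrix: List[List[int]]) -> List[Dict[str, float]]:
    """Precision, recall, F1, and support for each class."""
    n = len(matrix)
    rows = []
    for c in range(n):
        tp = matrix[c][c]
        predicted_c = sum(matrix[r][c] for r in range(n))  # column sum
        actual_c = sum(matrix[c])                          # row sum (support)
        precision = tp / predicted_c if predicted_c else 0.0
        recall = tp / actual_c if actual_c else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        rows.append({
            "class": c,
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": actual_c,
        })
    return rows


truth = [0, 0, 1, 1, 2, 2]
preds = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(preds, truth, num_classes=3)
for row in per_class_report(cm):
    print(row)
```

Off-diagonal cells of the matrix are a natural starting point for the error analysis item: each one names a specific pair of classes the model confuses.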

References

See CLAUDE.md > Language Preference (Mojo for ML models)
See the `train-model` skill for model training
See `/notes/review/mojo-ml-patterns.md` for Mojo tensor operations