Claude-skill-registry evaluate-model

Measure model performance on test datasets. Use when assessing accuracy, precision, recall, and other metrics.

install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/evaluate-model" ~/.claude/skills/majiayu000-claude-skill-registry-evaluate-model && rm -rf "$T"
manifest: skills/data/evaluate-model/SKILL.md
source content

Evaluate Model

Measure machine learning model performance using appropriate metrics for the task (classification, regression, etc.).

When to Use

  • Comparing different model architectures
  • Assessing performance on test/validation datasets
  • Detecting overfitting or underfitting
  • Reporting model accuracy for papers and documentation

Quick Reference

# Mojo model evaluation pattern
struct ModelEvaluator:
    fn evaluate_classification(
        mut self,
        predictions: ExTensor,
        ground_truth: ExTensor
    ) -> Tuple[Float32, Float32, Float32]:
        # Returns accuracy, precision, recall
        ...

    fn evaluate_regression(
        mut self,
        predictions: ExTensor,
        ground_truth: ExTensor
    ) -> Tuple[Float32, Float32]:
        # Returns MSE, MAE
        ...

Workflow

  1. Load test data: Prepare test/validation dataset
  2. Generate predictions: Run model inference on test set
  3. Select metrics: Choose appropriate metrics (accuracy, precision, recall, F1, AUC, MSE, etc.)
  4. Calculate metrics: Compute performance metrics
  5. Analyze results: Compare to baseline and identify strengths/weaknesses

Output Format

Evaluation report:

  • Task type (classification, regression, etc.)
  • Metrics (accuracy, precision, recall, F1, AUC, etc.)
  • Per-class breakdown (if applicable)
  • Comparison to baseline model
  • Confusion matrix (classification)
  • Error analysis

References

  • See CLAUDE.md > Language Preference (Mojo for ML models)
  • See
    train-model
    skill for model training
  • See
    /notes/review/mojo-ml-patterns.md
    for Mojo tensor operations