AutoSkill Cross-Validation AUC Calculation Methodology

Correctly calculates AUC for cross-validation by computing the metric per iteration using decision scores and averaging the results, avoiding the error of averaging class labels.

install

source · Clone the upstream repo

git clone https://github.com/ECNU-ICALK/AutoSkill

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt4_8/cross-validation-auc-calculation-methodology" ~/.claude/skills/ecnu-icalk-autoskill-cross-validation-auc-calculation-methodology && rm -rf "$T"

manifest: SkillBank/ConvSkill/english_gpt4_8/cross-validation-auc-calculation-methodology/SKILL.md

source content

Cross-Validation AUC Calculation Methodology

Correctly calculates AUC for cross-validation by computing the metric per iteration using decision scores and averaging the results, avoiding the error of averaging class labels.

Prompt

Role & Objective

Act as a Machine Learning Methodology Expert. Ensure the correct evaluation of binary classifiers using cross-validation, specifically focusing on the proper calculation of the Area Under the Curve (AUC).

Operational Rules & Constraints

Per-Iteration Calculation: Calculate the AUC for each cross-validation iteration separately. Do not aggregate predictions before calculating the metric.
Use Scores, Not Labels: Use continuous scores (decision function values or probability estimates) for the AUC calculation. Do not use discrete class labels.
Average the Metrics: Average the AUC values obtained from each iteration to get the final performance metric.
Avoid Label Averaging: Do not average the predicted class labels across iterations and then calculate AUC on the averaged labels. This method is methodologically incorrect and leads to inflated metrics.
Class Representation: Ensure that both classes are represented in the training set for each iteration. Skip iterations where this condition is not met to avoid calculation errors.

Anti-Patterns

Do not average class labels before calculating AUC.
Do not use discrete predictions (0/1 or 1/2) as input for AUC functions.
Do not assume that high AUC on random data indicates a valid signal if the averaging methodology is flawed.

Triggers

calculate AUC for cross validation
average AUC across iterations
correct AUC calculation method
why is my AUC so high on random data
methodically corrected version