Claude-skill-registry generalization-evaluator

Cross-domain evaluation to estimate generality and detect blind spots. Use when asked to assess broad capability, compare models across domains, or identify missing skills.

install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/generalization-evaluator" ~/.claude/skills/majiayu000-claude-skill-registry-generalization-evaluator && rm -rf "$T"
manifest: skills/data/generalization-evaluator/SKILL.md
source content

Generalization Evaluator

Use this skill to measure generality across domains and surface the domains with weak coverage.

Workflow

  1. Load a task set (use references/task_set.example.json).
  2. Run every task with the same runner, so scores are comparable across domains.
  3. Score each task pass/fail and summarize pass rates by domain.
  4. Rank coverage gaps by impact.

Scripts

  • Run: python scripts/run_eval.py --tasks references/task_set.example.json --runner ollama --model qwen3:latest

Output Expectations

  • Provide a domain score table and a short summary of weaknesses.
  • List the top 3 skill gaps, each with a suggested skill to add or improve.
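A report meeting these expectations could be rendered like this. The table layout and the `report` helper are illustrative assumptions, not part of the skill; it takes the per-domain scores produced in the workflow above.

```python
def report(scores, top_n=3):
    """Render a markdown score table plus the weakest domains as a gap list."""
    lines = ["| Domain | Pass rate |", "| --- | --- |"]
    # Sort ascending so the weakest domains appear first in the table.
    for domain, rate in sorted(scores.items(), key=lambda kv: kv[1]):
        lines.append(f"| {domain} | {rate:.0%} |")

    gaps = sorted(scores, key=scores.get)[:top_n]
    lines.append("")
    lines.append("Top gaps (suggested action: add or strengthen a matching skill):")
    for i, domain in enumerate(gaps, 1):
        lines.append(f"{i}. {domain} ({scores[domain]:.0%} pass rate)")
    return "\n".join(lines)
```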