Claude-Code-Game-Studios skill-test
Validate skill files for structural compliance and behavioral correctness. Three modes: static (linter), spec (behavioral), audit (coverage report).
git clone https://github.com/Donchitos/Claude-Code-Game-Studios
T=$(mktemp -d) && git clone --depth=1 https://github.com/Donchitos/Claude-Code-Game-Studios "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.claude/skills/skill-test" ~/.claude/skills/donchitos-claude-code-game-studios-skill-test && rm -rf "$T"
.claude/skills/skill-test/SKILL.mdSkill Test
Validates
.claude/skills/*/SKILL.md files for structural compliance and
behavioral correctness. No external dependencies — runs entirely within the
existing skill/hook/template architecture.
Four modes:
| Mode | Command | Purpose | Token Cost |
|---|---|---|---|
| | Structural linter — 7 compliance checks per skill | Low (~1k/skill) |
| | Behavioral verifier — evaluates assertions in test spec | Medium (~5k/skill) |
| | Category rubric — checks skill against its category-specific metrics | Low (~2k/skill) |
| | Coverage report — skills, agent specs, last test dates | Low (~3k total) |
Phase 1: Parse Arguments
Determine mode from the first argument:
→ run 7 structural checks on one skillstatic [name]
→ run 7 structural checks on all skills (Globstatic all
).claude/skills/*/SKILL.md
→ read skill + test spec, evaluate assertionsspec [name]
→ run category-specific rubric fromcategory [name]CCGS Skill Testing Framework/quality-rubric.md
→ run category rubric for every skill that has acategory all
in catalogcategory:
(or no argument) → read catalog, list all skills and agents, show coverageaudit
If argument is missing or unrecognized, output usage and stop.
Phase 2A: Static Mode — Structural Linter
For each skill being tested, read its
SKILL.md fully and run all 7 checks:
Check 1 — Required Frontmatter Fields
The file must contain all of these in the YAML frontmatter block:
name:description:argument-hint:user-invocable:allowed-tools:
FAIL if any are absent.
Check 2 — Multiple Phases
The skill must have ≥2 numbered phase headings. Look for patterns like:
or## Phase N## Phase N:
(numbered top-level sections)## N.- At least 2 distinct
headings if phases aren't explicitly numbered##
FAIL if fewer than 2 phase-like headings are found.
Check 3 — Verdict Keywords
The skill must contain at least one of:
PASS, FAIL, CONCERNS, APPROVED,
BLOCKED, COMPLETE, READY, COMPLIANT, NON-COMPLIANT
FAIL if none are present.
Check 4 — Collaborative Protocol Language
The skill must contain ask-before-write language. Look for:
(canonical form)"May I write"
or"before writing"
near file-write instructions"approval"
+"ask"
in close proximity (within same section)"write"
WARN if absent (some read-only skills legitimately skip this). FAIL if
allowed-tools includes Write or Edit but no ask-before-write language is found.
Check 5 — Next-Step Handoff
The skill must end with a recommended next action or follow-up path. Look for:
- A final section mentioning another skill (e.g.,
,/story-done
)/gate-check - "Recommended next" or "next step" phrasing
- A "Follow-Up" or "After this" section
WARN if absent.
Check 6 — Fork Context Complexity
If frontmatter contains
context: fork, the skill should have ≥5 phase headings
(## level or numbered Phase N headers). Fork context is for complex multi-phase
skills; simple skills should not use it.
WARN if
context: fork is set but fewer than 5 phases found.
Check 7 — Argument Hint Plausibility
argument-hint must be non-empty. If the skill body mentions multiple modes
(e.g., "Mode A | Mode B"), the hint should reflect them. Cross-reference the
hint against the first phase's "Parse Arguments" section.
WARN if hint is
"" or if documented modes don't match hint.
Static Mode Output Format
For a single skill:
=== Skill Static Check: /[name] === Check 1 — Frontmatter Fields: PASS Check 2 — Multiple Phases: PASS (7 phases found) Check 3 — Verdict Keywords: PASS (PASS, FAIL, CONCERNS) Check 4 — Collaborative Protocol: PASS ("May I write" found) Check 5 — Next-Step Handoff: WARN (no follow-up section found) Check 6 — Fork Context Complexity: PASS (8 phases, context: fork set) Check 7 — Argument Hint: PASS Verdict: WARNINGS (1 warning, 0 failures) Recommended: Add a "Follow-Up Actions" section at the end of the skill.
For
static all, produce a summary table then list any non-compliant skills:
=== Skill Static Check: All 52 Skills === Skill | Result | Issues -----------------------|--------------|------- gate-check | COMPLIANT | design-review | COMPLIANT | story-readiness | WARNINGS | Check 5: no handoff ... Summary: 48 COMPLIANT, 3 WARNINGS, 1 NON-COMPLIANT Aggregate Verdict: N WARNINGS / N FAILURES
Phase 2B: Spec Mode — Behavioral Verifier
Step 1 — Locate Files
Find skill at
.claude/skills/[name]/SKILL.md.
Look up the spec path from CCGS Skill Testing Framework/catalog.yaml — use the
spec: field for the matching skill entry.
If either is missing:
- Missing skill: "Skill '[name]' not found in
.".claude/skills/ - Missing spec path in catalog: "No spec path set for '[name]' in catalog.yaml."
- Spec file not found at path: "Spec file missing at [path]. Run
to see coverage gaps."/skill-test audit
Step 2 — Read Both Files
Read the skill file and test spec file completely.
Step 3 — Evaluate Assertions
For each Test Case in the spec:
- Read the Fixture description (assumed state of project files)
- Read the Expected behavior steps
- Read each Assertion checkbox
For each assertion, evaluate whether the skill's written instructions, if followed correctly given the fixture state, would satisfy it. This is a Claude-evaluated reasoning check, not code execution.
Mark each assertion:
- PASS — skill instructions clearly satisfy this assertion
- PARTIAL — skill instructions partially address it, but with ambiguity
- FAIL — skill instructions would NOT satisfy this assertion given the fixture
For Protocol Compliance assertions (always present):
- Check whether the skill requires "May I write" before file writes
- Check whether the skill presents findings before requesting approval
- Check whether the skill ends with a recommended next step
- Check whether the skill avoids auto-creating files without approval
Step 4 — Build Report
=== Skill Spec Test: /[name] === Date: [date] Spec: CCGS Skill Testing Framework/skills/[category]/[name].md Case 1: [Happy Path — name] Fixture: [summary] Assertions: [PASS] [assertion text] [FAIL] [assertion text] Reason: The skill's Phase 3 says "..." but the fixture state means "..." Case Verdict: FAIL Case 2: [Edge Case — name] ... Case Verdict: PASS Protocol Compliance: [PASS] Uses "May I write" before file writes [PASS] Presents findings before asking approval [WARN] No explicit next-step handoff at end Overall Verdict: FAIL (1 case failed, 1 warning)
Step 5 — Offer to Write Results
"May I write these results to
CCGS Skill Testing Framework/results/skill-test-spec-[name]-[date].md
and update CCGS Skill Testing Framework/catalog.yaml?"
If yes:
- Write results file to
CCGS Skill Testing Framework/results/ - Update the skill's entry in
:CCGS Skill Testing Framework/catalog.yamllast_spec: [date]last_spec_result: PASS|PARTIAL|FAIL
Phase 2D: Category Mode — Rubric Evaluation
Step 1 — Locate Skill and Category
Find skill at
.claude/skills/[name]/SKILL.md.
Look up category: field in CCGS Skill Testing Framework/catalog.yaml.
If skill not found: "Skill '[name]' not found." If no
category: field: "No category assigned for '[name]' in catalog.yaml.
Add category: [name] to the skill entry first."
For
category all: collect all skills with a category: field and process each.
category: utility skills are evaluated against U1 (static checks pass) and U2
(gate mode correct if applicable) only — skip to the static mode for U1.
Step 2 — Read Rubric Section
Read
CCGS Skill Testing Framework/quality-rubric.md.
Extract the section matching the skill's category (e.g., ### gate, ### team).
Step 3 — Read Skill
Read the skill's
SKILL.md fully.
Step 4 — Evaluate Rubric Metrics
For each metric in the category's rubric table:
- Check whether the skill's written instructions clearly satisfy the criterion
- Mark PASS, FAIL, or WARN
- For FAIL/WARN, identify the exact gap in the skill text (quote the relevant section or note its absence)
Step 5 — Output Report
=== Skill Category Check: /[name] ([category]) === Metric G1 — Review mode read: PASS Metric G2 — Full mode directors: FAIL Gap: Phase 3 spawns only CD-PHASE-GATE; TD-PHASE-GATE, PR-PHASE-GATE, AD-PHASE-GATE absent Metric G3 — Lean mode: PHASE-GATE only: PASS Metric G4 — Solo mode: no directors: PASS Metric G5 — No auto-advance: PASS Verdict: FAIL (1 failure, 0 warnings) Fix: Add TD-PHASE-GATE, PR-PHASE-GATE, and AD-PHASE-GATE to the full-mode director panel in Phase 3.
Step 6 — Offer to Update Catalog
"May I update
CCGS Skill Testing Framework/catalog.yaml to record this category check
(last_category, last_category_result) for [name]?"
Phase 2C: Audit Mode — Coverage Report
Step 1 — Read Catalog
Read
CCGS Skill Testing Framework/catalog.yaml. If missing, note that catalog doesn't exist
yet (first-run state).
Step 2 — Enumerate All Skills and Agents
Glob
.claude/skills/*/SKILL.md to get the complete list of skills.
Extract skill name from each path (directory name).
Also read the
agents: section from CCGS Skill Testing Framework/catalog.yaml to get the
complete list of agents.
Step 3 — Build Skill Coverage Table
For each skill:
- Check if a spec file exists (use the
path from catalog, or globspec:
)CCGS Skill Testing Framework/skills/*/[name].md - Look up
,last_static
,last_static_result
,last_spec
,last_spec_result
,last_category
,last_category_result
from catalog (or mark as "never" / "—" if not in catalog)category - Priority comes from catalog
field (critical/high/medium/low)priority:
Step 3b — Build Agent Coverage Table
For each agent in catalog's
agents: section:
- Check if a spec file exists (use the
path from catalog, or globspec:
)CCGS Skill Testing Framework/agents/*/[name].md - Look up
,last_spec
,last_spec_result
from catalogcategory
Step 4 — Output Report
=== Skill Test Coverage Audit === Date: [date] SKILLS (72 total) Specs written: 72 (100%) | Never static tested: 72 | Never category tested: 72 Skill | Cat | Has Spec | Last Static | S.Result | Last Cat | C.Result | Priority -----------------------|----------|----------|-------------|----------|----------|----------|---------- gate-check | gate | YES | never | — | never | — | critical design-review | review | YES | never | — | never | — | critical ... AGENTS (49 total) Agent specs written: 49 (100%) Agent | Category | Has Spec | Last Spec | Result -----------------------|------------|----------|-------------|-------- creative-director | director | YES | never | — technical-director | director | YES | never | — ... Top 5 Priority Gaps (skills with no spec, critical/high priority): (none if all specs are written) Skill coverage: 72/72 specs (100%) Agent coverage: 49/49 specs (100%)
No file writes in audit mode.
Offer: "Would you like to run
/skill-test static all to check structural
compliance across all skills? /skill-test category all to run category rubric
checks? Or /skill-test spec [name] to run a specific behavioral test?"
Phase 3: Recommended Next Steps
After any mode completes, offer contextual follow-up:
- After
: "Runstatic [name]
to validate behavioral correctness if a test spec exists."/skill-test spec [name] - After
with failures: "Address NON-COMPLIANT skills first. Runstatic all
individually for detailed remediation guidance."/skill-test static [name] - After
PASS: "Updatespec [name]
to record this pass date. Consider runningCCGS Skill Testing Framework/catalog.yaml
to find the next spec gap."/skill-test audit - After
FAIL: "Review the failing assertions and update the skill or the test spec to resolve the mismatch."spec [name] - After
: "Start with the critical-priority gaps. Use the spec template ataudit
to create new specs."CCGS Skill Testing Framework/templates/skill-test-spec.md