Claude-Code-Game-Studios skill-test

Validate skill files for structural compliance and behavioral correctness. Three modes: static (linter), spec (behavioral), audit (coverage report).

install
source · Clone the upstream repo
git clone https://github.com/Donchitos/Claude-Code-Game-Studios
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/Donchitos/Claude-Code-Game-Studios "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.claude/skills/skill-test" ~/.claude/skills/donchitos-claude-code-game-studios-skill-test && rm -rf "$T"
manifest: .claude/skills/skill-test/SKILL.md
source content

Skill Test

Validates

.claude/skills/*/SKILL.md
files for structural compliance and behavioral correctness. No external dependencies — runs entirely within the existing skill/hook/template architecture.

Four modes:

ModeCommandPurposeToken Cost
static
/skill-test static [name|all]
Structural linter — 7 compliance checks per skillLow (~1k/skill)
spec
/skill-test spec [name]
Behavioral verifier — evaluates assertions in test specMedium (~5k/skill)
category
/skill-test category [name|all]
Category rubric — checks skill against its category-specific metricsLow (~2k/skill)
audit
/skill-test audit
Coverage report — skills, agent specs, last test datesLow (~3k total)

Phase 1: Parse Arguments

Determine mode from the first argument:

  • static [name]
    → run 7 structural checks on one skill
  • static all
    → run 7 structural checks on all skills (Glob
    .claude/skills/*/SKILL.md
    )
  • spec [name]
    → read skill + test spec, evaluate assertions
  • category [name]
    → run category-specific rubric from
    CCGS Skill Testing Framework/quality-rubric.md
  • category all
    → run category rubric for every skill that has a
    category:
    in catalog
  • audit
    (or no argument) → read catalog, list all skills and agents, show coverage

If argument is missing or unrecognized, output usage and stop.


Phase 2A: Static Mode — Structural Linter

For each skill being tested, read its

SKILL.md
fully and run all 7 checks:

Check 1 — Required Frontmatter Fields

The file must contain all of these in the YAML frontmatter block:

  • name:
  • description:
  • argument-hint:
  • user-invocable:
  • allowed-tools:

FAIL if any are absent.

Check 2 — Multiple Phases

The skill must have ≥2 numbered phase headings. Look for patterns like:

  • ## Phase N
    or
    ## Phase N:
  • ## N.
    (numbered top-level sections)
  • At least 2 distinct
    ##
    headings if phases aren't explicitly numbered

FAIL if fewer than 2 phase-like headings are found.

Check 3 — Verdict Keywords

The skill must contain at least one of:

PASS
,
FAIL
,
CONCERNS
,
APPROVED
,
BLOCKED
,
COMPLETE
,
READY
,
COMPLIANT
,
NON-COMPLIANT

FAIL if none are present.

Check 4 — Collaborative Protocol Language

The skill must contain ask-before-write language. Look for:

  • "May I write"
    (canonical form)
  • "before writing"
    or
    "approval"
    near file-write instructions
  • "ask"
    +
    "write"
    in close proximity (within same section)

WARN if absent (some read-only skills legitimately skip this). FAIL if

allowed-tools
includes
Write
or
Edit
but no ask-before-write language is found.

Check 5 — Next-Step Handoff

The skill must end with a recommended next action or follow-up path. Look for:

  • A final section mentioning another skill (e.g.,
    /story-done
    ,
    /gate-check
    )
  • "Recommended next" or "next step" phrasing
  • A "Follow-Up" or "After this" section

WARN if absent.

Check 6 — Fork Context Complexity

If frontmatter contains

context: fork
, the skill should have ≥5 phase headings (
##
level or numbered Phase N headers). Fork context is for complex multi-phase skills; simple skills should not use it.

WARN if

context: fork
is set but fewer than 5 phases found.

Check 7 — Argument Hint Plausibility

argument-hint
must be non-empty. If the skill body mentions multiple modes (e.g., "Mode A | Mode B"), the hint should reflect them. Cross-reference the hint against the first phase's "Parse Arguments" section.

WARN if hint is

""
or if documented modes don't match hint.


Static Mode Output Format

For a single skill:

=== Skill Static Check: /[name] ===

Check 1 — Frontmatter Fields:    PASS
Check 2 — Multiple Phases:       PASS (7 phases found)
Check 3 — Verdict Keywords:      PASS (PASS, FAIL, CONCERNS)
Check 4 — Collaborative Protocol: PASS ("May I write" found)
Check 5 — Next-Step Handoff:     WARN (no follow-up section found)
Check 6 — Fork Context Complexity: PASS (8 phases, context: fork set)
Check 7 — Argument Hint:         PASS

Verdict: WARNINGS (1 warning, 0 failures)
Recommended: Add a "Follow-Up Actions" section at the end of the skill.

For

static all
, produce a summary table then list any non-compliant skills:

=== Skill Static Check: All 52 Skills ===

Skill                  | Result       | Issues
-----------------------|--------------|-------
gate-check             | COMPLIANT    |
design-review          | COMPLIANT    |
story-readiness        | WARNINGS     | Check 5: no handoff
...

Summary: 48 COMPLIANT, 3 WARNINGS, 1 NON-COMPLIANT
Aggregate Verdict: N WARNINGS / N FAILURES

Phase 2B: Spec Mode — Behavioral Verifier

Step 1 — Locate Files

Find skill at

.claude/skills/[name]/SKILL.md
. Look up the spec path from
CCGS Skill Testing Framework/catalog.yaml
— use the
spec:
field for the matching skill entry.

If either is missing:

  • Missing skill: "Skill '[name]' not found in
    .claude/skills/
    ."
  • Missing spec path in catalog: "No spec path set for '[name]' in catalog.yaml."
  • Spec file not found at path: "Spec file missing at [path]. Run
    /skill-test audit
    to see coverage gaps."

Step 2 — Read Both Files

Read the skill file and test spec file completely.

Step 3 — Evaluate Assertions

For each Test Case in the spec:

  1. Read the Fixture description (assumed state of project files)
  2. Read the Expected behavior steps
  3. Read each Assertion checkbox

For each assertion, evaluate whether the skill's written instructions, if followed correctly given the fixture state, would satisfy it. This is a Claude-evaluated reasoning check, not code execution.

Mark each assertion:

  • PASS — skill instructions clearly satisfy this assertion
  • PARTIAL — skill instructions partially address it, but with ambiguity
  • FAIL — skill instructions would NOT satisfy this assertion given the fixture

For Protocol Compliance assertions (always present):

  • Check whether the skill requires "May I write" before file writes
  • Check whether the skill presents findings before requesting approval
  • Check whether the skill ends with a recommended next step
  • Check whether the skill avoids auto-creating files without approval

Step 4 — Build Report

=== Skill Spec Test: /[name] ===
Date: [date]
Spec: CCGS Skill Testing Framework/skills/[category]/[name].md

Case 1: [Happy Path — name]
  Fixture: [summary]
  Assertions:
    [PASS] [assertion text]
    [FAIL] [assertion text]
       Reason: The skill's Phase 3 says "..." but the fixture state means "..."
  Case Verdict: FAIL

Case 2: [Edge Case — name]
  ...
  Case Verdict: PASS

Protocol Compliance:
  [PASS] Uses "May I write" before file writes
  [PASS] Presents findings before asking approval
  [WARN] No explicit next-step handoff at end

Overall Verdict: FAIL (1 case failed, 1 warning)

Step 5 — Offer to Write Results

"May I write these results to

CCGS Skill Testing Framework/results/skill-test-spec-[name]-[date].md
and update
CCGS Skill Testing Framework/catalog.yaml
?"

If yes:

  • Write results file to
    CCGS Skill Testing Framework/results/
  • Update the skill's entry in
    CCGS Skill Testing Framework/catalog.yaml
    :
    • last_spec: [date]
    • last_spec_result: PASS|PARTIAL|FAIL

Phase 2D: Category Mode — Rubric Evaluation

Step 1 — Locate Skill and Category

Find skill at

.claude/skills/[name]/SKILL.md
. Look up
category:
field in
CCGS Skill Testing Framework/catalog.yaml
.

If skill not found: "Skill '[name]' not found." If no

category:
field: "No category assigned for '[name]' in catalog.yaml. Add
category: [name]
to the skill entry first."

For

category all
: collect all skills with a
category:
field and process each.
category: utility
skills are evaluated against U1 (static checks pass) and U2 (gate mode correct if applicable) only — skip to the static mode for U1.

Step 2 — Read Rubric Section

Read

CCGS Skill Testing Framework/quality-rubric.md
. Extract the section matching the skill's category (e.g.,
### gate
,
### team
).

Step 3 — Read Skill

Read the skill's

SKILL.md
fully.

Step 4 — Evaluate Rubric Metrics

For each metric in the category's rubric table:

  1. Check whether the skill's written instructions clearly satisfy the criterion
  2. Mark PASS, FAIL, or WARN
  3. For FAIL/WARN, identify the exact gap in the skill text (quote the relevant section or note its absence)

Step 5 — Output Report

=== Skill Category Check: /[name] ([category]) ===

Metric G1 — Review mode read:      PASS
Metric G2 — Full mode directors:   FAIL
  Gap: Phase 3 spawns only CD-PHASE-GATE; TD-PHASE-GATE, PR-PHASE-GATE, AD-PHASE-GATE absent
Metric G3 — Lean mode: PHASE-GATE only: PASS
Metric G4 — Solo mode: no directors:    PASS
Metric G5 — No auto-advance:       PASS

Verdict: FAIL (1 failure, 0 warnings)
Fix: Add TD-PHASE-GATE, PR-PHASE-GATE, and AD-PHASE-GATE to the full-mode director
     panel in Phase 3.

Step 6 — Offer to Update Catalog

"May I update

CCGS Skill Testing Framework/catalog.yaml
to record this category check (
last_category
,
last_category_result
) for [name]?"


Phase 2C: Audit Mode — Coverage Report

Step 1 — Read Catalog

Read

CCGS Skill Testing Framework/catalog.yaml
. If missing, note that catalog doesn't exist yet (first-run state).

Step 2 — Enumerate All Skills and Agents

Glob

.claude/skills/*/SKILL.md
to get the complete list of skills. Extract skill name from each path (directory name).

Also read the

agents:
section from
CCGS Skill Testing Framework/catalog.yaml
to get the complete list of agents.

Step 3 — Build Skill Coverage Table

For each skill:

  • Check if a spec file exists (use the
    spec:
    path from catalog, or glob
    CCGS Skill Testing Framework/skills/*/[name].md
    )
  • Look up
    last_static
    ,
    last_static_result
    ,
    last_spec
    ,
    last_spec_result
    ,
    last_category
    ,
    last_category_result
    ,
    category
    from catalog (or mark as "never" / "—" if not in catalog)
  • Priority comes from catalog
    priority:
    field (critical/high/medium/low)

Step 3b — Build Agent Coverage Table

For each agent in catalog's

agents:
section:

  • Check if a spec file exists (use the
    spec:
    path from catalog, or glob
    CCGS Skill Testing Framework/agents/*/[name].md
    )
  • Look up
    last_spec
    ,
    last_spec_result
    ,
    category
    from catalog

Step 4 — Output Report

=== Skill Test Coverage Audit ===
Date: [date]

SKILLS (72 total)
Specs written: 72 (100%) | Never static tested: 72 | Never category tested: 72

Skill                  | Cat      | Has Spec | Last Static | S.Result | Last Cat | C.Result | Priority
-----------------------|----------|----------|-------------|----------|----------|----------|----------
gate-check             | gate     | YES      | never       | —        | never    | —        | critical
design-review          | review   | YES      | never       | —        | never    | —        | critical
...

AGENTS (49 total)
Agent specs written: 49 (100%)

Agent                  | Category   | Has Spec | Last Spec   | Result
-----------------------|------------|----------|-------------|--------
creative-director      | director   | YES      | never       | —
technical-director     | director   | YES      | never       | —
...

Top 5 Priority Gaps (skills with no spec, critical/high priority):
(none if all specs are written)

Skill coverage:  72/72 specs (100%)
Agent coverage:  49/49 specs (100%)

No file writes in audit mode.

Offer: "Would you like to run

/skill-test static all
to check structural compliance across all skills?
/skill-test category all
to run category rubric checks? Or
/skill-test spec [name]
to run a specific behavioral test?"


Phase 3: Recommended Next Steps

After any mode completes, offer contextual follow-up:

  • After
    static [name]
    : "Run
    /skill-test spec [name]
    to validate behavioral correctness if a test spec exists."
  • After
    static all
    with failures: "Address NON-COMPLIANT skills first. Run
    /skill-test static [name]
    individually for detailed remediation guidance."
  • After
    spec [name]
    PASS: "Update
    CCGS Skill Testing Framework/catalog.yaml
    to record this pass date. Consider running
    /skill-test audit
    to find the next spec gap."
  • After
    spec [name]
    FAIL: "Review the failing assertions and update the skill or the test spec to resolve the mismatch."
  • After
    audit
    : "Start with the critical-priority gaps. Use the spec template at
    CCGS Skill Testing Framework/templates/skill-test-spec.md
    to create new specs."