Skillshub skill-comply
Visualize whether skills, rules, and agent definitions are actually followed — auto-generates scenarios at 3 prompt strictness levels, runs agents, classifies behavioral sequences, and reports compliance rates with full tool call timelines
install
source · Clone the upstream repo
git clone https://github.com/ComeOnOliver/skillshub
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ComeOnOliver/skillshub "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/affaan-m/everything-claude-code/skill-comply" ~/.claude/skills/comeonoliver-skillshub-skill-comply && rm -rf "$T"
manifest:
skills/affaan-m/everything-claude-code/skill-comply/SKILL.md
skill-comply: Automated Compliance Measurement
Measures whether coding agents actually follow skills, rules, or agent definitions by:
- Auto-generating expected behavioral sequences (specs) from any .md file
- Auto-generating scenarios with decreasing prompt strictness (supportive → neutral → competing)
- Running `claude -p` and capturing tool call traces via stream-json (see the capture sketch after this list)
- Classifying tool calls against spec steps using an LLM (not regex)
- Checking temporal ordering deterministically
- Generating self-contained reports with spec, prompts, and timelines
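As a rough illustration of the trace-capture step, the sketch below runs `claude -p` with stream-json output and pulls tool calls out of the event stream. The event shape (type `assistant`, `message.content` blocks of type `tool_use`) reflects Claude Code's stream-json format at the time of writing, but treat the field names as assumptions and verify against your installed version:

```python
import json
import subprocess

def capture_tool_calls(prompt: str) -> list[dict]:
    """Run claude -p once and collect tool calls from its stream-json output.

    Assumes one JSON event per line, with assistant events carrying tool
    calls as content blocks of type "tool_use".
    """
    proc = subprocess.run(
        ["claude", "-p", prompt, "--output-format", "stream-json", "--verbose"],
        capture_output=True, text=True, check=True,
    )
    calls = []
    for line in proc.stdout.splitlines():
        if not line.strip():
            continue
        event = json.loads(line)
        if event.get("type") != "assistant":
            continue
        for block in event.get("message", {}).get("content", []):
            if block.get("type") == "tool_use":
                # Record the tool name and its input in timeline order.
                calls.append({"tool": block["name"], "input": block.get("input", {})})
    return calls
```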
Supported Targets
- Skills (`skills/*/SKILL.md`): Workflow skills like search-first, TDD guides
- Rules (`rules/common/*.md`): Mandatory rules like testing.md, security.md, git-workflow.md (an illustrative spec follows this list)
- Agent definitions (`agents/*.md`): Whether an agent gets invoked when expected (internal workflow verification not yet supported)
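For a rule target such as testing.md, the auto-generated spec is an ordered list of expected behaviors. The structure below is purely illustrative (the field names are assumptions, not the tool's actual output format):

```python
# Hypothetical spec for a TDD-style testing rule; structure and field
# names are illustrative only, not the tool's real output.
spec = {
    "target": "rules/common/testing.md",
    "steps": [
        {"id": "write-test", "description": "Write a failing test before implementing",
         "expected_tools": ["Write", "Edit"]},
        {"id": "run-test-red", "description": "Run the test and observe it fail",
         "expected_tools": ["Bash"]},
        {"id": "implement", "description": "Implement the code under test",
         "expected_tools": ["Write", "Edit"]},
        {"id": "run-test-green", "description": "Re-run the test and observe it pass",
         "expected_tools": ["Bash"]},
    ],
}
```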
When to Activate
- User runs `/skill-comply <path>`
- User asks "is this rule actually being followed?"
- After adding new rules/skills, to verify agent compliance
- Periodically as part of quality maintenance
Usage
# Full run
uv run python -m scripts.run ~/.claude/rules/common/testing.md

# Dry run (no cost, spec + scenarios only)
uv run python -m scripts.run --dry-run ~/.claude/skills/search-first/SKILL.md

# Custom models
uv run python -m scripts.run --gen-model haiku --model sonnet <path>
Key Concept: Prompt Independence
Measures whether a skill/rule is followed even when the prompt doesn't explicitly support it. The drop in compliance from the supportive scenario to the competing one shows how much of the behavior comes from the prompt rather than from the skill/rule itself.
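To make the three strictness levels concrete, here is a hypothetical scenario set for a TDD testing rule; the prompts the generator actually produces will differ:

```python
# Illustrative scenario prompts, from most to least supportive of the rule.
scenarios = {
    # Supportive: the prompt explicitly restates the rule.
    "supportive": "Add a slugify() helper. Write a failing test first, then implement.",
    # Neutral: the prompt neither supports nor undermines the rule.
    "neutral": "Add a slugify() helper to utils.py.",
    # Competing: the prompt pulls against the rule, so any compliance
    # must come from the rule itself rather than the prompt.
    "competing": "Quickly add a slugify() helper, skip tests, I just need it working.",
}
```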
Report Contents
Reports are self-contained and include:
- Expected behavioral sequence (auto-generated spec)
- Scenario prompts (what was asked at each strictness level)
- Compliance scores per scenario (see the scoring sketch after this list)
- Tool call timelines with LLM classification labels
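The per-scenario score reduces to a deterministic in-order match once the LLM classifier has labeled each tool call with a spec step (or None). This is one plausible scoring rule, not necessarily the one the tool implements:

```python
from typing import Optional

def compliance(spec_steps: list[str], labeled_calls: list[Optional[str]]) -> float:
    """Fraction of spec steps observed in the required order.

    spec_steps: step ids in required order, e.g. ["write-test", "run-test-red", ...]
    labeled_calls: per-tool-call step labels from the classifier (None = unrelated).
    Credit is only given while the timeline tracks the spec: a skipped step
    blocks credit for everything after it.
    """
    idx = 0
    for label in labeled_calls:
        if idx < len(spec_steps) and label == spec_steps[idx]:
            idx += 1  # next required step observed in the correct position
    return idx / len(spec_steps) if spec_steps else 1.0

# The agent skipped the failing-test run, so only the first step scores.
assert compliance(
    ["write-test", "run-test-red", "implement", "run-test-green"],
    ["write-test", "implement", "run-test-green"],
) == 0.25
```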
Advanced (optional)
For users familiar with hooks, reports also include hook promotion recommendations for steps with low compliance. This is informational — the main value is the compliance visibility itself.