Awesome-omni-skill writing-skills
Skill creation and editing workflow for Claude Code skills with validation and deployment verification. Use when creating new skills, editing existing skills, or verifying skills work before deployment. Do NOT use for CLAUDE.md design, hooks, or general Claude Code workflow (use code-agent-meta-patterns).
git clone https://github.com/diegosouzapw/awesome-omni-skill
T=$(mktemp -d) && git clone --depth=1 https://github.com/diegosouzapw/awesome-omni-skill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/devops/writing-skills-jlaws" ~/.claude/skills/diegosouzapw-awesome-omni-skill-writing-skills-0bec00 && rm -rf "$T"
skills/devops/writing-skills-jlaws/SKILL.mdWriting Skills
Writing skills IS Test-Driven Development applied to process documentation.
Personal skills live in agent-specific directories (
for Claude Code, ~/.claude/skills
for Codex)~/.codex/skills
You write test cases (pressure scenarios with subagents), watch them fail (baseline behavior), write the skill (documentation), watch tests pass (agents comply), and refactor (close loopholes).
Core principle: If you didn't watch an agent fail without the skill, you don't know if the skill teaches the right thing.
When to Create a Skill
Create when: Technique wasn't intuitively obvious, reusable across projects, pattern applies broadly, others would benefit.
Don't create for: One-off solutions, standard well-documented practices, project-specific conventions (use CLAUDE.md), mechanically enforceable constraints (automate instead).
Skill Types
- Technique: Concrete method with steps (condition-based-waiting)
- Pattern: Way of thinking about problems (flatten-with-flags)
- Reference: API docs, syntax guides, tool documentation
Directory Structure
skills/ skill-name/ SKILL.md # Main reference (required) supporting-file.* # Only if needed
Separate files for: heavy reference (100+ lines), reusable tools. Keep everything else inline.
SKILL.md Structure
Frontmatter: Only
name (letters/numbers/hyphens) and description (max 1024 chars, third-person, starts with "Use when...")
CRITICAL: Description = triggering conditions ONLY. Never summarize the skill's workflow in description. Testing showed Claude follows description shortcuts instead of reading skill body.
# BAD: Summarizes workflow description: Use when executing plans - dispatches subagent per task with code review between tasks # GOOD: Just triggers description: Use when executing implementation plans with independent tasks in the current session
Body structure:
# Skill Name ## Overview (1-2 sentences, core principle) ## When to Use (flowchart IF decision non-obvious, bullets with symptoms) ## Core Pattern (before/after code) ## Quick Reference (table/bullets) ## Common Mistakes (what goes wrong + fixes)
Claude Search Optimization (CSO)
- Use concrete triggers/symptoms in description, not language-specific symptoms
- Include keywords Claude would search: error messages, symptoms, tool names
- Name by what you DO:
notcondition-based-waitingasync-test-helpers - Gerunds work well:
,creating-skillsdebugging-with-logs
Token Efficiency:
- Getting-started workflows: <150 words
- Frequently-loaded skills: <200 words
- Other skills: <500 words
- Move details to
, use cross-references, compress examples--help
Cross-References: Use
**REQUIRED SUB-SKILL:** Use superpowers:skill-name. Never use @ links (force-loads, burns context).
Flowchart Usage
Use ONLY for: non-obvious decisions, process loops, "when to use A vs B". Never for: reference material, code examples, linear instructions.
Conventions and rendering: see
graphviz-conventions.dot and render-graphs.js in this skill directory.
The Iron Law (Same as TDD)
NO SKILL WITHOUT A FAILING TEST FIRST
Applies to new skills AND edits. Write skill before testing? Delete it. Start over. No exceptions.
RED-GREEN-REFACTOR for Skills
RED: Run pressure scenario WITHOUT skill. Document exact failures and rationalizations.
GREEN: Write minimal skill addressing those specific failures. Re-test with skill.
REFACTOR: Agent found new rationalization? Add explicit counter. Re-test until bulletproof.
Testing Skills With Subagents
For detailed testing methodology, see
references/CLAUDE_MD_TESTING.md.
Run scenarios without skill (RED), write skill addressing failures (GREEN), close loopholes (REFACTOR).
RED Phase: Baseline Testing
Run pressure scenario WITHOUT skill. Document exact failures.
Process:
- Create pressure scenarios (3+ combined pressures)
- Run WITHOUT skill
- Document choices and rationalizations verbatim
- Identify patterns
Pressure Types
| Pressure | Example |
|---|---|
| Time | Emergency, deadline, deploy window |
| Sunk cost | Hours of work, "waste" to delete |
| Authority | Senior says skip it |
| Exhaustion | End of day, want to go home |
| Pragmatic | "Being pragmatic vs dogmatic" |
Best tests combine 3+ pressures.
Key Elements
- Concrete A/B/C choices (not open-ended)
- Real constraints (specific times, consequences)
- Real file paths
- Make agent act ("What do you do?" not "What should you do?")
REFACTOR Phase: Close Loopholes
Capture new rationalizations verbatim. For each, add:
- Explicit negation in rules
- Rationalization table entry
- Red flag entry
- Updated description with violation symptoms
Re-test after each refactor. Continue until no new rationalizations.
Meta-Testing
After agent chooses wrong: "You read the skill and chose wrong. How could it be clearer?"
Three responses:
- "Skill WAS clear, I chose to ignore" -> Need stronger foundational principle
- "Skill should have said X" -> Add their suggestion
- "I didn't see section Y" -> Make it more prominent
Persuasion Principles for Skill Design
LLMs respond to the same persuasion principles as humans. Meincke et al. (2025) tested 7 principles with N=28,000 AI conversations. Persuasion techniques doubled compliance rates (33% -> 72%, p < .001).
Effective Principles
- Authority: Imperative language ("YOU MUST", "Never", "Always"), non-negotiable framing. For discipline-enforcing skills.
- Commitment: Require announcements, force explicit choices. For ensuring skills are followed.
- Scarcity: Time-bound requirements ("Before proceeding"), sequential dependencies. For immediate verification.
- Social Proof: Universal patterns ("Every time", "Always"), failure modes. For documenting universal practices.
- Unity: Collaborative language ("our codebase", "we're colleagues"). For collaborative workflows.
- Reciprocity & Liking: Use sparingly or avoid.
| Skill Type | Use | Avoid |
|---|---|---|
| Discipline-enforcing | Authority + Commitment + Social Proof | Liking, Reciprocity |
| Guidance/technique | Moderate Authority + Unity | Heavy authority |
| Collaborative | Unity + Commitment | Authority, Liking |
| Reference | Clarity only | All persuasion |
Anthropic Best Practices
Skill Categories (from Anthropic's Guide)
| Category | Description | Examples |
|---|---|---|
| 1. Document & Asset Creation | Generate files from templates/specs | Commit messages, PR descriptions, config files |
| 2. Workflow Automation | Multi-step processes with tool use | Code review, deployment, refactoring |
| 3. MCP Enhancement | Extend Claude with external tool integrations | API wrappers, database queries, service connectors |
Success Criteria Methodology
- Define what "good output" looks like before writing the skill
- Create 3+ concrete evaluation scenarios with expected outcomes
- Test against scenarios, measure pass rate
- Iterate until pass rate meets threshold (aim for >80%)
Core Principles
- Concise is key: Only add what Claude doesn't already know. Challenge each piece: "Does Claude need this?"
- Degrees of freedom: High (text instructions) for multiple valid approaches; Medium (pseudocode) for preferred patterns; Low (exact scripts) for fragile operations
Anti-Patterns
- Offering too many library options (provide a default with escape hatch)
- Assuming packages installed (list dependencies)
- Deeply nested references (keep one level deep)
- Time-sensitive information (use "old patterns" section)
Evaluation-Driven Development
- Run Claude on tasks without Skill, document failures
- Create 3+ evaluation scenarios
- Establish baseline performance
- Write minimal instructions to pass evaluations
- Iterate: evaluate, compare baseline, refine
Quick Checklist (from Anthropic's Reference A)
- Frontmatter:
,name
(with trigger phrases),descriptionallowed-tools - Description starts with "Use when..." and includes negative triggers
- Body is concise (under 500 words for most skills)
- At least one code example or concrete output
- No redundant information Claude already knows
- Cross-references use relative paths, not
links@
Skill Creation Checklist
RED Phase:
- Create pressure scenarios (3+ combined pressures for discipline skills)
- Run WITHOUT skill - document baseline failures verbatim
- Identify rationalization patterns
GREEN Phase:
- YAML frontmatter: name (letters/numbers/hyphens), description (Use when..., third-person)
- Address specific baseline failures
- Keywords throughout for search
- One excellent code example (not multi-language)
- Run WITH skill - verify compliance
REFACTOR Phase:
- Add counters for new rationalizations
- Build rationalization table and red flags list
- Re-test until bulletproof
Deployment:
- Commit and push