Awesome-omni-skill writing-skills

Skill creation and editing workflow for Claude Code skills with validation and deployment verification. Use when creating new skills, editing existing skills, or verifying skills work before deployment. Do NOT use for CLAUDE.md design, hooks, or general Claude Code workflow (use code-agent-meta-patterns).

install

source · Clone the upstream repo

git clone https://github.com/diegosouzapw/awesome-omni-skill

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/diegosouzapw/awesome-omni-skill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/devops/writing-skills-jlaws" ~/.claude/skills/diegosouzapw-awesome-omni-skill-writing-skills-0bec00 && rm -rf "$T"

manifest: skills/devops/writing-skills-jlaws/SKILL.md

source content

Writing Skills

Writing skills IS Test-Driven Development applied to process documentation.

Personal skills live in agent-specific directories (

~/.claude/skills

for Claude Code,
~/.codex/skills
for Codex)

You write test cases (pressure scenarios with subagents), watch them fail (baseline behavior), write the skill (documentation), watch tests pass (agents comply), and refactor (close loopholes).

Core principle: If you didn't watch an agent fail without the skill, you don't know if the skill teaches the right thing.

When to Create a Skill

Create when: Technique wasn't intuitively obvious, reusable across projects, pattern applies broadly, others would benefit.

Don't create for: One-off solutions, standard well-documented practices, project-specific conventions (use CLAUDE.md), mechanically enforceable constraints (automate instead).

Skill Types

Technique: Concrete method with steps (condition-based-waiting)
Pattern: Way of thinking about problems (flatten-with-flags)
Reference: API docs, syntax guides, tool documentation

Directory Structure

skills/
  skill-name/
    SKILL.md              # Main reference (required)
    supporting-file.*     # Only if needed

Separate files for: heavy reference (100+ lines), reusable tools. Keep everything else inline.

SKILL.md Structure

Frontmatter: Only

name

(letters/numbers/hyphens) and

description

(max 1024 chars, third-person, starts with "Use when...")

CRITICAL: Description = triggering conditions ONLY. Never summarize the skill's workflow in description. Testing showed Claude follows description shortcuts instead of reading skill body.

# BAD: Summarizes workflow
description: Use when executing plans - dispatches subagent per task with code review between tasks

# GOOD: Just triggers
description: Use when executing implementation plans with independent tasks in the current session

Body structure:

# Skill Name
## Overview (1-2 sentences, core principle)
## When to Use (flowchart IF decision non-obvious, bullets with symptoms)
## Core Pattern (before/after code)
## Quick Reference (table/bullets)
## Common Mistakes (what goes wrong + fixes)

Claude Search Optimization (CSO)

Use concrete triggers/symptoms in description, not language-specific symptoms
Include keywords Claude would search: error messages, symptoms, tool names

Name by what you DO:

condition-based-waiting

not

async-test-helpers

Gerunds work well:
```
creating-skills
```
,
```
debugging-with-logs
```

Token Efficiency:

Getting-started workflows: <150 words
Frequently-loaded skills: <200 words
Other skills: <500 words
Move details to
```
--help
```
, use cross-references, compress examples

Cross-References: Use

**REQUIRED SUB-SKILL:** Use superpowers:skill-name

. Never use

links (force-loads, burns context).

Flowchart Usage

Use ONLY for: non-obvious decisions, process loops, "when to use A vs B". Never for: reference material, code examples, linear instructions.

Conventions and rendering: see

graphviz-conventions.dot

and

render-graphs.js

in this skill directory.

The Iron Law (Same as TDD)

NO SKILL WITHOUT A FAILING TEST FIRST

Applies to new skills AND edits. Write skill before testing? Delete it. Start over. No exceptions.

RED-GREEN-REFACTOR for Skills

RED: Run pressure scenario WITHOUT skill. Document exact failures and rationalizations.

GREEN: Write minimal skill addressing those specific failures. Re-test with skill.

REFACTOR: Agent found new rationalization? Add explicit counter. Re-test until bulletproof.

Testing Skills With Subagents

For detailed testing methodology, see

references/CLAUDE_MD_TESTING.md

Run scenarios without skill (RED), write skill addressing failures (GREEN), close loopholes (REFACTOR).

RED Phase: Baseline Testing

Run pressure scenario WITHOUT skill. Document exact failures.

Process:

Create pressure scenarios (3+ combined pressures)
Run WITHOUT skill
Document choices and rationalizations verbatim
Identify patterns

Pressure Types

Pressure	Example
Time	Emergency, deadline, deploy window
Sunk cost	Hours of work, "waste" to delete
Authority	Senior says skip it
Exhaustion	End of day, want to go home
Pragmatic	"Being pragmatic vs dogmatic"

Best tests combine 3+ pressures.

Key Elements

Concrete A/B/C choices (not open-ended)
Real constraints (specific times, consequences)
Real file paths
Make agent act ("What do you do?" not "What should you do?")

REFACTOR Phase: Close Loopholes

Capture new rationalizations verbatim. For each, add:

Explicit negation in rules
Rationalization table entry
Red flag entry
Updated description with violation symptoms

Re-test after each refactor. Continue until no new rationalizations.

Meta-Testing

After agent chooses wrong: "You read the skill and chose wrong. How could it be clearer?"

Three responses:

"Skill WAS clear, I chose to ignore" -> Need stronger foundational principle
"Skill should have said X" -> Add their suggestion
"I didn't see section Y" -> Make it more prominent

Persuasion Principles for Skill Design

LLMs respond to the same persuasion principles as humans. Meincke et al. (2025) tested 7 principles with N=28,000 AI conversations. Persuasion techniques doubled compliance rates (33% -> 72%, p < .001).

Effective Principles

Authority: Imperative language ("YOU MUST", "Never", "Always"), non-negotiable framing. For discipline-enforcing skills.
Commitment: Require announcements, force explicit choices. For ensuring skills are followed.
Scarcity: Time-bound requirements ("Before proceeding"), sequential dependencies. For immediate verification.
Social Proof: Universal patterns ("Every time", "Always"), failure modes. For documenting universal practices.
Unity: Collaborative language ("our codebase", "we're colleagues"). For collaborative workflows.
Reciprocity & Liking: Use sparingly or avoid.

Skill Type	Use	Avoid
Discipline-enforcing	Authority + Commitment + Social Proof	Liking, Reciprocity
Guidance/technique	Moderate Authority + Unity	Heavy authority
Collaborative	Unity + Commitment	Authority, Liking
Reference	Clarity only	All persuasion

Anthropic Best Practices

Skill Categories (from Anthropic's Guide)

Category	Description	Examples
1. Document & Asset Creation	Generate files from templates/specs	Commit messages, PR descriptions, config files
2. Workflow Automation	Multi-step processes with tool use	Code review, deployment, refactoring
3. MCP Enhancement	Extend Claude with external tool integrations	API wrappers, database queries, service connectors

Success Criteria Methodology

Define what "good output" looks like before writing the skill
Create 3+ concrete evaluation scenarios with expected outcomes
Test against scenarios, measure pass rate
Iterate until pass rate meets threshold (aim for >80%)

Core Principles

Concise is key: Only add what Claude doesn't already know. Challenge each piece: "Does Claude need this?"
Degrees of freedom: High (text instructions) for multiple valid approaches; Medium (pseudocode) for preferred patterns; Low (exact scripts) for fragile operations

Anti-Patterns

Offering too many library options (provide a default with escape hatch)
Assuming packages installed (list dependencies)
Deeply nested references (keep one level deep)
Time-sensitive information (use "old patterns" section)

Evaluation-Driven Development

Run Claude on tasks without Skill, document failures
Create 3+ evaluation scenarios
Establish baseline performance
Write minimal instructions to pass evaluations
Iterate: evaluate, compare baseline, refine

Quick Checklist (from Anthropic's Reference A)

Frontmatter:
```
name
```
,
```
description
```
(with trigger phrases),
```
allowed-tools
```
Description starts with "Use when..." and includes negative triggers
Body is concise (under 500 words for most skills)
At least one code example or concrete output
No redundant information Claude already knows
Cross-references use relative paths, not
```
@
```
links

Skill Creation Checklist

RED Phase:

Create pressure scenarios (3+ combined pressures for discipline skills)
Run WITHOUT skill - document baseline failures verbatim
Identify rationalization patterns

GREEN Phase:

YAML frontmatter: name (letters/numbers/hyphens), description (Use when..., third-person)
Address specific baseline failures
Keywords throughout for search
One excellent code example (not multi-language)
Run WITH skill - verify compliance

REFACTOR Phase:

Add counters for new rationalizations
Build rationalization table and red flags list
Re-test until bulletproof

Deployment:

Commit and push