Meta-Skill-Engineering / skill-creator

Clone the full repository:
```bash
git clone https://github.com/merceralex397-collab/Meta-Skill-Engineering
```

Or copy only this skill into your local skills directory:

```bash
T=$(mktemp -d) && git clone --depth=1 https://github.com/merceralex397-collab/Meta-Skill-Engineering "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skill-creator" ~/.claude/skills/merceralex397-collab-meta-skill-engineering-skill-creator-1d07b7 && rm -rf "$T"
```
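To confirm the copy landed (the path comes from the command above):

```bash
# Verify the installed skill is in place
ls ~/.claude/skills/merceralex397-collab-meta-skill-engineering-skill-creator-1d07b7/SKILL.md
```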
skill-creator/SKILL.md

Purpose
Create new agent skills and iteratively improve them through structured draft-test-review-improve cycles.
When to use
- User says "create a skill for X", "write a skill that…", "I need a skill to handle…"
- Repeated task pattern needs capturing as a reusable procedure
- Existing conversation contains a workflow the user wants to turn into a skill
- User has a draft skill and wants to iterate on it with test feedback
- Capability should be packaged for reuse across projects
When NOT to use
- Skill exists and needs improvement without full creation iteration → skill-improver
- Only the description/trigger needs fixing → skill-trigger-optimization
- Skill needs porting to a different environment → skill-adaptation
- User wants to install a packaged skill → skill-installer
- User wants to split one broad skill into several → skill-variant-splitting
- User wants a standalone evaluation without creation → skill-evaluation
- User wants to find external skills before building → community-skill-harvester
Procedure
Phase 1 — Capture intent
Start by understanding what the user wants the skill to do. The current conversation may already contain a workflow the user wants to capture. If so, extract answers from the conversation history first — tools used, step sequence, corrections made, input/output formats observed.
Gather answers to:
- What should this skill enable the agent to do?
- When should this skill trigger? (specific user phrases and contexts)
- What is the expected output format?
- Are there edge cases, dependencies, or environmental requirements?
Ask questions about edge cases, input/output formats, example files, success criteria, and dependencies. Research available documentation and similar skills if useful context exists.
Phase 2 — Write the SKILL.md
Step 1 — Define the skill's job in one sentence
Write: "This skill [verb] when [trigger] and produces [output]."
If you cannot write this sentence, the scope is wrong — narrow until it works.
Step 2 — Choose the name
- Lowercase, hyphens, 2–4 words, under 64 characters
- Describe what it does (verb-noun), not when it's used
- Must match the parent directory name
Step 3 — Write the YAML frontmatter
```yaml
---
name: skill-name
description: >-
  [Action verb] [specific object] when [task conditions]. Use this for
  [2-3 realistic trigger phrases in quotes]. Do not use for [adjacent
  non-matching cases with named alternatives].
---
```
Description rules — this is the highest-leverage field in the skill:
- Start with an action verb (not a noun phrase)
- Include 2–3 realistic trigger phrases users would actually say
- State what the skill produces
- End with "Do not use for…" naming adjacent skills
- Make it slightly assertive about when to trigger — agents tend to under-trigger rather than over-trigger
- Include keywords and contexts that should activate the skill
Flag a description if it: is under 12 words, has no action verb first, has no condition, lacks trigger examples, could apply to multiple skills, or has no negative boundary.
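A filled example for a hypothetical changelog skill (all names here are illustrative, not from this repository):

```yaml
---
name: changelog-writer
description: >-
  Generate a CHANGELOG.md entry from merged commits when the user says
  "write the changelog", "summarize this release", or "draft release notes".
  Produces a Keep a Changelog-style section. Do not use for commit message
  authoring (commit-message-writer) or release tagging (release-tagger).
---
```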
Step 4 — Write the body sections
Every SKILL.md body contains these sections in order:
Purpose (required) — 2–3 sentences. What problem does it solve? What output does it produce?
When to use (required) — 4–6 specific trigger phrases or observable conditions as "Use when:" plus 3–4 confusion cases as "Do NOT use when:" with named alternatives.
Procedure (required) — Numbered steps. Each starts with a verb. Each is completable and verifiable. No meta-commentary, no hedge verbs. Use action verbs: Read, List, Write, Check, Run, Compare.
Output contract (required) — Exact format with template showing section names, field names, or schemas. Include a concrete filled example where possible — agents produce more consistent output when shown examples rather than only format descriptions.
Failure handling (required) — Name the 2–3 most common failure modes with specific recovery actions. Not "if something goes wrong, report the issue" but "if target file does not exist: stop, report missing path, ask user to confirm location."
References (optional) — Real URLs to authoritative documentation.
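Taken together, these sections give a minimal body skeleton (a sketch; ellipses mark placeholder content):

```markdown
## Purpose
...

## When to use
Use when:
- ...
Do NOT use when:
- ... → alternative-skill

## Procedure
1. Read ...
2. Check ...
3. Write ...

## Output contract
...

## Failure handling
- If ...: stop, report ..., ask the user to ...
```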
Step 5 — Calibrate instruction depth
Match instruction specificity to task fragility:
- High freedom (prose): Multiple valid approaches, context-dependent
- Medium freedom (pseudocode): Preferred pattern with acceptable variation
- Low freedom (exact steps/scripts): Fragile operations, consistency critical
Explain the why behind important instructions. Agents follow reasoning-based instructions more reliably than rigid imperatives without context. If you find yourself writing ALWAYS or NEVER in all caps, reframe as reasoning.
Step 6 — Manage skill size
Keep SKILL.md under 500 lines. Skills load at three levels:
- Metadata (name + description) — always in context (~100 words)
- SKILL.md body — loaded when skill triggers (target: under 5k words)
- Bundled resources — loaded on demand by the agent
If approaching the limit, extract reference material into references/ and link clearly from SKILL.md. For large reference files (>300 lines), include a table of contents.
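A quick check against these targets, run from the skill's directory (a minimal sketch):

```bash
# Line and word counts for the SKILL.md body
wc -l SKILL.md   # keep under ~500 lines
wc -w SKILL.md   # keep under ~5,000 words
```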
Organize by variant when supporting multiple domains:
```
skill-name/
├── SKILL.md
└── references/
    ├── variant-a.md
    ├── variant-b.md
    └── variant-c.md
```
Step 7 — Validate against common authoring mistakes
Check the completed skill for:
- Tutorial instead of procedure — "Let me explain…", background sections. A skill is an operating manual, not a textbook. Cut everything the agent doesn't need mid-task.
- Goals instead of steps — "Ensure quality" with no HOW. Every step must be a concrete verb the agent can execute.
- Reference material inline — SKILL.md >200 lines with lookup tables or API schemas. Extract to references/.
- Description written last — Write the description FIRST. It defines the scope everything else must serve.
- Missing negative boundaries — Every skill must say what it's NOT for with named alternatives.
- Circular triggers — "when task involves X" without defining X.
- Implicit capability assumptions — Steps that assume tools the agent may not have. Declare dependencies explicitly.
- No output example — Format described in prose but not exemplified.
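Some of these checks can be approximated mechanically. A heuristic grep pass (a sketch, not a substitute for reading the skill):

```bash
# Heuristic checks for common authoring mistakes
grep -n "Let me explain" SKILL.md && echo "tutorial voice: cut it"
grep -nE "\b(ALWAYS|NEVER)\b" SKILL.md && echo "all-caps imperative: reframe as reasoning"
grep -q "Do not use for" SKILL.md || echo "missing negative boundary in description"
```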
Phase 3 — Create test cases
After writing the skill draft, create 2–5 realistic test prompts — the kind of thing a real user would actually say when they need this skill.
Good test prompts:
- Are concrete and specific, with realistic detail (file paths, names, context)
- Mix lengths and styles (formal, casual, terse, detailed)
- Include edge cases and near-miss scenarios
- Include cases that should NOT trigger the skill
Save test cases to evals/evals.json:

```json
{
  "skill_name": "example-skill",
  "evals": [
    {
      "id": 1,
      "prompt": "User's realistic task prompt",
      "should_trigger": true,
      "expected_output": "Description of expected result"
    }
  ]
}
```
Share the test cases with the user: "Here are test cases I'd like to try. Do these look right, or do you want to add or change any?"
Phase 4 — Test and review
For each test case, execute the skill's procedure against the test prompt and capture the output.
If baseline comparison is possible:
- New skill: Run without the skill as baseline
- Improving existing skill: Use the previous version as baseline
Organize results by iteration:
```
workspace/
└── iteration-1/
    ├── test-1/
    │   ├── with_skill/
    │   └── baseline/
    └── test-2/
        ├── with_skill/
        └── baseline/
```
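A scaffold for that layout (test directory names are placeholders matching the tree above):

```bash
# Create result directories for one iteration
for t in test-1 test-2; do
  mkdir -p "workspace/iteration-1/$t/with_skill" "workspace/iteration-1/$t/baseline"
done
```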
Present results to the user for review. For each test case, show:
- The prompt
- The skill's output
- The baseline output (if available)
- Any quantitative metrics (pass rates, timing)
Ask for specific feedback: "How does this look? What would you change?"
Phase 5 — Improve and iterate
Based on user feedback and test results:
- Generalize from feedback — Avoid overfitting to specific test cases. The skill will be used across many different prompts. Prefer reasoning-based improvements over rigid rules.
- Keep the skill lean — Remove instructions that aren't pulling their weight. Read test transcripts to identify unproductive steps.
- Look for repeated work — If test runs independently produce similar helper scripts or patterns, bundle that as a script in scripts/.
- Apply improvements and rerun all test cases into a new iteration directory.
- Repeat until:
  - The user says they're satisfied
  - Feedback is empty (everything looks good)
  - No meaningful progress is being made
Phase 6 — Finalize the skill folder
```
skill-name/
├── SKILL.md       # Frontmatter + body sections
├── evals/         # Test cases (if created)
├── scripts/       # Optional: deterministic automation
├── references/    # Optional: large docs for progressive disclosure
└── assets/        # Optional: templates, static files used in output
```
Only create subdirectories if they contain actual files.
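One way to enforce this before delivery, assuming a find with -empty/-delete support (GNU and BSD both have it):

```bash
# Delete any optional subdirectory that ended up empty
find skill-name -type d -empty -delete
```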
After finalization, recommend next steps:
- Run skill-trigger-optimization to optimize the description for routing
- Run skill-testing-harness to build a formal eval suite
- Run skill-evaluation to validate routing accuracy and output quality
- Run skill-safety-review if the skill executes code or writes files
Output contract
Deliver a complete skill folder containing:
- SKILL.md with valid YAML frontmatter and all required body sections
- evals/ with test cases if iteration was performed
- references/ if the skill needs progressive disclosure
- scripts/ if the skill includes deterministic automation
- assets/ if the skill provides templates or static files
The SKILL.md must pass all Phase 2 Step 7 validation checks before delivery.
Failure handling
- Scope too broad: Skill handles multiple distinct tasks → split via skill-variant-splitting or narrow the scope
- Cannot write one-sentence definition: Scope is wrong → keep narrowing until the sentence works
- Overlaps existing skill: Check catalog → merge or differentiate explicitly in descriptions
- No clear trigger phrases: Ask user what words they'd use when they need this capability
- Description never fires in practice: Add more trigger phrase variations, make description more assertive about when to activate
- User wants to skip testing: Proceed without iteration but note that untested skills have unknown quality
Skill structure reference
```
skill-name/
├── SKILL.md (required)        # Frontmatter + instructions
│   ├── YAML frontmatter       # name, description (routing logic)
│   └── Markdown body          # procedure, output contract, failure handling
└── Bundled resources (optional)
    ├── scripts/      - Executable code for deterministic/repetitive tasks
    ├── references/   - Docs loaded into context as needed
    └── assets/       - Files used in output (templates, icons, fonts)
```
Next steps
- Build test infrastructure → skill-testing-harness
- Evaluate routing and output quality → skill-evaluation
- Compare variants if multiple drafts → skill-benchmarking
- Optimize trigger description → skill-trigger-optimization
- Review for safety hazards → skill-safety-review
- Record provenance → skill-provenance
- Package for distribution → skill-packaging
References
- Agent Skills specification: https://agentskills.io/specification
- What are skills: https://agentskills.io/what-are-skills