Meta-Skill-Engineering / skill-creator

Clone the full repository:
```bash
git clone https://github.com/merceralex397-collab/Meta-Skill-Engineering
```

Or copy only this skill into your local skills directory:

```bash
T=$(mktemp -d) && git clone --depth=1 https://github.com/merceralex397-collab/Meta-Skill-Engineering "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skill-creator" ~/.claude/skills/merceralex397-collab-meta-skill-engineering-skill-creator-1d07b7 && rm -rf "$T"
```
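To confirm the copy landed (the path comes from the command above):

```bash
# Verify the installed skill is in place
ls ~/.claude/skills/merceralex397-collab-meta-skill-engineering-skill-creator-1d07b7/SKILL.md
```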
skill-creator/SKILL.md

Purpose
Create new agent skills and iteratively improve them through structured draft-test-review-improve cycles.
When to use
- User says "create a skill for X", "write a skill that…", "I need a skill to handle…"
- Repeated task pattern needs capturing as a reusable procedure
- Existing conversation contains a workflow the user wants to turn into a skill
- User has a draft skill and wants to iterate on it with test feedback
- Capability should be packaged for reuse across projects
When NOT to use
- Skill exists and needs improvement without full creation iteration → skill-improver
- Only the description/trigger needs fixing → skill-trigger-optimization
- Skill needs porting to a different environment → skill-adaptation
- User wants to install a packaged skill → skill-installer
- User wants to split one broad skill into several → skill-variant-splitting
- User wants a standalone evaluation without creation → skill-evaluation
- User wants to find external skills before building → community-skill-harvester
Procedure
Phase 1 — Capture intent
Start by understanding what the user wants the skill to do. The current conversation may already contain a workflow the user wants to capture. If so, extract answers from the conversation history first — tools used, step sequence, corrections made, input/output formats observed.
Gather answers to:
- What should this skill enable the agent to do?
- When should this skill trigger? (specific user phrases and contexts)
- What is the expected output format?
- Are there edge cases, dependencies, or environmental requirements?
Ask questions about edge cases, input/output formats, example files, success criteria, and dependencies. Research available documentation and similar skills if useful context exists.
Phase 2 — Write the SKILL.md
Step 1 — Define the skill's job in one sentence
Write: "This skill [verb] when [trigger] and produces [output]."
If you cannot write this sentence, the scope is wrong — narrow until it works.
Step 2 — Choose the name
- Lowercase, hyphens, 2–4 words, under 64 characters
- Describe what it does (verb-noun), not when it's used
- Must match the parent directory name
Step 3 — Write the YAML frontmatter
```yaml
---
name: skill-name
description: >-
  [Action verb] [specific object] when [task conditions]. Use this for
  [2-3 realistic trigger phrases in quotes]. Do not use for [adjacent
  non-matching cases with named alternatives].
---
```
Description rules — this is the highest-leverage field in the skill:
- Start with an action verb (not a noun phrase)
- Include 2–3 realistic trigger phrases users would actually say
- State what the skill produces
- End with "Do not use for…" naming adjacent skills
- Make it slightly assertive about when to trigger — agents tend to under-trigger rather than over-trigger
- Include keywords and contexts that should activate the skill
Flag a description if it: is under 12 words, has no action verb first, has no condition, lacks trigger examples, could apply to multiple skills, or has no negative boundary.
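A filled example for a hypothetical changelog skill (all names here are illustrative, not from this repository):

```yaml
---
name: changelog-writer
description: >-
  Generate a CHANGELOG.md entry from merged commits when the user says
  "write the changelog", "summarize this release", or "draft release notes".
  Produces a Keep a Changelog-style section. Do not use for commit message
  authoring (commit-message-writer) or release tagging (release-tagger).
---
```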
Step 4 — Write the body sections
Every SKILL.md body contains these sections in order:
Purpose (required) — 2–3 sentences. What problem does it solve? What output does it produce?
When to use (required) — 4–6 specific trigger phrases or observable conditions as "Use when:" plus 3–4 confusion cases as "Do NOT use when:" with named alternatives.
Procedure (required) — Numbered steps. Each starts with a verb. Each is completable and verifiable. No meta-commentary, no hedge verbs. Use action verbs: Read, List, Write, Check, Run, Compare.
Output contract (required) — Exact format with template showing section names, field names, or schemas. Include a concrete filled example where possible — agents produce more consistent output when shown examples rather than only format descriptions.
Failure handling (required) — Name the 2–3 most common failure modes with specific recovery actions. Not "if something goes wrong, report the issue" but "if target file does not exist: stop, report missing path, ask user to confirm location."
References (optional) — Real URLs to authoritative documentation.
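Taken together, these sections give a minimal body skeleton (a sketch; ellipses mark placeholder content):

```markdown
## Purpose
...

## When to use
Use when:
- ...
Do NOT use when:
- ... → alternative-skill

## Procedure
1. Read ...
2. Check ...
3. Write ...

## Output contract
...

## Failure handling
- If ...: stop, report ..., ask the user to ...
```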
Step 5 — Calibrate instruction depth
Match instruction specificity to task fragility:
- High freedom (prose): Multiple valid approaches, context-dependent
- Medium freedom (pseudocode): Preferred pattern with acceptable variation
- Low freedom (exact steps/scripts): Fragile operations, consistency critical
Explain the why behind important instructions. Agents follow reasoning-based instructions more reliably than rigid imperatives without context. If you find yourself writing ALWAYS or NEVER in all caps, reframe as reasoning.
Step 6 — Manage skill size
Keep SKILL.md under 500 lines. Skills load at three levels:
- Metadata (name + description) — always in context (~100 words)
- SKILL.md body — loaded when skill triggers (target: under 5k words)
- Bundled resources — loaded on demand by the agent
If approaching the limit, extract reference material into references/ and link clearly from SKILL.md. For large reference files (>300 lines), include a table of contents.
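A quick check against these targets, run from the skill's directory (a minimal sketch):

```bash
# Line and word counts for the SKILL.md body
wc -l SKILL.md   # keep under ~500 lines
wc -w SKILL.md   # keep under ~5,000 words
```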
Organize by variant when supporting multiple domains:
```
skill-name/
├── SKILL.md
└── references/
    ├── variant-a.md
    ├── variant-b.md
    └── variant-c.md
```
Step 7 — Validate against common authoring mistakes
Check the completed skill for:
- Tutorial instead of procedure — "Let me explain…", background sections. A skill is an operating manual, not a textbook. Cut everything the agent doesn't need mid-task.
- Goals instead of steps — "Ensure quality" with no HOW. Every step must be a concrete verb the agent can execute.
- Reference material inline — SKILL.md >200 lines with lookup tables or API schemas. Extract to references/.
- Description written last — Write the description FIRST. It defines the scope everything else must serve.
- Missing negative boundaries — Every skill must say what it's NOT for with named alternatives.
- Circular triggers — "when task involves X" without defining X.
- Implicit capability assumptions — Steps that assume tools the agent may not have. Declare dependencies explicitly.
- No output example — Format described in prose but not exemplified.
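Some of these checks can be approximated mechanically. A heuristic grep pass (a sketch, not a substitute for reading the skill):

```bash
# Heuristic checks for common authoring mistakes
grep -n "Let me explain" SKILL.md && echo "tutorial voice: cut it"
grep -nE "\b(ALWAYS|NEVER)\b" SKILL.md && echo "all-caps imperative: reframe as reasoning"
grep -q "Do not use for" SKILL.md || echo "missing negative boundary in description"
```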
Phase 3 — Create test cases
After writing the skill draft, create 2–5 realistic test prompts — the kind of thing a real user would actually say when they need this skill.
Good test prompts:
- Are concrete and specific, with realistic detail (file paths, names, context)
- Mix lengths and styles (formal, casual, terse, detailed)
- Include edge cases and near-miss scenarios
- Include cases that should NOT trigger the skill
Save test cases to evals/evals.json:

```json
{
  "skill_name": "example-skill",
  "evals": [
    {
      "id": 1,
      "prompt": "User's realistic task prompt",
      "should_trigger": true,
      "expected_output": "Description of expected result"
    }
  ]
}
```
Share the test cases with the user: "Here are test cases I'd like to try. Do these look right, or do you want to add or change any?"
Phase 4 — Test and review
For each test case, execute the skill's procedure against the test prompt and capture the output.
If baseline comparison is possible:
- New skill: Run without the skill as baseline
- Improving existing skill: Use the previous version as baseline
Organize results by iteration:
```
workspace/
└── iteration-1/
    ├── test-1/
    │   ├── with_skill/
    │   └── baseline/
    └── test-2/
        ├── with_skill/
        └── baseline/
```
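A scaffold for that layout (test directory names are placeholders matching the tree above):

```bash
# Create result directories for one iteration
for t in test-1 test-2; do
  mkdir -p "workspace/iteration-1/$t/with_skill" "workspace/iteration-1/$t/baseline"
done
```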
Present results to the user for review. For each test case, show:
- The prompt
- The skill's output
- The baseline output (if available)
- Any quantitative metrics (pass rates, timing)
Ask for specific feedback: "How does this look? What would you change?"
Phase 5 — Improve and iterate
Based on user feedback and test results:
- Generalize from feedback — Avoid overfitting to specific test cases. The skill will be used across many different prompts. Prefer reasoning-based improvements over rigid rules.
- Keep the skill lean — Remove instructions that aren't pulling their weight. Read test transcripts to identify unproductive steps.
- Look for repeated work — If test runs independently produce similar helper scripts or patterns, bundle that as a script in scripts/.
- Apply improvements and rerun all test cases into a new iteration directory.
- Repeat until:
  - The user says they're satisfied
  - Feedback is empty (everything looks good)
  - No meaningful progress is being made
Phase 6 — Finalize the skill folder
```
skill-name/
├── SKILL.md       # Frontmatter + body sections
├── evals/         # Test cases (if created)
├── scripts/       # Optional: deterministic automation
├── references/    # Optional: large docs for progressive disclosure
└── assets/        # Optional: templates, static files used in output
```
Only create subdirectories if they contain actual files.
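One way to enforce this before delivery, assuming a find with -empty/-delete support (GNU and BSD both have it):

```bash
# Delete any optional subdirectory that ended up empty
find skill-name -type d -empty -delete
```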
After finalization, recommend next steps:
- Run skill-trigger-optimization to optimize the description for routing
- Run skill-testing-harness to build a formal eval suite
- Run skill-evaluation to validate routing accuracy and output quality
- Run skill-safety-review if the skill executes code or writes files
Output contract
Deliver a complete skill folder containing:
- SKILL.md with valid YAML frontmatter and all required body sections
- evals/ with test cases if iteration was performed
- references/ if the skill needs progressive disclosure
- scripts/ if the skill includes deterministic automation
- assets/ if the skill provides templates or static files
The SKILL.md must pass all Phase 2 Step 7 validation checks before delivery.
Failure handling
- Scope too broad: Skill handles multiple distinct tasks → split via skill-variant-splitting or narrow the scope
- Cannot write one-sentence definition: Scope is wrong → keep narrowing until the sentence works
- Overlaps existing skill: Check catalog → merge or differentiate explicitly in descriptions
- No clear trigger phrases: Ask user what words they'd use when they need this capability
- Description never fires in practice: Add more trigger phrase variations, make description more assertive about when to activate
- User wants to skip testing: Proceed without iteration but note that untested skills have unknown quality
Skill structure reference
```
skill-name/
├── SKILL.md (required)        # Frontmatter + instructions
│   ├── YAML frontmatter       # name, description (routing logic)
│   └── Markdown body          # procedure, output contract, failure handling
└── Bundled resources (optional)
    ├── scripts/      - Executable code for deterministic/repetitive tasks
    ├── references/   - Docs loaded into context as needed
    └── assets/       - Files used in output (templates, icons, fonts)
```
Next steps
- Build test infrastructure → skill-testing-harness
- Evaluate routing and output quality → skill-evaluation
- Compare variants if multiple drafts → skill-benchmarking
- Optimize trigger description → skill-trigger-optimization
- Review for safety hazards → skill-safety-review
- Record provenance → skill-provenance
- Package for distribution → skill-packaging
References
- Agent Skills specification: https://agentskills.io/specification
- What are skills: https://agentskills.io/what-are-skills