
# Harness Skill Authoring

## Install

Source — clone the upstream repo:

```shell
git clone https://github.com/Intense-Visions/harness-engineering
```

Claude Code — install into `~/.claude/skills/`:

```shell
T=$(mktemp -d) && git clone --depth=1 https://github.com/Intense-Visions/harness-engineering "$T" && mkdir -p ~/.claude/skills && cp -r "$T/agents/skills/claude-code/harness-skill-authoring" ~/.claude/skills/intense-visions-harness-engineering-harness-skill-authoring-8fcbc9 && rm -rf "$T"
```

Manifest: `agents/skills/claude-code/harness-skill-authoring/SKILL.md`

## Source Content

# Harness Skill Authoring

> Create and extend harness skills following the rich skill format. Define purpose, choose type, write `skill.yaml` and `SKILL.md` with all required sections, validate, and test.

## When to Use

- Creating a new skill for a team's recurring workflow
- Extending an existing skill with new phases, gates, or examples
- Converting an informal process ("how we do code reviews") into a formal harness skill
- When a team notices they repeat the same multi-step process and wants to codify it
- NOT when running an existing skill (use the skill directly)
- NOT when listing or discovering skills (use `harness skill list`)
- NOT when the process is a one-off task that will not recur

## Process

### Iron Law

No skill ships without validation passing and test scenarios exercising every discipline section.

A skill that passes happy-path execution but has untested discipline sections (Red Flags, Gates, Rationalizations) is a trap — agents activate it but have no guardrails when they encounter edge cases. Phase 5B is not optional.


### Phase 1: DEFINE — Establish the Skill's Purpose

1. Identify the recurring process. What does the team do repeatedly? Name it. Describe it in one sentence. This becomes the skill's `description` in `skill.yaml` and the blockquote summary in `SKILL.md`.

  2. Define the scope boundary. A good skill does one thing well. If the process has distinct phases that could be done independently, consider whether it should be multiple skills. Signs of a skill that is too broad: more than 6 phases, multiple unrelated triggers, trying to serve two different audiences.

3. Identify the trigger conditions. When should this skill activate?

   - `manual` — Only when explicitly invoked
   - `on_new_feature` — When starting a new feature
   - `on_bug_fix` — When fixing a bug
   - `on_pr_review` — When reviewing a pull request
   - `on_project_init` — When initializing or entering a project
   - Multiple triggers are fine if the skill genuinely applies to all of them
4. Determine required tools. What tools does the skill need? Common sets:

   - Read-only analysis: `Read`, `Glob`, `Grep`
   - Code modification: `Read`, `Write`, `Edit`, `Glob`, `Grep`, `Bash`
   - Full workflow: all of the above plus specialized tools

### Phase 2: CHOOSE TYPE — Rigid or Flexible

1. Choose rigid when:

   - The process has strict ordering that must not be violated
   - Skipping steps causes real damage (data loss, security holes, broken deployments)
   - Compliance or policy requires auditability of each step
   - The process includes mandatory checkpoints where human approval is needed
   - Examples: TDD cycle, deployment pipeline, security audit, database migration

2. Choose flexible when:

   - The process has recommended steps but the order can adapt to context
   - The agent should use judgment about which steps to emphasize
   - Different situations call for different subsets of the process
   - The process is more about guidelines than rigid procedure
   - Examples: code review, onboarding, brainstorming, project initialization
3. Key difference in `SKILL.md`: rigid skills require `## Gates` and `## Escalation` sections. Flexible skills may omit them (though they can include them if useful).

### Phase 2B: CHOOSE TYPE — Knowledge Skills

Knowledge skills encode domain reference material (design patterns, architectural guidance, best practices) as first-class skills. They are not behavioral — they do not define process phases, require tools, or maintain state. Agents receive them as contextual guidance alongside behavioral skills.

Choose `type: knowledge` when:

- The skill encodes "know things" content, not "do things" process
- The content is reference material an agent should consult rather than execute
- The skill maps cleanly to a technology vertical (e.g., React patterns, TypeScript generics, OWASP rules)
- The skill's value is educational depth that benefits from progressive disclosure
- NOT when the skill requires tools, phases, or state — those are rigid or flexible skills
- NOT when the skill is a harness process skill (planning, execution, brainstorming)

Knowledge skill `skill.yaml` template:

```yaml
name: <technology-prefix>-<pattern-name> # e.g., react-hooks-pattern, ts-generics-pattern
version: '1.0.0'
description: <one-line summary>
cognitive_mode: advisory-guide # always advisory-guide for knowledge skills
type: knowledge
tier: 3
triggers:
  - manual
platforms:
  - claude-code
  - gemini-cli
  - cursor
tools: [] # knowledge skills declare no tools
paths:
  - '**/*.tsx' # file-type-specific globs only — see rules below
related_skills:
  - <complementary-knowledge-skill> # other knowledge skills in the same vertical
stack_signals:
  - typescript
  - react
keywords:
  - hooks
  - custom-hooks
metadata:
  author: <author-name>
  upstream: '<source-url>' # provenance tracking
state:
  persistent: false
  files: []
depends_on: []
```

`paths` authoring rules (critical — read carefully):

The `paths` field is a list of minimatch glob patterns matched against the user's recent/changed files. A match boosts the skill's dispatch score by 0.20 (the paths scoring weight). Incorrect globs cause false activation.

Rules:

1. Use file-type-specific globs only. `**/*.tsx`, `**/*.vue`, `*.sql`, `Dockerfile` are correct. `**/*.ts` is incorrect — it activates for every TypeScript project regardless of context.
2. The lesson from Vue skills: `**/*.ts` was initially added to Vue skills and had to be removed. TypeScript files exist in every project; Vue-specific content should only activate on `**/*.vue` files.
3. Config file patterns are safe. `playwright.config.ts`, `prisma/schema.prisma`, `drizzle.config.ts` are sufficiently specific.
4. When in doubt, leave `paths: []`. A skill with `paths: []` scores 0.0 on the paths dimension but can still surface via keyword and stack_signals matching.
5. Test your globs. After adding paths, verify the skill surfaces when the target file is present and does NOT surface when an unrelated file is present (e.g., a Vue skill must not activate when only `.ts` files are present).
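Rule 5 can be sanity-checked offline. A minimal sketch using Python's `fnmatch` as a stand-in for minimatch (close enough for these patterns) — this is illustrative only, not the harness dispatcher:

```python
from fnmatch import fnmatch

def matches_any(path: str, globs: list[str]) -> bool:
    """True if path matches any of the skill's path globs.

    Note: fnmatch's '*' also matches '/', so '**/*.vue' behaves roughly
    like minimatch's globstar here — good enough for a quick check.
    """
    return any(fnmatch(path, g) for g in globs)

vue_paths = ["**/*.vue"]  # correct: file-type-specific
bad_paths = ["**/*.ts"]   # incorrect: fires on every TypeScript project

assert matches_any("src/components/App.vue", vue_paths)
assert not matches_any("src/utils/helpers.ts", vue_paths)   # no false activation
assert matches_any("src/utils/helpers.ts", bad_paths)       # the trap rule 1 warns about
```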

`related_skills` authoring rules:

The `related_skills` field lists skill names that complement the current skill. These are surfaced as secondary knowledge recommendations when the current skill is dispatched — this works for both knowledge skills (when auto-injected) and behavioral skills (when recommended).

Rules:

1. Only reference skills that genuinely complement each other. A React hooks skill relates to React compound pattern, not to OWASP security rules.
2. Reference by exact skill name (the directory name, e.g., `react-compound-pattern`, not `React Compound Pattern`).
3. Keep lists short (2–5 entries). More than 5 related skills dilutes the signal.
4. Bidirectionality is not required but is good practice — if skill A lists skill B, skill B should list skill A.
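The bidirectionality practice in rule 4 is easy to audit across a skill set. A sketch over a name-to-`related_skills` mapping (the skill names here are illustrative):

```python
def missing_backrefs(skills: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Return (a, b) pairs where a lists b in related_skills but b does not list a."""
    missing = []
    for name, related in skills.items():
        for other in related:
            if name not in skills.get(other, []):
                missing.append((name, other))
    return missing

skills = {
    "react-hooks-pattern": ["react-compound-pattern"],
    "react-compound-pattern": [],  # forgot the back-reference
}
assert missing_backrefs(skills) == [("react-hooks-pattern", "react-compound-pattern")]
```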

Knowledge skill `SKILL.md` structure:

Knowledge skills use a two-section disclosure model. The `## Instructions` section (~5K tokens max) is auto-injected into agent context on high-confidence matches. The `## Details` section is loaded on-demand for educational depth.

```markdown
# <Pattern Name>

> <One-sentence description>

## When to Use

- [Specific activation conditions]
- NOT when [boundary conditions]

## Instructions

[Agent-facing directives — concise, actionable, <5K tokens]
[This section is auto-injected on high-confidence dispatch]

## Details

[Educational depth — patterns, examples, trade-offs]
[This section is loaded on-demand when the agent explicitly requests it]

## Source

[Link to original source / upstream]
```

The `## Details` heading is the boundary marker. Everything before it is the Instructions section. The progressive disclosure split happens at `\n## Details`.
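The split rule above fits in a few lines. A minimal sketch of the boundary split (not the harness loader's actual code):

```python
def split_disclosure(skill_md: str) -> tuple[str, str]:
    """Split SKILL.md at the first '\n## Details' boundary.

    Everything before the marker is the auto-injected Instructions portion;
    the marker and everything after it form the on-demand Details portion.
    """
    marker = "\n## Details"
    idx = skill_md.find(marker)
    if idx == -1:
        return skill_md, ""  # no Details section: the whole file is Instructions
    return skill_md[:idx], skill_md[idx + 1:]

doc = "# Pattern\n\n## Instructions\nDo X.\n## Details\nDeep dive."
instructions, details = split_disclosure(doc)
assert details.startswith("## Details")
assert "Deep dive" not in instructions
```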

Schema constraints enforced automatically for `type: knowledge`:

- `phases` must be empty or omitted
- `tools` must be empty
- `state.persistent` must be false
- `cognitive_mode` defaults to `advisory-guide`
- `tier` defaults to 3
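These constraints can be sketched as a small check over a parsed manifest. Illustrative only — the real schema enforcement lives in harness, and the field names follow the template above:

```python
def knowledge_violations(manifest: dict) -> list[str]:
    """Report violations of the knowledge-skill constraints listed above."""
    errors = []
    if manifest.get("phases"):
        errors.append("phases must be empty or omitted")
    if manifest.get("tools"):
        errors.append("tools must be empty")
    if manifest.get("state", {}).get("persistent", False):
        errors.append("state.persistent must be false")
    return errors

def apply_knowledge_defaults(manifest: dict) -> dict:
    """Fill the documented defaults for absent fields."""
    manifest.setdefault("cognitive_mode", "advisory-guide")
    manifest.setdefault("tier", 3)
    return manifest

m = apply_knowledge_defaults({"type": "knowledge", "tools": ["Read"],
                              "state": {"persistent": False}})
assert knowledge_violations(m) == ["tools must be empty"]
assert m["cognitive_mode"] == "advisory-guide" and m["tier"] == 3
```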

### Phase 3: WRITE SKILL.YAML — Define Metadata

1. Create the skill directory under `agents/skills/<platform>/<skill-name>/`.

2. Write `skill.yaml` with all required fields:

```yaml
name: <skill-name> # Kebab-case, matches directory name
version: '1.0.0' # Semver
description: <one-line summary> # What this skill does
triggers:
  - <trigger-1>
  - <trigger-2>
platforms:
  - claude-code # Which agent platforms support this skill
tools:
  - <tool-1> # Tools the skill requires
  - <tool-2>
cli:
  command: harness skill run <skill-name>
  args:
    - name: <arg-name>
      description: <arg-description>
      required: <true|false>
mcp:
  tool: run_skill
  input:
    skill: <skill-name>
type: <rigid|flexible>
state:
  persistent: <true|false> # Does this skill maintain state across sessions?
  files:
    - <state-file-path> # List state files if persistent
depends_on:
  - <prerequisite-skill> # Skills that must be available (not necessarily run first)
# Optional fields (required for knowledge skills, optional for behavioral skills)
paths: [] # File-type-specific glob patterns for dispatch boost
related_skills: [] # Complementary skill names (surfaces related knowledge when dispatched)
metadata: # Provenance and authorship metadata
  author: <author>
  upstream: '<source-url>'
```
3. Validate the YAML. Ensure proper indentation, correct field names, and valid values. The `name` field must match the directory name exactly.
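The name-matching rule is mechanical. A sketch, assuming the manifest has already been parsed into a dict (this is not the actual `harness skill validate` implementation):

```python
from pathlib import Path

def check_name_matches_dir(skill_dir: str, manifest: dict) -> list[str]:
    """Validate that skill.yaml's name field matches the skill directory name."""
    errors = []
    dir_name = Path(skill_dir).name
    name = manifest.get("name", "")
    if not name:
        errors.append("name is required")
    elif name != dir_name:
        errors.append(f"name '{name}' does not match directory '{dir_name}'")
    return errors

ok = check_name_matches_dir("agents/skills/claude-code/review-db-migration",
                            {"name": "review-db-migration"})
assert ok == []
bad = check_name_matches_dir("agents/skills/claude-code/review-db-migration",
                             {"name": "Review DB Migration"})
assert bad and "does not match" in bad[0]
```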

### Phase 4: WRITE SKILL.MD — Author the Skill Content

1. Start with the heading and summary:

   ```markdown
   # <Skill Name>

   > <One-sentence description of what the skill does and why.>
   ```

2. Write `## When to Use`. Include both positive (when TO use) and negative (when NOT to use) conditions. Be specific. Negative conditions prevent misapplication and point to the correct alternative skill.

3. Write `## Process`. This is the core of the skill. Guidelines for writing good process sections:

   - Use phases to organize. Group related steps into named phases (e.g., ASSESS, IMPLEMENT, VERIFY). Each phase should have a clear purpose and completion criteria.
   - Number every step. Steps within a phase are numbered. This makes them referenceable ("go back to Phase 2, step 3").
   - Be prescriptive about actions. Say "Run `harness validate`" not "consider validating." Say "Read the file" not "you might want to read the file."
   - Include decision points. When the process branches, state the conditions clearly: "If X, do A. If Y, do B."
   - State what NOT to do. Prohibitions prevent common mistakes: "Do not proceed to Phase 3 if validation fails."
   - For rigid skills: add an Iron Law at the top — the one inviolable principle. Then define phases with mandatory ordering and explicit gates between them.
   - For flexible skills: describe the recommended flow but acknowledge that adaptation is expected. Focus on outcomes rather than exact commands.
4. Write `## Harness Integration`. List every harness CLI command the skill uses, with a brief description of when to use it. This section connects the skill to the harness toolchain.

5. Write `## Success Criteria`. Define how to know the skill was executed well. Each criterion should be observable and verifiable — not subjective.

6. Write `## Examples`. At least one concrete example showing the full process from start to finish. Use realistic project names, file paths, and commands. Show both the commands and their expected outputs.

7. For rigid skills, write `## Gates`. Gates are hard stops — conditions that must be true to proceed. Each gate should state what happens if violated. Format: "<condition> = <consequence>."

8. For rigid skills, write `## Escalation`. Define when to stop and ask for help. Each escalation condition should describe the symptom, the likely cause, and what to report.

9. Write `## Rationalizations to Reject`. Every user-facing skill must include this section. It contains domain-specific rationalizations that prevent agents from skipping steps with plausible-sounding excuses. Format requirements:

   - Table format: `| Rationalization | Reality |` with a header separator row
   - 3-8 entries per skill, each specific to the skill's domain
   - No generic filler. Every entry must address a rationalization that is plausible in the context of this specific skill
   - Do not repeat universal rationalizations. The following three are always in effect for all skills and must NOT appear in individual skill tables:

   | Rationalization | Reality |
   | --- | --- |
   | "It's probably fine" | "Probably" is not evidence. Verify before asserting. |
   | "This is best practice" | Best practice in what context? Cite the source and confirm it applies here. |
   | "We can fix it later" | If worth flagging, document now with a concrete follow-up plan. |

   Example of a good domain-specific entry (for a code review skill):

   | Rationalization | Reality |
   | --- | --- |
   | "The tests pass so the logic must be correct" | Passing tests prove the tested paths work. They say nothing about untested paths, edge cases, or whether the tests themselves are correct. |
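The format requirements above are lintable. An illustrative sketch that checks the table header, the 3-8 entry budget, and the ban on universal entries (not harness's actual validator):

```python
import re

UNIVERSAL = {"It's probably fine", "This is best practice", "We can fix it later"}

def lint_rationalizations(section: str) -> list[str]:
    """Lint a '## Rationalizations to Reject' table per the rules above."""
    errors = []
    # Keep table rows, dropping the '|---|---|' separator row.
    rows = [l for l in section.splitlines()
            if l.startswith("|") and not re.match(r"^\|[\s\-|]+\|$", l)]
    if not rows or rows[0].replace(" ", "") != "|Rationalization|Reality|":
        errors.append("missing '| Rationalization | Reality |' header")
    entries = rows[1:]
    if not 3 <= len(entries) <= 8:
        errors.append(f"expected 3-8 entries, found {len(entries)}")
    for row in entries:
        quote = row.split("|")[1].strip().strip('"')
        if quote in UNIVERSAL:
            errors.append(f"universal rationalization duplicated: {quote!r}")
    return errors
```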

### Phase 5: VALIDATE — Verify the Skill

1. Run `harness skill validate` to check:

   - `skill.yaml` has all required fields and valid values
   - `SKILL.md` has all required sections (`## When to Use`, `## Process`, `## Harness Integration`, `## Success Criteria`, `## Examples`, `## Rationalizations to Reject`)
   - Rigid skills have `## Gates` and `## Escalation` sections
   - The `name` in `skill.yaml` matches the directory name
   - Referenced tools exist
   - Referenced dependencies exist
2. Fix any validation errors. Common issues:

   - Missing required section in `SKILL.md`
   - `name` field does not match directory name
   - Invalid trigger name
   - Missing `type` field in `skill.yaml`

3. Test by running the skill: `harness skill run <name>`. Verify it loads correctly and the process instructions make sense in context.
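The required-section check can be sketched as a heading scan. Illustrative only — the real validator may be stricter about heading levels and ordering:

```python
REQUIRED = ["## When to Use", "## Process", "## Harness Integration",
            "## Success Criteria", "## Examples", "## Rationalizations to Reject"]
RIGID_ONLY = ["## Gates", "## Escalation"]

def missing_sections(skill_md: str, skill_type: str) -> list[str]:
    """Report required SKILL.md headings that are absent (rigid adds Gates/Escalation)."""
    headings = {line.strip() for line in skill_md.splitlines()
                if line.startswith("## ")}
    wanted = REQUIRED + (RIGID_ONLY if skill_type == "rigid" else [])
    return [h for h in wanted if h not in headings]
```

For a flexible skill containing all six required headings, `missing_sections` returns an empty list; marking the same file `rigid` reports `## Gates` and `## Escalation` as missing.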

### Phase 5B: TDD FOR SKILLS — Test Before Signing Off

Apply test-driven thinking to skill authoring. A skill is not complete until it has been tested against scenarios that exercise its discipline sections.

1. Write test scenarios. Before declaring the skill complete, define 2-3 concrete scenarios that should trigger the skill's discipline mechanisms:

   - One scenario that should trigger a Red Flag (if the skill has Red Flags)
   - One scenario that should trigger a Rationalization rejection
   - One scenario that should trigger a Gate (for rigid skills)

   ```markdown
   ## Skill Test Scenarios

   ### Scenario 1: Red Flag — [quoted phrase from Red Flags section]

   Input: [describe the situation]
   Expected: Agent stops, cites the Red Flag, and takes corrective action

   ### Scenario 2: Rationalization — [quoted phrase from Rationalizations section]

   Input: [describe the tempting shortcut]
   Expected: Agent rejects the rationalization and follows the prescribed process

   ### Scenario 3: Gate — [gate condition]

   Input: [describe a state that violates the gate]
   Expected: Agent halts and does not proceed past the gate
   ```
  2. Mentally execute each scenario. Walk through the skill's process with the test input. Does the skill's prose clearly direct the agent to the correct behavior? If not, the skill needs revision — not the test.

  3. Check for gaps. If you cannot construct a scenario that triggers a discipline section, the section may be too abstract. Revise it to include a concrete quoted phrase or condition.

4. Document test scenarios in a comment block at the end of SKILL.md or in a companion `tests.md` file. These serve as regression tests for future skill edits.

## Skill Quality Checklist

Evaluate every skill along two dimensions:

| | Clear activation | Ambiguous activation | Missing activation |
| --- | --- | --- | --- |
| Specific implementation | Good skill | Wasted — good instructions nobody finds | Broken |
| Vague implementation | Trap — agents activate but flounder | Bad skill | Empty shell |
| Missing implementation | Stub | Stub | Does not exist |

- Good skill = clear activation + specific implementation. The agent knows when to use it and exactly what to do.
- Clear activation + vague implementation = trap. The skill fires correctly but the agent has no concrete instructions, leading to inconsistent results.
- Ambiguous activation + specific implementation = wasted. Great instructions that never get used because the agent does not know when to activate the skill.

Knowledge skills follow the same quality matrix. "Clear activation" for a knowledge skill means `paths` globs and `stack_signals` are specific enough to surface the skill without false positives. "Specific implementation" means the `## Instructions` section is concise and actionable.

Use this checklist as a final quality gate before declaring a skill complete.

## Harness Integration

- `harness skill validate` — Validate a skill's `skill.yaml` and `SKILL.md` against the schema and structure requirements.
- `harness skill run <name>` — Execute a skill to test it in context.
- `harness skill list` — List all available skills; useful for checking that a new skill appears after creation.
- `harness add skill <name> --type <type>` — Scaffold a new skill directory with template files (alternative to manual creation).

## Success Criteria

- `skill.yaml` exists with all required fields and passes schema validation
- `SKILL.md` exists with all required sections filled with substantive content (not placeholders)
- The skill name in `skill.yaml` matches the directory name
- `harness skill validate` passes with zero errors
- The process section has clear, numbered, actionable steps organized into phases
- When to Use includes both positive and negative conditions
- At least one concrete example demonstrates the full process
- Rigid skills include Gates and Escalation sections with specific conditions and consequences
- The skill can be loaded and run with `harness skill run <name>`

## Red Flags

| Flag | Corrective Action |
| --- | --- |
| "This skill is simple enough to ship without test scenarios" | STOP. Phase 5B exists because untested discipline sections are decoration. If you cannot construct a scenario that triggers each discipline section, the section is too abstract — revise it. |
| "I'll add the Rationalizations/Gates/Escalation section after the skill is working" | STOP. Discipline sections are required sections. A skill that "works" without guardrails is a trap — agents activate but flounder at edge cases. Write the real content now. |
| "The skill works, I tested it by running it once" | STOP. A single happy-path run does not test discipline sections. Write scenarios that trigger Red Flags, Gates, and Rationalizations. |
| `// placeholder for future phases` or `// TODO: add gate conditions` in SKILL.md | STOP. Placeholder sections are stubs. A skill with stub discipline sections passes validation but fails in practice. Write the real content or remove the section entirely. |

Review-never-fixes: When reviewing or extending an existing skill, identify issues but do not fix them inline. Document findings, propose fixes, and let the skill author decide. Reviewing and editing are separate roles.

## Rationalizations to Reject

| Rationalization | Reality |
| --- | --- |
| "This skill is too simple to need all required sections" | Every section exists for a reason. A short section is fine; a missing section means the skill was not fully thought through. |
| "The process section covers it — no need for explicit success criteria" | Process describes what to do. Success criteria describe how to know it worked. They serve different purposes. |
| "Rationalizations to Reject is meta — this skill does not need it" | This section is required for all user-facing skills, including this one. No exceptions. |
| "I will add examples later once the skill is proven" | Examples are a required section. A skill without examples forces the agent to guess at correct behavior. Write at least one example now. |
| "The When to Use section is obvious from the name" | Negative conditions (when NOT to use) prevent misapplication. The skill name conveys nothing about boundary conditions. |
| "The skill works — I tested it by running it once" | A single happy-path run does not test discipline sections. Write scenarios that trigger Red Flags, Gates, and Rationalizations. A skill that passes happy path but fails discipline scenarios is a trap. |
| "This is an internal skill, so discipline sections are unnecessary" | Agents rationalize skipping steps in internal skills too. Every user-facing skill requires the full discipline stack — no exemptions for internal or simple skills. |
| "I will copy the Rationalizations from a similar skill and adapt them" | Domain-specific means domain-specific. Copied rationalizations address the source skill's shortcuts, not this skill's. Write fresh entries based on what an agent executing THIS skill would try to skip. |

## Examples

### Example: Creating a Flexible Skill for Database Migration Review

DEFINE:

- Process: The team reviews database migrations before applying them.
- Scope: Review only — not creating or applying migrations.
- Triggers: manual (invoked when a migration PR is opened).
- Tools: Read, Glob, Grep, Bash.

CHOOSE TYPE: Flexible — the review steps can vary based on migration complexity. Some migrations need data impact analysis, others do not.

WRITE skill.yaml:

```yaml
name: review-db-migration
version: '1.0.0'
description: Review database migration files for safety and correctness
triggers:
  - manual
platforms:
  - claude-code
tools:
  - Read
  - Glob
  - Grep
  - Bash
cli:
  command: harness skill run review-db-migration
  args:
    - name: migration-file
      description: Path to the migration file to review
      required: true
mcp:
  tool: run_skill
  input:
    skill: review-db-migration
type: flexible
state:
  persistent: false
  files: []
depends_on: []
```

WRITE SKILL.md:

```markdown
# Review Database Migration

> Review database migration files for safety, correctness, and
> reversibility before they are applied to any environment.

## When to Use

- When a new migration file has been created and needs review
- When a migration PR is opened
- NOT when writing migrations (write first, then review)
- NOT when applying migrations to environments (that is a deployment concern)

## Process

### Phase 1: ANALYZE — Understand the Migration

1. Read the migration file completely...
   [... full process content ...]

## Harness Integration

- `harness validate` — Verify project health after migration review
  [... etc ...]
```

VALIDATE:

```shell
harness skill validate review-db-migration  # Pass
harness skill run review-db-migration       # Loads correctly
```

### Example: Creating a Rigid Skill for Release Deployment

DEFINE:

- Process: Deploy a release to production. Strict ordering — cannot skip steps.
- Triggers: manual.
- Tools: Bash, Read, Glob.

CHOOSE TYPE: Rigid — skipping the smoke test or rollback verification step could cause production outages. Mandatory checkpoints for human approval before each environment promotion.

WRITE SKILL.md (key rigid sections):

```markdown
## Gates

- **Tests must pass before build.** If the test suite fails, do not
  proceed to build. Fix the tests first.
- **Staging must be verified before production.** If staging smoke tests
  fail, do not promote to production. Roll back staging and investigate.
- **Human approval required at each promotion.** Use [checkpoint:human-verify]
  before promoting from staging to production. No auto-promotion.

## Escalation

- **When staging smoke tests fail on a test that passed locally:**
  Report: "Smoke test [name] fails in staging but passes locally.
  Likely cause: environment-specific configuration or data difference.
  Need to investigate before proceeding."
- **When rollback verification fails:** This is critical. Report immediately:
  "Rollback to version [X] failed. Current state: [description].
  Manual intervention required."
```

## Gates

- No shipping without validation. `harness skill validate` must pass with zero errors before declaring a skill complete. Validation failures are not warnings — they are hard stops.
- No shipping without test scenarios. Phase 5B is mandatory. A skill without test scenarios that exercise its discipline sections (Red Flags, Gates, Rationalizations) is not complete, even if validation passes.
- No placeholder sections. Every required section must contain substantive content. A `## Rationalizations to Reject` section with generic entries or a `## Gates` section with no conditions is a stub, not a section.
- No skipping negative conditions. The `## When to Use` section must include both positive (when TO use) and negative (when NOT to use) conditions. Missing negatives cause misapplication.

## Escalation

- When you cannot identify domain-specific rationalizations: The skill's process section is too vague. If you cannot imagine an agent trying to shortcut the process, the process lacks enough prescriptive steps. Revise the process first, then rationalizations will become obvious.
- When validation fails on a section you believe is present: The section heading may not match the expected format exactly. Check for typos, extra whitespace, or wrong heading level.
- When the skill spans more than 6 phases: The skill is doing too much. Propose splitting into two skills with clear handoff points. A skill that tries to cover too many concerns becomes vague in each one.
- When you cannot write a test scenario that triggers a Gate: The Gate condition is too abstract. Revise it to include specific, observable conditions that an agent would encounter.