
Harness TDD

Install

Source · Clone the upstream repo:

git clone https://github.com/Intense-Visions/harness-engineering

Claude Code · Install into ~/.claude/skills/:

T=$(mktemp -d) && git clone --depth=1 https://github.com/Intense-Visions/harness-engineering "$T" && mkdir -p ~/.claude/skills && cp -r "$T/agents/skills/claude-code/harness-tdd" ~/.claude/skills/intense-visions-harness-engineering-harness-tdd-5edc09 && rm -rf "$T"

Manifest: agents/skills/claude-code/harness-tdd/SKILL.md

Source content

Harness TDD

Red-green-refactor cycle integrated with harness validation. No production code exists without a failing test first.

When to Use

  • Implementing any new feature, function, module, or component
  • Fixing any bug (write a test that reproduces the bug first)
  • Adding behavior to existing code
  • When on_new_feature or on_bug_fix triggers fire
  • NOT when doing pure refactoring with existing test coverage (use harness-refactoring instead)
  • NOT when writing documentation, configuration, or non-behavioral files
  • NOT when spiking/prototyping (but convert spikes to TDD before merging)

Process

Iron Law

No production code may exist without a failing test that demanded its creation.

If you find yourself writing production code first, STOP. Delete it. Write the test first. This is not a guideline — it is a hard constraint.

Phase 1: RED — Write a Failing Test

  1. Identify the smallest behavior to test. One assertion per test. One behavior per cycle. If you are testing two things, split into two cycles.

  2. Write the test file or add to the appropriate test file. Follow the project's existing test conventions (file naming, framework, location).

  3. Write ONE minimal test that asserts the expected behavior (a sketch follows this list). The test should:

    • Have a clear, descriptive name that states what behavior is expected
    • Set up only the minimal fixtures needed
    • Make a single assertion about the expected outcome
    • NOT test implementation details — test observable behavior
  4. Run the test suite. Use the project's test runner (e.g., npx vitest run path/to/test, npm test, pytest).

  5. MANDATORY: Watch the test FAIL. Read the failure message. Confirm it fails for the RIGHT reason — the behavior is not yet implemented, not because the test is broken. If the test passes, either the behavior already exists (skip this cycle) or the test is wrong (fix the test).

  6. Record the failure. Note the test name and failure reason. This is your contract for the GREEN phase.
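
A sketch of what such a test might look like (the formatUsername function and the vitest-style API here are illustrative assumptions, not part of this skill):

// username.test.ts (hypothetical example)
import { describe, expect, it } from 'vitest';
import { formatUsername } from './username';

describe('formatUsername', () => {
  it('lowercases and trims surrounding whitespace', () => {
    // Minimal fixture, a single assertion, and only observable behavior:
    // no spying on internals, no asserting how the result is produced.
    expect(formatUsername('  Ada.Lovelace ')).toBe('ada.lovelace');
  });
});

Run before formatUsername exists, this should fail because the function is not yet implemented, which is exactly the confirmation step 5 demands.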

Phase 2: GREEN — Write the Simplest Code to Pass

  1. Write the MINIMUM production code that makes the failing test pass. Do not write code for future tests. Do not add error handling you have not tested. Do not generalize.

  2. Resist the urge to write "good" code. The GREEN phase is about correctness, not elegance. Hardcoded values are acceptable if they pass the test (see the sketch after this list). Duplication is acceptable. You will clean up in REFACTOR.

  3. Run the FULL test suite (not just the new test). All tests must pass.

  4. MANDATORY: Watch the test PASS. Read the output. Confirm all tests are green. If any test fails, fix the production code (not the tests) until all pass.

  5. Do not proceed to REFACTOR if any test is red. Fix first.
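
To make the point concrete, a hedged sketch (hypothetical slugify function, not from this skill): if the only failing test asserts slugify('Hello World') === 'hello-world', the following is a legitimate GREEN step.

// slug.ts (hypothetical): the simplest code that passes the one failing test.
// A hardcoded value is acceptable here; the next cycle's test forces the
// real logic, and REFACTOR cleans up whatever duplication appears.
export function slugify(_title: string): string {
  return 'hello-world';
}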

Phase 3: REFACTOR — Clean Up While Green

  1. With all tests passing, look for opportunities to improve (a small sketch follows this list):

    • Remove duplication (DRY)
    • Extract methods or functions for clarity
    • Rename for better readability
    • Simplify conditionals
    • Improve structure without changing behavior
  2. Run the full test suite after EVERY change. If a test breaks during refactoring, undo the last change immediately. Refactoring must not change behavior.

  3. Keep refactoring steps small. One rename, one extraction, one simplification at a time. Run tests between each.

  4. If no refactoring is needed, skip this phase. Not every cycle requires cleanup.
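
A refactoring step can be as small as a single extraction. A sketch with hypothetical names (not from this skill):

// Before (inline conditional buried in the calling code):
//   const fee = order.total > 100 ? 0 : 5;

// After: the same decision extracted behind an intention-revealing name.
// Behavior is unchanged, so the existing tests still describe it; run the
// suite immediately after the extraction and undo if anything goes red.
interface Order {
  total: number;
}

export function shippingFee(order: Order): number {
  return order.total > 100 ? 0 : 5;
}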

Phase 4: VALIDATE — Run Harness Checks

  1. Run harness check-deps to verify dependency boundaries are respected. New code must not introduce forbidden imports or layer violations.

  2. Run harness validate to verify the full project health. This catches architectural drift, documentation gaps, and constraint violations.

  3. If either check fails, fix the issue before committing. The fix may require another RED-GREEN-REFACTOR cycle if it involves behavioral changes.

  4. Commit the cycle. Each RED-GREEN-REFACTOR-VALIDATE cycle produces one atomic commit. The commit message references what behavior was added (not "add test" — describe the behavior).

Graph Refresh

If a knowledge graph exists at .harness/graph/, refresh it after code changes to keep graph queries accurate:

harness scan [path]

Skipping this step means subsequent graph queries (impact analysis, dependency health, test advisor) may return stale results.

Uncertainty Surfacing

When you encounter an unknown during a RED-GREEN-REFACTOR cycle, classify it immediately:

  • Blocking: Cannot write a meaningful test without resolving this (e.g., unclear expected behavior). STOP and surface to human with options.
  • Assumption: Can write the test if assumption is stated explicitly (e.g., "input is always non-null"). Document the assumption in a test comment and continue. If the assumption proves wrong, the test must be revised.
  • Deferrable: Does not affect the current cycle (e.g., performance characteristics). Record for a future cycle.

Do not bury unknowns in test code. An unstated assumption in a test is a test that passes for the wrong reason.
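
For instance, an Assumption-class unknown might be surfaced directly in the test file (hypothetical names; the explicitly stated assumption is the point, not any real API):

// parse.test.ts (hypothetical)
import { expect, it } from 'vitest';
import { parsePair } from './parse';

// ASSUMPTION: callers never pass null or undefined input; upstream request
// validation is expected to reject empty payloads. If this proves wrong,
// revise this test and add an explicit null-handling cycle.
it('parses a single key=value pair into an object', () => {
  expect(parsePair('color=blue')).toEqual({ color: 'blue' });
});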

Cycle Rhythm

Repeat the 4 phases for each new behavior. A typical feature requires 3-10 cycles. Each cycle should take 2-15 minutes. If a cycle takes longer than 15 minutes, the step is too large — break it down.

Ordering within a feature:

  1. Start with the happy path (simplest success case)
  2. Add edge cases one at a time
  3. Add error handling cases
  4. Add integration points last
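
One way to keep this ordering visible is to plan the cycles as todo placeholders and convert them into real RED tests one at a time. A sketch, assuming a vitest-style runner and a hypothetical config-loading feature:

// config.test.ts (hypothetical): each it.todo becomes a real failing test
// only when its cycle begins.
import { it } from 'vitest';

it.todo('parses a well-formed config file');                    // cycle 1: happy path
it.todo('falls back to defaults for missing optional fields');  // cycle 2: edge case
it.todo('ignores trailing blank lines');                        // cycle 3: edge case
it.todo('throws a descriptive error on malformed syntax');      // cycle 4: error handling
it.todo('loads config through the real file-system adapter');   // cycle 5: integration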

Harness Integration

  • harness check-deps — Run in VALIDATE phase after each cycle. Catches forbidden imports and layer boundary violations introduced by new code.
  • harness validate — Run in VALIDATE phase after each cycle. Full project health check including architecture, documentation, and constraints.
  • harness cleanup — Run periodically (every 3-5 cycles) to detect entropy accumulation. Address any issues before they compound.
  • Test runner — Use the project's configured test runner. Harness does not prescribe a test framework but the test must actually execute and report results.

Success Criteria

  • Every production function/method has at least one corresponding test
  • Every test was observed to fail before the production code was written
  • Every test was observed to pass after the production code was written
  • harness check-deps passes after each cycle
  • harness validate passes after each cycle
  • Each cycle is an atomic commit with a descriptive message
  • No test tests implementation details (only observable behavior)
  • No production code exists that was not demanded by a failing test

Rationalizations to Reject

Rationalization: "I know exactly what the implementation should be, so I will write it first and add the test after"
Reality: Code before test equals delete it. The gate is explicit: if production code is written before a failing test exists, delete the production code and start correctly.

Rationalization: "The test passed on the first run, so TDD is working"
Reality: If the test passed without implementing the production code, either the behavior already exists or the test is wrong. You must watch the test FAIL for the right reason before proceeding to GREEN.

Rationalization: "I will test multiple behaviors in this one test to be efficient"
Reality: One test, one assertion, one behavior. Multi-behavior tests make it impossible to pinpoint which behavior broke when the test fails.

Rationalization: "Harness validate can wait until the end of the feature since it slows down the cycle"
Reality: No skipping VALIDATE. Every cycle must end with harness check-deps and harness validate. A passing test with a failing validation means the implementation violated a project constraint.

Rationalization: "This edge case is unlikely, so I will skip writing a test for it"
Reality: If the edge case can happen, it needs a test. Unlikely is not impossible. The test is cheap; the production bug is expensive.

Rationalization: "The existing tests cover this behavior implicitly, so no new test is needed"
Reality: Implicit coverage is not TDD. If you cannot point to a specific test that asserts the specific behavior, write one. Implicit coverage breaks silently when the implying test changes.

Examples

Example: Adding a calculateTotal function

RED:

// cart.test.ts
it('calculates total for items with quantity and price', () => {
  const items = [
    { name: 'Widget', price: 10, quantity: 2 },
    { name: 'Gadget', price: 25, quantity: 1 },
  ];
  expect(calculateTotal(items)).toBe(45);
});

Run tests. Observe: ReferenceError: calculateTotal is not defined. Correct failure — function does not exist yet.

GREEN:

// cart.ts
export function calculateTotal(items: Array<{ price: number; quantity: number }>): number {
  return items.reduce((sum, item) => sum + item.price * item.quantity, 0);
}

Run tests. Observe: all tests pass.

REFACTOR: No refactoring needed for this simple function. Skip.

VALIDATE:

harness check-deps   # Pass
harness validate     # Pass
git add cart.ts cart.test.ts
git commit -m "feat(cart): calculate total from item price and quantity"

Next cycle (RED): Write a test for empty array input. Watch it fail (or pass — if it passes, the behavior is already handled). Continue.
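
A sketch of that next test (assuming the desired behavior is a total of 0 for an empty cart; with the reduce implementation above it will already pass, so the cycle is confirmed rather than driving new code):

// cart.test.ts
it('returns 0 for an empty list of items', () => {
  expect(calculateTotal([])).toBe(0);
});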

Red Flags

Flag: "I'll write the test after since I know what the code should do"
Corrective action: STOP. Test-after is not TDD. Delete the production code, write the test, watch it fail.

Flag: "The test is trivial/obvious so I don't need to watch it fail"
Corrective action: STOP. Observing failure proves the test catches the defect. A test you haven't seen fail might pass for the wrong reason.

Flag: "I'll batch these small tests together to save time"
Corrective action: STOP. Each RED-GREEN-REFACTOR cycle is atomic. Batching obscures which behavior broke when a test fails.

Flag: // removed old validation or // TODO: re-add error handling replacing functional code
Corrective action: STOP. Code-to-comment replacement is deletion with a fig leaf. Either keep the code or delete it cleanly with a test proving it is unnecessary.

Gates

These are hard stops. Violating any gate means the process has broken down.

  • Code before test = delete it. If production code is written before a failing test exists, delete the production code and start the cycle correctly.
  • Must watch fail. If you did not observe the test fail with the correct failure reason, the RED phase is incomplete. Do not proceed to GREEN.
  • Must watch pass. If you did not observe all tests pass after writing production code, the GREEN phase is incomplete. Do not proceed to REFACTOR.
  • No skipping VALIDATE. Every cycle must end with harness check-deps and harness validate. Skipping creates architectural debt that compounds.
  • No multi-behavior tests. One test, one assertion, one behavior. Tests that assert multiple unrelated things must be split.
  • No "I'll write tests later." There is no later. The test comes first or the code does not get written.

Escalation

  • After 3 failed attempts to make a test pass: Stop coding. The design may be wrong. Re-examine the interface, the test assumptions, or the architecture. Consider whether the feature needs a different approach. Consult the plan or spec.
  • When a test cannot be written without complex mocking: This is a design smell. The code under test has too many dependencies. Refactor the existing code to be more testable before proceeding, or reconsider the abstraction boundary (a sketch follows this list).
  • When harness checks repeatedly fail: The new code may be violating architectural constraints intentionally. Escalate to the human to decide whether to update the constraints or change the approach.
  • When the cycle is taking more than 15 minutes: The step is too large. Break the current behavior into smaller sub-behaviors and test each one separately.
  • When you are unsure what to test next: Review the spec or plan. If no spec exists, use the harness-brainstorming skill to clarify requirements before writing more tests.
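
As a sketch of the "refactor for testability" option mentioned above (hypothetical names; passing the dependency in as a parameter is one approach among several):

// report.ts (hypothetical): the code under test receives its data source
// instead of constructing an HTTP client internally, so a test can supply
// a trivial fake rather than building a complex mock.
export interface SalesSource {
  totalsFor(month: string): Promise<number>;
}

export async function buildReport(source: SalesSource, month: string): Promise<string> {
  const total = await source.totalsFor(month);
  return `Sales for ${month}: ${total}`;
}

// In a test, a plain object stands in for the real source:
//   const fake: SalesSource = { totalsFor: async () => 1200 };
//   expect(await buildReport(fake, '2024-06')).toBe('Sales for 2024-06: 1200');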