Vibecosystem self-healing

Self-healing codebase system. Monitors runtime errors, test failures, and build breaks. Automatically diagnoses root cause, generates fix, validates fix, and applies if safe. Zero human intervention for routine failures.

install

source · Clone the upstream repo

git clone https://github.com/vibeeval/vibecosystem

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/vibeeval/vibecosystem "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/self-healing" ~/.claude/skills/vibeeval-vibecosystem-self-healing && rm -rf "$T"

manifest: skills/self-healing/SKILL.md

Self-Healing Codebase

Automated fault detection and repair. When a test fails, a build breaks, or a runtime error is logged, the healing pipeline wakes up, finds the root cause, generates a minimal fix, validates it, and applies it -- all without human intervention, as long as confidence is above the safety threshold.

Trigger Conditions

The healing pipeline activates on any of these events:

Trigger	Signal	Source
Test failure	Any test exits non-zero	CI logs, `npm test` output
Build break	Compiler/bundler error	CI logs, `npm run build` output
Runtime error	Unhandled exception, 5xx response	Application logs, Vercel logs
Type error	`tsc --noEmit` fails	Pre-commit hook, CI
Lint failure	ESLint/Prettier error blocking CI	CI logs

Signals NOT Handled

The following always require human review -- do NOT auto-heal:

Security vulnerabilities (auth bypass, injection, XSS)
Database migrations (schema changes, data loss risk)
Dependency version conflicts (breaking API changes)
Flaky tests (non-deterministic failures -- use
```
replay
```
agent instead)

The 4-Phase Healing Pipeline

DETECT    ->    DIAGNOSE    ->    FIX    ->    VALIDATE
  |                |               |               |
Trigger        Root cause       Minimal         Re-run
fires          identified       patch           test suite
               confidence       applied         Pass? DONE
               scored           to branch       Fail? ROLLBACK

Phase 1: Detect

Capture the full failure context:

- Error message (exact, not summarized)
- Stack trace (all frames, not truncated)
- File and line number of origin
- Recent git changes touching that file (last 3 commits)
- Environment (Node version, OS, CI vs local)

Pass all context to the Diagnose phase. Incomplete context = low confidence = no auto-fix.

Phase 2: Diagnose

The

sleuth

agent performs root cause analysis:

Read the failing file at the reported line
Trace the call stack upward (max 3 frames)
Check if a recent git commit introduced the regression (
```
git log -5 -- <file>
```
)
Classify the failure type:

Type	Examples	Confidence Boost
Regression	Line added in last commit causes error	+20%
Null reference	`Cannot read property X of undefined`	+10%
Import error	Module not found, wrong path	+15%
Type mismatch	TS type error at known location	+15%
Config error	Missing env var, wrong format	+10%
Logic error	Wrong condition, off-by-one	0% (harder to auto-fix)

Confidence score = base 50% + type bonus + evidence quality bonus (0-30%).

If final confidence < 80%: log the diagnosis, notify in

thoughts/HEALING.md

, stop. Do not attempt fix.

Phase 3: Fix

The

spark

agent generates the minimal fix:

Before touching any file:

git stash push -u -m "self-healing: pre-fix stash for <error-id>"

Fix rules:

Change the minimum number of lines necessary
Do not refactor surrounding code
Do not add features
Do not change function signatures
Add a comment:
```
// self-healing fix: <short reason>
```

Fix size limits:

Maximum 20 lines changed per fix
Maximum 3 files touched per fix
If fix requires more than this, confidence is forced to 0 -- escalate to human

Phase 4: Validate

The

verifier

agent runs the relevant test suite:

# For test failures: run the specific failing test + its file's full suite
npm test -- --testPathPattern="<failing-file>"

# For build breaks: re-run the build
npm run build

# For type errors: re-run type check
npx tsc --noEmit

# For runtime errors: run integration tests if available
npm run test:integration

Decision:

Validation PASS:
  -> git stash drop (discard the safety stash)
  -> Commit the fix: "fix: self-healing repair for <error-id>"
  -> Log to thoughts/HEALING.md
  -> Update MTTR metric

Validation FAIL:
  -> git stash pop --index (restore pre-fix state exactly)
  -> Log failure to thoughts/HEALING.md
  -> Escalate to human with full diagnosis report
  -> Do NOT retry automatically (prevents loop)

Safety Rules

Confidence gate: Never apply a fix when confidence < 80%
Scope gate: Never touch security code, DB migrations, or auth logic
Size gate: Never change more than 20 lines or 3 files in one heal
No retry loop: If validation fails, stop. Human takes over.
Always stash: Every fix attempt must be preceded by a git stash
One fix per trigger: One error = one heal attempt. If the error recurs after a successful fix, it is a different problem -- do not re-heal automatically.

Agent Roles

Agent	Phase	Responsibility
`sleuth`	Diagnose	Root cause analysis, confidence scoring
`spark`	Fix	Minimal patch generation
`verifier`	Validate	Test suite execution, pass/fail decision
`self-learner`	Post-heal	Record the pattern for future prevention
`coroner`	Post-heal	Check if the same bug exists elsewhere in the codebase

Healing Log Format

Append each healing event to

thoughts/HEALING.md

## Healing Event: 2026-04-07T10:15:00Z

### Trigger
- Type: Test failure
- File: src/api/users.test.ts
- Error: "Cannot read properties of undefined (reading 'id')"
- Stack: users.test.ts:42 -> users.ts:87 -> db/query.ts:31

### Diagnosis (sleuth)
- Root cause: db/query.ts:31 returns `null` when no row found; callers assume non-null
- Regression: commit abc123 (2026-04-06) removed the null guard
- Classification: Null reference (confidence: +10%)
- Evidence: stack trace + git blame confirm exact line + recent commit
- Final confidence: 85%

### Fix (spark)
- File: src/api/users.ts line 87
- Change: Added null check before accessing .id
- Lines changed: 3
- Stash: self-healing-20260407-101500

### Validation (verifier)
- Command: npm test -- --testPathPattern="src/api/users"
- Result: PASS (14 tests, 0 failures)
- Decision: KEEP

### Outcome
- Status: HEALED
- Time to repair: 4 minutes
- MTTR contribution: recorded

---

## Healing Event: 2026-04-07T14:22:00Z

### Trigger
- Type: Build break
- Error: Module '"./config"' has no exported member 'DATABASE_URL'

### Diagnosis (sleuth)
- Root cause: config.ts was refactored, export renamed to DB_URL
- Confidence: 91%

### Fix (spark)
- File: src/api/db.ts line 3
- Change: import { DB_URL as DATABASE_URL } from './config'
- Lines changed: 1

### Validation (verifier)
- Command: npm run build
- Result: PASS

### Outcome
- Status: HEALED
- Time to repair: 2 minutes

Metrics

Track in

thoughts/HEALING.md

header section (update after each event):

# Self-Healing Metrics (updated: 2026-04-07)

| Metric | Value |
|--------|-------|
| Total healing events | 23 |
| Successful auto-fixes | 19 (82.6%) |
| Escalated to human | 4 (17.4%) |
| Average MTTR (auto-fixed) | 3.8 minutes |
| Average MTTR (escalated) | 47 minutes |
| Most common trigger | Test failure (61%) |
| Most common fix type | Null reference guard |

Key Metrics Defined

MTTR (Mean Time To Repair): Time from trigger detection to validation PASS
Auto-fix success rate: Healed / (Healed + Escalated)
False fix rate: Fixes that passed validation but broke something else (caught by post-deploy monitoring)

Scope Limits (Non-Negotiable)

The healing pipeline will NEVER auto-fix:

- Files matching: **/auth/**, **/security/**, **/middleware/auth*
- Files matching: **/migrations/**, **/schema.*
- Environment variables or secrets
- Package versions in package.json / requirements.txt
- CI/CD configuration (.github/**, vercel.json, Dockerfile)
- Any file touched by a security-related commit message

If the failing code is in a scoped-out path, the pipeline logs the trigger and immediately escalates with the full diagnosis -- no fix attempt.

Activation

This skill is referenced by:

PostToolUse hooks when test/build commands exit non-zero
The
```
sentinel
```
agent during incident response
The
```
verifier
```
agent when a final validation fails

To invoke manually:

Use self-healing to diagnose and fix the failing test in src/api/users.test.ts.
Error: "Cannot read properties of undefined (reading 'id')"
Stack trace: [paste full trace]