openclaw-self-healing

4-tier autonomous self-healing system for OpenClaw Gateway with persistent learning, reasoning logs, and multi-channel alerts. Features Claude Code as Level 3 emergency doctor for AI-powered diagnosis and repair.

install
source · Clone the upstream repo
git clone https://github.com/Ramsbaby/openclaw-self-healing
Claude Code · Install into ~/.claude/skills/
git clone --depth=1 https://github.com/Ramsbaby/openclaw-self-healing ~/.claude/skills/ramsbaby-openclaw-self-healing-openclaw-self-healing
OpenClaw · Install into ~/.openclaw/skills/
git clone --depth=1 https://github.com/Ramsbaby/openclaw-self-healing ~/.openclaw/skills/ramsbaby-openclaw-self-healing-openclaw-self-healing
manifest: SKILL.md
source content

OpenClaw Self-Healing System

"The system that heals itself — or calls for help when it can't."

A 4-tier autonomous self-healing system for OpenClaw Gateway.

Architecture

Level 1: Watchdog (180s)     → Process monitoring (OpenClaw built-in)
Level 2: Health Check (300s) → HTTP 200 + 3 retries
Level 3: Claude Recovery     → 30min AI-powered diagnosis 🧠
Level 4: Discord Alert       → Human escalation

What's Special (v2.0)

  • World's first Claude Code as Level 3 emergency doctor
  • Persistent Learning - Automatic recovery documentation (symptom → cause → solution → prevention)
  • Reasoning Logs - Explainable AI decision-making process
  • Multi-Channel Alerts - Discord + Telegram support
  • Metrics Dashboard - Success rate, recovery time, trending analysis
  • Production-tested (verified recovery Feb 5-6, 2026)
  • macOS LaunchAgent integration

Quick Setup

1. Install Dependencies

brew install tmux
npm install -g @anthropic-ai/claude-code

2. Configure Environment

# Copy template to OpenClaw config directory
cp .env.example ~/.openclaw/.env

# Edit and add your Discord webhook (optional)
nano ~/.openclaw/.env

3. Install Scripts

# Copy scripts
cp scripts/*.sh ~/openclaw/scripts/
chmod +x ~/openclaw/scripts/*.sh

# Install LaunchAgent
cp launchagent/com.openclaw.healthcheck.plist ~/Library/LaunchAgents/
launchctl load ~/Library/LaunchAgents/com.openclaw.healthcheck.plist

4. Verify

# Check Health Check is running
launchctl list | grep openclaw.healthcheck

# View logs
tail -f ~/openclaw/memory/healthcheck-$(date +%Y-%m-%d).log

Scripts

ScriptLevelDescription
gateway-healthcheck.sh
2HTTP 200 check + 3 retries + escalation
emergency-recovery.sh
3Claude Code PTY session for AI diagnosis (v1)
emergency-recovery-v2.sh
3Enhanced with learning + reasoning logs (v2) ⭐
emergency-recovery-monitor.sh
4Discord/Telegram notification on failure
metrics-dashboard.sh
-Visualize recovery statistics (NEW)

Configuration

All settings via environment variables in

~/.openclaw/.env
:

VariableDefaultDescription
DISCORD_WEBHOOK_URL
(none)Discord webhook for alerts
OPENCLAW_GATEWAY_URL
http://localhost:18789/
Gateway health check URL
HEALTH_CHECK_MAX_RETRIES
3
Restart attempts before escalation
EMERGENCY_RECOVERY_TIMEOUT
1800
Claude recovery timeout (30 min)

Testing

Test Level 2 (Health Check)

# Run manually
bash ~/openclaw/scripts/gateway-healthcheck.sh

# Expected output:
# ✅ Gateway healthy

Test Level 3 (Claude Recovery)

# Inject a config error (backup first!)
cp ~/.openclaw/openclaw.json ~/.openclaw/openclaw.json.bak

# Wait for Health Check to detect and escalate (~8 min)
tail -f ~/openclaw/memory/emergency-recovery-*.log

Links

License

MIT License - do whatever you want with it.

Built by @ramsbaby + Jarvis 🦞