Joelclaw joelclaw-system-check

Run a comprehensive health check of the joelclaw system — k8s cluster, worker, Inngest, Redis, Typesense/OTEL, tests, TypeScript, repo sync, memory pipeline, pi-tools, git config, active loops, disk, stale tests. Outputs a 1-10 score with per-component breakdown. Use when: 'system health', 'health check', 'is everything working', 'system status', 'how's the system', 'check everything', or at session start to orient.

install
source · Clone the upstream repo
git clone https://github.com/joelhooks/joelclaw
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/joelhooks/joelclaw "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/joelclaw-system-check" ~/.claude/skills/joelhooks-joelclaw-joelclaw-system-check && rm -rf "$T"
manifest: skills/joelclaw-system-check/SKILL.md

joelclaw System Health Check

Run scripts/health.sh for a full system health report with a 1-10 score:

~/Code/joelhooks/joelclaw/skills/joelclaw-system-check/scripts/health.sh

What It Checks (16 components)

| Check | What | Green (10) | Yellow (5-7) | Red (1-3) |
|---|---|---|---|---|
| k8s cluster | pods in joelclaw namespace | 4/4 Running, 0 restarts | partial pods | no pods |
| pds | AT Proto PDS on :2583 | version + collections | pod running, port-forward down | pod not running |
| worker | system-bus on :3111 | 16+ functions | responding, low count | down |
| inngest server | :8288 reachable | responding | | down |
| redis/gateway | Redis + gateway session queues | connected, low pending queue | connected, backlog rising | unavailable |
| typesense/otel | Typesense health + OTEL query path | healthy + queryable | healthy, query degraded | unavailable |
| tests | bun test in system-bus | 0 fail | | failures |
| tsc | tsc --noEmit | clean | | type errors |
| repo sync | monorepo HEAD vs origin/main | in sync | ahead/behind | repo unavailable |
| memory pipeline | joelclaw inngest memory-health | healthy checks | degraded checks | failing checks |
| pi-tools | extension deps installed | all 3 deps | | missing |
| git config | user.name + email set | set | | missing |
| active loops | joelclaw loop list | queryable | query degraded | unavailable |
| gogcli | Google Workspace auth | account authed, token valid | token stored, no password | not configured |
| disk | free space + loop tmp | <80% used | >80% | |
| stale tests | __tests__/ + acceptance tests | clean | present | |
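
The checks above each map to a score band, and the script reports one overall 1-10 score. How health.sh combines the per-component scores is not stated here. As a minimal sketch, assuming the overall score is simply the rounded mean of the component scores (the function name overall_score is hypothetical and not taken from the script):

```shell
#!/usr/bin/env bash
# Hypothetical helper: average per-component scores (each 1-10) into one
# overall score, rounded to the nearest integer. The real health.sh may
# weight components differently; this only illustrates the idea.
overall_score() {
  local sum=0 count=0 s
  for s in "$@"; do
    sum=$((sum + s))
    count=$((count + 1))
  done
  if [ "$count" -eq 0 ]; then
    echo "no scores given" >&2
    return 1
  fi
  # Integer rounding: add half the divisor before dividing.
  echo $(( (sum + count / 2) / count ))
}

# Example: two green components, one yellow (6), one red (2).
overall_score 10 10 6 2   # prints 7
```

A plain mean lets one red component hide behind many green ones, so a real implementation might instead cap the overall score when any component is red.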

When to Run

  • Session start — orient on system state before doing work
  • After loops complete — verify nothing broke
  • After infra changes — k8s, worker, Redis config
  • When something feels off — quick triage

Fixing Common Issues

Repo drift:

cd ~/Code/joelhooks/joelclaw && git fetch origin && git status -sb
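
To get the ahead/behind answer in a script rather than by reading git status, standard git can count the commits on each side: git rev-list --left-right --count HEAD...origin/main prints the number of commits only on HEAD, then the number only on origin/main. The sketch below turns those two counts into a label. The classify_sync name is hypothetical and not part of health.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn ahead/behind commit counts into a short label.
classify_sync() {
  local ahead="$1" behind="$2"
  if [ "$ahead" -eq 0 ] && [ "$behind" -eq 0 ]; then
    echo "in sync"
  elif [ "$behind" -eq 0 ]; then
    echo "ahead by $ahead"
  elif [ "$ahead" -eq 0 ]; then
    echo "behind by $behind"
  else
    echo "diverged: ahead $ahead, behind $behind"
  fi
}

# Usage inside the repo, after fetching (bash process substitution):
#   git fetch origin
#   read -r ahead behind < <(git rev-list --left-right --count HEAD...origin/main)
#   classify_sync "$ahead" "$behind"
```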

pi-tools broken:

cd ~/.pi/agent/git/github.com/joelhooks/pi-tools && bun add @sinclair/typebox @mariozechner/pi-coding-agent @mariozechner/pi-tui @mariozechner/pi-ai

PDS unreachable:

kubectl port-forward -n joelclaw svc/bluesky-pds 2583:3000 &

If the pod itself is down, restart the deployment instead:

kubectl rollout restart deployment/bluesky-pds -n joelclaw

Worker down:

joelclaw inngest restart-worker --register

Stale tests:

rm -rf ~/Code/joelhooks/joelclaw/packages/system-bus/__tests__/ && find ~/Code/joelhooks/joelclaw/packages/system-bus/src -name "*.acceptance.test.ts" -delete

Loop tmp bloat (only when no loops are running):

rm -rf /tmp/agent-loop/loop-*/
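
Because the rm -rf above is unconditional, a dry-run step can make the cleanup safer. A minimal sketch, assuming loop directories match /tmp/agent-loop/loop-* as in the command above. The clean_loop_tmp function and its --delete flag are hypothetical, and confirming that no loops are running (for example with joelclaw loop list) is still left to the operator:

```shell
#!/usr/bin/env bash
# Hypothetical helper: list loop tmp directories under a base directory,
# and remove them only when --delete is passed explicitly.
clean_loop_tmp() {
  local base="${1:-/tmp/agent-loop}" mode="${2:-}"
  local dir
  for dir in "$base"/loop-*/; do
    [ -d "$dir" ] || continue   # the glob matched nothing
    if [ "$mode" = "--delete" ]; then
      rm -rf -- "$dir"
      echo "removed $dir"
    else
      echo "would remove $dir"
    fi
  done
}

# Usage:
#   clean_loop_tmp                             # dry run, lists candidates
#   clean_loop_tmp /tmp/agent-loop --delete    # actually removes them
```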

Inngest Hung-Run Quick Triage

When a run appears stuck after its first step, inspect its trace:

joelclaw run <run-id>

If the trace shows a Finalization failure with "Unable to reach SDK URL":

  1. Verify registration/health:

    joelclaw inngest status

  2. Verify function is present where expected:

    joelclaw functions | rg -i "manifest-archive|<function-name>"

  3. Check for stale app registrations in Inngest UI/API and remove stale SDK URLs.

  4. The cause may be a blocking handler rather than the network: review recent step code for filesystem, Redis, or subprocess calls that block before the step responds.
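
Step 2 can be wrapped so that a script gets an exit status instead of having to read the output. A minimal sketch, assuming only that the function name appears somewhere in the text printed by joelclaw functions (the exact output format is not shown here). has_function is a hypothetical helper, and it uses grep rather than rg so that it also runs where ripgrep is not installed:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if any line on stdin contains the given
# name (case-insensitive, fixed-string match), fail otherwise.
has_function() {
  local name="$1"
  grep -qiF -- "$name"
}

# Usage:
#   if joelclaw functions | has_function "manifest-archive"; then
#     echo "function is registered"
#   else
#     echo "function missing: check for stale app registrations"
#   fi
```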