Joelclaw sync-system-bus
Deploy the system-bus-worker to the joelclaw Kubernetes cluster from local machine. Use when syncing changes in packages/system-bus to k8s, especially because the GitHub Actions deploy job targets a non-existent self-hosted runner and cannot complete deploys automatically.
git clone https://github.com/joelhooks/joelclaw
T=$(mktemp -d) && git clone --depth=1 https://github.com/joelhooks/joelclaw "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/sync-system-bus" ~/.claude/skills/joelhooks-joelclaw-sync-system-bus && rm -rf "$T"
skills/sync-system-bus/SKILL.md
Sync System Bus Worker
Deploy system-bus-worker to the local joelclaw k8s cluster (Talos v1.12.4 / k8s v1.35.0).
Important: .github/workflows/system-bus-worker-deploy.yml has a deploy job that targets a self-hosted runner. That runner does not exist, so deploys must be completed locally.
Quick Deploy
The publish script handles everything: build, auth, push, k8s apply, rollout, and verification.

    cd ~/Code/joelhooks/joelclaw
    k8s/publish-system-bus-worker.sh

Optional: pass a tag (defaults to a timestamp):

    k8s/publish-system-bus-worker.sh a6de1e0
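The tag default can be pictured with a small sketch. The helper name resolve_tag and the timestamp format are assumptions for illustration; the publish script's actual format is not shown in this document.

```shell
# Hypothetical helper: use the first argument as the tag, else a UTC timestamp.
# The %Y%m%d%H%M%S format is an assumption, not taken from the publish script.
resolve_tag() {
  if [ -n "${1:-}" ]; then
    printf '%s\n' "$1"
  else
    date -u +%Y%m%d%H%M%S
  fi
}

TAG=$(resolve_tag a6de1e0)
echo "ghcr.io/joelhooks/system-bus-worker:${TAG}"
```

Passing a short commit SHA, as in the example above, keeps the image tag traceable to the source it was built from.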
GHCR Auth Order
publish-system-bus-worker.sh now authenticates in this order:
1. GHCR_TOKEN env var (if provided)
2. secrets lease ghcr_pat (agent-secrets)
3. gh auth token (fallback)
If your gh auth token lacks read:packages/write:packages, the push will fail with a 403. Use ghcr_pat.
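The fallback order could look roughly like the sketch below. It is not the script's actual code: resolve_ghcr_token is a hypothetical name, and it assumes that secrets lease ghcr_pat prints the token on stdout.

```shell
# Sketch of the documented credential order (hypothetical helper).
resolve_ghcr_token() {
  local token

  # 1. An explicit env var wins when it is set and non-empty.
  if [ -n "${GHCR_TOKEN:-}" ]; then
    printf '%s\n' "$GHCR_TOKEN"
    return 0
  fi

  # 2. Lease from agent-secrets (assumes the token is printed on stdout).
  if command -v secrets >/dev/null 2>&1; then
    if token=$(secrets lease ghcr_pat 2>/dev/null) && [ -n "$token" ]; then
      printf '%s\n' "$token"
      return 0
    fi
  fi

  # 3. Fall back to the gh CLI token, which may lack package scopes.
  if command -v gh >/dev/null 2>&1; then
    if token=$(gh auth token 2>/dev/null) && [ -n "$token" ]; then
      printf '%s\n' "$token"
      return 0
    fi
  fi

  echo "no GHCR credential available" >&2
  return 1
}

# Example (not executed here):
# token=$(resolve_ghcr_token) || exit 1
```

Checking each source for a non-empty result before returning means an installed but logged-out CLI falls through to the next option instead of producing an empty password.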
What the Script Does
- Builds ARM64 Docker image (required: the Talos/Colima node is aarch64)
- Authenticates to GHCR (prefers the agent-secrets ghcr_pat lease; falls back to gh auth token) with a temp Docker config
- Pushes ghcr.io/joelhooks/system-bus-worker:${TAG} and :latest
- Updates the image ref in k8s/system-bus-worker.yaml
- Runs kubectl apply on the manifest
- Waits for rollout (--timeout=180s)
- Probes the new pod's health endpoint
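The temp Docker config step can be sketched as follows. DOCKER_CONFIG is Docker's documented variable for the client config directory; with_temp_docker_config is a hypothetical helper name, and the publish script may implement this differently.

```shell
# Hypothetical helper: run a command with a throwaway Docker config directory
# so that a credsStore in the user's own config never gets involved.
with_temp_docker_config() {
  local cfg status
  cfg=$(mktemp -d) || return 1
  DOCKER_CONFIG="$cfg" "$@"
  status=$?
  rm -rf "$cfg"          # drop the stored login along with the directory
  return "$status"
}

# Example (not executed here): log in and push inside the temp config.
# with_temp_docker_config sh -c '
#   printf "%s" "$TOKEN" | docker login ghcr.io -u "$GH_USER" --password-stdin &&
#   docker push "$IMAGE"'
```

Because the login lives only in the temporary directory, nothing is written to the regular Docker config and the credential disappears when the helper returns.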
Post-Deploy Verification
    joelclaw refresh                            # Re-register functions with Inngest
    joelclaw functions | grep "<new-function>"  # Verify new function appears
    joelclaw status                             # Full health check
    joelclaw runs --count 3                     # Confirm runs are flowing
Restart Safety (ADR-0156)
The worker is stateless between Inngest steps. Each step is a separate HTTP call; Inngest stores step output server-side. This means k8s rolling restarts are safe — Inngest retries the in-flight step against the new pod.
Critical rule: NEVER set retries: 0 on Inngest functions. With retries: 0, a worker restart during step execution kills the run permanently. With retries ≥ 1, Inngest retries and hits the new pod.
The current story-pipeline has retries: 2 specifically to survive the ~1s restart window during deploys.
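A quick pre-deploy scan for the forbidden setting might look like this. It is a sketch: find_zero_retries is a hypothetical helper, and the default directory assumes function definitions live under packages/system-bus and spell the option as retries: 0.

```shell
# Hypothetical helper: list source lines that set retries to exactly 0.
# Prints file:line:match for each hit; exits non-zero when nothing matches.
find_zero_retries() {
  local dir="${1:-packages/system-bus}"
  grep -rnE 'retries:[[:space:]]*0([^0-9.]|$)' "$dir"
}

# Example (not executed here): block a deploy when any hit is found.
# if find_zero_retries; then echo "fix retries before deploying" >&2; exit 1; fi
```

The pattern requires a non-digit after the 0, so values such as retries: 2 are not reported.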
What happens during deploy
Step executing on old pod → old pod terminates → step fails (SDK unreachable) → Inngest retries after backoff → new pod handles retry → step completes
All previously completed steps are memoized. Only the in-flight step reruns.
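The retry behavior can be illustrated with a toy model. This is not Inngest code: the step names (fetch, implement, report) are made up, and real memoization happens server-side by replaying stored step output.

```shell
# Toy model of step memoization across a pod restart.
# MEMO holds the names of completed steps; LOG records what each attempt did.
MEMO=""
LOG=""

run_attempt() {
  # $1 = step during which the pod dies on this attempt ("" for none)
  local kill_at=$1 step
  for step in fetch implement report; do
    case " $MEMO " in
      *" $step "*) LOG="$LOG skip:$step"; continue ;;   # already memoized
    esac
    if [ "$step" = "$kill_at" ]; then
      LOG="$LOG fail:$step"                             # old pod terminated
      return 1
    fi
    MEMO="$MEMO $step"
    LOG="$LOG run:$step"
  done
}

# First attempt dies during "implement"; the retry lands on the new pod.
run_attempt implement || run_attempt ""
echo "$LOG"
```

In the log, fetch runs once and is skipped on the retry, while implement is the only step that executes twice.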
Long-running steps (codex implement: 5-10 min)
If a deploy kills a codex step mid-execution, the step reruns from scratch on the new pod (5-10 min wasted, but not fatal). For time-critical deploys during active loops, check joelclaw loop status first and deploy between stories.
Manual Steps (if script fails)
Build

    cd ~/Code/joelhooks/joelclaw
    TAG=$(git rev-parse --short HEAD)
    IMAGE="ghcr.io/joelhooks/system-bus-worker:${TAG}"
    docker build --platform linux/arm64 -t "$IMAGE" -t ghcr.io/joelhooks/system-bus-worker:latest -f packages/system-bus/Dockerfile .

Push

    gh auth token | docker login ghcr.io -u "$(gh api user -q .login)" --password-stdin
    docker push "$IMAGE"
    docker push ghcr.io/joelhooks/system-bus-worker:latest

Deploy

    kubectl -n joelclaw set image deployment/system-bus-worker system-bus-worker="$IMAGE"
    kubectl -n joelclaw rollout status deployment/system-bus-worker --timeout=180s

Verify

    joelclaw refresh
    joelclaw status

Log

    slog write --action deploy --tool system-bus-worker --detail "deployed ${IMAGE}" --reason "sync worker changes"
Talon Rebuild (Adding Secrets / Changing Worker Supervision)
Talon is a Rust binary that supervises the worker process. It leases secrets from agent-secrets and injects them as env vars. When adding new webhook secrets or changing supervision behavior:

    # 1. Add secret to agent-secrets
    secrets add my_new_secret --value "the-secret-value"

    # 2. Update Talon source: add mapping to SECRET_MAPPINGS array
    # File: ~/Code/joelhooks/joelclaw/infra/talon/src/worker.rs
    # ("my_new_secret", "MY_NEW_SECRET_ENV_VAR"),

    # 3. Recompile (fast, ~3s incremental)
    export PATH="$HOME/.cargo/bin:$PATH"
    cd ~/Code/joelhooks/joelclaw/infra/talon
    cargo build --release

    # 4. Install + re-sign (macOS kills unsigned binaries)
    cp target/release/talon ~/.local/bin/talon
    codesign -fs - ~/.local/bin/talon

    # 5. Restart via launchd
    launchctl bootout gui/$(id -u)/com.joel.talon
    sleep 1
    launchctl bootstrap gui/$(id -u) ~/Library/LaunchAgents/com.joel.talon.plist
    sleep 12

    # 6. Verify
    curl -s http://localhost:3111/ | jq '.status'
    curl -X PUT http://localhost:3111/api/inngest  # Force function sync
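Instead of a fixed sleep before verifying, the health endpoint could be polled until it answers. The sketch below is one way to do that; retry_until is a hypothetical helper, and the attempt count and delay in the example are arbitrary.

```shell
# Hypothetical helper: run a command until it succeeds or attempts run out.
# Usage: retry_until <max_attempts> <delay_seconds> <command...>
retry_until() {
  local max=$1 delay=$2 attempt=1
  shift 2
  while ! "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "gave up after ${attempt} attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Example (not executed here): poll the local endpoint for up to ~30s.
# retry_until 15 2 curl -sf -o /dev/null http://localhost:3111/
```

Polling returns as soon as the worker is reachable and fails loudly if it never comes up, which a fixed sleep cannot do.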
Current SECRET_MAPPINGS (worker.rs)
| Secret Name | Env Var |
|---|---|
| |
| |
| |
| |
| |
| |
| |
| |
Talon Key Paths
| What | Path |
|---|---|
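| Binary | ~/.local/bin/talon |
| Source | ~/Code/joelhooks/joelclaw/infra/talon |
| LaunchAgent plist | ~/Library/LaunchAgents/com.joel.talon.plist |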
| Logs | / |
| ADR | |
Gotcha: codesign -fs - is required
After cargo build, the binary has only an ad hoc, linker-signed signature. macOS launchd may kill it with SIGKILL (signal 9). Re-signing with codesign -fs - fixes this.
Common Gotchas
| Problem | Cause | Fix |
|---|---|---|
| in pod | Built for amd64, not arm64 | Rebuild with --platform linux/arm64 |
| GHCR push fails with 403 on blob HEAD | gh auth token missing package scopes | Use ghcr_pat via secrets lease, or export GHCR_TOKEN with package scope |
| error | Docker config has credsStore | Script uses temp config dir; if manual, remove credsStore |
| Function missing after deploy | Not in index file | Add to both AND |
| Function still missing | Stale Inngest registration | joelclaw refresh, then check again |
| "Unable to reach SDK URL" | Worker pod not ready | Wait for rollout, then |
| Runs stuck after deploy | retries: 0 on the function | Set retries to 1 minimum (ADR-0156) |
| Stale app registrations | Multiple apps registered | Delete old registrations in Inngest dashboard () |
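Two of the gotchas above (wrong image architecture and a credsStore in the Docker config) can be caught before a manual push. The helpers below are a hypothetical sketch, not part of the publish script.

```shell
# Hypothetical preflight helpers for the manual deploy path.

# Succeeds only when the given "os/arch" string is linux/arm64.
require_arm64() {
  if [ "$1" != "linux/arm64" ]; then
    echo "expected linux/arm64, got ${1:-nothing}" >&2
    return 1
  fi
}

# Succeeds when the Docker client config file at $1 sets a credsStore.
has_creds_store() {
  [ -f "$1" ] && grep -q '"credsStore"' "$1"
}

# Example wiring (not executed here):
# require_arm64 "$(docker image inspect --format '{{.Os}}/{{.Architecture}}' "$IMAGE")"
# has_creds_store "${DOCKER_CONFIG:-$HOME/.docker}/config.json" &&
#   echo "credsStore is set; use a temp config dir" >&2
```

Keeping the checks separate from the docker calls makes them easy to exercise without a running Docker daemon.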
Key Paths
| What | Path |
|---|---|
| Publish script | k8s/publish-system-bus-worker.sh |
| Dockerfile | packages/system-bus/Dockerfile |
| k8s manifest | k8s/system-bus-worker.yaml |
| Host function index | |
| Cluster function index | |
| Worker entry | |
| GH Actions workflow | .github/workflows/system-bus-worker-deploy.yml |
| ADR-0156 | |