Claude-skill-registry installation-orchestrator

Expert management of install.sh (2000+ lines). Use for installation troubleshooting, idempotency checks, secret generation, volume migration, the startup order of the 11 services (including the heuristics and semantic services), and user onboarding.

install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/installation-orchestrator" ~/.claude/skills/majiayu000-claude-skill-registry-installation-orchestrator && rm -rf "$T"
manifest: skills/data/installation-orchestrator/SKILL.md

Installation Orchestrator (v2.0.0)

Overview

Expert management of install.sh (a Bash script of 2000+ lines), covering idempotency, secret generation, volume migration, orchestration of the 11 services with the 3-branch detection startup order, and troubleshooting of installation failures.

When to Use This Skill

  • Troubleshooting installation failures
  • Managing install.sh modifications
  • Secret generation and validation
  • Volume migration between versions
  • Idempotency checks
  • User onboarding flow
  • 3-branch service startup order (v2.0.0)

v2.0.0 Architecture

11 Docker Services

Core Services:
  - clickhouse (data storage, port 8123)
  - grafana (monitoring, port 3001)
  - n8n (workflow engine, port 5678)

3-Branch Detection (v2.0.0):
  - heuristics-service (Branch A, port 5005, 30% weight)
  - semantic-service (Branch B, port 5006, 35% weight)
  - prompt-guard-api (Branch C, port 8000, 35% weight)

PII Detection:
  - presidio-pii-api (port 5001)
  - language-detector (port 5002)

Web Interface:
  - web-ui-backend (port 8787)
  - web-ui-frontend (via proxy)
  - proxy (Caddy, port 80)

Installation Flow

1. Pre-flight Checks

- Docker installed and running
- Ports available (80, 5678, 8123, 3001, 8787, 5001, 5002, 5005, 5006, 8000)
- Disk space >10GB
- No existing .install-state.lock
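
The checks above could be scripted roughly as follows. This is a minimal sketch under stated assumptions, not the actual install.sh code: the function names (`port_in_use`, `free_disk_gb`, `preflight`) are hypothetical, the port list is the one used in the Troubleshooting section, and `lsof` plus a POSIX `df` are assumed to be available.

```shell
#!/bin/bash
# Hypothetical pre-flight helpers; names are illustrative, not taken from install.sh.

REQUIRED_PORTS="80 5678 8123 3001 8787 5001 5002 5005 5006 8000"
MIN_DISK_GB=10

# Succeeds if some process is listening on TCP port $1 (requires lsof).
port_in_use() {
  lsof -nP -iTCP:"$1" -sTCP:LISTEN >/dev/null 2>&1
}

# Prints the free space, in whole GB (rounded down), of the filesystem holding $1.
free_disk_gb() {
  df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# Runs every check, reports each failure, and returns non-zero if any check failed.
preflight() {
  local dir="${1:-.}" failed=0 port
  docker info >/dev/null 2>&1 || { echo "Docker is not installed or not running"; failed=1; }
  for port in $REQUIRED_PORTS; do
    if port_in_use "$port"; then echo "Port $port is already in use"; failed=1; fi
  done
  # The value is rounded down, so -ge is used to avoid rejecting e.g. 10.5GB.
  [ "$(free_disk_gb "$dir")" -ge "$MIN_DISK_GB" ] || { echo "Need about ${MIN_DISK_GB}GB of free disk space"; failed=1; }
  [ ! -e "$dir/.install-state.lock" ] || { echo "Existing .install-state.lock found"; failed=1; }
  return "$failed"
}
```

Calling `preflight .` from the project root prints one line per failed check, so a user sees every blocker in a single run rather than fixing them one at a time.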

2. Secret Generation

CLICKHOUSE_PASSWORD=$(openssl rand -base64 32)
GF_SECURITY_ADMIN_PASSWORD=$(openssl rand -base64 32)
SESSION_SECRET=$(openssl rand -base64 64 | tr -d '\n')  # openssl wraps base64 output at 64 chars; strip the embedded newline
JWT_SECRET=$(openssl rand -base64 32)
WEB_UI_ADMIN_PASSWORD=$(openssl rand -base64 24)
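
To keep secret generation idempotent, one option is to write each secret to `.env` only when it is not already there, so that rerunning the installer does not rotate credentials that running services depend on. The sketch below assumes that approach; `ensure_secret` is a hypothetical helper name, and the real install.sh may handle this differently.

```shell
#!/bin/bash
# Hypothetical helper: append NAME=<random secret> to an env file only if NAME is absent.
ensure_secret() {
  local name="$1" bytes="$2" env_file="${3:-.env}"
  touch "$env_file" && chmod 600 "$env_file"   # secrets should not be world-readable
  if grep -q "^${name}=" "$env_file"; then
    return 0   # keep the existing value so reruns do not rotate it
  fi
  # tr removes the line break that openssl inserts into longer base64 output
  printf '%s=%s\n' "$name" "$(openssl rand -base64 "$bytes" | tr -d '\n')" >> "$env_file"
}

# Example usage (sizes taken from the list above):
#   ensure_secret CLICKHOUSE_PASSWORD 32
#   ensure_secret SESSION_SECRET 64
#   ensure_secret JWT_SECRET 32
```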

3. Service Startup Order (v2.0.0)

Phase 1 - Data Layer:
  1. clickhouse (data storage)
  2. grafana (monitoring)

Phase 2 - Detection Core:
  3. n8n (workflow engine)
  4. heuristics-service (Branch A - fast pattern matching)
  5. semantic-service (Branch B - embedding analysis)
  6. prompt-guard-api (Branch C - LLM validation, optional)

Phase 3 - PII Services:
  7. presidio-pii-api (dual-language PII)
  8. language-detector (hybrid detection)

Phase 4 - Web Interface:
  9. web-ui-backend (Express API)
  10. web-ui-frontend (React app)
  11. proxy (Caddy reverse proxy)
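
The four phases above could be driven by a small wrapper around `docker-compose up -d`. This is a sketch only: `start_phase` and `start_all_phases` are hypothetical names, and the `COMPOSE` variable is an assumption added so the sequence can be dry-run without Docker.

```shell
#!/bin/bash
# Sketch: start services phase by phase, in the order listed above.
COMPOSE=${COMPOSE:-docker-compose}   # set COMPOSE=echo for a dry run

# start_phase <number> <service>...: announce the phase, then start its services.
start_phase() {
  local phase="$1"
  shift
  echo "Phase $phase: $*"
  $COMPOSE up -d "$@"
}

# Stops at the first phase that fails to start.
start_all_phases() {
  start_phase 1 clickhouse grafana &&
  start_phase 2 n8n heuristics-service semantic-service prompt-guard-api &&
  start_phase 3 presidio-pii-api language-detector &&
  start_phase 4 web-ui-backend web-ui-frontend proxy
}
```

Running `COMPOSE=echo` followed by `start_all_phases` prints the commands that would be issued, which is a cheap way to review the order before touching any containers.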

4. Health Checks (v2.0.0)

# Core services
for service in clickhouse grafana n8n web-ui-backend; do
  wait_for_health "$service" 120s || fail
done

# 3-Branch detection services (v2.0.0)
wait_for_health heuristics-service 60s || warn "Branch A degraded"
wait_for_health semantic-service 90s || warn "Branch B degraded"
wait_for_health prompt-guard-api 120s || warn "Branch C degraded"

# PII services
wait_for_health presidio-pii-api 90s || warn "PII detection degraded"
wait_for_health language-detector 30s || warn "Language detection degraded"
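
The snippet above relies on `wait_for_health`, `warn`, and `fail`, whose definitions are not shown. A minimal sketch of what they might look like follows. It assumes containers are named `vigil-<service>` (as in the `docker logs` commands in this document) and that each image defines a Docker HEALTHCHECK; `container_health` and `HEALTH_POLL_INTERVAL` are hypothetical names.

```shell
#!/bin/bash
# Hypothetical helpers; the real install.sh definitions may differ.

warn() { echo "WARN: $*" >&2; }
fail() { echo "ERROR: installation aborted" >&2; exit 1; }

# Prints the Docker health status of a service's container (e.g. "healthy").
# Assumes the container is named vigil-<service> and has a HEALTHCHECK.
container_health() {
  docker inspect --format '{{.State.Health.Status}}' "vigil-$1" 2>/dev/null
}

# wait_for_health <service> <timeout>s: poll until healthy or until the timeout expires.
wait_for_health() {
  local service="$1" timeout="${2%s}" interval="${HEALTH_POLL_INTERVAL:-2}" elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    [ "$(container_health "$service")" = "healthy" ] && return 0
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  return 1
}
```

Keeping the probe in its own function means an HTTP check (for example, `curl` against a `/health` endpoint) could be swapped in for services without a HEALTHCHECK.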

5. Idempotency Lock

touch .install-state.lock
echo "INSTALL_DATE=$(date)" >> .install-state.lock
echo "VERSION=2.0.0" >> .install-state.lock
echo "SERVICES=11" >> .install-state.lock
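
On a later run, the lock file can be read back to decide whether to proceed. The sketch below shows one way to do that; `lock_value` and `check_install_lock` are hypothetical names, and taking the last matching line is a defensive choice in case entries were appended more than once.

```shell
#!/bin/bash
# Hypothetical lock-file helpers for the idempotency check.
LOCK_FILE=${LOCK_FILE:-.install-state.lock}

# Prints the value stored under KEY ($1) in the lock file; empty if absent.
lock_value() {
  sed -n "s/^$1=//p" "$LOCK_FILE" 2>/dev/null | tail -n 1
}

# Returns 0 if no installation is recorded, 1 (with a message) if one is.
check_install_lock() {
  if [ -f "$LOCK_FILE" ]; then
    echo "Already installed: version $(lock_value VERSION), installed $(lock_value INSTALL_DATE)"
    return 1
  fi
  return 0
}
```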

Common Tasks

Task 1: Fresh Installation

./install.sh

# Prompts:
# 1. Generate secrets? [Y/n]
# 2. Set admin password (or auto-generate)
# 3. Delete existing vigil_data? [y/N]
# 4. Download Llama model? [Y/n] (for Branch C)

Task 2: Troubleshoot Failed Installation

# Check state
cat .install-state.lock

# View logs
docker-compose logs --tail=100

# Check 3-branch services specifically (v2.0.0)
docker logs vigil-heuristics-service --tail 50
docker logs vigil-semantic-service --tail 50
docker logs vigil-prompt-guard-api --tail 50

# Retry specific service
docker-compose up -d heuristics-service
docker logs vigil-heuristics-service

# Clean slate
rm -rf .install-state.lock .env vigil_data
./install.sh

Task 3: Validate Environment

./scripts/validate-env.sh

# Checks:
# - All required env vars present
# - Passwords meet requirements (min 8 chars)
# - Ports not in use (including 5005, 5006 for branches)
# - Docker network exists (vigil-net)
# - 11 services defined in docker-compose.yml
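
The first two checks in that list could be implemented along these lines. This is a sketch, not the contents of `scripts/validate-env.sh`: the variable list is taken from the secret-generation step, `validate_env_file` is a hypothetical name, and applying the 8-character minimum to every listed variable is an assumption.

```shell
#!/bin/bash
# Hypothetical sketch of the env-var and password-length checks.
REQUIRED_VARS="CLICKHOUSE_PASSWORD GF_SECURITY_ADMIN_PASSWORD SESSION_SECRET JWT_SECRET WEB_UI_ADMIN_PASSWORD"

# Reports each missing or too-short variable; succeeds only if there are none.
validate_env_file() {
  local env_file="${1:-.env}" errors=0 name value
  for name in $REQUIRED_VARS; do
    # If a variable is defined more than once, the last definition is checked.
    value=$(sed -n "s/^${name}=//p" "$env_file" | tail -n 1)
    if [ -z "$value" ]; then
      echo "Missing: $name"
      errors=$((errors + 1))
    elif [ "${#value}" -lt 8 ]; then
      echo "Too short (min 8 chars): $name"
      errors=$((errors + 1))
    fi
  done
  [ "$errors" -eq 0 ]
}
```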

Task 4: Migrate Volumes (v1.x → v2.0.0)

# Backup old data
docker run --rm -v vigil_clickhouse_data:/data -v "$(pwd)":/backup alpine \
  tar czf "/backup/clickhouse-v1.x-$(date +%Y%m%d).tar.gz" -C / data

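# Run v2.0.0 migration SQL (adds branch columns)
# -i forwards stdin into the container; --multiquery allows several statements in one file
docker exec -i vigil-clickhouse clickhouse-client --multiquery < services/monitoring/sql/migrations/v2.0.0.sql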

# Verify migration (branch columns added)
docker exec vigil-clickhouse clickhouse-client -q "
  DESCRIBE n8n_logs.events_processed
" | grep -E "branch_[abc]_score|arbiter_decision"

# Expected output:
# branch_a_score    Float32
# branch_b_score    Float32
# branch_c_score    Float32
# arbiter_decision  String
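
The `grep` above succeeds as soon as any one of the columns matches, so a partially applied migration can go unnoticed. A stricter check could list exactly which expected columns are absent. The sketch below assumes the four column names from the expected output; `missing_columns` is a hypothetical helper that reads DESCRIBE output on stdin.

```shell
#!/bin/bash
# Hypothetical helper: report expected columns that are absent from DESCRIBE output.
EXPECTED_COLUMNS="branch_a_score branch_b_score branch_c_score arbiter_decision"

# Reads DESCRIBE output on stdin (column name in the first field);
# prints one line per expected column that was not found.
missing_columns() {
  local described col
  described=$(cat)
  for col in $EXPECTED_COLUMNS; do
    printf '%s\n' "$described" |
      awk -v c="$col" '$1 == c { found = 1 } END { exit !found }' ||
      echo "$col"
  done
}

# Example usage:
#   docker exec vigil-clickhouse clickhouse-client -q "DESCRIBE n8n_logs.events_processed" | missing_columns
```

Empty output means all four columns exist; any printed name identifies a column the migration did not add.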

Task 5: Verify 3-Branch Services (v2.0.0)

#!/bin/bash
# scripts/verify-branches.sh

echo "🔍 Verifying 3-Branch Detection Services..."

# Branch A: Heuristics
BRANCH_A=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:5005/health)
if [ "$BRANCH_A" == "200" ]; then
  echo "✅ Branch A (Heuristics): Healthy"
else
  echo "❌ Branch A (Heuristics): Down (HTTP $BRANCH_A)"
fi

# Branch B: Semantic
BRANCH_B=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:5006/health)
if [ "$BRANCH_B" == "200" ]; then
  echo "✅ Branch B (Semantic): Healthy"
else
  echo "❌ Branch B (Semantic): Down (HTTP $BRANCH_B)"
fi

# Branch C: LLM Guard
BRANCH_C=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:8000/health)
if [ "$BRANCH_C" == "200" ]; then
  echo "✅ Branch C (LLM Guard): Healthy"
else
  echo "⚠️  Branch C (LLM Guard): Down (HTTP $BRANCH_C) - Optional"
fi

echo ""
echo "3-Branch Status: $([ "$BRANCH_A" == "200" ] && [ "$BRANCH_B" == "200" ] && echo "OPERATIONAL" || echo "DEGRADED")"

Troubleshooting

Issue: Port already in use

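# Check all v2.0.0 ports for listening processes
for port in 80 5678 8123 3001 8787 5001 5002 5005 5006 8000; do
  lsof -nP -iTCP:"$port" -sTCP:LISTEN && echo "Port $port in use"
done

# Stop a specific process (sends SIGTERM; use kill -9 only if it does not exit)
kill $(lsof -t -iTCP:5005 -sTCP:LISTEN)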

Issue: Branch service won't start

# Check heuristics-service
docker logs vigil-heuristics-service --tail 100
# Common issue: missing patterns directory
# Fix: docker-compose build heuristics-service

# Check semantic-service
docker logs vigil-semantic-service --tail 100
# Common issue: model download failed
# Fix: docker exec vigil-semantic-service python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('all-MiniLM-L6-v2')"

Issue: ClickHouse won't start

# Check volume permissions
ls -la vigil_data/clickhouse/

# Reset the ClickHouse volume only (deletes all ClickHouse data; back up first)
docker-compose down
docker volume rm vigil_clickhouse_data
./install.sh

Issue: Secrets not loaded

# Verify .env file (prints secret values; avoid on shared screens)
grep -E "(CLICKHOUSE|JWT|SESSION)_" .env

# Reload
docker-compose down
docker-compose up -d

Issue: Semantic service model download fails

# Pre-download model (run before install)
docker run --rm -v vigil_semantic_models:/models python:3.11-slim bash -c "
  pip install sentence-transformers &&
  python -c \"from sentence_transformers import SentenceTransformer; SentenceTransformer('all-MiniLM-L6-v2', cache_folder='/models')\"
"

# Restart semantic service
docker-compose restart semantic-service

Port Reference (v2.0.0)

Port   Service              Description
80     proxy                Caddy reverse proxy (main entry)
3001   grafana              Monitoring dashboard
5001   presidio-pii-api     Dual-language PII detection
5002   language-detector    Hybrid language detection
5005   heuristics-service   Branch A (30% weight)
5006   semantic-service     Branch B (35% weight)
5678   n8n                  Workflow engine
8000   prompt-guard-api     Branch C (35% weight)
8123   clickhouse           Analytics database
8787   web-ui-backend       Configuration API

Quick Reference

# Fresh install
./install.sh

# Status check (all 11 services)
./scripts/status.sh

# Verify 3-branch detection (v2.0.0)
./scripts/verify-branches.sh

# View logs
./scripts/logs.sh

# Restart
./scripts/restart.sh

# Uninstall
docker-compose down -v
rm -rf vigil_data .env .install-state.lock

Integration Points

With docker-vigil-orchestration:

when: Service won't start
action:
  1. Check vigil-net network connectivity
  2. Verify service dependencies
  3. Check port conflicts
  4. Review Docker resource limits

With clickhouse-grafana-monitoring:

when: Migration to v2.0.0
action:
  1. Run SQL migration script
  2. Verify branch columns exist
  3. Test ClickHouse queries
  4. Update Grafana dashboards

Last Updated: 2025-12-09
Install Script: 2000+ lines bash
Services: 11 containers (v2.0.0)
3-Branch Ports: 5005 (Heuristics), 5006 (Semantic), 8000 (LLM Guard)

Version History

  • v2.0.0 (Current): 11 services, 3-branch detection startup, migration scripts
  • v1.6.11: 9 services, sequential detection