Learn-skills.dev observability-control

Manage observability stack lifecycle (start, stop, backup, restore, upgrade). Use when controlling the LGTM stack for Claude Code monitoring.

install
source · Clone the upstream repo
git clone https://github.com/NeverSight/learn-skills.dev
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/NeverSight/learn-skills.dev "$T" && mkdir -p ~/.claude/skills && cp -r "$T/data/skills-md/adaptationio/skrillz/observability-control" ~/.claude/skills/neversight-learn-skills-dev-observability-control && rm -rf "$T"
manifest: data/skills-md/adaptationio/skrillz/observability-control/SKILL.md

Observability Control

Manage the lifecycle of the observability stack for Claude Code telemetry.

Stack Locations

Environment        Docker Compose Path
Primary Stack      /mnt/c/data/github/botaniqal-medtech/botaniqal-medtech/observability/docker-compose.yml
Skill-based Stack  /mnt/c/data/github/.observability/docker-compose.yml
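To avoid retyping the long paths above, a small helper can map a stack name to its compose file. This is a minimal sketch: the function name `compose_file_for` and the keys `primary` and `skill` are illustrative labels, not part of the skill itself.

```shell
# Hypothetical helper: print the compose file path for a named stack.
# The keys "primary" and "skill" are illustrative labels for the two stacks listed above.
compose_file_for() {
  case "$1" in
    primary) echo "/mnt/c/data/github/botaniqal-medtech/botaniqal-medtech/observability/docker-compose.yml" ;;
    skill)   echo "/mnt/c/data/github/.observability/docker-compose.yml" ;;
    *)       echo "unknown stack: $1" >&2; return 1 ;;
  esac
}
```

Usage, for example: docker compose -f "$(compose_file_for primary)" up -d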

Components

Service         Port       Purpose
Grafana         3000       Dashboards and visualization
Prometheus      9090       Metrics storage
Loki            3100       Log aggregation
Tempo           3200       Distributed tracing
OTEL Collector  4317/4318  Telemetry receiver
Promtail        -          Log shipping

Operations

start

Start the observability stack in the background.

docker compose -f /mnt/c/data/github/botaniqal-medtech/botaniqal-medtech/observability/docker-compose.yml up -d

stop

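Stop the stack gracefully. Containers are removed but named volumes are kept, so stored data survives; do not add -v unless you intend to delete it.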

docker compose -f /mnt/c/data/github/botaniqal-medtech/botaniqal-medtech/observability/docker-compose.yml down

restart [service]

Restart a specific service or all services.

# Restart all
docker compose -f /path/docker-compose.yml restart

# Restart specific
docker restart loki
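The optional [service] argument can be handled by a single wrapper, because `docker compose restart` restarts every service when no name is given. A sketch, assuming the compose service names match the container names used above (for example `loki`); `stack_restart` and the `STACK_COMPOSE_FILE` variable are hypothetical names introduced for illustration.

```shell
# Hypothetical wrapper: restart the named services, or the whole stack when no argument is given.
# STACK_COMPOSE_FILE is an assumed variable holding the path to the compose file.
stack_restart() {
  if [ -z "${STACK_COMPOSE_FILE:-}" ]; then
    echo "STACK_COMPOSE_FILE is not set" >&2
    return 1
  fi
  # "$@" expands to nothing when called without arguments, so all services restart.
  docker compose -f "$STACK_COMPOSE_FILE" restart "$@"
}
```

Usage, for example: STACK_COMPOSE_FILE=/path/docker-compose.yml stack_restart loki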

status

List the running stack containers with their status (including health, where a container defines a healthcheck).

docker ps --format "table {{.Names}}\t{{.Status}}" | grep -E "(otel|loki|grafana|prometheus|tempo|promtail)"

Output: Running services, health status.
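The grep above shows what is running but does not flag a component that is absent. A sketch that compares running container names against an expected list follows. The container names are assumptions: `otel-collector`, `loki`, `grafana` and `tempo` appear in commands in this document, while `prometheus` and `promtail` are guesses to be adjusted to your compose file. `missing_services` is a hypothetical helper.

```shell
# Assumed container names; prometheus and promtail are guesses, the others appear in this document.
EXPECTED_SERVICES="grafana prometheus loki tempo otel-collector promtail"

# Read running container names on stdin (one per line) and print each expected name that is missing.
missing_services() {
  running=$(cat)
  for svc in $EXPECTED_SERVICES; do
    # -x matches the whole line, -F treats the name as a literal string.
    printf '%s\n' "$running" | grep -qxF "$svc" || echo "$svc"
  done
}
```

Usage, for example: docker ps --format '{{.Names}}' | missing_services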

health

Verify service endpoints.

curl -s http://localhost:3000/api/health  # Grafana
curl -s http://localhost:9090/-/healthy   # Prometheus
curl -s http://localhost:3100/ready       # Loki
curl -s http://localhost:3200/ready       # Tempo
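The four probes above can be folded into one function that reports a pass or fail per service and returns a non-zero status if anything is down. A sketch using the endpoints listed above; `check_endpoint` and `check_stack_health` are hypothetical names, and the 5-second timeout is an arbitrary choice.

```shell
# Probe one endpoint; -f makes curl return non-zero on HTTP error responses.
check_endpoint() {
  name=$1
  url=$2
  if curl -fs -o /dev/null --max-time 5 "$url" 2>/dev/null; then
    echo "OK   $name"
  else
    echo "FAIL $name ($url)"
    return 1
  fi
}

# Probe every service; returns non-zero if any probe failed.
check_stack_health() {
  failed=0
  check_endpoint grafana    http://localhost:3000/api/health || failed=1
  check_endpoint prometheus http://localhost:9090/-/healthy  || failed=1
  check_endpoint loki       http://localhost:3100/ready      || failed=1
  check_endpoint tempo      http://localhost:3200/ready      || failed=1
  return "$failed"
}
```

Usage, for example: check_stack_health || echo "stack is not fully healthy"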

backup

Export dashboards and configurations.

# Back up all dashboards (type=dash-db skips folders) as a stream of JSON documents
mkdir -p backup
curl -s "http://localhost:3000/api/search?type=dash-db" -u admin:admin | \
  jq -r '.[].uid' | \
  xargs -I {} curl -s http://localhost:3000/api/dashboards/uid/{} -u admin:admin > backup/dashboards.json

Output:

.observability/backups/YYYYMMDD_HHMMSS/
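The YYYYMMDD_HHMMSS directory name in the output line above can be produced with `date`. A sketch that creates the directory and prints its path; `new_backup_dir` is a hypothetical helper, and the default base path is taken from the output line above.

```shell
# Create a timestamped backup directory under the given base (default: .observability/backups)
# and print its path.
new_backup_dir() {
  base=${1:-.observability/backups}
  dir="$base/$(date +%Y%m%d_%H%M%S)"
  mkdir -p "$dir" && echo "$dir"
}
```

Usage, for example: dir=$(new_backup_dir), then redirect the dashboard export to "$dir/dashboards.json".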

restore <backup-path>

Restore from backup.

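# Re-import each dashboard; id is cleared so Grafana matches on uid, overwrite replaces existing copies
jq -c 'select(.dashboard) | {dashboard: (.dashboard | .id = null), overwrite: true}' backup/dashboards.json | \
  while read -r payload; do
    printf '%s' "$payload" | curl -s -X POST http://localhost:3000/api/dashboards/db \
      -H "Content-Type: application/json" -u admin:admin --data-binary @-
  done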

logs [service]

View logs from stack components.

docker logs loki --tail 100
docker logs otel-collector --tail 100
docker logs grafana --tail 100

fix-permissions

Fix volume permission issues (common with Tempo).

# Chown the existing volume in place. Removing the volume would discard stored traces
# and fails while the tempo container still references it.
docker stop tempo
docker run --rm -v observability_tempo-data:/tempo alpine chown -R 10001:10001 /tempo
docker start tempo

Quick Commands

# Check all services status
docker ps | grep -E "(otel|loki|grafana|prometheus|tempo|promtail)"

# View recent logs for issues
docker logs otel-collector --tail 50 2>&1 | grep -i error

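# Test OTLP endpoint (port 4317 is gRPC and does not answer plain HTTP, so probe the HTTP port)
curl -v -X POST http://localhost:4318/v1/traces -H "Content-Type: application/json" -d '{}'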

# Query Loki for recent data
curl -s "http://localhost:3100/loki/api/v1/labels"

# List Grafana dashboards
curl -s http://localhost:3000/api/search -u admin:admin | python3 -c "import sys,json; [print(d['title']) for d in json.load(sys.stdin)]"

Troubleshooting

OTEL Collector Unhealthy

docker logs otel-collector --tail 30
# Common fix: Ensure Prometheus has --web.enable-remote-write-receiver
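Whether the running Prometheus container was started with that flag can be checked from its argument list. A sketch, assuming the container is named `prometheus` (the name is not confirmed by this document); `has_flag` is a hypothetical helper.

```shell
# Succeeds if the space-separated argument list read on stdin contains the given flag exactly.
has_flag() {
  tr ' ' '\n' | grep -qxF -- "$1"
}
```

Usage, for example: docker inspect prometheus --format '{{join .Args " "}}' | has_flag --web.enable-remote-write-receiver && echo "remote write receiver enabled"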

Loki Unhealthy

docker logs loki --tail 30
# Common fix: Disable frontend_worker for single-node mode

Tempo Permission Denied

# Fix volume permissions in place. Removing the volume would discard stored traces
# and fails while the tempo container still references it.
docker stop tempo
docker run --rm -v observability_tempo-data:/tempo alpine chown -R 10001:10001 /tempo
docker start tempo

No Data in Grafana

  1. Check that the telemetry environment variables are set:
    env | grep OTEL
  2. Check that hooks are configured:
    cat .claude/settings.json
  3. Verify that Loki is receiving data:
    curl "http://localhost:3100/loki/api/v1/labels"
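The first step above can be made stricter than a grep by naming the variables that must be present. A sketch: `missing_env_vars` is a hypothetical helper, and which variables to require depends on your setup. `OTEL_EXPORTER_OTLP_ENDPOINT` and `OTEL_EXPORTER_OTLP_PROTOCOL` are standard OpenTelemetry variables and are used here only as examples.

```shell
# Print the name of every listed variable that is unset or empty; return non-zero if any is missing.
missing_env_vars() {
  rc=0
  for var in "$@"; do
    # Indirect lookup via eval; the names come from the caller, not from untrusted input.
    eval "val=\${$var:-}"
    if [ -z "$val" ]; then
      echo "$var"
      rc=1
    fi
  done
  return "$rc"
}
```

Usage, for example: missing_env_vars OTEL_EXPORTER_OTLP_ENDPOINT OTEL_EXPORTER_OTLP_PROTOCOL || echo "telemetry is not fully configured"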

Access Points

Service     URL                    Credentials
Grafana     http://localhost:3000  admin/admin
Prometheus  http://localhost:9090  -
Loki        http://localhost:3100  -
OTLP gRPC   localhost:4317         -
OTLP HTTP   localhost:4318         -

Scripts

  • scripts/start-stack.sh: Start observability stack
  • scripts/stop-stack.sh: Stop stack gracefully
  • scripts/health-check.sh: Check all service health
  • scripts/backup-dashboards.sh: Export Grafana dashboards
  • scripts/restore-dashboards.sh: Import dashboards from backup