Learn-skills.dev observability-control
Manage observability stack lifecycle (start, stop, backup, restore, upgrade). Use when controlling the LGTM stack for Claude Code monitoring.
Install
Source: clone the upstream repo.
git clone https://github.com/NeverSight/learn-skills.dev
Claude Code: install into ~/.claude/skills/.
T=$(mktemp -d) && git clone --depth=1 https://github.com/NeverSight/learn-skills.dev "$T" && mkdir -p ~/.claude/skills && cp -r "$T/data/skills-md/adaptationio/skrillz/observability-control" ~/.claude/skills/neversight-learn-skills-dev-observability-control && rm -rf "$T"
Manifest: data/skills-md/adaptationio/skrillz/observability-control/SKILL.md
Observability Control
Manage the lifecycle of the observability stack for Claude Code telemetry.
Stack Locations
| Environment | Docker Compose Path |
|---|---|
| Primary Stack | |
| Skill-based Stack | |
Components
| Service | Port | Purpose |
|---|---|---|
| Grafana | 3000 | Dashboards and visualization |
| Prometheus | 9090 | Metrics storage |
| Loki | 3100 | Log aggregation |
| Tempo | 3200 | Distributed tracing |
| OTEL Collector | 4317/4318 | Telemetry receiver (OTLP gRPC on 4317, OTLP HTTP on 4318) |
| Promtail | - | Log shipping |
Operations
start
Start observability stack.
docker compose -f /mnt/c/data/github/botaniqal-medtech/botaniqal-medtech/observability/docker-compose.yml up -d
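A freshly started stack can take a while before every service answers. The sketch below polls the readiness endpoints used by the health operation until all of them return a 2xx status or a timeout expires. The function name wait_for_stack, the READY_URLS array, the 2-second poll interval and the 120-second default are hypothetical choices, not part of this skill's scripts.

```bash
#!/usr/bin/env bash
# Sketch: wait until the stack's readiness endpoints respond after "docker compose up -d".
# URLs are the ones used by the health operation; interval and timeout are arbitrary choices.
READY_URLS=(
  "http://localhost:3000/api/health"   # Grafana
  "http://localhost:9090/-/healthy"    # Prometheus
  "http://localhost:3100/ready"        # Loki
  "http://localhost:3200/ready"        # Tempo
)

# wait_for_stack [TIMEOUT_SECONDS]: return 0 once every URL answers with a 2xx, 1 on timeout.
wait_for_stack() {
  local timeout="${1:-120}" deadline pending url
  deadline=$(( SECONDS + timeout ))
  while :; do
    pending=0
    for url in "${READY_URLS[@]}"; do
      # -f: treat HTTP errors (>= 400) as failure; -s: no progress output
      curl -sf -o /dev/null --max-time 5 "$url" || pending=$(( pending + 1 ))
    done
    (( pending == 0 )) && return 0
    (( SECONDS >= deadline )) && return 1
    sleep 2
  done
}
```

Run it right after the start command, for example `wait_for_stack 180 && echo "stack ready"`.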
stop
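Stop stack gracefully (preserves data). docker compose down removes the containers and network but keeps named volumes; adding -v would delete the volumes and their data.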
docker compose -f /mnt/c/data/github/botaniqal-medtech/botaniqal-medtech/observability/docker-compose.yml down
restart [service]
Restart specific service or all services.
# Restart all
docker compose -f /path/docker-compose.yml restart
# Restart specific
docker restart loki
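The two restart forms can be folded into one function that mirrors `restart [service]`: with no argument it restarts everything through Compose, otherwise it restarts the named containers. This is a minimal sketch; `restart_stack` and the `COMPOSE_FILE` variable are hypothetical, and the default path is assumed to be the primary stack file used by start.

```bash
#!/usr/bin/env bash
# Sketch of "restart [service]": no arguments restarts every service via Compose,
# otherwise the named containers are restarted directly (as "docker restart loki" does).
# COMPOSE_FILE is a hypothetical variable; its default is the path used by the start operation.
COMPOSE_FILE="${COMPOSE_FILE:-/mnt/c/data/github/botaniqal-medtech/botaniqal-medtech/observability/docker-compose.yml}"

restart_stack() {
  if (( $# == 0 )); then
    docker compose -f "$COMPOSE_FILE" restart
  else
    docker restart "$@"
  fi
}
```

Usage: `restart_stack` for the whole stack, `restart_stack loki tempo` for individual containers.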
status
Health check all components.
docker ps --format "table {{.Names}}\t{{.Status}}" | grep -E "(otel|loki|grafana|prometheus|tempo)"
Output: running containers and their status, including the health state for containers that define a healthcheck.
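The status line from `docker ps` can be turned into a per-container verdict. The sketch below, assuming Docker's usual status strings ("Up ... (healthy)", "Up ... (unhealthy)", "Up ... (health: starting)", "Exited (...)"), uses `docker ps -a` so that stopped containers are reported too. `stack_status` and its labels are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: classify each stack container from "docker ps -a" output.
# -a also lists stopped containers, which plain "docker ps" hides.
stack_status() {
  docker ps -a --format '{{.Names}}\t{{.Status}}' |
    grep -E "(otel|loki|grafana|prometheus|tempo|promtail)" |
    while IFS=$'\t' read -r name status; do
      case "$status" in
        *"(unhealthy)"*)        echo "UNHEALTHY $name: $status" ;;
        *"(health: starting)"*) echo "STARTING  $name: $status" ;;
        Up*)                    echo "OK        $name: $status" ;;
        *)                      echo "DOWN      $name: $status" ;;
      esac
    done
}
```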
health
Verify service endpoints.
curl -s http://localhost:3000/api/health   # Grafana
curl -s http://localhost:9090/-/healthy    # Prometheus
curl -s http://localhost:3100/ready        # Loki
curl -s http://localhost:3200/ready        # Tempo
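The four curl calls can be combined into a single pass/fail report. This is a minimal sketch, assuming bash 4+ (for the associative array); `check_health`, the output format and the 5-second timeout are hypothetical. The exit status counts failed checks, so it can gate a script.

```bash
#!/usr/bin/env bash
# Sketch: run the four health checks and report each result on its own line.
# Requires bash 4+ for the associative array; the 5-second timeout is an arbitrary choice.
declare -A HEALTH_URLS=(
  [grafana]="http://localhost:3000/api/health"
  [prometheus]="http://localhost:9090/-/healthy"
  [loki]="http://localhost:3100/ready"
  [tempo]="http://localhost:3200/ready"
)

# check_health: print "<service> ok|FAIL"; the exit status is the number of failed checks.
check_health() {
  local name failed=0
  for name in grafana prometheus loki tempo; do
    # -f makes curl return non-zero on HTTP errors such as 503 "not ready"
    if curl -sf -o /dev/null --max-time 5 "${HEALTH_URLS[$name]}"; then
      printf '%-10s ok\n' "$name"
    else
      printf '%-10s FAIL\n' "$name"
      failed=$(( failed + 1 ))
    fi
  done
  return "$failed"
}
```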
backup
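Export dashboards and configurations. The commands below save each Grafana dashboard to its own JSON file in a timestamped directory.
# Back up each dashboard (type=dash-db skips folders) into a timestamped directory
BACKUP_DIR=".observability/backups/$(date +%Y%m%d_%H%M%S)"
mkdir -p "$BACKUP_DIR"
curl -s "http://localhost:3000/api/search?type=dash-db" -u admin:admin | jq -r '.[].uid' |
  while read -r uid; do
    curl -s "http://localhost:3000/api/dashboards/uid/$uid" -u admin:admin > "$BACKUP_DIR/$uid.json"
  done
Output:
.observability/backups/YYYYMMDD_HHMMSS/ (one <uid>.json file per dashboard)
restore <backup-path>
Restore from backup.
# Set BACKUP_PATH to the <backup-path> directory; the import API expects {"dashboard": ..., "overwrite": true}
for f in "$BACKUP_PATH"/*.json; do
  jq '{dashboard: (.dashboard | .id = null), overwrite: true}' "$f" |
    curl -s -X POST http://localhost:3000/api/dashboards/db \
      -H "Content-Type: application/json" -u admin:admin -d @-
done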
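Timestamped backups accumulate over time. Because YYYYMMDD_HHMMSS names sort in chronological order, a retention step can rely on sorted glob results. The sketch below keeps the newest N backup directories and leaves anything not matching that name pattern alone; `prune_backups` and its default of 7 are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: keep only the newest KEEP backups under DIR and delete the rest.
# Only directories named YYYYMMDD_HHMMSS are considered; anything else is left alone.
prune_backups() {
  local dir="$1" keep="${2:-7}" saved_nullglob excess i
  local -a backups
  saved_nullglob=$(shopt -p nullglob) || true   # remember the caller's setting
  shopt -s nullglob                             # no matches -> empty array
  # Glob results are sorted, and these names sort chronologically, so the oldest come first
  backups=("$dir"/[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]_[0-9][0-9][0-9][0-9][0-9][0-9]/)
  eval "$saved_nullglob"
  excess=$(( ${#backups[@]} - keep ))
  for (( i = 0; i < excess; i++ )); do
    rm -rf -- "${backups[i]}"
  done
}
```

Usage: `prune_backups .observability/backups 10` after each backup run.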
logs [service]
View logs from stack components.
docker logs loki --tail 100
docker logs otel-collector --tail 100
docker logs grafana --tail 100
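Rather than tailing each container separately, error lines can be collected from all of them in one pass using `docker logs --since`. This is a sketch under assumptions: otel-collector, loki, grafana and tempo appear elsewhere in this skill, while `prometheus` and `promtail` are assumed container names. `recent_errors` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: collect error lines from every stack container in one pass.
# "prometheus" and "promtail" are assumed container names; adjust to your compose file.
STACK_CONTAINERS=(otel-collector loki grafana prometheus tempo promtail)

# recent_errors [SINCE]: print lines containing "error" (any case) logged within SINCE, e.g. 15m or 1h.
recent_errors() {
  local since="${1:-15m}" c
  for c in "${STACK_CONTAINERS[@]}"; do
    # docker logs replays the container's stderr on stderr, so merge it before filtering
    docker logs --since "$since" "$c" 2>&1 | grep -i 'error' | sed "s/^/[$c] /"
  done
}
```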
fix-permissions
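Fix volume permission issues (common with Tempo). Recreating the volume discards stored traces.
# A volume cannot be removed while any container, even a stopped one, still uses it
docker stop tempo && docker rm tempo
docker volume rm observability_tempo-data
docker volume create observability_tempo-data
docker run --rm -v observability_tempo-data:/tempo alpine chown -R 10001:10001 /tempo
# Recreate the removed Tempo container from the compose file
docker compose -f /mnt/c/data/github/botaniqal-medtech/botaniqal-medtech/observability/docker-compose.yml up -d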
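Because recreating the volume deletes its data, it can help to check first whether ownership is actually wrong. The sketch below mounts the volume in a throwaway alpine container and compares `stat -c '%u:%g'` with the expected owner (10001:10001 for Tempo, as above). `volume_owned_by` and its return codes are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: report whether a volume's root directory already has the expected owner,
# so the destructive volume recreation can be skipped when it is not needed.
# volume_owned_by VOLUME UID:GID -> 0 if owned as expected, 1 if not, 2 if the check itself failed.
volume_owned_by() {
  local volume="$1" want="$2" got
  # Mount the volume in a throwaway alpine container and print the numeric owner:group
  got=$(docker run --rm -v "$volume":/data alpine stat -c '%u:%g' /data) || return 2
  [[ "$got" == "$want" ]]
}
```

For example: `volume_owned_by observability_tempo-data 10001:10001; echo $?` prints 1 when fix-permissions is worth running.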
Quick Commands
# Check all services status
docker ps | grep -E "(otel|loki|grafana|prometheus|tempo|promtail)"
# View recent logs for issues
docker logs otel-collector --tail 50 2>&1 | grep -i error
# Check that the OTLP gRPC port accepts connections (curl speaks plain HTTP, so expect a protocol error rather than a clean response)
curl -v http://localhost:4317
# List label names Loki has ingested (an empty list suggests no recent logs have arrived)
curl -s "http://localhost:3100/loki/api/v1/labels"
# List Grafana dashboards
curl -s http://localhost:3000/api/search -u admin:admin | python3 -c "import sys,json; [print(d['title']) for d in json.load(sys.stdin)]"
Troubleshooting
OTEL Collector Unhealthy
docker logs otel-collector --tail 30
# Common fix: Ensure Prometheus has --web.enable-remote-write-receiver
Loki Unhealthy
docker logs loki --tail 30
# Common fix: Disable frontend_worker for single-node mode
Tempo Permission Denied
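# Fix volume permissions (recreating the volume discards stored traces)
# A volume cannot be removed while any container, even a stopped one, still uses it
docker stop tempo && docker rm tempo
docker volume rm observability_tempo-data
docker volume create observability_tempo-data
docker run --rm -v observability_tempo-data:/tempo alpine chown -R 10001:10001 /tempo
docker compose -f /mnt/c/data/github/botaniqal-medtech/botaniqal-medtech/observability/docker-compose.yml up -d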
No Data in Grafana
- Check that the telemetry environment variables are set:
  env | grep -E "OTEL|CLAUDE_CODE_ENABLE_TELEMETRY"
- Check that hooks are configured:
  cat .claude/settings.json
- Verify that Loki is receiving data:
  curl "http://localhost:3100/loki/api/v1/labels"
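The environment check can be made explicit, so that a missing variable is named rather than silently absent from `env` output. This is a minimal sketch; the variable list is an assumption based on Claude Code's monitoring documentation and should be extended with whatever your setup exports. `check_telemetry_env` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: report missing Claude Code telemetry settings in the current shell.
# The variable list is an assumption based on Claude Code's monitoring docs; extend it
# with whatever your setup exports (e.g. OTEL_EXPORTER_OTLP_PROTOCOL).
check_telemetry_env() {
  local missing=0 var
  if [[ "${CLAUDE_CODE_ENABLE_TELEMETRY:-}" != "1" ]]; then
    echo "CLAUDE_CODE_ENABLE_TELEMETRY is not set to 1"
    missing=$(( missing + 1 ))
  fi
  for var in OTEL_METRICS_EXPORTER OTEL_LOGS_EXPORTER OTEL_EXPORTER_OTLP_ENDPOINT; do
    # ${!var} is bash indirect expansion: the value of the variable whose name is in $var
    if [[ -z "${!var:-}" ]]; then
      echo "$var is not set"
      missing=$(( missing + 1 ))
    fi
  done
  return "$missing"   # 0 means every checked variable is present
}
```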
Access Points
| Service | URL | Credentials |
|---|---|---|
| Grafana | http://localhost:3000 | admin/admin |
| Prometheus | http://localhost:9090 | - |
| Loki | http://localhost:3100 | - |
| OTLP gRPC | localhost:4317 | - |
| OTLP HTTP | localhost:4318 | - |
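Some access points, notably OTLP gRPC on 4317, have no simple HTTP health URL, so a plain TCP connection test is a reasonable lowest common denominator. The sketch below uses bash's `/dev/tcp` redirection wrapped in `timeout` so a filtered port cannot hang the check; `port_open` and `check_access_points` are hypothetical helpers.

```bash
#!/usr/bin/env bash
# Sketch: check that something is listening on each access point from the table above.
# port_open HOST PORT -> 0 if a TCP connection succeeds within 2 seconds.
port_open() {
  # Inside the child bash, $0 is the host and $1 the port
  timeout 2 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# check_access_points: print one line per port; the exit status is the number of closed ports.
check_access_points() {
  local entry name port closed=0
  for entry in grafana:3000 prometheus:9090 loki:3100 otlp-grpc:4317 otlp-http:4318; do
    name="${entry%%:*}"
    port="${entry##*:}"
    if port_open localhost "$port"; then
      echo "$name ($port) open"
    else
      echo "$name ($port) closed"
      closed=$(( closed + 1 ))
    fi
  done
  return "$closed"
}
```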
Scripts
- Start observability stack: scripts/start-stack.sh
- Stop stack gracefully: scripts/stop-stack.sh
- Check all service health: scripts/health-check.sh
- Export Grafana dashboards: scripts/backup-dashboards.sh
- Import dashboards from backup: scripts/restore-dashboards.sh