Learn-skills.dev observability-stack-setup
Automated LGTM + Alloy observability stack deployment using Docker Compose. Use when setting up Claude Code observability infrastructure locally.
git clone https://github.com/NeverSight/learn-skills.dev
T=$(mktemp -d) && git clone --depth=1 https://github.com/NeverSight/learn-skills.dev "$T" && mkdir -p ~/.claude/skills && cp -r "$T/data/skills-md/adaptationio/skrillz/observability-stack-setup" ~/.claude/skills/neversight-learn-skills-dev-observability-stack-setup && rm -rf "$T"
data/skills-md/adaptationio/skrillz/observability-stack-setup/SKILL.md
Observability Stack Setup
Automated deployment of the complete LGTM (Loki, Grafana, Tempo, Mimir/Prometheus) + Alloy observability stack for Claude Code monitoring.
When to Use
- Setting up Claude Code observability for the first time
- Deploying local development observability infrastructure
- Monitoring Claude Code operations (tool calls, costs, errors, performance)
- Using pre-configured dashboards for Claude Code analysis
What This Skill Does
Automatically deploys and configures:
- Grafana Alloy: OTEL collector (receives telemetry from Claude Code)
- Loki: Log aggregation (stores all Claude Code logs)
- Tempo: Distributed tracing (tracks tool calls, API requests)
- Prometheus: Metrics storage (token usage, costs, performance)
- Grafana: Visualization with pre-built Claude Code dashboards
Quick Start
Prerequisites
    # Verify Docker is installed
    docker --version          # Requires ≥ 20.10
    # Verify Docker Compose is installed
    docker compose version    # Requires ≥ 2.0
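The version requirements above can also be checked mechanically. The following is a minimal sketch rather than the skill's own check: the helper names `version_ge` and `check_tool` are made up for illustration, and the comparison assumes a `sort` that supports `-V` (GNU coreutils and recent macOS do).

```shell
#!/usr/bin/env bash
# Sketch: compare installed Docker / Compose versions against the documented minimums.
# version_ge and check_tool are illustrative helper names, not part of the skill.

# version_ge A B -> succeeds when version A >= version B (relies on `sort -V`).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_tool NAME INSTALLED MINIMUM -> prints a one-line verdict, fails if too old.
check_tool() {
  if version_ge "$2" "$3"; then
    echo "$1 $2 OK (>= $3)"
  else
    echo "$1 $2 is older than required $3" >&2
    return 1
  fi
}

# Only probe the real binaries when Docker is actually present.
if command -v docker >/dev/null 2>&1; then
  docker_version=$(docker version --format '{{.Client.Version}}' 2>/dev/null)
  compose_version=$(docker compose version --short 2>/dev/null)
  check_tool "Docker" "${docker_version:-0}" "20.10"
  # Strip a leading "v" in case the Compose version is reported as v2.x.y.
  compose_version="${compose_version#v}"
  check_tool "Docker Compose" "${compose_version:-0}" "2.0"
fi
```

A missing or unparsable version is treated as `0`, so it is reported as too old instead of being silently accepted.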
Deploy Stack
Invoke this skill and it will:
- Create the .observability/ directory structure
- Generate all configuration files
- Start the stack with docker compose up -d
- Import Claude Code dashboards
- Verify all services are healthy
- Output access URLs and next steps
Estimated time: 5-10 minutes
What Gets Deployed
Services
| Service | Port | Purpose |
|---|---|---|
| Grafana | 3000 | Dashboards and visualization |
| Grafana Alloy | 4317 (gRPC), 4318 (HTTP), 12345 (metrics) | OTLP receiver |
| Loki | 3100 | Log storage and querying |
| Tempo | 3200 | Trace storage and querying |
| Prometheus | 9090 | Metrics storage and querying |
Volumes
All data is persisted in .observability/volumes/:
- alloy-data/: Alloy configuration and state
- loki-data/: Log storage
- tempo-data/: Trace storage
- prometheus-data/: Metrics storage
- grafana-data/: Dashboards, datasources, settings
Pre-built Dashboards
- Claude Code Overview
  - Session count, duration, active time
  - Token usage and cost trends
  - Error rates by tool
  - Top operations
- Tool Performance Matrix
  - Call counts per tool
  - Average/P95/P99 latency
  - Success/failure rates
  - Most common errors
- Cost Analysis
  - Daily/weekly/monthly costs
  - Token usage breakdown
  - Budget tracking
  - Cost projections
- Error Tracking
  - Error timeline
  - Error types distribution
  - Affected tools
  - Recent error details
- Session Analysis
  - Session duration distribution
  - Sessions per day/week
  - Conversation depth
  - Active vs idle time
Workflow
Step 1: Verify Prerequisites
Checks that Docker and Docker Compose are installed with compatible versions.
Step 2: Create Directory Structure
    .observability/
    ├── docker-compose.yml          # Main stack definition
    ├── alloy/
    │   └── config.yaml             # OTLP receiver + exporters config
    ├── grafana/
    │   ├── datasources/
    │   │   ├── loki.yml            # Loki datasource
    │   │   ├── prometheus.yml      # Prometheus datasource
    │   │   └── tempo.yml           # Tempo datasource
    │   └── dashboards/
    │       ├── claude-code-overview.json
    │       ├── tool-performance.json
    │       ├── cost-analysis.json
    │       ├── error-tracking.json
    │       └── session-analysis.json
    └── volumes/                    # Persistent data
        ├── alloy/
        ├── loki/
        ├── tempo/
        ├── prometheus/
        └── grafana/
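The directory part of the tree above could be created by hand along these lines. This is only a sketch: `create_layout` is a hypothetical helper name, it follows the directory names shown in this step, and it creates directories only, leaving the configuration files to the generation step.

```shell
#!/usr/bin/env bash
# Sketch: create the directories from the .observability/ tree shown in this step.
# create_layout is an illustrative name, not a function shipped with the skill.

# create_layout ROOT -> makes every directory from the tree under ROOT.
create_layout() {
  local root="$1"
  mkdir -p \
    "$root/alloy" \
    "$root/grafana/datasources" \
    "$root/grafana/dashboards" \
    "$root/volumes/alloy" \
    "$root/volumes/loki" \
    "$root/volumes/tempo" \
    "$root/volumes/prometheus" \
    "$root/volumes/grafana"
}
```

Calling `create_layout .observability` from the project root is idempotent, because `mkdir -p` leaves existing directories untouched.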
Step 3: Generate Configurations
Creates all configuration files from templates (see references/ for details).
Step 4: Start Stack
docker compose -f .observability/docker-compose.yml up -d
Step 5: Health Checks
Verifies each service:
- Alloy: http://localhost:12345/metrics
- Loki: http://localhost:3100/ready
- Tempo: http://localhost:3200/ready
- Prometheus: http://localhost:9090/-/healthy
- Grafana: http://localhost:3000/api/health
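A health check over the endpoints listed above might look like the sketch below. It is not the contents of scripts/verify-health.sh. The function names `probe`, `wait_for`, and `check_stack` and the default of 30 attempts two seconds apart are assumptions. Retrying matters because services can take some time after container start before their readiness endpoints respond successfully.

```shell
#!/usr/bin/env bash
# Sketch: poll each health endpoint until it responds or the attempts run out.
# probe, wait_for and check_stack are illustrative names; retry defaults are assumptions.

# probe URL -> succeeds unless the request fails or returns HTTP 400 or above.
probe() {
  curl -fsS -o /dev/null --max-time 5 "$1"
}

# wait_for NAME URL [ATTEMPTS] [DELAY_SECONDS] -> retries probe until it passes.
wait_for() {
  local name="$1" url="$2" attempts="${3:-30}" delay="${4:-2}" i=1
  while [ "$i" -le "$attempts" ]; do
    if probe "$url" 2>/dev/null; then
      echo "$name: healthy"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "$name: not healthy after $attempts attempts" >&2
  return 1
}

# check_stack [ATTEMPTS] [DELAY_SECONDS] -> checks every service, fails if any fails.
check_stack() {
  local failed=0
  wait_for "Alloy"      "http://localhost:12345/metrics"  "$@" || failed=1
  wait_for "Loki"       "http://localhost:3100/ready"     "$@" || failed=1
  wait_for "Tempo"      "http://localhost:3200/ready"     "$@" || failed=1
  wait_for "Prometheus" "http://localhost:9090/-/healthy" "$@" || failed=1
  wait_for "Grafana"    "http://localhost:3000/api/health" "$@" || failed=1
  return "$failed"
}
```

Running `check_stack` after `docker compose up -d` prints one line per service and returns non-zero if any of them never became healthy. `check_stack 5 1` shortens the wait to five attempts one second apart.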
Step 6: Import Dashboards
Uses the Grafana API to import all pre-built dashboards.
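One way to script this step is sketched below. It is not the contents of scripts/import-dashboards.sh. Grafana's `POST /api/dashboards/db` endpoint expects a body of the form `{"dashboard": {...}, "overwrite": ...}`. Whether the skill's JSON files hold a bare dashboard model or that wrapper is not stated here, so the hypothetical `build_payload` helper handles both. The sketch assumes `jq` and `curl` are installed and that the default admin/admin login is unchanged.

```shell
#!/usr/bin/env bash
# Sketch: import every dashboard JSON file through Grafana's HTTP API.
# Assumes jq and curl are installed; build_payload / import_dashboards are illustrative names.

GRAFANA_URL="${GRAFANA_URL:-http://localhost:3000}"
GRAFANA_AUTH="${GRAFANA_AUTH:-admin:admin}"

# build_payload FILE -> prints the request body for POST /api/dashboards/db.
# A bare dashboard model is wrapped (with its id cleared so Grafana assigns one);
# a file that already has a top-level "dashboard" key is passed through unchanged.
build_payload() {
  jq -c 'if has("dashboard") then . else {dashboard: (. + {id: null}), overwrite: true} end' "$1"
}

# import_dashboards DIR -> posts each *.json file in DIR, fails if any import fails.
import_dashboards() {
  local file failed=0
  for file in "$1"/*.json; do
    [ -e "$file" ] || continue
    if build_payload "$file" |
      curl -fsS -o /dev/null -X POST "$GRAFANA_URL/api/dashboards/db" \
        -H "Content-Type: application/json" -u "$GRAFANA_AUTH" -d @-; then
      echo "imported: $file"
    else
      echo "failed: $file" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

With the stack running, `import_dashboards .observability/grafana/dashboards` would post all five files. Setting `GRAFANA_AUTH` first covers the case where the admin password has been changed.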
Step 7: Output Success
Displays:
- Access URLs for all services
- Default credentials (admin/admin)
- OTLP endpoint for Claude Code configuration
- Next step: Enable Claude Code telemetry
Configuration Details
Grafana Alloy (OTLP Collector)
Receives telemetry from Claude Code via OTLP:
- gRPC endpoint: localhost:4317
- HTTP endpoint: localhost:4318
Routes telemetry to backends:
- Logs → Loki
- Traces → Tempo
- Metrics → Prometheus
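To confirm that the receiver accepts data before Claude Code telemetry is enabled, a hand-built log record can be posted to the HTTP endpoint. The sketch below uses the standard OTLP/HTTP JSON encoding and the `/v1/logs` path. The service name `smoke-test` and the helper names are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: push one hand-built log record to the OTLP/HTTP endpoint.
# otlp_log_payload / send_test_log and the "smoke-test" service name are illustrative.

OTLP_HTTP="${OTLP_HTTP:-http://localhost:4318}"

# otlp_log_payload MESSAGE -> prints a minimal OTLP/JSON logs request body.
# MESSAGE is inserted verbatim, so it must not contain quotes or backslashes.
otlp_log_payload() {
  local now_ns
  now_ns="$(date +%s)000000000"
  printf '{"resourceLogs":[{"resource":{"attributes":[{"key":"service.name","value":{"stringValue":"smoke-test"}}]},"scopeLogs":[{"logRecords":[{"timeUnixNano":"%s","severityText":"INFO","body":{"stringValue":"%s"}}]}]}]}\n' "$now_ns" "$1"
}

# send_test_log MESSAGE -> posts the payload to /v1/logs; fails on HTTP errors.
send_test_log() {
  otlp_log_payload "$1" |
    curl -fsS -o /dev/null -X POST "$OTLP_HTTP/v1/logs" \
      -H "Content-Type: application/json" -d @-
}
```

After `send_test_log "hello from smoke test"` succeeds, the message should be findable in Grafana → Explore → Loki. Which labels it carries depends on how the generated Alloy configuration maps resource attributes.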
Retention Policies
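Default: 365 days (configurable in docker-compose.yml)
- Loki: 365 days (-ingester.max-chunk-age=365d)
- Tempo: 365 days (-storage.trace.local.path retention)
- Prometheus: 365 days (--storage.tsdb.retention.time=365d)
In upstream Loki, -ingester.max-chunk-age limits how long a chunk stays open in the ingester; how long logs are kept is governed by limits_config.retention_period with compactor retention enabled. Tempo's block retention is set through compactor.compaction.block_retention. Check the generated configs for these settings if the 365-day window matters.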
Privacy Settings
Full logging enabled (no redactions):
- User prompts: Full content logged
- File paths: Complete paths visible
- Tool execution: Full command details
- API requests: All parameters visible
This configuration assumes observability for personal use with full data access.
Troubleshooting
Port Already in Use
If ports 3000, 3100, 3200, 4317, 4318, 9090, or 12345 are in use:
Option 1: Stop conflicting services
    # Find the process using the port
    sudo lsof -i :3000
    # Stop the process
    sudo kill <PID>
Option 2: Modify ports in docker-compose.yml
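Before choosing either option, it can help to see which of the listed ports are actually taken. The sketch below uses bash's `/dev/tcp` pseudo-device, so it needs bash rather than plain sh. The names `port_in_use` and `busy_ports` are illustrative. It only detects listeners reachable on 127.0.0.1.

```shell
#!/usr/bin/env bash
# Sketch: report which of the stack's ports already have a listener on localhost.
# port_in_use and busy_ports are illustrative names; requires bash for /dev/tcp.

# port_in_use PORT -> succeeds when something accepts TCP connections on PORT.
port_in_use() {
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

# busy_ports PORT... -> prints each port that is in use, one per line.
busy_ports() {
  local port
  for port in "$@"; do
    if port_in_use "$port"; then
      echo "$port"
    fi
  done
}

# Host ports the stack binds, as listed in this section.
STACK_PORTS="3000 3100 3200 4317 4318 9090 12345"
```

Running `busy_ports $STACK_PORTS` (unquoted on purpose, so the list splits into separate arguments) prints nothing when every port is free. Any port it does print can then be passed to `lsof -i :<port>` as in Option 1.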
Services Not Starting
Check logs:
docker compose -f .observability/docker-compose.yml logs [service_name]
Common issues:
- Insufficient disk space (check with df -h)
- Insufficient memory (Alloy needs ~512MB, others ~256MB each)
- Permission issues on volume directories
Dashboards Not Appearing
Manually import:
    # Copy dashboard JSON to container
    docker cp .observability/grafana/dashboards/claude-code-overview.json \
      observability-grafana-1:/tmp/
    # Import via API. The endpoint expects a body of the form
    # {"dashboard": {...}, "overwrite": true}; wrap the file first if it
    # contains only the bare dashboard model.
    curl -X POST http://localhost:3000/api/dashboards/db \
      -H "Content-Type: application/json" \
      -u admin:admin \
      -d @.observability/grafana/dashboards/claude-code-overview.json
Next Steps
After the stack is running:
- Enable Claude Code telemetry: Use the claude-code-telemetry-enable skill
- Use Claude Code: Run tools, read files, execute commands
- View dashboards: Open http://localhost:3000 and explore the pre-built dashboards
- Verify data is flowing: Check Grafana → Explore → Loki/Prometheus/Tempo
Stopping the Stack
Graceful shutdown (preserves data):
docker compose -f .observability/docker-compose.yml down
Complete removal (deletes data):
docker compose -f .observability/docker-compose.yml down -v
References
- references/docker-compose-full.yml: Complete Docker Compose configuration
- references/alloy-config.yaml: Grafana Alloy OTLP receiver configuration
- references/grafana-datasources/: Datasource YAML configurations
- references/dashboards/: Pre-built dashboard JSON files
- references/troubleshooting.md: Common issues and solutions
Scripts
- scripts/setup-stack.sh: Main setup script (automated deployment)
- scripts/verify-health.sh: Health check for all services
- scripts/import-dashboards.sh: Import Grafana dashboards
Version Information
Component Versions (latest as of 2025-11-22):
- Grafana: 11.5.2
- Grafana Alloy: 1.5.0
- Loki: 3.4.2
- Tempo: 2.7.1
- Prometheus: 2.55.0
All versions pinned in docker-compose.yml for reproducibility.