Query and analyze Claude Code observability data (metrics, logs, traces). Use when analyzing performance, costs, errors, tool usage, sessions, conversations, or subagents.
git clone https://github.com/NeverSight/learn-skills.dev
T=$(mktemp -d) && git clone --depth=1 https://github.com/NeverSight/learn-skills.dev "$T" && mkdir -p ~/.claude/skills && cp -r "$T/data/skills-md/adaptationio/skrillz/observability-analyzer" ~/.claude/skills/neversight-learn-skills-dev-observability-analyzer && rm -rf "$T"
Observability Analyzer
Query Claude Code telemetry and generate insights from metrics, logs, and traces. Works with both default OTEL telemetry and enhanced hook-based telemetry.
Data Sources
| Source | Job Name | Contains |
|---|---|---|
| Default OTEL | | API metrics, token usage, costs |
| Enhanced Hooks | claude_code_enhanced | Sessions, conversations, tools, subagents |
Operations
query-metrics <promql>
Execute a PromQL query against Prometheus.
query-metrics 'sum(increase(claude_code_token_usage[7d]))'
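As a rough illustration of what a helper like this might do under the hood, the sketch below builds the request URL for Prometheus's instant-query HTTP endpoint (`/api/v1/query`). The base URL `http://localhost:9090` and the function name `build_instant_query_url` are assumptions for illustration, not taken from the skill's scripts.

```python
from urllib.parse import urlencode

# Assumption: Prometheus is reachable on its default local port.
PROMETHEUS_URL = "http://localhost:9090"


def build_instant_query_url(promql: str, base_url: str = PROMETHEUS_URL) -> str:
    """Return the URL for a Prometheus instant query (GET /api/v1/query)."""
    # urlencode percent-escapes parentheses, brackets, and quotes in the PromQL.
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


if __name__ == "__main__":
    print(build_instant_query_url("sum(increase(claude_code_token_usage[7d]))"))
```

The resulting URL can be fetched with any HTTP client; the response is JSON with the samples under `data.result`.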
query-logs <logql>
Execute a LogQL query against Loki.
query-logs '{job="claude_code_enhanced", event_type="tool_call"} | json' --since 24h
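A minimal sketch of how a `--since` window could be translated into a Loki range query, assuming Loki's `/loki/api/v1/query_range` endpoint with nanosecond `start` and `end` parameters. The base URL `http://localhost:3100` and the helper names are hypothetical; the real script may work differently.

```python
import re
import time
from urllib.parse import urlencode

# Assumption: Loki is reachable on its default local port.
LOKI_URL = "http://localhost:3100"
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def since_to_seconds(since: str) -> int:
    """Convert a window such as '24h' or '30m' into seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", since)
    if match is None:
        raise ValueError(f"unsupported duration: {since!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def build_query_range_url(logql, since="1h", now=None, limit=1000, base_url=LOKI_URL):
    """Return a query_range URL covering the last `since` window."""
    # Loki expects start/end as Unix timestamps in nanoseconds.
    end_ns = int((time.time() if now is None else now) * 1_000_000_000)
    start_ns = end_ns - since_to_seconds(since) * 1_000_000_000
    params = {"query": logql, "start": start_ns, "end": end_ns, "limit": limit}
    return f"{base_url.rstrip('/')}/loki/api/v1/query_range?{urlencode(params)}"
```

Passing `now` explicitly keeps the function deterministic, which makes it easier to test than reading the clock inside the helper.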
analyze-errors
Detect and group error patterns from enhanced telemetry.
{job="claude_code_enhanced", event_type="tool_result", status="error"} | json
Output: Error types, frequencies, affected tools, recommendations.
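The grouping step might look roughly like the sketch below, which counts error events per tool from raw JSON log lines. It assumes each `tool_result` line carries a `tool_name` field alongside `status`; that field placement and the function name `group_errors` are assumptions for illustration.

```python
import json
from collections import Counter


def group_errors(log_lines):
    """Count error events per tool, most frequent first."""
    counts = Counter()
    for line in log_lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(event, dict):
            continue
        if event.get("status") == "error":
            # Assumption: tool_result events include the tool's name.
            counts[event.get("tool_name", "unknown")] += 1
    return counts.most_common()
```

Feeding it the lines returned by the query above yields a list of `(tool, error_count)` pairs that can be turned into the frequency table of the report.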
analyze-performance
Identify slow operations and response sizes.
{job="claude_code_enhanced", event_type="tool_result"} | json | response_length > 50000
Output: Large responses, estimated token costs, slow patterns.
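To illustrate how large responses could be turned into rough token estimates, the sketch below applies a characters-per-token heuristic to `response_length`. The ratio of 4 characters per token is an assumption (a common rule of thumb, not a value from this skill), as is the function name.

```python
# Assumption: roughly 4 characters per token; adjust for your content.
CHARS_PER_TOKEN = 4


def flag_large_responses(events, threshold=50_000):
    """Return events above the size threshold, largest first, with token estimates."""
    flagged = []
    for event in events:
        length = int(event.get("response_length", 0))
        if length > threshold:
            flagged.append(
                {
                    "tool_name": event.get("tool_name", "unknown"),
                    "response_length": length,
                    "estimated_tokens": length // CHARS_PER_TOKEN,
                }
            )
    return sorted(flagged, key=lambda item: item["response_length"], reverse=True)
```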
analyze-costs
Calculate token usage from content size estimates.
sum by (repo) (sum_over_time({job="claude_code_enhanced", event_type="context_utilization"} | json | unwrap estimated_session_tokens [24h]))
Output: Token estimates by repo, session costs, projections.
analyze-tools
Tool usage statistics and sequences.
sum by (tool) (count_over_time({job="claude_code_enhanced", event_type="tool_call"} | json [24h]))
Output: Call frequency, success rates, tool sequences, common patterns.
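Tool sequences can be summarized as counts of consecutive tool pairs. The sketch below assumes you already have the tool names of one session in call order (for example, sorted by `sequence_position`); the function name `tool_transitions` is hypothetical.

```python
from collections import Counter


def tool_transitions(tools):
    """Count how often each tool is immediately followed by another."""
    # zip pairs each call with its successor: (t0, t1), (t1, t2), ...
    return Counter(zip(tools, tools[1:]))


if __name__ == "__main__":
    session = ["Read", "Edit", "Read", "Edit", "Bash"]
    for (first, second), count in tool_transitions(session).most_common():
        print(f"{first} -> {second}: {count}")
```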
analyze-sessions
Session lifecycle and duration analytics.
{job="claude_code_enhanced", event_type="session_end"} | json
Output: Session durations, turn counts, tools per session, termination reasons.
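A small sketch of the kind of summary this operation could produce from parsed `session_end` events, using the `duration_seconds` and `turn_count` fields. The function name and the choice of median and mean as summary statistics are assumptions.

```python
from statistics import mean, median


def summarize_sessions(events):
    """Summarize session_end events: count, median duration, mean turn count."""
    durations = [float(e["duration_seconds"]) for e in events if "duration_seconds" in e]
    turns = [int(e["turn_count"]) for e in events if "turn_count" in e]
    return {
        "sessions": len(events),
        "median_duration_seconds": median(durations) if durations else None,
        "mean_turn_count": mean(turns) if turns else None,
    }
```

The median is used for durations because a few very long sessions would otherwise dominate the average.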
analyze-conversations
Conversation and prompt analytics.
sum by (pattern) (count_over_time({job="claude_code_enhanced", event_type="user_prompt"} | json [24h]))
Output: Prompt patterns (question/debugging/creation/ultrathink), turn distribution.
analyze-subagents
Subagent/Task tool usage.
{job="claude_code_enhanced", event_type="tool_call", tool="Task"} | json
Output: Subagent types used, completion rates, parallel execution patterns.
analyze-skills
Skill invocation analytics.
sum by (skill_name) (count_over_time({job="claude_code_enhanced", event_type="skill_usage"} | json [24h]))
Output: Most used skills, skill usage by repo, trends.
analyze-context
Context window utilization.
{job="claude_code_enhanced", event_type="context_utilization"} | json | context_percentage > 50
Output: High utilization sessions, compaction events, token efficiency.
analyze-repos
Repository/project activity.
sum by (repo, tool) (count_over_time({job="claude_code_enhanced", event_type="tool_call"} | json [24h]))
Output: Activity per repo, tool usage by project, branch patterns.
generate-report
Comprehensive analysis report (all dimensions).
Output: Markdown report with errors, performance, costs, sessions, conversations, tools.
Key Queries
Enhanced Telemetry (Loki)
# All events (last hour)
{job="claude_code_enhanced"} | json

# Session analytics
{job="claude_code_enhanced", event_type="session_end"} | json | duration_seconds > 300

# Tool errors
{job="claude_code_enhanced", event_type="tool_result", status="error"} | json

# High context usage
{job="claude_code_enhanced", event_type="context_utilization"} | json | context_percentage > 75

# Subagent spawns
{job="claude_code_enhanced", event_type="tool_call", tool="Task"} | json

# Skill invocations
{job="claude_code_enhanced", event_type="skill_usage"} | json

# Prompt patterns
{job="claude_code_enhanced", event_type="user_prompt"} | json | pattern="ultrathink"

# Tool sequences
{job="claude_code_enhanced", event_type="tool_call"} | json | line_format "{{.tool_name}} → {{.previous_tool}}"

# Context compaction
{job="claude_code_enhanced", event_type="context_compact"} | json

# Permission requests
{job="claude_code_enhanced", event_type="permission_request"} | json
Default OTEL (Prometheus)
# Total token usage (7 days)
sum(increase(claude_code_token_usage[7d]))

# Error rate by tool
sum by (tool_name) (rate(claude_code_tool_result{status="failure"}[1h]))

# P95 tool latency (computed from bucket rates over the last hour)
histogram_quantile(0.95, sum by (le) (rate(claude_code_tool_duration_bucket[1h])))

# Daily costs
sum(increase(claude_code_cost_usage[24h]))
Event Types Reference
| Event Type | Description | Key Fields |
|---|---|---|
| | Session initialization | source, permission_mode |
| session_end | Session termination | duration_seconds, turn_count, tools_used |
| user_prompt | User message submitted | pattern, prompt_length, estimated_tokens |
| tool_call | Tool invocation | tool_name, tool_details, sequence_position |
| tool_result | Tool completion | status, response_length, is_error |
| skill_usage | Skill invoked | skill_name |
| context_utilization | Token estimate | estimated_session_tokens, context_percentage |
| context_compact | Compaction event | trigger (manual/auto) |
| | Task agent finished | total_subagents |
| permission_request | Permission dialog | notification_type |
| | System notification | notification_type |
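When consuming these events programmatically, the Loki query response has to be flattened first. The sketch below assumes the standard Loki `streams` response shape (`data.result[].stream` for labels and `data.result[].values` as `[timestamp_ns, line]` pairs) and merges stream labels with the JSON fields of each line; the function name `iter_events` is hypothetical.

```python
import json


def iter_events(response):
    """Yield one flat dict per log line from a Loki 'streams' query response."""
    for stream in response.get("data", {}).get("result", []):
        labels = stream.get("stream", {})
        for timestamp_ns, line in stream.get("values", []):
            try:
                fields = json.loads(line)
            except json.JSONDecodeError:
                fields = None
            if not isinstance(fields, dict):
                fields = {"raw": line}  # keep unparseable lines instead of dropping them
            # Fields parsed from the line override stream labels with the same name.
            yield {**labels, **fields, "timestamp_ns": int(timestamp_ns)}
```

Each yielded dict then exposes the key fields from the table above (for example `event["duration_seconds"]` on `session_end` events) next to labels such as `job` and `event_type`.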
Grafana Dashboards
- Claude Code Overview - High-level metrics
- Tool Performance - Tool latencies and success rates
- Cost Analysis - Token usage and costs
- Error Tracking - Error patterns and trends
- Session Analytics - Session-level insights
- Enhanced Analytics - Model/skill/context/repo tracking
- Deep Analytics - Comprehensive conversation and tool analysis
Access: http://localhost:3000 (admin/admin)
Scripts
- scripts/query-prometheus.sh - PromQL query helper
- scripts/query-loki.sh - LogQL query helper
- scripts/analyze-errors.sh - Error analysis automation
- scripts/analyze-sessions.sh - Session analytics
- scripts/generate-report.sh - Full analysis report