Claude-skill-registry check-logs
Query and analyze logs using Grafana Loki for the Kagenti platform, search for errors, and investigate issues
install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/check-logs" ~/.claude/skills/majiayu000-claude-skill-registry-check-logs && rm -rf "$T"
manifest:
skills/data/check-logs/SKILL.md (source content)
Check Logs Skill
This skill helps you query and analyze logs from the Kagenti platform using Loki via Grafana.
When to Use
- User asks "show me logs for X"
- Investigating errors or failures
- After deployments to check for issues
- Debugging pod crashes or restarts
- Analyzing application behavior
What This Skill Does
- Query Logs: Search logs by namespace, pod, container, or log level
- Error Detection: Find errors and warnings in logs
- Log Aggregation: View logs across multiple pods
- Time-based Queries: Query logs for specific time ranges
- Log Patterns: Detect common issues from log patterns
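Under the hood, these capabilities reduce to calls against Loki's `query_range` HTTP API, which takes Unix-epoch nanosecond timestamps. A minimal sketch of building that time window in shell (the `loki_window` helper name and the 5-minute default are illustrative, not part of the platform):

```shell
# Build the start/end parameters Loki's query_range API expects:
# Unix-epoch *nanoseconds*. Window length in seconds, default 300.
loki_window() {
  secs="${1:-300}"
  end_s=$(date -u +%s)
  start_s=$((end_s - secs))
  printf 'start=%s000000000\nend=%s000000000\n' "$start_s" "$end_s"
}

loki_window 300
# Pass the two printed values with --data-urlencode to
# http://loki.observability.svc:3100/loki/api/v1/query_range
# (requires cluster access; not runnable standalone).
```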
Examples
Query Logs in Grafana UI
Access Grafana: https://grafana.localtest.me:9443
Navigate: Explore → Select Loki datasource
Log Dashboard: https://grafana.localtest.me:9443/d/loki-logs/loki-logs
Query Examples in Grafana Explore:
```logql
# All logs from the observability namespace
{kubernetes_namespace_name="observability"}

# Logs from a specific pod
{kubernetes_pod_name=~"prometheus.*"}

# Logs containing "error"
{kubernetes_namespace_name="observability"} |= "error"

# Structured logs with level=error (time range is set in the Explore UI)
{kubernetes_namespace_name="observability"} | json | level="error"

# Count errors per namespace over 5 minutes
sum by (kubernetes_namespace_name) (count_over_time({kubernetes_namespace_name=~".+"} |= "error" [5m]))
```
Query Logs via CLI (Promtail/Loki)
```shell
# Query Loki for recent errors in the observability namespace.
# Loki expects Unix-epoch nanosecond timestamps.
# Note: `date -v-5M` is BSD/macOS syntax; on GNU/Linux use `date -d '5 minutes ago'`.
kubectl exec -n observability deployment/grafana -- \
  curl -s -G 'http://loki.observability.svc:3100/loki/api/v1/query_range' \
  --data-urlencode 'query={kubernetes_namespace_name="observability"} |= "error"' \
  --data-urlencode 'limit=100' \
  --data-urlencode "start=$(date -u -v-5M +%s)000000000" \
  --data-urlencode "end=$(date -u +%s)000000000" | python3 -m json.tool
```
Check Logs for Specific Pod
```shell
# Get logs for a specific deployment using kubectl
kubectl logs -n observability deployment/prometheus --tail=100

# Get logs from the previous container instance (if it crashed)
kubectl logs -n observability pod/prometheus-xxx --previous

# Follow logs in real time
kubectl logs -n observability deployment/grafana -f --tail=20

# Get logs from a specific container in a pod
kubectl logs -n observability pod/alertmanager-xxx -c alertmanager --tail=50
```
Search for Errors Across Platform
```shell
# Get recent error logs from all platform namespaces
# (kubectl logs needs a pod or resource name, so iterate over pods)
for ns in observability keycloak oauth2-proxy istio-system kiali-system; do
  echo "=== Errors in $ns ==="
  for pod in $(kubectl get pods -n "$ns" -o name); do
    kubectl logs -n "$ns" "$pod" --all-containers=true --tail=50 2>/dev/null
  done | grep -iE "error|fatal|exception" | head -5
  echo
done
```
Check Logs for Failed Pods
```shell
# Find pods with issues and check their logs
kubectl get pods -A | grep -E "Error|CrashLoop|ImagePull" | while read -r ns pod rest; do
  echo "=== Logs for $pod in $ns ==="
  kubectl logs -n "$ns" "$pod" --tail=30 --previous 2>/dev/null || \
    kubectl logs -n "$ns" "$pod" --tail=30
  echo
done
```
Query Log Volume by Namespace
```logql
# In Grafana Explore (Loki datasource)
sum by (kubernetes_namespace_name) (
  rate({kubernetes_namespace_name=~".+"}[5m])
)
```
Search for Specific Error Pattern
```logql
# Find connection errors
{kubernetes_namespace_name="observability"} |~ "connection (refused|timeout|reset)"

# Find authentication failures
{kubernetes_namespace_name=~"keycloak|oauth2-proxy"} |~ "auth.*fail|unauthorized|forbidden"

# Find OOM kills
{kubernetes_namespace_name=~".+"} |~ "OOM|out of memory|oom.*kill"
```
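When Grafana is unreachable, the same patterns can be grepped locally over kubectl logs output. A sketch with canned sample lines standing in for real log output:

```shell
# Local equivalent of the connection-error LogQL regex; in practice you
# would pipe `kubectl logs ...` output in instead of these sample lines.
printf '%s\n' \
  'dial tcp 10.0.0.5:5432: connection refused' \
  'GET /healthz 200 OK' \
  'read tcp: connection reset by peer' \
| grep -E 'connection (refused|timeout|reset)'
# prints the first and third sample lines
```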
Log Levels and Filtering
Standard Log Levels
- error: Critical errors requiring attention
- warn/warning: Warnings that may indicate issues
- info: Informational messages
- debug: Detailed debugging information
- trace: Very detailed trace information
Filter by Log Level
```logql
# Only errors
{kubernetes_namespace_name="observability"} | json | level="error"

# Errors and warnings
{kubernetes_namespace_name="observability"} | json | level=~"error|warn"

# Everything except debug
{kubernetes_namespace_name="observability"} | json | level!="debug"
```
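The same level filtering can be approximated locally on JSON-formatted kubectl logs output. A crude sketch using a fixed-string match (the sample lines are made up):

```shell
# Rough local stand-in for `| json | level="error"`: match the serialized
# level field directly. Assumes compact JSON logs with a "level" key.
printf '%s\n' \
  '{"level":"error","msg":"db connection lost"}' \
  '{"level":"info","msg":"request served"}' \
| grep -F '"level":"error"'
# prints only the error line
```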
Common Log Queries for Platform Components
Prometheus Logs
```shell
kubectl logs -n observability deployment/prometheus --tail=100

# Check for scrape errors
kubectl logs -n observability deployment/prometheus | grep -i "scrape\|error"
```
Grafana Logs
```shell
kubectl logs -n observability deployment/grafana --tail=100

# Check for datasource errors
kubectl logs -n observability deployment/grafana | grep -i "datasource\|error"
```
Keycloak Logs
```shell
kubectl logs -n keycloak statefulset/keycloak --tail=100

# Check for authentication errors
kubectl logs -n keycloak statefulset/keycloak | grep -i "auth\|login\|error"
```
Istio Proxy (Sidecar) Logs
```shell
# Check sidecar logs for a specific pod
POD=$(kubectl get pod -n observability -l app=alertmanager -o jsonpath='{.items[0].metadata.name}')
kubectl logs -n observability "$POD" -c istio-proxy --tail=50
```
AlertManager Logs
```shell
kubectl logs -n observability deployment/alertmanager -c alertmanager --tail=100

# Check for notification errors
kubectl logs -n observability deployment/alertmanager -c alertmanager | grep -i "notif\|error\|fail"
```
Log Analysis Patterns
Detect Crash Loops
```shell
# Find pods restarting frequently (with -A, RESTARTS is column 5; skip the header)
kubectl get pods -A | awk 'NR > 1 && $5+0 > 5'

# Check logs from before the crash
kubectl logs -n <namespace> <pod-name> --previous | tail -50
```
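A restart filter like the one above can be exercised without a cluster (with -A, RESTARTS is the fifth column of kubectl get pods output). A sketch on canned output; the pod names are made up:

```shell
# Select rows whose RESTARTS (column 5 with -A) exceed 5, skipping the header.
high_restarts() {
  awk 'NR > 1 && $5+0 > 5 { print $1 "/" $2 " restarts=" $5 }'
}

printf '%s\n' \
  'NAMESPACE NAME READY STATUS RESTARTS AGE' \
  'observability grafana-abc 1/1 Running 0 2d' \
  'keycloak keycloak-0 0/1 CrashLoopBackOff 12 2d' \
| high_restarts
# prints: keycloak/keycloak-0 restarts=12
```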
Find HTTP Errors
```logql
{kubernetes_namespace_name=~".+"} |~ "HTTP.*[45]\\d{2}"
```
Find Timeout Errors
```logql
{kubernetes_namespace_name=~".+"} |~ "timeout|timed out|deadline exceeded"
```
Find Database Connection Issues
```logql
{kubernetes_namespace_name=~".+"} |~ "database.*error|connection.*refused|SQL.*error"
```
Troubleshooting with Logs
Issue: Service Not Starting
- Check pod events: kubectl describe pod <pod-name> -n <namespace>
- Check container logs: kubectl logs <pod-name> -n <namespace>
- Check init container logs: kubectl logs <pod-name> -n <namespace> -c <init-container>
Issue: High Error Rate
- Query error logs: {kubernetes_namespace_name="X"} |= "error"
- Group errors by component: sum by (kubernetes_pod_name) (count_over_time({...} |= "error" [5m]))
- Identify patterns in the error messages
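The first two checks above can be approximated locally when Loki is unavailable: count error lines per stream from logs already fetched. A sketch, with sample input standing in for kubectl logs output (the `count_errors` helper name is illustrative):

```shell
# Count case-insensitive "error" lines on stdin; feed it `kubectl logs`
# output per pod to rank components by error volume.
count_errors() {
  grep -ci 'error' || true   # grep exits 1 on zero matches but still prints 0
}

printf '%s\n' 'ERROR: boom' 'info: fine' 'Error: again' | count_errors
# prints: 2
```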
Issue: Performance Degradation
- Check for warnings: {kubernetes_namespace_name="X"} |= "warn"
- Look for timeout messages
- Check for resource exhaustion messages
Grafana Loki Dashboard Features
Loki Logs Dashboard: https://grafana.localtest.me:9443/d/loki-logs/loki-logs
Features:
- Namespace filter: Select specific namespace
- Pod filter: Filter by pod name
- Log level: Filter by error/warn/info/debug
- Time range: Select time window
- Log volume graphs: See log rate over time
- Log table: Browse actual log lines
Panels:
- Log Volume by Level: See errors vs warnings over time
- Log Volume by Namespace: Compare activity across namespaces
- Logs per Second: Current log ingestion rate
- Log Lines: Actual log content with search
Related Documentation
- Loki Documentation
- LogQL Query Language
- CLAUDE.md Troubleshooting
- Alert Runbooks - Many reference logs
Pro Tips
- Use time ranges: Always specify time range to limit data
- Filter early: Add namespace/pod filters before log level filters (more efficient)
- Use regex carefully: Complex regex can be slow on large log volumes
- Check both current and previous: For crashed pods, use --previous
- Tail first: Use --tail=N to limit output, then increase if needed
🤖 Generated with Claude Code