Claude-skill-registry check-alerts
Check currently firing Grafana alerts, analyze alert status, and investigate alert issues in the Kagenti platform
install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/check-alerts" ~/.claude/skills/majiayu000-claude-skill-registry-check-alerts && rm -rf "$T"
manifest:
skills/data/check-alerts/SKILL.md · source content
Check Alerts Skill
This skill helps you check and analyze Grafana alerts in the Kagenti platform.
When to Use
- User asks "what alerts are firing?"
- User wants to check alert status
- After platform changes or deployments
- During incident investigation
- When troubleshooting platform issues
What This Skill Does
- List Firing Alerts: Show all currently active alerts
- Alert Details: Display alert severity, component, and description
- Alert History: Check recent alert state changes
- Query Alert Rules: Verify alert configuration
- Test Alert Queries: Validate PromQL queries
Examples
Check Firing Alerts
```bash
# Get all currently firing alerts from Grafana
kubectl exec -n observability deployment/grafana -- \
  curl -s 'http://localhost:3000/api/alertmanager/grafana/api/v2/alerts' \
  -u admin:admin123 | python3 -c "
import sys, json

alerts = json.load(sys.stdin)
firing = [a for a in alerts if a.get('status', {}).get('state') == 'active']
print(f'Firing alerts: {len(firing)}')
for alert in firing:
    labels = alert.get('labels', {})
    annotations = alert.get('annotations', {})
    print(f\"\\n• {labels.get('alertname')} ({labels.get('severity')})\")
    print(f\"  Component: {labels.get('component')}\")
    print(f\"  Description: {annotations.get('description', 'N/A')[:100]}...\")
"
```
List All Alert Rules
```bash
# Get all configured alert rules
kubectl exec -n observability deployment/grafana -- \
  curl -s 'http://localhost:3000/api/v1/provisioning/alert-rules' \
  -u admin:admin123 | python3 -c "
import sys, json

rules = json.load(sys.stdin)
print(f'Total alert rules: {len(rules)}')
for rule in rules:
    print(f\"  • {rule.get('title')} ({rule.get('labels', {}).get('severity')})\")
"
```
Check Specific Alert Configuration
```bash
# Get configuration for a specific alert
kubectl exec -n observability deployment/grafana -- \
  curl -s 'http://localhost:3000/api/v1/provisioning/alert-rules' \
  -u admin:admin123 | python3 -c "
import sys, json

rules = json.load(sys.stdin)
alert_uid = 'prometheus-down'  # Change this to the alert UID
rule = next((r for r in rules if r.get('uid') == alert_uid), None)
if rule:
    print(f\"Alert: {rule.get('title')}\")
    print(f\"Query: {rule.get('data', [{}])[0].get('model', {}).get('expr')}\")
    print(f\"noDataState: {rule.get('noDataState')}\")
    print(f\"execErrState: {rule.get('execErrState')}\")
"
```
Test Alert Query Against Prometheus
```bash
# Test an alert's PromQL query
QUERY='up{job="kubernetes-pods",app="prometheus"} == 0'
kubectl exec -n observability deployment/grafana -- \
  curl -s -G 'http://prometheus.observability.svc:9090/api/v1/query' \
  --data-urlencode "query=${QUERY}" | python3 -m json.tool
```
Check Alert Evaluation State
```bash
# Check why an alert is firing or not firing
kubectl exec -n observability deployment/grafana -- \
  curl -s 'http://localhost:3000/api/v1/eval/rules' \
  -u admin:admin123 | python3 -m json.tool
```
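Check Alert History
The skill also covers alert history. A minimal sketch, assuming this Grafana records alert state changes as annotations (its default behavior); the `newState` field appears on alert annotations in recent Grafana versions, so treat it as an assumption:
```bash
# Sketch: recent alert state changes via the Grafana annotations API
# (assumes state transitions are recorded as alert-type annotations).
kubectl exec -n observability deployment/grafana -- \
  curl -s 'http://localhost:3000/api/annotations?type=alert&limit=20' \
  -u admin:admin123 | python3 -c "
import sys, json
from datetime import datetime

for e in json.load(sys.stdin):
    ts = datetime.fromtimestamp(e.get('time', 0) / 1000)
    print(f\"{ts:%Y-%m-%d %H:%M}  {e.get('newState', '?'):10}  {e.get('text', '')[:80]}\")
"
```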
Alert Locations in Grafana UI
Access Grafana: https://grafana.localtest.me:9443
Credentials: admin / admin123
Navigation:
- Alerting → Alert rules - View all configured alerts
- Alerting → Alert list - See firing/pending alerts
- Alerting → Silences - Manage alert silences
- Alerting → Contact points - Check notification settings
- Alerting → Notification policies - View routing rules
Common Alert Issues
False Positives
- Check `noDataState` configuration (should be `OK` for most alerts); the sketch after this list flags rules where it isn't
- Verify the query matches the actual resource type (Deployment vs StatefulSet)
- Test query returns correct results
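A minimal sketch for that first check, reusing the provisioning API from the examples above; it simply lists rules whose `noDataState` isn't `OK`:
```bash
# Sketch: flag rules whose noDataState could cause false positives.
kubectl exec -n observability deployment/grafana -- \
  curl -s 'http://localhost:3000/api/v1/provisioning/alert-rules' \
  -u admin:admin123 | python3 -c "
import sys, json

for rule in json.load(sys.stdin):
    if rule.get('noDataState') != 'OK':
        print(f\"{rule.get('uid')}: noDataState={rule.get('noDataState')}\")
"
```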
Alert Not Firing When It Should
- Verify the metric exists in Prometheus (see the sketch after this list)
- Check alert threshold is appropriate
- Verify the `for` duration isn't too long
- Check `noDataState` isn't masking the issue
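To verify the metric exists, one option is Prometheus's standard series endpoint (a sketch; the selector is just an example, substitute the one from the alert's query):
```bash
# Sketch: confirm the selector an alert queries actually matches series.
SELECTOR='up{job="kubernetes-pods",app="prometheus"}'
kubectl exec -n observability deployment/grafana -- \
  curl -s -G 'http://prometheus.observability.svc:9090/api/v1/series' \
  --data-urlencode "match[]=${SELECTOR}" | python3 -c "
import sys, json

series = json.load(sys.stdin).get('data', [])
print(f'Matching series: {len(series)}')
for s in series[:5]:
    print(f'  {s}')
"
```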
Alert Configuration Not Loading
- Restart Grafana: `kubectl rollout restart deployment/grafana -n observability`
- Check the ConfigMap applied: `kubectl get configmap grafana-alerting -n observability`
- Verify no YAML syntax errors (see the sketch after this list)
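For the YAML check, a minimal sketch that parses every file in the ConfigMap (assumes PyYAML is installed locally, e.g. `pip install pyyaml`):
```bash
# Sketch: catch YAML syntax errors in the grafana-alerting ConfigMap.
kubectl get configmap grafana-alerting -n observability -o json | python3 -c "
import sys, json
import yaml  # requires PyYAML locally

for name, body in json.load(sys.stdin).get('data', {}).items():
    try:
        yaml.safe_load(body)
        print(f'OK     {name}')
    except yaml.YAMLError as err:
        print(f'ERROR  {name}: {err}')
"
```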
Related Documentation
- Alert Runbooks
- Alert Testing Guide
- CLAUDE.md Alert Monitoring
- TODO_INCIDENTS.md - Current incident tracking
Runbooks by Alert
When an alert fires, consult its runbook:
docs/runbooks/alerts/<alert-uid>.md
Example: If "Prometheus Down" alert fires →
docs/runbooks/alerts/prometheus-down.md
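To enumerate the expected runbook path for every configured rule, a sketch reusing the provisioning API above (it assumes the `<alert-uid>` naming convention holds for all rules):
```bash
# Sketch: map each alert rule to its expected runbook path.
kubectl exec -n observability deployment/grafana -- \
  curl -s 'http://localhost:3000/api/v1/provisioning/alert-rules' \
  -u admin:admin123 | python3 -c "
import sys, json

for rule in json.load(sys.stdin):
    print(f\"{rule.get('title')}: docs/runbooks/alerts/{rule.get('uid')}.md\")
"
```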
🤖 Generated with Claude Code