Claude-skill-registry alerting
Real-time alerting and notification system for Univers infrastructure. Use it to monitor system health and service status, and to send proactive alerts when thresholds are exceeded or services fail.
install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/alerting" ~/.claude/skills/majiayu000-claude-skill-registry-alerting && rm -rf "$T"
manifest:
skills/data/alerting/SKILL.md
Alerting Skill
This skill provides comprehensive monitoring and alerting capabilities for the Univers infrastructure ecosystem.
Capabilities
1. Real-time Monitoring
- System resource monitoring (CPU, Memory, Disk, Network)
- Service health checks (HTTP endpoints, ports, processes)
- Application-specific metrics (response times, error rates)
- Custom metric collection and aggregation
2. Alert Engine
- Threshold-based alerting
- Rate limiting and alert suppression
- Alert escalation policies
- Multi-condition alert rules
3. Notification Channels
- Email notifications with rich formatting
- Slack/Teams integration with actionable messages
- Webhook support for custom integrations
- In-app notifications and banners
4. Alert Management
- Alert acknowledgment and resolution
- Alert history and analytics
- Scheduled maintenance windows
- Alert rule testing and validation
5. Dashboards and Reports
- Real-time alert status dashboard
- Historical alert trends and analytics
- Service health overview
- Performance metrics visualization
Common Tasks
Basic Alert Setup
    # Check system for alert conditions
    alert check system

    # Monitor specific services
    alert monitor services

    # Test notification channels
    alert test channels
Alert Rule Management
    # List all alert rules
    alert rules list

    # Add new alert rule
    alert rules add cpu-high --threshold 80 --duration 5m

    # Update existing rule
    alert rules update memory-usage --threshold 90

    # Remove alert rule
    alert rules remove disk-space-low
Notification Configuration
    # Configure email notifications
    alert config email --smtp smtp.example.com --from alerts@example.com

    # Configure Slack integration (quote the channel so the shell does not treat "#" as a comment)
    alert config slack --webhook https://hooks.slack.com/... --channel '#alerts'

    # Test notification delivery
    alert test email --to admin@example.com
    alert test slack --message "Test alert"
Alert Operations
    # View active alerts
    alert status

    # Acknowledge an alert
    alert acknowledge CPU_HIGH_001

    # Resolve an alert
    alert resolve MEMORY_HIGH_003

    # View alert history
    alert history --last 24h
Alert Rule Examples
System Resource Alerts
    # High CPU Usage
    name: cpu-high
    condition: cpu_usage > 80
    duration: 5m
    severity: warning
    message: "CPU usage is {{cpu_usage}}% on {{hostname}}"
    actions:
      - type: email
        to: ops@example.com
      - type: slack
        channel: "#alerts"  # quoted: an unquoted "#" starts a YAML comment

    # Critical Memory Usage
    name: memory-critical
    condition: memory_usage > 90
    duration: 2m
    severity: critical
    message: "Critical memory usage: {{memory_usage}}%"
    actions:
      - type: webhook
        url: https://api.pagerduty.com/incidents
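The `condition` and `duration` fields above suggest that a rule fires only after the condition has held continuously for the whole duration. How the alert engine implements this is not documented here, so the following is a minimal sketch under that assumption. `parse_duration` and `ThresholdRule` are hypothetical names, not part of the skill.

```python
import re
from typing import Optional

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_duration(text: str) -> int:
    """Convert strings such as '30s', '5m' or '2h' into seconds."""
    match = re.fullmatch(r"(\d+)([smh])", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


class ThresholdRule:
    """Fires once a metric has stayed above a threshold for a full duration."""

    def __init__(self, threshold: float, duration: str) -> None:
        self.threshold = threshold
        self.hold_seconds = parse_duration(duration)
        self._breach_started: Optional[float] = None

    def observe(self, value: float, now: float) -> bool:
        """Record one sample taken at time `now` (seconds); return True if the rule fires."""
        if value <= self.threshold:
            # Any sample back under the threshold resets the timer.
            self._breach_started = None
            return False
        if self._breach_started is None:
            self._breach_started = now
        return now - self._breach_started >= self.hold_seconds
```

With `ThresholdRule(80, "5m")`, a single spike to 95% does not fire. The rule fires only once samples have stayed above 80 for 300 seconds without dipping back.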
Service Health Alerts
    # Service Down
    name: service-down
    condition: service_health == 0
    duration: 1m
    severity: critical
    message: "{{service_name}} is down on {{hostname}}"
    actions:
      - type: email
        to: devops@example.com
      - type: restart
        service: "{{service_name}}"

    # High Response Time
    name: slow-response
    condition: response_time > 2000
    duration: 3m
    severity: warning
    message: "{{service_name}} response time: {{response_time}}ms"
    actions:
      - type: slack
        channel: "#performance"  # quoted: an unquoted "#" starts a YAML comment
Application-Specific Alerts
    # High Error Rate
    name: high-error-rate
    condition: error_rate > 5
    duration: 5m
    severity: warning
    message: "{{application}} error rate: {{error_rate}}%"
    actions:
      - type: email
        to: dev-team@example.com

    # Database Connection Issues
    name: db-connection-failed
    condition: db_connection_status != "healthy"
    duration: 30s
    severity: critical
    message: "Database connection failed for {{application}}"
    actions:
      - type: webhook
        url: https://hooks.slack.com/...
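The `message` fields in these rules use `{{name}}` placeholders. The template language the engine actually uses is not specified, so this sketch assumes plain name substitution and leaves unknown placeholders untouched. `render_message` is a hypothetical helper.

```python
import re

# Matches {{name}} and tolerates optional spaces inside the braces.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_message(template: str, context: dict) -> str:
    """Fill {{name}} placeholders from `context`; unknown names are left as written."""

    def substitute(match: "re.Match[str]") -> str:
        key = match.group(1)
        return str(context[key]) if key in context else match.group(0)

    return _PLACEHOLDER.sub(substitute, template)


message = render_message(
    "{{application}} error rate: {{error_rate}}%",
    {"application": "univers-server", "error_rate": 7.5},
)
```

Leaving unknown placeholders visible, rather than replacing them with an empty string, makes a missing metric obvious in the delivered alert.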
Integration Examples
Univers Services Integration
    # Monitor Univers services
    alert monitor univers-services

    # Check specific Univers endpoints
    alert check endpoint http://localhost:3003/health --service univers-server
    alert check endpoint http://localhost:6007 --service univers-ui
    alert check endpoint http://localhost:5173 --service univers-web

    # Monitor tmux sessions
    alert monitor tmux-sessions --alert-if-missing univers-developer
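An endpoint check like the ones above amounts to an HTTP request with a timeout. The sketch below shows one way to do it with the Python standard library. It assumes that "healthy" means any 2xx response, which may differ from what `alert check endpoint` actually does. `endpoint_is_healthy` is a hypothetical name.

```python
import http.client
import urllib.error
import urllib.request


def endpoint_is_healthy(url: str, timeout: float = 3.0) -> bool:
    """Return True when the URL answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Covers refused connections, timeouts, and 4xx/5xx responses,
        # which urlopen reports as HTTPError (a URLError subclass).
        return False
```

For example, `endpoint_is_healthy("http://localhost:3003/health")` returns `False` rather than raising when nothing is listening on the port.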
Container Integration
    # Monitor Docker containers (quote the pattern so the shell does not expand it)
    alert monitor containers --include 'univers-*'

    # Check container health
    alert check container univers-server
    alert check container univers-ui
Configuration Files
Alert Rules Configuration
    # ~/.config/univers/alerting/rules.yaml
    rules:
      - name: system-cpu-high
        type: system
        metric: cpu_usage
        operator: ">"
        threshold: 80
        duration: 5m
        severity: warning
      - name: service-unavailable
        type: service
        check: http_status
        target: "http://localhost:3003/health"
        operator: "!="
        threshold: 200
        duration: 1m
        severity: critical
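Each rule in this file pairs an `operator` string with a `threshold`. One plausible way to evaluate such a rule is a lookup table from operator strings to comparison functions. This is a sketch, not the engine's actual logic. The rules are written as plain dicts mirroring the YAML to avoid a parser dependency, and `rule_breached` is a hypothetical helper.

```python
import operator

# Comparison functions keyed by the strings used in the `operator` field.
_OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
}


def rule_breached(rule: dict, observed) -> bool:
    """Return True when `observed` violates the rule's operator/threshold pair."""
    try:
        compare = _OPERATORS[rule["operator"]]
    except KeyError:
        raise ValueError(f"unsupported operator in rule {rule.get('name')!r}") from None
    return compare(observed, rule["threshold"])


cpu_rule = {"name": "system-cpu-high", "operator": ">", "threshold": 80}
http_rule = {"name": "service-unavailable", "operator": "!=", "threshold": 200}
```

Here `rule_breached(cpu_rule, 85)` and `rule_breached(http_rule, 503)` both report a breach, while a CPU reading of 80 or an HTTP 200 does not.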
Notification Channels
    # ~/.config/univers/alerting/channels.yaml
    channels:
      email:
        smtp_host: smtp.gmail.com
        smtp_port: 587
        username: alerts@company.com
        password: ${SMTP_PASSWORD}
      slack:
        webhook_url: ${SLACK_WEBHOOK_URL}
        default_channel: "#univers-alerts"  # quoted: an unquoted "#" starts a YAML comment
      webhook:
        endpoint: https://api.example.com/alerts
        headers:
          Authorization: "Bearer ${API_TOKEN}"
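The `${SMTP_PASSWORD}`-style references keep secrets out of the file, which implies they are resolved from the environment at load time. The sketch below assumes that behaviour and shows one way to expand such references in an already-parsed config. `expand_env` is a hypothetical helper, and failing on an unset variable is a design choice made here, not documented behaviour.

```python
import os
import re

_ENV_REF = re.compile(r"\$\{(\w+)\}")


def expand_env(value, env=None):
    """Recursively replace ${NAME} references in strings, dicts and lists."""
    env = os.environ if env is None else env
    if isinstance(value, dict):
        return {key: expand_env(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [expand_env(item, env) for item in value]
    if isinstance(value, str):

        def lookup(match):
            name = match.group(1)
            if name not in env:
                # Fail early instead of sending an empty secret.
                raise KeyError(f"environment variable {name} is not set")
            return env[name]

        return _ENV_REF.sub(lookup, value)
    return value  # numbers, booleans and None pass through unchanged


webhook = expand_env(
    {"headers": {"Authorization": "Bearer ${API_TOKEN}"}, "retries": 3},
    env={"API_TOKEN": "example-token"},
)
```

Raising on a missing variable surfaces a misconfigured channel at startup instead of at the first failed delivery.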
Best Practices
- Set Meaningful Thresholds: Avoid alert fatigue by setting realistic thresholds
- Use Escalation Policies: Implement graduated alert escalation
- Provide Context: Include relevant details in alert messages
- Test Regularly: Verify alert rules and notification channels
- Document Procedures: Maintain clear runbooks for common alerts
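Graduated escalation can be modelled as a list of steps, each naming a channel and how long an alert must stay unacknowledged before that channel is notified. This is only an illustration of the idea. The skill's real escalation configuration is not shown in this document, and `EscalationStep` and `channels_to_notify` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class EscalationStep:
    after_seconds: int  # how long the alert has gone unacknowledged
    channel: str        # channel to bring in at that point


def channels_to_notify(steps: Iterable[EscalationStep], unacknowledged_seconds: float) -> List[str]:
    """Return every channel whose escalation delay has already elapsed, earliest first."""
    ordered = sorted(steps, key=lambda step: step.after_seconds)
    return [step.channel for step in ordered if unacknowledged_seconds >= step.after_seconds]


policy = [
    EscalationStep(after_seconds=0, channel="slack"),
    EscalationStep(after_seconds=900, channel="email"),
    EscalationStep(after_seconds=3600, channel="webhook"),
]
```

With this policy a fresh alert goes to Slack only, email joins after 15 minutes without acknowledgment, and the webhook after an hour.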
Troubleshooting
Common Issues
- Missing Notifications: Check channel configurations and connectivity
- False Positives: Review alert thresholds and conditions
- Alert Storms: Implement rate limiting and suppression rules
- Slow Performance: Optimize alert check intervals and data collection
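One common way to damp an alert storm is a per-alert cooldown: after a notification goes out, repeats of the same alert are dropped until the cooldown expires. The sketch below illustrates that technique only. It is not the skill's own suppression logic, and `AlertSuppressor` is a hypothetical name.

```python
from typing import Dict


class AlertSuppressor:
    """Allows at most one notification per alert key within a cooldown window."""

    def __init__(self, cooldown_seconds: float) -> None:
        self.cooldown_seconds = cooldown_seconds
        self._last_sent: Dict[str, float] = {}

    def should_send(self, alert_key: str, now: float) -> bool:
        """Return True and record the send time if the key is outside its cooldown."""
        last = self._last_sent.get(alert_key)
        if last is not None and now - last < self.cooldown_seconds:
            return False  # still cooling down: suppress the repeat
        self._last_sent[alert_key] = now
        return True
```

Keying the cooldown on the alert name (for example `cpu-high`) means an unrelated alert such as `memory-critical` is still delivered while the first one is suppressed.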
Debug Commands
    # Check alert engine status
    alert status --verbose

    # Test specific rule
    alert test-rule cpu-high

    # Check notification delivery
    alert test-notification email --to test@example.com

    # View alert engine logs
    alert logs --tail 100
Version History
- v1.0 (2025-12-16): Initial alerting system implementation
  - Basic monitoring, email notifications, and alert rules