Claude-code-plugins-plus-skills gamma-incident-runbook
install
source · Clone the upstream repo
git clone https://github.com/jeremylongshore/claude-code-plugins-plus-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/jeremylongshore/claude-code-plugins-plus-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/plugins/saas-packs/gamma-pack/skills/gamma-incident-runbook" ~/.claude/skills/jeremylongshore-claude-code-plugins-plus-skills-gamma-incident-runbook && rm -rf "$T"
manifest:
plugins/saas-packs/gamma-pack/skills/gamma-incident-runbook/SKILL.mdsource content
Gamma Incident Runbook
Overview
Systematic procedures for responding to and resolving Gamma integration incidents.
Prerequisites
- Access to monitoring dashboards
- Access to application logs
- On-call responsibilities defined
- Communication channels established
Incident Severity Levels
| Level | Description | Response Time | Escalation |
|---|---|---|---|
| P1 | Complete outage, no presentations | < 15 min | Immediate |
| P2 | Degraded, slow or partial failures | < 30 min | 1 hour |
| P3 | Minor issues, workaround available | < 2 hours | 4 hours |
| P4 | Cosmetic or non-urgent | < 24 hours | None |
Quick Diagnostics
Step 1: Check Gamma Status
set -euo pipefail # Check Gamma status page curl -s https://status.gamma.app/api/v2/status.json | jq '.status' # Check our integration health curl -s https://your-app.com/health/gamma | jq '.' # Quick connectivity test curl -w "\nTime: %{time_total}s\n" \ -H "Authorization: Bearer $GAMMA_API_KEY" \ https://api.gamma.app/v1/ping
Step 2: Review Key Metrics
set -euo pipefail # Check error rate (Prometheus) curl -s 'http://prometheus:9090/api/v1/query?query=rate(gamma_requests_total{status=~"5.."}[5m])' | jq '.data.result' # 9090: Prometheus port # Check latency P95 curl -s 'http://prometheus:9090/api/v1/query?query=histogram_quantile(0.95,rate(gamma_request_duration_seconds_bucket[5m]))' | jq '.data.result' # Prometheus port # Check rate limit curl -s 'http://prometheus:9090/api/v1/query?query=gamma_rate_limit_remaining' | jq '.data.result' # Prometheus port
Step 3: Review Recent Logs
# Last 100 error logs grep -i "gamma.*error" /var/log/app/gamma-*.log | tail -100 # Rate limit hits grep "429" /var/log/app/gamma-*.log | wc -l # HTTP 429 Too Many Requests # Timeout errors grep -i "timeout" /var/log/app/gamma-*.log | tail -50
Incident Response Procedures
Scenario 1: API Returning 5xx Errors
Symptoms:
- High error rate in monitoring
- Users reporting failed presentations
- 500/502/503 responses from Gamma
Actions:
-
Verify Gamma status: https://status.gamma.app
-
If Gamma outage confirmed:
- Enable degraded mode / show maintenance message
- Monitor status page for updates
- No action needed on our side
-
If Gamma is operational:
# Check our request patterns grep "5[0-9][0-9]" /var/log/app/gamma-*.log | \ awk '{print $1}' | sort | uniq -c | sort -rn # Look for malformed requests grep -B5 "500" /var/log/app/gamma-*.log | grep "request" -
Rollback recent deployments if issue correlates
Scenario 2: Rate Limit Exceeded (429)
Symptoms:
- 429 responses in logs
- Rate limit metrics at zero
- Slow or queued requests
Actions:
-
Immediate mitigation:
# Enable request throttling curl -X POST http://localhost:8080/admin/throttle \ -d '{"gamma": {"rps": 10}}' -
Check for runaway processes:
# Find high-volume clients grep "gamma" /var/log/app/*.log | \ awk '{print $5}' | sort | uniq -c | sort -rn | head -20 -
Enable circuit breaker:
curl -X POST http://localhost:8080/admin/circuit-breaker \ -d '{"service": "gamma", "state": "open"}' -
Long-term: Review rate limit tier with Gamma
Scenario 3: High Latency
Symptoms:
- Slow presentation creation
- Timeouts in logs
- P95 latency > 10s
Actions:
-
Check Gamma latency vs our latency:
# Direct Gamma latency for i in {1..5}; do curl -w "%{time_total}\n" -o /dev/null -s \ -H "Authorization: Bearer $GAMMA_API_KEY" \ https://api.gamma.app/v1/ping done -
If Gamma is slow:
- Increase timeouts temporarily
- Enable async mode for non-critical operations
- Queue heavy operations
-
If our infrastructure is slow:
- Check CPU/memory on app servers
- Review connection pool settings
- Check network connectivity
Scenario 4: Authentication Failures (401/403)
Symptoms:
- All requests failing with 401
- "Invalid API key" errors
- Sudden authentication failures
Actions:
-
Verify API key:
# Test key directly curl -H "Authorization: Bearer $GAMMA_API_KEY" \ https://api.gamma.app/v1/ping # Check key format echo $GAMMA_API_KEY | head -c 20 -
If key is invalid:
- Check if key was rotated
- Deploy backup key:
GAMMA_API_KEY_SECONDARY - Generate new key in Gamma dashboard
-
Notify team and update secrets
Communication Templates
Internal Notification
INCIDENT: Gamma Integration Issue Severity: P[X] Status: Investigating / Identified / Mitigating / Resolved Impact: [Description of user impact] Start Time: [ISO timestamp] Summary: [Brief description] Current Actions: - [Action 1] - [Action 2] Next Update: [Time]
User-Facing Message
We're currently experiencing issues with presentation generation. Our team is actively working to resolve this. Workaround: [If available] Status updates: [Link to status page] ETA: [If known]
Post-Incident Checklist
- Incident timeline documented
- Root cause identified
- User impact quantified
- Fix verified in production
- Monitoring gaps identified
- Preventive measures documented
- Post-mortem scheduled (for P1/P2)
Resources
Next Steps
Proceed to
gamma-data-handling for data management.