Claude-code-plugins canva-incident-runbook

Install

Clone the upstream repo (source):

```bash
git clone https://github.com/jeremylongshore/claude-code-plugins-plus-skills
```

Install into ~/.claude/skills/ for Claude Code:

```bash
T=$(mktemp -d) && \
  git clone --depth=1 https://github.com/jeremylongshore/claude-code-plugins-plus-skills "$T" && \
  mkdir -p ~/.claude/skills && \
  cp -r "$T/plugins/saas-packs/canva-pack/skills/canva-incident-runbook" \
    ~/.claude/skills/jeremylongshore-claude-code-plugins-canva-incident-runbook && \
  rm -rf "$T"
```

Manifest: plugins/saas-packs/canva-pack/skills/canva-incident-runbook/SKILL.md
Canva Incident Runbook
Overview
Rapid incident response for Canva Connect API integration failures. Covers triage, mitigation, escalation, and postmortem.
Quick Triage (First 5 Minutes)
```bash
#!/bin/bash
# canva-triage.sh: run immediately when an incident is detected
: "${CANVA_ACCESS_TOKEN:?export CANVA_ACCESS_TOKEN before running}"

echo "=== Canva Triage ==="

# 1. Is it Canva or us?
echo -n "Canva API: "
curl -s -o /dev/null --max-time 10 -w "HTTP %{http_code} (%{time_total}s)\n" \
  -H "Authorization: Bearer $CANVA_ACCESS_TOKEN" \
  "https://api.canva.com/rest/v1/users/me"

# 2. Check our health endpoint (replace with your own URL)
echo -n "Our health: "
curl -s -o /dev/null --max-time 10 -w "HTTP %{http_code}\n" \
  "https://api.ourapp.com/health"

# 3. Error rate (if Prometheus is available on localhost:9090).
#    -G with --data-urlencode encodes the PromQL; a raw [5m] in the URL
#    would be misread by curl's URL globbing.
echo "Error rate (5min):"
curl -sG --max-time 10 "http://localhost:9090/api/v1/query" \
  --data-urlencode 'query=rate(canva_api_errors_total[5m])' \
  | python3 -c "import sys, json; d = json.load(sys.stdin); print(d['data']['result'])" 2>/dev/null \
  || echo "Prometheus not available"

# 4. Rate limit status (header name is an assumption; Canva may not send it)
echo -n "Rate limit remaining: "
curl -s -D - -o /dev/null --max-time 10 \
  -H "Authorization: Bearer $CANVA_ACCESS_TOKEN" \
  "https://api.canva.com/rest/v1/designs?limit=1" \
  | grep -i "x-ratelimit-remaining" || echo "unknown"
```
Decision Tree
```text
API returning errors?
├── YES → What HTTP status?
│   ├── 401 → Token expired → Refresh token, check rotation
│   ├── 403 → Scope issue → Verify integration permissions
│   ├── 429 → Rate limited → Enable backoff, check Retry-After
│   ├── 5xx → Canva outage → Enable fallback, monitor status page
│   └── Other → Check request format against API docs
└── NO → Is our integration healthy?
    ├── YES → Likely resolved or intermittent → Monitor
    └── NO → Check our infra (pods, memory, DNS, TLS)
```
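The tree can also be encoded as a lookup. That way a script or on-call bot can print the next step for whatever status the triage probe returns. This is a minimal sketch. The function name `canva_next_action` is hypothetical. Treating curl's `000` (no HTTP response at all) as "check our infra" is an assumption layered on top of the tree.

```bash
# Hypothetical helper: map an HTTP status from a Canva API call to the next
# action in the decision tree. Assumption: curl reports 000 when no HTTP
# response arrived (DNS, TLS, or network failure), so we point at our infra.
canva_next_action() {
  local status=$1
  case $status in
    2??) echo "Canva OK: check whether our integration is healthy" ;;
    401) echo "Token expired: refresh token, check rotation" ;;
    403) echo "Scope issue: verify integration permissions" ;;
    429) echo "Rate limited: enable backoff, check Retry-After" ;;
    5??) echo "Canva outage: enable fallback, monitor status page" ;;
    000) echo "No response: check our infra (pods, memory, DNS, TLS)" ;;
    *)   echo "Other: check request format against API docs" ;;
  esac
}

# Example:
# status=$(curl -s -o /dev/null --max-time 10 -w '%{http_code}' \
#   -H "Authorization: Bearer $CANVA_ACCESS_TOKEN" \
#   https://api.canva.com/rest/v1/users/me)
# canva_next_action "$status"
```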
Severity Levels
| Level | Definition | Response Time | Example |
|---|---|---|---|
| P1 | All design operations broken | < 15 min | All API calls returning 5xx |
| P2 | Degraded — some operations fail | < 1 hour | Exports failing, designs work |
| P3 | Minor — non-critical feature down | < 4 hours | Webhooks delayed |
| P4 | No user impact | Next business day | Monitoring gap |
Immediate Mitigation by Error Type
401 — Token Expired / Revoked
```bash
# Check whether the token is still valid (a 401 means expired or revoked)
curl -s -H "Authorization: Bearer $CANVA_ACCESS_TOKEN" \
  "https://api.canva.com/rest/v1/users/me" | python3 -m json.tool

# If expired: refresh all affected users' tokens
# If revoked: users must re-authorize via the OAuth flow
```
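Here is a sketch of the refresh step. It rests on several assumptions, so confirm them against the Canva Connect authentication docs before relying on it:

- The token endpoint is `https://api.canva.com/rest/v1/oauth/token`.
- It accepts a `refresh_token` grant.
- It accepts HTTP Basic client authentication.

`CANVA_CLIENT_ID`, `CANVA_CLIENT_SECRET`, and `canva_refresh_token` are hypothetical names. Canva may rotate refresh tokens on use, so store the new `refresh_token` from the response instead of reusing the old one.

```bash
# Hypothetical helper: exchange a refresh token for a new access token.
# Assumptions: Canva's token endpoint URL and HTTP Basic client auth;
# verify both against the Canva Connect authentication docs.
canva_refresh_token() {
  local refresh_token=$1
  if [[ -z ${CANVA_CLIENT_ID:-} || -z ${CANVA_CLIENT_SECRET:-} ]]; then
    echo "CANVA_CLIENT_ID and CANVA_CLIENT_SECRET must be set" >&2
    return 2
  fi
  # --data-urlencode sends a form-encoded POST. -f makes curl exit non-zero
  # on HTTP >= 400 (e.g. a revoked grant), so the caller can fall back to
  # sending the user through the OAuth flow again.
  curl -sf --max-time 10 "https://api.canva.com/rest/v1/oauth/token" \
    -u "$CANVA_CLIENT_ID:$CANVA_CLIENT_SECRET" \
    --data-urlencode "grant_type=refresh_token" \
    --data-urlencode "refresh_token=$refresh_token"
}

# Example (prints the JSON token response on success):
# canva_refresh_token "$STORED_REFRESH_TOKEN" | python3 -m json.tool
```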
429 — Rate Limited
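```bash
# Check how long to wait (Retry-After is typically only sent with a 429)
curl -s -D - -o /dev/null \
  -H "Authorization: Bearer $CANVA_ACCESS_TOKEN" \
  "https://api.canva.com/rest/v1/designs" \
  | grep -i "retry-after" || echo "No Retry-After header (not currently rate limited?)"

# Immediate: reduce request rate
# Then: enable queue-based rate limiting
```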
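One way to implement the backoff is sketched below. The helper names and the 60-second cap are hypothetical. The helper only understands the delta-seconds form of `Retry-After`; an HTTP-date value falls back to exponential backoff. The wrapper is a probe: it reports the final status and discards the response body.

```bash
# Hypothetical helper: seconds to wait before retry number $1 (0-based).
# A numeric Retry-After value ($2) wins; otherwise use capped exponential
# backoff: 1, 2, 4, 8, ... seconds, never more than 60 (assumed cap).
backoff_delay() {
  local attempt=$1 retry_after=${2:-}
  if [[ $retry_after =~ ^[0-9]+$ ]]; then
    echo "$retry_after"
    return 0
  fi
  if (( attempt > 6 )); then attempt=6; fi
  local delay=$(( 1 << attempt ))
  if (( delay > 60 )); then delay=60; fi
  echo "$delay"
}

# Hypothetical wrapper: GET a Canva URL, retrying on 429 up to $2 times
# (default 5). Prints the final HTTP status; the body is discarded.
canva_probe_with_backoff() {
  local url=$1 max_attempts=${2:-5} attempt status retry_after headers
  headers=$(mktemp)
  for (( attempt = 0; attempt < max_attempts; attempt++ )); do
    status=$(curl -s -o /dev/null -D "$headers" --max-time 10 -w '%{http_code}' \
      -H "Authorization: Bearer $CANVA_ACCESS_TOKEN" "$url")
    if [[ $status != 429 ]]; then
      rm -f "$headers"
      echo "$status"
      return 0
    fi
    # Pull the Retry-After value (if any) out of the dumped headers.
    retry_after=$(tr -d '\r' < "$headers" | awk 'tolower($1) == "retry-after:" { print $2 }')
    # No point sleeping after the final attempt.
    if (( attempt + 1 < max_attempts )); then
      sleep "$(backoff_delay "$attempt" "$retry_after")"
    fi
  done
  rm -f "$headers"
  echo "$status"
  return 1
}
```

Honoring `Retry-After` first keeps the client from retrying sooner than Canva asked. The exponential fallback spreads retries out when no hint is given.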
5xx — Canva Service Error
- Check the Canva status page (there is no official status.canva.com page specific to the API)
- Check the Canva developer community for reported outages
- Enable graceful degradation: return cached data where possible
- Show "Design features temporarily unavailable" to users
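The "return cached data where possible" step could look like the sketch below. It assumes a last-known-good response is kept per URL in a local file. The function name `canva_get_with_cache` and the file-based cache are hypothetical stand-ins for whatever cache (Redis, in-process) the integration actually uses.

```bash
# Hypothetical helper: GET a Canva URL; on 2xx refresh a local cache file
# (assumed file-based cache), on 5xx or no response (000) serve the last
# cached body instead.
canva_get_with_cache() {
  local url=$1 cache_file=$2 resp status body
  # Append the status code on its own final line so body and status can be split.
  resp=$(curl -s --max-time 10 -w '\n%{http_code}' \
    -H "Authorization: Bearer $CANVA_ACCESS_TOKEN" "$url")
  status=${resp##*$'\n'}
  body=${resp%$'\n'*}
  if [[ $status == 2?? ]]; then
    printf '%s\n' "$body" > "$cache_file"
    printf '%s\n' "$body"
  elif [[ ( $status == 5?? || $status == 000 ) && -s $cache_file ]]; then
    echo "Canva returned $status; serving cached data" >&2
    cat "$cache_file"
  else
    echo "Canva returned $status and no cache is available" >&2
    return 1
  fi
}
```

When this returns non-zero, the caller is where the "Design features temporarily unavailable" message would be shown.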
Communication Templates
Internal (Slack)
```text
P[1-4] INCIDENT: Canva Integration
Status: INVESTIGATING | MITIGATING | RESOLVED
Impact: [Describe user impact]
API Response: HTTP [status code]
Current action: [What you're doing]
Next update: [Time]
IC: @[name]
```
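To keep updates consistent, the template can be rendered from arguments and posted to a Slack incoming webhook, which accepts a JSON body with a `text` field. This is a sketch with several assumptions:

- `SLACK_WEBHOOK_URL` holds the webhook URL.
- `python3` is available for JSON escaping.
- The function names are hypothetical.

```bash
# Hypothetical helper: render the internal incident update.
# Args: severity(1-4) status impact http_status action next_update ic_handle
render_incident_update() {
  printf 'P%s INCIDENT: Canva Integration\n' "$1"
  printf 'Status: %s\n' "$2"
  printf 'Impact: %s\n' "$3"
  printf 'API Response: HTTP %s\n' "$4"
  printf 'Current action: %s\n' "$5"
  printf 'Next update: %s\n' "$6"
  printf 'IC: @%s\n' "$7"
}

# Hypothetical helper: post stdin as a Slack message via an incoming webhook
# (assumes SLACK_WEBHOOK_URL is set). python3 does the JSON escaping.
post_to_slack() {
  python3 -c 'import json, sys; print(json.dumps({"text": sys.stdin.read()}))' \
    | curl -sf -X POST -H 'Content-Type: application/json' \
        --data @- "$SLACK_WEBHOOK_URL"
}

# Example:
# render_incident_update 2 MITIGATING "Exports failing, designs work" 503 \
#   "Serving cached designs" "14:30 UTC" alice | post_to_slack
```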
External (Status Page)
```text
Canva Design Features — Degraded Performance

We are experiencing issues with our design integration. Users may see delays or errors when creating/exporting designs. We are actively working with our design platform provider to resolve this.

Last updated: [ISO 8601 timestamp]
```
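The `Last updated` field should be a UTC ISO 8601 timestamp. `date -u +%Y-%m-%dT%H:%M:%SZ` produces one with both GNU and BSD `date`. The sketch below assumes the template is saved to a file. It fills the placeholder at publish time. The function name is hypothetical.

```bash
# Hypothetical helper: fill the [ISO 8601 timestamp] placeholder in a saved
# copy of the status-page template with the current UTC time.
stamp_status_notice() {
  local template_file=$1 now
  now=$(date -u +%Y-%m-%dT%H:%M:%SZ)   # e.g. 2024-05-01T14:30:00Z
  sed "s/\[ISO 8601 timestamp]/$now/" "$template_file"
}

# Example: stamp_status_notice status-notice.txt
```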
Post-Incident
Evidence Collection
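```bash
# Collect logs for the incident window (--tail=-1: with -l, kubectl defaults to 10 lines per container)
kubectl logs -l app=canva-integration --since=2h --tail=-1 --all-containers --prefix > incident-canva-logs.txt
# Export metrics for the same window (epoch arithmetic avoids GNU-only `date -d`)
END=$(date +%s); START=$((END - 7200))
curl -sG "http://localhost:9090/api/v1/query_range" --data-urlencode 'query=canva_api_errors_total' \
  --data-urlencode "start=$START" --data-urlencode "end=$END" --data-urlencode 'step=60' > metrics.json
```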
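For the Root Cause section, a quick tally of HTTP statuses in the collected logs shows whether the incident was mostly 401s, 429s, or 5xx. This sketch assumes the integration logs each Canva response with a `status=NNN` token. That format is hypothetical, so adjust the pattern to your real log lines.

```bash
# Hypothetical helper: count Canva response statuses in a log file, most
# frequent first. Assumes log lines contain a "status=NNN" token.
count_canva_statuses() {
  grep -oE 'status=[0-9]{3}' "$1" | sort | uniq -c | sort -rn
}

# Example: count_canva_statuses incident-canva-logs.txt
```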
Postmortem Template
```markdown
## Incident: Canva API [Error Type]

**Date:** YYYY-MM-DD HH:MM UTC
**Duration:** X hours Y minutes
**Severity:** P[1-4]

### Summary
[1-2 sentence description]

### Timeline (UTC)
- HH:MM — [First alert / error detected]
- HH:MM — [Investigation started]
- HH:MM — [Root cause identified]
- HH:MM — [Mitigation applied]
- HH:MM — [Confirmed resolved]

### Root Cause
[Was it Canva-side or our integration? Token issue? Rate limit? Code bug?]

### Impact
- Users affected: N
- Failed operations: N designs / N exports

### Action Items
- [ ] [Preventive measure] — Owner — Due date
```
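The Duration field can be computed from the first-alert and confirmed-resolved times rather than by hand. This sketch takes Unix epoch seconds, for example recorded with `date +%s` at each timeline step. The function name is hypothetical.

```bash
# Hypothetical helper: format the span between two epoch timestamps as
# "X hours Y minutes" for the postmortem Duration field (seconds truncated).
format_incident_duration() {
  local start=$1 end=$2
  local total_minutes=$(( (end - start) / 60 ))
  printf '%d hours %d minutes\n' $(( total_minutes / 60 )) $(( total_minutes % 60 ))
}

# Example: 5700 seconds apart
# format_incident_duration 1700000000 1700005700   # prints "1 hours 35 minutes"
```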
Error Handling
| Issue | Cause | Solution |
|---|---|---|
| Can't determine if Canva is down | No status page API | Test with known-good token |
| Token refresh fails | Revoked integration | Re-authorize user |
| All users affected | Integration-level issue | Check client credentials |
| Single user affected | User-level token issue | Refresh that user's token |
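The last two rows can be checked directly: probe `GET /v1/users/me` with several users' tokens and see whether the 401s hit everyone or just some users. This is a sketch with several assumptions:

- The function name is hypothetical.
- Tokens are read one per line from stdin.
- Only a 401 counts as a token rejection.

```bash
# Hypothetical helper: probe /v1/users/me with each access token on stdin
# (one per line) and report whether 401s are integration-level or user-level.
canva_token_blast_radius() {
  local token status total=0 failed=0
  while IFS= read -r token; do
    if [[ -z $token ]]; then continue; fi
    status=$(curl -s -o /dev/null --max-time 10 -w '%{http_code}' \
      -H "Authorization: Bearer $token" \
      "https://api.canva.com/rest/v1/users/me")
    total=$((total + 1))
    if [[ $status == 401 ]]; then failed=$((failed + 1)); fi
  done
  if (( total == 0 )); then
    echo "no tokens supplied"
  elif (( failed == 0 )); then
    echo "0/$total tokens rejected: not a token issue"
  elif (( failed == total )); then
    echo "$failed/$total tokens rejected: integration-level, check client credentials"
  else
    echo "$failed/$total tokens rejected: user-level, refresh those users' tokens"
  fi
}
```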
Next Steps
For data handling, see canva-data-handling.