Claude-code-plugins-plus-skills miro-incident-runbook
install
source · Clone the upstream repo
git clone https://github.com/jeremylongshore/claude-code-plugins-plus-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/jeremylongshore/claude-code-plugins-plus-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/plugins/saas-packs/miro-pack/skills/miro-incident-runbook" ~/.claude/skills/jeremylongshore-claude-code-plugins-plus-skills-miro-incident-runbook && rm -rf "$T"
manifest:
plugins/saas-packs/miro-pack/skills/miro-incident-runbook/SKILL.mdsource content
Miro Incident Runbook
Overview
Rapid incident response for Miro REST API v2 integration failures: triage, mitigation, recovery, and postmortem.
Severity Levels
| Level | Definition | Response | Example |
|---|---|---|---|
| P1 | Complete integration outage | < 15 min | Miro API returns 5xx on all calls |
| P2 | Degraded service | < 1 hour | High latency, partial 429s |
| P3 | Minor impact | < 4 hours | Webhook delays, single-board errors |
| P4 | No user impact | Next business day | Monitoring gaps, non-critical warnings |
Quick Triage (First 5 Minutes)
#!/bin/bash # miro-triage.sh — Run this first during any Miro incident echo "=== MIRO TRIAGE $(date -u +%H:%M:%SZ) ===" # 1. Is Miro itself down? echo -n "Miro Status: " curl -sf "https://status.miro.com/api/v2/status.json" | jq -r '.status.description' 2>/dev/null || echo "STATUS PAGE UNREACHABLE" # 2. Can we reach the API? echo -n "API Connectivity: " curl -s -o /dev/null -w "HTTP %{http_code} (%{time_total}s)" \ -H "Authorization: Bearer ${MIRO_ACCESS_TOKEN}" \ "https://api.miro.com/v2/boards?limit=1" 2>/dev/null echo "" # 3. What's our rate limit status? echo "Rate Limit:" curl -sI -H "Authorization: Bearer ${MIRO_ACCESS_TOKEN}" \ "https://api.miro.com/v2/boards?limit=1" 2>/dev/null | \ grep -i "x-ratelimit\|retry-after" || echo " No rate limit headers" # 4. Token validity echo -n "Token: " TOKEN_RESP=$(curl -s -H "Authorization: Bearer ${MIRO_ACCESS_TOKEN}" \ "https://api.miro.com/v1/oauth-token" 2>/dev/null) echo "$TOKEN_RESP" | jq -r '"scopes: \(.scopes // "INVALID"), team: \(.team.id // "N/A")"' 2>/dev/null || echo "INVALID OR EXPIRED" # 5. Our health check echo -n "App Health: " curl -sf "${APP_URL:-http://localhost:3000}/health" | jq -r '.miro.status // "UNAVAILABLE"' 2>/dev/null || echo "HEALTH CHECK FAILED"
Decision Tree
Miro API returning errors? ├── YES → What status code? │ ├── 401/403 → Token issue │ │ ├── Token expired? → Refresh token (see below) │ │ └── Scopes changed? → Re-authorize via OAuth flow │ ├── 429 → Rate limited │ │ ├── Check X-RateLimit-Remaining header │ │ ├── Honor Retry-After header │ │ └── Reduce request rate or enable queue │ ├── 404 → Board/item not found │ │ └── Verify IDs haven't changed │ └── 500/502/503 → Miro platform issue │ ├── Check status.miro.com │ ├── Enable graceful degradation │ └── Wait for Miro to resolve └── NO → Is our integration healthy? ├── YES → Intermittent. Monitor for recurrence. └── NO → Our infrastructure issue ├── Check pods/containers ├── Check memory/CPU └── Check network/DNS
Immediate Actions by Error Type
401 — Token Expired
# Refresh access token curl -s -X POST https://api.miro.com/v1/oauth/token \ -d "grant_type=refresh_token" \ -d "client_id=${MIRO_CLIENT_ID}" \ -d "client_secret=${MIRO_CLIENT_SECRET}" \ -d "refresh_token=${MIRO_REFRESH_TOKEN}" | jq # If refresh token is also expired, user must re-authorize: # Redirect to: https://miro.com/oauth/authorize?response_type=code&client_id=${MIRO_CLIENT_ID}&redirect_uri=${REDIRECT_URI}
403 — Insufficient Permissions
# Check what scopes the token has curl -s -H "Authorization: Bearer ${MIRO_ACCESS_TOKEN}" \ "https://api.miro.com/v1/oauth-token" | jq '.scopes' # Compare with what the failed endpoint requires # boards:read for GET endpoints # boards:write for POST/PATCH/DELETE endpoints # team:read / organizations:read for team/org endpoints
429 — Rate Limited
# Check current rate limit status curl -sI -H "Authorization: Bearer ${MIRO_ACCESS_TOKEN}" \ "https://api.miro.com/v2/boards?limit=1" | grep -i ratelimit # Response headers: # X-RateLimit-Limit: 100000 (credits per minute) # X-RateLimit-Remaining: 0 # Retry-After: 30 (seconds) # Immediate mitigation: pause all non-critical API calls # Long-term: implement caching + webhooks (see miro-performance-tuning)
5xx — Miro Platform Issue
# 1. Confirm it's Miro-side curl -s "https://status.miro.com/api/v2/status.json" | jq '.status' # 2. Check for ongoing incidents curl -s "https://status.miro.com/api/v2/incidents/unresolved.json" | \ jq '.incidents[] | {name, status, updated_at}' # 3. Enable graceful degradation in your app # Feature flag: MIRO_FALLBACK_ENABLED=true # Serve cached data, queue writes for retry when Miro recovers
Communication Templates
Internal (Slack/PagerDuty)
P[1-4] INCIDENT: Miro Integration Status: INVESTIGATING | IDENTIFIED | MONITORING | RESOLVED Impact: [What users experience] Root cause: [Miro-side outage | Token expired | Rate limited | Our bug] Action: [What we're doing now] ETA: [Expected resolution time] Next update: [When]
External (Status Page)
Miro Integration — Degraded Performance We are experiencing issues with our Miro integration. [Board sync / item creation / webhook processing] may be delayed. Root cause: [Brief technical explanation] Workaround: [If any — e.g., "Changes will sync when service recovers"] Last updated: [timestamp UTC]
Post-Incident Evidence Collection
# Collect evidence for postmortem INCIDENT_DIR="miro-incident-$(date +%Y%m%d-%H%M%S)" mkdir -p "$INCIDENT_DIR" # API response during incident curl -s -H "Authorization: Bearer ${MIRO_ACCESS_TOKEN}" \ "https://api.miro.com/v2/boards?limit=1" > "$INCIDENT_DIR/api-response.json" # Miro status page snapshot curl -s "https://status.miro.com/api/v2/incidents/unresolved.json" > "$INCIDENT_DIR/miro-status.json" # Application metrics (adjust query for your Prometheus) curl -s "http://prometheus:9090/api/v1/query_range?query=rate(miro_errors_total[5m])&start=$(date -d '2 hours ago' +%s)&end=$(date +%s)&step=60" > "$INCIDENT_DIR/error-metrics.json" # Package (exclude tokens) tar -czf "$INCIDENT_DIR.tar.gz" "$INCIDENT_DIR" echo "Evidence collected: $INCIDENT_DIR.tar.gz"
Postmortem Template
## Incident: Miro [Error Type] **Date:** YYYY-MM-DD **Duration:** X hours Y minutes **Severity:** P[1-4] **Impact:** [Users affected, features impacted] ### Timeline (UTC) - HH:MM — [First error detected by monitoring] - HH:MM — [On-call alerted] - HH:MM — [Root cause identified] - HH:MM — [Mitigation applied] - HH:MM — [Service restored] ### Root Cause [Technical explanation — e.g., "Access token expired and refresh logic had a bug where it used the old refresh token instead of the new one returned in the last refresh response."] ### What Went Well - [Monitoring detected the issue within 2 minutes] - [Runbook was accurate and followed] ### What Went Wrong - [Token refresh logic untested in integration tests] - [No alerting on 401 error rate] ### Action Items - [ ] Add integration test for token refresh flow — @owner — Due date - [ ] Add P1 alert for miro_errors_total{error_type="auth"} > 0 — @owner — Due date - [ ] Document token rotation procedure — @owner — Due date
Error Handling
| Issue | Cause | Solution |
|---|---|---|
| Status page unreachable | DNS/network | Use mobile or VPN |
| Token refresh fails | Refresh token revoked | User must re-authorize |
| Rate limit persists after reset | Clock skew | Use header, not local clock |
| Metrics unavailable | Prometheus down | Check application logs directly |
Resources
Next Steps
For data handling and compliance, see
miro-data-handling.