Skillshub customerio-advanced-troubleshooting
install
source · Clone the upstream repo
git clone https://github.com/ComeOnOliver/skillshub
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ComeOnOliver/skillshub "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/jeremylongshore/claude-code-plugins-plus-skills/customerio-advanced-troubleshooting" ~/.claude/skills/comeonoliver-skillshub-customerio-advanced-troubleshooting && rm -rf "$T"
manifest:
skills/jeremylongshore/claude-code-plugins-plus-skills/customerio-advanced-troubleshooting/SKILL.mdsource content
Customer.io Advanced Troubleshooting
Overview
Advanced debugging techniques for complex Customer.io issues: systematic investigation framework, API debug client, user profile analysis, campaign/broadcast debugging, network diagnostics, and incident response runbooks.
Prerequisites
- Access to Customer.io dashboard (admin recommended)
- Application logs access
for API testingcurl
Troubleshooting Framework
For every issue, answer these five questions first:
- What is the expected vs actual behavior?
- When did the issue start? (Check deploy history, CIO status page)
- Who is affected — one user, a segment, or everyone?
- Where in the pipeline — API call, delivery, or rendering?
- How often — every time, intermittent, or one-time?
Instructions
Step 1: API Debug Client
// lib/customerio-debug.ts import { TrackClient, APIClient, RegionUS } from "customerio-node"; export class DebugCioClient { private track: TrackClient; constructor() { this.track = new TrackClient( process.env.CUSTOMERIO_SITE_ID!, process.env.CUSTOMERIO_TRACK_API_KEY!, { region: RegionUS } ); } async debugIdentify(userId: string, attrs: Record<string, any>) { console.log(`\n--- Debug: identify("${userId}") ---`); console.log("Attributes:", JSON.stringify(attrs, null, 2)); const start = Date.now(); try { await this.track.identify(userId, attrs); const latency = Date.now() - start; console.log(`Result: SUCCESS (${latency}ms)`); return { success: true, latency }; } catch (err: any) { const latency = Date.now() - start; console.log(`Result: FAILED (${latency}ms)`); console.log(`Status: ${err.statusCode}`); console.log(`Message: ${err.message}`); console.log(`Body: ${JSON.stringify(err.body ?? err.response)}`); return { success: false, latency, statusCode: err.statusCode, message: err.message }; } } async debugTrack(userId: string, name: string, data?: any) { console.log(`\n--- Debug: track("${userId}", "${name}") ---`); console.log("Data:", JSON.stringify(data, null, 2)); const start = Date.now(); try { await this.track.track(userId, { name, data }); const latency = Date.now() - start; console.log(`Result: SUCCESS (${latency}ms)`); return { success: true, latency }; } catch (err: any) { const latency = Date.now() - start; console.log(`Result: FAILED (${latency}ms)`); console.log(`Status: ${err.statusCode}`); console.log(`Message: ${err.message}`); return { success: false, latency, statusCode: err.statusCode }; } } }
Step 2: User Investigation Script
#!/usr/bin/env bash set -euo pipefail # scripts/investigate-user.sh <user-id> USER_ID="${1:?Usage: investigate-user.sh <user-id>}" SITE_ID="${CUSTOMERIO_SITE_ID:?Missing CUSTOMERIO_SITE_ID}" API_KEY="${CUSTOMERIO_TRACK_API_KEY:?Missing CUSTOMERIO_TRACK_API_KEY}" echo "=== Investigating User: ${USER_ID} ===" echo "" # 1. Check if user exists (try to identify with minimal data) echo "--- API Connectivity Test ---" HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" \ -u "${SITE_ID}:${API_KEY}" \ -X PUT "https://track.customer.io/api/v1/customers/${USER_ID}" \ -H "Content-Type: application/json" \ -d '{"_debug_check":"true"}') echo "Track API for user: HTTP ${HTTP_CODE}" echo "" echo "--- Dashboard Checklist ---" echo "Check the following in Customer.io dashboard:" echo "1. People > Search '${USER_ID}'" echo " - Does profile exist?" echo " - Does it have an email attribute?" echo " - Is there a 'Suppressed' badge?" echo "" echo "2. Activity tab:" echo " - Are events being received?" echo " - Any bounce/complaint events?" echo " - Last identify timestamp correct?" echo "" echo "3. Segments tab:" echo " - Which segments does user belong to?" echo " - Does segment match campaign audience?" echo "" echo "4. Campaigns > Find relevant campaign:" echo " - Is campaign status Active?" echo " - Does trigger event match?" echo " - Check 'Messages' tab for delivery attempts"
Step 3: Campaign Debugging
Common campaign issues and their investigation path:
| Symptom | Check First | Then Check |
|---|---|---|
| Campaign not triggering | Event name match (case-sensitive) | Campaign status (Active?) |
| User not matched | Segment conditions | User attributes match segment? |
| Email not delivered | User has attribute | Bounce/suppression status |
| Liquid template broken | has all required fields | Preview with real data in dashboard |
| Wrong email content | Correct campaign version is Active | Template variables populated |
| Delayed sends | Campaign "Wait" steps | Queue backlog in Customer.io |
// Programmatic campaign debug async function debugCampaignTrigger( userId: string, eventName: string, eventData: Record<string, any> ) { const debug = new DebugCioClient(); console.log("=== Campaign Trigger Debug ===\n"); // 1. Can we identify the user? const identifyResult = await debug.debugIdentify(userId, { _debug_campaign_check: true, }); if (!identifyResult.success) { console.log("\nBLOCKER: Cannot identify user. Fix auth first."); return; } // 2. Can we track the event? const trackResult = await debug.debugTrack(userId, eventName, eventData); if (!trackResult.success) { console.log("\nBLOCKER: Cannot track event. Check error above."); return; } console.log("\n=== API Side OK ==="); console.log("If campaign still not triggering, check in dashboard:"); console.log(`1. Event name: "${eventName}" (must match exactly, case-sensitive)`); console.log("2. Campaign status: must be Active (not Draft/Paused)"); console.log("3. Campaign audience: user must match segment/filter"); console.log("4. Campaign frequency: check if user already received"); console.log("5. Suppression: check if user is suppressed"); }
Step 4: Network Diagnostics
#!/usr/bin/env bash set -euo pipefail # scripts/cio-network-diag.sh echo "=== Customer.io Network Diagnostics ===" echo "" # DNS resolution echo "--- DNS Resolution ---" for host in track.customer.io api.customer.io status.customer.io; do IP=$(dig +short "$host" 2>/dev/null | head -1) echo "${host}: ${IP:-FAILED}" done echo "" # TLS check echo "--- TLS Certificate ---" echo | openssl s_client -connect track.customer.io:443 -servername track.customer.io 2>/dev/null \ | openssl x509 -noout -subject -issuer -dates 2>/dev/null \ || echo "TLS check failed" echo "" # Latency test echo "--- Latency (5 samples) ---" for i in $(seq 1 5); do LATENCY=$(curl -s -o /dev/null -w "%{time_total}" "https://track.customer.io") echo "Request ${i}: ${LATENCY}s" done echo "" # Status page echo "--- Platform Status ---" curl -s "https://status.customer.io/api/v2/status.json" \ | python3 -c "import sys,json; d=json.load(sys.stdin); print(f'Status: {d[\"status\"][\"description\"]}')" \ 2>/dev/null || echo "Could not fetch status"
Step 5: Incident Response Runbooks
P1 — Complete outage (all API calls failing):
- Check https://status.customer.io — is Customer.io down?
- If CIO is up: check your credentials (rotate if compromised)
- Enable circuit breaker to stop retries hitting a dead endpoint
- Switch to fallback queue (events stored in Redis/Kafka)
- Notify affected teams
- When restored: drain fallback queue, verify event delivery
P2 — High error rate (>5% failures):
- Check error breakdown: which status codes?
- If 429: reduce concurrency, check rate limiter config
- If 5xx: check CIO status page, enable backoff
- If 401: credentials may have been rotated — check secrets manager
- Monitor error rate — escalate to P1 if not recovering
P3 — Delivery issues (messages not arriving):
- Verify user has
attribute (People > user profile)email - Check suppression status (Suppressed badge)
- Check bounce history (Activity tab > filter bounces)
- Verify sending domain (Settings > Sending Domains)
- Check campaign is Active, trigger event matches
- Review spam folder and email headers
P4 — Webhook processing failures:
- Verify webhook endpoint is publicly accessible
- Check signature verification (secret matches dashboard?)
- Review webhook event logs for parsing errors
- Check queue health if using async processing
- Verify Customer.io IP allowlist if using firewall
Diagnostic Commands
# Quick API health check curl -s -u "$CUSTOMERIO_SITE_ID:$CUSTOMERIO_TRACK_API_KEY" \ -X PUT "https://track.customer.io/api/v1/customers/health-check" \ -H "Content-Type: application/json" \ -d '{"_diag":true}' \ -w "\nHTTP: %{http_code} Time: %{time_total}s\n" # Check App API curl -s -H "Authorization: Bearer $CUSTOMERIO_APP_API_KEY" \ "https://api.customer.io/v1/campaigns" \ -w "\nHTTP: %{http_code}\n" | head -5 # Platform status curl -s "https://status.customer.io/api/v2/status.json" | python3 -m json.tool
Error Handling
| Issue | Solution |
|---|---|
| Intermittent 5xx | Transient — retry with backoff handles it |
| Consistent 401 after deploy | Credentials changed — check env vars and secrets |
| User receiving duplicate messages | Event deduplication or campaign frequency cap |
| Webhook events stop arriving | Check endpoint health, CIO IP allowlist, SSL cert validity |
Resources
- Customer.io Status Page
- Customer.io Community
- Reporting Webhooks Reference
- Troubleshooting Deliverability
Next Steps
After troubleshooting, proceed to
customerio-reliability-patterns for fault tolerance.