n8n Workflow Mastery — Complete Automation Engineering System
Install: clone the repo, or use one of the one-liners below to copy the skill into your agent's skills directory.

```bash
git clone https://github.com/openclaw/skills

# Claude Code (~/.claude/skills)
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/1kalin/afrexai-n8n-mastery" ~/.claude/skills/openclaw-skills-afrexai-n8n-mastery && rm -rf "$T"

# OpenClaw (~/.openclaw/skills)
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/1kalin/afrexai-n8n-mastery" ~/.openclaw/skills/openclaw-skills-afrexai-n8n-mastery && rm -rf "$T"
```
You are an expert n8n workflow architect. You design, build, debug, optimize, and scale n8n automations following production-grade methodology. Every workflow you create is complete, functional, and follows the patterns in this guide.
Phase 1: Quick Health Check (Run First)
Score the current n8n setup (1 point each, /10):
| Signal | Check |
|---|---|
| Workflow naming | Consistent format? |
| Error handling | Every workflow has error trigger node? |
| Credentials | Using n8n credential store (not hardcoded)? |
| Versioning | Workflow descriptions include version/changelog? |
| Monitoring | Error workflow connected to notification channel? |
| Retry logic | HTTP nodes have retry on failure enabled? |
| Execution data | Pruning configured (not filling disk)? |
| Sub-workflows | Complex logic broken into reusable sub-workflows? |
| Environment vars | Using env vars for URLs/configs (not magic strings)? |
| Documentation | Each workflow has description explaining purpose? |
Score 0-3: Critical — follow this guide start to finish. Score 4-6: Gaps — focus on missing areas. Score 7-10: Mature — jump to advanced patterns.
Phase 2: Workflow Architecture & Design
2.1 Workflow Strategy Brief
Before building, answer these in a YAML brief:
```yaml
workflow_brief:
  name: "[Category] Brief Description"
  problem: "What manual process does this eliminate?"
  trigger: "What starts this workflow? (webhook/schedule/event/manual)"
  inputs:
    - source: "Where does data come from?"
      format: "JSON/CSV/form/email/database"
      volume: "How many items per run? Per day?"
  outputs:
    - destination: "Where does data go?"
      format: "API call/email/database/file/notification"
  error_handling: "What happens when it fails?"
  sla: "How fast must it complete? Acceptable delay?"
  dependencies:
    - service: "External API/service name"
      auth_type: "API key/OAuth2/Basic"
      rate_limit: "Calls per minute/hour"
  owner: "Who maintains this workflow?"
  review_date: "When to review/optimize?"
```
2.2 Workflow Naming Convention
```text
[CATEGORY] Action — Target (vX.Y)

Categories:
  [SYNC]     — Data synchronization between systems
  [PROCESS]  — Multi-step business processes
  [NOTIFY]   — Alerts and notifications
  [INGEST]   — Data collection and import
  [EXPORT]   — Reports and data export
  [MONITOR]  — Health checks and monitoring
  [AI]       — LLM/AI-powered workflows
  [INTERNAL] — Internal tooling and utilities

Examples:
  [SYNC] HubSpot → Postgres — Contacts (v2.1)
  [PROCESS] Invoice Approval — Slack + QuickBooks (v1.3)
  [NOTIFY] Stripe Payment — Team Alert (v1.0)
  [AI] Support Ticket — Auto-classify + Route (v1.2)
```
2.3 Workflow Complexity Tiers
| Tier | Nodes | Description | Approach |
|---|---|---|---|
| Simple | 3-7 | Linear A→B→C | Single workflow |
| Standard | 8-15 | Branches, loops, some error handling | Single workflow + error trigger |
| Complex | 16-30 | Multi-service, conditional logic, retries | Main + sub-workflows |
| Enterprise | 30+ | Orchestration, queues, state management | Orchestrator + multiple sub-workflows |
Rule: If a workflow exceeds 30 nodes, decompose into sub-workflows.
2.4 Node Organization Layout
```text
Left → Right flow (primary path)
Top → Bottom (branches and error paths)

Section 1 (x: 0-600):     Trigger + Input Processing
Section 2 (x: 600-1200):  Core Logic + Transformations
Section 3 (x: 1200-1800): Output + Delivery
Section 4 (x: 1800+):     Error Handling + Logging

Use Sticky Notes for section labels (yellow = info, red = warning, green = success path)
```
Phase 3: Trigger Design Patterns
3.1 Trigger Selection Matrix
| Use Case | Trigger Type | Node | When to Use |
|---|---|---|---|
| External system sends data | Webhook | Webhook | API integrations, form submissions |
| Run at specific times | Schedule | Schedule Trigger | Reports, syncs, cleanup |
| React to n8n events | Error/Workflow | Error Trigger | Error handling, workflow chaining |
| Manual testing/ad-hoc | Manual | Manual Trigger | Development, one-off runs |
| Chat/conversational | Chat | Chat Trigger | AI assistants, chatbots |
| File changes | Polling | Various | Google Drive, S3, FTP monitoring |
| Email arrives | Polling | IMAP Email | Email processing workflows |
| Database change | Polling/Webhook | Various | CDC (Change Data Capture) |
3.2 Webhook Security Checklist
```yaml
webhook_security:
  authentication:
    - method: "Header Auth"
      setup: "Add Header Auth credential, verify X-API-Key"
      use_when: "Service-to-service, simple integrations"
    - method: "HMAC Signature"
      setup: "Code node to verify HMAC-SHA256 of body"
      use_when: "Stripe, GitHub, Shopify webhooks"
    - method: "JWT Bearer"
      setup: "Code node to verify JWT token"
      use_when: "OAuth2 services, custom apps"
    - method: "IP Allowlist"
      setup: "IF node checking $request.headers['x-forwarded-for']"
      use_when: "Known source IPs (internal services)"
  validation:
    - "Always validate incoming payload schema with IF/Switch"
    - "Return appropriate HTTP status (200 OK, 400 Bad Request)"
    - "Log all webhook calls for audit trail"
    - "Set webhook timeout (don't leave connections hanging)"
    - "Use 'Respond to Webhook' node for async processing"
```
3.3 Schedule Trigger Patterns
```yaml
schedule_patterns:
  business_hours_check:
    cron: "*/15 9-17 * * 1-5"
    description: "Every 15 min during business hours (Mon-Fri)"
  daily_morning_report:
    cron: "0 8 * * 1-5"
    description: "8 AM weekdays"
  weekly_cleanup:
    cron: "0 2 * * 0"
    description: "2 AM Sunday (low traffic)"
  monthly_billing:
    cron: "0 6 1 * *"
    description: "1st of month, 6 AM"
  smart_polling:
    cron: "*/5 * * * *"
    description: "Every 5 min — use with dedup to avoid reprocessing"
    dedup_strategy: "Store last processed ID/timestamp in n8n static data"
```
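A minimal Code-node sketch of that dedup strategy for the smart-polling case, assuming each polled item carries a monotonically increasing numeric `id` (the field name is illustrative):

```javascript
// Code node: drop items already handled in a previous polling run.
// Workflow static data remembers the highest id seen so far.
const staticData = $getWorkflowStaticData('global');
const lastSeenId = staticData.lastSeenId || 0;

// Keep only items newer than the last run.
const fresh = items.filter(item => (item.json.id || 0) > lastSeenId);

// Remember the new high-water mark for the next run.
if (fresh.length > 0) {
  staticData.lastSeenId = Math.max(...fresh.map(item => item.json.id));
}

return fresh;
```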
Phase 4: Core Node Patterns Library
4.1 HTTP Request — Production Pattern
{ "node": "HTTP Request", "settings": { "method": "POST", "url": "={{ $env.API_BASE_URL }}/endpoint", "authentication": "predefinedCredentialType", "sendHeaders": true, "headerParameters": { "Content-Type": "application/json", "User-Agent": "n8n-automation/1.0" }, "sendBody": true, "bodyParameters": "={{ JSON.stringify($json) }}", "options": { "timeout": 30000, "retry": { "maxRetries": 3, "retryInterval": 1000, "retryOnTimeout": true }, "response": { "response": { "fullResponse": true } } } } }
HTTP Request Rules:
- Always set timeout (default 300s is too long for most APIs)
- Enable retry with exponential backoff for external APIs
- Use credential store — never hardcode API keys in URL/headers
- Set User-Agent for debugging on the receiving end
- Use $env.VARIABLE for base URLs — never hardcode domains
- Full response mode when you need status code for branching
4.2 Code Node — Data Transformation Patterns
Pattern: Map and Transform
```javascript
// Transform array of items
return items.map(item => {
  const data = item.json;
  return {
    json: {
      id: data.id,
      fullName: `${data.first_name} ${data.last_name}`.trim(),
      email: data.email?.toLowerCase(),
      createdAt: new Date(data.created_at).toISOString(),
      source: 'n8n-sync',
      // Computed fields
      isActive: data.status === 'active',
      daysSinceSignup: Math.floor(
        (Date.now() - new Date(data.created_at)) / 86400000
      ),
    }
  };
});
```
Pattern: Filter + Deduplicate
```javascript
const seen = new Set();
return items.filter(item => {
  const key = item.json.email?.toLowerCase();
  if (!key || seen.has(key)) return false;
  seen.add(key);
  return true;
});
```
Pattern: Aggregate / Group By
```javascript
const groups = {};
for (const item of items) {
  const key = item.json.category;
  if (!groups[key]) groups[key] = { count: 0, total: 0, items: [] };
  groups[key].count++;
  groups[key].total += item.json.amount || 0;
  groups[key].items.push(item.json);
}

return Object.entries(groups).map(([category, data]) => ({
  json: { category, ...data, average: data.total / data.count }
}));
```
Pattern: Pagination Handler
```javascript
// Use with Loop Over Items or recursive sub-workflow
const baseUrl = $env.API_BASE_URL;
const results = [];
let page = 1;
let hasMore = true;

while (hasMore) {
  const response = await this.helpers.httpRequest({
    method: 'GET',
    url: `${baseUrl}/items?page=${page}&per_page=100`,
    headers: { 'Authorization': `Bearer ${$env.API_TOKEN}` },
  });
  results.push(...response.data);
  hasMore = response.data.length === 100;
  page++;
  // Safety valve
  if (page > 50) break;
}

return results.map(item => ({ json: item }));
```
Pattern: Rate Limiter
```javascript
// Add between batch items to respect API limits
const RATE_LIMIT_MS = 200; // 5 requests per second
const itemIndex = $itemIndex || 0;

if (itemIndex > 0) {
  await new Promise(resolve => setTimeout(resolve, RATE_LIMIT_MS));
}

return items;
```
4.3 Branching Patterns
IF Node — Decision Matrix
```yaml
branching_patterns:
  binary_decision:
    node: "IF"
    use: "True/false routing"
    example: "Is order amount > $100?"
  multi_path:
    node: "Switch"
    use: "3+ possible routes"
    example: "Route by ticket priority (P0/P1/P2/P3)"
  content_routing:
    node: "Switch"
    use: "Route by data content/type"
    example: "Route by email domain to different CRMs"
  merge_paths:
    node: "Merge"
    mode: "chooseBranch"
    use: "Rejoin after IF/Switch branches"
```
Switch Node — Clean Multi-Routing
```text
Switch on: {{ $json.status }}
  Case "new"     → Create record path
  Case "updated" → Update record path
  Case "deleted" → Archive record path
  Default        → Log unknown status + alert
```
4.4 Loop Patterns
Split In Batches — Batch Processing
```yaml
batch_processing:
  node: "Split In Batches"
  batch_size: 10
  use_cases:
    - "API with rate limits (process 10, wait, next 10)"
    - "Database bulk inserts (batch of 100)"
    - "Email sending (batch of 50 to avoid spam filters)"
  pattern:
    1: "Split In Batches (size: 10)"
    2: "→ Process batch (HTTP Request / DB insert)"
    3: "→ Wait (1 second between batches)"
    4: "→ Loop back to Split In Batches"
```
Loop Over Items — Per-Item Processing
```yaml
per_item_loop:
  node: "Loop Over Items"
  use_cases:
    - "Each item needs different API call"
    - "Sequential processing required (order matters)"
    - "Per-item error handling needed"
  anti_pattern: "Don't loop when batch/bulk API exists"
```
Phase 5: Error Handling Architecture
5.1 Error Handling Strategy
Every production workflow MUST have:
```text
MAIN WORKFLOW

  Trigger → Process → Output
      │
      └─── Error Trigger ──→ Error Handler
                                 ├── Log error details
                                 ├── Send alert (Slack/email)
                                 ├── Retry logic (if applicable)
                                 └── Dead letter queue (if needed)
```
5.2 Error Trigger Template
```yaml
error_workflow:
  nodes:
    - name: "Error Trigger"
      type: "n8n-nodes-base.errorTrigger"

    - name: "Extract Error Info"
      type: "n8n-nodes-base.code"
      code: |
        const error = $json;
        return [{
          json: {
            workflow_name: error.workflow?.name || 'Unknown',
            workflow_id: error.workflow?.id,
            execution_id: error.execution?.id,
            error_message: error.execution?.error?.message || 'No message',
            error_node: error.execution?.error?.node || 'Unknown node',
            timestamp: new Date().toISOString(),
            retry_url: `${$env.N8N_BASE_URL}/workflow/${error.workflow?.id}/executions/${error.execution?.id}`,
            severity: classifySeverity(error),
          }
        }];

        function classifySeverity(error) {
          const msg = error.execution?.error?.message || '';
          if (msg.includes('timeout') || msg.includes('ECONNREFUSED')) return 'WARNING';
          if (msg.includes('401') || msg.includes('403')) return 'CRITICAL';
          if (msg.includes('429')) return 'INFO'; // Rate limit, will retry
          return 'ERROR';
        }

    - name: "Alert via Slack"
      type: "n8n-nodes-base.slack"
      action: "Send message"
      channel: "#n8n-alerts"
      message: |
        🚨 *n8n Workflow Error*
        *Workflow:* {{ $json.workflow_name }}
        *Node:* {{ $json.error_node }}
        *Severity:* {{ $json.severity }}
        *Error:* {{ $json.error_message }}
        *Time:* {{ $json.timestamp }}
        <{{ $json.retry_url }}|View Execution>
```
5.3 Retry Patterns
```yaml
retry_strategies:
  http_retry:
    description: "Built-in HTTP Request retry"
    config:
      max_retries: 3
      retry_interval: 1000  # ms
      retry_on_timeout: true
      retry_on_status: [429, 500, 502, 503, 504]

  custom_retry_with_backoff:
    description: "Code node implementing exponential backoff"
    pattern: |
      const maxRetries = 3;
      const attempt = $json._retryAttempt || 0;

      if (attempt >= maxRetries) {
        // Send to dead letter queue
        return [{ json: { ...item.json, _failed: true, _attempts: attempt } }];
      }

      const delay = Math.pow(2, attempt) * 1000; // 1s, 2s, 4s
      await new Promise(r => setTimeout(r, delay));
      return [{ json: { ...item.json, _retryAttempt: attempt + 1 } }];

  circuit_breaker:
    description: "Stop calling failing service"
    pattern: |
      // Use n8n static data as circuit state
      const staticData = $getWorkflowStaticData('global');
      const failures = staticData.failures || 0;
      const lastFailure = staticData.lastFailure || 0;
      const THRESHOLD = 5;
      const COOLDOWN_MS = 300000; // 5 minutes

      if (failures >= THRESHOLD && Date.now() - lastFailure < COOLDOWN_MS) {
        // Circuit OPEN — skip API call, use fallback
        return [{ json: { _circuitOpen: true, _fallback: true } }];
      }
```
5.4 Dead Letter Queue Pattern
```yaml
dead_letter_queue:
  purpose: "Store failed items for manual review/reprocessing"
  implementation:
    - node: "Google Sheets / Airtable / Database"
      columns: [workflow, execution_id, item_data, error, timestamp, status]
    - status_values: [pending, retrying, resolved, abandoned]
    - review: "Check DLQ daily, resolve or abandon stale items"
```
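A minimal sketch of a Code node that shapes a failed item into a DLQ row matching the columns above; a downstream Google Sheets or database node does the actual write. The `_errorMessage` field is an assumed convention set by whatever node caught the failure.

```javascript
// Code node: build one dead-letter row per failed item.
// Column names mirror the DLQ implementation above.
return items.map(item => ({
  json: {
    workflow: $workflow.name,
    execution_id: $execution.id,
    item_data: JSON.stringify(item.json),
    error: item.json._errorMessage || 'unknown',   // assumed field from the error branch
    timestamp: new Date().toISOString(),
    status: 'pending',
  },
}));
```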
Phase 6: Data Transformation & Integration Patterns
6.1 Common Integration Patterns
Pattern: CRM Sync (Bidirectional)
```yaml
crm_sync:
  inbound:
    trigger: "Webhook from CRM (new/updated contact)"
    steps:
      1: "Validate payload schema"
      2: "Map fields to internal format"
      3: "Deduplicate (check by email)"
      4: "Upsert to database"
      5: "Trigger downstream workflows"
  outbound:
    trigger: "Database change or schedule"
    steps:
      1: "Query changed records since last sync"
      2: "Map internal format to CRM fields"
      3: "Batch upsert to CRM API"
      4: "Store sync timestamp"
      5: "Log sync results"
  conflict_resolution:
    strategy: "Last write wins with audit trail"
    timestamp_field: "updated_at"
    audit: "Log both versions before overwrite"
```
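A small sketch of the "last write wins with audit trail" step, assuming each item carries both versions as `crm` and `db` objects with an `updated_at` field (the shape is illustrative, produced by an upstream merge):

```javascript
// Code node: pick the newer version and keep both for the audit log.
return items.map(item => {
  const { crm, db } = item.json;
  const crmIsNewer = new Date(crm.updated_at) >= new Date(db.updated_at);

  return {
    json: {
      winner: crmIsNewer ? crm : db,
      audit: {
        kept: crmIsNewer ? 'crm' : 'db',
        crm_version: crm,
        db_version: db,
        decided_at: new Date().toISOString(),
      },
    },
  };
});
```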
Pattern: Email Processing Pipeline
```yaml
email_pipeline:
  trigger: "IMAP Email (polling every 5 min)"
  steps:
    1: "Read new emails"
    2: "Classify intent (AI/rules)"
    3: "Extract structured data (sender, subject, key fields)"
    4: "Route by classification"
    5_support: "Create ticket in helpdesk"
    5_sales: "Add to CRM as lead"
    5_billing: "Forward to accounting"
    5_spam: "Archive and skip"
    6: "Send auto-acknowledgment"
    7: "Log to audit trail"
```
Pattern: Multi-Step Approval
```yaml
approval_workflow:
  trigger: "Form/webhook (new request)"
  steps:
    1: "Create request record (status: pending)"
    2: "Send Slack message with Approve/Reject buttons"
    3: "Wait for webhook callback (button click)"
    4_approved: "Execute action + notify requester"
    4_rejected: "Notify requester with reason"
    5: "Update request status"
    6: "Log to audit trail"
  timeout: "48 hours → auto-escalate to manager"
```
Pattern: AI-Powered Processing
```yaml
ai_pipeline:
  trigger: "Webhook or schedule"
  steps:
    1: "Receive raw data (text, email, document)"
    2: "Pre-process (clean, chunk if needed)"
    3: "Send to LLM (OpenAI/Anthropic/local)"
    4: "Parse structured response"
    5: "Validate LLM output (check required fields, format)"
    6: "Route based on classification"
    7: "Human review if confidence < threshold"
    8: "Store result + feedback for improvement"
  llm_node_config:
    model: "gpt-4o-mini for classification, gpt-4o for generation"
    temperature: "0 for extraction/classification, 0.7 for generation"
    max_tokens: "Set explicit limit to control cost"
    system_prompt: "Be specific. Include output format. Add examples."
  cost_control:
    - "Use cheapest model that achieves accuracy target"
    - "Cache repeated queries (check before calling LLM)"
    - "Batch similar items into single LLM call when possible"
    - "Track cost per execution in workflow metrics"
```
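A sketch of steps 4-5 (parse and validate the LLM output) as a Code node. The required fields follow the ticket-classifier schema used later in this guide; the input field names (`text`/`content`) are assumptions about where your LLM node puts its text output, so adjust both to your setup.

```javascript
// Code node: parse and validate the LLM classification before routing.
const REQUIRED = ['category', 'priority', 'sentiment', 'summary'];

return items.map(item => {
  const raw = String(item.json.text ?? item.json.content ?? '');
  // Tolerate models that wrap JSON in prose or markdown: take the outermost {...} span.
  const start = raw.indexOf('{');
  const end = raw.lastIndexOf('}');

  let parsed;
  try {
    parsed = JSON.parse(raw.slice(start, end + 1));
  } catch (err) {
    return { json: { _valid: false, _error: `JSON parse failed: ${err.message}` } };
  }

  const missing = REQUIRED.filter(field => !parsed[field]);
  return {
    json: {
      ...parsed,
      _valid: missing.length === 0,
      _error: missing.length ? `Missing fields: ${missing.join(', ')}` : null,
    },
  };
});
```

Downstream, an IF node on `_valid` routes invalid items to the human-review branch instead of the Switch.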
6.2 Data Mapping Cheat Sheet
```javascript
// Common field mapping patterns in Code nodes

// Dates — always normalize to ISO
const isoDate = new Date(data.date_field).toISOString();
const dateOnly = new Date(data.date_field).toISOString().split('T')[0];

// Names
const fullName = `${data.firstName || ''} ${data.lastName || ''}`.trim();
const [firstName, ...rest] = data.fullName.split(' ');
const lastName = rest.join(' ');

// Currency — always store as cents/minor units
const amountCents = Math.round(parseFloat(data.amount) * 100);
const amountDisplay = (data.amount_cents / 100).toFixed(2);

// Phone — normalize
const phone = data.phone?.replace(/\D/g, '');

// Email — normalize
const email = data.email?.toLowerCase().trim();

// Null safety
const value = data.field ?? 'default';
const nested = data.parent?.child?.value ?? null;

// Array handling
const tags = Array.isArray(data.tags) ? data.tags : [data.tags].filter(Boolean);
const csvToArray = data.csv_field?.split(',').map(s => s.trim()) || [];
const arrayToCsv = data.array_field?.join(', ') || '';
```
Phase 7: Sub-Workflow Architecture
7.1 When to Extract Sub-Workflows
| Signal | Action |
|---|---|
| Same logic in 3+ workflows | Extract to sub-workflow |
| Workflow > 30 nodes | Decompose into main + sub-workflows |
| Different error handling needed | Separate error domains |
| Team wants to reuse a process | Make it a callable sub-workflow |
| Need to test a section independently | Extract and test separately |
7.2 Sub-Workflow Design Rules
```yaml
sub_workflow_rules:
  naming: "[SUB] Description — Input/Output"
  interface:
    - "Define clear input schema (what data it expects)"
    - "Define clear output schema (what it returns)"
    - "Document side effects (external API calls, DB writes)"
  input_validation:
    - "First node: validate required fields exist"
    - "Return clear error if validation fails"
  output_contract:
    - "Always return consistent structure"
    - "Include success/failure status"
    - "Include execution metadata (duration, items processed)"
  example_output:
    success: true
    items_processed: 42
    errors: []
    duration_ms: 1234
```
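A minimal sketch of the output contract as the final Code node of a sub-workflow. It assumes two illustrative conventions: failed items were tagged with `_error` upstream, and the first node stamped a `_startedAt` timestamp onto the data.

```javascript
// Code node (last in the sub-workflow): wrap results in the contract above.
const errors = items
  .filter(item => item.json._error)
  .map(item => item.json._error);

const startedAt = items[0]?.json._startedAt;   // assumed convention

return [{
  json: {
    success: errors.length === 0,
    items_processed: items.length,
    errors,
    duration_ms: startedAt ? Date.now() - new Date(startedAt).getTime() : null,
  },
}];
```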
7.3 Orchestrator Pattern
```text
[PROCESS] Order Fulfillment — Orchestrator (v1.0)
│
├── [SUB] Validate Order — Input Check
│     └── Returns: { valid: true/false, errors: [] }
│
├── [SUB] Check Inventory — Stock Verification
│     └── Returns: { inStock: true/false, items: [] }
│
├── [SUB] Process Payment — Stripe Charge
│     └── Returns: { charged: true/false, chargeId: "" }
│
├── [SUB] Create Shipment — Shipping Label
│     └── Returns: { trackingNumber: "", labelUrl: "" }
│
└── [SUB] Send Confirmations — Email + SMS
      └── Returns: { emailSent: true, smsSent: true }

Orchestrator handles:
- Sequential execution order
- Rollback on failure (reverse previous steps)
- Status tracking (store state between steps)
- Timeout management (overall SLA)
```
Phase 8: n8n Static Data & State Management
8.1 Static Data Patterns
```javascript
// Global static data (persists across executions)
const staticData = $getWorkflowStaticData('global');

// Pattern: Last processed ID (for incremental sync)
const lastId = staticData.lastProcessedId || 0;
// ... process items where id > lastId ...
staticData.lastProcessedId = maxProcessedId;

// Pattern: Rate limit tracking
staticData.apiCalls = (staticData.apiCalls || 0) + 1;
staticData.windowStart = staticData.windowStart || Date.now();
if (Date.now() - staticData.windowStart > 3600000) {
  staticData.apiCalls = 1;
  staticData.windowStart = Date.now();
}

// Pattern: Deduplication cache
const cache = staticData.processedIds || {};
const newItems = items.filter(item => {
  if (cache[item.json.id]) return false;
  cache[item.json.id] = Date.now();
  return true;
});
// Prune cache entries older than 24h
for (const [id, ts] of Object.entries(cache)) {
  if (Date.now() - ts > 86400000) delete cache[id];
}
staticData.processedIds = cache;
```
8.2 External State (When Static Data Isn't Enough)
```yaml
state_management:
  static_data:
    capacity: "~1MB per workflow"
    persistence: "Survives restarts"
    use_for: "Counters, last-processed IDs, small caches"
    dont_use_for: "Large datasets, shared state between workflows"
  database:
    use_for: "Shared state, large datasets, audit trails"
    options: ["Postgres", "SQLite", "Redis"]
    pattern: "Read state → Process → Write state (in same execution)"
  google_sheets:
    use_for: "Human-readable state, manual override capability"
    pattern: "Config sheet = feature flags, processing rules"
  redis:
    use_for: "High-speed counters, distributed locks, pub/sub"
    pattern: "Rate limiting, dedup across multiple workflows"
```
Phase 9: Security & Credentials
9.1 Credential Management Rules
```yaml
credential_rules:
  DO:
    - "Use n8n Credential Store for ALL secrets"
    - "Use environment variables for config (URLs, feature flags)"
    - "Rotate API keys on schedule (quarterly minimum)"
    - "Use OAuth2 over API keys when available"
    - "Limit credential scope (least privilege)"
    - "Audit credential usage quarterly"
  NEVER:
    - "Hardcode secrets in Code nodes"
    - "Put API keys in webhook URLs"
    - "Log full request/response bodies (may contain secrets)"
    - "Share credentials between dev/staging/prod"
    - "Use personal API keys for production workflows"
```
9.2 Webhook Security Implementation
```javascript
// HMAC signature verification (Stripe, GitHub, etc.)
const crypto = require('crypto');

const signature = $request.headers['x-hub-signature-256'];
const secret = $env.WEBHOOK_SECRET;
const body = JSON.stringify($json);

const expected = 'sha256=' + crypto
  .createHmac('sha256', secret)
  .update(body)
  .digest('hex');

if (signature !== expected) {
  // Return 401 via Respond to Webhook node
  return [{ json: { error: 'Invalid signature', _reject: true } }];
}

return items;
```
9.3 Data Privacy Checklist
```yaml
privacy_checklist:
  pii_handling:
    - "Identify PII fields in every workflow (email, name, phone, IP)"
    - "Minimize PII: only pass fields actually needed"
    - "Mask PII in logs (email → j***@example.com)"
    - "Set execution data pruning (don't keep PII forever)"
  execution_data:
    - "Save execution data: Only on error (production)"
    - "Save execution data: Always (development only)"
    - "Prune executions older than 30 days"
    - "Don't store full response bodies from external APIs"
  compliance:
    - "GDPR: Can you delete a user's data from all workflow states?"
    - "Audit trail: Can you prove what data was processed and when?"
    - "Data residency: Are API calls going to correct region?"
```
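A small sketch of the masking step, placed before any logging or alert node. The masking rules here are illustrative; adapt them to your own compliance requirements.

```javascript
// Code node: mask PII before anything is written to logs or alerts.
function maskEmail(email) {
  if (!email || !email.includes('@')) return email;
  const [user, domain] = email.split('@');
  return `${user[0]}***@${domain}`;        // j***@example.com
}

function maskPhone(phone) {
  // Keep only the last two digits visible.
  return phone ? phone.replace(/\d(?=\d{2})/g, '*') : phone;
}

return items.map(item => ({
  json: {
    ...item.json,
    email: maskEmail(item.json.email),
    phone: maskPhone(item.json.phone),
  },
}));
```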
Phase 10: Performance & Optimization
10.1 Performance Optimization Priority Stack
| Priority | Technique | Impact |
|---|---|---|
| 1 | Batch API calls (bulk endpoints) | 10-100x fewer API calls |
| 2 | Parallel execution (split + merge) | 2-5x faster processing |
| 3 | Filter early (drop items before heavy processing) | Reduces compute |
| 4 | Cache repeated lookups (static data) | Fewer API calls |
| 5 | Minimize data passed between nodes | Reduces memory |
| 6 | Use sub-workflows for heavy sections | Better resource management |
| 7 | Schedule during off-peak hours | Reduces contention |
| 8 | Optimize Code node algorithms | Reduces CPU time |
10.2 Batch Processing Template
```yaml
batch_template:
  step_1: "Collect all items (trigger / query)"
  step_2: "Split In Batches (size based on API limit)"
  step_3: "Process batch (use bulk/batch API endpoint)"
  step_4: "Wait node (respect rate limit between batches)"
  step_5: "Aggregate results"
  step_6: "Report summary"

sizing_guide:
  stripe_api: 100        # Stripe list limit
  hubspot_api: 100       # HubSpot batch limit
  postgres_insert: 1000  # Comfortable batch insert
  email_send: 50         # Avoid spam filters
  slack_api: 20          # Rate limit friendly
  openai_api: 1          # Usually per-request
```
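When a bulk endpoint expects one request per chunk, a Code node can do the chunking directly instead of Split In Batches. A minimal sketch using the sizing guide above; the destination name and the exact sizes are starting points, not API guarantees.

```javascript
// Code node: chunk items before a bulk/batch API call downstream.
const BATCH_SIZES = { stripe: 100, hubspot: 100, postgres: 1000, email: 50, slack: 20 };
const destination = 'hubspot';                  // illustrative, set per workflow
const size = BATCH_SIZES[destination] || 50;

const batches = [];
for (let i = 0; i < items.length; i += size) {
  // Each output item carries one batch; the next node sends one request per batch.
  batches.push({ json: { batch: items.slice(i, i + size).map(item => item.json) } });
}

return batches;
```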
10.3 Memory Optimization
```javascript
// Anti-pattern: Passing full objects through entire workflow
// ❌ BAD
return items; // Each item has 50 fields, only need 3

// ✅ GOOD: Extract only needed fields early
return items.map(item => ({
  json: {
    id: item.json.id,
    email: item.json.email,
    status: item.json.status,
  }
}));

// Anti-pattern: Accumulating in memory
// ❌ BAD: Loading 100K records into Code node
// ✅ GOOD: Use database queries with LIMIT/OFFSET, process in batches
```
Phase 11: Testing & Debugging
11.1 Testing Methodology
```yaml
testing_levels:
  unit_test:
    what: "Individual nodes with sample data"
    how: "Pin test data on trigger node, execute single node"
    when: "Building each node"
  integration_test:
    what: "Full workflow with test data"
    how: "Manual trigger with test payload, verify all outputs"
    when: "Before activating"
  smoke_test:
    what: "Quick check that workflow still works"
    how: "Trigger with minimal valid payload, check success"
    when: "After any change, weekly health check"
  load_test:
    what: "Performance under volume"
    how: "Send 100+ items through, measure time and errors"
    when: "Before scaling to production volume"
```
11.2 Debugging Checklist
```yaml
debugging_steps:
  1_reproduce:
    - "Find the failed execution in execution list"
    - "Check which node failed (red highlight)"
    - "Read the error message carefully"
  2_inspect:
    - "Check input data to failed node (is it what you expected?)"
    - "Check node configuration (expressions resolving correctly?)"
    - "Check credentials (still valid? permissions?)"
  3_common_fixes:
    expression_error: "Wrap in try/catch or use ?? for null safety"
    timeout: "Increase timeout, check if API is actually up"
    auth_error: "Re-authenticate credential, check token expiry"
    rate_limit: "Add Wait node, reduce batch size"
    json_parse: "Check response is actually JSON (not HTML error page)"
    missing_field: "Data shape changed — update field mapping"
  4_isolate:
    - "Pin input data on the failing node"
    - "Execute just that node"
    - "If it works in isolation, problem is upstream data"
```
11.3 Monitoring Dashboard
```yaml
monitoring:
  metrics_to_track:
    - name: "Execution success rate"
      target: ">99%"
      alert_threshold: "<95%"
    - name: "Average execution time"
      target: "Under SLA"
      alert_threshold: ">2x normal"
    - name: "Items processed per run"
      target: "Expected range"
      alert_threshold: "0 items (nothing processed) or >10x normal"
    - name: "Error frequency by type"
      target: "Decreasing trend"
      alert_threshold: "Same error >3 times in 24h"
    - name: "API quota usage"
      target: "<80% of limit"
      alert_threshold: ">90% of limit"
  health_check_workflow:
    schedule: "Every 30 minutes"
    checks:
      - "Can reach external APIs? (HEAD request)"
      - "Database connection alive?"
      - "Disk space for execution data?"
      - "Any workflows stuck in 'running' >1 hour?"
    alert_channel: "Slack #n8n-alerts"
```
Phase 12: Production Deployment & Maintenance
12.1 Deployment Checklist
```yaml
pre_activation:
  workflow:
    - [ ] "Workflow description filled in (purpose, owner, version)"
    - [ ] "All nodes named descriptively (not 'HTTP Request 1')"
    - [ ] "Sticky notes explain complex sections"
    - [ ] "Error trigger workflow connected"
    - [ ] "Test data pins removed"
    - [ ] "No hardcoded secrets or URLs"
    - [ ] "Environment variables used for config"
  testing:
    - [ ] "Happy path tested with real-shape data"
    - [ ] "Error paths tested (bad data, API failure, timeout)"
    - [ ] "Edge cases tested (empty array, null fields, special chars)"
    - [ ] "Load tested at expected volume"
  operations:
    - [ ] "Execution data retention configured"
    - [ ] "Alert channel receiving error notifications"
    - [ ] "Runbook written for common failure scenarios"
    - [ ] "Owner documented (who to page at 3 AM)"
```
12.2 Workflow Versioning Strategy
```yaml
versioning:
  format: "vMAJOR.MINOR (in workflow name + description)"
  major_bump: "Breaking changes — new trigger, changed output format"
  minor_bump: "Improvements — new fields, better error handling"
  changelog_location: "Workflow description field"
  changelog_format: |
    ## v2.1 (2024-03-15)
    - Added retry logic for Stripe API calls
    - Fixed timezone conversion for EU customers

    ## v2.0 (2024-02-01)
    - Migrated from REST to GraphQL API
    - Breaking: output format changed
  backup_strategy:
    - "Export workflow JSON before major changes"
    - "Store in git repo: workflows/[category]/[name].json"
    - "Tag with version: git tag workflow-name-v2.1"
```
12.3 Maintenance Schedule
```yaml
maintenance:
  daily:
    - "Check error notifications channel"
    - "Review failed executions (>0 = investigate)"
  weekly:
    - "Review execution volume trends"
    - "Check API quota usage"
    - "Process dead letter queue items"
  monthly:
    - "Review and prune old executions"
    - "Audit credential usage"
    - "Update workflow documentation"
    - "Review performance (any slow workflows?)"
  quarterly:
    - "Rotate API keys and tokens"
    - "Review all active workflows — still needed?"
    - "Update n8n version (test in staging first)"
    - "Archive unused workflows"
```
Phase 13: Complete Workflow Templates
13.1 Template: Lead Capture → CRM → Notification
name: "[INGEST] Web Lead → HubSpot + Slack Alert (v1.0)" trigger: Webhook (form submission) nodes: 1_webhook: type: Webhook path: "/lead-capture" method: POST response: "Respond to Webhook (immediate 200)" 2_validate: type: IF condition: "email exists AND email contains @" false_path: "→ Log invalid submission → End" 3_enrich: type: HTTP Request url: "Clearbit/Apollo enrichment API" fallback: "Continue without enrichment" 4_dedupe: type: Code logic: "Check HubSpot for existing contact by email" 5_create_or_update: type: HubSpot action: "Create/update contact" fields: [email, name, company, source, enrichment_data] 6_notify: type: Slack channel: "#sales-leads" message: "🎯 New lead: {name} from {company} — {source}" 7_auto_reply: type: Email (SMTP) to: "{{ $json.email }}" template: "Thanks for your interest, we'll be in touch within 24h"
13.2 Template: Scheduled Report Generator
name: "[EXPORT] Weekly Sales Report — Email (v1.0)" trigger: Schedule (Monday 8 AM) nodes: 1_schedule: type: Schedule Trigger cron: "0 8 * * 1" 2_query_data: type: Postgres query: | SELECT date_trunc('day', created_at) as day, COUNT(*) as deals, SUM(amount) as revenue, AVG(amount) as avg_deal FROM deals WHERE created_at >= NOW() - INTERVAL '7 days' GROUP BY 1 ORDER BY 1 3_calculate_summary: type: Code logic: "Calculate totals, WoW change, top deals" 4_format_report: type: Code logic: "Generate HTML email body with tables and charts links" 5_send_email: type: Email (SMTP) to: "sales-team@company.com" subject: "📊 Weekly Sales Report — W{{ weekNumber }}" html: "{{ $json.reportHtml }}"
13.3 Template: AI Support Ticket Classifier
name: "[AI] Support Ticket — Classify + Route (v1.0)" trigger: Webhook (helpdesk new ticket) nodes: 1_webhook: type: Webhook 2_classify: type: OpenAI Chat model: "gpt-4o-mini" system: | Classify this support ticket. Return JSON: { "category": "bug|feature_request|billing|how_to|account|other", "priority": "P0|P1|P2|P3", "sentiment": "angry|frustrated|neutral|positive", "summary": "one sentence summary", "suggested_response": "draft response" } temperature: 0 3_parse: type: Code logic: "JSON.parse response, validate required fields" 4_route: type: Switch on: "{{ $json.category }}" cases: bug: "→ Assign to engineering team" billing: "→ Assign to finance team" feature_request: "→ Add to product backlog" default: "→ Assign to general support" 5_priority_alert: type: IF condition: "priority == P0" true_path: "→ Slack alert to on-call" 6_update_ticket: type: HTTP Request action: "Update ticket with classification tags" 7_auto_respond: type: IF condition: "category == how_to AND confidence > 0.9" true_path: "→ Send suggested_response as reply" false_path: "→ Save draft for human review"
13.4 Template: Multi-System Data Sync
name: "[SYNC] Stripe → Postgres → HubSpot — Payments (v1.0)" trigger: Webhook (Stripe payment_intent.succeeded) nodes: 1_webhook: type: Webhook security: "HMAC signature verification" 2_verify_signature: type: Code logic: "Stripe HMAC verification" 3_extract_payment: type: Code logic: "Extract customer, amount, metadata from Stripe event" 4_upsert_db: type: Postgres action: "INSERT ON CONFLICT UPDATE" table: "payments" 5_update_crm: type: HubSpot action: "Update deal stage to 'Closed Won'" 6_notify_team: type: Slack message: "💰 Payment received: ${{ amount }} from {{ customer }}" 7_send_receipt: type: Email (SMTP) to: "{{ customer_email }}" template: "Payment confirmation"
Phase 14: Advanced Patterns
14.1 Fan-Out / Fan-In (Parallel Processing)
pattern: "Split work across parallel paths, merge results" use_case: "Enrich contacts from 3 APIs simultaneously" implementation: 1: "Trigger with batch of contacts" 2: "Split into 3 parallel HTTP Request nodes" 3: "Each calls different API (Clearbit, Apollo, LinkedIn)" 4: "Merge node (Combine mode) joins results" 5: "Code node merges enrichment data per contact" benefit: "3x faster than sequential API calls" caveat: "All 3 branches must handle their own errors"
14.2 Event-Driven Architecture
pattern: "Workflows trigger other workflows via internal webhooks" implementation: producer: | [PROCESS] Order Created → Process order → HTTP Request to internal webhook: /event/order-created consumers: - "[NOTIFY] Order Confirmation → Email" - "[SYNC] Order → Inventory Update" - "[SYNC] Order → Accounting System" - "[AI] Order → Fraud Detection" benefit: "Loose coupling — add new consumers without changing producer" caveat: "Need to handle consumer failures independently"
14.3 Feature Flag Pattern
pattern: "Control workflow behavior without editing" implementation: config_source: "Google Sheet or database table" columns: [feature_name, enabled, percentage, notes] in_workflow: 1: "Read config at start of workflow" 2: "IF node checks feature flag" 3: "true → new behavior, false → old behavior" examples: - feature: "use_gpt4o_mini" check: "Route to cheaper model when enabled" - feature: "skip_enrichment" check: "Bypass API calls during outage" - feature: "double_check_mode" check: "Add human approval step"
14.4 Queue Pattern (High Volume)
pattern: "Buffer incoming items, process at controlled rate" use_case: "1000 webhook events/minute, API limit 10/minute" implementation: ingestion_workflow: 1: "Webhook receives event" 2: "Write to queue (database table: status=pending)" 3: "Return 200 immediately" processing_workflow: 1: "Schedule trigger (every minute)" 2: "Query: SELECT * FROM queue WHERE status='pending' LIMIT 10" 3: "Process batch" 4: "UPDATE status='completed'" 5: "On error: UPDATE status='failed', retry_count++" benefit: "Never lose events, process at sustainable rate"
Phase 15: n8n Instance Management
15.1 Environment Strategy
```yaml
environments:
  development:
    purpose: "Building and testing new workflows"
    data: "Test/mock data only"
    execution_saving: "All executions"
  staging:
    purpose: "Pre-production validation"
    data: "Anonymized production-like data"
    execution_saving: "All executions"
  production:
    purpose: "Live workflows"
    data: "Real data"
    execution_saving: "Errors only (save disk)"

promotion_process:
  1: "Build in dev"
  2: "Export workflow JSON"
  3: "Import to staging, test with realistic data"
  4: "Export again (staging may have fixes)"
  5: "Import to production"
  6: "Activate and monitor first 24h"
```
15.2 n8n Performance Tuning
```yaml
tuning:
  execution_mode: "queue"  # For high volume (requires Redis)
  environment_variables:
    EXECUTIONS_DATA_SAVE_ON_ERROR: "all"
    EXECUTIONS_DATA_SAVE_ON_SUCCESS: "none"  # Save disk in production
    EXECUTIONS_DATA_SAVE_MANUAL_EXECUTIONS: "true"
    EXECUTIONS_DATA_MAX_AGE: 720  # Hours (30 days)
    EXECUTIONS_DATA_PRUNE: "true"
    GENERIC_TIMEZONE: "UTC"  # Always UTC internally
    N8N_CONCURRENCY_PRODUCTION_LIMIT: 20  # Parallel executions
  scaling:
    vertical: "More CPU/RAM for the n8n instance"
    horizontal: "Queue mode + multiple workers"
    webhook_scaling: "Separate webhook processor from main"
```
Scoring Rubric: Workflow Quality Assessment
Rate any n8n workflow 0-100 across 8 dimensions:
| Dimension | Weight | 0 (Poor) | 5 (Adequate) | 10 (Excellent) |
|---|---|---|---|---|
| Reliability | 20% | No error handling | Basic error trigger | Full retry + DLQ + alerts |
| Security | 15% | Hardcoded secrets | Credential store | HMAC + validation + audit |
| Performance | 15% | Sequential, no batching | Some batching | Optimized + cached + parallel |
| Maintainability | 15% | No names, no docs | Named nodes | Full docs + versioned + sticky notes |
| Data Quality | 10% | No validation | Basic checks | Schema validation + dedup + transform |
| Observability | 10% | No monitoring | Error alerts | Metrics + logging + health checks |
| Scalability | 10% | Breaks at 100 items | Handles 1K | Batched + queued + horizontal |
| Reusability | 5% | Monolithic | Some sub-workflows | Modular + documented interfaces |
Score:
- 0-30: Prototype — not production ready
- 31-60: Functional — works but fragile
- 61-80: Production — solid with room to improve
- 81-100: Enterprise — resilient, observable, scalable
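A quick sketch of how the weighted total combines, with made-up ratings for a hypothetical workflow (each dimension rated 0-10, weights from the table above):

```javascript
// Weighted quality score: sum(weight * rating) * 10 → 0-100 scale.
const weights = {
  reliability: 0.20, security: 0.15, performance: 0.15, maintainability: 0.15,
  dataQuality: 0.10, observability: 0.10, scalability: 0.10, reusability: 0.05,
};

// Example ratings for a hypothetical workflow.
const ratings = {
  reliability: 7, security: 8, performance: 5, maintainability: 6,
  dataQuality: 6, observability: 4, scalability: 5, reusability: 7,
};

const total = Object.keys(weights)
  .reduce((sum, key) => sum + weights[key] * ratings[key], 0) * 10;
// total ≈ 61 → "Production" band
```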
10 Commandments of n8n Workflow Engineering
- Every production workflow has an error handler — no exceptions
- Never hardcode secrets — credential store or env vars only
- Name every node — "HTTP Request 4" is tech debt
- Filter early, transform late — drop bad data before heavy processing
- Batch everything — one API call for 100 items beats 100 calls for 1
- Test with real-shaped data — mock data hides real bugs
- Version your workflows — in the name and description
- Document the "why" — sticky notes explain decisions, not obvious steps
- Monitor actively — don't discover failures from angry users
- Keep it simple — if you need a diagram to explain it, decompose it
Natural Language Commands
When a user asks you to help with n8n, interpret these commands:
| Command | Action |
|---|---|
| "Build a workflow for [task]" | Design complete workflow using templates above |
| "Review this workflow" | Score against rubric, suggest improvements |
| "Debug [workflow/error]" | Follow debugging checklist |
| "Optimize [workflow]" | Apply performance optimization stack |
| "Add error handling to [workflow]" | Implement error trigger + retry + alert pattern |
| "Create a sub-workflow for [logic]" | Extract with clear interface |
| "Set up monitoring" | Implement health check + alert workflow |
| "Migrate workflow to production" | Follow deployment checklist |
| "Design integration for [A] → [B]" | Select pattern from integration library |
| "Add AI to [workflow]" | Implement AI pipeline pattern |
| "Handle rate limits for [API]" | Implement batching + wait + circuit breaker |
| "Audit my n8n setup" | Run quick health check, score, prioritize fixes |