Some_claude_skills background-job-orchestrator
Expert in background job processing with Bull/BullMQ (Redis), Celery, and cloud queues. Implements retries, scheduling, priority queues, and worker management. Use for async task processing,
git clone https://github.com/curiositech/some_claude_skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/curiositech/some_claude_skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.claude/skills/background-job-orchestrator" ~/.claude/skills/curiositech-some-claude-skills-background-job-orchestrator && rm -rf "$T"
.claude/skills/background-job-orchestrator/SKILL.mdBackground Job Orchestrator
Expert in designing and implementing production-grade background job systems that handle long-running tasks without blocking API responses.
When to Use
✅ Use for:
- Long-running tasks (email sends, report generation, image processing)
- Batch operations (bulk imports, exports, data migrations)
- Scheduled tasks (daily digests, cleanup jobs, recurring reports)
- Tasks requiring retry logic (external API calls, flaky operations)
- Priority-based processing (premium users first, critical alerts)
- Rate-limited operations (API quotas, third-party service limits)
❌ NOT for:
- Real-time bidirectional communication (use WebSockets)
- Sub-second latency requirements (use in-memory caching)
- Simple delays (setTimeout is fine for <5 seconds)
- Synchronous API responses (keep logic in request handler)
Quick Decision Tree
Does this task: ├── Take >5 seconds? → Background job ├── Need to retry on failure? → Background job ├── Run on a schedule? → Background job (cron pattern) ├── Block user interaction? → Background job ├── Process in batches? → Background job └── Return immediately? → Keep synchronous
Technology Selection
Node.js: BullMQ (Recommended 2024+)
When to use:
- TypeScript project
- Redis already in stack
- Need advanced features (rate limiting, priorities, repeatable jobs)
Why BullMQ over Bull:
- Bull (v3) → BullMQ (v4+): Complete rewrite in TypeScript
- Better Redis connection handling
- Improved concurrency and performance
- Active maintenance (Bull is in maintenance mode)
Python: Celery
When to use:
- Python/Django project
- Need distributed task execution
- Complex workflows (chains, groups, chords)
Alternatives:
- RQ (Redis Queue): Simpler, fewer features
- Dramatiq: Modern, less ecosystem
- Huey: Lightweight, good for small projects
Cloud-Native: AWS SQS, Google Cloud Tasks
When to use:
- Serverless architecture
- Don't want to manage Redis/RabbitMQ
- Need guaranteed delivery and dead-letter queues
Common Anti-Patterns
Anti-Pattern 1: No Dead Letter Queue
Novice thinking: "Retry 3 times, then fail silently"
Problem: Failed jobs disappear with no visibility or recovery path.
Correct approach:
// BullMQ with dead letter queue const queue = new Queue('email-queue', { connection: redis, defaultJobOptions: { attempts: 3, backoff: { type: 'exponential', delay: 2000 }, removeOnComplete: 100, // Keep last 100 successful removeOnFail: false // Keep all failed for inspection } }); // Monitor failed jobs const failedJobs = await queue.getFailed();
Timeline:
- Pre-2020: Retry and forget
- 2020+: Dead letter queues standard
- 2024+: Observability for job failures required
Anti-Pattern 2: Synchronous Job Processing
Symptom: API endpoint waits for job completion
Problem:
// ❌ WRONG - Blocks API response app.post('/send-email', async (req, res) => { await sendEmail(req.body.to, req.body.subject); res.json({ success: true }); });
Why wrong: Timeout, poor UX, wastes server resources
Correct approach:
// ✅ RIGHT - Queue and return immediately app.post('/send-email', async (req, res) => { const job = await emailQueue.add('send', { to: req.body.to, subject: req.body.subject }); res.json({ success: true, jobId: job.id, status: 'queued' }); }); // Separate worker processes the job worker.process('send', async (job) => { await sendEmail(job.data.to, job.data.subject); });
Anti-Pattern 3: No Idempotency
Problem: Job runs twice → duplicate charges, double emails
Why it happens:
- Redis connection drops mid-processing
- Worker crashes before job completion
- Job timeout triggers retry while still running
Correct approach:
// ✅ Idempotent job with deduplication key await queue.add('charge-payment', { userId: 123, amount: 50.00 }, { jobId: `payment-${orderId}`, // Prevents duplicates attempts: 3 }); // In worker: Check if already processed worker.process('charge-payment', async (job) => { const { userId, amount } = job.data; // Check idempotency const existing = await db.payments.findOne({ jobId: job.id }); if (existing) { return existing; // Already processed } // Process payment const result = await stripe.charges.create({...}); // Store idempotency record await db.payments.create({ jobId: job.id, result }); return result; });
Anti-Pattern 4: No Rate Limiting
Problem: Overwhelm third-party APIs or exhaust quotas
Symptom: "Rate limit exceeded" errors from Sendgrid, Stripe, etc.
Correct approach:
// BullMQ rate limiting const queue = new Queue('api-calls', { limiter: { max: 100, // Max 100 jobs duration: 60000 // Per 60 seconds } }); // Or: Priority-based rate limits await queue.add('send-email', data, { priority: user.isPremium ? 1 : 10, rateLimiter: { max: user.isPremium ? 1000 : 100, duration: 3600000 // Per hour } });
Anti-Pattern 5: Forgetting Worker Scaling
Problem: Single worker can't keep up with queue depth
Symptom: Queue backs up, jobs delayed hours/days
Correct approach:
// Horizontal scaling with multiple workers const worker = new Worker('email-queue', async (job) => { await processEmail(job.data); }, { connection: redis, concurrency: 5 // Process 5 jobs concurrently per worker }); // Run multiple worker processes (PM2, Kubernetes, etc.) // Each worker processes concurrency * num_workers jobs
Monitoring:
// Set up alerts for queue depth setInterval(async () => { const waiting = await queue.getWaitingCount(); if (waiting > 1000) { alert('Queue depth exceeds 1000, scale workers!'); } }, 60000);
Implementation Patterns
Pattern 1: Email Campaigns
// Queue setup const emailQueue = new Queue('email-campaign', { connection: redis }); // Enqueue batch async function sendCampaign(userIds: number[], template: string) { const jobs = userIds.map(userId => ({ name: 'send', data: { userId, template }, opts: { attempts: 3, backoff: { type: 'exponential', delay: 5000 } } })); await emailQueue.addBulk(jobs); } // Worker with retry logic const worker = new Worker('email-campaign', async (job) => { const { userId, template } = job.data; const user = await db.users.findById(userId); const email = renderTemplate(template, user); try { await sendgrid.send({ to: user.email, subject: email.subject, html: email.body }); } catch (error) { if (error.code === 'ECONNREFUSED') { throw error; // Retry } // Invalid email, don't retry console.error(`Invalid email for user ${userId}`); } }, { connection: redis, concurrency: 10 });
Pattern 2: Scheduled Reports
// Daily report at 9 AM await queue.add('daily-report', { type: 'sales', recipients: ['admin@company.com'] }, { repeat: { pattern: '0 9 * * *', // Cron syntax tz: 'America/New_York' } }); // Worker generates and emails report worker.process('daily-report', async (job) => { const { type, recipients } = job.data; const data = await generateReport(type); const pdf = await createPDF(data); await emailQueue.add('send', { to: recipients, subject: `Daily ${type} Report`, attachments: [{ filename: 'report.pdf', content: pdf }] }); });
Pattern 3: Video Transcoding Pipeline
// Multi-stage job with progress tracking await videoQueue.add('transcode', { videoId: 123, formats: ['720p', '1080p', '4k'] }, { attempts: 2, timeout: 3600000 // 1 hour timeout }); worker.process('transcode', async (job) => { const { videoId, formats } = job.data; for (let i = 0; i < formats.length; i++) { const format = formats[i]; // Update progress await job.updateProgress((i / formats.length) * 100); // Transcode await ffmpeg.transcode(videoId, format); } await job.updateProgress(100); }); // Client polls for progress app.get('/videos/:id/status', async (req, res) => { const job = await queue.getJob(req.params.jobId); res.json({ state: await job.getState(), progress: job.progress }); });
Monitoring & Observability
Essential Metrics
// Queue health dashboard async function getQueueMetrics() { const [waiting, active, completed, failed, delayed] = await Promise.all([ queue.getWaitingCount(), queue.getActiveCount(), queue.getCompletedCount(), queue.getFailedCount(), queue.getDelayedCount() ]); return { waiting, // Jobs waiting to be processed active, // Jobs currently processing completed, // Successfully completed failed, // Failed after retries delayed, // Scheduled for future health: waiting < 1000 && failed < 100 ? 'healthy' : 'degraded' }; }
BullMQ Board (UI)
// Development: Monitor jobs visually import { createBullBoard } from '@bull-board/api'; import { BullMQAdapter } from '@bull-board/api/bullMQAdapter'; import { ExpressAdapter } from '@bull-board/express'; const serverAdapter = new ExpressAdapter(); createBullBoard({ queues: [ new BullMQAdapter(emailQueue), new BullMQAdapter(videoQueue) ], serverAdapter }); app.use('/admin/queues', serverAdapter.getRouter()); // Visit http://localhost:3000/admin/queues
Production Checklist
□ Dead letter queue configured □ Retry strategy with exponential backoff □ Job timeout limits set □ Rate limiting for third-party APIs □ Idempotency keys for critical operations □ Worker concurrency tuned (CPU cores * 2) □ Horizontal scaling configured (multiple workers) □ Queue depth monitoring with alerts □ Failed job inspection workflow □ Job data doesn't contain PII in logs □ Redis persistence enabled (AOF or RDB) □ Graceful shutdown handling (SIGTERM)
When to Use vs Avoid
| Scenario | Use Background Jobs? |
|---|---|
| Send welcome email on signup | ✅ Yes - can take 2-5 seconds |
| Charge credit card | ⚠️ Maybe - depends on payment provider latency |
| Generate PDF report (30 seconds) | ✅ Yes - definitely background |
| Fetch user profile from DB | ❌ No - milliseconds, keep synchronous |
| Process video upload (5 minutes) | ✅ Yes - always background |
| Validate form input | ❌ No - synchronous validation |
| Daily cron job | ✅ Yes - use repeatable jobs |
| Real-time chat message | ❌ No - use WebSockets |
Technology Comparison
| Feature | BullMQ | Celery | AWS SQS |
|---|---|---|---|
| Language | Node.js | Python | Any (HTTP API) |
| Backend | Redis | Redis/RabbitMQ/SQS | Managed |
| Priorities | ✅ | ✅ | ✅ |
| Rate Limiting | ✅ | ❌ | ✅ (via attributes) |
| Repeat/Cron | ✅ | ✅ (celery-beat) | ❌ (use EventBridge) |
| UI Dashboard | Bull Board | Flower | CloudWatch |
| Workflows | ❌ | ✅ (chains, groups) | ❌ |
| Learning Curve | Medium | Medium | Low |
| Cost | Redis hosting | Redis hosting | $0.40/million requests |
References
- Advanced BullMQ patterns and examples/references/bullmq-patterns.md
- Celery chains, groups, and chords/references/celery-workflows.md
- Monitoring, alerting, and debugging/references/job-observability.md
Scripts
- Initialize BullMQ with Redisscripts/setup_bullmq.sh
- Queue metrics dashboardscripts/queue_health_check.ts
- Bulk retry failed jobsscripts/retry_failed_jobs.ts
This skill guides: Background job implementation | Queue architecture | Retry strategies | Worker scaling | Job observability