Skillshub apify-cost-tuning
Install

Source · Clone the upstream repo:

```shell
git clone https://github.com/ComeOnOliver/skillshub
```

Claude Code · Install into ~/.claude/skills/:

```shell
T=$(mktemp -d) && git clone --depth=1 https://github.com/ComeOnOliver/skillshub "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/jeremylongshore/claude-code-plugins-plus-skills/apify-cost-tuning" ~/.claude/skills/comeonoliver-skillshub-apify-cost-tuning && rm -rf "$T"
```

Manifest: skills/jeremylongshore/claude-code-plugins-plus-skills/apify-cost-tuning/SKILL.md
Apify Cost Tuning
Overview
Apify charges based on compute units (CU), proxy traffic (GB), and storage. One CU = 1 GB memory running for 1 hour. This skill covers how to analyze, reduce, and monitor costs across all three dimensions.
Pricing Model
Compute Units (CU)
CU = (Memory in GB) x (Duration in hours)

Example: 2048 MB (2 GB) running for 30 minutes = 2 x 0.5 = 1 CU
| Plan | CU Price | Included CUs |
|---|---|---|
| Free | N/A | Limited trial |
| Starter | $0.30/CU | Varies by plan |
| Scale | $0.25/CU | Volume discounts |
| Enterprise | Custom | Negotiated |
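The CU formula and a plan rate combine into a quick back-of-envelope estimator. A minimal sketch: `estimateRunCostUsd` is a hypothetical helper, and the $0.30/CU default is the Starter rate from the table above.

```typescript
// Estimate the cost of one run: CU = (memory in GB) x (duration in hours),
// then multiply by the plan's per-CU price.
function estimateRunCostUsd(
  memoryMb: number,
  durationSecs: number,
  cuPriceUsd = 0.30, // Starter-plan rate; adjust for your plan
): number {
  const computeUnits = (memoryMb / 1024) * (durationSecs / 3600);
  return computeUnits * cuPriceUsd;
}

// 2048 MB for 30 minutes = 1 CU
console.log(estimateRunCostUsd(2048, 1800).toFixed(2)); // ≈ $0.30
```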
Proxy Costs
| Proxy Type | Cost | Use Case |
|---|---|---|
| Datacenter | Included in plan | Non-blocking sites |
| Residential | ~$12/GB | Sites that block datacenters |
| Google SERP | ~$3.50/1000 queries | Google search results |
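Residential pricing makes bandwidth the number to watch. A rough sketch of the arithmetic, assuming the ~$12/GB rate from the table; `residentialProxyCostUsd` is a hypothetical helper, and the average page weight is an assumption you should measure on your own targets.

```typescript
// Rough estimate of residential proxy spend for a crawl:
// total transfer (pages x KB/page) converted to GB, times the per-GB rate.
function residentialProxyCostUsd(
  pages: number,
  avgKbPerPage: number,
  usdPerGb = 12, // ~$12/GB residential rate from the table
): number {
  const gb = (pages * avgKbPerPage) / 1_048_576; // KB -> GB
  return gb * usdPerGb;
}

// 10,000 pages at ~500 KB each is roughly 4.8 GB, i.e. ~$57 of residential
// bandwidth. Blocking images/CSS/fonts (Step 4) can cut this dramatically.
console.log(residentialProxyCostUsd(10_000, 500).toFixed(2));
```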
Storage
Named datasets and KV stores persist indefinitely but count against storage quota. Unnamed (default run) storage expires after 7 days.
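Because named storage never expires on its own, stale stores quietly accumulate against your quota. A minimal cleanup sketch, assuming an `ApifyClient` configured as in Step 1; `isStale` is a hypothetical helper, and the commented-out client calls show where deletion would go.

```typescript
// Hypothetical staleness check: true if the store has not been modified
// within the last maxAgeDays days. The `now` parameter exists for testing.
function isStale(modifiedAt: string, maxAgeDays: number, now = Date.now()): boolean {
  return now - new Date(modifiedAt).getTime() > maxAgeDays * 86_400_000;
}

// With a client as in Step 1, stale named datasets could then be pruned:
//
// const { items } = await client.datasets().list();
// for (const d of items.filter(d => isStale(d.modifiedAt, 30))) {
//   console.log(`Deleting stale dataset ${d.name} (${d.id})`);
//   await client.dataset(d.id).delete();
// }
```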
Instructions
Step 1: Analyze Current Costs
```typescript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

async function analyzeActorCosts(actorId: string, days = 30) {
  const { items: runs } = await client.actor(actorId).runs().list({
    limit: 1000,
    desc: true,
  });

  const cutoff = new Date(Date.now() - days * 86400_000);
  const recentRuns = runs.filter(r => new Date(r.startedAt) > cutoff);

  let totalCu = 0;
  let totalUsd = 0;
  let totalDurationSecs = 0;
  for (const run of recentRuns) {
    totalCu += run.usage?.ACTOR_COMPUTE_UNITS ?? 0;
    totalUsd += run.usageTotalUsd ?? 0;
    totalDurationSecs += run.stats?.runTimeSecs ?? 0;
  }

  const avgCuPerRun = recentRuns.length > 0 ? totalCu / recentRuns.length : 0;
  const avgCostPerRun = recentRuns.length > 0 ? totalUsd / recentRuns.length : 0;

  console.log(`=== Cost Analysis: ${actorId} (last ${days} days) ===`);
  console.log(`Runs: ${recentRuns.length}`);
  console.log(`Total CU: ${totalCu.toFixed(4)}`);
  console.log(`Total cost: $${totalUsd.toFixed(4)}`);
  console.log(`Avg CU/run: ${avgCuPerRun.toFixed(4)}`);
  console.log(`Avg cost/run: $${avgCostPerRun.toFixed(4)}`);
  console.log(`Total duration: ${(totalDurationSecs / 3600).toFixed(2)} hours`);

  // Find the most expensive run
  const mostExpensive = recentRuns.reduce(
    (max, r) => ((r.usageTotalUsd ?? 0) > (max.usageTotalUsd ?? 0) ? r : max),
    recentRuns[0],
  );
  if (mostExpensive) {
    console.log(`Most expensive: $${mostExpensive.usageTotalUsd?.toFixed(4)} (${mostExpensive.id})`);
  }

  return { totalCu, totalUsd, avgCuPerRun, avgCostPerRun, runs: recentRuns.length };
}
```
Step 2: Reduce Memory Allocation
Memory is the biggest cost lever. Most CheerioCrawler Actors are over-provisioned.
```typescript
// Test with progressively lower memory to find the sweet spot
for (const memory of [4096, 2048, 1024, 512, 256]) {
  try {
    const run = await client.actor('user/actor').call(testInput, {
      memory,
      timeout: 600,
    });
    console.log(
      `${memory}MB: ${run.status} | ` +
      `${run.stats?.runTimeSecs}s | ` +
      `${run.usage?.ACTOR_COMPUTE_UNITS?.toFixed(4)} CU | ` +
      `$${run.usageTotalUsd?.toFixed(4)}`
    );
    if (run.status !== 'SUCCEEDED') break;
  } catch (error) {
    console.log(`${memory}MB: FAILED — ${(error as Error).message}`);
    break;
  }
}
```
Typical memory sweet spots:
| Actor Type | Start At | Sweet Spot |
|---|---|---|
| CheerioCrawler (simple) | 256 MB | 256-512 MB |
| CheerioCrawler (complex) | 512 MB | 512-1024 MB |
| PlaywrightCrawler | 2048 MB | 2048-4096 MB |
| Data processing | 1024 MB | 1024-2048 MB |
Step 3: Optimize Crawl Duration
Faster crawls consume fewer CUs, since cost scales with wall-clock duration:
```typescript
import { Actor } from 'apify';
import { CheerioCrawler } from 'crawlee';

const crawler = new CheerioCrawler({
  // Higher concurrency = faster completion
  maxConcurrency: 30,
  // Don't wait too long on slow pages
  requestHandlerTimeoutSecs: 20,
  // Stop early when you have enough data
  maxRequestsPerCrawl: 1000,
  // Avoid unnecessary retries
  maxRequestRetries: 2, // Default: 3
  requestHandler: async ({ request, $, enqueueLinks }) => {
    // Only extract what you need
    await Actor.pushData({
      url: request.url,
      title: $('title').text().trim(),
      // Don't scrape entire page body if you don't need it
    });
    // Only enqueue relevant links (not every link on the page)
    await enqueueLinks({
      selector: 'a.product-link', // Specific selector, not 'a'
      strategy: 'same-domain',
    });
  },
});
```
Step 4: Minimize Proxy Costs
```typescript
import { Actor } from 'apify';
import { CheerioCrawler, PlaywrightCrawler } from 'crawlee';

// Strategy 1: Use datacenter proxy first (free with plan)
const dcProxy = await Actor.createProxyConfiguration({
  groups: ['BUYPROXIES94952'],
});

// Strategy 2: Only use residential proxy when needed
// Don't waste residential bandwidth on non-blocking sites
const resProxy = await Actor.createProxyConfiguration({
  groups: ['RESIDENTIAL'],
});

// Strategy 3: Minimize data transfer through residential proxy
const browserCrawler = new PlaywrightCrawler({
  proxyConfiguration: resProxy,
  preNavigationHooks: [
    async ({ page }) => {
      // Block images, fonts, CSS (saves residential proxy GB)
      await page.route('**/*.{png,jpg,jpeg,gif,svg,webp,ico,woff,woff2,ttf,css}', route =>
        route.abort()
      );
    },
  ],
});

// Strategy 4: Session stickiness (reduces new proxy connections)
const httpCrawler = new CheerioCrawler({
  proxyConfiguration: resProxy,
  useSessionPool: true,
  sessionPoolOptions: {
    sessionOptions: {
      maxUsageCount: 100, // More reuse = fewer new connections
    },
  },
});
```
Step 5: Cost Guard for Runaway Actors
```typescript
async function runWithBudget(
  actorId: string,
  input: Record<string, unknown>,
  maxCostUsd: number,
) {
  const run = await client.actor(actorId).start(input, {
    memory: 512,
    timeout: 3600,
  });

  // Poll every 30 seconds
  const interval = setInterval(async () => {
    try {
      const status = await client.run(run.id).get();
      const cost = status?.usageTotalUsd ?? 0;
      if (cost > maxCostUsd) {
        console.error(`Budget exceeded: $${cost.toFixed(4)} > $${maxCostUsd}. Aborting.`);
        await client.run(run.id).abort();
        clearInterval(interval);
      }
    } catch {
      // Ignore polling errors
    }
  }, 30_000);

  const finished = await client.run(run.id).waitForFinish();
  clearInterval(interval);
  return finished;
}

// Usage: max $0.50 per run
const run = await runWithBudget('user/scraper', input, 0.50);
```
Step 6: Monitor Monthly Usage
```typescript
async function monthlyUsageReport() {
  // Get all Actors
  const { items: actors } = await client.actors().list();

  let grandTotalUsd = 0;
  const report: { actor: string; runs: number; cost: number }[] = [];

  for (const actor of actors) {
    const { items: runs } = await client.actor(actor.id).runs().list({
      limit: 1000,
      desc: true,
    });

    const thisMonth = new Date();
    thisMonth.setDate(1);
    thisMonth.setHours(0, 0, 0, 0);

    const monthlyRuns = runs.filter(r => new Date(r.startedAt) >= thisMonth);
    const monthlyCost = monthlyRuns.reduce(
      (sum, r) => sum + (r.usageTotalUsd ?? 0),
      0,
    );

    if (monthlyRuns.length > 0) {
      report.push({
        actor: actor.name,
        runs: monthlyRuns.length,
        cost: monthlyCost,
      });
      grandTotalUsd += monthlyCost;
    }
  }

  // Sort by cost descending
  report.sort((a, b) => b.cost - a.cost);

  console.log('\n=== Monthly Cost Report ===');
  console.log(`${'Actor'.padEnd(30)} | ${'Runs'.padEnd(6)} | Cost`);
  console.log('-'.repeat(55));
  for (const r of report) {
    console.log(`${r.actor.padEnd(30)} | ${String(r.runs).padEnd(6)} | $${r.cost.toFixed(4)}`);
  }
  console.log('-'.repeat(55));
  console.log(`${'TOTAL'.padEnd(30)} | ${' '.padEnd(6)} | $${grandTotalUsd.toFixed(4)}`);
}
```
Cost Optimization Checklist
- Memory profiled (start low: 256-512MB for Cheerio)
- maxRequestsPerCrawl set to prevent runaway crawls
- Datacenter proxy used when possible (free with plan)
- Residential proxy: images/CSS/fonts blocked to save bandwidth
- maxConcurrency tuned (higher = faster = fewer CUs)
- Scheduled runs have appropriate frequency (don't over-scrape)
- Cost guard implemented for expensive runs
- Monthly usage reviewed
Error Handling
| Issue | Cause | Solution |
|---|---|---|
| Unexpected cost spike | No request or cost limits set | Always set an upper bound |
| High residential proxy cost | Scraping images/fonts | Block non-essential resources |
| Over-provisioned memory | Default 1024MB | Profile and reduce to minimum |
| Too many scheduled runs | Aggressive cron | Reduce frequency if data freshness allows |
Next Steps
For architecture patterns, see apify-reference-architecture.