install
source · Clone the upstream repo
```sh
git clone https://github.com/ComeOnOliver/skillshub
```
Claude Code · Install into ~/.claude/skills/
```sh
T=$(mktemp -d) && git clone --depth=1 https://github.com/ComeOnOliver/skillshub "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/noartem/skills/laravel-data-chunking-large-datasets" ~/.claude/skills/comeonoliver-skillshub-laravel-data-chunking-large-datasets && rm -rf "$T"
```
manifest:
skills/noartem/skills/laravel-data-chunking-large-datasets/SKILL.md
# Data Chunking for Large Datasets

Process large datasets efficiently by breaking them into manageable chunks to reduce memory consumption and improve performance.
## The Problem: Memory Exhaustion

```php
// BAD: Loading all records into memory
$users = User::all(); // Could be millions of records!

foreach ($users as $user) {
    $user->sendNewsletter();
}

// BAD: Even with select, still loads everything
$emails = User::pluck('email'); // Collection of millions of emails

foreach ($emails as $email) {
    Mail::to($email)->send(new Newsletter());
}
```
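To make the cost concrete, you can compare peak memory before and after hydrating the full table. This is a rough sketch, not a benchmark: the seeded `users` table is an assumption, and each variant should run in a fresh process because `memory_get_peak_usage()` only ever increases:

```php
// Rough measurement sketch -- run each variant in its own
// `php artisan tinker` session, since the peak counter never resets.

// Variant A: hydrate everything at once
User::all();
echo round(memory_get_peak_usage(true) / 1048576) . " MB\n";
// grows roughly linearly with the number of rows

// Variant B: chunked pass over the same table
User::chunk(500, fn ($users) => true);
echo round(memory_get_peak_usage(true) / 1048576) . " MB\n";
// stays near the framework baseline regardless of table size
```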
## Solution: Chunking Methods

### 1. Basic Chunking with chunk()

```php
// Process 100 records at a time
User::chunk(100, function ($users) {
    foreach ($users as $user) {
        $user->calculateStatistics();
        $user->save();
    }
});

// With conditions
User::where('active', true)
    ->chunk(200, function ($users) {
        foreach ($users as $user) {
            ProcessUserJob::dispatch($user);
        }
    });
```
### 2. Chunk By ID for Safer Updates

```php
// Prevents issues when modifying records during iteration
User::where('newsletter_sent', false)
    ->chunkById(100, function ($users) {
        foreach ($users as $user) {
            $user->update(['newsletter_sent' => true]);
            Mail::to($user)->send(new Newsletter());
        }
    });

// With custom column
Payment::where('processed', false)
    ->chunkById(100, function ($payments) {
        foreach ($payments as $payment) {
            $payment->process();
        }
    }, 'payment_id'); // Custom ID column
```
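The reason `chunkById()` is safer comes down to the SQL each method emits: `chunk()` pages with LIMIT/OFFSET, so rows that change mid-run can shift between pages, while `chunkById()` keys each page on the last seen ID. You can observe this with a query listener; a sketch, and the exact SQL will vary by Laravel version and database driver:

```php
// Log every query the chunking run issues (for local inspection)
DB::listen(fn ($query) => logger($query->sql));

User::chunk(100, fn ($users) => true);
// select * from `users` order by `users`.`id` asc limit 100 offset 0
// select * from `users` order by `users`.`id` asc limit 100 offset 100
// ^ offsets shift if matching rows are added or removed mid-run

User::chunkById(100, fn ($users) => true);
// select * from `users` order by `id` asc limit 100
// select * from `users` where `id` > ? order by `id` asc limit 100
// ^ keyed on the last seen ID, so page boundaries never shift
```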
### 3. Lazy Collections for Memory Efficiency

```php
// Uses PHP generators, minimal memory footprint
User::where('created_at', '>=', now()->subDays(30))
    ->lazy()
    ->each(function ($user) {
        $user->recalculateScore();
    });

// With chunking size control
User::lazy(100)->each(function ($user) {
    ProcessRecentUser::dispatch($user);
});

// Filter and map with lazy collections
$results = User::lazy()
    ->filter(fn ($user) => $user->hasActiveSubscription())
    ->map(fn ($user) => [
        'id' => $user->id,
        'revenue' => $user->calculateRevenue(),
    ])
    ->take(1000);
```
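Note that `lazy()` paginates with offsets internally, so it shares `chunk()`'s caveat about modifying filter columns mid-iteration. Laravel also ships `lazyById()` as the keyed counterpart; a small sketch, where the `imported` flag is a hypothetical column for illustration:

```php
// Keyed lazy iteration: safe even while updating the filtered column
// (`imported` is a hypothetical flag column for this example)
User::where('imported', false)
    ->lazyById(200)
    ->each(function ($user) {
        $user->update(['imported' => true]);
    });
```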
### 4. Cursor for Forward-Only Iteration

```php
// Most memory-efficient for simple forward iteration
foreach (User::where('active', true)->cursor() as $user) {
    $user->updateLastSeen();
}

// cursor() returns a LazyCollection, so collection methods work too
User::where('verified', true)
    ->cursor()
    ->filter(fn ($user) => $user->hasCompletedProfile())
    ->each(fn ($user) => SendWelcomeEmail::dispatch($user));
```
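One caveat worth knowing: with the pdo_mysql driver, buffered queries mean `cursor()` can still hold the entire raw result set in client memory even though only one model is hydrated at a time. Unbuffered mode avoids that at the cost of tying up the connection until iteration finishes; a sketch to weigh carefully before any production use:

```php
// Disable client-side result buffering for this connection (MySQL only).
DB::connection()->getPdo()->setAttribute(
    PDO::MYSQL_ATTR_USE_BUFFERED_QUERY,
    false
);

foreach (User::where('active', true)->cursor() as $user) {
    // Read-only work here: with unbuffered queries, issuing another
    // query on this connection before the cursor is drained will fail.
    $emails[] = $user->email;
}
```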
## Real-World Examples

### Export Large CSV

```php
class ExportUsersCommand extends Command
{
    public function handle()
    {
        $file = fopen('users.csv', 'w');

        // Write headers
        fputcsv($file, ['ID', 'Name', 'Email', 'Created At']);

        // Process in chunks to avoid memory issues
        User::select('id', 'name', 'email', 'created_at')
            ->chunkById(500, function ($users) use ($file) {
                foreach ($users as $user) {
                    fputcsv($file, [
                        $user->id,
                        $user->name,
                        $user->email,
                        $user->created_at->toDateTimeString(),
                    ]);
                }

                // Optional: Show progress
                $this->info("Processed up to ID: {$users->last()->id}");
            });

        fclose($file);
        $this->info('Export completed!');
    }
}
```
### Batch Email Campaign

```php
class SendCampaignJob implements ShouldQueue
{
    public function __construct(
        public int $campaignId
    ) {}

    public function handle()
    {
        $campaign = Campaign::find($this->campaignId);

        // Process subscribers in chunks
        $campaign->subscribers()
            ->where('unsubscribed', false)
            ->chunkById(50, function ($subscribers) use ($campaign) {
                foreach ($subscribers as $subscriber) {
                    SendCampaignEmail::dispatch($campaign, $subscriber)
                        ->onQueue('emails')
                        ->delay(now()->addSeconds(rand(1, 10)));
                }

                // Prevent rate limiting
                sleep(2);
            });
    }
}
```
### Data Migration/Transformation

```php
class MigrateUserData extends Command
{
    public function handle()
    {
        $bar = $this->output->createProgressBar(User::count());

        User::with(['profile', 'settings'])
            ->chunkById(100, function ($users) use ($bar) {
                DB::transaction(function () use ($users, $bar) {
                    foreach ($users as $user) {
                        // Complex transformation
                        $newData = $this->transformUserData($user);
                        NewUserModel::create($newData);
                        $bar->advance();
                    }
                });
            });

        $bar->finish();
        $this->newLine();
        $this->info('Migration completed!');
    }
}
```
### Cleanup Old Records

```php
class CleanupOldLogs extends Command
{
    public function handle()
    {
        $deletedCount = 0;

        ActivityLog::where('created_at', '<', now()->subMonths(6))
            ->chunkById(1000, function ($logs) use (&$deletedCount) {
                $ids = $logs->pluck('id')->toArray();

                // Batch delete for efficiency
                ActivityLog::whereIn('id', $ids)->delete();

                $deletedCount += count($ids);
                $this->info("Deleted {$deletedCount} records so far...");

                // Give the database a breather
                usleep(100000); // 100ms
            });

        $this->info("Total deleted: {$deletedCount}");
    }
}
```
## Choosing the Right Method

| Method | Use Case | Memory Usage | Notes |
|---|---|---|---|
| `chunk()` | General processing | Moderate | May skip/duplicate if modifying filter columns |
| `chunkById()` | Updates during iteration | Moderate | Safer for modifications |
| `lazy()` | Large result processing | Low | Returns LazyCollection |
| `cursor()` | Simple forward iteration | Lowest | Returns a generator-backed LazyCollection |
| `all()` / `get()` | Simple operations | High (loads all) | Avoid for large datasets |
## Performance Optimization Tips

### 1. Select Only Needed Columns

```php
User::select('id', 'email', 'name')
    ->chunkById(100, function ($users) {
        // Process with minimal data
    });
```
### 2. Use Indexes

```php
// Ensure indexed columns in where clauses
User::where('status', 'active')         // status should be indexed
    ->where('created_at', '>', $date)   // created_at should be indexed
    ->chunkById(200, function ($users) {
        // Process efficiently
    });
```
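If those columns are not indexed yet, adding them is a one-line migration each; a sketch, with table and column names matching the example above:

```php
// In a migration: index the columns the chunked query filters on
Schema::table('users', function (Blueprint $table) {
    $table->index('status');      // supports the where('status', ...) filter
    $table->index('created_at');  // supports the date-range filter
});
```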
### 3. Disable Eloquent Events When Appropriate

```php
User::withoutEvents(function () {
    User::chunkById(500, function ($users) {
        foreach ($users as $user) {
            $user->update(['processed' => true]);
        }
    });
});
```
### 4. Use Raw Queries for Bulk Updates

```php
// Instead of updating each record
User::chunkById(100, function ($users) {
    $ids = $users->pluck('id')->toArray();

    // Bulk update with raw query
    DB::table('users')
        ->whereIn('id', $ids)
        ->update([
            'last_processed_at' => now(),
            'processing_count' => DB::raw('processing_count + 1'),
        ]);
});
```
### 5. Queue Large Operations

```php
class ProcessLargeDataset extends Command
{
    public function handle()
    {
        User::chunkById(100, function ($users) {
            ProcessUserBatch::dispatch($users->pluck('id'))
                ->onQueue('heavy-processing');
        });
    }
}

class ProcessUserBatch implements ShouldQueue
{
    public function __construct(
        public Collection $userIds
    ) {}

    public function handle()
    {
        User::whereIn('id', $this->userIds)
            ->get()
            ->each(fn ($user) => $user->process());
    }
}
```
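If you also need completion tracking or failure handling across all of the dispatched work, the same pattern maps onto Laravel's job batching. A sketch, assuming `ProcessUserBatch` uses the `Batchable` trait:

```php
// Collect one job per chunk, then dispatch them as a single batch.
// Each job holds only a collection of IDs, so this stays lightweight.
$jobs = [];

User::chunkById(100, function ($users) use (&$jobs) {
    $jobs[] = new ProcessUserBatch($users->pluck('id'));
});

Bus::batch($jobs)
    ->then(fn (Batch $batch) => logger('All user batches processed'))
    ->onQueue('heavy-processing')
    ->dispatch();
```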
## Testing Chunked Operations

```php
test('processes all active users in chunks', function () {
    // Create test data
    User::factory()->count(150)->create(['active' => true]);
    User::factory()->count(50)->create(['active' => false]);

    $processed = [];

    User::where('active', true)
        ->chunkById(50, function ($users) use (&$processed) {
            foreach ($users as $user) {
                $processed[] = $user->id;
            }
        });

    expect($processed)->toHaveCount(150);
    expect(count(array_unique($processed)))->toBe(150);
});

test('handles empty datasets gracefully', function () {
    $callCount = 0;

    User::where('id', '<', 0) // No results
        ->chunk(100, function ($users) use (&$callCount) {
            $callCount++;
        });

    expect($callCount)->toBe(0);
});
```
## Common Pitfalls

- **Modifying filter columns during chunk()**

  ```php
  // WRONG: May skip records
  User::where('processed', false)
      ->chunk(100, function ($users) {
          foreach ($users as $user) {
              $user->update(['processed' => true]); // Changes the WHERE condition!
          }
      });

  // CORRECT: Use chunkById()
  User::where('processed', false)
      ->chunkById(100, function ($users) {
          foreach ($users as $user) {
              $user->update(['processed' => true]);
          }
      });
  ```

- **Not handling chunk callback returns**

  ```php
  // Return false to stop chunking
  User::chunk(100, function ($users) {
      foreach ($users as $user) {
          if ($user->hasIssue()) {
              return false; // Stop processing
          }

          $user->process();
      }
  });
  ```

- **Ignoring database connection limits** (see the reconnect sketch after this list)

  ```php
  // Consider connection timeouts for long operations
  DB::connection()->getPdo()->setAttribute(PDO::ATTR_TIMEOUT, 3600);

  User::chunkById(100, function ($users) {
      // Long-running process
  });
  ```
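For very long runs, another option is to refresh the connection periodically instead of raising the timeout; a sketch (avoid this inside an open transaction, since reconnecting discards it):

```php
$processed = 0;

User::chunkById(100, function ($users) use (&$processed) {
    foreach ($users as $user) {
        $user->process();
    }

    $processed += $users->count();

    // Re-establish the connection every ~5,000 rows to dodge
    // server-side idle/wait timeouts; chunkById issues a fresh
    // query per page, so reconnecting between pages is safe.
    if ($processed % 5000 === 0) {
        DB::reconnect();
    }
});
```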
Remember: When dealing with large datasets, always think about memory usage, query efficiency, and processing time. Chunk your data appropriately!