Personal_AI_Infrastructure Apify
Social media scraping, business data, e-commerce via Apify actors — with auto-update workflow for actor catalog. USE WHEN Twitter, Instagram, LinkedIn, TikTok, YouTube, Facebook, Google Maps, Amazon scraping, Apify, update Apify actors, social media scraping, lead generation, web scraper.
git clone https://github.com/danielmiessler/Personal_AI_Infrastructure
T=$(mktemp -d) && git clone --depth=1 https://github.com/danielmiessler/Personal_AI_Infrastructure "$T" && mkdir -p ~/.claude/skills && cp -r "$T/Releases/v4.0.0/.claude/skills/Scraping/Apify" ~/.claude/skills/danielmiessler-personal-ai-infrastructure-apify-5ebf78 && rm -rf "$T"
Releases/v4.0.0/.claude/skills/Scraping/Apify/SKILL.md
Customization
Before executing, check for user customizations at:
~/.claude/PAI/USER/SKILLCUSTOMIZATIONS/Apify/
If this directory exists, load and apply any PREFERENCES.md, configurations, or resources found there. These override default behavior. If the directory does not exist, proceed with skill defaults.
🚨 MANDATORY: Voice Notification (REQUIRED BEFORE ANY ACTION)
You MUST send this notification BEFORE doing anything else when this skill is invoked.
- Send voice notification:

      curl -s -X POST http://localhost:8888/notify \
        -H "Content-Type: application/json" \
        -d '{"message": "Running the WORKFLOWNAME workflow in the Apify skill to ACTION"}' \
        > /dev/null 2>&1 &

- Output text notification:

      Running the **WorkflowName** workflow in the **Apify** skill to ACTION...
This is not optional. Execute this curl command immediately upon skill invocation.
Apify - Social Media & Web Scraping
Direct TypeScript access to 9 popular Apify actors with 99% token savings.
🔌 File-Based MCP
This skill is a file-based MCP - a code-first API wrapper that replaces token-heavy MCP protocol calls.
Why file-based? Filter data in code BEFORE returning to model context = 97.5% token savings.
Architecture: See ~/.claude/PAI/DOCUMENTATION/FileBasedMCPs.md
🎯 Overview
Direct TypeScript access to the 9 most popular Apify actors without MCP overhead. Filter and transform data in code BEFORE it reaches the model context.
📊 Available Actors
Social Media (5 platforms)
- Instagram (145k users, 4.60★) - Profiles, posts, hashtags, comments
- LinkedIn (26k users, 4.10★) - Profiles, jobs, posts
- TikTok (90k users, 4.61★) - Profiles, videos, hashtags, comments
- YouTube (40k users, 4.40★) - Channels, videos, comments, search
- Facebook (35k users, 4.56★) - Posts, groups, comments
Business & Lead Generation
- Google Maps (198k users, 4.76★) - HIGHEST VALUE!
  - Search businesses, extract contacts, reviews, images
  - Perfect for lead generation
E-commerce
- Amazon (8k users, 4.97★) - Products, reviews, pricing
Web Scraping
- Web Scraper (94k users, 4.39★) - General-purpose, works with ANY website
🚀 Quick Start
Basic Usage Pattern
    import { scrapeInstagramProfile, searchGoogleMaps } from 'actors'

    // 1. Call the actor wrapper
    const profile = await scrapeInstagramProfile({
      username: 'target_username',
      maxPosts: 50
    })

    // 2. Filter in code - BEFORE data reaches model!
    const viral = profile.latestPosts?.filter(p => p.likesCount > 10000)

    // 3. Only filtered results reach model context
    console.log(viral) // ~10 posts instead of 50
📚 Examples by Use Case
Social Media Monitoring
Instagram - Track engagement:
    import { scrapeInstagramProfile, scrapeInstagramPosts } from 'actors'

    // Get profile with recent posts
    const profile = await scrapeInstagramProfile({
      username: 'competitor',
      maxPosts: 100
    })

    // Filter in code - only high-performing posts from last 30 days
    const thirtyDaysAgo = Date.now() - (30 * 24 * 60 * 60 * 1000)
    const topRecent = profile.latestPosts
      ?.filter(p =>
        new Date(p.timestamp).getTime() > thirtyDaysAgo &&
        p.likesCount > 5000
      )
      .sort((a, b) => b.likesCount - a.likesCount)
      .slice(0, 10)

    // Only 10 posts reach model instead of 100!
LinkedIn - Job search:
    import { searchLinkedInJobs } from 'actors'

    const jobs = await searchLinkedInJobs({
      keywords: 'AI engineer',
      location: 'San Francisco',
      remote: true,
      maxResults: 200
    })

    // Filter in code - only senior roles that drew more than 50 applicants
    const topJobs = jobs.filter(j =>
      j.seniority?.includes('Senior') &&
      parseInt(j.applicants || '0', 10) > 50
    )
TikTok - Trend analysis:
    import { scrapeTikTokHashtag } from 'actors'

    const videos = await scrapeTikTokHashtag({
      hashtag: 'ai',
      maxResults: 500
    })

    // Filter in code - only viral content
    const viral = videos
      .filter(v => v.playCount > 1000000)
      .sort((a, b) => b.playCount - a.playCount)
      .slice(0, 20)
Lead Generation (Business Intelligence)
Google Maps - Local business leads:
    import { searchGoogleMaps } from 'actors'

    // Search with contact info extraction
    const places = await searchGoogleMaps({
      query: 'restaurants in Austin',
      maxResults: 500,
      includeReviews: true,
      maxReviewsPerPlace: 20,
      scrapeContactInfo: true // Extracts emails from websites!
    })

    // Filter in code - only highly-rated with email/phone
    const qualifiedLeads = places
      .filter(p =>
        p.rating >= 4.5 &&
        p.reviewsCount >= 100 &&
        (p.email || p.phone)
      )
      .map(p => ({
        name: p.name,
        rating: p.rating,
        reviews: p.reviewsCount,
        email: p.email,
        phone: p.phone,
        website: p.website,
        address: p.address
      }))

    // Export leads - only qualified results!
    console.log(`Found ${qualifiedLeads.length} qualified leads`)
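The "export leads" step is only logged above. A minimal sketch of one way to serialize qualified leads to CSV follows. The `Lead` shape mirrors a subset of the mapped fields; `csvField`, `leadsToCsv`, and the sample row are illustrative assumptions, not part of the skill.

```typescript
// Hypothetical lead shape, mirroring a subset of the fields mapped above.
interface Lead {
  name: string
  rating: number
  reviews: number
  email?: string
  phone?: string
}

// Quote a field if it contains a comma, double quote, or newline (RFC 4180 style).
function csvField(value: string | number | undefined): string {
  const s = value === undefined ? '' : String(value)
  return /[",\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s
}

// Build a CSV document with a fixed header row.
function leadsToCsv(leads: Lead[]): string {
  const header = 'name,rating,reviews,email,phone'
  const rows = leads.map(l =>
    [l.name, l.rating, l.reviews, l.email, l.phone].map(v => csvField(v)).join(',')
  )
  return [header, ...rows].join('\n')
}

// Made-up sample row; the name deliberately contains a comma and quotes.
const csv = leadsToCsv([
  { name: 'Taco "Joint", Austin', rating: 4.7, reviews: 312, email: 'hi@example.com' }
])
```

Missing fields become empty cells, so a lead with an email but no phone still produces a row with the same number of columns.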
Google Maps - Review sentiment analysis:
    import { scrapeGoogleMapsReviews } from 'actors'

    const reviews = await scrapeGoogleMapsReviews({
      placeUrl: 'https://maps.google.com/maps?cid=12345',
      maxResults: 1000
    })

    // Filter in code - recent low-rated reviews with enough text to analyze
    const thirtyDaysAgo = Date.now() - (30 * 24 * 60 * 60 * 1000)
    const recentNegative = reviews.filter(r =>
      r.rating <= 2 &&
      new Date(r.publishedAtDate).getTime() > thirtyDaysAgo &&
      (r.text?.length ?? 0) > 50 // guards against reviews with no text
    )

    // Identify common complaints
    const complaints = recentNegative.map(r => r.text)
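One way to reduce the `complaints` array to something compact before it reaches the model is a simple term count. This is a minimal sketch, not part of the skill: `topTerms`, the stop-word list, and the sample texts are illustrative assumptions, and word frequency is only a rough stand-in for real sentiment analysis.

```typescript
// Hypothetical stop-word list; extend as needed.
const STOP_WORDS = new Set(['the', 'and', 'was', 'for', 'with', 'this', 'that', 'they', 'were'])

// Count the most frequent words across a list of review texts.
function topTerms(texts: string[], limit = 5): Array<[string, number]> {
  const counts = new Map<string, number>()
  for (const text of texts) {
    // Lowercase, split on anything that is not a-z, skip short words and stop words
    for (const word of text.toLowerCase().split(/[^a-z]+/)) {
      if (word.length < 3 || STOP_WORDS.has(word)) continue
      counts.set(word, (counts.get(word) ?? 0) + 1)
    }
  }
  // Sort by count, descending, and keep the first `limit` entries
  return Array.from(counts.entries())
    .sort((a, b) => b[1] - a[1])
    .slice(0, limit)
}

// Made-up sample complaints
const sampleComplaints = [
  'The service was slow and the food was cold',
  'Slow service, rude staff',
  'Cold food and slow kitchen'
]
const terms = topTerms(sampleComplaints, 3)
```

Returning only the top few `[word, count]` pairs keeps the model context small even when hundreds of reviews match the filter.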
E-commerce & Competitive Intelligence
Amazon - Price monitoring:
    import { scrapeAmazonProduct } from 'actors'

    const product = await scrapeAmazonProduct({
      productUrl: 'https://www.amazon.com/dp/B08L5VT894',
      includeReviews: true,
      maxReviews: 200
    })

    // Filter in code - only negative reviews from the last 7 days
    const weekAgo = Date.now() - (7 * 24 * 60 * 60 * 1000)
    const recentNegative = product.reviews
      ?.filter(r =>
        r.rating <= 2 &&
        new Date(r.date).getTime() > weekAgo
      )

    console.log(`Price: $${product.price}`)
    console.log(`Rating: ${product.rating}/5`)
    // Fall back to 0 so the message never prints "undefined"
    console.log(`Recent issues: ${recentNegative?.length ?? 0} complaints`)
Custom Web Scraping
Any Website - Custom extraction:
    import { scrapeWebsite } from 'actors'

    const products = await scrapeWebsite({
      startUrls: ['https://example.com/products'],
      linkSelector: 'a.product-link',
      maxPagesPerCrawl: 100,
      pageFunction: `
        async function pageFunction(context) {
          const { request, $, log } = context
          return {
            url: request.url,
            title: $('h1.product-title').text(),
            price: $('span.price').text(),
            inStock: $('.in-stock').length > 0,
            description: $('.description').text()
          }
        }
      `
    })

    // Filter in code - only available products under $100
    const affordable = products.filter(p =>
      p.inStock &&
      parseFloat(p.price.replace('$', '')) < 100
    )
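The price filter above relies on `parseFloat(p.price.replace('$', ''))`, which stops at the first comma, so a string such as `$1,299.00` parses as `1` and would pass an "under $100" check. A more defensive parser might look like the sketch below. `parsePrice`, `ScrapedProduct`, and `affordableInStock` are hypothetical names, and the code assumes US-style formatting (comma as thousands separator, dot as decimal point).

```typescript
// Hypothetical helper: parse a scraped price string, or return null if no number is found.
function parsePrice(raw: string): number | null {
  // First run of digits, optionally with comma separators and a decimal part
  const match = raw.match(/\d[\d,]*(\.\d+)?/)
  if (!match) return null
  // Drop thousands separators before converting (assumes US-style "1,299.00")
  return Number(match[0].replace(/,/g, ''))
}

// Assumed shape of the objects returned by the pageFunction above
interface ScrapedProduct {
  title: string
  price: string
  inStock: boolean
}

// Keep only in-stock products with a parseable price below maxPrice.
function affordableInStock(products: ScrapedProduct[], maxPrice: number): ScrapedProduct[] {
  return products.filter(p => {
    const price = parsePrice(p.price)
    return p.inStock && price !== null && price < maxPrice
  })
}
```

Products whose price cannot be parsed are dropped rather than treated as free, which avoids passing placeholder text like "N/A" through the filter.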
🎨 Advanced Patterns
Pattern 1: Multi-Platform Social Listening
    import { scrapeInstagramHashtag, scrapeTikTokHashtag, searchYouTube } from 'actors'

    // Run all platforms in parallel
    const [instagramPosts, tiktokVideos, youtubeVideos] = await Promise.all([
      scrapeInstagramHashtag({ hashtag: 'ai', maxResults: 100 }),
      scrapeTikTokHashtag({ hashtag: 'ai', maxResults: 100 }),
      searchYouTube({ query: '#ai', maxResults: 100 })
    ])

    // Combine and filter - only viral content across all platforms
    const allViral = [
      ...instagramPosts.filter(p => p.likesCount > 10000),
      ...tiktokVideos.filter(v => v.playCount > 100000),
      ...youtubeVideos.filter(v => v.viewsCount > 50000)
    ]

    console.log(`Found ${allViral.length} viral posts across 3 platforms`)
Pattern 2: Lead Enrichment Pipeline
    import { searchGoogleMaps, scrapeLinkedInProfile } from 'actors'

    // 1. Find businesses on Google Maps
    const restaurants = await searchGoogleMaps({
      query: 'restaurants in SF',
      maxResults: 100,
      scrapeContactInfo: true
    })

    // 2. Filter for qualified leads
    const qualified = restaurants.filter(r =>
      r.rating >= 4.5 &&
      r.email &&
      r.reviewsCount >= 50
    )

    // 3. Enrich with LinkedIn data (if available)
    const enriched = await Promise.all(
      qualified.map(async (restaurant) => {
        // Try to find LinkedIn company page
        // ... additional enrichment logic
        return restaurant
      })
    )
Pattern 3: Competitive Analysis Dashboard
    import { scrapeInstagramProfile, scrapeYouTubeChannel, scrapeTikTokProfile } from 'actors'

    // Note: average() and calculateEngagement() are not exported by 'actors';
    // they are helper functions that must be defined separately.
    async function analyzeCompetitor(username: string) {
      // Gather data from all platforms
      const [instagram, youtube, tiktok] = await Promise.all([
        scrapeInstagramProfile({ username, maxPosts: 30 }),
        scrapeYouTubeChannel({ channelUrl: `https://youtube.com/@${username}`, maxVideos: 30 }),
        scrapeTikTokProfile({ username, maxVideos: 30 })
      ])

      // Calculate engagement metrics in code
      return {
        username,
        instagram: {
          followers: instagram.followersCount,
          avgLikes: average(instagram.latestPosts?.map(p => p.likesCount) || []),
          engagementRate: calculateEngagement(instagram)
        },
        youtube: {
          subscribers: youtube.subscribersCount,
          avgViews: average(youtube.videos?.map(v => v.viewsCount) || [])
        },
        tiktok: {
          followers: tiktok.followersCount,
          avgPlays: average(tiktok.videos?.map(v => v.playCount) || [])
        }
      }
    }
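`analyzeCompetitor` calls `average` and `calculateEngagement` without defining them. The sketch below shows one plausible pair of implementations. Both are assumptions: the engagement formula (mean likes plus comments per post, divided by followers) is a common convention rather than something the skill specifies, and `commentsCount` is an assumed field name.

```typescript
// Assumed minimal shape of an Instagram profile result
interface InstagramLike {
  followersCount: number
  latestPosts?: Array<{ likesCount: number; commentsCount?: number }>
}

// Arithmetic mean; returns 0 for an empty list instead of NaN
function average(values: number[]): number {
  if (values.length === 0) return 0
  return values.reduce((sum, v) => sum + v, 0) / values.length
}

// Assumed definition: mean (likes + comments) per post, divided by follower count
function calculateEngagement(profile: InstagramLike): number {
  const posts = profile.latestPosts ?? []
  if (posts.length === 0 || profile.followersCount === 0) return 0
  const interactions = posts.map(p => p.likesCount + (p.commentsCount ?? 0))
  return average(interactions) / profile.followersCount
}
```

Returning 0 for empty inputs keeps the dashboard object free of `NaN` values when a platform returns no posts.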
💰 Token Savings Calculator
Example: Instagram profile with 100 posts
MCP Approach:
1. search-actors → 1,000 tokens
2. call-actor → 1,000 tokens
3. get-actor-output → 50,000 tokens (100 unfiltered posts)
TOTAL: ~52,000 tokens
File-Based Approach:
    const profile = await scrapeInstagramProfile({ username: 'user', maxPosts: 100 })

    // Filter in code - only top 10 posts
    const top = profile.latestPosts
      ?.sort((a, b) => b.likesCount - a.likesCount)
      .slice(0, 10)

    // TOTAL: ~500 tokens (only 10 filtered posts reach model)
Savings: 99% reduction (52,000 → 500 tokens)
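The arithmetic behind that figure can be checked with a few lines. The token counts are the estimates from the example above; `tokenSavings` is a hypothetical helper name.

```typescript
// Fraction of tokens saved when `after` tokens reach the model instead of `before`
function tokenSavings(before: number, after: number): number {
  if (before <= 0) throw new Error('before must be positive')
  return 1 - after / before
}

const mcpTokens = 1000 + 1000 + 50000 // search-actors + call-actor + unfiltered output
const fileBasedTokens = 500           // 10 filtered posts

// 1 - 500 / 52000 ≈ 0.9904, i.e. about 99%
const savedPercent = (tokenSavings(mcpTokens, fileBasedTokens) * 100).toFixed(1)
```

The result depends almost entirely on how aggressively the output is filtered: keeping 50 of the 100 posts instead of 10 would leave far more tokens in context and lower the saving accordingly.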
🔧 Actor Reference
Social Media
Instagram
- Profile + posts: scrapeInstagramProfile(input)
- Posts from user: scrapeInstagramPosts(input)
- Posts by hashtag: scrapeInstagramHashtag(input)
- Comments on post: scrapeInstagramComments(input)
LinkedIn
- Profile + experience + email: scrapeLinkedInProfile(input)
- Job listings: searchLinkedInJobs(input)
- Posts from profile/company: scrapeLinkedInPosts(input)
TikTok
- Profile + videos: scrapeTikTokProfile(input)
- Videos by hashtag: scrapeTikTokHashtag(input)
- Comments on video: scrapeTikTokComments(input)
YouTube
- Channel + videos: scrapeYouTubeChannel(input)
- Search videos: searchYouTube(input)
- Comments on video: scrapeYouTubeComments(input)
Facebook
- Posts from pages: scrapeFacebookPosts(input)
- Group posts: scrapeFacebookGroups(input)
- Post comments: scrapeFacebookComments(input)
Business & Lead Generation
Google Maps
- Search places (with contact extraction!): searchGoogleMaps(input)
- Single place details: scrapeGoogleMapsPlace(input)
- Place reviews: scrapeGoogleMapsReviews(input)
E-commerce
Amazon
- Product details + reviews: scrapeAmazonProduct(input)
- Product reviews only: scrapeAmazonReviews(input)
Web Scraping
General Web
- Custom multi-page crawling: scrapeWebsite(input)
- Single page extraction: scrapePage(url, pageFunction)
⚙️ Configuration
Environment Variables:
    # Required - Get from https://console.apify.com/account/integrations
    APIFY_TOKEN=apify_api_xxxxx...
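Because every actor call depends on this variable, it can help to fail early with a clear message when it is missing. The guard below is a sketch under that assumption; `requireApifyToken` is a hypothetical name and is not part of the skill. It takes the environment as a parameter so it can be exercised without touching the real process environment.

```typescript
// Hypothetical guard: return the trimmed token, or throw if it is missing or blank.
function requireApifyToken(env: Record<string, string | undefined>): string {
  const token = env['APIFY_TOKEN']?.trim()
  if (!token) {
    throw new Error(
      'APIFY_TOKEN is not set; get one from https://console.apify.com/account/integrations'
    )
  }
  return token
}
```

In a Node or Bun script this would typically be called once at startup as `requireApifyToken(process.env)`.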
Actor Run Options:
    {
      memory: 2048,    // MB: 128, 256, 512, 1024, 2048, 4096, 8192
      timeout: 300,    // seconds
      build: 'latest'  // or specific build number
    }
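These options can be typed so that an unsupported memory size is caught before a run starts. The sketch below encodes the memory values from the comment above. `RunOptions` and `validateRunOptions` are hypothetical names, and the rule that `timeout` must be a positive integer is an assumption rather than something the skill states.

```typescript
// Memory sizes listed in the options comment above
const MEMORY_MB = [128, 256, 512, 1024, 2048, 4096, 8192] as const
type MemoryMb = (typeof MEMORY_MB)[number]

interface RunOptions {
  memory?: MemoryMb
  timeout?: number // seconds
  build?: string   // 'latest' or a specific build number
}

// Hypothetical validator for options that arrive as untyped data (for example parsed JSON)
function validateRunOptions(input: { memory?: number; timeout?: number; build?: string }): RunOptions {
  const { memory, timeout, build } = input
  if (memory !== undefined && (MEMORY_MB as readonly number[]).indexOf(memory) === -1) {
    throw new Error(`memory must be one of ${MEMORY_MB.join(', ')} MB`)
  }
  // Assumption: timeouts are whole, positive numbers of seconds
  if (timeout !== undefined && (!Number.isInteger(timeout) || timeout <= 0)) {
    throw new Error('timeout must be a positive integer number of seconds')
  }
  return { memory: memory as MemoryMb | undefined, timeout, build }
}
```

For example, `validateRunOptions({ memory: 2048, timeout: 300, build: 'latest' })` returns the options unchanged, while `memory: 1000` is rejected.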
🎯 When to Use This vs MCP
Use File-Based (this skill):
- ✅ Need to filter large datasets (>100 results)
- ✅ Want to transform/aggregate data in code
- ✅ Multiple sequential operations
- ✅ Control flow (loops, conditionals)
- ✅ Maximum token efficiency
Use MCP:
- ❌ Simple single operations with small results (<10 items)
- ❌ One-off exploratory queries
- ❌ Don't want to write code
🔗 Links
- Apify Platform: https://apify.com
- Actor Store: https://apify.com/store
- API Docs: https://docs.apify.com/api/v2
Remember: Filter data in code BEFORE returning to model context. This is where the 99% token savings happen!