
Name: tiktok-scraper-2
Author: openclaw

TikTok Profile Scraper

install

source · Clone the upstream repo

git clone https://github.com/openclaw/skills

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/arulmozhiv/tiktok-scraper-2" ~/.claude/skills/openclaw-skills-tiktok-scraper-2 && rm -rf "$T"

OpenClaw · Install into ~/.openclaw/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/arulmozhiv/tiktok-scraper-2" ~/.openclaw/skills/openclaw-skills-tiktok-scraper-2 && rm -rf "$T"

manifest: skills/arulmozhiv/tiktok-scraper-2/SKILL.md


A browser-based TikTok profile discovery and scraping tool.

Part of ScrapeClaw — a suite of production-ready, agentic social media scrapers for Instagram, YouTube, X/Twitter, TikTok, and Facebook built with Python & Playwright, no API keys required.

---
name: tiktok-scraper
description: Discover and scrape TikTok profiles from your browser.
emoji: 🎵
version: 1.0.0
author: influenza
tags:
  - tiktok
  - scraping
  - social-media
  - influencer-discovery
metadata:
  clawdbot:
    requires:
      bins:
        - python3
        - chromium

    config:
      stateDirs:
        - data/output
        - data/queue
        - thumbnails
      outputFormats:
        - json
        - csv
---

Overview

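This skill provides a two-phase TikTok scraping system:

1. Profile Discovery: find candidate profiles for a location and category (using Google Custom Search when credentials are configured) and save them to a queue file under data/queue/.
2. Browser Scraping: visit each queued profile in a real browser, collect its info, stats, and thumbnails, and save the results under data/output/.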

Features

🔍 - Discover TikTok profiles by location and category
🌐 - Full browser simulation for accurate scraping
🛡️ - Browser fingerprinting, human behavior simulation, and stealth scripts
📊 - Profile info, stats, video thumbnails, and engagement data
💾 - JSON/CSV export with downloaded thumbnails
🔄 - Resume interrupted scraping sessions
⚡ - Auto-skip private accounts, low followers, empty profiles
🌍 - Built-in residential proxy support with 4 providers

Getting Google API Credentials (Optional)

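1. Go to Google Cloud Console.
2. Create a new project or select an existing one.
3. Enable the "Custom Search API".
4. Create API credentials → API Key.
5. Go to Programmable Search Engine.
6. Create a search engine with `tiktok.com` as the site to search.
7. Copy the Search Engine ID.

Put the API key and Search Engine ID into `api_key` and `search_engine_id` under `google_search` in config/scraper_config.json.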
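With both values in hand, discovery can query the Custom Search JSON API. The sketch below only builds the request URL. The endpoint and the `key`, `cx`, `q`, and `num` parameters belong to Google's API; the query wording (`site:tiktok.com {category} {location}`) and the helper name `build_search_url` are hypothetical and not taken from the skill's source.

```python
from urllib.parse import urlencode

CSE_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(api_key: str, cx: str, location: str, category: str,
                     num: int = 10) -> str:
    """Build a Custom Search JSON API request URL for TikTok profiles.

    The query wording is a hypothetical example; the skill's real queries may differ.
    """
    if not 1 <= num <= 10:
        # The API returns at most 10 results per request
        raise ValueError("num must be between 1 and 10")
    query = f"site:tiktok.com {category} {location}"
    params = {"key": api_key, "cx": cx, "q": query, "num": num}
    return f"{CSE_ENDPOINT}?{urlencode(params)}"


url = build_search_url("YOUR_API_KEY", "YOUR_ENGINE_ID", "Miami", "dance")
```

Fetching that URL (for example with `urllib.request.urlopen`) returns JSON whose `items` list holds the matching result links.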

Usage

Agent Tool Interface

For OpenClaw agent integration, the skill provides JSON output:

# Discover profiles (returns JSON)
discover --location "Miami" --category "dance" --output json

# Scrape single profile (returns JSON)
scrape --username charlidamelio --output json
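An agent can shell out to these commands and parse stdout as JSON. The sketch below assumes the commands are invoked as `python main.py …` (as in the proxy section) and that, with `--output json`, the result is printed to stdout as a single JSON document; both are assumptions, and `skill_command` / `run_skill` are hypothetical helpers.

```python
import json
import subprocess
import sys


def skill_command(action: str, **options: str) -> list[str]:
    """Build argv for main.py, e.g. skill_command("scrape", username="x")."""
    argv = [sys.executable, "main.py", action]
    for name, value in options.items():
        argv += [f"--{name}", value]
    argv += ["--output", "json"]  # ask for machine-readable output
    return argv


def run_skill(action: str, **options: str):
    """Run the command and parse stdout (assumes stdout contains only JSON)."""
    result = subprocess.run(skill_command(action, **options),
                            capture_output=True, text=True, check=True)
    return json.loads(result.stdout)
```

For example, `run_skill("scrape", username="charlidamelio")` would return the parsed profile dict, provided the JSON-on-stdout assumption holds.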

Output Data

Profile Data Structure

{
  "username": "example_creator",
  "full_name": "Example Creator",
  "nickname": "Example",
  "bio": "Dance creator | NYC 💃",
  "bio_link": "https://example.com",
  "followers": 250000,
  "following": 800,
  "likes": 5000000,
  "videos_count": 120,
  "is_verified": false,
  "is_private": false,
  "influencer_tier": "macro",
  "category": "dance",
  "location": "New York",
  "profile_url": "https://www.tiktok.com/@example_creator",
  "profile_pic_local": "thumbnails/example_creator/profile_abc123.jpg",
  "content_thumbnails": [
    "thumbnails/example_creator/content_1_def456.jpg",
    "thumbnails/example_creator/content_2_ghi789.jpg"
  ],
  "video_views": [
    {"display": "1.2M", "count": 1200000},
    {"display": "500K", "count": 500000}
  ],
  "scrape_timestamp": "2026-03-02T14:30:00"
}
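To consume these records from Python, a small typed wrapper catches missing fields early. The sketch below maps a subset of the fields shown above onto a dataclass. The `Profile` class, `load_profile`, and `average_views` are hypothetical, and the sketch assumes each data/output/{username}.json file holds one object in this shape.

```python
import json
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Profile:
    username: str
    followers: int
    likes: int
    videos_count: int
    is_private: bool
    influencer_tier: str
    content_thumbnails: list[str] = field(default_factory=list)
    video_views: list[int] = field(default_factory=list)  # numeric counts only

    @classmethod
    def from_dict(cls, data: dict) -> "Profile":
        return cls(
            username=data["username"],
            followers=int(data["followers"]),
            likes=int(data["likes"]),
            videos_count=int(data["videos_count"]),
            is_private=bool(data["is_private"]),
            influencer_tier=data["influencer_tier"],
            content_thumbnails=list(data.get("content_thumbnails", [])),
            # Keep the parsed "count" and drop the "display" string such as "1.2M"
            video_views=[v["count"] for v in data.get("video_views", [])],
        )

    def average_views(self) -> float:
        """Mean of the scraped per-video view counts (0.0 if none were captured)."""
        return sum(self.video_views) / len(self.video_views) if self.video_views else 0.0


def load_profile(path: Path) -> Profile:
    """Read one scraped profile file, e.g. data/output/example_creator.json."""
    with path.open(encoding="utf-8") as fh:
        return Profile.from_dict(json.load(fh))
```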

Influencer Tiers

Tier	Follower Range
nano	< 1,000
micro	1,000 - 10,000
mid	10,000 - 100,000
macro	100,000 - 1,000,000
mega	> 1,000,000
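The table leaves some boundaries open: 10,000 appears in both the micro and mid rows, and 100,000 in both mid and macro. The sketch below assumes each range includes its lower bound and follows the table's strict "> 1,000,000" for mega; check the scraper's code if exact edges matter. `influencer_tier` is a hypothetical helper named after the JSON field.

```python
def influencer_tier(followers: int) -> str:
    """Map a follower count to the tier names used in the profile JSON."""
    if followers < 0:
        raise ValueError("follower count cannot be negative")
    if followers < 1_000:
        return "nano"
    if followers < 10_000:   # assumption: 10,000 itself counts as mid
        return "micro"
    if followers < 100_000:  # assumption: 100,000 itself counts as macro
        return "mid"
    if followers <= 1_000_000:  # the table defines mega as strictly > 1,000,000
        return "macro"
    return "mega"
```

Applied to the sample profile above, `influencer_tier(250000)` gives "macro", matching its `influencer_tier` field.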

File Outputs

Queue files:

data/queue/{location}_{category}_{timestamp}.json

Scraped data:

data/output/{username}.json

Thumbnails:

thumbnails/{username}/profile_*.jpg

thumbnails/{username}/content_*.jpg

Export files:

data/export_{timestamp}.json

data/export_{timestamp}.csv
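The export step flattens the per-profile files into one table. The following is a minimal sketch of that idea, not the skill's own exporter: the column list, the `export_csv` name, and the `%Y%m%d_%H%M%S` timestamp format are assumptions.

```python
import csv
import json
from datetime import datetime
from pathlib import Path

# Hypothetical column selection; the real export may include more fields.
COLUMNS = ["username", "full_name", "followers", "following", "likes",
           "videos_count", "influencer_tier", "category", "location", "profile_url"]


def export_csv(output_dir: Path, export_dir: Path) -> Path:
    """Write every output_dir/*.json profile as one row of a timestamped CSV."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    target = export_dir / f"export_{stamp}.csv"
    with target.open("w", newline="", encoding="utf-8") as fh:
        # extrasaction="ignore" skips nested fields such as content_thumbnails;
        # missing columns are written as empty strings
        writer = csv.DictWriter(fh, fieldnames=COLUMNS, extrasaction="ignore")
        writer.writeheader()
        for path in sorted(output_dir.glob("*.json")):
            with path.open(encoding="utf-8") as src:
                writer.writerow(json.load(src))
    return target
```

Calling `export_csv(Path("data/output"), Path("data"))` would produce a file like data/export_{timestamp}.csv.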

Configuration

Edit `config/scraper_config.json`:

{
  "proxy": {
    "enabled": false,
    "provider": "brightdata",
    "country": "",
    "sticky": true,
    "sticky_ttl_minutes": 10
  },
  "google_search": {
    "enabled": true,
    "api_key": "",
    "search_engine_id": "",
    "queries_per_location": 3
  },
  "scraper": {
    "headless": false,
    "min_followers": 1000,
    "download_thumbnails": true,
    "max_thumbnails": 6
  },
  "cities": ["New York", "Los Angeles", "Miami", "Chicago"],
  "categories": ["fashion", "beauty", "fitness", "food", "travel", "tech", "comedy", "dance", "music", "gaming"]
}
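If you only want to override a few keys, one common pattern is to merge your file over a set of defaults. Whether the skill itself does this is not stated here, so treat the sketch below as an assumption: the `DEFAULTS` values are copied from the sample config above, and `deep_merge` / `load_config` are hypothetical names.

```python
import copy
import json
from pathlib import Path

# Values copied from the sample config; extend as needed.
DEFAULTS = {
    "proxy": {"enabled": False, "provider": "brightdata", "country": "",
              "sticky": True, "sticky_ttl_minutes": 10},
    "scraper": {"headless": False, "min_followers": 1000,
                "download_thumbnails": True, "max_thumbnails": 6},
}


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict where nested keys in `override` replace those in `base`."""
    merged = copy.deepcopy(base)  # never mutate the defaults
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_config(path: Path = Path("config/scraper_config.json")) -> dict:
    """Load the user's config (if present) on top of DEFAULTS."""
    if not path.exists():
        return deep_merge(DEFAULTS, {})
    with path.open(encoding="utf-8") as fh:
        return deep_merge(DEFAULTS, json.load(fh))
```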

Filters Applied

The scraper automatically filters out:

❌ Private accounts
❌ Accounts with fewer than 1,000 followers (configurable via `min_followers`)
❌ Accounts with no videos
❌ Non-existent/removed accounts
❌ Already scraped accounts (deduplication)
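These rules combine naturally into a single predicate. The sketch below is illustrative: it assumes a profile dict shaped like the Output Data example, represents non-existent or removed accounts as `None`, and tracks already scraped usernames in a set. `should_scrape` is a hypothetical name.

```python
from typing import Optional


def should_scrape(profile: Optional[dict], seen: set[str],
                  min_followers: int = 1000) -> bool:
    """Apply the documented filters; return True if the profile should be kept."""
    if profile is None:  # non-existent or removed account
        return False
    if profile["username"] in seen:  # already scraped (deduplication)
        return False
    if profile.get("is_private", False):  # private account
        return False
    if profile.get("followers", 0) < min_followers:  # below min_followers
        return False
    if profile.get("videos_count", 0) == 0:  # no videos
        return False
    return True
```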

Troubleshooting

No Profiles Discovered

Check Google API key and quota
Verify Search Engine ID is configured for tiktok.com
Try different location/category combinations

Rate Limiting

Reduce scraping speed (increase delays in config)
Run during off-peak hours
Use a residential proxy (see below)
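Increasing delays works better with a randomized interval than a fixed one, so requests don't arrive on a regular beat. A sketch, assuming a base delay in seconds such as the `delay_between_profiles` setting mentioned under Best Practices (its units and default are not documented here) and a hypothetical ±50% jitter; `jittered_delay` and `pause_between_profiles` are illustrative names.

```python
import random
import time
from typing import Optional


def jittered_delay(base_seconds: float, jitter: float = 0.5,
                   rng: Optional[random.Random] = None) -> float:
    """Return a delay drawn uniformly from base*(1-jitter) to base*(1+jitter)."""
    if base_seconds < 0 or not 0 <= jitter <= 1:
        raise ValueError("base_seconds must be >= 0 and jitter within [0, 1]")
    rng = rng or random.Random()
    return rng.uniform(base_seconds * (1 - jitter), base_seconds * (1 + jitter))


def pause_between_profiles(base_seconds: float) -> None:
    """Sleep for a jittered interval before loading the next profile."""
    time.sleep(jittered_delay(base_seconds))
```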

CAPTCHA / Bot Detection

TikTok has aggressive bot detection — residential proxies are strongly recommended
The built-in anti-detection handles fingerprinting and stealth automatically
If you see CAPTCHAs, set "headless": false in the scraper config (the default in the sample config) and solve them manually in the visible browser window

🌐 Residential Proxy Support

Why Use a Residential Proxy?

Running a scraper at scale without a residential proxy will get your IP blocked fast. Here's why proxies are essential for long-running scrapes:

Advantage	Description
Avoid IP Bans	Residential IPs look like real household users, not data-center bots. TikTok is far less likely to flag them.
Automatic IP Rotation	Each request (or session) gets a fresh IP, so rate-limits never stack up on one address.
Geo-Targeting	Route traffic through a specific country/city so scraped content matches the target audience's locale.
Sticky Sessions	Keep the same IP for a configurable window (e.g. 10 min) — critical for maintaining a consistent browsing session.
Higher Success Rate	Rotating residential IPs deliver 95%+ success rates compared to ~30% with data-center proxies on TikTok.
Long-Running Scrapes	Scrape thousands of profiles over hours or days without interruption.
Concurrent Scraping	Run multiple browser instances across different IPs simultaneously.

Recommended Proxy Providers

We have affiliate partnerships with top residential proxy providers. Using these links supports continued development of this skill:

Provider	Best For	Sign Up
Bright Data	World's largest network, 72M+ IPs, enterprise-grade	👉 Get Bright Data
IPRoyal	Pay-as-you-go, 195+ countries, no traffic expiry	👉 Get IPRoyal
Storm Proxies	Fast & reliable, developer-friendly API, competitive pricing	👉 Get Storm Proxies
NetNut	ISP-grade network, 52M+ IPs, direct connectivity	👉 Get NetNut

Setup Steps

1. Get Your Proxy Credentials

Username (from your provider dashboard)
Password (from your provider dashboard)
Host and Port are pre-configured per provider (or use custom)

2. Configure via Environment Variables

export PROXY_ENABLED=true
export PROXY_PROVIDER=brightdata    # brightdata | iproyal | stormproxies | netnut | custom
export PROXY_USERNAME=your_user
export PROXY_PASSWORD=your_pass
export PROXY_COUNTRY=us             # optional: two-letter country code
export PROXY_STICKY=true            # optional: keep same IP per session
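The skill's own environment reader (`ProxyManager.from_env()`) isn't documented in detail, so the following is only a sketch of how these variables can be parsed: booleans are read case-insensitively, and unset optional values fall back to the sample config's defaults (an assumption). `proxy_settings_from_env` is a hypothetical name.

```python
import os
from typing import Mapping

PROVIDERS = {"brightdata", "iproyal", "stormproxies", "netnut", "custom"}


def _flag(value: str) -> bool:
    """Interpret common truthy strings such as "true", "1", "yes"."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def proxy_settings_from_env(env: Mapping[str, str] = os.environ) -> dict:
    """Collect the PROXY_* variables into a settings dict (hypothetical helper)."""
    provider = env.get("PROXY_PROVIDER", "brightdata").strip().lower()
    if provider not in PROVIDERS:
        raise ValueError(f"unknown PROXY_PROVIDER: {provider!r}")
    return {
        "enabled": _flag(env.get("PROXY_ENABLED", "false")),
        "provider": provider,
        "username": env.get("PROXY_USERNAME", ""),
        "password": env.get("PROXY_PASSWORD", ""),
        "country": env.get("PROXY_COUNTRY", "").strip().lower(),
        # Assumed default mirrors "sticky": true in the sample config
        "sticky": _flag(env.get("PROXY_STICKY", "true")),
    }
```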

3. Provider-Specific Host/Port Defaults

These are auto-configured when you set the provider name:

Provider	Host	Port
Bright Data	`brd.superproxy.io`	`22225`
IPRoyal	`proxy.iproyal.com`	`12321`
Storm Proxies	`rotating.stormproxies.com`	`9999`
NetNut	`gw-resi.netnut.io`	`5959`

Override with the `PROXY_HOST` and `PROXY_PORT` env vars if your plan uses a different gateway.
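The defaults table plus the `PROXY_HOST`/`PROXY_PORT` override amount to a lookup with a fallback. A sketch using the host/port values from the table; `resolve_gateway` is a hypothetical helper, and the skill's actual precedence rules may differ.

```python
from typing import Mapping

# Host/port defaults copied from the table above.
PROVIDER_GATEWAYS = {
    "brightdata": ("brd.superproxy.io", 22225),
    "iproyal": ("proxy.iproyal.com", 12321),
    "stormproxies": ("rotating.stormproxies.com", 9999),
    "netnut": ("gw-resi.netnut.io", 5959),
}


def resolve_gateway(provider: str, env: Mapping[str, str]) -> tuple[str, int]:
    """Return (host, port); PROXY_HOST/PROXY_PORT take precedence over defaults."""
    default_host, default_port = PROVIDER_GATEWAYS.get(provider, (None, None))
    host = env.get("PROXY_HOST") or default_host
    port = env.get("PROXY_PORT") or default_port
    if host is None or port is None:
        # e.g. provider "custom" with no host/port supplied
        raise ValueError(f"provider {provider!r} needs PROXY_HOST and PROXY_PORT")
    return host, int(port)
```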

4. Custom Proxy Provider

For any other proxy service, set the provider to `custom` and supply host/port manually:

{
  "proxy": {
    "enabled": true,
    "provider": "custom",
    "host": "your.proxy.host",
    "port": 8080,
    "username": "user",
    "password": "pass"
  }
}
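Playwright's `proxy` option takes a dict with a `server` key and optional `username` and `password`. The sketch below shows how a custom block like the one above could be translated into that shape and into a requests-style proxy mapping. `to_playwright_proxy` and `to_requests_proxies` are hypothetical names, and the `http://` scheme is an assumption about your gateway.

```python
from urllib.parse import quote


def to_playwright_proxy(proxy: dict) -> dict:
    """Translate a {"host", "port", "username", "password"} block into Playwright's format."""
    settings = {"server": f"http://{proxy['host']}:{proxy['port']}"}
    if proxy.get("username"):
        settings["username"] = proxy["username"]
        settings["password"] = proxy.get("password", "")
    return settings


def to_requests_proxies(proxy: dict) -> dict:
    """Build the {"http": ..., "https": ...} mapping used by the requests library."""
    # Percent-encode credentials so characters like "@" or ":" don't break the URL
    user = quote(proxy.get("username", ""), safe="")
    password = quote(proxy.get("password", ""), safe="")
    auth = f"{user}:{password}@" if user else ""
    url = f"http://{auth}{proxy['host']}:{proxy['port']}"
    return {"http": url, "https": url}


custom = {"enabled": True, "provider": "custom", "host": "your.proxy.host",
          "port": 8080, "username": "user", "password": "pass"}
```

Keeping credentials as separate `username`/`password` fields for Playwright avoids the encoding step entirely; only the URL form needs escaping.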

Running the Scraper with Proxy

Once configured, the scraper picks up the proxy automatically — no extra flags needed:

# Discover and scrape as usual — proxy is applied automatically
python main.py discover --location "Miami" --category "dance"
python main.py scrape --username charlidamelio

# The log will confirm proxy is active:
# INFO - Proxy enabled: <ProxyManager provider=brightdata enabled host=brd.superproxy.io:22225>

Using the Proxy Manager Programmatically

from proxy_manager import ProxyManager

# Pick one of the three ways to construct a ProxyManager:

# Option 1: from config (auto-reads config/scraper_config.json)
pm = ProxyManager.from_config()

# Option 2: from environment variables
pm = ProxyManager.from_env()

# Option 3: manual construction
pm = ProxyManager(
    provider="brightdata",
    username="your_user",
    password="your_pass",
    country="us",
    sticky=True
)

# For Playwright browser context
proxy = pm.get_playwright_proxy()
# → {"server": "http://brd.superproxy.io:22225", "username": "user-country-us-session-abc123", "password": "pass"}

# For requests / aiohttp
proxies = pm.get_requests_proxy()
# → {"http": "http://user:pass@host:port", "https": "http://user:pass@host:port"}

# Force new IP (rotates session ID)
pm.rotate_session()

# Debug info
print(pm.info())
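The sample output of `get_playwright_proxy()` suggests that country targeting and sticky sessions are encoded in the proxy username ("user-country-us-session-abc123") and that `rotate_session()` swaps the session ID. The class below is a hypothetical stand-in that reproduces only that naming pattern so you can see what rotation changes; providers each have their own username syntax, so check your provider's documentation before relying on it.

```python
import secrets
from typing import Optional


class StickySession:
    """Hypothetical helper mirroring the 'user-country-XX-session-ID' pattern above."""

    def __init__(self, username: str, country: str = "", sticky: bool = True,
                 session_id: Optional[str] = None):
        self.username = username
        self.country = country.lower()
        self.sticky = sticky
        self.session_id = session_id or secrets.token_hex(4)

    def proxy_username(self) -> str:
        """Compose the username sent to the proxy gateway."""
        parts = [self.username]
        if self.country:
            parts.append(f"country-{self.country}")
        if self.sticky:
            parts.append(f"session-{self.session_id}")
        return "-".join(parts)

    def rotate_session(self) -> str:
        """Pick a new random session ID; on providers using this scheme, a new ID usually means a new exit IP."""
        self.session_id = secrets.token_hex(4)
        return self.session_id
```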

Best Practices for Long-Running Scrapes

Use sticky sessions: TikTok requires consistent IPs during a browsing session. Set `"sticky": true`.
Target the right country: set `"country": "us"` (or your target region) so TikTok serves content in the expected locale.
Combine with existing anti-detection: this scraper already has fingerprinting, stealth scripts, and human behavior simulation. The proxy is the final layer.
Rotate sessions between batches: call `pm.rotate_session()` between large batches of profiles to get a fresh IP.
Use delays: even with proxies, respect `delay_between_profiles` in config to avoid aggressive patterns.
Monitor your proxy dashboard: all providers have dashboards showing bandwidth usage and success rates.
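Taken together, the rotation and delay advice suggests a loop like the one below. It is a sketch: `scrape_profile` is a placeholder for whatever scrapes one username, `pm` is anything with a `rotate_session()` method (such as the ProxyManager above), the batch size of 50 is arbitrary, and the delay is assumed to be in seconds.

```python
import time
from typing import Callable, Iterable, Iterator, List


def batched(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` items."""
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def scrape_in_batches(usernames: Iterable[str], scrape_profile: Callable[[str], None],
                      pm, batch_size: int = 50, delay_between_profiles: float = 5.0,
                      sleep: Callable[[float], None] = time.sleep) -> int:
    """Scrape usernames in batches, rotating the proxy session between batches."""
    count = 0
    for index, batch in enumerate(batched(usernames, batch_size)):
        if index > 0:
            pm.rotate_session()  # fresh IP for each new batch
        for username in batch:
            scrape_profile(username)
            count += 1
            sleep(delay_between_profiles)  # pause between profiles
    return count
```

The `sleep` parameter exists so tests can skip the real wait; in normal use leave it as `time.sleep`.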