Bright Data Web Scraper API via curl. Use this skill for scraping social media (Twitter/X, Reddit, YouTube, Instagram, TikTok), account management, and usage monitoring.
git clone https://github.com/wulaosiji/skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/wulaosiji/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/third-party/bright-data" ~/.claude/skills/wulaosiji-skills-bright-data && rm -rf "$T"
Bright Data Web Scraper API
Use the Bright Data API via direct curl calls for social media scraping, web data extraction, and account management.
Official docs: https://docs.brightdata.com/
When to Use
Use this skill when you need to:
- Scrape social media - Twitter/X, Reddit, YouTube, Instagram, TikTok, LinkedIn
- Extract web data - Posts, profiles, comments, engagement metrics
- Monitor usage - Track bandwidth and request usage
- Manage account - Check status and zones
Prerequisites
- Sign up at Bright Data
- Get your API key from Settings > Users
- Create a Web Scraper dataset in the Control Panel to get your dataset_id
export BRIGHTDATA_API_KEY="your-api-key"
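Because every request below depends on this variable, it can help to fail fast when it is missing. The following is a minimal sketch; the function name require_api_key is hypothetical and not part of any Bright Data tooling.

```shell
# Hypothetical helper: stop early if BRIGHTDATA_API_KEY is unset or empty.
require_api_key() {
  if [ -z "${BRIGHTDATA_API_KEY:-}" ]; then
    echo "BRIGHTDATA_API_KEY is not set; export it first" >&2
    return 1
  fi
}
```

Call require_api_key at the top of a script, before the first curl command, so a missing key is reported locally instead of as an authorization error from the API.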
Base URL
https://api.brightdata.com
Important: When using $VAR in a command that pipes to another command, wrap the command containing $VAR in bash -c '...'. Due to a Claude Code bug, environment variables are silently cleared when pipes are used directly.
bash -c 'curl -s "https://api.example.com" -H "Authorization: Bearer $API_KEY"'
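The wrapping pattern can be tried locally without any network call. This sketch uses a placeholder value for API_KEY and pipes the output of the wrapped command into tr; both the value and the choice of tr are only for illustration.

```shell
# Placeholder value for illustration; export it so the child bash can read it.
export API_KEY="example-key"

# The command that expands $API_KEY runs inside bash -c; only its output is piped.
masked=$(bash -c 'printf "%s" "$API_KEY"' | tr 'a-z' 'A-Z')
echo "$masked"   # prints EXAMPLE-KEY
```

Note that the variable has to be exported, otherwise the child shell started by bash -c does not see it.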
Social Media Scraping
Bright Data supports scraping these social media platforms:
| Platform | Profiles | Posts | Comments | Reels/Videos |
|---|---|---|---|---|
| Twitter/X | ✅ | ✅ | - | - |
| Reddit | - | ✅ | ✅ | - |
| YouTube | ✅ | ✅ | ✅ | - |
| Instagram | ✅ | ✅ | ✅ | ✅ |
| TikTok | ✅ | ✅ | ✅ | - |
| LinkedIn | ✅ | ✅ | - | - |
How to Use
1. Trigger Scraping (Asynchronous)
Trigger a data collection job and get a snapshot_id for later retrieval.
Write to /tmp/brightdata_request.json:
[ {"url": "https://twitter.com/username"}, {"url": "https://twitter.com/username2"} ]
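One way to create that file from the shell is a here-document. This is a sketch using the same placeholder profile URLs as above.

```shell
# Quoting 'EOF' keeps the shell from expanding anything inside the JSON.
cat > /tmp/brightdata_request.json <<'EOF'
[
  {"url": "https://twitter.com/username"},
  {"url": "https://twitter.com/username2"}
]
EOF
```

The same pattern works for every request body in this document; only the JSON between the EOF markers changes.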
Then run (replace <dataset-id> with your actual dataset ID):
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/trigger?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
Response:
{ "snapshot_id": "s_m4x7enmven8djfqak" }
2. Trigger Scraping (Synchronous)
Get results immediately in the response (for small requests).
Write to /tmp/brightdata_request.json:
[ {"url": "https://www.reddit.com/r/technology/comments/xxxxx"} ]
Then run (replace <dataset-id> with your actual dataset ID):
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
3. Monitor Progress
Check the status of a scraping job (replace <snapshot-id> with your actual snapshot ID):
bash -c 'curl -s "https://api.brightdata.com/datasets/v3/progress/<snapshot-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}"'
Response:
{ "snapshot_id": "s_m4x7enmven8djfqak", "dataset_id": "gd_xxxxx", "status": "running" }
Status values: running, ready, failed
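Polling until the job leaves the running state can be scripted. The sketch below is one possible loop, not an official client: the function names fetch_status and wait_for_snapshot are hypothetical, jq is assumed to be installed, and the default of 60 attempts spaced 5 seconds apart is an arbitrary choice rather than a documented limit.

```shell
# Hypothetical helper: print the status field for one snapshot (requires jq).
# curl runs inside bash -c so the exported API key survives the pipe.
fetch_status() {
  bash -c 'curl -s "https://api.brightdata.com/datasets/v3/progress/$1" \
    -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}"' _ "$1" | jq -r '.status'
}

# Poll until the snapshot is ready (returns 0), failed (returns 1),
# or still running after max_attempts polls (returns 2).
wait_for_snapshot() {
  local snapshot_id=$1
  local max_attempts=${2:-60}
  local delay_seconds=${3:-5}
  local attempt=1
  local status
  while [ "$attempt" -le "$max_attempts" ]; do
    status=$(fetch_status "$snapshot_id")
    case "$status" in
      ready)  return 0 ;;
      failed) return 1 ;;
    esac
    sleep "$delay_seconds"
    attempt=$((attempt + 1))
  done
  return 2
}
```

A typical call would be wait_for_snapshot "<snapshot-id>" followed by the download step, run only when the function returns 0.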
4. Download Results
Once status is ready, download the collected data (replace <snapshot-id> with your actual snapshot ID):
bash -c 'curl -s "https://api.brightdata.com/datasets/v3/snapshot/<snapshot-id>?format=json" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}"'
5. List Snapshots
Get all your snapshots:
bash -c 'curl -s "https://api.brightdata.com/datasets/v3/snapshots" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}"' | jq '.[] | {snapshot_id, dataset_id, status}'
6. Cancel Snapshot
Cancel a running job (replace <snapshot-id> with your actual snapshot ID):
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/cancel?snapshot_id=<snapshot-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}"'
Platform-Specific Examples
Twitter/X - Scrape Profile
Write to /tmp/brightdata_request.json:
[ {"url": "https://twitter.com/elonmusk"} ]
Then run (replace <dataset-id> with your actual dataset ID):
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
Returns: x_id, profile_name, biography, is_verified, followers, following, profile_image_link
Twitter/X - Scrape Posts
Write to /tmp/brightdata_request.json:
[ {"url": "https://twitter.com/username/status/123456789"} ]
Then run (replace <dataset-id> with your actual dataset ID):
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
Returns: post_id, text, replies, likes, retweets, views, hashtags, media
Reddit - Scrape Subreddit Posts
Write to /tmp/brightdata_request.json:
[ {"url": "https://www.reddit.com/r/technology", "sort_by": "hot"} ]
Then run (replace <dataset-id> with your actual dataset ID):
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/trigger?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
Parameters: url, sort_by (new/top/hot)
Returns: post_id, title, description, num_comments, upvotes, date_posted, community
Reddit - Scrape Comments
Write to /tmp/brightdata_request.json:
[ {"url": "https://www.reddit.com/r/technology/comments/xxxxx/post_title"} ]
Then run (replace <dataset-id> with your actual dataset ID):
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
Returns: comment_id, user_posted, comment_text, upvotes, replies
YouTube - Scrape Video Info
Write to /tmp/brightdata_request.json:
[ {"url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"} ]
Then run (replace <dataset-id> with your actual dataset ID):
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
Returns: title, views, likes, num_comments, video_length, transcript, channel_name
YouTube - Search by Keyword
Write to /tmp/brightdata_request.json:
[ {"keyword": "artificial intelligence", "num_of_posts": 50} ]
Then run (replace <dataset-id> with your actual dataset ID):
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/trigger?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
YouTube - Scrape Comments
Write to /tmp/brightdata_request.json:
[ {"url": "https://www.youtube.com/watch?v=xxxxx", "load_replies": 3} ]
Then run (replace <dataset-id> with your actual dataset ID):
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
Returns: comment_text, likes, replies, username, date
Instagram - Scrape Profile
Write to /tmp/brightdata_request.json:
[ {"url": "https://www.instagram.com/username"} ]
Then run (replace <dataset-id> with your actual dataset ID):
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
Returns: followers, post_count, profile_name, is_verified, biography
Instagram - Scrape Posts
Write to /tmp/brightdata_request.json:
[ { "url": "https://www.instagram.com/username", "num_of_posts": 20, "start_date": "01-01-2024", "end_date": "12-31-2024" } ]
Then run (replace <dataset-id> with your actual dataset ID):
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/trigger?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
Account Management
Check Account Status
bash -c 'curl -s "https://api.brightdata.com/status" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}"'
Response:
{ "status": "active", "customer": "hl_xxxxxxxx", "can_make_requests": true, "ip": "x.x.x.x" }
Get Active Zones
bash -c 'curl -s "https://api.brightdata.com/zone/get_active_zones" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}"' | jq '.[] | {name, type}'
Get Bandwidth Usage
bash -c 'curl -s "https://api.brightdata.com/customer/bw" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}"'
Getting Dataset IDs
To use the scraping features, you need a dataset_id:
- Go to Bright Data Control Panel
- Create a new Web Scraper dataset or select an existing one
- Choose the platform (Twitter, Reddit, YouTube, etc.)
- Copy the dataset_id from the dataset settings
Dataset IDs can also be found in the bandwidth usage API response under the data field keys (e.g., v__ds_api_gd_xxxxx where gd_xxxxx is your dataset ID).
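Given a key of that shape, the dataset ID can be recovered with plain parameter expansion. This sketch assumes the prefix is always v__ds_api_, as in the example key; the exact layout of the bandwidth response is not shown here, so how the key is obtained is left out.

```shell
# Example key in the shape described above; gd_xxxxx is a placeholder.
key="v__ds_api_gd_xxxxx"

# Strip the fixed prefix with POSIX parameter expansion.
dataset_id="${key#v__ds_api_}"
echo "$dataset_id"   # prints gd_xxxxx
```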
Common Parameters
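| Parameter | Description | Example |
|---|---|---|
| url | Target URL to scrape | https://twitter.com/username |
| keyword | Search keyword | artificial intelligence |
| num_of_posts | Limit number of results | 50 |
| start_date | Start of date filter (MM-DD-YYYY) | 01-01-2024 |
| end_date | End of date filter (MM-DD-YYYY) | 12-31-2024 |
| sort_by | Sort order (Reddit) | new, top, hot |
| format | Response format | json |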
Rate Limits
- Batch mode: up to 100 concurrent requests
- Maximum input size: 1GB per batch
- Exceeding limits returns a 429 error
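A script can react to a 429 by waiting and retrying. The sketch below is one possible approach, not a policy from Bright Data: the function names request_status and retry_on_429 are hypothetical, and the doubling delay with a default of 5 attempts is an assumption.

```shell
# Hypothetical helper: send a GET request and print only its HTTP status code.
request_status() {
  curl -s -o /dev/null -w '%{http_code}' "$1" \
    -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}"
}

# Retry while the API answers 429, doubling the wait after each attempt.
# Prints the last status code; returns 0 unless every attempt got a 429.
retry_on_429() {
  local url=$1
  local max_attempts=${2:-5}
  local delay_seconds=${3:-2}
  local attempt=1
  local code
  while [ "$attempt" -le "$max_attempts" ]; do
    code=$(request_status "$url")
    if [ "$code" != "429" ]; then
      echo "$code"
      return 0
    fi
    sleep "$delay_seconds"
    delay_seconds=$((delay_seconds * 2))
    attempt=$((attempt + 1))
  done
  echo "$code"
  return 1
}
```

For example, retry_on_429 "https://api.brightdata.com/status" prints the final status code once the request is no longer rate limited.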
Guidelines
- Create datasets first: Use the Control Panel to create scraper datasets
- Use async for large jobs: Use /trigger for discovery and batch operations
- Use sync for small jobs: Use /scrape for single URL quick lookups
- Check status before download: Poll /progress until status is ready
- Respect rate limits: Don't exceed 100 concurrent requests
- Date format: Use MM-DD-YYYY for date parameters
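For the date format in particular, the standard date command can produce a value in the expected layout. This sketch formats today's date; the variable name end_date simply mirrors the request parameter.

```shell
# date(1) format codes: %m = month, %d = day of month, %Y = four-digit year.
end_date=$(date +%m-%d-%Y)
echo "$end_date"
```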