Skills skroller
Automated social media content collection and analysis across platforms (Twitter/X, Instagram, TikTok, Reddit, LinkedIn, YouTube, Product Hunt, Medium, GitHub, Pinterest). Use when you need to: (1) scrape public posts programmatically, (2) analyze content by keywords or filters, (3) monitor brand mentions or trends for research, (4) curate content for personal analysis, (5) archive publicly available information, or (6) generate digests from scraped feeds. Always comply with platform ToS and applicable privacy laws.
git clone https://github.com/openclaw/skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/10oss/skroller" ~/.claude/skills/openclaw-skills-skroller && rm -rf "$T"
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/10oss/skroller" ~/.openclaw/skills/openclaw-skills-skroller && rm -rf "$T"
skills/10oss/skroller/SKILL.mdSkroller - Social Media Content collection
Automate the collection and analysis of publicly available social media content. This skill handles content collection, filtering, and export with compliance safeguards.
⚖️ Legal & Compliance Requirements
Before using this skill:
- Review Platform ToS - Each platform has different rules about automated access
- Check robots.txt - Respect disallowed paths
- Rate Limiting - Stay within platform rate limits to avoid service disruption
- Privacy Laws - GDPR, CCPA, and other regulations apply to personal data
- Permitted Use - Research, personal analysis, competitive intelligence (where allowed)
- Prohibited Use - Spam, harassment, data resale, manipulation, bypassing auth
Data Protection:
- Anonymize personal data when storing
- Honor deletion requests (GDPR Art. 17)
- Limit retention to necessary periods
- Document lawful basis for processing
- Do not scrape sensitive personal data
Core Capabilities
- Content collection - Gather publicly available posts from platforms
- Data extraction - Pull text, timestamps, engagement metrics, author info
- Smart filtering - Filter by keywords, hashtags, date ranges, engagement thresholds
- Deduplication - Track seen posts to avoid duplicates across sessions
- Export formats - JSON, CSV, Markdown, or direct to note apps (Bear/Obsidian)
- Rate limiting - Respect platform constraints and avoid service disruption
Platform Support
| Platform | Approach | Notes |
|---|---|---|
| Twitter/X | Browser automation | Use Playwright; handle login if needed |
| Browser automation | Rate-limited; use sparingly | |
| TikTok | Browser automation | Heavy JS; may need longer waits |
| API + browser | Prefer API where possible | |
| Browser automation | Login required for most content | |
| YouTube | API + browser | Comments via browser, videos via API |
| Product Hunt | Browser automation | Product discovery, launches |
| Medium | Browser automation | Articles, blog posts |
| GitHub | Browser automation | Issues, discussions, repos |
| Browser automation | Visual content, pins |
Quick Start
Basic Scroll and Extract
# Scroll Twitter feed, extract 50 posts about "AI" node scripts/skroller.js --platform twitter --query "AI" --limit 50 --output posts.json
Monitor Brand Mentions
# Monitor Reddit for brand mentions, export to Markdown node scripts/skroller.js --platform reddit --query "mybrand" --format markdown --output mentions.md
Competitive Research
# Scroll competitor Instagram, capture top posts by engagement node scripts/skroller.js --platform instagram --profile @competitor --min-likes 1000 --output competitor-posts.json
Scripts
scripts/skroller.js
- Main scrolling engine
scripts/skroller.jsJavaScript script using Playwright for browser automation.
# Using npm scripts (recommended) npm run scroll -- --platform twitter --query "AI" --limit 50 --output posts.json # Direct execution node scripts/skroller.js --platform twitter --query "AI" --limit 50 --output posts.json
Options:
- Target platform (required): twitter, reddit, instagram, tiktok, linkedin, youtube, producthunt, medium, github, pinterest--platform
- Search keyword/hashtag--query
- Specific profile to scroll--profile
- Max posts to scrape (default: 50)--limit
- Filter by minimum engagement--min-likes
- Output format: json, csv, markdown (default: json)--format
- Output file path--output
- Capture screenshots for debugging--screenshot
- Skip previously seen posts--dedupe
scripts/feed-digest.js
- Generate digests
scripts/feed-digest.jsCreates summary digests from exported post data.
npm run digest -- --input posts.json --output digest.md # or: node scripts/feed-digest.js --input posts.json --output digest.md
scripts/export-to-notes.js
- Unified note app exporter
scripts/export-to-notes.jsExports scraped posts to multiple note applications with a single command.
Supported apps: Bear, Obsidian, Notion, Apple Notes, Evernote, OneNote, Google Keep, Roam Research, Logseq, Anytype
# Using npm scripts (recommended) npm run export -- --input posts.json --app obsidian --vault ~/Documents/Obsidian # Direct execution node scripts/export-to-notes.js --input posts.json --app bear --tags "ai,research" node scripts/export-to-notes.js --input posts.json --app notion --api-key $NOTION_TOKEN node scripts/export-to-notes.js --input posts.json --app apple-notes node scripts/export-to-notes.js --input posts.json --app evernote --output export.enex node scripts/export-to-notes.js --input posts.json --app one-note --access-token $MS_TOKEN node scripts/export-to-notes.js --input posts.json --app keep --output keep.html node scripts/export-to-notes.js --input posts.json --app roam --output roam.md node scripts/export-to-notes.js --input posts.json --app logseq --vault ~/Documents/Logseq node scripts/export-to-notes.js --input posts.json --app anytype --output anytype.md node scripts/export-to-notes.js --input posts.json --app obsidian --dry-run
Configuration: Set defaults in
.skroller-config.json:
{ "export": { "defaultApp": "obsidian", "vault": "~/Documents/Obsidian", "folder": "Skroller", "notionDatabaseId": "<db-id>" } }
Requirements by app:
- Bear: grizzly CLI (
)go install github.com/tylerwince/grizzly/cmd/grizzly@latest - Obsidian: Vault path
- Notion: API key (create at notion.so)
- Apple Notes: macOS with Notes app
- Evernote: Manual ENEX import
- OneNote: Microsoft Graph access token
- Google Keep: Manual HTML import
- Roam Research: Markdown import (drag MDL file into Roam)
- Logseq: Vault path (writes to pages/ folder)
- Anytype: Markdown import (use Anytype Import feature)
Configuration
.skroller-config.json
.skroller-config.jsonStore default settings:
{ "defaultLimit": 50, "scrollDelayMs": 1500, "userAgent": "Mozilla/5.0 ...", "platforms": { "twitter": { "loginRequired": false, "rateLimit": "100 requests/hour" }, "instagram": { "loginRequired": true, "rateLimit": "50 requests/hour" } }, "export": { "defaultFormat": "json", "includeImages": true, "includeMetrics": true } }
Authentication
Some platforms require login. Store credentials securely:
# For platforms requiring auth, set environment variables export SKROLLR_TWITTER_COOKIE="<auth cookie>" export SKROLLR_INSTAGRAM_USER="<username>" export SKROLLR_INSTAGRAM_PASS="<password>"
Security note: Never commit auth files. Use
.env with .gitignore.
Output Structure
JSON Output (default)
{ "platform": "twitter", "query": "AI", "scrapedAt": "2026-03-14T20:30:00Z", "posts": [ { "id": "1234567890", "author": "@username", "text": "Post content here...", "timestamp": "2026-03-14T18:00:00Z", "likes": 150, "retweets": 42, "replies": 12, "url": "https://twitter.com/...", "media": ["image1.jpg"], "hashtags": ["#AI", "#ML"] } ] }
Markdown Output
## Twitter Posts: "AI" (2026-03-14) ### @username - 150 likes Post content here... [View post](https://twitter.com/...) ---
Filtering Strategies
Keyword Matching
- Exact match:
"exact phrase" - Boolean:
AI AND (startup OR venture) - Exclude:
AI -crypto
Engagement Thresholds
- Filter low-quality:
--min-likes 100 - Viral content:
--min-shares 500
Time Windows
- Recent only:
--date-from 2026-03-14 - Historical:
--date-from 2026-01-01 --date-to 2026-01-31
Best Practices
Rate Limiting
- Add delays between scrolls:
--delay 2000 - Respect platform limits (see config)
- Use proxies for high-volume scraping
Content Quality
- Filter by engagement to find signal
- Dedupe across sessions
- Export with timestamps for freshness tracking
Ethics & Compliance
- Check platform ToS before scraping
- Don't scrape personal data at scale
- Use for research/curation, not spam
Troubleshooting
Scroll stops early
- Increase
or check for login requirements--limit - Some platforms detect automation; add random delays
Missing content
- Some platforms lazy-load; increase scroll delay
- Try
to debug what's visible--screenshot
Rate limited
- Reduce frequency; use
--delay - Check platform-specific rate limits in config
Integration with Other Skills
- bear-notes: Export via
export-to-notes.js --app bear - obsidian: Export via
export-to-notes.js --app obsidian - notion: Export via
export-to-notes.js --app notion - github: Create issues from scraped feedback/mentions
# Scroll and export to Obsidian in one command node scripts/skroller.js --platform twitter --query "AI" --limit 20 --output ai.json && \ node scripts/export-to-notes.js --input ai.json --app obsidian --vault ~/Documents/Obsidian # Scroll and export to Notion node scripts/skroller.js --platform reddit --query "startups" --limit 30 --output startups.json && \ node scripts/export-to-notes.js --input startups.json --app notion --api-key $NOTION_TOKEN # Use npm scripts for cleaner commands npm run scroll -- --platform twitter --query "tech" --limit 25 --output tech.json npm run export -- --input tech.json --app bear --tags "tech,research"
See Also
- Platform-specific selectors and quirksreferences/platform-details.md
- Rate limit guidelines per platformreferences/rate-limits.md
- CSS selectors for each platformassets/selector-cheatsheet.md