```bash
# Clone the skills repository
git clone https://github.com/openclaw/skills

# Install the skill into ~/.claude/skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/batxent/rednote-contacts" ~/.claude/skills/openclaw-skills-rednote-contacts && rm -rf "$T"

# Install the skill into ~/.openclaw/skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/batxent/rednote-contacts" ~/.openclaw/skills/openclaw-skills-rednote-contacts && rm -rf "$T"
```
# red-crawler-ops

Use this skill when you need to operate the `red-crawler` CLI from an OpenClaw workflow. It is the portable wrapper for the repo's existing crawler runtime, not a separate crawler implementation.
## When To Use
Use `red-crawler-ops` for:

- installing or cloning a fresh `red-crawler` workspace
- bootstrapping a fresh workspace into a ready-to-run state
- saving a login session into Playwright storage state
- crawling a seed Xiaohongshu profile
- running nightly collection against a workspace database
- exporting a weekly report
- listing contactable creators from the SQLite database
## red-crawler CLI Commands

All crawling tasks must use the native `red-crawler` CLI commands:
### 1. crawl-seed
Crawl a specific Xiaohongshu user profile and extract contact information.
```bash
uv run red-crawler crawl-seed \
  --seed-url "https://www.xiaohongshu.com/user/profile/USER_ID" \
  --storage-state "./state.json" \
  --max-accounts 5 \
  --max-depth 2 \
  --db-path "./data/red_crawler.db" \
  --output-dir "./output"
```
Parameters:
- `--seed-url` (required): Target user profile URL
- `--storage-state` (required): Path to Playwright storage state file
- `--max-accounts`: Maximum accounts to crawl (default: 20)
- `--max-depth`: Crawl depth for related accounts (default: 2)
- `--include-note-recommendations`: Include note recommendations
- `--safe-mode`: Enable safe mode
- `--cache-dir`: Cache directory path
- `--cache-ttl-days`: Cache TTL in days (default: 7)
- `--db-path`: SQLite database path (default: `./data/red_crawler.db`)
- `--output-dir`: Output directory (default: `./output`)
Outputs:
- `accounts.csv`: Crawled account information
- `contact_leads.csv`: Extracted contact information (emails, etc.)
- `run_report.json`: Execution report
### 2. login
Interactive login to save browser session.
```bash
uv run red-crawler login --save-state "./state.json"
```
Parameters:
- `--save-state` (required): Path to save storage state
- `--login-url`: Login page URL (default: `https://www.xiaohongshu.com`)
### 3. login-qr-start / login-qr-finish
QR code-based login for headless environments.
```bash
# Start QR login (generates QR code)
uv run red-crawler login-qr-start \
  --save-state "./state.json" \
  --qr-path "./login-qr.png" \
  --session-path "./login-session.json" \
  --timeout 180

# Finish QR login after user scans
uv run red-crawler login-qr-finish \
  --save-state "./state.json" \
  --session-path "./login-session.json"
```
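In a headless flow the two steps are typically glued together by a small script: start the QR session, surface the PNG to the user, then finish once it has been scanned. A minimal sketch, using only the flags documented above (the pause-for-scan logic is an illustrative assumption, not part of the CLI):

```bash
#!/usr/bin/env bash
# Illustrative QR-login glue script; paths match the examples above.
set -euo pipefail

uv run red-crawler login-qr-start \
  --save-state "./state.json" \
  --qr-path "./login-qr.png" \
  --session-path "./login-session.json" \
  --timeout 180

echo "Scan ./login-qr.png with the Xiaohongshu app, then press Enter."
read -r  # wait for the user to confirm the scan

uv run red-crawler login-qr-finish \
  --save-state "./state.json" \
  --session-path "./login-session.json"
```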
### 4. collect-nightly
Run scheduled nightly data collection.
```bash
uv run red-crawler collect-nightly \
  --storage-state "./state.json" \
  --db-path "./data/red_crawler.db" \
  --report-dir "./reports" \
  --crawl-budget 30 \
  --search-term-limit 4
```
Parameters:
- `--storage-state` (required): Path to storage state file
- `--db-path`: Database path (default: `./data/red_crawler.db`)
- `--report-dir`: Report directory (default: `./reports`)
- `--cache-dir`: Cache directory
- `--cache-ttl-days`: Cache TTL (default: 7)
- `--crawl-budget`: Crawl budget (default: 30)
- `--search-term-limit`: Search term limit (default: 4)
- `--startup-jitter-minutes`: Startup jitter
- `--slot-name`: Slot name for scheduling
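Since collect-nightly is built for scheduled runs, it is commonly driven by a scheduler such as cron. A sketch under that assumption (the 2 a.m. slot, workspace path, and log destination are illustrative; ensure `uv` is on cron's PATH):

```bash
# Illustrative crontab entry (crontab -e); --startup-jitter-minutes spreads start times.
0 2 * * * cd ~/red-crawler && uv run red-crawler collect-nightly --storage-state "./state.json" --startup-jitter-minutes 15 >> ./reports/cron.log 2>&1
```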
### 5. report-weekly

Export weekly reports from the database.

```bash
uv run red-crawler report-weekly \
  --db-path "./data/red_crawler.db" \
  --report-dir "./reports" \
  --days 7
```
Parameters:
- `--db-path`: Database path (default: `./data/red_crawler.db`)
- `--report-dir`: Report directory (default: `./reports`)
- `--days`: Report period in days (default: 7)
Outputs:
- `weekly-growth-report.json`
- `contactable_creators.csv`
### 6. list-contactable

List contactable creators from the database.

```bash
uv run red-crawler list-contactable \
  --db-path "./data/red_crawler.db" \
  --lead-type "email" \
  --creator-segment "creator" \
  --min-relevance-score 0.5 \
  --limit 20 \
  --format csv
```
Parameters:
- `--db-path`: Database path (default: `./data/red_crawler.db`)
- `--lead-type`: Lead type filter (default: email)
- `--creator-segment`: Creator segment filter (default: creator)
- `--min-relevance-score`: Minimum relevance score (default: 0.0)
- `--limit`: Result limit (default: 20)
- `--format`: Output format, table or csv (default: table)
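To hand the results to another tool, the CSV output can simply be redirected to a file; a usage sketch using only the flags documented above (the filename is illustrative):

```bash
# Write the top 10 contactable creators to a CSV file for downstream use.
uv run red-crawler list-contactable \
  --db-path "./data/red_crawler.db" \
  --format csv \
  --limit 10 > contactable.csv
```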
### 7. open

Open Xiaohongshu in a browser with the saved session.

```bash
uv run red-crawler open --storage-state "./state.json"
```
## Supported Actions

- `install_or_bootstrap`
- `bootstrap`
- `login`
- `crawl_seed`
- `collect_nightly`
- `report_weekly`
- `list_contactable`
## Example Prompts
- "帮我在当前目录初始化/安装一个小红书爬虫项目" (Automatically maps to
to setup the workspace)install_or_bootstrap - "我需要登录爬虫" / "我要登录小红书" (Automatically maps to
to fetch/refresh the Playwright session state)login - "开始执行每日夜间数据采集" / "运行自动收集任务" (Automatically maps to
to continue crawling based on the database queue)collect_nightly - "帮我生成一份本周的爬虫数据周报" (Automatically maps to
pointing to the workspace's DB)report_weekly
**Crawling New Data vs Querying Database:**
- "帮我从这个博主去爬10个相关的美妆博主: https://www.xiaohongshu.com/..." (Crawls NEW data: Automatically maps to
withcrawl_seed
, settingseed_url
to 10. Note: crawling new data requires a seed URL.)max_accounts - "帮我从数据库/已爬取的数据中找出10个美妆/游戏/科技博主的联系方式" (Queries EXISTING DB: Automatically sets
toaction
,list_contactable
to 10, andlimit
to "美妆" to filter the local SQLite database)creator_segment
(Also understands technical prompt variations:)

- "Bootstrap this workspace: run setup, install Chromium, and finish when `state.json` has been created."
- "Crawl this seed profile with a depth of 2 and write outputs into `output/`."
- "Export this week's report and return the generated artifacts."
## Environment Setup

### Windows (WSL2)
On Windows, red-crawler runs inside WSL2. You need:
- WSL2 with Ubuntu (20.04 or 22.04 recommended)
- WSLg (built-in graphics support for WSL2), for the browser GUI
- Dependencies:

```bash
sudo apt-get update
sudo apt-get install -y git python3 python3-pip
```

- uv (Python package manager):

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```
**Known Issues & Fixes:**
- **playwright-stealth version conflict**
  - Error: `ImportError: cannot import name 'stealth_sync'`
  - Fix: Lock the version to `<2.0.0` in `pyproject.toml`:

    ```toml
    dependencies = [
        "playwright-stealth<2.0.0",
        ...
    ]
    ```
- **setuptools/pkg_resources missing**
  - Error: `ModuleNotFoundError: No module named 'pkg_resources'`
  - Fix: Lock the version to `<70` in `pyproject.toml`:

    ```toml
    dependencies = [
        "setuptools<70",
        ...
    ]
    ```
- **DISPLAY not set (WSLg)**
  - Error: `Missing X server or $DISPLAY`
  - Fix: Export DISPLAY before running:

    ```bash
    export DISPLAY=:0
    ```
- **Headless vs headed browser**
  - The `login` command requires a headed browser (GUI).
  - `crawl-seed` and other commands also require a headed browser on WSL.
  - Always set `DISPLAY=:0` before running any command that opens a browser.
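A small guard like the following (an illustrative sketch; `:0` is WSLg's default display) avoids the `$DISPLAY` failure mode before any headed command:

```bash
# Fall back to WSLg's default display if DISPLAY is unset.
if [ -z "${DISPLAY:-}" ]; then
  export DISPLAY=:0
fi
uv run red-crawler login --save-state "./state.json"
```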
### Linux (Native)

- Dependencies:

```bash
sudo apt-get update
sudo apt-get install -y git python3 python3-pip
```

- uv:

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

- X Server (for headed browser):

```bash
sudo apt-get install -y xvfb
export DISPLAY=:99
Xvfb :99 -screen 0 1024x768x16 &
```
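Alternatively, the `xvfb-run` wrapper (shipped with the xvfb package) manages the virtual display for a single command, so you do not have to start Xvfb and export DISPLAY yourself; a sketch:

```bash
# xvfb-run -a allocates a free display, runs the command, then tears the server down.
xvfb-run -a uv run red-crawler crawl-seed \
  --seed-url "https://www.xiaohongshu.com/user/profile/USER_ID" \
  --storage-state "./state.json"
```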
### macOS

- Homebrew:

```bash
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
```

- uv:

```bash
brew install uv
```

- Git:

```bash
brew install git
```
## Prerequisites
- `install_or_bootstrap` can clone the repository before setup when a workspace does not exist yet.
- `bootstrap` and every operational action require the workspace to be the `red-crawler` repository root.
- `git` must be available when `install_or_bootstrap` needs to clone the repository.
- `uv` must be available for `bootstrap`, `install_or_bootstrap`, and every CLI action.
- `login` creates the Playwright storage state explicitly.
- `crawl_seed` and `collect_nightly` require an authenticated Playwright storage state file.
- `report_weekly` and `list_contactable` run from the database and do not require storage state.
- The workspace must contain `pyproject.toml` (see the preflight sketch after this list).
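A minimal preflight sketch that checks these prerequisites before dispatching an action (the workspace-path argument is an illustrative assumption):

```bash
#!/usr/bin/env bash
# Illustrative preflight checks; pass the workspace path as the first argument.
set -euo pipefail
WORKSPACE="${1:-.}"

command -v git >/dev/null || { echo "git is required"; exit 1; }
command -v uv  >/dev/null || { echo "uv is required"; exit 1; }
[ -f "$WORKSPACE/pyproject.toml" ] || { echo "not a red-crawler workspace (no pyproject.toml)"; exit 1; }
[ -f "$WORKSPACE/state.json" ] || echo "warning: no state.json; run 'uv run red-crawler login' first"
```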
## Safety Limits
- Do not overwrite an existing non-`red-crawler` directory during installation.
- Do not point this skill at a directory that lacks `pyproject.toml` unless you intend `install_or_bootstrap` to clone a fresh workspace there.
- Do not create login sessions silently; `bootstrap` or `install_or_bootstrap` still require the user to complete interactive authentication.
- Do not point it at production data or unknown databases.
- Do not assume a browser session exists; create `state.json` with `login` first.
- Do not hard-code machine-specific paths in prompts or config.
- Prefer relative, workspace-scoped paths for outputs and reports.
## Input Shape
Provide an object with `action` plus optional fields used by the selected action. Common fields include:

`workspace_path`, `repo_url`, `workspace_parent`, `workspace_name`, `branch`, `runner_command`, `storage_state`, `db_path`, `report_dir`, `output_dir`, `cache_dir`

Action-specific fields include:

`force_login`, `sync_dependencies`, `install_browser`, `seed_url`, `login_url`, `max_accounts`, `max_depth`, `include_note_recommendations`, `safe_mode`, `cache_ttl_days`, `crawl_budget`, `search_term_limit`, `startup_jitter_minutes`, `slot_name`, `days`, `lead_type`, `creator_segment`, `min_relevance_score`, `limit`, `format`
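For example, a `collect_nightly` invocation combining common and action-specific fields might look like this (all values are illustrative):

```json
{
  "action": "collect_nightly",
  "workspace_path": "./red-crawler",
  "storage_state": "./state.json",
  "db_path": "./data/red_crawler.db",
  "report_dir": "./reports",
  "crawl_budget": 30,
  "search_term_limit": 4
}
```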
## Output Shape
Successful runs return:

`status`, `action`, `command`, `summary`, `artifacts`, `metrics`, `next_step`, `stdout`, `stderr`

Error runs return:

`status`, `action`, `error_type`, `message`, `suggested_fix`

- `action`, `command`, `stdout`, and `stderr` are included for execution-time failures.
- Early validation or configuration failures may omit `action`, `command`, `stdout`, and `stderr`.
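As an illustrative sketch (only the field names come from the lists above; every concrete value here is invented for the example), a successful run and an early validation failure might look like:

```json
{
  "status": "ok",
  "action": "report_weekly",
  "command": "uv run red-crawler report-weekly --days 7",
  "summary": "Exported the weekly report covering the last 7 days.",
  "artifacts": ["reports/weekly-growth-report.json", "reports/contactable_creators.csv"],
  "metrics": {},
  "next_step": "Review the generated artifacts.",
  "stdout": "...",
  "stderr": ""
}
```

```json
{
  "status": "error",
  "error_type": "missing_workspace",
  "message": "No pyproject.toml found at the given workspace_path.",
  "suggested_fix": "Run install_or_bootstrap to clone a fresh workspace."
}
```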