Marketplace extract
Extract content from specific URLs using Tavily's extraction API. Returns clean markdown/text from web pages. Use when you have specific URLs and need their content without writing code.
Install

source · Clone the upstream repo

```shell
git clone https://github.com/aiskillstore/marketplace
```

Claude Code · Install into ~/.claude/skills/

```shell
T=$(mktemp -d) && git clone --depth=1 https://github.com/aiskillstore/marketplace "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/tavily-ai/extract" ~/.claude/skills/aiskillstore-marketplace-extract-722f9b && rm -rf "$T"
```

manifest: skills/tavily-ai/extract/SKILL.md
Extract Skill
Extract clean content from specific URLs. Ideal when you know which pages you want content from.
Authentication
The script uses OAuth via the Tavily MCP server. No manual setup is required; on first run, it will:

- Check for existing tokens in ~/.mcp-auth/
- If none are found, automatically open your browser for OAuth authentication
Note: You must have an existing Tavily account. The OAuth flow only supports login — account creation is not available through this flow. Sign up at tavily.com first if you don't have an account.
Alternative: API Key
If you prefer using an API key, get one at https://tavily.com and add it to ~/.claude/settings.json:

```json
{
  "env": {
    "TAVILY_API_KEY": "tvly-your-api-key-here"
  }
}
```
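If you'd rather script the settings change, here is a minimal sketch, assuming `jq` is installed and that the `SETTINGS` path matches your setup (both are assumptions, not part of the skill itself):

```shell
# Sketch: merge a TAVILY_API_KEY into ~/.claude/settings.json with jq.
# The placeholder key is illustrative; replace it with your real key.
SETTINGS="$HOME/.claude/settings.json"
mkdir -p "$(dirname "$SETTINGS")"
[ -f "$SETTINGS" ] || echo '{}' > "$SETTINGS"

# .env += {...} preserves any other keys already present under "env".
jq '.env += {"TAVILY_API_KEY": "tvly-your-api-key-here"}' "$SETTINGS" > "$SETTINGS.tmp" \
  && mv "$SETTINGS.tmp" "$SETTINGS"
```

The `+=` merge means an existing `env` block keeps its other entries; only the Tavily key is added or overwritten.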
Quick Start
Using the Script
```shell
./scripts/extract.sh '<json>'
```
Examples:

```shell
# Single URL
./scripts/extract.sh '{"urls": ["https://example.com/article"]}'

# Multiple URLs
./scripts/extract.sh '{"urls": ["https://example.com/page1", "https://example.com/page2"]}'

# With query focus and chunks
./scripts/extract.sh '{"urls": ["https://example.com/docs"], "query": "authentication API", "chunks_per_source": 3}'

# Advanced extraction for JS pages
./scripts/extract.sh '{"urls": ["https://app.example.com"], "extract_depth": "advanced", "timeout": 60}'
```
Basic Extraction
```shell
curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": ["https://example.com/article"]
  }'
```
Multiple URLs with Query Focus
```shell
curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": [
      "https://example.com/ml-healthcare",
      "https://example.com/ai-diagnostics"
    ],
    "query": "AI diagnostic tools accuracy",
    "chunks_per_source": 3
  }'
```
API Reference
Endpoint
POST https://api.tavily.com/extract
Headers
| Header | Value |
|---|---|
| `Authorization` | `Bearer <your-api-key>` |
| `Content-Type` | `application/json` |
Request Body
| Field | Type | Default | Description |
|---|---|---|---|
| `urls` | array | Required | URLs to extract (max 20) |
| `query` | string | null | Reranks chunks by relevance |
| `chunks_per_source` | integer | 3 | Chunks per URL (1-5, requires `query`) |
| `extract_depth` | string | `basic` | `basic` or `advanced` (for JS pages) |
| `format` | string | `markdown` | `markdown` or `text` |
| `include_images` | boolean | false | Include image URLs |
| `timeout` | float | varies | Max wait (1-60 seconds) |
Response Format
```json
{
  "results": [
    {
      "url": "https://example.com/article",
      "raw_content": "# Article Title\n\nContent..."
    }
  ],
  "failed_results": [],
  "response_time": 2.3
}
```
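When scripting against the endpoint, it helps to split successes from failures. A minimal sketch using `jq` on a saved response (the `response.json` file name is illustrative; here it is seeded with the sample response above):

```shell
# Sketch: pull extracted content and failed URLs out of a saved response.
cat > response.json <<'EOF'
{
  "results": [
    { "url": "https://example.com/article", "raw_content": "# Article Title\n\nContent..." }
  ],
  "failed_results": [],
  "response_time": 2.3
}
EOF

# Print each successfully extracted page's content.
jq -r '.results[].raw_content' response.json

# List any failed URLs (empty here); handles both string and object entries.
jq -r '.failed_results[]? | .url? // .' response.json
```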
Extract Depth
| Depth | When to Use |
|---|---|
| `basic` | Simple text extraction, faster |
| `advanced` | Dynamic/JS-rendered pages, tables, structured data |
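The choice between depths can be made mechanically: if a `basic` request comes back with no usable content, retry with `advanced`. A sketch of that decision, using a canned response in place of a live call (the empty response simulates a JS-heavy page that `basic` couldn't read):

```shell
# Sketch: decide whether to retry with extract_depth=advanced, based on
# whether the basic response produced any raw_content.
# The canned response below stands in for a live /extract call.
resp='{"results": [], "failed_results": [{"url": "https://app.example.com"}], "response_time": 1.1}'

content=$(printf '%s' "$resp" | jq -r '.results[0].raw_content // empty')
if [ -z "$content" ]; then
  depth=advanced   # basic came back empty: escalate
else
  depth=basic      # basic worked; no need for the slower depth
fi
echo "retry with extract_depth=$depth"
```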
Examples
Single URL Extraction
```shell
curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": ["https://docs.python.org/3/tutorial/classes.html"],
    "extract_depth": "basic"
  }'
```
Targeted Extraction with Query
```shell
curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": [
      "https://example.com/react-hooks",
      "https://example.com/react-state"
    ],
    "query": "useState and useEffect patterns",
    "chunks_per_source": 2
  }'
```
JavaScript-Heavy Pages
```shell
curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": ["https://app.example.com/dashboard"],
    "extract_depth": "advanced",
    "timeout": 60
  }'
```
Batch Extraction
```shell
curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": [
      "https://example.com/page1",
      "https://example.com/page2",
      "https://example.com/page3",
      "https://example.com/page4",
      "https://example.com/page5"
    ],
    "extract_depth": "basic"
  }'
```
Tips
- Max 20 URLs per request; batch larger lists
- Use `query` + `chunks_per_source` to get only relevant content
- Try `basic` first, fall back to `advanced` if content is missing
- Set a longer `timeout` for slow pages (up to 60s)
- Check `failed_results` for URLs that couldn't be extracted
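Since the endpoint caps each request at 20 URLs, larger lists have to be split client-side. A minimal sketch (the `urls.txt` name and the generated example URLs are illustrative; the actual POST is left as a comment):

```shell
# Sketch: split a long URL list into request bodies of at most 20 URLs each.
# urls.txt is illustrative; here we generate 45 example URLs.
seq 1 45 | sed 's|^|https://example.com/page|' > urls.txt

# xargs -n 20 regroups the list, 20 URLs per output line.
xargs -n 20 < urls.txt > batches.txt

batch=0
while read -r line; do
  batch=$((batch + 1))
  # jq -R reads raw strings, -s collects them into the "urls" array.
  printf '%s\n' $line | jq -R . | jq -s '{urls: .}' > "batch_${batch}.json"
  # Each batch_N.json can then be POSTed to /extract with curl --data @batch_N.json
done < batches.txt
echo "built $batch request bodies"
```

With 45 input URLs this produces three bodies: two of 20 URLs and one of 5, each within the per-request limit.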