Learn-skills.dev flyscrape

Write and run web scraping scripts with `flyscrape`, the standalone CLI scraper with jQuery-like selectors. Use this skill when the user asks to scrape a site, extract data from HTML, follow pagination, crawl multiple pages, download files, use browser mode, configure selectors, or mentions `flyscrape`, a scraping script, nested scraping, or depth-controlled crawling.

install

source · Clone the upstream repo

git clone https://github.com/NeverSight/learn-skills.dev

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/NeverSight/learn-skills.dev "$T" && mkdir -p ~/.claude/skills && cp -r "$T/data/skills-md/aaronflorey/agent-skills/flyscrape" ~/.claude/skills/neversight-learn-skills-dev-flyscrape && rm -rf "$T"

manifest: data/skills-md/aaronflorey/agent-skills/flyscrape/SKILL.md

source content

Flyscrape

Flyscrape is a command-line web scraping tool that uses JavaScript scraping scripts. It's standalone (single binary), supports jQuery-like selectors, and can render JavaScript-heavy pages via headless browser.

Quick Reference

flyscrape new script.js    # Create new script from template
flyscrape dev script.js    # Dev mode: watch & re-run on changes (cached)
flyscrape run script.js    # Run the scraper
flyscrape run script.js --url "http://example.com" --depth 3  # Override config via CLI

Script Structure

Every script has two parts: config (controls behavior) and default function (extracts data).

export const config = {
  url: "https://example.com",
  // See references/config.md for all options
};

export default function({ doc, url, absoluteURL, scrape, follow }) {
  // doc - parsed HTML document with jQuery-like API
  // url - the current page URL
  // absoluteURL(path) - converts relative URLs to absolute
  // scrape(url, fn) - nested scraping of linked pages
  // follow(url) - manually follow a link (use with follow: [])
  
  return {
    title: doc.find("h1").text(),
    // Return object becomes JSON output
  };
}

Essential Config Options

Option	Default	Description
`url`	-	Starting URL
`urls`	`[]`	Multiple starting URLs
`depth`	`0`	How deep to follow links (0 = no following)
`follow`	`["a[href]"]`	CSS selectors for links to follow
`browser`	`false`	Enable headless Chromium for JS-heavy sites
`cache`	-	Set to `"file"` to cache requests
`rate`	-	Requests per minute limit
`concurrency`	-	Max concurrent requests

See

references/config.md

for complete configuration reference.

Query API (jQuery-like)

const el = doc.find(".selector");  // Find element(s)
el.text()                          // Get text content
el.html()                          // Get inner HTML
el.attr("href")                    // Get attribute
el.hasAttr("data-id")              // Check attribute exists
el.hasClass("active")              // Check class exists

// Collections
const items = doc.find("li");
items.length()                     // Count
items.first() / items.last()       // First/last element
items.get(0)                       // Element by index
items.map(el => el.text())         // Map to array
items.filter(el => el.hasClass("x")) // Filter elements

// Traversal
el.parent()                        // Parent element
el.children()                      // Direct children
el.siblings()                      // Sibling elements
el.prev() / el.next()              // Adjacent siblings
el.prevAll() / el.nextAll()        // All prev/next siblings
el.prevUntil("selector")           // Siblings until selector

See

references/query-api.md

for full API reference.

Common Patterns

Follow Pagination

export const config = {
  url: "https://example.com/posts",
  depth: 10,
  follow: [".pagination a.next"],
};

Scrape with Browser Mode (JS-heavy sites)

export const config = {
  url: "https://spa-site.com",
  browser: true,
  headless: true,
};

Nested Scraping (detail pages)

export default function({ doc, scrape, absoluteURL }) {
  const links = doc.find(".product-link");
  
  return {
    products: links.map(link => {
      const detailUrl = absoluteURL(link.attr("href"));
      return scrape(detailUrl, ({ doc }) => ({
        name: doc.find("h1").text(),
        price: doc.find(".price").text(),
      }));
    }),
  };
}

Download Files

import { download } from "flyscrape/http";

export default function({ doc, absoluteURL }) {
  doc.find("img").each(img => {
    download(absoluteURL(img.attr("src")), "images/");
  });
  return { downloaded: true };
}

Rate Limiting & Caching (be polite)

export const config = {
  url: "https://example.com",
  rate: 30,           // 30 requests/minute
  concurrency: 2,     // Max 2 concurrent
  cache: "file",      // Cache to scriptname.cache
};

Workflow

Create:
```
flyscrape new myscript.js
```
Develop:
```
flyscrape dev myscript.js
```
- iterates with cached responses
Run:
```
flyscrape run myscript.js
```
- full execution

Output:

flyscrape run myscript.js --output.file results.json

Troubleshooting Quick Tips

Problem	Solution
Getting blocked (403)	Add User-Agent header, reduce rate, use `browser: true`
Empty results	Check if site needs browser mode, verify selectors
Links not followed	Set `depth > 0` , check `follow` selectors
Slow performance	Increase `concurrency` , enable `cache: "file"`

See

references/troubleshooting.md

for detailed solutions.

Reference Files

```
references/config.md
```
- Complete configuration options
```
references/query-api.md
```
- Full Query API documentation
```
references/recipes.md
```
- Common patterns and code snippets
```
references/troubleshooting.md
```
- Problem solving guide
```
examples/
```
- Ready-to-use example scripts

External Resources

Documentation: https://flyscrape.com/docs/getting-started/
GitHub: https://github.com/philippta/flyscrape
Examples: https://github.com/philippta/flyscrape/tree/master/examples