Skills · cheerio
Install
Source · Clone the upstream repo
git clone https://github.com/TerminalSkills/skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/TerminalSkills/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/cheerio" ~/.claude/skills/terminalskills-skills-cheerio && rm -rf "$T"
Manifest: skills/cheerio/SKILL.md
Cheerio
Overview
Cheerio is a fast, lightweight HTML/XML parser for Node.js that implements a jQuery-like API. Unlike Puppeteer, it does not run a browser; it parses raw HTML strings, making it 100x faster and ideal for scraping server-rendered pages, parsing HTML files, and transforming HTML content. Pair it with fetch or axios for web scraping, or use it standalone for HTML processing.
Instructions
Step 1: Installation
npm install cheerio
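Cheerio ships both ESM and CommonJS builds. As a quick sanity check that the install worked (a minimal sketch, assuming an ESM project with "type": "module" or a .mjs file):

// check_install.js: load a one-line document and query it
import * as cheerio from 'cheerio' // CommonJS alternative: const cheerio = require('cheerio')

const $ = cheerio.load('<h1>hello</h1>')
console.log($('h1').text()) // prints "hello"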
Step 2: Parse HTML and Extract Data
// parse_html.js — Load HTML and extract structured data with CSS selectors
import * as cheerio from 'cheerio'

const html = `
<html>
  <body>
    <h1>Products</h1>
    <div class="product" data-id="1">
      <h2>Widget Pro</h2>
      <span class="price">$29.99</span>
      <a href="/products/widget-pro">Details</a>
    </div>
    <div class="product" data-id="2">
      <h2>Gadget Max</h2>
      <span class="price">$49.99</span>
      <a href="/products/gadget-max">Details</a>
    </div>
  </body>
</html>`

const $ = cheerio.load(html)

// Extract all products
const products = []
$('.product').each((i, el) => {
  products.push({
    id: $(el).attr('data-id'),
    title: $(el).find('h2').text().trim(),
    price: $(el).find('.price').text().trim(),
    link: $(el).find('a').attr('href'),
  })
})

console.log(products)
// [{ id: '1', title: 'Widget Pro', price: '$29.99', link: '/products/widget-pro' }, ...]
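A design note on the loop above: cheerio also supports the jQuery .map() idiom, where .get() unwraps the result into a plain array. The same extraction, as a sketch against the same html:

// Same data via .map().get() instead of .each() + push
const productsViaMap = $('.product')
  .map((i, el) => ({
    id: $(el).attr('data-id'),
    title: $(el).find('h2').text().trim(),
    price: $(el).find('.price').text().trim(),
    link: $(el).find('a').attr('href'),
  }))
  .get()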
Step 3: Web Scraping with Fetch
// scrape_site.js — Fetch a page and extract data
import * as cheerio from 'cheerio'

async function scrape(url) {
  const response = await fetch(url)
  const html = await response.text()
  const $ = cheerio.load(html)

  // Extract all links
  const links = []
  $('a[href]').each((i, el) => {
    links.push({
      text: $(el).text().trim(),
      href: $(el).attr('href'),
    })
  })

  // Extract meta tags
  const meta = {
    title: $('title').text(),
    description: $('meta[name="description"]').attr('content'),
    ogImage: $('meta[property="og:image"]').attr('content'),
  }

  return { links, meta }
}
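A usage sketch for the function above (the URL is a placeholder; top-level await assumes an ESM module). Note that fetch does not reject on HTTP error statuses, so checking response.ok inside scrape is a sensible addition; relative hrefs can be resolved with the WHATWG URL constructor:

// run_scrape.js: example caller; https://example.com is a placeholder
const url = 'https://example.com'
const { links, meta } = await scrape(url)

// Resolve relative hrefs (e.g. "/about") against the page URL
const absolute = links.map((l) => ({ ...l, href: new URL(l.href, url).href }))
console.log(meta.title, absolute.length)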
Step 4: Advanced Selectors and Traversal
// selectors.js — Complex CSS selectors and DOM traversal
import * as cheerio from 'cheerio'

const $ = cheerio.load(html) // `html` as defined in Step 2

// Attribute selectors
$('a[href^="https"]')        // links starting with https
$('img[src$=".png"]')        // PNG images
$('div[class*="product"]')   // divs with "product" in class

// Traversal
$('.product').first()        // first product
$('.product').last()         // last product
$('.product').eq(2)          // third product (0-indexed)
$('.price').parent()         // parent of each .price element
$('.product').children('h2') // direct h2 children
$('.product').find('.price') // descendants matching .price
$('.product').next()         // next sibling
$('.product').prev()         // previous sibling

// Filtering
$('.product').filter((i, el) => {
  const price = parseFloat($(el).find('.price').text().replace('$', ''))
  return price < 50
})

// Text and HTML
$('.product').first().text() // all text content, flattened
$('.product').first().html() // inner HTML
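One more traversal idiom cheerio shares with jQuery: passing a context as the second argument to $ scopes the search, equivalent to $(context).find(selector). A sketch using the product markup from Step 2:

// Scoped queries: search only inside the current .product
$('.product').each((i, el) => {
  const title = $('h2', el).text().trim()
  const price = $('.price', el).text().trim()
  console.log(title, price)
})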
Step 5: Table Extraction
// extract_table.js — Parse HTML tables into structured data
function extractTable($, tableSelector) {
  /**
   * Convert an HTML table to an array of objects using headers as keys.
   * Args:
   *   $: Cheerio instance
   *   tableSelector: CSS selector for the table element
   */
  const headers = []
  $(`${tableSelector} thead th`).each((i, el) => {
    headers.push($(el).text().trim())
  })

  const rows = []
  $(`${tableSelector} tbody tr`).each((i, tr) => {
    const row = {}
    $(tr).find('td').each((j, td) => {
      row[headers[j]] = $(td).text().trim()
    })
    rows.push(row)
  })

  return rows
}

// Usage
const tableData = extractTable($, '#pricing-table')
// [{ Plan: 'Free', Price: '$0', Users: '1' }, { Plan: 'Pro', Price: '$29', Users: '10' }]
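Not every table has a <thead>. A hedged variant of the helper that falls back to the first row's cells as headers (extractTableLoose is a hypothetical name, not part of the skill):

// extract_table_loose.js: fallback when <thead> is missing
function extractTableLoose($, tableSelector) {
  const rows = $(`${tableSelector} tr`).toArray()
  const headers = $(rows[0]).find('th, td')
    .map((i, el) => $(el).text().trim())
    .get()

  // Remaining rows become objects keyed by the header texts
  return rows.slice(1).map((tr) => {
    const row = {}
    $(tr).find('td').each((j, td) => {
      row[headers[j]] = $(td).text().trim()
    })
    return row
  })
}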
Step 6: HTML Transformation
// transform.js — Modify HTML content
import * as cheerio from 'cheerio'

const $ = cheerio.load(html) // `html` as defined in Step 2

// Add class
$('.product').addClass('featured')

// Remove elements
$('.ad-banner').remove()

// Replace content
$('h1').text('Updated Title')

// Wrap elements
$('.product').wrap('<section class="product-section"></section>')

// Add attributes
$('a').attr('target', '_blank')
$('img').attr('loading', 'lazy')

// Get modified HTML
const modifiedHtml = $.html()
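A serialization gotcha worth knowing: load() wraps fragments in <html><body> by default, so $.html() can return more markup than you passed in. load accepts a third isDocument argument to keep fragments bare, and $.html(selector) renders only the matched elements; a sketch:

// Fragment mode: no <html><body> wrapper is added
const $frag = cheerio.load('<div class="product">Widget</div>', null, false)
console.log($frag.html()) // '<div class="product">Widget</div>'

// From a full document, serialize only the matched elements
console.log($.html('.product'))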
Examples
Example 1: Build a price monitoring scraper
User prompt: "Scrape product prices from 5 competitor websites daily and save to a CSV. The sites are server-rendered (no JavaScript needed)."
The agent will:
- Use fetch + cheerio for each site (no browser overhead).
- Write site-specific selectors for product name, price, and availability.
- Parse prices into numbers and normalize currency (see the sketch after this list).
- Append results to a CSV with timestamps.
- Set up as a cron job for daily execution.
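A minimal sketch of the normalization and CSV steps; the regex, file path, and helper names are illustrative assumptions:

// save_prices.js: hypothetical price normalizer + CSV appender
import { appendFileSync } from 'node:fs'

function parsePrice(text) {
  // "$1,299.99" -> 1299.99; assumes US-style thousands/decimal separators
  return Number.parseFloat(text.replace(/[^0-9.]/g, ''))
}

function appendRow(site, name, priceText) {
  const line = `${new Date().toISOString()},${site},${JSON.stringify(name)},${parsePrice(priceText)}\n`
  appendFileSync('prices.csv', line) // prices.csv is a placeholder path
}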
Example 2: Extract and clean article content from HTML
User prompt: "I have 1,000 saved HTML pages from a blog. Extract just the article title, author, date, and body text from each, ignoring navigation, ads, and footers."
The agent will:
- Read each HTML file, load with cheerio.
- Extract content using article-specific selectors (article, .post-content, etc.).
- Strip HTML tags from the body and normalize whitespace (see the sketch after this list).
- Output structured JSON with title, author, date, and clean text.
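A sketch of the batch loop; the directory and the selectors (h1, .author, time, article) are assumptions to be tuned to the blog's real markup:

// extract_articles.js: hypothetical batch extractor for saved pages
import * as cheerio from 'cheerio'
import { readdirSync, readFileSync, writeFileSync } from 'node:fs'

const results = readdirSync('pages') // 'pages' is a placeholder directory
  .filter((f) => f.endsWith('.html'))
  .map((f) => {
    const $ = cheerio.load(readFileSync(`pages/${f}`, 'utf8'))
    return {
      title: $('h1').first().text().trim(),
      author: $('.author').first().text().trim(),
      date: $('time').first().attr('datetime'),
      // .text() drops tags; collapse the whitespace runs markup leaves behind
      body: $('article').text().replace(/\s+/g, ' ').trim(),
    }
  })

writeFileSync('articles.json', JSON.stringify(results, null, 2))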
Guidelines
- Use cheerio for server-rendered pages (where the HTML contains the data you need). For SPAs or JavaScript-rendered content, use Puppeteer instead.
- Cheerio does not execute JavaScript, fetch external resources, or render CSS — it only parses the HTML string you give it.
- Always call .trim() on extracted text; HTML often contains whitespace, newlines, and indentation that clutter results.
- Use .attr('href') and .attr('src') to get link and image URLs. Remember these may be relative; resolve them against the base URL.
- For large-scale scraping, cheerio is 100x faster than Puppeteer and uses negligible memory. It can process thousands of pages per second.
- Combine cheerio with fetch or axios for scraping, and add delays between requests to avoid overwhelming target servers (see the sketch after this list).
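A hedged sketch of the fetch-plus-delay pattern from the last guideline (the URLs and the 1-second delay are placeholders; top-level await assumes an ESM module):

// polite_fetch.js: sequential fetches with a pause between requests
import * as cheerio from 'cheerio'

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))
const urls = ['https://example.com/a', 'https://example.com/b'] // placeholders

for (const url of urls) {
  const res = await fetch(url)
  if (!res.ok) continue // skip failed pages instead of aborting the run
  const $ = cheerio.load(await res.text())
  console.log(url, $('title').text().trim())
  await sleep(1000) // be gentle with the target server
}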