Agent-skills gallery-scraper
Bulk download images from login-protected gallery websites using an attached browser session. Use when asked to scrape, download, or save images from authenticated gallery pages, extract full-size images from thumbnails, or batch download from multi-page galleries.
```bash
# Clone the full repo
git clone https://github.com/jdrhyne/agent-skills

# Or install just this skill into ~/.claude/skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/jdrhyne/agent-skills "$T" \
  && mkdir -p ~/.claude/skills \
  && cp -r "$T/clawdbot/gallery-scraper" ~/.claude/skills/jdrhyne-agent-skills-gallery-scraper \
  && rm -rf "$T"
```
`clawdbot/gallery-scraper/SKILL.md`

Gallery Scraper
Bulk download images from authenticated gallery websites via browser relay.
Safety Boundaries
- Do not access gallery sites or user accounts that the user has not explicitly attached and authorized.
- Do not download beyond the selected gallery, profile, or page range without confirmation.
- Do not store cookies, tokens, or hidden form values in local output files.
- Do not keep retrying blocked downloads indefinitely; surface rate limits or auth failures instead.
Prerequisites
- User must have Chrome with the OpenClaw Browser Relay extension installed
- User must be logged into the target site
- User must attach the browser tab (click relay toolbar button, badge ON)
Workflow
1. Attach Browser Tab
Ask user to:
- Log into the gallery site in Chrome
- Navigate to the target gallery/profile page
- Click the OpenClaw Browser Relay toolbar button (badge shows ON)
2. Discover Image URL Pattern
Most gallery sites store full-size URLs in data attributes. Common patterns:
```javascript
// Extract via browser evaluate
() => {
  // Try common patterns
  const patterns = [
    'img[data-max]',        // data-max attribute
    'img[data-src]',        // lazy-load pattern
    'img[data-full]',       // full-size pattern
    'a[data-lightbox] img', // lightbox galleries
    '.gallery-item img'     // generic gallery
  ];
  for (const sel of patterns) {
    const imgs = document.querySelectorAll(sel);
    if (imgs.length > 0) {
      return {
        selector: sel,
        count: imgs.length,
        sample: imgs[0].outerHTML.substring(0, 200)
      };
    }
  }
  return null;
}
```
3. Extract Full-Size URLs
Once pattern identified, extract all URLs:
```javascript
// For data-max pattern (common)
() => Array.from(document.querySelectorAll('img[data-max]'))
  .map(img => img.dataset.max)

// For thumbnail→full conversion (replace path segment)
() => Array.from(document.querySelectorAll('.gallery img'))
  .map(img => img.src.replace('/thumb/', '/full/'))
```
4. Handle Pagination
Check for multiple pages:
```javascript
() => {
  const pagination = document.querySelectorAll('.pagination a, [class*="page"] a');
  return Array.from(pagination).map(a => ({ text: a.textContent, href: a.href }));
}
```
Navigate to each page and collect URLs.
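A hedged alternative to navigating the attached tab, assuming the gallery pages are same-origin and rendered as plain HTML (no client-side rendering): fetch each pagination href and parse it off-screen. The `.pagination a` selector and `img[data-max]` attribute are the same assumptions as above.

```javascript
// Sketch: collect data-max URLs across pages without leaving the current page.
// Same-origin fetches send the session cookies automatically.
async () => {
  const pages = [location.href, ...Array.from(
    document.querySelectorAll('.pagination a')).map(a => a.href)];
  const all = new Set();
  for (const href of new Set(pages)) {            // dedupe repeated page links
    const html = await (await fetch(href, { credentials: 'include' })).text();
    const doc = new DOMParser().parseFromString(html, 'text/html');
    doc.querySelectorAll('img[data-max]').forEach(img => all.add(img.dataset.max));
  }
  return Array.from(all);
}
```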
4b. Batch scrape multiple galleries (iframe trick)
When you need multiple galleries quickly and can't automate CDP, you can load each gallery in a hidden iframe and extract its data-max URLs. This only works while the galleries are same-origin with the attached tab; `contentDocument` is null for cross-origin iframes.
```javascript
async () => {
  const urls = [
    'https://site.example/galleries/view/123',
    'https://site.example/galleries/view/456'
  ];
  const results = [];
  for (const url of urls) {
    // Load the gallery in an off-screen iframe
    const iframe = document.createElement('iframe');
    iframe.style.position = 'fixed';
    iframe.style.left = '-9999px';
    iframe.style.width = '800px';
    iframe.style.height = '600px';
    iframe.src = url;
    document.body.appendChild(iframe);
    await new Promise((resolve, reject) => {
      const t = setTimeout(() => reject(new Error('timeout load')), 20000);
      iframe.onload = () => { clearTimeout(t); resolve(); };
    });
    // Poll until images appear or 20 s pass
    const doc = iframe.contentDocument;
    const start = Date.now();
    let imgs = [];
    while (Date.now() - start < 20000) {
      imgs = Array.from(doc.querySelectorAll('img[data-max]')).map(i => i.dataset.max);
      if (imgs.length) break;
      await new Promise(r => setTimeout(r, 500));
    }
    results.push({ id: url.split('/').pop(), urls: imgs });
    iframe.remove();
  }
  return results;
}
```
5. Check CDN Access
Test if CDN requires authentication or just Referer:
```bash
# Test direct access
curl -I "CDN_URL" 2>/dev/null | head -3

# Test with Referer
curl -I -H "Referer: https://SITE_DOMAIN/" "CDN_URL" 2>/dev/null | head -3
```
6. Bulk Download
Collect the URLs into a text file, then parallel download:
```bash
# Create output directory
mkdir -p ~/Downloads/gallery_name
cd ~/Downloads/gallery_name

# Download with Referer header, at most 8 parallel jobs
# (wait -n requires bash 4.3+; use the Python fallback below otherwise)
while IFS= read -r url; do
  filename=$(basename "$url")
  curl -s -H "Referer: https://SITE_DOMAIN/" -o "$filename" "$url" &
  [ "$(jobs -r | wc -l)" -ge 8 ] && wait -n
done < urls.txt
wait
```
Python ThreadPool fallback (avoids shell quoting and `wait -n` portability issues):
```python
import os
import requests
from concurrent.futures import ThreadPoolExecutor

outdir = os.path.expanduser('~/Downloads/gallery_name')
os.makedirs(outdir, exist_ok=True)
headers = {'Referer': 'https://SITE_DOMAIN/', 'User-Agent': 'Mozilla/5.0'}

with open('urls.txt') as f:
    urls = [line.strip() for line in f if line.strip()]

def download(url):
    filename = os.path.join(outdir, os.path.basename(url))
    # Skip files that already downloaded successfully
    if os.path.exists(filename) and os.path.getsize(filename) > 0:
        return
    r = requests.get(url, headers=headers, timeout=60)
    r.raise_for_status()
    with open(filename, 'wb') as out:
        out.write(r.content)

with ThreadPoolExecutor(max_workers=8) as ex:
    for url in urls:
        ex.submit(download, url)
```
Handling Lock Buttons
Some galleries have "lock" buttons to reveal hidden content. Look for:
```javascript
// Find lock/unlock buttons
() => {
  const locks = document.querySelectorAll(
    '[class*="lock"], [class*="unlock"], ' +
    'button[title*="lock"], .premium-unlock'
  );
  return Array.from(locks).map(el => ({
    tag: el.tagName,
    class: el.className,
    text: el.innerText?.substring(0, 30)
  }));
}
```
Click each lock button before extracting URLs.
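A minimal click-through sketch, assuming the buttons found above reveal content in-page and that a short pause is enough for any reveal request to complete (both assumptions; tune the delay per site):

```javascript
// Sketch: click each candidate lock button, pausing so reveals can finish.
async () => {
  const locks = document.querySelectorAll(
    '[class*="unlock"], button[title*="lock"], .premium-unlock');
  for (const el of locks) {
    el.click();
    await new Promise(r => setTimeout(r, 500)); // assumed settle time
  }
  return locks.length; // how many buttons were clicked
}
```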
Output Organization
Optionally organize by gallery:
```bash
# Derive a gallery-specific folder name from the selected URL
mkdir -p "gallery_<id>"
```
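A sketch of one way to fill in `<id>`, assuming (as in the iframe example above) that the gallery id is the last path segment of the URL; the example URL is a placeholder:

```bash
url="https://site.example/galleries/view/123"  # placeholder URL
id="${url##*/}"                                # last path segment
mkdir -p "gallery_${id}"                       # e.g. gallery_123
```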
Troubleshooting
- 403 Forbidden: Add a Referer header or extract cookies from the browser (see the cookie sketch below)
- Rate limited: Reduce parallel downloads and add delays between requests
- Missing images: Check for JavaScript-loaded content; scroll injection may be needed (see the scroll sketch below)
- Login required for CDN: Extract session cookies via `document.cookie` in the attached tab
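Scroll injection, sketched under the assumption that the page lazy-loads content as it scrolls and that its height stops growing once everything has loaded:

```javascript
// Sketch: scroll down in steps until the page height stops changing,
// giving lazy-loaded images time to populate their data attributes.
async () => {
  let last = 0;
  while (document.body.scrollHeight !== last) {
    last = document.body.scrollHeight;
    window.scrollTo(0, last);
    await new Promise(r => setTimeout(r, 800)); // assumed lazy-load settle time
  }
  return last;
}
```

For cookie-protected CDNs, a hedged replay with curl; the cookie string comes from `document.cookie` in the attached tab and, per the safety boundaries above, should be pasted inline rather than saved to a file:

```bash
# Placeholders: CDN_URL, SITE_DOMAIN, and the pasted cookie string
curl -s -H "Referer: https://SITE_DOMAIN/" \
     -H "Cookie: PASTE_DOCUMENT_COOKIE_HERE" \
     -o image.jpg "CDN_URL"
```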