
install
source · Clone the upstream repo
git clone https://github.com/openclaw/skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/barryqin9999/save-article-with-images" ~/.claude/skills/clawdbot-skills-save-article-with-images && rm -rf "$T"
manifest: skills/barryqin9999/save-article-with-images/SKILL.md

Save Article with Images

Saves web articles to local storage, including articles that contain images. Automatically downloads the images, generates Markdown with local image references, and optionally converts the result to PDF.

Triggers

  • "save article"
  • "save this article"
  • "download article"
  • "clip article"

Quick Execution

Articles Without Images

1. Fetch article content (Jina Reader or browser)
2. Save to saved-articles/{title}-{date}.md
3. Send file to Feishu

Articles With Images

1. Create directory reports/{article-name}/
2. Create images/ subdirectory
3. Download all images to images/
4. Generate Markdown (relative path references)
5. Convert to PDF
6. Send PDF to Feishu

Complete Workflow

Step 1: Check if Article Has Images

Methods:

  • Jina Reader returns content containing ![Image](URL) references
  • Or the original webpage has <img> tags

Decision:

  • Images < 3 → Save Markdown directly, don't download images separately
  • Images ≥ 3 → Process with image workflow
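
The decision rule above can be sketched in Python as follows. This is a minimal illustration, not the skill's own implementation: the names `extract_image_urls` and `needs_image_workflow` are hypothetical, and counting only distinct URLs is an assumption the source does not state.

```python
import re

# Matches Markdown image syntax ![alt](url) and captures the URL.
IMAGE_PATTERN = re.compile(r"!\[[^\]]*\]\(([^)\s]+)[^)]*\)")

IMAGE_THRESHOLD = 3  # threshold taken from the decision rule above


def extract_image_urls(markdown: str) -> list[str]:
    """Return image URLs in order of appearance, without duplicates."""
    seen = []
    for url in IMAGE_PATTERN.findall(markdown):
        if url not in seen:
            seen.append(url)
    return seen


def needs_image_workflow(markdown: str) -> bool:
    """True when the article has at least IMAGE_THRESHOLD distinct images."""
    return len(extract_image_urls(markdown)) >= IMAGE_THRESHOLD
```

An article with two image references would be saved as plain Markdown; a third distinct image switches it to the image workflow.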

Step 2: Create Directory Structure

mkdir -p ~/.openclaw/workspace/reports/{article-name}/images/

Directory Structure:

reports/{article-name}/
├── {article-name}.md      # Markdown file
├── {article-name}.html    # HTML intermediate (optional)
├── {article-name}.pdf     # Final output (optional)
└── images/                # Image directory
    ├── image1.jpg
    ├── image2.png
    └── ...
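
For scripts that cannot shell out to mkdir, the same layout can be created with pathlib. This is a sketch; `create_article_dirs` and its `workspace` parameter are hypothetical names, and the workspace path is passed in rather than hard-coded.

```python
from pathlib import Path


def create_article_dirs(article_name: str, workspace: Path) -> Path:
    """Create reports/{article-name}/images/ under workspace.

    Returns the article directory (the parent of images/).
    """
    article_dir = workspace / "reports" / article_name
    # parents=True creates reports/ and the article directory as needed;
    # exist_ok=True makes the call safe to repeat.
    (article_dir / "images").mkdir(parents=True, exist_ok=True)
    return article_dir
```

Calling `create_article_dirs("my-article", Path.home() / ".openclaw" / "workspace")` would mirror the mkdir command above.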

Step 3: Fetch Article Content

Method A: Jina Reader (Recommended)

curl -s "https://r.jina.ai/URL"

Pros: Auto-converts to Markdown, extracts image links
Cons: Some sites block Jina Reader
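
The curl call above has a direct standard-library equivalent. The sketch below assumes only what the curl command shows, namely that the article URL is appended to https://r.jina.ai/; the function names and the 30-second timeout are illustrative choices.

```python
import urllib.request

JINA_PREFIX = "https://r.jina.ai/"


def jina_reader_url(article_url: str) -> str:
    """Prefix the article URL with the Jina Reader endpoint, as in the curl command."""
    return JINA_PREFIX + article_url


def fetch_markdown(article_url: str, timeout: float = 30.0) -> str:
    """Fetch the article as Markdown text.

    Raises urllib.error.URLError (or its subclass HTTPError) on failure,
    which is the signal to fall back to Method B.
    """
    with urllib.request.urlopen(jina_reader_url(article_url), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```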

Method B: Browser Fetch

# Open webpage
browser action=open url=URL

# Get content
browser action=act kind=evaluate fn='() => document.body.innerText'

# Get images
browser action=act kind=evaluate fn='() => {
  const imgs = document.querySelectorAll("img");
  return JSON.stringify(Array.from(imgs).map(img => ({
    src: img.src,
    alt: img.alt
  })));
}'

Step 4: Download Images

Single Image:

curl -o "images/image1.jpg" "https://example.com/image.jpg"

Batch Download (Python):

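import requests
from pathlib import Path
from urllib.parse import urlparse

def download_images(image_urls, output_dir):
    """Download each URL into output_dir as image1.ext, image2.ext, ..."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    for i, url in enumerate(image_urls, 1):
        try:
            # Take the extension from the URL path only, so a query string
            # such as ?v=1.2 cannot be mistaken for an extension
            ext = Path(urlparse(url).path).suffix.lstrip('.').lower()
            if ext not in ['jpg', 'jpeg', 'png', 'gif', 'webp']:
                ext = 'jpg'  # fallback when no known extension is present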
            
            # Download
            resp = requests.get(url, timeout=30, headers={
                'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
            })
            
            if resp.status_code == 200:
                filename = f"image{i}.{ext}"
                (output_dir / filename).write_bytes(resp.content)
                print(f"✅ {filename}")
            else:
                print(f"❌ HTTP {resp.status_code}: {url}")
        except Exception as e:
            print(f"❌ {e}: {url}")

# Usage
# download_images(['url1', 'url2'], 'images/')

Image Naming:

  • Sequential: image1.jpg, image2.png, ...
  • By content: cover.jpg, screenshot.png, ...

Step 5: Generate Markdown

Template:

# {Article Title}

> Source: {URL}
> Author: {author}
> Published: {date}

---

![Cover](images/image1.jpg)

{Content}

---

## Images

![Figure 1: {description}](images/image2.jpg)
![Figure 2: {description}](images/image3.png)

---

*Saved: {timestamp}*
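
Filling the template can be done with plain string formatting. The following is a minimal sketch: `render_article` is a hypothetical helper, the timestamp format is an assumption, and the optional "## Images" section is left out for brevity.

```python
from datetime import datetime


def render_article(title, url, author, published, content, cover=None, saved_at=None):
    """Fill the template above.

    cover is a relative path such as images/image1.jpg, or None for no cover.
    saved_at defaults to the current local time (format is an assumption).
    """
    saved_at = saved_at or datetime.now().strftime("%Y-%m-%d %H:%M")
    lines = [
        f"# {title}",
        "",
        f"> Source: {url}",
        f"> Author: {author}",
        f"> Published: {published}",
        "",
        "---",
        "",
    ]
    if cover:
        lines += [f"![Cover]({cover})", ""]
    lines += [content, "", "---", "", f"*Saved: {saved_at}*", ""]
    return "\n".join(lines)
```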

Image Reference Format:

![Description](images/filename.ext)
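
When the fetched Markdown still points at remote URLs, the references have to be rewritten to this relative form. One possible approach is sketched below, assuming the sequential image1, image2, ... naming from Step 4; `localize_images` is a hypothetical name, and the returned mapping tells a downloader which file name each URL should be saved under.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Groups: (1) "![alt](", (2) the URL, (3) optional title plus ")".
IMAGE_PATTERN = re.compile(r"(!\[[^\]]*\]\()([^)\s]+)([^)]*\))")
KNOWN_EXTS = {"jpg", "jpeg", "png", "gif", "webp"}


def localize_images(markdown: str):
    """Replace remote image URLs with images/imageN.ext.

    Returns (rewritten_markdown, {remote_url: local_relative_path}).
    """
    mapping = {}

    def replace(match):
        url = match.group(2)
        if not url.startswith(("http://", "https://")):
            return match.group(0)  # already a local reference, leave it alone
        if url not in mapping:
            ext = PurePosixPath(urlparse(url).path).suffix.lstrip(".").lower()
            if ext not in KNOWN_EXTS:
                ext = "jpg"  # same fallback as the batch downloader
            mapping[url] = f"images/image{len(mapping) + 1}.{ext}"
        return f"{match.group(1)}{mapping[url]}{match.group(3)}"

    return IMAGE_PATTERN.sub(replace, markdown), mapping
```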

Step 6: Convert to PDF (Optional)

Using Preset Styles:

# CSS file
CSS_FILE=~/.openclaw/workspace/templates/mobile-friendly.css

# Convert to HTML
pandoc {article-name}.md -o {article-name}.html --standalone --css=$CSS_FILE

# Generate PDF
weasyprint {article-name}.html {article-name}.pdf

PDF Configuration:

  • Body: 16pt, line-height 1.8
  • Page: 6×9 inches, margins 1.5cm
  • Font: Noto Sans CJK SC

⚠️ Image Overflow Solution (Important)

Problem: Large images (e.g., 1200px wide) exceed the PDF page width (432pt, i.e. 6 inches) and overflow the page.

Solution: Create a CSS file that limits image max-width.
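
The size of the overflow follows from the page configuration above (6-inch page width, 1.5cm margins). The short calculation below is a sketch that assumes an unscaled image is laid out at one CSS pixel per image pixel, with 1px defined as 1/96 inch; the function names are illustrative.

```python
PT_PER_INCH = 72.0
CM_PER_INCH = 2.54
PT_PER_CSS_PX = 0.75  # CSS defines 1px as 1/96 inch, i.e. 0.75pt


def content_width_pt(page_width_in: float, margin_cm: float) -> float:
    """Usable width in points after subtracting the left and right margins."""
    margin_pt = margin_cm / CM_PER_INCH * PT_PER_INCH
    return page_width_in * PT_PER_INCH - 2 * margin_pt


def overflows(image_width_px: int, page_width_in: float = 6.0, margin_cm: float = 1.5) -> bool:
    """True if an unscaled image is wider than the usable page width."""
    return image_width_px * PT_PER_CSS_PX > content_width_pt(page_width_in, margin_cm)
```

With these numbers the usable width is roughly 347pt, while a 1200px image occupies 900pt, so it overflows unless max-width constrains it.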

Required CSS:

/* Prevent image overflow */
img {
  max-width: 100%;
  height: auto;
  display: block;
  margin: 1em auto;
}

/* Images in images/ directory - 90% width */
img[src^="images/"] {
  max-width: 90%;
  margin: 0.5em auto;
}

/* Body styles */
body {
  max-width: 100%;
  padding: 1cm;
}

Correct PDF Generation Flow:

# 1. Create CSS file (in article directory)
cat > style.css << 'EOF'
img { max-width: 100%; height: auto; }
img[src^="images/"] { max-width: 90%; }
EOF

# 2. Generate HTML with CSS
pandoc {article-name}.md -o {article-name}.html --standalone --css=style.css

# 3. Generate PDF
weasyprint {article-name}.html {article-name}.pdf
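
The same flow can be driven from Python with subprocess. This sketch only wraps the two commands shown above and assumes pandoc and weasyprint are on PATH and that style.css already exists in the article directory; `build_pdf_commands` and `convert_to_pdf` are hypothetical names.

```python
import subprocess
from pathlib import Path


def build_pdf_commands(name: str, css: str = "style.css"):
    """Return the pandoc and weasyprint argument lists used in the flow above."""
    md, html, pdf = (f"{name}.{ext}" for ext in ("md", "html", "pdf"))
    return [
        ["pandoc", md, "-o", html, "--standalone", f"--css={css}"],
        ["weasyprint", html, pdf],
    ]


def convert_to_pdf(article_dir: Path, name: str) -> Path:
    """Run both commands inside article_dir so relative image paths resolve."""
    for cmd in build_pdf_commands(name):
        # check=True raises CalledProcessError if either tool fails.
        subprocess.run(cmd, cwd=article_dir, check=True)
    return article_dir / f"{name}.pdf"
```

Running with `cwd=article_dir` matters because both the CSS link and the images/ references are relative to the article directory.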

Key Points:

  • ✅ Must add max-width: 100% or max-width: 90%
  • ✅ Use relative paths such as images/xxx.jpg
  • ❌ Don't render images at original size (will overflow)

Step 7: Send to Feishu

Send Markdown:

message action=send channel=feishu target="user:ou_xxx" filePath="path/to/file.md"

Send PDF:

message action=send channel=feishu target="user:ou_xxx" filePath="path/to/file.pdf"

Platform-Specific Handling

| Source | Fetch Method | Image Handling |
|---|---|---|
| Twitter/X | Jina Reader | Download pbs.twimg.com images |
| WeChat Official Account | browser + Camoufox | Download mmbiz.qpic.cn images |
| General Webpages | Jina Reader | Download all img tags |
| Login Required Sites | browser | Manual screenshot by the user |
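
The table can be turned into a small lookup keyed by hostname. This is a sketch under stated assumptions: the article hostnames twitter.com and x.com are not named in the table and are supplied here for illustration, and login-required sites cannot be detected from the URL alone, so they are not covered.

```python
from urllib.parse import urlparse

# (fetch method, image host to download from); article hostnames are assumptions.
PLATFORM_RULES = {
    "twitter.com": ("jina", "pbs.twimg.com"),
    "x.com": ("jina", "pbs.twimg.com"),
    "mp.weixin.qq.com": ("browser", "mmbiz.qpic.cn"),
}
DEFAULT_RULE = ("jina", None)  # general webpages: Jina Reader, all img tags


def platform_rule(url: str):
    """Return (fetch_method, image_host) for an article URL."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return PLATFORM_RULES.get(host, DEFAULT_RULE)
```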

Twitter/X Articles

Image URL Format:

https://pbs.twimg.com/media/XXXXX?format=jpg&name=small

Download Command:

# Get best quality
curl -o "images/image1.jpg" "https://pbs.twimg.com/media/XXXXX?format=jpg&name=large"
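
Switching an image URL from name=small to name=large can be done by rewriting the query string rather than by string replacement. A minimal sketch, with `twitter_large` as a hypothetical helper name:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def twitter_large(url: str) -> str:
    """Return the same pbs.twimg.com URL with the name parameter set to large."""
    parts = urlsplit(url)
    # Drop any existing name=... value, keep the other parameters in order.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != "name"]
    query.append(("name", "large"))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Parsing the query keeps other parameters such as format=jpg intact and avoids accidentally replacing the text "small" elsewhere in the URL.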

WeChat Official Account Articles

Problem: WeChat uses anti-hotlinking protection, so direct image downloads fail.

Solutions:

  1. Open the article in the browser
  2. Save a screenshot
  3. Or use the Camoufox tool

# Use tool from agent-reach
cd ~/.agent-reach/tools/wechat-article-for-ai
python3 main.py "https://mp.weixin.qq.com/s/ARTICLE_ID"

Checklist

After saving, verify:

□ Markdown file generated
□ All images downloaded successfully
□ Image relative paths correct
□ Images display correctly (local preview)
□ PDF generated successfully (optional)
□ File sent to Feishu
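
The file-related items on this checklist can be checked automatically; visual preview and Feishu delivery still need a manual look. The sketch below assumes the reports/{article-name}/ layout from Step 2, and `check_article` is a hypothetical name.

```python
import re
from pathlib import Path

# Captures the path or URL of each Markdown image reference.
IMAGE_PATTERN = re.compile(r"!\[[^\]]*\]\(([^)\s]+)[^)]*\)")


def check_article(article_dir: Path, name: str) -> list[str]:
    """Return a list of problems; an empty list means the file checks passed."""
    md_path = article_dir / f"{name}.md"
    if not md_path.is_file():
        return [f"missing Markdown file: {md_path.name}"]

    problems = []
    for ref in IMAGE_PATTERN.findall(md_path.read_text(encoding="utf-8")):
        if ref.startswith(("http://", "https://")):
            problems.append(f"image still remote: {ref}")
        elif not (article_dir / ref).is_file():
            problems.append(f"image not found: {ref}")
        elif (article_dir / ref).stat().st_size == 0:
            problems.append(f"image is empty: {ref}")
    return problems
```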

Error Handling

| Error | Cause | Solution |
|---|---|---|
| Image download failed | Anti-hotlinking/Network | Use browser or lower quality |
| PDF generation failed | Missing fonts/dependencies | Check weasyprint installation |
| Markdown images not showing | Path error | Check relative paths |
| Jina Reader blocked | Site restriction | Use browser fetch |
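
The "Jina Reader blocked, use browser fetch" row describes a fallback chain, which can be expressed generically. In this sketch the fetchers are plain callables supplied by the caller; `fetch_with_fallback` is a hypothetical name and no specific fetch API is assumed.

```python
def fetch_with_fallback(url, fetchers):
    """Try each (name, fetch) pair in order and return (name, content).

    fetchers is a list of (name, callable) pairs; each callable takes the URL
    and either returns content or raises an exception.
    """
    errors = []
    for name, fetch in fetchers:
        try:
            return name, fetch(url)
        except Exception as exc:  # any failure moves on to the next method
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all fetch methods failed: " + "; ".join(errors))
```

Passing the Jina Reader fetcher first and the browser fetcher second reproduces the order recommended in Step 3.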

File Locations

| Type | Directory |
|---|---|
| Simple articles | saved-articles/{title}-{date}.md |
| Articles with images | reports/{article-name}/ |
| Temporary files | /tmp/article-{id}/ |
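
Building the saved-articles/{title}-{date}.md path requires a title that is safe as a file name. The sketch below makes two assumptions the source does not specify: the date is written in ISO format (YYYY-MM-DD), and characters that are invalid in common file systems, along with whitespace, are collapsed to hyphens.

```python
import re
from datetime import date
from pathlib import Path


def simple_article_path(title: str, saved_on: date, base: Path = Path("saved-articles")) -> Path:
    """Return base/{title}-{date}.md with an assumed file-name-safe title."""
    # Collapse runs of reserved characters and whitespace into one hyphen.
    safe = re.sub(r'[\\/:*?"<>|\s]+', "-", title).strip("-")
    return base / f"{safe}-{saved_on.isoformat()}.md"
```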

Skill Version: 1.0.0
Created: 2026-03-17