Claude-skill-registry hf-papers-reporter
Generate Word reports from Hugging Face Daily Papers. Downloads top papers, extracts abstracts and introductions from PDFs, extracts figures, and compiles everything into a formatted Word document with cover images. Use when user asks for 'HF daily papers', 'Hugging Face papers report', 'download papers and make a summary', or any request to fetch, analyze, and document papers from huggingface.co/papers.
install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/hf-papers-reporter" ~/.claude/skills/majiayu000-claude-skill-registry-hf-papers-reporter && rm -rf "$T"
manifest: skills/data/hf-papers-reporter/SKILL.md
Hugging Face Daily Papers Reporter
Generate professional Word reports from Hugging Face Daily Papers with full text extraction and image capture.
What This Skill Does
- Scrapes huggingface.co/papers for the top papers
- Downloads PDFs from arXiv
- Extracts Abstract and Introduction sections
- Extracts figures/images from PDFs
- Generates a formatted Word document (.docx) with:
  - Paper titles and arXiv links
  - Cover images from HF
  - Full abstracts
  - Introduction sections
  - Extracted figures from papers
Quick Start
Run the main script to generate today's report:
cd /path/to/hf-papers-reporter
python3 scripts/process_papers.py
Output will be saved to output/HF_Daily_Papers_Report.docx.
Dependencies
Install required packages:
pip3 install PyMuPDF python-docx Pillow beautifulsoup4 requests
How It Works
Step 1: Fetch Paper List
- Scrapes huggingface.co/papers
- Extracts arXiv IDs, titles, and cover image URLs
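The ID extraction in Step 1 can be sketched roughly as follows. This is a minimal illustration, not the code in scripts/process_papers.py: it uses the standard-library HTML parser in place of BeautifulSoup so it runs without extra packages, and it assumes that paper links have the form /papers/<arXiv ID>. The names PaperLinkParser and extract_arxiv_ids, and the sample markup, are hypothetical.

```python
import re
from html.parser import HTMLParser

# Assumption: paper links look like href="/papers/2401.12345",
# optionally followed by a fragment or query string.
ARXIV_HREF = re.compile(r"^/papers/(\d{4}\.\d{4,5})(?:[#?].*)?$")


class PaperLinkParser(HTMLParser):
    """Collects unique arXiv IDs from <a href="/papers/..."> links, in page order."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        match = ARXIV_HREF.match(href)
        # A paper is usually linked more than once (cover image, title), so deduplicate.
        if match and match.group(1) not in self.ids:
            self.ids.append(match.group(1))


def extract_arxiv_ids(html, limit=10):
    """Return at most `limit` arXiv IDs found in the given HTML string."""
    parser = PaperLinkParser()
    parser.feed(html)
    return parser.ids[:limit]


# Hypothetical markup standing in for the downloaded page.
sample = """
<a href="/papers/2401.12345"><img src="https://example.invalid/cover.png"></a>
<a href="/papers/2401.12345">A Hypothetical Paper</a>
<a href="/papers/2402.00001#community">Another One</a>
<a href="/login">Log in</a>
"""
print(extract_arxiv_ids(sample))
```

In a real run the HTML would come from an HTTP request to huggingface.co/papers. The page markup may change over time, so the link pattern should be checked against the live page.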
Step 2: Download & Process (per paper)
Download PDF from arxiv.org/pdf/{id}.pdf
    ↓
Extract text (first 5 pages)
    - Abstract (regex match)
    - Introduction (regex match)
    ↓
Extract images (first 5 pages, max 3 per page)
    - Compress to 600x400
    ↓
Download cover image from HF CDN
    - Compress to 800x600
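The regex-based section extraction could look something like the sketch below. It works on text that has already been pulled out of the PDF. The patterns, the fallback order, and the function names are assumptions for illustration and may differ from what the script uses.

```python
import re


def pdf_url(arxiv_id):
    """Build the download URL described above."""
    return f"https://arxiv.org/pdf/{arxiv_id}.pdf"


# Assumed patterns, tried in order until one matches.
ABSTRACT_PATTERNS = [
    # "Abstract" heading up to the Introduction heading.
    re.compile(r"Abstract\s*[:.\-]?\s*(.+?)(?=\n\s*(?:1\.?\s+)?Introduction\b)", re.S | re.I),
    # Fallback: "Abstract" heading up to the first blank line.
    re.compile(r"Abstract\s*[:.\-]?\s*(.+?)(?=\n\s*\n)", re.S | re.I),
]

INTRO_PATTERNS = [
    # "1 Introduction" (or just "Introduction") up to a "2 ..." heading or end of text.
    re.compile(
        r"(?:^|\n)\s*(?:1\.?\s+)?Introduction[ \t]*\n(.+?)(?=\n\s*2\.?\s+\S|\Z)",
        re.S | re.I,
    ),
]


def first_match(patterns, text):
    """Return the first captured section with whitespace collapsed, or None."""
    for pattern in patterns:
        match = pattern.search(text)
        if match:
            return " ".join(match.group(1).split())
    return None


def extract_abstract(text):
    return first_match(ABSTRACT_PATTERNS, text)


def extract_introduction(text):
    return first_match(INTRO_PATTERNS, text)


# Hypothetical page text standing in for the first pages of a PDF.
sample_text = (
    "A Hypothetical Paper\n"
    "Abstract\n"
    "We study widgets.\nResults are good.\n"
    "1 Introduction\n"
    "Widgets matter.\nThey are everywhere.\n"
    "2 Related Work\n"
    "Prior work exists.\n"
)
print(extract_abstract(sample_text))
print(extract_introduction(sample_text))
```

Trying several patterns in order matches the "Multiple regex patterns tried" approach listed under Known Issues. A function that returns None makes the "No abstract found" case explicit for the caller.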
Step 3: Generate Word Document
- Title page with report name and date
- Each paper as a section with:
  - Cover image (centered)
  - Abstract section
  - Introduction section
  - Extracted figures (up to 4)
Output Structure
hf_papers/
├── pdfs/      # Downloaded PDFs
├── images/    # Cover images + extracted figures
└── output/
    ├── HF_Daily_Papers_Report.docx
    └── papers_data.json
Known Issues & Solutions
| Issue | Cause | Fix |
|---|---|---|
| XML encoding error | PDF text contains control characters | Script auto-cleans 0x00-0x1F chars |
| No abstract found | PDF structure varies | Multiple regex patterns tried |
| Large PDFs | Some papers are 20MB+ | Only first 5 pages processed |
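The control-character cleanup in the first row can be sketched as below. One deliberate difference, flagged as an assumption: XML 1.0 allows tab, line feed, and carriage return, so this sketch keeps those three and strips only the remaining characters in the 0x00-0x1F range. The script itself may strip the full range. The name clean_for_docx is hypothetical.

```python
import re

# C0 control characters that XML 1.0 does not allow.
# Tab (0x09), LF (0x0A) and CR (0x0D) are legal and are kept (assumption, see above).
_ILLEGAL_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def clean_for_docx(text):
    """Remove control characters that would break the document's XML."""
    return _ILLEGAL_XML_CHARS.sub("", text)


print(repr(clean_for_docx("a\x00b\x0cc\n\td")))
```

Applying the cleanup to every string before it is written into the Word document avoids the encoding error regardless of which PDF the text came from.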
Customization
To change the number of papers (default: 10), edit the PAPERS list in scripts/process_papers.py.
To change image sizes, modify the thumbnail() calls in the script.
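When picking new sizes, note that Pillow's thumbnail() treats the size as a bounding box: it preserves the aspect ratio and never enlarges the image. So 600x400 is an upper bound, not an exact output size. The arithmetic is roughly as sketched below. The helper fit_within is hypothetical, and Pillow's own rounding may differ by a pixel.

```python
def fit_within(size, box):
    """Approximate the size an image ends up with when fitted into a bounding box.

    The aspect ratio is preserved and the image is never scaled up.
    """
    width, height = size
    max_w, max_h = box
    scale = min(max_w / width, max_h / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within((1200, 600), (600, 400)))   # wide figure, limited by width
print(fit_within((1000, 2000), (600, 400)))  # tall figure, limited by height
print(fit_within((300, 200), (800, 600)))    # already small enough, unchanged
```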