Skills PDF OCR using Gemini LLM
Extract text from PDFs using Google Gemini OCR. Use when extracting text from PDFs, performing OCR on scanned documents, or processing image-based PDFs.
install
source · Clone the upstream repo
git clone https://github.com/openclaw/skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/ashtonizmev/geminipdfocr" ~/.claude/skills/clawdbot-skills-pdf-ocr-using-gemini-llm && rm -rf "$T"
manifest: skills/ashtonizmev/geminipdfocr/SKILL.md
Purpose
Use geminipdfocr to extract text from PDF documents via OCR (Google Gemini).
Data and privacy
Full page images/files are sent to Google's API. PDFs are split into single-page files and each page is uploaded to Google Gemini for OCR. There are no hidden exfiltration endpoints or other data collection. Do not use with highly sensitive documents unless you accept that content is sent to Google.
Setup (venv installation)
Before first use, create and activate the virtual environment:
cd geminipdfocr && python -m venv venv && source venv/bin/activate && pip install -r requirements.txt
Set GOOGLE_API_KEY in your environment before running (e.g. export GOOGLE_API_KEY=your-key).
How to use
When requested to extract text or perform OCR on a PDF:
- Run: cd geminipdfocr && source venv/bin/activate && python -m geminipdfocr <path-to-pdf> [--json] [--output <file>]
- Use --json for structured data.
- Use --max-pages N for testing or very long documents.
- Use --quiet to suppress progress logs.
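For callers that drive the skill from a script rather than a shell, the invocation above could be wrapped roughly as follows. This is a minimal sketch, not part of the skill itself: build_ocr_command and run_ocr are hypothetical helper names, and the sketch assumes the interpreter running it is the one from the activated venv, so that python -m geminipdfocr resolves. Only the flags documented on this page are used.

```python
import subprocess
import sys
from typing import List, Optional


def build_ocr_command(
    pdf_path: str,
    as_json: bool = False,
    max_pages: Optional[int] = None,
    output: Optional[str] = None,
    quiet: bool = False,
) -> List[str]:
    """Assemble the argument list for `python -m geminipdfocr` (hypothetical helper)."""
    # sys.executable is assumed to be the venv's interpreter.
    cmd = [sys.executable, "-m", "geminipdfocr", pdf_path]
    if as_json:
        cmd.append("--json")
    if max_pages is not None:
        cmd += ["--max-pages", str(max_pages)]
    if output is not None:
        cmd += ["--output", output]
    if quiet:
        cmd.append("--quiet")
    return cmd


def run_ocr(pdf_path: str, cwd: Optional[str] = None, **options) -> str:
    """Run the CLI and return whatever it prints on stdout."""
    result = subprocess.run(
        build_ocr_command(pdf_path, **options),
        cwd=cwd,  # e.g. the geminipdfocr directory, mirroring the `cd` above
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems with PDF paths that contain spaces. If cwd is set, a relative pdf_path is resolved against that directory, so an absolute path is the safer choice.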
Requirements
- A valid PDF file path.
- GOOGLE_API_KEY set in the process environment (e.g. export GOOGLE_API_KEY=your-key).
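Both requirements can be checked before any page is uploaded. The sketch below is one possible pre-flight check under stated assumptions: preflight_problems is a hypothetical helper, not something the skill provides, and the PDF test is only a heuristic that looks for the %PDF- marker near the start of the file. It does not prove the file is a well-formed PDF or that the key is accepted by Google.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional


def preflight_problems(
    pdf_path: str, env: Optional[Mapping[str, str]] = None
) -> List[str]:
    """Return human-readable problems; an empty list means both checks passed."""
    env = os.environ if env is None else env
    problems = []

    path = Path(pdf_path)
    if not path.is_file():
        problems.append(f"not a file: {pdf_path}")
    else:
        # Heuristic only: PDF files carry a "%PDF-" marker near the start.
        with path.open("rb") as handle:
            if b"%PDF-" not in handle.read(1024):
                problems.append(f"no %PDF- header found: {pdf_path}")

    # An unset or empty key is reported the same way.
    if not env.get("GOOGLE_API_KEY"):
        problems.append("GOOGLE_API_KEY is not set")

    return problems
```

A caller might print the returned list and exit early when it is non-empty, which keeps a missing key from surfacing only after the PDF has been split.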
CLI options
| Option | Description |
|---|---|
| <path-to-pdf> | One or more PDF file paths (positional) |
| --max-pages N | Limit pages per PDF |
| --json | Output structured JSON instead of plain text |
| --output <file> | Write result to file (default: stdout) |
| --quiet | Suppress INFO/DEBUG logs |
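To make the option semantics concrete, here is how a command line with these options could be declared with argparse. This is an illustrative sketch only and is not taken from the skill's source: make_parser, the pdfs destination name, and the metavar strings are assumptions, and the real CLI may define its options differently.

```python
import argparse


def make_parser() -> argparse.ArgumentParser:
    """Build an illustrative parser mirroring the options described on this page."""
    parser = argparse.ArgumentParser(prog="geminipdfocr")
    # nargs="+" accepts one or more positional PDF paths.
    parser.add_argument("pdfs", nargs="+", metavar="PDF",
                        help="One or more PDF file paths")
    parser.add_argument("--max-pages", type=int, metavar="N",
                        help="Limit pages per PDF")
    parser.add_argument("--json", action="store_true",
                        help="Output structured JSON instead of plain text")
    # Leaving --output unset (None) stands for writing to stdout.
    parser.add_argument("--output", metavar="FILE",
                        help="Write result to file (default: stdout)")
    parser.add_argument("--quiet", action="store_true",
                        help="Suppress INFO/DEBUG logs")
    return parser


if __name__ == "__main__":
    args = make_parser().parse_args()
    print(args)
```

In this sketch --max-pages applies to each PDF separately, matching the "per PDF" wording in the table.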