OpenSpace pdf-verification
Verify PDF page counts and file integrity programmatically using PyPDF2 after generation
install
source · Clone the upstream repo
git clone https://github.com/HKUDS/OpenSpace
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/HKUDS/OpenSpace "$T" && mkdir -p ~/.claude/skills && cp -r "$T/gdpval_bench/skills/pdf-verification" ~/.claude/skills/hkuds-openspace-pdf-verification && rm -rf "$T"
manifest: gdpval_bench/skills/pdf-verification/SKILL.md
PDF Verification Skill
Purpose
After generating PDF files, always verify page counts and file integrity programmatically before declaring task completion. This catches formatting issues, empty pages, and corrupted files early.
When to Use
- After any PDF generation task
- When specific page counts are required
- Before marking a PDF-related task as complete
Verification Steps
1. Install PyPDF2 (if needed)
pip install PyPDF2
2. Verify Page Count and Integrity
Use run_shell to execute a Python verification script:
python -c "
from PyPDF2 import PdfReader
import sys

def verify_pdf(filepath, expected_pages=None):
    try:
        reader = PdfReader(filepath)
        actual_pages = len(reader.pages)
    except Exception as e:
        print(f'✗ Error reading {filepath}: {e}')
        return False
    # A PDF with no pages is treated as empty or corrupted
    if actual_pages == 0:
        print(f'✗ Empty PDF: {filepath}')
        return False
    if expected_pages is not None and actual_pages != expected_pages:
        print(f'✗ Page count mismatch: expected {expected_pages}, got {actual_pages}')
        return False
    # Report success only after every check has passed
    print(f'✓ {filepath}: {actual_pages} pages')
    return True

# Verify each PDF
results = []
results.append(verify_pdf('output.pdf', expected_pages=2))
print(f'All checks passed: {all(results)}')
sys.exit(0 if all(results) else 1)
"
3. Verify Multiple PDFs
For tasks with multiple PDFs, verify each one:
python -c "
from PyPDF2 import PdfReader
import sys

# Map each file to its required page count
pdfs = {
    'listings.pdf': 2,
    'map.pdf': 1,
    'summary.pdf': 1
}

all_passed = True
for filepath, expected in pdfs.items():
    try:
        reader = PdfReader(filepath)
        actual = len(reader.pages)
        status = '✓' if actual == expected else '✗'
        print(f'{status} {filepath}: {actual}/{expected} pages')
        if actual != expected:
            all_passed = False
    except Exception as e:
        print(f'✗ {filepath}: {e}')
        all_passed = False

print(f'Verification: {\"PASSED\" if all_passed else \"FAILED\"}')
# Exit non-zero on failure so the calling shell can detect it
sys.exit(0 if all_passed else 1)
"
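The loop above can also be packaged as a function that returns its failures instead of printing them, which makes it easier to reuse and to test. The following is a minimal sketch, not part of the original skill: the names check_manifest and pypdf2_page_count are hypothetical, and the page counter is passed in as a parameter so the comparison logic can be exercised without real PDF files.

```python
def pypdf2_page_count(filepath):
    """Count pages with PyPDF2 (imported lazily, so this module loads without it)."""
    from PyPDF2 import PdfReader
    return len(PdfReader(filepath).pages)


def check_manifest(expected, count_pages=pypdf2_page_count):
    """Compare actual page counts against expected ones.

    expected: dict mapping file path to required page count.
    count_pages: callable returning the page count for a path (hypothetical hook).
    Returns a list of failure messages; an empty list means every file passed.
    """
    failures = []
    for filepath, want in expected.items():
        try:
            got = count_pages(filepath)
        except Exception as exc:
            # Unreadable or missing files are reported, not raised
            failures.append(f"{filepath}: unreadable ({exc})")
            continue
        if got != want:
            failures.append(f"{filepath}: {got} pages, expected {want}")
    return failures
```

A caller might run check_manifest({'listings.pdf': 2, 'map.pdf': 1}) and treat any non-empty result as a failed verification.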
4. Check File Size (Optional)
Add file size validation to catch empty or near-empty files:
python -c "
import os
from PyPDF2 import PdfReader

filepath = 'output.pdf'
min_size = 1000  # minimum bytes

file_size = os.path.getsize(filepath)
if file_size < min_size:
    print(f'✗ File too small: {file_size} bytes')
else:
    reader = PdfReader(filepath)
    print(f'✓ {filepath}: {len(reader.pages)} pages, {file_size} bytes')
"
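A size check can be combined with a look at the file's structural markers before any PDF library is involved. The sketch below is an illustrative addition that uses only the standard library. The function name quick_pdf_precheck and the 1024-byte search windows are assumptions. It looks for the %PDF- header near the start of the file and the %%EOF marker near the end. Passing it does not prove the file is valid, so it complements the PyPDF2 checks and does not replace them.

```python
import os


def quick_pdf_precheck(filepath, min_size=1000):
    """Cheap heuristic checks; returns (ok, reason)."""
    if not os.path.isfile(filepath):
        return False, "file not found"
    size = os.path.getsize(filepath)
    if size < min_size:
        return False, f"file too small: {size} bytes"
    with open(filepath, "rb") as fh:
        head = fh.read(1024)
        # Jump to the last 1024 bytes (or the start, for short files)
        fh.seek(max(0, size - 1024))
        tail = fh.read()
    if b"%PDF-" not in head:
        return False, "missing %PDF- header"
    if b"%%EOF" not in tail:
        return False, "missing %%EOF marker"
    return True, f"{size} bytes, header and end marker present"
```

For example, ok, reason = quick_pdf_precheck('output.pdf') gives a quick answer that can be logged before the slower page-count check runs.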
Best Practices
- Verify immediately after generation - Don't wait until the end of the task
- Check all required PDFs - Verify each file meets specifications
- Fail fast - If verification fails, regenerate before proceeding
- Log results - Print verification results for debugging
- Set reasonable minimums - Use file size checks to catch empty outputs
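The "verify immediately" and "fail fast" practices can be combined into a small regenerate-and-verify loop. This is a sketch under stated assumptions, not something the skill prescribes: generate and verify are hypothetical callables supplied by the caller (one writes the PDF, the other returns True when it passes), and the limit of three attempts is arbitrary.

```python
def generate_with_verification(generate, verify, max_attempts=3):
    """Run generate(), then verify(); repeat until verification passes.

    generate: hypothetical callable that writes the PDF.
    verify: hypothetical callable returning True when the PDF is acceptable.
    Returns the 1-based attempt number that passed.
    Raises RuntimeError if every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        generate()
        if verify():
            print(f"attempt {attempt}: verification passed")
            return attempt
        # Log each failure so the run can be debugged afterwards
        print(f"attempt {attempt}: verification failed")
    raise RuntimeError(f"PDF still failing after {max_attempts} attempts")
```

Raising an exception once the attempts are exhausted keeps a failed PDF from being silently treated as complete.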
Common Issues Caught
- Wrong page counts (extra blank pages, missing content)
- Corrupted or unreadable PDF files
- Empty PDFs (0 pages)
- Files that appear generated but contain no actual content
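Page counts alone do not reveal pages that exist but carry no content. One possible extension, sketched here as an assumption and not as part of the original skill, is to inspect the text extracted from each page. The helper find_blank_pages is hypothetical and takes already-extracted strings, so it runs without any PDF library. Pages that contain only images will also be reported as blank, so treat the result as a hint, not a verdict.

```python
def find_blank_pages(page_texts, min_chars=1):
    """Return 1-based numbers of pages whose extracted text is effectively empty.

    page_texts: iterable of strings (or None), one entry per page.
    min_chars: minimum count of non-whitespace-trimmed characters to count as content.
    """
    blank = []
    for number, text in enumerate(page_texts, start=1):
        # Treat None as empty and ignore surrounding whitespace
        if len((text or "").strip()) < min_chars:
            blank.append(number)
    return blank
```

With PyPDF2, the input could be built as [page.extract_text() for page in PdfReader('output.pdf').pages], and a non-empty result would flag the listed pages for manual review.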
Example Task Completion Check
# Final verification before marking task complete
python -c "
from PyPDF2 import PdfReader
import sys

required = {'listings.pdf': 2, 'map.pdf': 1}
passed = True

for f, pages in required.items():
    try:
        actual = len(PdfReader(f).pages)
        if actual != pages:
            print(f'FAIL: {f} has {actual} pages, expected {pages}')
            passed = False
    except Exception as e:
        print(f'FAIL: Cannot read {f}: {e}')
        passed = False

if passed:
    print('SUCCESS: All PDFs verified')
    sys.exit(0)
else:
    sys.exit(1)
"