OpenSpace pdf-verification-check
Verify generated PDFs for integrity and page count using PyPDF2 before task completion.
install
source · Clone the upstream repo
git clone https://github.com/HKUDS/OpenSpace
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/HKUDS/OpenSpace "$T" && mkdir -p ~/.claude/skills && cp -r "$T/gdpval_bench/skills/pdf-verification-check" ~/.claude/skills/hkuds-openspace-pdf-verification-check && rm -rf "$T"
manifest:
gdpval_bench/skills/pdf-verification-check/SKILL.mdsource content
PDF Verification Check
Use this skill whenever your task involves generating PDF files. Do not declare task completion until you have programmatically verified that the PDFs exist, are not corrupted, and match the expected page counts.
When to Use
- Immediately after generating one or more PDF files.
- Before reporting success or moving to the next phase of a task.
- When specific page counts are required (e.g., "1-page summary", "2-page listing").
Verification Steps
- Identify Expected Outputs: List the file paths and their expected page counts.
- Run Verification Script: Use
to execute a Python script usingrun_shell
.PyPDF2 - Validate Results: Ensure the script exits successfully and reports the correct page counts.
- Remediate: If verification fails, inspect the generation logic and regenerate.
Verification Script
Use the following Python snippet with
run_shell to verify PDFs. Adjust the expected_pages dictionary to match your task requirements.
import sys from PyPDF2 import PdfReader def verify_pdfs(files): """ files: dict mapping filepath to expected_page_count (or None if any count is ok) """ all_valid = True for path, expected_count in files.items(): try: reader = PdfReader(path) actual_count = len(reader.pages) if expected_count is not None and actual_count != expected_count: print(f"FAIL: {path} has {actual_count} pages, expected {expected_count}") all_valid = False else: print(f"OK: {path} has {actual_count} pages") except Exception as e: print(f"ERROR: Cannot read {path} - {str(e)}") all_valid = False return 0 if all_valid else 1 # Example usage - modify this map for your specific task files_to_check = { "output/listing.pdf": 2, "output/map.pdf": 1 } sys.exit(verify_pdfs(files_to_check))
Integration Example
When using
run_shell, pass the script as a heredoc or save it temporarily:
python3 -c " import sys from PyPDF2 import PdfReader # ... insert logic here ... "
Rules
- Always Verify: Never assume a PDF generation tool succeeded without checking the output file.
- Check Integrity: Successfully opening the file confirms it is not corrupted/empty.
- Check Counts: Page count mismatches often indicate formatting errors or missing content.
- Fail Fast: If verification fails, attempt to fix the generation code immediately rather than proceeding.
Dependencies
Ensure
PyPDF2 is available in the environment. If not, install it first:
pip install PyPDF2