Claude-skill-registry fixture-tricky
install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/fixture-tricky" ~/.claude/skills/majiayu000-claude-skill-registry-fixture-tricky && rm -rf "$T"
manifest: skills/data/fixture-tricky/SKILL.md
Fixture Tricky Skill
Generate adversarial PDF content designed to expose extractor bugs. Continuously extensible as new edge cases are discovered from real-world PDFs.
Why This Exists
Real-world PDFs contain patterns that reliably break extractors:
- Text that Camelot/Marker falsely detect as tables
- Tables corrupted by Word/PDF conversions
- Text with ligatures, special characters, mixed directions
- Layout patterns that confuse section detection
This skill creates reproducible test cases for these issues.
Quick Start
cd .pi/skills/fixture-tricky

# Generate false-positive table content
uv run generate.py false-tables --output false_tables.pdf

# Generate malformed/corrupted tables
uv run generate.py malformed-tables --output malformed.pdf

# Generate text extraction nightmares
uv run generate.py cursed-text --output cursed.pdf

# Generate layout traps
uv run generate.py layout-traps --output layout.pdf

# All-in-one stress test
uv run generate.py gauntlet --output gauntlet.pdf

# List all available tricks
uv run generate.py list-tricks
Trick Categories
False-Positive Tables (false-tables)
Text patterns that extractors incorrectly identify as tables:
| Trick | Description |
|---|---|
| | "1. Item one\n2. Item two" with aligned numbers |
| | Multi-line addresses with aligned fields |
| | Indented code with column-like alignment |
| | Name/title/date aligned like table rows |
| | "Key: Value" patterns in sequence |
| | Two-column text layout |
| | Table of contents with dotted leaders |
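As a rough illustration of the last row, a table of contents with dotted leaders can be built from plain strings. This is only a sketch: the function name `toc_line` and the 40-character width are hypothetical and are not taken from generate.py.

```python
def toc_line(title: str, page: int, width: int = 40) -> str:
    """Pad a TOC entry with dot leaders so page numbers right-align."""
    page_str = str(page)
    # Reserve one space on each side of the run of dots.
    dots = width - len(title) - len(page_str) - 2
    if dots < 3:
        raise ValueError("title too long for the requested width")
    return f"{title} {'.' * dots} {page_str}"


entries = [("Introduction", 1), ("Methods", 12), ("Results", 105)]
lines = [toc_line(title, page) for title, page in entries]
print("\n".join(lines))
```

Every line comes out at the same width, so the titles and page numbers form two aligned columns. That alignment is the sort of pattern a table detector may mistake for a two-column table.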
Malformed Tables (malformed-tables)
Real tables with structural problems:
| Trick | Description |
|---|---|
| | Rows with fewer cells than header (Word import bug) |
| | Inconsistent column counts across rows |
| | Excessive cell merging breaking structure |
| | Table split across page break |
| | Tables inside table cells |
| | No visible borders (detection challenge) |
| | Some borders missing |
| | Columns that don't line up |
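The first two rows describe tables whose rows disagree with the header on cell count. A minimal sketch of how a regression check might flag such rows in extractor output, assuming the extractor returns each table as a list of rows (the helper name `ragged_rows` and the sample data are hypothetical):

```python
def ragged_rows(table: list[list[str]]) -> list[int]:
    """Return indices of rows whose cell count differs from the header's."""
    if not table:
        return []
    expected = len(table[0])  # treat the first row as the header
    return [i for i, row in enumerate(table) if len(row) != expected]


table = [
    ["Name", "Qty", "Price"],
    ["Widget", "2", "9.99"],
    ["Gadget", "1"],                  # one cell short of the header
    ["Gizmo", "4", "1.50", "extra"],  # one cell too many
]
print(ragged_rows(table))
```

A check like this could run on the output of each malformed-tables fixture to confirm the extractor surfaces the damage rather than silently padding or dropping cells.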
Cursed Text (cursed-text)
Text extraction nightmares:
| Trick | Description |
|---|---|
| | fi, fl, ff, ffi, ffl characters |
| | Equations with special notation |
| | Latin + Greek + Cyrillic |
| | Right-to-left text mixed with LTR |
| | Chemical formulas, footnote markers |
| | Zero-width spaces, soft hyphens |
| | Characters that look alike but aren't |
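Several of these rows can be demonstrated with the standard library alone. The sketch below shows one possible clean-up step a test harness might apply to extracted text before comparing it to the expected string. The `clean` helper is hypothetical and is not part of this skill.

```python
import unicodedata

# Characters that are invisible in rendered text: zero-width space, soft hyphen.
INVISIBLE = {"\u200b", "\u00ad"}


def clean(text: str) -> str:
    """Fold compatibility characters (e.g. ligatures) and drop invisible ones."""
    folded = unicodedata.normalize("NFKC", text)
    return "".join(ch for ch in folded if ch not in INVISIBLE)


print(clean("\ufb01le"))               # "fi" ligature (U+FB01) followed by "le"
print(clean("co\u00adoperate\u200b"))  # soft hyphen inside, zero-width space at end

# Latin "a" and Cyrillic "а" render alike but are different code points.
print(unicodedata.name("a"), "|", unicodedata.name("\u0430"))
```

NFKC folds the ligature code points into their component letters but leaves zero-width spaces and soft hyphens in place, which is why they are filtered separately. Lookalike characters survive both steps, so they need their own check.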
Layout Traps (layout-traps)
Structure/layout patterns that confuse extractors:
| Trick | Description |
|---|---|
| | 10+ levels of section hierarchy |
| | Footnotes that look like new sections |
| | Marginal notes alongside main text |
| | Large quoted text in middle of content |
| | Text overlaid with watermark |
| | 90° rotated text blocks |
| | Content out of reading order |
Adding New Tricks
Tricks are registered in tricks/registry.py. To add a new trick:
# In tricks/registry.py
from .my_new_trick import generate_my_trick

TRICKS["my-new-trick"] = {
    "category": "false-tables",  # or malformed-tables, cursed-text, layout-traps
    "description": "Description of what this trick tests",
    "generator": generate_my_trick,
}
Or add directly to generate.py in the appropriate category dict.
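To make the registry shape concrete, here is a self-contained sketch of how entries like this could be stored and filtered by category. It is an assumption-laden illustration, not the skill's actual code: the `Trick` type, the zero-argument generator signature, the stub generator, and `tricks_in_category` are all hypothetical.

```python
from typing import Callable, TypedDict


class Trick(TypedDict):
    category: str
    description: str
    generator: Callable[[], str]  # assumed signature; the real one may differ


def generate_email_signature() -> str:
    # Hypothetical stand-in: a real generator would draw into a PDF.
    return "Jane Doe    Engineer    555-0100"


TRICKS: dict[str, Trick] = {}
TRICKS["email-signature"] = {
    "category": "false-tables",
    "description": "Email signature blocks with name/title/phone",
    "generator": generate_email_signature,
}


def tricks_in_category(category: str) -> list[str]:
    """Return the names of all registered tricks in one category, sorted."""
    return sorted(
        name for name, trick in TRICKS.items() if trick["category"] == category
    )


print(tricks_in_category("false-tables"))
```

Keying the dict by trick name keeps registration to a single assignment, and a category filter like this is one plausible way a per-category command could collect its generators.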
Integration with pdf-fixture
Use with pdf-fixture to create comprehensive test suites:
# Generate clean fixture
cd ../pdf-fixture && uv run generate.py simple --output clean.pdf

# Generate tricky fixture
cd ../fixture-tricky && uv run generate.py gauntlet --output tricky.pdf

# Compare extractor results on both
Real-World Discovery Workflow
When you find a PDF that breaks the extractor:
- Identify the problematic pattern
- Add a new trick that reproduces it minimally
- Run skills-sync to broadcast
- Use the trick in regression testing
# Example: Found a PDF where Camelot detects email signatures as tables
uv run generate.py add-trick \
  --name "email-signature" \
  --category "false-tables" \
  --description "Email signature blocks with name/title/phone"
Dependencies
dependencies = [
    "pymupdf>=1.23.0",
    "reportlab>=4.0.0",
    "typer>=0.9.0",
]