Claude-skill-registry fixture-tricky

install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/fixture-tricky" ~/.claude/skills/majiayu000-claude-skill-registry-fixture-tricky && rm -rf "$T"
manifest: skills/data/fixture-tricky/SKILL.md
source content

Fixture Tricky Skill

Generate adversarial PDF content designed to expose extractor bugs. Continuously extensible as new edge cases are discovered from real-world PDFs.

Why This Exists

Real-world PDFs contain patterns that reliably break extractors:

  • Text that Camelot/Marker falsely detect as tables
  • Tables corrupted by Word/PDF conversions
  • Text with ligatures, special characters, mixed directions
  • Layout patterns that confuse section detection

This skill creates reproducible test cases for these issues.

Quick Start

cd .pi/skills/fixture-tricky

# Generate false-positive table content
uv run generate.py false-tables --output false_tables.pdf

# Generate malformed/corrupted tables
uv run generate.py malformed-tables --output malformed.pdf

# Generate text extraction nightmares
uv run generate.py cursed-text --output cursed.pdf

# Generate layout traps
uv run generate.py layout-traps --output layout.pdf

# All-in-one stress test
uv run generate.py gauntlet --output gauntlet.pdf

# List all available tricks
uv run generate.py list-tricks

Trick Categories

False-Positive Tables (
false-tables
)

Text patterns that extractors incorrectly identify as tables:

TrickDescription
numbered-list
"1. Item one\n2. Item two" with aligned numbers
address-block
Multi-line addresses with aligned fields
code-block
Indented code with column-like alignment
signature-block
Name/title/date aligned like table rows
key-value-pairs
"Key: Value" patterns in sequence
multi-column
Two-column text layout
toc-entries
Table of contents with dotted leaders

Malformed Tables (
malformed-tables
)

Real tables with structural problems:

TrickDescription
missing-columns
Rows with fewer cells than header (Word import bug)
ragged-rows
Inconsistent column counts across rows
merged-chaos
Excessive cell merging breaking structure
split-table
Table split across page break
nested-tables
Tables inside table cells
borderless
No visible borders (detection challenge)
partial-borders
Some borders missing
misaligned-columns
Columns that don't line up

Cursed Text (
cursed-text
)

Text extraction nightmares:

TrickDescription
ligatures
fi, fl, ff, ffi, ffl characters
math-symbols
Equations with special notation
mixed-scripts
Latin + Greek + Cyrillic
rtl-mixed
Right-to-left text mixed with LTR
subscript-superscript
Chemical formulas, footnote markers
invisible-chars
Zero-width spaces, soft hyphens
encoding-hell
Characters that look alike but aren't

Layout Traps (
layout-traps
)

Structure/layout patterns that confuse extractors:

TrickDescription
deep-nesting
10+ levels of section hierarchy
footnote-sections
Footnotes that look like new sections
sidebar
Marginal notes alongside main text
pull-quote
Large quoted text in middle of content
watermark
Text overlaid with watermark
rotated-text
90° rotated text blocks
floating-elements
Content out of reading order

Adding New Tricks

Tricks are registered in

tricks/registry.py
. To add a new trick:

# In tricks/registry.py
from .my_new_trick import generate_my_trick

TRICKS["my-new-trick"] = {
    "category": "false-tables",  # or malformed-tables, cursed-text, layout-traps
    "description": "Description of what this trick tests",
    "generator": generate_my_trick,
}

Or add directly to

generate.py
in the appropriate category dict.

Integration with pdf-fixture

Use with

pdf-fixture
to create comprehensive test suites:

# Generate clean fixture
cd ../pdf-fixture && uv run generate.py simple --output clean.pdf

# Generate tricky fixture
cd ../fixture-tricky && uv run generate.py gauntlet --output tricky.pdf

# Compare extractor results on both

Real-World Discovery Workflow

When you find a PDF that breaks the extractor:

  1. Identify the problematic pattern
  2. Add a new trick that reproduces it minimally
  3. Run
    skills-sync
    to broadcast
  4. Use the trick in regression testing
# Example: Found a PDF where Camelot detects email signatures as tables
uv run generate.py add-trick \
  --name "email-signature" \
  --category "false-tables" \
  --description "Email signature blocks with name/title/phone"

Dependencies

dependencies = [
    "pymupdf>=1.23.0",
    "reportlab>=4.0.0",
    "typer>=0.9.0",
]