OpenSpace pandoc-unicode-workaround

Handle LaTeX Unicode errors in pandoc PDF generation by normalizing special characters to ASCII

install

source · Clone the upstream repo

git clone https://github.com/HKUDS/OpenSpace

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/HKUDS/OpenSpace "$T" && mkdir -p ~/.claude/skills && cp -r "$T/gdpval_bench/skills/pandoc-unicode-workaround" ~/.claude/skills/hkuds-openspace-pandoc-unicode-workaround && rm -rf "$T"

manifest: gdpval_bench/skills/pandoc-unicode-workaround/SKILL.md

source content

Pandoc Unicode Workaround

This skill provides a workflow for generating PDFs from Markdown using pandoc when LaTeX compilation fails due to Unicode character errors.

When to Use

Use this skill when:

`pandoc document.md -o document.pdf` fails with LaTeX/Unicode errors
Error messages mention "Unicode character", "LaTeX Error", or specific characters like ✓, →, —, etc.
You need to produce PDF output but pandoc's default LaTeX engine cannot handle special characters

Step-by-Step Procedure

Step 1: Attempt Initial PDF Generation

pandoc input.md -o output.pdf

If this succeeds, you're done. If it fails with Unicode/LaTeX errors, proceed to Step 2.
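Step 1 can be scripted so that the decision to move on to Step 2 is automatic. The following is a minimal sketch, assuming pandoc is on PATH. The helper names `is_unicode_error` and `try_pandoc` are hypothetical, and the marker list is drawn from the error strings quoted in this document rather than from an exhaustive catalogue.

```python
import subprocess
from typing import Tuple

# Substrings that suggest a Unicode-related LaTeX failure (assumed list).
# "Unicode char" also matches the longer "Unicode character" wording.
UNICODE_MARKERS = ("Unicode char", "Missing character")


def is_unicode_error(stderr_text: str) -> bool:
    """Return True if pandoc's stderr looks like a Unicode/LaTeX failure."""
    return any(marker in stderr_text for marker in UNICODE_MARKERS)


def try_pandoc(src: str, dst: str) -> Tuple[bool, bool]:
    """Run pandoc once; return (succeeded, failed_due_to_unicode)."""
    result = subprocess.run(
        ["pandoc", src, "-o", dst],
        capture_output=True,
        text=True,
    )
    ok = result.returncode == 0
    return ok, (not ok) and is_unicode_error(result.stderr)
```

Calling `try_pandoc("input.md", "output.pdf")` and checking the second value tells you whether normalization is likely to help, or whether the failure has some other cause.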

Step 2: Normalize Unicode Characters

Replace problematic Unicode characters with ASCII equivalents. Common replacements:

Unicode Character	ASCII Replacement	Alternative
✓ (checkmark)	[Y] or [X]	OK, or remove the ✓
✗ or ✘ (cross)	[N]	FAIL
→ (arrow right)	->	=>
← (arrow left)	<-
— (em dash)	-- or -	---
– (en dash)	-
• (bullet)	-	*
© (copyright)	(c)	Copyright
® (registered)	(R)
™ (trademark)	(TM)
… (ellipsis)	...
“ ” ‘ ’ (smart quotes)	" or '
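Before choosing replacements, it can help to see which non-ASCII characters the file actually contains. The sketch below is one way to list them with their code points and Unicode names; the function names `find_non_ascii` and `describe` are hypothetical, and treating every character above U+007F as a candidate is an assumption (some of them may compile fine).

```python
import unicodedata
from collections import Counter


def find_non_ascii(text: str) -> Counter:
    """Count every character outside the ASCII range in the text."""
    return Counter(ch for ch in text if ord(ch) > 127)


def describe(counts: Counter) -> list:
    """Format each character as 'U+XXXX NAME xN', most frequent first."""
    lines = []
    for ch, n in counts.most_common():
        name = unicodedata.name(ch, "UNKNOWN")
        lines.append(f"U+{ord(ch):04X} {name} x{n}")
    return lines


sample = "Done ✓ → next — and ✓ again"
for line in describe(find_non_ascii(sample)):
    print(line)
```

Any character that shows up here but not in the table above needs a replacement of your own choosing.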

Step 3: Apply Replacements

Option A - Manual edit: Open the Markdown file and manually replace the characters using find/replace.

Option B - Automated with sed (Linux/Mac):

# -i.bak is accepted by both GNU sed (Linux) and BSD sed (macOS);
# the unmodified original is kept as input.md.bak
sed -i.bak -e 's/✓/[Y]/g' -e 's/✗/[N]/g' -e 's/→/->/g' -e 's/—/--/g' input.md

Option C - Automated with Python:

replacements = {
    '✓': '[Y]',
    '✗': '[N]',
    '→': '->',
    '—': '--',
    '–': '-',
    '…': '...',
    '\u201c': '"',  # left double smart quote
    '\u201d': '"',  # right double smart quote
}

with open('input.md', 'r', encoding='utf-8') as f:
    content = f.read()

for orig, repl in replacements.items():
    content = content.replace(orig, repl)

with open('input.md', 'w', encoding='utf-8') as f:
    f.write(content)

Step 4: Regenerate PDF

pandoc input.md -o output.pdf

Step 5: Verify Output

Check that the PDF was generated successfully and review the content to ensure character replacements are acceptable for your use case.
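The mechanical part of this check can be automated. The sketch below only confirms that a file exists, is non-empty, and begins with the `%PDF-` signature; the name `looks_like_pdf` is hypothetical, and the check says nothing about whether the replacements read well, which still needs a manual look.

```python
from pathlib import Path


def looks_like_pdf(path: str) -> bool:
    """Return True if the file exists, is non-empty, and starts with '%PDF-'."""
    p = Path(path)
    if not p.is_file() or p.stat().st_size == 0:
        return False
    with p.open("rb") as f:
        return f.read(5) == b"%PDF-"
```

For example, `looks_like_pdf("output.pdf")` returning False after Step 4 suggests the regeneration failed or left a partial file behind.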

Alternative Approaches

If Unicode normalization is not acceptable:

Use a different PDF engine:

pandoc input.md -o output.pdf --pdf-engine=wkhtmltopdf

Use XeLaTeX (better Unicode support):

pandoc input.md -o output.pdf --pdf-engine=xelatex

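Add LaTeX packages for Unicode (fontspec does not work with the default pdfLaTeX engine, so select XeLaTeX explicitly):

pandoc input.md -o output.pdf --pdf-engine=xelatex -H header.tex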

Where header.tex contains:

\usepackage{fontspec}
\usepackage{xunicode}
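The engine alternatives above can be tried in sequence. This is a minimal sketch, assuming pandoc and the listed engines are installed; `build_command`, `run_command`, and `first_working_engine` are hypothetical helper names, and the order of `ENGINES` is an arbitrary choice, not a recommendation from the source.

```python
import subprocess
from typing import Callable, List, Optional

# Engines to try in order; both names appear in this document.
ENGINES = ["xelatex", "wkhtmltopdf"]


def build_command(src: str, dst: str, engine: str) -> List[str]:
    """Build the pandoc command line for one PDF engine."""
    return ["pandoc", src, "-o", dst, f"--pdf-engine={engine}"]


def run_command(cmd: List[str]) -> bool:
    """Run a command and report whether it exited with status 0."""
    try:
        return subprocess.run(cmd, capture_output=True).returncode == 0
    except FileNotFoundError:
        # pandoc itself is not installed or not on PATH.
        return False


def first_working_engine(
    src: str,
    dst: str,
    runner: Callable[[List[str]], bool] = run_command,
) -> Optional[str]:
    """Return the first engine that produces the PDF, or None if all fail."""
    for engine in ENGINES:
        if runner(build_command(src, dst, engine)):
            return engine
    return None
```

The `runner` parameter exists so the selection logic can be exercised without invoking pandoc; in normal use you would call `first_working_engine("input.md", "output.pdf")` and fall back to normalization if it returns None.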

Tips

Keep the original Markdown file before normalization if you need to preserve characters
Document which characters were replaced for future reference
Test with a small sample first if working with large documents
For Word documents (.docx), Unicode issues are less common; this pattern is primarily for PDF generation
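The first two tips can be combined in one pass: write the normalized text to a separate file and record how many times each character was replaced. The sketch below assumes a subset of the Step 2 table; `normalize`, `normalize_file`, and the output file name in the usage note are hypothetical.

```python
from pathlib import Path
from typing import Dict, Tuple

# A subset of the replacement table from Step 2.
REPLACEMENTS = {"✓": "[Y]", "✗": "[N]", "→": "->", "—": "--", "–": "-", "…": "..."}


def normalize(text: str, table: Dict[str, str]) -> Tuple[str, Dict[str, int]]:
    """Apply replacements and return (new_text, {char: times_replaced})."""
    report = {}
    for orig, repl in table.items():
        count = text.count(orig)
        if count:
            report[orig] = count
            text = text.replace(orig, repl)
    return text, report


def normalize_file(src: str, dst: str) -> Dict[str, int]:
    """Write a normalized copy to dst, leaving src untouched."""
    text = Path(src).read_text(encoding="utf-8")
    new_text, report = normalize(text, REPLACEMENTS)
    Path(dst).write_text(new_text, encoding="utf-8")
    return report
```

A call such as `normalize_file("input.md", "input.ascii.md")` leaves `input.md` as it was, and the returned dictionary can be saved alongside the PDF as a record of what changed.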

Error Indicators

Common LaTeX Unicode errors to watch for:

! LaTeX Error: Unicode character ... not set up for use with LaTeX
Package inputenc Error: Unicode character ... not set up
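Missing character: There is no ... in font ... (a warning from XeLaTeX/LuaLaTeX rather than an error; the PDF is still produced, but the glyph is absent)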
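When the error output is long, the offending characters can be pulled out of it programmatically. The sketch below assumes the message includes the code point in the form `(U+XXXX)`, which recent LaTeX releases print next to the character; if your log lacks that form, the function simply returns an empty list. The name `offending_characters` is hypothetical.

```python
import re

# Assumed format: the code point appears as "(U+XXXX)" in the error text.
CODEPOINT_RE = re.compile(r"\(U\+([0-9A-Fa-f]{4,6})\)")


def offending_characters(log_text: str) -> list:
    """Return the distinct characters named in the log, in first-seen order."""
    seen = []
    for hex_code in CODEPOINT_RE.findall(log_text):
        value = int(hex_code, 16)
        if value > 0x10FFFF:
            # Not a valid Unicode code point; ignore it.
            continue
        ch = chr(value)
        if ch not in seen:
            seen.append(ch)
    return seen
```

The resulting list tells you which rows of the Step 2 table you actually need, so you can avoid replacing characters that compile without trouble.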