OpenSpace document-gen-dual-backend

Document generation with direct pandoc/ReportLab execution (lightweight default) and optional shell_agent fallback for complex scenarios

install
source · Clone the upstream repo
git clone https://github.com/HKUDS/OpenSpace
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/HKUDS/OpenSpace "$T" && mkdir -p ~/.claude/skills && cp -r "$T/gdpval_bench/skills/document-gen-fallback-enhanced-enhanced-9f3b1f" ~/.claude/skills/hkuds-openspace-document-gen-dual-backend && rm -rf "$T"
manifest: gdpval_bench/skills/document-gen-fallback-enhanced-enhanced-9f3b1f/SKILL.md
Document Generation: Dual-Backend Workflow (Unicode-Safe)

When to Use

Default Approach: Lightweight Direct Execution (Recommended for 90% of tasks)

For most document generation tasks, use `write_file` + `run_shell` directly, without `shell_agent`:

  • Generating documents in standard formats (`.docx`, `.pdf`, `.html`) from Markdown
  • Content is straightforward with minimal special characters
  • You already know the pandoc/ReportLab commands needed
  • Quick single-format or multi-format output is needed

Fallback Approach: shell_agent Delegation (For Complex Scenarios Only)

Use `shell_agent` delegation only when:

  • Automated fallback handling between backends requires complex logic
  • Dynamic content generation needs programmatic decision-making
  • You need to capture and analyze error messages for intelligent retry logic

When shell_agent May Be Useful (Optional)

Only consider `shell_agent` delegation for:

  • Complex error recovery requiring automated backend switching
  • Dynamic workflows with conditional branching based on generation results

Core Technique

For simple tasks (default): Use `write_file` + `run_shell` directly with pandoc or ReportLab (no `shell_agent` needed)

For complex tasks requiring fallback logic: Split the workflow into discrete, observable steps with two PDF generation paths:

Path A (Pandoc): Best for Markdown-to-PDF conversion with rich text formatting

Path B (ReportLab): Best for programmatic PDF generation without LaTeX dependencies

Workflow steps:

  1. Content creation → Use `write_file` to create the source document (Markdown)
  2. Choose PDF backend → Decide between pandoc (rich formatting) or ReportLab (programmatic control)
  3. Unicode handling → Apply sanitization for pandoc; ReportLab accepts UTF-8 text directly, given a font that covers the characters
  4. Format conversion → Use `run_shell` with appropriate commands for each format
  5. Verification → Check output files exist and are valid

⚠️ Backend Selection Guide

| Requirement | Recommended Backend | Rationale |
|---|---|---|
| Markdown source with headers, lists, tables | Pandoc | Native Markdown parsing |
| Heavy Unicode/special characters | ReportLab | Native UTF-8 support, no sanitization needed |
| LaTeX not available | ReportLab | Pure Python, no external dependencies |
| Precise layout control (positions, graphics) | ReportLab | Programmatic canvas control |
| Quick DOCX + PDF + HTML batch | Pandoc | Single tool, multiple outputs |
| Tables with complex formatting | Pandoc | Better table rendering |
| Dynamic charts/graphs in PDF | ReportLab | Drawing operations support |

Unicode & LaTeX Compatibility Warning

Critical for Pandoc Path: By default pandoc generates PDF through LaTeX (`pdflatex`), which has limited Unicode support. `xelatex` handles Unicode better, but the result still depends on the fonts available. Common problematic characters include:

| Character | Issue | Safe Replacement |
|---|---|---|
| — (em dash) | May not render | `--` or `-` |
| – (en dash) | May not render | `-` |
| “ ” (curly quotes) | Encoding errors | `"` (straight quotes) |
| ‘ ’ (curly apostrophe) | Encoding errors | `'` (straight apostrophe) |
| … (ellipsis) | May not render | `...` |
| → ← ↑ ↓ (arrows) | LaTeX incompatibility | `->` `<-` `^` `v` |
| ✓ ✗ (checkmarks) | May not render | `[x]` `[ ]` |
| Other symbols | May not render | `*` `-` |
| © ® ™ | May require packages | `(c)` `(r)` `(tm)` |
| Non-ASCII letters (é, ñ, ü) | Font-dependent | Use XeLaTeX or replace |

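ReportLab Path: Accepts UTF-8 strings directly, so no character sanitization is required. Glyph coverage still depends on the font: the built-in fonts cover Latin text, so scripts such as CJK need a registered TrueType font (see Troubleshooting).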
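
The replacement table above can also be applied programmatically before the pandoc PDF step. The following is a minimal sketch, not part of the original skill: the names `REPLACEMENTS`, `sanitize_for_latex`, and `remaining_non_ascii` are hypothetical, and the map covers only the characters the table names explicitly.

```python
# Replacement map taken from the table above; extend it for your own content.
REPLACEMENTS = {
    "\u2014": "--",    # em dash
    "\u2013": "-",     # en dash
    "\u201c": '"',     # left curly double quote
    "\u201d": '"',     # right curly double quote
    "\u2018": "'",     # left curly single quote
    "\u2019": "'",     # right curly single quote / apostrophe
    "\u2026": "...",   # ellipsis
    "\u2192": "->",    # right arrow
    "\u2190": "<-",    # left arrow
    "\u2191": "^",     # up arrow
    "\u2193": "v",     # down arrow
    "\u2713": "[x]",   # check mark
    "\u2717": "[ ]",   # ballot x
    "\u00a9": "(c)",   # copyright sign
    "\u00ae": "(r)",   # registered sign
    "\u2122": "(tm)",  # trade mark sign
}

# str.maketrans accepts single-character keys mapped to strings of any length.
_TABLE = str.maketrans(REPLACEMENTS)


def sanitize_for_latex(text: str) -> str:
    """Replace the characters listed in REPLACEMENTS with ASCII equivalents."""
    return text.translate(_TABLE)


def remaining_non_ascii(text: str) -> set[str]:
    """Return any non-ASCII characters left after sanitization."""
    return {ch for ch in text if ord(ch) > 127}
```

Calling `remaining_non_ascii(sanitize_for_latex(source))` gives a quick indication of whether the document still contains characters, such as accented letters, that may call for `--pdf-engine=xelatex` or the ReportLab path.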

Step-by-Step Workflow

Quick Start: Lightweight Direct Execution (Default for Most Tasks)

This is the recommended approach for most document generation tasks. No shell_agent delegation needed:

# Step 1: Write Markdown content
write_file
path: /tmp/doc.md
content: |
  # My Document
  
  Simple content here.

# Step 2: Convert with pandoc (direct)
run_shell
command: pandoc /tmp/doc.md -o output.docx

# Step 3: Verify
run_shell
command: ls -lh output.docx

For PDF with potential Unicode issues:

# Create sanitized version if needed
write_file
path: /tmp/doc_sanitized.md
content: |
  # My Document
  Content with safe ASCII characters only.

run_shell
command: pandoc /tmp/doc_sanitized.md -o output.pdf

For Unicode-heavy content (use ReportLab directly):

write_file
path: /tmp/gen.py
content: |
  from reportlab.platypus import SimpleDocTemplate, Paragraph
  from reportlab.lib.styles import getSampleStyleSheet
  doc = SimpleDocTemplate("output.pdf")
  styles = getSampleStyleSheet()
  # The default font covers Latin accents; 日本語 needs a registered TrueType font with CJK glyphs
  story = [Paragraph("Unicode: é ñ ü 日本語", styles["Normal"])]
  doc.build(story)

run_shell
command: python /tmp/gen.py

Full Workflow: Dual-Backend with Fallback (For Complex Cases)

Step 1: Create Source Content with write_file

Write your document content as Markdown to a temporary source file:

write_file
path: /tmp/document_source.md
content: |
  # Document Title
  
  ## Section 1
  Content here...
  
  ## Section 2
  More content...

Step 2A: Pandoc Path - Sanitize Unicode (Pre-PDF Conversion)

Before PDF conversion with pandoc, create a sanitized version:

write_file
path: /tmp/document_source_sanitized.md
content: |
  # Document Title
  
  ## Section 1
  Content here... (with all special chars replaced per table above)

Step 2B: ReportLab Path - Create Python Script

For ReportLab, create a Python script that generates PDF directly:

write_file
path: /tmp/generate_pdf.py
content: |
  from reportlab.lib.pagesizes import letter
  from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer, Table, TableStyle
  from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle
  from reportlab.lib import colors
  from reportlab.pdfbase import pdfmetrics
  from reportlab.pdfbase.ttfonts import TTFont
  
  # Create PDF
  doc = SimpleDocTemplate("output.pdf", pagesize=letter)
  styles = getSampleStyleSheet()
  story = []
  
  # Add content
  story.append(Paragraph("Document Title", styles["Heading1"]))
  story.append(Spacer(1, 12))
  story.append(Paragraph("Section 1 content with Unicode: é ñ ü → ✓ …", styles["Normal"]))
  
  # Build PDF
  doc.build(story)

Step 3: Convert to Target Formats

Option A: Pandoc Conversion (Use sanitized source for PDF)

run_shell
command: pandoc /tmp/document_source.md -o output.docx
run_shell
command: pandoc /tmp/document_source_sanitized.md -o output.pdf
run_shell
command: pandoc /tmp/document_source.md -o output.html

Option B: ReportLab Conversion (Direct PDF generation)

run_shell
command: python /tmp/generate_pdf.py

Then use pandoc for other formats from original source:

run_shell
command: pandoc /tmp/document_source.md -o output.docx
run_shell
command: pandoc /tmp/document_source.md -o output.html

Step 4: Verify Outputs

run_shell
command: ls -lh output.docx output.pdf output.html
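
`ls -lh` confirms that the files exist but not that they are valid. As a rough sketch of a stricter check, the helper below (the name `verify_output` and the `MAGIC` table are illustrative, not part of the skill) tests for existence, non-zero size, and the leading bytes expected for PDF and DOCX files. It does not validate the full file structure.

```python
from pathlib import Path

# Leading bytes that identify each container format.
MAGIC = {
    ".pdf": b"%PDF-",
    ".docx": b"PK\x03\x04",  # a DOCX file is a ZIP archive
}


def verify_output(path: str) -> tuple[bool, str]:
    """Return (ok, reason) for a generated output file."""
    p = Path(path)
    if not p.is_file():
        return False, "missing"
    if p.stat().st_size == 0:
        return False, "empty"
    expected = MAGIC.get(p.suffix.lower())
    if expected is not None:
        with p.open("rb") as fh:
            if fh.read(len(expected)) != expected:
                return False, "unexpected header"
    # Formats without a known signature (e.g. HTML) pass on size alone.
    return True, "ok"
```

A script run through `run_shell` could loop over `output.docx`, `output.pdf`, and `output.html` and exit non-zero when any check fails.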

Complete Examples

Example 1: Pandoc-First Approach

# Generate Report with Pandoc

## Step 1: Write Markdown source
write_file
path: /tmp/report.md
content: |
  # Quarterly Report
  
  ## Executive Summary
  Performance improved by 15% this quarter...
  
  ## Key Metrics
  - Revenue: $1.2M
  - Growth: +15%
  - Customers: 500+

## Step 2: Create sanitized version for PDF
write_file
path: /tmp/report_sanitized.md
content: |
  # Quarterly Report
  
  ## Executive Summary
  Performance improved by 15% this quarter...
  
  ## Key Metrics
  - Revenue: $1.2M
  - Growth: +15%
  - Customers: 500+

## Step 3: Convert to DOCX (from original)
run_shell
command: pandoc /tmp/report.md -o quarterly_report.docx

## Step 4: Convert to PDF (from sanitized)
run_shell
command: pandoc /tmp/report_sanitized.md -o quarterly_report.pdf

## Step 5: Convert to HTML (from original)
run_shell
command: pandoc /tmp/report.md -o quarterly_report.html

## Step 6: Verify
run_shell
command: ls -lh quarterly_report.*

Example 2: ReportLab-First Approach (Heavy Unicode)

# Generate Unicode-Heavy Document with ReportLab

## Step 1: Write Markdown source (for DOCX/HTML)
write_file
path: /tmp/intl_report.md
content: |
  # International Market Analysis
  
  ## Région EMEA
  Performance in Europe: ↑ 12%
  Markets: Deutschland, França, España
  
  ## Région APAC
  Growth: ✓ Exceeded targets
  Key: 日本,中国,한국

## Step 2: Create ReportLab Python script for PDF
write_file
path: /tmp/generate_intl_pdf.py
content: |
  from reportlab.lib.pagesizes import letter
  from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer, Table, TableStyle
  from reportlab.lib.styles import getSampleStyleSheet
  from reportlab.lib import colors
  
  doc = SimpleDocTemplate("intl_analysis.pdf", pagesize=letter)
  styles = getSampleStyleSheet()
  story = []
  
  # Title - Unicode handled natively
  story.append(Paragraph("International Market Analysis", styles["Heading1"]))
  story.append(Spacer(1, 12))
  
  # EMEA Section
  story.append(Paragraph("Région EMEA", styles["Heading2"]))
  story.append(Paragraph("Performance in Europe: ↑ 12%", styles["Normal"]))
  story.append(Paragraph("Markets: Deutschland, França, España", styles["Normal"]))
  story.append(Spacer(1, 12))
  
  # APAC Section
  story.append(Paragraph("Région APAC", styles["Heading2"]))
  story.append(Paragraph("Growth: ✓ Exceeded targets", styles["Normal"]))
  # CJK text needs a registered TrueType font that contains these glyphs; the default font does not
  story.append(Paragraph("Key: 日本,中国,한국", styles["Normal"]))
  
  doc.build(story)

## Step 3: Generate PDF with ReportLab
run_shell
command: python /tmp/generate_intl_pdf.py

## Step 4: Generate DOCX with pandoc
run_shell
command: pandoc /tmp/intl_report.md -o intl_analysis.docx

## Step 5: Generate HTML with pandoc
run_shell
command: pandoc /tmp/intl_report.md -o intl_analysis.html

## Step 6: Verify
run_shell
command: ls -lh intl_analysis.*

Example 3: Hybrid Approach with Fallback

# Generate with Pandoc, Fallback to ReportLab

## Step 1: Create source document
write_file
path: /tmp/analysis.md
content: |
  # Analysis Document
  [Content...]

## Step 2: Try pandoc first
run_shell
command: pandoc /tmp/analysis.md -o analysis.pdf

## Step 3: If pandoc fails, use ReportLab fallback
[If Step 2 returns error...]
write_file
path: /tmp/fallback_pdf.py
content: |
  from reportlab.lib.pagesizes import letter
  from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer
  from reportlab.lib.styles import getSampleStyleSheet
  from reportlab.pdfbase import pdfmetrics
  from reportlab.pdfbase.ttfonts import TTFont
  
  # Register Unicode font if needed
  # pdfmetrics.registerFont(TTFont('UnicodeFont', '/path/to/font.ttf'))
  
  doc = SimpleDocTemplate("analysis.pdf", pagesize=letter)
  styles = getSampleStyleSheet()
  story = []
  story.append(Paragraph("Analysis Document", styles["Heading1"]))
  story.append(Spacer(1, 12))
  story.append(Paragraph("[Content from analysis.md parsed here]", styles["Normal"]))
  doc.build(story)

run_shell
command: python /tmp/fallback_pdf.py
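
The "try pandoc, then fall back" decision in Example 3 can also be scripted so that it runs in a single `run_shell` call. This is a sketch under stated assumptions: `pandoc_pdf` and `generate_pdf` are hypothetical names, and the fallback is passed in as a callable (for example a function wrapping the ReportLab script above) rather than hard-coded.

```python
import subprocess
from pathlib import Path
from typing import Callable

Backend = Callable[[str, str], None]


def pandoc_pdf(src: str, out: str) -> None:
    """Run pandoc; raises CalledProcessError or FileNotFoundError on failure."""
    subprocess.run(
        ["pandoc", src, "-o", out],
        check=True, capture_output=True, text=True,
    )


def generate_pdf(src: str, out: str, fallback: Backend,
                 primary: Backend = pandoc_pdf) -> str:
    """Try the primary backend, then the fallback. Returns which one produced the file."""
    try:
        primary(src, out)
        if Path(out).is_file() and Path(out).stat().st_size > 0:
            return "primary"
    except (subprocess.CalledProcessError, OSError) as exc:
        # stderr carries the pandoc/LaTeX diagnostic when it is available.
        detail = getattr(exc, "stderr", None) or str(exc)
        print(f"primary backend failed: {detail.strip()[:200]}")
    fallback(src, out)
    return "fallback"
```

Passing the backends as callables keeps the control flow testable without pandoc or LaTeX installed, and lets the same function wrap other engines.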

Common pandoc Commands

# Markdown to Word
pandoc input.md -o output.docx

# Markdown to PDF (requires LaTeX)
pandoc input.md -o output.pdf

# Markdown to PDF with Unicode-safe engine (better Unicode support)
pandoc input.md -o output.pdf --pdf-engine=xelatex

# Markdown to PDF with wkhtmltopdf (alternative, no LaTeX)
pandoc input.md -o output.pdf --pdf-engine=wkhtmltopdf

# Markdown to HTML
pandoc input.md -o output.html

# With custom template
pandoc input.md --template=template.html -o output.html

# With metadata
pandoc input.md -o output.pdf --metadata title="Document Title"
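
When these commands are assembled from a script instead of typed by hand, building an argument list avoids shell-quoting problems with titles and paths. A small sketch follows; `pandoc_command` is a hypothetical helper, and it uses only the flags already shown in this document (`-o`, `--pdf-engine`, `--metadata`, `--reference-doc`).

```python
from typing import Optional


def pandoc_command(src: str, out: str,
                   pdf_engine: Optional[str] = None,
                   metadata: Optional[dict] = None,
                   reference_doc: Optional[str] = None) -> list[str]:
    """Build a pandoc argument list suitable for subprocess.run (no shell needed)."""
    cmd = ["pandoc", src, "-o", out]
    if pdf_engine:
        cmd.append(f"--pdf-engine={pdf_engine}")
    if reference_doc:
        cmd.append(f"--reference-doc={reference_doc}")
    for key, value in (metadata or {}).items():
        # Each list element reaches pandoc as one argument, so spaces are safe.
        cmd += ["--metadata", f"{key}={value}"]
    return cmd
```

The result can be handed to `subprocess.run(cmd, check=True)`.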

Common ReportLab Patterns

# Basic PDF creation
from reportlab.lib.pagesizes import letter
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer
from reportlab.lib.styles import getSampleStyleSheet

doc = SimpleDocTemplate("output.pdf", pagesize=letter)
styles = getSampleStyleSheet()
story = []
story.append(Paragraph("Title", styles["Heading1"]))
story.append(Spacer(1, 12))
story.append(Paragraph("Content", styles["Normal"]))
doc.build(story)

# With tables
from reportlab.platypus import Table, TableStyle
from reportlab.lib import colors

data = [['Header1', 'Header2'], ['Row1-Col1', 'Row1-Col2']]
table = Table(data)
table.setStyle(TableStyle([
    ('BACKGROUND', (0, 0), (-1, 0), colors.grey),
    ('TEXTCOLOR', (0, 0), (-1, 0), colors.whitesmoke),
    ('GRID', (0, 0), (-1, -1), 0.5, colors.black),
]))
story.append(table)

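# With custom fonts for Unicode
from reportlab.lib.styles import ParagraphStyle
from reportlab.pdfbase import pdfmetrics
from reportlab.pdfbase.ttfonts import TTFont

# The registered font must contain the glyphs you print. NotoSans-Regular covers
# Latin, Greek, and Cyrillic; CJK or Arabic text needs a different font file.
pdfmetrics.registerFont(TTFont('NotoSans', 'NotoSans-Regular.ttf'))
custom_style = ParagraphStyle('Custom', fontName='NotoSans', fontSize=12)
story.append(Paragraph("Unicode: Ελληνικά Кириллица é ñ ü", custom_style))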
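
The fallback script in Example 3 leaves a placeholder where the Markdown content should be parsed. One way to fill that gap is sketched below. It handles only `#` headings and plain paragraphs, which is an assumption about the input and not a full Markdown parser, and `markdown_to_blocks` is a hypothetical name. Text is XML-escaped because `Paragraph` interprets its input as markup.

```python
from xml.sax.saxutils import escape


def markdown_to_blocks(md: str) -> list[tuple[str, str]]:
    """Convert simple Markdown into (style_name, escaped_text) pairs."""
    blocks: list[tuple[str, str]] = []
    paragraph: list[str] = []

    def flush() -> None:
        # Join consecutive text lines into one paragraph block.
        if paragraph:
            blocks.append(("Normal", escape(" ".join(paragraph))))
            paragraph.clear()

    for raw in md.splitlines():
        line = raw.strip()
        if not line:
            flush()
        elif line.startswith("#"):
            flush()
            level = len(line) - len(line.lstrip("#"))
            # Cap at Heading3 to stay within the sample stylesheet's common styles.
            style = f"Heading{min(level, 3)}"
            blocks.append((style, escape(line.lstrip("#").strip())))
        else:
            paragraph.append(line)
    flush()
    return blocks

# Usage with ReportLab (requires `pip install reportlab`):
#   styles = getSampleStyleSheet()
#   story = [Paragraph(text, styles[style]) for style, text in markdown_to_blocks(md)]
```

Lists, tables, and inline emphasis are passed through as plain text here. Documents that rely on them are better served by the pandoc path.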

Troubleshooting

Pandoc Issues

  • PDF generation fails with encoding error:
    • Use the sanitized Markdown file
    • Try `--pdf-engine=xelatex` for better Unicode support
    • Try `--pdf-engine=wkhtmltopdf` as an alternative
  • PDF generation fails: LaTeX not found:
    • Install LaTeX: `apt-get install texlive-latex-recommended texlive-fonts-recommended`
    • Switch to ReportLab backend (no LaTeX needed)
  • DOCX formatting issues: Add `--reference-doc=template.docx` for custom styles
  • Missing pandoc: Install via `apt-get install pandoc` or `brew install pandoc`

ReportLab Issues

  • Missing reportlab: Install via `pip install reportlab`
  • Unicode characters not rendering:
    • ReportLab handles UTF-8 natively - ensure Python script uses UTF-8
    • For CJK/Arabic/etc., register a TrueType font that contains those glyphs (NotoSans-Regular covers Latin, Greek, and Cyrillic only):
      from reportlab.pdfbase import pdfmetrics
      from reportlab.pdfbase.ttfonts import TTFont
      pdfmetrics.registerFont(TTFont('NotoSans', 'NotoSans-Regular.ttf'))
  • Complex layouts needed: Use `reportlab.pdfgen.canvas` for direct drawing operations
  • Tables breaking across pages: `Table` already splits between rows; pass `repeatRows=1` to repeat the header row on each page, and consider `reportlab.platypus.LongTable` for very long tables

General Issues

  • Unicode/encoding errors in any format:
    • pandoc expects UTF-8 input; convert other encodings first, e.g. `iconv -f ISO-8859-1 -t UTF-8 source.md > source_utf8.md`
    • Ensure source file is UTF-8 encoded: `file -i source.md`
  • Special characters not rendering in PDF:
    • For pandoc: Use character replacement table or ReportLab
    • For ReportLab: Register appropriate Unicode fonts
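
Where `file -i` is unavailable or its guess is ambiguous, the encoding check can be done by attempting a strict decode. A minimal sketch, with `check_utf8` as a hypothetical helper name:

```python
from pathlib import Path


def check_utf8(path: str) -> tuple[bool, str]:
    """Return (ok, message); on failure the message locates the first bad byte."""
    data = Path(path).read_bytes()
    try:
        data.decode("utf-8")  # strict decoding raises on any invalid sequence
    except UnicodeDecodeError as exc:
        return False, f"invalid byte 0x{data[exc.start]:02x} at offset {exc.start}"
    return True, "valid UTF-8"
```

The reported offset points at the first byte that breaks decoding, which usually indicates a file saved in a legacy encoding such as Latin-1.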

Installation Commands

# Install pandoc
apt-get install pandoc  # Debian/Ubuntu
brew install pandoc     # macOS

# Install LaTeX (for pdflatex)
apt-get install texlive-latex-recommended texlive-fonts-recommended

# Install wkhtmltopdf (alternative PDF engine)
apt-get install wkhtmltopdf

# Install ReportLab
pip install reportlab

# Install fonts for Unicode support
apt-get install fonts-noto-core  # Comprehensive Unicode font

Unicode Sanitization Script (For Pandoc Path)

For repeated use with pandoc, create a reusable sanitization script:

#!/bin/bash
# sanitize_for_pdf.sh - Replace problematic unicode chars for LaTeX/PDF
if [ -z "$1" ]; then
  echo "Usage: $0 <input.md> [output.md]"
  exit 1
fi
INPUT="$1"
OUTPUT="${2:-${1%.md}_sanitized.md}"

sed -e 's/—/--/g' \
    -e 's/–/-/g' \
    -e 's/“/"/g' -e 's/”/"/g' \
    -e "s/‘/'/g" -e "s/’/'/g" \
    -e 's/…/.../g' \
    -e 's/→/->/g' \
    -e 's/←/<-/g' \
    -e 's/✓/[x]/g' \
    -e 's/✗/[ ]/g' \
    -e 's/©/(c)/g' \
    -e 's/®/(r)/g' \
    -e 's/™/(tm)/g' \
    "$INPUT" > "$OUTPUT"

echo "Sanitized: $INPUT -> $OUTPUT"

Save as `sanitize_for_pdf.sh`, make executable with `chmod +x sanitize_for_pdf.sh`, then use:

run_shell
command: ./sanitize_for_pdf.sh /tmp/document_source.md /tmp/document_source_sanitized.md

Decision Flowchart

Need PDF generation?
    │
    ├─ Heavy Unicode/symbols? ──YES──> Use ReportLab
    │         │
    │         NO
    │         │
    ├─ LaTeX available? ──NO──> Use ReportLab
    │         │
    │         YES
    │         │
    ├─ Need Markdown formatting? ──YES──> Use Pandoc (with sanitization)
    │         │
    │         NO
    │         │
    └─ Need programmatic layout? ──YES──> Use ReportLab
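
The flowchart can be read as an ordered series of checks. The sketch below encodes it that way; the function names are hypothetical, and the final default (pandoc when no branch matches) is an assumption, since the flowchart leaves that case open.

```python
import shutil


def latex_available() -> bool:
    """True if a LaTeX engine used by pandoc is on PATH."""
    return any(shutil.which(engine) for engine in ("pdflatex", "xelatex"))


def choose_pdf_backend(heavy_unicode: bool,
                       has_latex: bool,
                       needs_markdown_formatting: bool,
                       needs_programmatic_layout: bool) -> str:
    """Walk the decision flowchart top to bottom and return a backend name."""
    if heavy_unicode:
        return "reportlab"
    if not has_latex:
        return "reportlab"
    if needs_markdown_formatting:
        return "pandoc"  # remember to sanitize the source first
    if needs_programmatic_layout:
        return "reportlab"
    # Assumption: the flowchart does not cover this case; default to pandoc.
    return "pandoc"
```

A typical call would pass `has_latex=latex_available()` so the choice reflects the machine the task is running on.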

When to Return to shell_agent

After successfully completing the manual workflow once, you can attempt `shell_agent` again for similar tasks, now with a known-working fallback if errors recur. For documents with heavy Unicode content or when LaTeX is unavailable, prefer the ReportLab path.

Related Skills

  • `document-gen-unicode-safe`: Parent skill with pandoc-only approach
  • `document-gen-fallback`: Original fallback workflow without Unicode guidance
  • Use this skill when you need flexible PDF generation with multiple backend options

Example 0: Lightweight Single-Format Generation

# Quick DOCX Generation (No shell_agent needed)

## Step 1: Write content
write_file
path: /tmp/brief.md
content: |
  # Meeting Notes
  
  ## Attendees
  - Alice
  - Bob
  
  ## Decisions
  Project approved.

## Step 2: Convert directly
run_shell
command: pandoc /tmp/brief.md -o meeting_notes.docx

## Step 3: Verify
run_shell
command: ls -lh meeting_notes.docx

Mode Selection Quick Reference

| Scenario | Recommended Mode | Why |
|---|---|---|
| Simple Markdown → DOCX | Lightweight | Direct pandoc call is fastest |
| Single PDF from ASCII content | Lightweight | No sanitization needed |
| Multi-format batch (DOCX + PDF + HTML) | Full Workflow | Coordinated conversion with error handling |
| Content with Unicode/symbols | Lightweight + ReportLab | Direct Python script avoids LaTeX issues |
| Uncertain about LaTeX availability | Full Workflow | Built-in fallback detection |
| Need programmatic layout control | Full Workflow | ReportLab integration with fallback logic |
| Production pipeline with error recovery | Full Workflow | Automated retry and backend switching |