nutrient-document-processing
install
source · Clone the upstream repo
git clone https://github.com/PSPDFKit-labs/nutrient-agent-skill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/PSPDFKit-labs/nutrient-agent-skill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/nutrient-document-processing" ~/.claude/skills/pspdfkit-labs-nutrient-agent-skill-nutrient-document-processing && rm -rf "$T"
manifest: nutrient-document-processing/SKILL.md
Nutrient Document Processing
Use Nutrient DWS for managed document workflows where fidelity, compliance, or multi-step processing matters more than local-tool convenience.
Setup
- Get a Nutrient DWS API key at https://dashboard.nutrient.io/sign_up/?product=processor.
- Direct API calls use `Authorization: Bearer $NUTRIENT_API_KEY`, with the key exported as `export NUTRIENT_API_KEY="nutr_sk_..."`.
- MCP setups commonly use `@nutrient-sdk/dws-mcp-server` with `NUTRIENT_DWS_API_KEY`.
- Scripts live in `scripts/` relative to this SKILL.md. Use the directory containing this SKILL.md as the working directory: `cd <directory containing this SKILL.md> && uv run scripts/<script>.py --help`
- Page ranges use `start:end` with 0-based indexes and end-exclusive semantics. Negative indexes count from the end.
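The page-range convention can be sketched in plain Python. This is an illustration of the stated semantics only, not the helper scripts' own parser: `resolve_page_range` is a hypothetical name, and rejecting open-ended ranges such as `2:` is an assumption, since the text above does not say whether they are allowed.

```python
def resolve_page_range(spec: str, page_count: int) -> list[int]:
    """Return the 0-based page indexes selected by a 'start:end' range."""
    start_text, sep, end_text = spec.partition(":")
    if not sep or not start_text or not end_text:
        # Assumption: both bounds are required in this sketch.
        raise ValueError(f"expected 'start:end', got {spec!r}")
    start, end = int(start_text), int(end_text)
    # Negative indexes count from the end of the document.
    if start < 0:
        start += page_count
    if end < 0:
        end += page_count
    # The end bound is exclusive, so it may equal page_count.
    if not 0 <= start < end <= page_count:
        raise ValueError(f"range {spec!r} is empty or outside 0..{page_count}")
    return list(range(start, end))


print(resolve_page_range("0:2", 5))   # [0, 1]
print(resolve_page_range("-2:5", 5))  # [3, 4]
print(resolve_page_range("1:-1", 5))  # [1, 2, 3]
```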
When to use
- Generate PDFs from HTML templates, uploaded assets, or remote URLs.
- Convert Office, HTML, image, and PDF files between supported formats.
- OCR scans and extract text, tables, or key-value pairs.
- Redact PII, watermark, sign, fill forms, merge, split, rotate, flatten, or encrypt PDFs.
- Produce delivery targets like PDF/A, PDF/UA, optimized PDFs, or linearized PDFs.
- Check credits before large, batch, or AI-heavy runs.
Tool preference
- Prefer `scripts/*.py` for covered single-operation workflows.
- Use `assets/templates/custom-workflow-template.py` for multi-step jobs that should still run through the Python client.
- Use the modular `references/` docs and direct API payloads for capabilities that do not yet have a dedicated helper script, especially HTML/URL generation and compliance tuning.
- Use local PDF utilities only for lightweight inspection. Use Nutrient when output fidelity or compliance matters.
Single-operation scripts
- `convert.py` -> convert between `pdf`, `pdfa`, `pdfua`, `docx`, `xlsx`, `pptx`, `png`, `jpeg`, `webp`, `html`, and `markdown`
- `merge.py` -> merge multiple files into one PDF
- `split.py` -> split one PDF into multiple PDFs by page ranges
- `add-pages.py` -> append blank pages
- `delete-pages.py` -> remove specific pages
- `duplicate-pages.py` -> reorder or duplicate pages into a new PDF
- `rotate.py` -> rotate selected pages
- `ocr.py` -> OCR scanned PDFs or images
- `extract-text.py` -> extract text to JSON
- `extract-table.py` -> extract tables
- `extract-key-value-pairs.py` -> extract key-value pairs
- `watermark-text.py` -> apply a text watermark
- `redact-ai.py` -> detect and apply AI-powered redactions
- `sign.py` -> digitally sign a local PDF
- `password-protect.py` -> write encrypted output PDFs
- `optimize.py` -> apply optimization and linearization-style options via JSON
Multi-Step Workflow Rule
Do not add new committed pipeline scripts under `scripts/`.
When the user asks for multiple operations in one run:
- Copy `assets/templates/custom-workflow-template.py` to a temporary location such as `/tmp/ndp-workflow-<task>.py`.
- Implement the combined workflow in that temporary script.
- Run it with `uv run /tmp/ndp-workflow-<task>.py ...`.
- Return generated output files.
- Delete the temporary script unless the user explicitly asks to keep it.
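The copy, run, and delete lifecycle above could be wrapped in a small context manager. This is a minimal sketch, not part of the skill: `temporary_workflow` and its `keep` flag are hypothetical names. It uses the platform temp directory, which is `/tmp` on most Linux systems but may differ elsewhere.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def temporary_workflow(template: Path, task: str, keep: bool = False):
    """Copy the workflow template to a temp path and remove it afterwards."""
    target = Path(tempfile.gettempdir()) / f"ndp-workflow-{task}.py"
    shutil.copyfile(template, target)
    try:
        # The caller edits and runs the script while the block is open.
        yield target
    finally:
        # Delete unless the user explicitly asked to keep the script.
        if not keep:
            target.unlink(missing_ok=True)
```

Inside the `with` block, the combined workflow would be written into the yielded path and run with `uv run <path> ...`. The file is removed on exit even if the run fails.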
PDF Requirements
- `split.py` requires a multi-page PDF and cannot extract ranges from a single-page document.
- `delete-pages.py` must retain at least one page and cannot delete the entire document.
- `sign.py` only accepts local file paths for the main PDF.
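These constraints can be checked before any request is sent. The sketch below assumes the page count is already known from elsewhere. The function names are hypothetical, and the `://` test is only a rough heuristic for spotting a URL where a local path is expected.

```python
def check_split(page_count: int) -> None:
    """split.py needs more than one page to split."""
    if page_count < 2:
        raise ValueError("split requires a multi-page PDF")


def check_delete(page_count: int, delete_indexes: set[int]) -> None:
    """delete-pages.py must leave at least one page in the document."""
    in_range = {i for i in delete_indexes if 0 <= i < page_count}
    if len(in_range) >= page_count:
        raise ValueError("cannot delete every page of the document")


def check_sign(main_pdf: str) -> None:
    """sign.py accepts only a local file path for the main PDF."""
    if "://" in main_pdf:  # heuristic: looks like a URL, not a local path
        raise ValueError("sign requires a local file path for the main PDF")
```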
Decision rules
- Prefer a helper script when one already covers the requested operation cleanly.
- If you control the source markup, prefer HTML generation over browser print workflows.
- Use remote `file.url` inputs when the source already lives at a stable URL and you want to avoid local uploads.
- Use `output.type` for conversion and finalization targets. Use `actions` for transformations when building direct API payloads.
- OCR before text extraction, key-value extraction, or semantic redaction on scans.
- Prefer preset or regex redaction when the target is explicit. Use AI redaction only for contextual or natural-language requests.
- Use the PDF manipulation reference for merge, split, rotate, flatten, and page-range workflows instead of inferring those payloads from conversion examples.
- Treat PDF/A and PDF/UA as compliance targets, not cosmetic export formats. Choose the target up front and validate final artifacts when requirements are contractual.
- For PDF/UA, clean born-digital inputs and structured HTML usually tag better than rasterized or flattened source PDFs.
- For delivery optimization, linearize or optimize unsigned output artifacts instead of mutating already signed files.
- When the user asks for multiple steps, keep destructive or final steps late in the sequence. Use the workflow recipes when ordering is ambiguous.
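As a rough illustration of how `file.url`, `actions`, and `output.type` fit together in a direct API payload, the sketch below only builds the JSON instructions and sends nothing. The text above names just those three fields. The `parts` wrapper and the shape of the OCR action are assumptions to verify against `references/request-basics.md`, and `build_instructions` is a hypothetical helper.

```python
import json
from typing import Optional


def build_instructions(
    source_url: str,
    output_type: str,
    actions: Optional[list] = None,
) -> str:
    """Assemble a JSON instructions document for a remote input."""
    instructions = {
        # Assumption: inputs are listed under "parts"; file.url avoids a local upload.
        "parts": [{"file": {"url": source_url}}],
        # output.type selects the conversion or finalization target.
        "output": {"type": output_type},
    }
    if actions:
        # actions carry transformations applied before the output step.
        instructions["actions"] = actions
    return json.dumps(instructions, indent=2)


print(
    build_instructions(
        "https://example.com/scan.pdf",
        "pdfa",
        actions=[{"type": "ocr", "language": "english"}],  # assumed action shape
    )
)
```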
Anti-patterns
- Do not OCR born-digital PDFs just because the task mentions extraction. Extract first and OCR only if the text layer is missing.
- Do not flatten forms or annotations until the user confirms the artifact no longer needs to stay editable.
- Do not sign, archive, or linearize intermediate working files. Keep those as final-delivery steps.
- Do not promise PDF/A or PDF/UA compliance without a validation step when the requirement is contractual.
- Do not commit temporary workflow scripts under `scripts/`.
Reference map
Read only what you need:
- `references/request-basics.md` -> endpoint model, auth, multipart vs JSON, credits, limits, and errors
- `references/generation-and-conversion.md` -> HTML/URL generation and format conversion
- `references/pdf-manipulation.md` -> merge, split, page-range, rotate, and flatten workflows
- `references/extraction-and-ocr.md` -> OCR, text extraction, tables, and key-value workflows
- `references/security-signing-and-forms.md` -> redaction, watermarking, signatures, forms, and passwords
- `references/compliance-and-optimization.md` -> PDF/A, PDF/UA, optimization, and linearization
- `references/workflow-recipes.md` -> end-to-end sequencing patterns for common business document workflows
Rules
- Fail fast when required arguments are missing.
- Write outputs to explicit paths and print created files.
- Do not log secrets.
- All client methods are async and should run via `asyncio.run(main())`.
- If import fails, install dependency with `uv add nutrient-dws`.
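A script skeleton that follows these rules might look like the sketch below. It deliberately does not call the Nutrient client: `process` is a hypothetical placeholder that copies bytes so the example stays runnable, and the `--input` and `--out` flag names are assumptions rather than the helper scripts' actual interface.

```python
import argparse
import asyncio
import sys
from pathlib import Path
from typing import Optional


async def process(input_path: Path, output_path: Path) -> None:
    # Hypothetical placeholder: a real script would await a client call here.
    output_path.write_bytes(input_path.read_bytes())


async def main(argv: Optional[list] = None) -> int:
    parser = argparse.ArgumentParser(description="Example single-operation helper")
    parser.add_argument("--input", required=True, type=Path)
    parser.add_argument("--out", required=True, type=Path)
    # argparse exits with an error when a required argument is missing.
    args = parser.parse_args(argv)
    if not args.input.is_file():
        print(f"error: input not found: {args.input}", file=sys.stderr)
        return 2
    await process(args.input, args.out)
    # Report the explicit output path; never print the API key.
    print(f"created: {args.out}")
    return 0


# A script entry point would then run the coroutine with:
#     raise SystemExit(asyncio.run(main()))
```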
Security Hardening Addendum
- Prefer a pinned, preinstalled MCP server binary over runtime package fetches.
  - Preferred: `npm i -g @nutrient-sdk/dws-mcp-server@<pinned-version>`
  - Avoid unpinned runtime fetch in production paths.
- Never store `NUTRIENT_DWS_API_KEY` in committed JSON config files.
  - Use process env injection at runtime (shell/export, secrets manager, or host env).
- Restrict file access with `SANDBOX_PATH` to the minimum required working directory.
- Before enabling MCP mode in production, verify package provenance and lock version.
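Runtime env injection can be paired with a fail-fast lookup and a log-safe representation of the key. This is a minimal sketch: `load_api_key` and `mask` are hypothetical helpers, and showing a four-character prefix is an arbitrary choice.

```python
import os


def load_api_key(env=os.environ, name: str = "NUTRIENT_DWS_API_KEY") -> str:
    """Read the key from the process environment and fail fast if it is absent."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; inject it via the process environment")
    return key


def mask(secret: str, visible: int = 4) -> str:
    """Return a log-safe form that reveals at most a short prefix."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return secret[:visible] + "*" * (len(secret) - visible)


print(mask("nutr_sk_abc"))  # nutr*******
```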