CheatCodes-Skill-Library document-processing
name: document-processing
git clone https://github.com/jac007x/CheatCodes-Skill-Library
skills/document-processing/skill.yamlname: document-processing version: 1.0.0 description: PDF, PowerPoint, and Form extraction patterns for People workflows author: jac007x source: curated source_urls:
- https://github.com/anthropics/anthropic-cookbook
- https://github.com/Unstructured-IO/unstructured curation_date: 2026-03-18
Compliance
compliance: review_date: 2026-03-18 reviewer: jac007x policies: - AI-01-02 - DG-01-ST-02 - Ethical AI Principles status: compliant risk_level: medium pii_handling: true # Documents may contain PII next_review: 2026-06-18
Model Recommendation
model_recommendation: sonnet model_rationale: > PDF/PPTX analysis and chart interpretation require visual understanding and moderate reasoning. Sonnet handles document analysis well. Use Haiku only for simple text extraction without analysis.
tags:
- powerpoint
- forms
- document-extraction
- ocr
- data-extraction
- people
- hr
status: active
capabilities: pdf: - text_extraction - summarization - policy_parsing - benefits_extraction powerpoint: - slide_content_extraction - chart_interpretation - speaker_notes forms: - structured_data_extraction - survey_processing - intake_form_parsing images: - ocr - chart_analysis - diagram_interpretation
requires: python: ">=3.10" packages: - anthropic - unstructured[all-docs] # optional, for advanced parsing api: - anthropic # Claude API key
patterns: pdf_summarization: description: Executive summary extraction from PDFs use_cases: - Policy documents - Benefits guides - Compliance docs
pptx_analysis: description: PowerPoint slide content extraction use_cases: - MBR deck analysis - Training materials - Town hall presentations
form_extraction: description: Structured data from forms use_cases: - Employee intake - Survey responses - Feedback forms
people_function_use_cases:
- Policy document processing
- MBR deck analysis
- Survey response aggregation
- Benefits guide parsing
- Training material extraction
- Compliance document review
integration: mbr_engine: - Parse source data files - Extract policy references - Analyze existing decks