Skills office-toolkit
处理 Office 文档(Word/Excel/PPT/PDF)的技能。当用户要求读取、创建、编辑 Word 文档(.docx)、Excel 表格(.xlsx/.csv)、PPT(.pptx)或 PDF 时使用。基于 python-docx、openpyxl、python-pptx、pypdf 库。Requires: python-docx, openpyxl, python-pptx, pypdf, pandoc, LibreOffice(验证用)。
install
source · Clone the upstream repo
git clone https://github.com/openclaw/skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/axelhu/openclaw-office-toolkit" ~/.claude/skills/clawdbot-skills-office-toolkit && rm -rf "$T"
OpenClaw · Install into ~/.openclaw/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/axelhu/openclaw-office-toolkit" ~/.openclaw/skills/clawdbot-skills-office-toolkit && rm -rf "$T"
manifest:
skills/axelhu/openclaw-office-toolkit/SKILL.mdsource content
office-toolkit
处理 Office 文档:Word(.docx)、Excel(.xlsx/.csv)、PPT(.pptx)、PDF。
环境要求
pip install --break-system-packages python-docx openpyxl python-pptx pypdf sudo apt install libreoffice-writer libreoffice-calc libreoffice-impress pandoc
快速参考
| 任务 | 库/命令 |
|---|---|
| 读 Word | 或 |
| 创建/编辑 Word | |
| 读 Excel | 或 |
| 创建/编辑 Excel | |
| 读 PPT | |
| 创建/编辑 PPT | |
| 读 PDF | 或 |
| PDF 格式验证 | LibreOffice |
| PDF 转图片 | (poppler-utils) |
Word (.docx)
读取
from docx import Document doc = Document('file.docx') for para in doc.paragraphs: print(para.text) # 带格式提取 import subprocess result = subprocess.run(['pandoc', '--track-changes=all', 'file.docx', '-t', 'markdown'], capture_output=True, text=True) print(result.stdout)
创建
from docx import Document from docx.shared import Pt, Inches doc = Document() # 标题 doc.add_heading('文档标题', 0) # 段落 p = doc.add_paragraph('正文内容') p.runs[0].bold = True # 加粗 p.runs[0].font.size = Pt(12) p.runs[0].font.name = 'Arial' # 引用 doc.add_paragraph('引用内容', style='Intense Quote') # 表格 table = doc.add_table(rows=2, cols=3) table.style = 'Light Grid Accent 1' table.rows[0].cells[0].text = '表头1' table.rows[0].cells[1].text = '表头2' doc.save('output.docx')
编辑现有文档
- 解压 → 修改 XML → 重新打包(推荐用 python-docx 直接修改)
- 复杂格式建议用 LibreOffice 打开编辑
Excel (.xlsx)
读取
import openpyxl wb = openpyxl.load_workbook('file.xlsx') ws = wb.active for row in ws.iter_rows(values_only=True): print(row)
创建/编辑
import openpyxl from openpyxl.styles import Font, PatternFill, Alignment wb = openpyxl.Workbook() ws = wb.active ws.title = '数据' # 写入 ws['A1'] = '姓名' ws['B1'] = '年龄' ws['A2'] = '张三' ws['B2'] = 25 # 格式化 header_fill = PatternFill(start_color='4472C4', end_color='4472C4', fill_type='solid') ws['A1'].fill = header_fill ws['A1'].font = Font(color='FFFFFF', bold=True) ws['A1'].alignment = Alignment(horizontal='center') # 保存 wb.save('output.xlsx')
格式化规则(财务场景)
- 蓝色字体:硬编码输入值
- 黑色字体:公式/计算
- 绿色字体:同文件内链接
- 红色字体:外部链接
- 黄色背景:需要关注的假设
PowerPoint (.pptx)
读取
from pptx import Presentation prs = Presentation('file.pptx') for slide in prs.slides: for shape in slide.shapes: if hasattr(shape, 'text'): print(shape.text)
创建
from pptx import Presentation from pptx.util import Inches, Pt prs = Presentation() prs.slide_width = Inches(10) prs.slide_height = Inches(7.5) # 使用空白布局 slide = prs.slides.add_slide(prs.slide_layouts[6]) title = slide.shapes.title subtitle = slide.placeholders[1] title.text = '演示标题' subtitle.text = '副标题' prs.save('output.pptx')
设计原则
- 颜色方案:选一个大胆的配色,主色占60-70%,1-2个辅助色,1个尖锐强调色
- 不要默认蓝色:根据主题选配色
- 深浅对比:标题页用深色背景,结论页用浅色背景("三明治"结构)
- 排版留白:不要堆满,留呼吸空间
读取
from pypdf import PdfReader reader = PdfReader('file.pdf') print(f'页数: {len(reader.pages)}') for page in reader.pages: print(page.extract_text())
合并
from pypdf import PdfWriter, PdfReader writer = PdfWriter() for pdf_file in ['doc1.pdf', 'doc2.pdf']: reader = PdfReader(pdf_file) for page in reader.pages: writer.add_page(page) with open('merged.pdf', 'wb') as f: writer.write(f)
分割
reader = PdfReader('input.pdf') for i, page in enumerate(reader.pages): writer = PdfWriter() writer.add_page(page) with open(f'page_{i+1}.pdf', 'wb') as f: writer.write(f)
旋转
page = reader.pages[0] page.rotate(90) # 顺时针90度
依赖安装(当前环境状态)
| 依赖 | 状态 |
|---|---|
| python-docx | ✅ 已安装 |
| openpyxl | ✅ 已安装 |
| python-pptx | ✅ 已安装 |
| pypdf | ✅ 已安装 |
| pandoc | ✅ 已安装 |
| LibreOffice | ✅ 已安装 |
Agent Rules
- 创建文件前先确保目录可写
- 复杂文档先尝试 python-docx 等库,库无法处理再用 LibreOffice
- PDF 读取优先用
,文字提取效果差时用pypdfpandoc - Excel 格式化参照财务规范(蓝/黑/绿/红字体 + 黄色背景)
- LibreOffice 路径:
或soffice
(已安装)libreoffice - 收到文件路径时,先检查文件是否存在:
pathlib.Path(path).exists()