GB-Power-Market-JJ article-tts
拍照或文字转音频:文章照片 OCR 提取文字,或直接接收文字,生成 Microsoft Edge TTS 语音,支持中英文、自动转写、语速调节、逐句拆分。| Capture article photos (OCR) or plain text, generate natural audio via Edge TTS. Bilingual support (EN/ZH), configurable speed, voice, and sentence splitting.
install
source · Clone the upstream repo
git clone https://github.com/GeorgeDoors888/GB-Power-Market-JJ
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/GeorgeDoors888/GB-Power-Market-JJ "$T" && mkdir -p ~/.claude/skills && cp -r "$T/openclaw-skills/skills/54meteor/article-tts" ~/.claude/skills/georgedoors888-gb-power-market-jj-article-tts && rm -rf "$T"
OpenClaw · Install into ~/.openclaw/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/GeorgeDoors888/GB-Power-Market-JJ "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/openclaw-skills/skills/54meteor/article-tts" ~/.openclaw/skills/georgedoors888-gb-power-market-jj-article-tts && rm -rf "$T"
manifest:
openclaw-skills/skills/54meteor/article-tts/SKILL.mdsource content
Article TTS Skill
Default Configuration
| 参数 | 默认值 | 说明 |
|---|---|---|
| | 语言: 或 |
| | 是否跳过文字确认步骤 |
| | TTS 语速( = 90%) |
| (英文)/ (中文) | TTS 声音 |
| | 是否生成按句拆分的音频 |
Supported Languages
| 语言 | OCR 语言包 | TTS Voice |
|---|---|---|
| (预装) | |
| (需安装) | |
中文 OCR 语言包安装:
- Linux(WSL/Debian/Ubuntu):
apt-get install tesseract-ocr-chi-sim- macOS:
(自带中文)brew install tesseract-lang- Windows:下载
放入 Tesseract 安装目录的chi_sim.traineddata文件夹tessdata
Workflow
Input Types
- 图片:OCR 提取文字(需要
指定语言)lang - 纯文字:直接 TTS,无需 OCR
Standard Flow(默认,需确认)
图片 → OCR 提取文字 → 展示给用户确认 → 用户确认 → 生成 TTS → 发送 文字 → 直接生成 TTS → 发送
Skip-Confirmation Flow ⚠️
用户说"不需要确认"或"直接生成"时,跳过确认步骤。
⚠️ 安全提示:skipConfirmation 会跳过文字确认步骤,OCR 提取的文本(可能包含敏感信息)会直接转为音频并发送。适用于可信来源、低敏感内容。建议默认关闭(
)。skipConfirmation: false
OCR Step
# 图片预处理 from PIL import Image, ImageOps img = Image.open(image_path) img = ImageOps.autocontrast(img.convert('L'), cutoff=10) w, h = img.size img = img.resize((w*4, h*4), Image.LANCZOS) img.save('/tmp/ocr_input.jpg', quality=99)
# 英文 tesseract /tmp/ocr_input.jpg stdout -l eng --psm 4 # 中文 tesseract /tmp/ocr_input.jpg stdout -l chi_sim --psm 4
TTS Step
全文字频
uvx edge-tts \ -t "FULL TEXT" \ -v en-US-EmmaNeural \ --rate=-10% \ --write-media OUTPUT_DIR/full_article.mp3 # 中文 uvx edge-tts \ -t "中文文字内容" \ -v zh-CN-XiaoxiaoNeural \ --rate=-10% \ --write-media OUTPUT_DIR/full_article.mp3
按句拆分(仅 splitSentences=true)
import subprocess, re def split_sentences(text, lang='en'): if lang == 'zh': # 中文按句号/感叹号/问号拆分 sentences = re.split(r'(?<=[。!?])\s*', text) else: # 英文按 .!? 拆分 sentences = re.split(r'(?<=[.!?])\s+', text) return [s.strip() for s in sentences if s.strip()] sentences = split_sentences(text, lang=lang) for i, sentence in enumerate(sentences, 1): num = str(i).zfill(2) voice = 'zh-CN-XiaoxiaoNeural' if lang == 'zh' else 'en-US-EmmaNeural' subprocess.run([ "uvx", "edge-tts", "-t", sentence, "-v", voice, "--rate=-10%", "--write-media", f"OUTPUT_DIR/sentence_{num}.mp3" ])
Output Directory
/mnt/d/wslspace/workspace/articles/YYYY-MM-DD-article-slug/ ├── original_text.md ├── full_article.mp3 └── sentence_01.mp3 ...
Sending via Message Channel
The agent detects the active channel from the runtime context and calls
message(...) accordingly. No hardcoded channel — the agent uses whichever channel the user is currently chatting through.
# Detect active channel automatically (from runtime inbound metadata) # channel is inferred: feishu / telegram / discord / whatsapp / signal / imessage / openclaw-weixin # 发送全文 message(action="send", channel="{active_channel}", message="📄 全文音频", media="PATH/full_article.mp3", filename="full_article.mp3") # 发送每句 for i, sentence in enumerate(sentences, 1): num = str(i).zfill(2) message(action="send", channel="{active_channel}", message=f"📝 {num}: {sentence}", media=f"PATH/sentence_{num}.mp3", filename=f"sentence_{num}.mp3")
Channel Behavior Notes
| Channel | 音频支持 | 备注 |
|---|---|---|
| Feishu | ✅ | 直接发送 mp3 |
| Telegram | ✅ | 直接发送 mp3 |
| Discord | ✅ | 作为附件发送 |
| ✅ | 直接发送 mp3 | |
| Signal | ⚠️ | 取决于信号强度,可能不支持 |
| iMessage | ⚠️ | 通过 macOS 发送,mp3 兼容性一般 |
| WeChat Work | ✅ | 同 Feishu |
If the channel does not support audio, the agent saves the file to
OUTPUT_DIR and sends the file path as a text message instead.
Available TTS Voices
English
en-US-EmmaNeural, en-US-BrianNeural, en-GB-LibbyNeural, ...
Chinese
zh-CN-XiaoxiaoNeural(女声), zh-CN-YunxiNeural(男声), zh-CN-YunyangNeural(新闻男声), ...
查看完整列表:
uvx edge-tts -l | grep "zh-CN"
Notes
- Tesseract + English 预装;中文需
apt-get install tesseract-ocr-chi-sim - edge-tts 通过
运行,无需安装uvx - 图片质量直接影响 OCR 效果,尽量保持光线充足、角度端正