article-translation
Translate web pages and PDF documents into Korean and save them as markdown files. Supports image/table preservation, JS-rendered pages, and chunked processing of large PDFs. Use when the request mentions 번역, translation, translate, 한글로, or Korean.
install
source · Clone the upstream repo
git clone https://github.com/seanlion/awesome-skills-for-claude
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/seanlion/awesome-skills-for-claude "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.claude/skills/article-translation" ~/.claude/skills/seanlion-awesome-skills-for-claude-article-translation && rm -rf "$T"
manifest:
.claude/skills/article-translation/SKILL.md
Article Translation
Translate web pages or PDF documents to Korean and save as markdown files.
Quick Start
- Extract content: Determine the source type (URL or PDF path) and use the appropriate extraction method
- Translate: Create a paragraph-level plan with TodoWrite, then translate into Korean
- Save: Generate a `.md` file named after the original title, then review and finalize
Workflow
Step 0: Check for Existing Checkpoint
ALWAYS check for a checkpoint file before starting:
- Look for `.translation-checkpoint-{filename}.json` in the current directory
- If a checkpoint exists, read it and resume after the last completed section
- If there is no checkpoint, start fresh from Step 1
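A minimal Python sketch of the Step 0 check, assuming the checkpoint uses the JSON format shown in Step 3. The helper names (`checkpoint_path`, `load_checkpoint`, `next_section`) are hypothetical, and `filename` stands for whatever short identifier the run uses (the examples use the URL slug `react-hooks-guide`).

```python
# Hypothetical helpers illustrating Step 0; not part of the skill's scripts.md.
import json
from pathlib import Path
from typing import Optional


def checkpoint_path(directory: Path, filename: str) -> Path:
    """Build the checkpoint path, e.g. .translation-checkpoint-react-hooks-guide.json."""
    return directory / f".translation-checkpoint-{filename}.json"


def load_checkpoint(directory: Path, filename: str) -> Optional[dict]:
    """Return the saved checkpoint dict, or None when there is nothing to resume."""
    path = checkpoint_path(directory, filename)
    if not path.exists():
        return None
    return json.loads(path.read_text(encoding="utf-8"))


def next_section(checkpoint: Optional[dict]) -> int:
    """Return the first 1-based section that is not yet in completed_sections."""
    if checkpoint is None:
        return 1  # no checkpoint: start fresh from Step 1
    done = set(checkpoint.get("completed_sections", []))
    for section in range(1, checkpoint["total_sections"] + 1):
        if section not in done:
            return section
    return checkpoint["total_sections"] + 1  # all sections done: go to Step 4
```

With the sample checkpoint from Step 3 (`completed_sections: [1, 2, 3]`, 10 sections), `next_section` returns 4, matching Example 2.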
Step 1: Source Analysis
Determine source type and select extraction method:
| Source Type | Detection Criteria | Processing Method |
|---|---|---|
| PDF | `.pdf` extension | Use pdfplumber script (NEVER use Read tool for PDF) |
| Static Web | `curl` returns content | Download with curl |
| JS-rendered Web | `curl` returns empty/incomplete content | Use Playwright script |
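The detection criteria in the table above can be sketched as a small classifier. `classify_source` and `visible_text` are hypothetical helpers, and the 500-character threshold for "empty/incomplete" is an assumed heuristic, not a value from the skill; tune it per site.

```python
# Hypothetical sketch of the Step 1 decision; the threshold is an assumption.
import re
from html.parser import HTMLParser
from typing import List


class _TextCollector(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def visible_text(html: str) -> str:
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    return re.sub(r"\s+", " ", "".join(collector.parts)).strip()


def classify_source(source: str, curl_html: str = "", min_chars: int = 500) -> str:
    """Return 'pdf', 'static_web', or 'js_web' following the Step 1 table."""
    if source.lower().endswith(".pdf"):
        return "pdf"
    # A JS-rendered page usually ships an almost empty shell to curl.
    if len(visible_text(curl_html)) >= min_chars:
        return "static_web"
    return "js_web"
```

A URL that serves a PDF behind a query string (`...file.pdf?dl=1`) would not match the extension check; this sketch ignores that case.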
Step 2: Content Extraction
For PDF:
- Check file info (size, page count)
- ALWAYS use the pdfplumber script from scripts.md; the Read tool fails with a "Too large" error
- Use a page range to process the PDF in chunks if needed
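A sketch of chunked extraction with pdfplumber, assuming the `pdfplumber` package is installed (scripts.md holds the skill's actual script). `page_chunks`, `extract_pdf_chunk`, and the 10-page chunk size are illustrative choices, not values from the skill.

```python
# Hypothetical chunking helpers; pdfplumber is imported lazily inside the extractor.
from typing import Iterator


def page_chunks(page_count: int, chunk_size: int = 10) -> Iterator[range]:
    """Yield 0-based page ranges, e.g. 50 pages -> 0-9, 10-19, ..., 40-49."""
    for start in range(0, page_count, chunk_size):
        yield range(start, min(start + chunk_size, page_count))


def extract_pdf_chunk(path: str, pages: range) -> str:
    """Extract the text of one page range with pdfplumber."""
    import pdfplumber  # lazy import: page_chunks works even without pdfplumber

    with pdfplumber.open(path) as pdf:
        # extract_text() can return None/empty for image-only pages.
        texts = [pdf.pages[i].extract_text() or "" for i in pages]
    return "\n\n".join(texts)
```

For the 50-page whitepaper in Example 3, `page_chunks(50)` yields five ranges; `len(pdf.pages)` inside `pdfplumber.open(...)` supplies the page count for the "Check file info" step.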
For Web pages:
- Try `curl` first
- If the content is empty/incomplete, use the Playwright script for JS rendering
- Verify full content is captured
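The curl-first, Playwright-fallback order can be sketched as follows. `extract_web` and its injectable fetchers are hypothetical; the Playwright path assumes the Python `playwright` package and its Chromium build are installed, and `is_complete` is whatever "full content captured" check you adopt (for instance, a minimum amount of visible text).

```python
# Hypothetical sketch of Step 2 for web pages; fetchers are injectable for testing.
import subprocess
from typing import Callable, Tuple


def fetch_with_curl(url: str) -> str:
    """Static download, equivalent to `curl -sL URL`."""
    result = subprocess.run(["curl", "-sL", url], capture_output=True, text=True, check=True)
    return result.stdout


def fetch_with_playwright(url: str) -> str:
    """Render JavaScript in headless Chromium and return the final HTML."""
    from playwright.sync_api import sync_playwright  # optional dependency

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")
        html = page.content()
        browser.close()
    return html


def extract_web(
    url: str,
    is_complete: Callable[[str], bool],
    fetch_static: Callable[[str], str] = fetch_with_curl,
    fetch_rendered: Callable[[str], str] = fetch_with_playwright,
) -> Tuple[str, str]:
    """Try curl first; fall back to Playwright when the result looks incomplete."""
    html = fetch_static(url)
    if is_complete(html):
        return html, "curl"
    return fetch_rendered(url), "playwright"
```

Injecting the fetchers keeps the decision logic testable offline and lets the Playwright dependency stay optional for static sites.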
Step 3: Translation with Checkpointing
- Create paragraph-level translation plan with TodoWrite
- Before translating each section: Update checkpoint file with current progress
- Translate paragraph by paragraph to Korean
- After completing each section: Save checkpoint with completed sections list
- Download images to local storage
- Preserve table structure in markdown, translate cell content only
- Run large file downloads in background
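To preserve table structure while translating only cell content, one option is to rewrite each pipe-table row cell by cell. In this sketch `translate` is a hypothetical stand-in for the actual translation step, and escaped pipes (`\|`) inside cells are not handled.

```python
# Hypothetical helper: translate cell text, keep pipes and alignment rows intact.
import re
from typing import Callable

# A separator row looks like |---|:---:|---| (alignment colons allowed).
_SEPARATOR = re.compile(r"^\|?\s*:?-+:?\s*(\|\s*:?-+:?\s*)*\|?$")


def translate_table_row(row: str, translate: Callable[[str], str]) -> str:
    """Translate each cell of one pipe-table row; separator rows pass through."""
    stripped = row.strip()
    if _SEPARATOR.match(stripped):
        return row  # alignment row: nothing to translate
    cells = stripped.strip("|").split("|")
    translated = [translate(cell.strip()) if cell.strip() else "" for cell in cells]
    return "| " + " | ".join(translated) + " |"
```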
Checkpoint file format (`.translation-checkpoint-{filename}.json`):

```json
{
  "source_url": "https://example.com/article",
  "source_type": "web",
  "output_file": "Article Title.md",
  "total_sections": 10,
  "completed_sections": [1, 2, 3],
  "current_section": 4,
  "last_updated": "2024-01-15T10:30:00",
  "partial_content": "... translated content so far ..."
}
```
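Saving progress could look like this: update the fields, stamp `last_updated`, and write through a temporary file so an interrupted session never leaves a truncated checkpoint. `save_checkpoint` and its keyword arguments are hypothetical; the temporary-file-plus-`os.replace` pattern is a standard way to make the swap atomic on POSIX filesystems.

```python
# Hypothetical checkpoint writer for the JSON format above.
import json
import os
from datetime import datetime
from pathlib import Path
from typing import Optional


def save_checkpoint(
    path: Path,
    state: dict,
    completed: Optional[int] = None,
    current: Optional[int] = None,
    partial_content: Optional[str] = None,
) -> dict:
    """Return an updated copy of state and write it to path."""
    updated = dict(state)  # shallow copy so the caller's dict is untouched
    done = list(updated.get("completed_sections", []))
    if completed is not None and completed not in done:
        done.append(completed)
    updated["completed_sections"] = sorted(done)
    if current is not None:
        updated["current_section"] = current
    if partial_content is not None:
        updated["partial_content"] = partial_content
    updated["last_updated"] = datetime.now().isoformat(timespec="seconds")
    tmp = path.with_name(path.name + ".tmp")
    # ensure_ascii=False keeps Korean text readable in the file.
    tmp.write_text(json.dumps(updated, ensure_ascii=False, indent=2), encoding="utf-8")
    os.replace(tmp, path)  # rename over the old file in one step
    return updated
```

Call it with `current=` before translating a section and with `completed=` after finishing it, mirroring the two checkpoint points in Step 3.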
Step 4: Review & Correction
- Compare each paragraph with original
- Identify and fix awkward translations
- Check for missing content
- Save final markdown file
- Delete checkpoint file after successful completion
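Part of the "check for missing content" review can be mechanized by comparing structural counts between the source and the translation, assuming both are markdown. `structure_counts`, `missing_content`, and `finalize` are hypothetical helpers; a count mismatch only flags a section for manual review, it does not prove a paragraph is missing.

```python
# Hypothetical review aids for Step 4; counts are a heuristic, not proof.
import re
from collections import Counter
from pathlib import Path


def structure_counts(markdown: str) -> Counter:
    """Count headings, table rows, images, and code fences outside code blocks."""
    counts: Counter = Counter()
    in_code = False
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith("```"):
            counts["code_fences"] += 1
            in_code = not in_code
            continue
        if in_code:
            continue  # '#' inside code is not a heading
        if re.match(r"#{1,6} ", stripped):
            counts["headings"] += 1
        elif stripped.startswith("|"):
            counts["table_rows"] += 1
        counts["images"] += len(re.findall(r"!\[[^\]]*\]\([^)]*\)", stripped))
    return counts


def missing_content(original_md: str, translated_md: str) -> dict:
    """Return {kind: (original_count, translated_count)} for every mismatch."""
    a, b = structure_counts(original_md), structure_counts(translated_md)
    return {k: (a[k], b[k]) for k in set(a) | set(b) if a[k] != b[k]}


def finalize(output: Path, translated_md: str, checkpoint: Path) -> None:
    """Write the final file first, then delete the checkpoint."""
    output.write_text(translated_md, encoding="utf-8")
    checkpoint.unlink(missing_ok=True)
```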
Output Format
Filename Rule
- Use original title (remove special characters)
- Example: `Understanding React Hooks.md`
- Use a Korean title only when the original title is itself in Korean: `리액트 훅 이해하기.md`
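The filename rule leaves "special characters" open; this sketch assumes it means characters that common filesystems reject (`<>:"/\|?*` and control characters) and keeps everything else, including Hangul. `title_to_filename` is a hypothetical helper.

```python
# Hypothetical filename sanitizer; the character set is an assumption.
import re

# Characters invalid in Windows filenames, plus ASCII control characters.
_INVALID = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def title_to_filename(title: str) -> str:
    """Turn an article title into '<Title>.md', dropping filename-unsafe characters."""
    cleaned = _INVALID.sub("", title)
    # Collapse whitespace; strip leading/trailing spaces and dots (hidden files, Windows).
    cleaned = re.sub(r"\s+", " ", cleaned).strip(" .")
    return f"{cleaned or 'untitled'}.md"
```

For example, `title_to_filename("Understanding React Hooks")` gives `Understanding React Hooks.md`.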
Markdown Structure
```markdown
# Translated Title

> 원문: [Original Title](Original URL)

## Section 1

Body text...

| Column 1 | Column 2 |
|------|------|
| Content 1 | Content 2 |

## Section 2

...
```
Best Practices
- Paragraph-level translation: Never translate the entire document at once; process it paragraph by paragraph for quality control
- Technical terms: Include the original term in parentheses when needed (e.g., "상태 관리(State Management)")
- Image alt text: Translate image descriptions for accessibility
- Table structure: Preserve markdown table format, translate cell content only
- Code blocks: Never translate code. Only translate comments if necessary
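Paragraph-level translation that never touches code can be sketched by splitting the markdown into paragraphs and fenced code blocks, then sending only the paragraphs through a translation step. `split_segments`, `translate_document`, and the `translate` callable are hypothetical; translating comments inside code is left out of this sketch.

```python
# Hypothetical paragraph splitter: code fences pass through untranslated.
from typing import Callable, List, Tuple


def split_segments(markdown: str) -> List[Tuple[str, str]]:
    """Split markdown into ('text', paragraph) and ('code', fenced block) segments."""
    segments: List[Tuple[str, str]] = []
    buffer: List[str] = []
    in_code = False

    def flush(kind: str) -> None:
        if buffer:
            segments.append((kind, "\n".join(buffer)))
            buffer.clear()

    for line in markdown.splitlines():
        if line.strip().startswith("```"):
            if in_code:
                buffer.append(line)
                flush("code")  # closing fence ends the code segment
            else:
                flush("text")  # opening fence ends any pending paragraph
                buffer.append(line)
            in_code = not in_code
        elif in_code:
            buffer.append(line)  # keep blank lines inside code
        elif not line.strip():
            flush("text")  # blank line ends a paragraph
        else:
            buffer.append(line)
    flush("code" if in_code else "text")
    return segments


def translate_document(markdown: str, translate: Callable[[str], str]) -> str:
    """Translate one paragraph at a time; code blocks are returned unchanged."""
    parts = [text if kind == "code" else translate(text)
             for kind, text in split_segments(markdown)]
    return "\n\n".join(parts)
```

Each paragraph becomes one unit of work, which also maps naturally onto the TodoWrite plan and the per-section checkpoints.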
Common Issues
| Issue | Cause | Solution |
|---|---|---|
| PDF "Too large" error | Attempted to read large PDF with Read tool | Use pdfplumber script instead (scripts.md) |
| pdfplumber not working | Not installed | Install pdfplumber (see the installation guide in scripts.md) |
| Empty web page content | JS-rendered page | Use Playwright script |
| Broken images | Relative path issue | Convert to absolute path or download locally |
| Awkward translation | Literal translation | Consider context, paraphrase in review step |
| Playwright not working | Chromium not installed | Install Chromium for Playwright (see the installation guide in scripts.md) |
| Translation interrupted | Session timeout or error | Resume from checkpoint file (Step 0) |
| Duplicate translation work | Didn't check for checkpoint | Always run Step 0 first to check existing progress |
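For the "Broken images" row, relative image links can be made absolute with the standard library's `urljoin`. `absolutize_images` and the `images/NN-name` naming scheme in `local_image_name` are hypothetical; links with a title (`![alt](src "title")`) are left untouched by this simple regex.

```python
# Hypothetical helpers for the "Broken images" fix.
import posixpath
import re
from urllib.parse import urljoin, urlparse

IMAGE_MD = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)\)")


def absolutize_images(markdown: str, page_url: str) -> str:
    """Rewrite relative image links against the page URL; absolute links are unchanged."""
    return IMAGE_MD.sub(
        lambda m: f"![{m.group(1)}]({urljoin(page_url, m.group(2))})", markdown
    )


def local_image_name(image_url: str, index: int) -> str:
    """Hypothetical naming scheme for downloaded copies, e.g. images/03-diagram.png."""
    base = posixpath.basename(urlparse(image_url).path) or "image"  # drops ?query
    return f"images/{index:02d}-{base}"
```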
Examples
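Example 1: Translating a web page (new)
Request:
https://example.com/react-hooks-guide 번역해줘 ("translate this")
Processing flow:
- Check for a checkpoint file → none found, start fresh
- Try downloading the page with curl
- Verify the content, then create a paragraph-level TodoWrite plan
- Translate each section (save a checkpoint after each section)
- Create `React Hooks Guide.md`
- Review against the original and save the final version
- Delete the checkpoint file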
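Example 2: Resuming a translation (from a checkpoint)
Request:
https://example.com/react-hooks-guide 번역해줘 ("translate this")
Processing flow:
- Check for a checkpoint file → found `.translation-checkpoint-react-hooks-guide.json`
- Read the checkpoint: sections 1-3 complete, current section 4
- Resume translating from section 4
- Translate the remaining sections (update the checkpoint after each section)
- Save the final file, then delete the checkpoint file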
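Example 3: Translating a PDF
Request:
/path/to/whitepaper.pdf 한글로 번역해줘 ("translate into Korean")
Processing flow:
- Check for a checkpoint file → none found, start fresh
- Check PDF info (50 pages, 5 MB)
- Extract the full text with pdfplumber
- Create a paragraph-level TodoWrite plan (10 sections)
- Translate section by section (save a checkpoint after each section)
- Download images (in the background)
- Create `Whitepaper.md`
- Review and correct, then delete the checkpoint file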
Code Reference
See scripts.md for the full scripts:
- PDF extraction (pdfplumber): full or partial page ranges
- JS-rendered page extraction (Playwright)
- Installation guide