article-translation

Translate web pages and PDF documents into Korean and save them as markdown files. Supports image and table preservation, JS-rendered pages, and chunked processing of large PDFs. Use when the request mentions 번역 ("translation"), translation, translate, 한글로 ("into Korean"), or Korean.

install
source · Clone the upstream repo
git clone https://github.com/seanlion/awesome-skills-for-claude
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/seanlion/awesome-skills-for-claude "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.claude/skills/article-translation" ~/.claude/skills/seanlion-awesome-skills-for-claude-article-translation && rm -rf "$T"
manifest: .claude/skills/article-translation/SKILL.md

Article Translation

Translate web pages or PDF documents to Korean and save as markdown files.


Quick Start

  1. Extract content: Determine the source type (URL or PDF path) and use the appropriate extraction method
  2. Translate: Create a paragraph-level plan with TodoWrite, then translate to Korean
  3. Save: Generate a .md file named after the original title, then review and finalize

Workflow

Step 0: Check for Existing Checkpoint

ALWAYS check for checkpoint file before starting:

  1. Look for .translation-checkpoint-{filename}.json in the current directory
  2. If checkpoint exists, read it and resume from last completed section
  3. If no checkpoint, start fresh from Step 1
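
The check in Step 0 can be sketched in Python as below. This is a minimal illustration, not part of the skill: the helper names (`checkpoint_path`, `load_checkpoint`, `next_section`) are hypothetical, and it assumes the `{filename}` part of the checkpoint name has already been derived from the source (for example `react-hooks-guide`).

```python
import json
from pathlib import Path
from typing import Optional


def checkpoint_path(name: str, directory: str = ".") -> Path:
    """Build the checkpoint path for a source name (hypothetical helper)."""
    return Path(directory) / f".translation-checkpoint-{name}.json"


def load_checkpoint(name: str, directory: str = ".") -> Optional[dict]:
    """Return the saved checkpoint, or None when there is nothing to resume."""
    path = checkpoint_path(name, directory)
    if not path.exists():
        return None
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


def next_section(checkpoint: Optional[dict]) -> int:
    """Section number to translate next: 1 when starting fresh."""
    if checkpoint is None:
        return 1
    completed = checkpoint.get("completed_sections", [])
    return max(completed, default=0) + 1
```

With a checkpoint that lists sections 1 to 3 as completed, `next_section` returns 4, which matches the resume behaviour described in item 2.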

Step 1: Source Analysis

Determine source type and select extraction method:

| Source Type | Detection Criteria | Processing Method |
|------|------|------|
| PDF | .pdf extension | Use pdfplumber script (NEVER use Read tool for PDF) |
| Static Web | curl returns content | Download with curl |
| JS-rendered Web | curl returns empty/incomplete content | Use Playwright script |
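
As a rough sketch, the decision in this table could be written as a single function. The function name, the returned labels, and the `min_chars` threshold for "empty/incomplete" are all assumptions made for illustration; the skill itself does not define them.

```python
from urllib.parse import urlparse


def detect_source_type(source: str, fetched_html: str = "", min_chars: int = 500) -> str:
    """Classify a source as 'pdf', 'static_web', or 'js_web'.

    min_chars is an assumed threshold for "empty/incomplete"; tune it per site.
    """
    # For URLs, look only at the path so query strings do not hide the extension
    path = urlparse(source).path if "://" in source else source
    if path.lower().endswith(".pdf"):
        return "pdf"
    if len(fetched_html.strip()) >= min_chars:
        return "static_web"
    return "js_web"
```

Here `fetched_html` stands for whatever the curl attempt returned, so the function is only meaningful for web sources after that first download.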

Step 2: Content Extraction

For PDF:

  1. Check file info (size, page count)
  2. ALWAYS use pdfplumber script from scripts.md - Read tool will fail with "Too large" error
  3. Use page range for chunk processing if needed
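
The page-range idea in item 3 might look like the sketch below. It is not the script from scripts.md, which is not reproduced here; `page_ranges`, `extract_pages`, and the default chunk size are hypothetical. The pdfplumber calls used are `pdfplumber.open`, the `pages` list, and `Page.extract_text`.

```python
from typing import Iterator, Tuple


def page_ranges(total_pages: int, chunk_size: int = 10) -> Iterator[Tuple[int, int]]:
    """Yield 1-based inclusive (start, end) page ranges covering the document."""
    for start in range(1, total_pages + 1, chunk_size):
        yield start, min(start + chunk_size - 1, total_pages)


def extract_pages(pdf_path: str, start: int, end: int) -> str:
    """Extract text from pages start..end (1-based, inclusive) with pdfplumber."""
    import pdfplumber  # imported lazily so page_ranges works without it

    with pdfplumber.open(pdf_path) as pdf:
        pages = pdf.pages[start - 1:end]
        # extract_text() can return None for pages with no text layer
        return "\n\n".join(page.extract_text() or "" for page in pages)
```

For a 50-page document and a chunk size of 20, `page_ranges` yields (1, 20), (21, 40), and (41, 50), and each range can be passed to `extract_pages` in turn.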

For Web pages:

  1. Try curl first
  2. If content is empty/incomplete, use Playwright script for JS rendering
  3. Verify full content is captured
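
The curl-then-Playwright fallback can be approximated in Python as follows. This is a sketch under assumptions: it uses `urllib.request` in place of curl, the 500-character threshold for "incomplete" is invented for illustration, and the function names are hypothetical. The Playwright calls come from its synchronous Python API.

```python
import re
import urllib.request


def visible_text_length(html: str) -> int:
    """Rough count of visible characters after dropping scripts, styles, and tags."""
    html = re.sub(r"(?is)<(script|style)\b.*?</\1>", "", html)
    text = re.sub(r"(?s)<[^>]+>", " ", html)
    return len(" ".join(text.split()))


def fetch_static(url: str, timeout: float = 30.0) -> str:
    """Plain HTTP fetch, standing in for the curl attempt."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def fetch_rendered(url: str) -> str:
    """Render the page in headless Chromium via Playwright's sync API."""
    from playwright.sync_api import sync_playwright  # lazy: optional dependency

    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            page = browser.new_page()
            page.goto(url, wait_until="networkidle")
            return page.content()
        finally:
            browser.close()


def fetch_page(url: str, min_chars: int = 500) -> str:
    """Try the static fetch first and fall back to rendering when it looks empty."""
    html = fetch_static(url)
    if visible_text_length(html) < min_chars:  # assumed "incomplete" threshold
        html = fetch_rendered(url)
    return html
```

A page whose body is only an empty root element, as is typical for single-page apps, has a visible text length of 0 and therefore takes the Playwright path.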

Step 3: Translation with Checkpointing

  1. Create paragraph-level translation plan with TodoWrite
  2. Before translating each section: Update checkpoint file with current progress
  3. Translate paragraph by paragraph to Korean
  4. After completing each section: Save checkpoint with completed sections list
  5. Download images to local storage
  6. Preserve table structure in markdown, translate cell content only
  7. Run large file downloads in background

Checkpoint file format (.translation-checkpoint-{filename}.json):

{
  "source_url": "https://example.com/article",
  "source_type": "web",
  "output_file": "Article Title.md",
  "total_sections": 10,
  "completed_sections": [1, 2, 3],
  "current_section": 4,
  "last_updated": "2024-01-15T10:30:00",
  "partial_content": "... translated content so far ..."
}
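
One way to keep this file up to date after each section is sketched below. `save_checkpoint` is a hypothetical helper, and writing through a temporary file is a design choice added here rather than something the skill prescribes; it means an interrupted write does not damage the previous checkpoint.

```python
import json
import os
from datetime import datetime
from pathlib import Path


def save_checkpoint(path: Path, state: dict, finished_section: int, translated: str) -> dict:
    """Record a finished section and write the checkpoint via a temporary file."""
    completed = sorted(set(state.get("completed_sections", [])) | {finished_section})
    updated = {
        **state,
        "completed_sections": completed,
        "current_section": finished_section + 1,
        "last_updated": datetime.now().isoformat(timespec="seconds"),
        "partial_content": state.get("partial_content", "") + translated,
    }
    tmp = path.with_name(path.name + ".tmp")
    # ensure_ascii=False keeps the Korean text readable in the file
    tmp.write_text(json.dumps(updated, ensure_ascii=False, indent=2), encoding="utf-8")
    os.replace(tmp, path)  # swap the new file in only after it is fully written
    return updated
```

The returned dictionary is the new state, so the caller can pass it straight into the next call when the following section is finished.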

Step 4: Review & Correction

  1. Compare each paragraph with original
  2. Identify and fix awkward translations
  3. Check for missing content
  4. Save final markdown file
  5. Delete checkpoint file after successful completion
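
Part of the missing-content check in item 3 can be automated by comparing structural elements that translation should leave unchanged. The sketch below assumes the original has already been converted to markdown; the function names and the choice of elements to count are illustrative only, and the check does not replace reading each paragraph against the original.

```python
import re

FENCE = "`" * 3  # markdown code-fence marker


def structure_counts(markdown: str) -> dict:
    """Count structural elements that translation should leave unchanged."""
    lines = markdown.splitlines()
    return {
        "headings": sum(1 for ln in lines if re.match(r"#{1,6} ", ln)),
        "images": len(re.findall(r"!\[[^\]]*\]\([^)]*\)", markdown)),
        "table_rows": sum(1 for ln in lines if ln.lstrip().startswith("|")),
        "code_fences": sum(1 for ln in lines if ln.lstrip().startswith(FENCE)),
    }


def missing_structure(original: str, translated: str) -> dict:
    """Return {element: (original_count, translated_count)} for every mismatch."""
    before, after = structure_counts(original), structure_counts(translated)
    return {k: (before[k], after[k]) for k in before if before[k] != after[k]}
```

An empty result means the counts agree; an entry such as `{"images": (1, 0)}` points at an image that was dropped during translation.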

Output Format

Filename Rule

  • Use the original title (remove special characters)
  • Example: Understanding React Hooks.md
  • A Korean title may be used only when the original title is in Korean: 리액트 훅 이해하기.md
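
A possible implementation of the filename rule is sketched below. The skill does not say which characters count as "special", so this version assumes everything except letters, digits, underscores, spaces, and hyphens is removed; `title_to_filename` and the `untitled` fallback are hypothetical.

```python
import re
import unicodedata


def title_to_filename(title: str) -> str:
    """Turn an article title into '<title>.md', dropping special characters."""
    title = unicodedata.normalize("NFC", title)
    # \w keeps letters, digits, and underscore in any script (including Hangul)
    cleaned = re.sub(r"[^\w\s-]", "", title)
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    return f"{cleaned or 'untitled'}.md"
```

For instance, `title_to_filename("What's New: React 19?")` gives `Whats New React 19.md`, while a Korean title passes through unchanged apart from the added extension.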

Markdown Structure

# 번역된 제목

> 원문: [원문 제목](원문 URL)

## 섹션 1

본문 내용...

![이미지 설명](./images/image1.png)

| 컬럼1 | 컬럼2 |
|------|------|
| 내용1 | 내용2 |

## 섹션 2

...

Best Practices

  1. Paragraph-level translation: Never translate entire document at once. Process paragraph by paragraph for quality control
  2. Technical terms: Include the original term in parentheses when needed (e.g. "상태 관리(State Management)")
  3. Image alt text: Translate image descriptions for accessibility
  4. Table structure: Preserve markdown table format, translate cell content only
  5. Code blocks: Never translate code. Only translate comments if necessary
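
Practices 4 and 5 can be illustrated with a small pass over the markdown that leaves fenced code and table separators untouched and translates only table cell text and prose lines. This is a sketch, not the skill's method: `translate` is a hypothetical callable standing in for the actual translation step, the pass works line by line for simplicity, and escaped pipes inside table cells are not handled.

```python
import re
from typing import Callable

FENCE = "`" * 3  # markdown code-fence marker
SEPARATOR_ROW = re.compile(r"^\s*\|?\s*:?-{3,}:?\s*(\|\s*:?-{3,}:?\s*)*\|?\s*$")


def translate_table_row(line: str, translate: Callable[[str], str]) -> str:
    """Translate the cells of one table row, keeping the pipes in place."""
    cells = line.strip().strip("|").split("|")
    return "| " + " | ".join(translate(c.strip()) if c.strip() else "" for c in cells) + " |"


def translate_markdown(text: str, translate: Callable[[str], str]) -> str:
    """Translate prose and table cells; pass code blocks through unchanged."""
    out, in_code = [], False
    for line in text.splitlines():
        if line.lstrip().startswith(FENCE):
            in_code = not in_code  # fence lines pass through untouched
            out.append(line)
        elif in_code or not line.strip() or SEPARATOR_ROW.match(line):
            out.append(line)  # code, blank lines, and |---|---| separator rows
        elif line.lstrip().startswith("|"):
            out.append(translate_table_row(line, translate))
        else:
            out.append(translate(line))
    return "\n".join(out)
```

Passing `str.upper` as `translate` is a quick way to see which parts of a document would be touched: prose and cell text change, while code and table layout stay as they were.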

Common Issues

| Issue | Cause | Solution |
|------|------|------|
| PDF "Too large" error | Attempted to read large PDF with Read tool | Use pdfplumber script instead (scripts.md) |
| pdfplumber not working | Not installed | pip3 install pdfplumber |
| Empty web page content | JS-rendered page | Use Playwright script |
| Broken images | Relative path issue | Convert to absolute path or download locally |
| Awkward translation | Literal translation | Consider context, paraphrase in review step |
| Playwright not working | chromium not installed | playwright install chromium |
| Translation interrupted | Session timeout or error | Resume from checkpoint file (Step 0) |
| Duplicate translation work | Didn't check for checkpoint | Always run Step 0 first to check existing progress |
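
For the broken-images row, converting relative image paths to absolute URLs could be done as in the sketch below. `absolutize_images` is a hypothetical helper; it assumes standard markdown image syntax and relies on `urllib.parse.urljoin` from the standard library.

```python
import re
from urllib.parse import urljoin

# Captures the "![alt](" prefix and the image path that follows it
IMAGE = re.compile(r"(!\[[^\]]*\]\()([^)\s]+)")


def absolutize_images(markdown: str, page_url: str) -> str:
    """Rewrite relative image paths as absolute URLs based on the page URL."""
    return IMAGE.sub(lambda m: m.group(1) + urljoin(page_url, m.group(2)), markdown)
```

Image paths that are already absolute are left as they are, because `urljoin` returns an absolute second argument unchanged.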

Examples

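Example 1: Translating a web page (new)

Request (the Korean 번역해줘 means "please translate"):

https://example.com/react-hooks-guide 번역해줘

Processing flow:

  1. Check for a checkpoint file → none found, start fresh
  2. Try downloading the page with curl
  3. Verify the content, then create a paragraph-level TodoWrite plan
  4. Translate each section (save a checkpoint after each completed section)
  5. Create the file React Hooks Guide.md
  6. Compare against the original and review, then save the final version
  7. Delete the checkpoint file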

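Example 2: Resuming a translation (from a checkpoint)

Request (the Korean 번역해줘 means "please translate"):

https://example.com/react-hooks-guide 번역해줘

Processing flow:

  1. Check for a checkpoint file → .translation-checkpoint-react-hooks-guide.json found
  2. Read the checkpoint: sections 1-3 completed, current section 4
  3. Resume translation from section 4
  4. Translate the remaining sections (update the checkpoint after each completed section)
  5. Save the final file, then delete the checkpoint file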

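Example 3: Translating a PDF

Request (the Korean 한글로 번역해줘 means "please translate into Korean"):

/path/to/whitepaper.pdf 한글로 번역해줘

Processing flow:

  1. Check for a checkpoint file → none found, start fresh
  2. Check the PDF info (50 pages, 5 MB)
  3. Extract the full text with pdfplumber
  4. Create a paragraph-level TodoWrite plan (10 sections)
  5. Translate section by section (save a checkpoint after each completed section)
  6. Download images (in the background)
  7. Create the file Whitepaper.md
  8. Review and correct, then delete the checkpoint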

Code Reference

See scripts.md for the detailed scripts:

  • PDF extraction (pdfplumber): all pages or a page range
  • JS-rendered page extraction (Playwright)
  • Installation guide