Clawhub-skills Summarize Pro
7-mode intelligent summarizer with Chain-of-Density, 8 languages, and smart caching
git clone https://github.com/traygerbig/clawhub-skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/traygerbig/clawhub-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/archive/summarize-pro" ~/.claude/skills/traygerbig-clawhub-skills-summarize-pro && rm -rf "$T"
archive/summarize-pro/SKILL.md╭──────────────────────────────────────────╮ │ │ │ 📝 S U M M A R I Z E P R O 📝 │ │ │ │ 📄📄📄📄📄 │ │ ╲ ╲ │ ╱ ╱ │ │ ╲ ╲│╱ ╱ │ │ ▼ ▼ ▼ │ │ ┌──────────┐ │ │ │ ✨TL;DR │ "7 modes. │ │ │ ══════ │ 8 languages. │ │ │ ▪ ▪ ▪ │ Zero fluff." │ │ └──────────┘ │ ╰──────────────────────────────────────────╯
Summarize Pro
📝 7 Modes 🌍 8 Languages 🔗 Chain-of-Density 💾 Cached v1.0.0
Advanced multi-format summarization with 7 output modes, multi-language support, chain-of-density compression, and intelligent caching.
Author: hanabi-jpn | Version: 1.0.0 | License: MIT Tags:
summarization productivity multi-language content documents
Overview
Summarize Pro transforms any content — PDFs, web pages, YouTube videos, code files, slide decks — into exactly the format you need. Choose from 7 output modes, 8 languages, and 4 depth levels. Powered by Chain-of-Density compression for information-dense summaries that miss nothing important.
┌───────────────────────────────────────────┐ │ SUMMARIZE PRO PIPELINE │ │ │ │ ┌─────────┐ ┌──────────┐ ┌─────────┐ │ │ │ FETCH │─▶│ EXTRACT │─▶│ ANALYZE │ │ │ │ Source │ │ Text │ │ Content │ │ │ └─────────┘ └──────────┘ └─────────┘ │ │ │ │ │ ┌──────────────────────────┘ │ │ ▼ │ │ ┌──────────┐ ┌──────────┐ ┌─────────┐ │ │ │ CHAIN │─▶│ FORMAT │─▶│ OUTPUT │ │ │ │of Density│ │ 7 Modes │ │ Cache │ │ │ └──────────┘ └──────────┘ └─────────┘ │ └───────────────────────────────────────────┘
System Prompt Instructions
You are an agent equipped with Summarize Pro. When the user asks you to summarize content, follow these instructions:
Source Handling
Determine the source type and extract content:
-
URL (web page):
- Fetch the page content
- Extract main article text using readability heuristics (ignore nav, ads, footer)
- If JavaScript-heavy, note that content may be incomplete
-
URL (YouTube):
- Extract video ID from URL
- Fetch transcript via YouTube API or yt-dlp
- If no transcript available, note this limitation
- Include video title, channel, and duration in metadata
-
Local file (PDF):
- Read the PDF content
- If scanned/image-based, note OCR may be needed
- Extract text, tables, and figure captions
-
Local file (code):
- Read source code files
- Summarize: purpose, main functions/classes, dependencies, architecture
- Do NOT just echo the code — explain the logic flow
-
Local file (text/markdown/docs):
- Read and extract all text content
- Preserve headings and structure for analysis
-
Multiple sources:
- Process each source independently
- If
flag: generate comparison analysis--compare - Otherwise: generate unified cross-document synthesis
Chain-of-Density Compression
When
--chain-of-density is specified (or for deep/exhaustive depth):
Pass 1 — Entity Extraction:
- Generate a verbose summary capturing ALL named entities, key numbers, dates, and facts
- Length: ~500 words
Pass 2 — Density Compression:
- Rewrite the summary to be 50% shorter while maintaining ALL entities
- Replace verbose phrases with concise equivalents
- Merge related sentences
Pass 3 — Polish:
- Final readability pass
- Ensure logical flow between sentences
- Verify no key information was lost
- Length: ~150-200 words of information-dense content
7 Output Modes
(default):bullets
## Summary: [Title] - Key point 1 - Key point 2 - Sub-point with supporting detail - Key point 3 - Key point 4 - Key point 5 **Source:** [URL/filename] | **Length:** [word count] | **Processed:** [date]
:executive
## Executive Brief: [Title] ### Situation [1-2 sentences on context] ### Key Findings 1. [Finding with data point] 2. [Finding with data point] 3. [Finding with data point] ### Risk Factors - [Risk 1] - [Risk 2] ### Recommendations 1. [Action item] 2. [Action item] ### Bottom Line [1 sentence conclusion]
:academic
## Abstract: [Title] **Background:** [Context and prior work] **Methods:** [How information was gathered/analyzed] **Results:** [Key findings with data] **Conclusion:** [Implications and significance] **Keywords:** [5-8 relevant terms]
:mindmap
[Central Topic] ├── Branch 1: [Theme] │ ├── Detail A │ ├── Detail B │ └── Detail C ├── Branch 2: [Theme] │ ├── Detail D │ └── Detail E ├── Branch 3: [Theme] │ ├── Detail F │ ├── Detail G │ └── Detail H └── Branch 4: [Theme] ├── Detail I └── Detail J
:table
## Summary Table: [Title] | Aspect | Details | Importance | |--------|---------|------------| | [Key 1] | [Detail] | High/Med/Low | | [Key 2] | [Detail] | High/Med/Low | | [Key 3] | [Detail] | High/Med/Low |
:timeline
## Timeline: [Title] - **[Date/Time 1]** — [Event description] - **[Date/Time 2]** — [Event description] - **[Date/Time 3]** — [Event description] - **[Current/Latest]** — [Most recent development]
:qa
## Q&A Summary: [Title] **Q: [Anticipated question 1]?** A: [Answer based on content] **Q: [Anticipated question 2]?** A: [Answer based on content] **Q: [Anticipated question 3]?** A: [Answer based on content] **Q: What's the bottom line?** A: [Concise conclusion]
Depth Levels
: 1-2 sentence TL;DR. Max 50 words.shallow
(default): 3-5 key points. Max 200 words.standard
: Comprehensive with sections. Max 500 words. Uses Chain-of-Density.deep
: Complete breakdown. Max 1500 words. Preserves all significant details.exhaustive
Multi-Language Support
Detect source language automatically. Output in source language by default.
Override with
--lang <code>:
English,en
Japanese,ja
Chinese,zh
Koreanko
Spanish,es
French,fr
German,de
Portuguesept
When translating: maintain technical terms in original language with translation in parentheses.
Comparison Mode
When
--compare is used with multiple sources:
## Comparison: [Source 1] vs [Source 2] ### Agreement - [Points both sources agree on] ### Disagreement - [Point]: Source 1 says [X], Source 2 says [Y] ### Unique to Source 1 - [Points only in Source 1] ### Unique to Source 2 - [Points only in Source 2] ### Synthesis [Overall assessment considering both sources]
Intelligent Caching
- Store summaries in
.summarize-pro/cache/ - Key: SHA-256 hash of source content + mode + depth + language
- TTL: 7 days (configurable)
- Before re-summarizing, check cache first
— Show cached summaries with metadatasummarize cache list
— Delete all cached summariessummarize cache clear
Commands
— Summarize with defaults (bullets, standard depth, auto language)summarize <source>
> summarize https://example.com/blog/ai-trends-2026 📄 Fetching: https://example.com/blog/ai-trends-2026 Source: Web page | Language: English | Words: 2,847 ## Summary: AI Trends Shaping 2026 - Multi-modal AI models now handle text, image, audio, and video in a single pipeline - Enterprise adoption reached 78%, up from 55% in 2024 - Manufacturing and healthcare lead with 89% and 82% adoption respectively - Open-source models closed the gap with proprietary offerings, reaching 92% benchmark parity - AI regulation frameworks enacted in 43 countries, with EU AI Act enforcement beginning Q1 - Agentic AI workflows replaced 34% of traditional automation pipelines **Source:** example.com/blog/ai-trends-2026 | **Length:** 2,847 words | **Processed:** 2026-03-01
— Choose output format:summarize <source> --mode <mode>
,bullets
,executive
,academic
,mindmap
,table
,timelineqa
> summarize quarterly-report.pdf --mode executive 📄 Reading: quarterly-report.pdf Source: PDF (12 pages) | Language: English | Words: 4,521 ## Executive Brief: Q4 2025 Quarterly Report ### Situation Company revenue grew 23% YoY to $45.2M in Q4, driven primarily by enterprise SaaS expansion in APAC markets. ### Key Findings 1. ARR reached $180M (+31% YoY), exceeding $170M target 2. Customer churn decreased to 4.2% (from 6.1% in Q3) 3. APAC revenue grew 67%, now representing 28% of total revenue ### Risk Factors - Rising infrastructure costs (+18%) eroding margins - Two enterprise deals ($2.1M combined) slipped to Q1 ### Recommendations 1. Accelerate APAC hiring to sustain growth trajectory 2. Renegotiate cloud provider contracts before Q2 renewal ### Bottom Line Strong quarter with all KPIs met except gross margin, which needs attention as infrastructure costs scale. **Source:** quarterly-report.pdf | **Length:** 4,521 words | **Processed:** 2026-03-01
— Choose detail level:summarize <source> --depth <level>
,shallow
,standard
,deepexhaustive
> summarize research-paper.pdf --depth shallow 📄 Reading: research-paper.pdf Source: PDF (28 pages) | Language: English | Words: 8,934 **TL;DR:** Researchers demonstrate a novel attention mechanism that reduces transformer inference latency by 40% while maintaining 99.2% accuracy on standard benchmarks, with significant implications for edge deployment. **Source:** research-paper.pdf | **Length:** 8,934 words | **Depth:** shallow | **Processed:** 2026-03-01
— Output languagesummarize <source> --lang <code>
> summarize https://example.com/press-release --lang ja 📄 Fetching: https://example.com/press-release Source: Web page | Language: English → Japanese | Words: 1,203 ## 要約: 新製品発表プレスリリース - AIアシスタント「Nova」の一般提供を2026年4月に開始 - 12言語に対応、enterprise(エンタープライズ)プラン月額$49から - Beta版で500社が導入済み、顧客満足度92%を記録 - API連携でSalesforce、Slack、Microsoft Teamsと統合可能 - SOC 2 Type IIおよびGDPR準拠のsecurity(セキュリティ)認証を取得 **ソース:** example.com/press-release | **文字数:** 1,203語 | **処理日:** 2026-03-01
— Force CoD compressionsummarize <source> --chain-of-density
> summarize long-report.pdf --chain-of-density 📄 Reading: long-report.pdf Source: PDF (45 pages) | Language: English | Words: 15,230 🔗 Chain-of-Density Compression: Pass 1 — Entity extraction (487 words, 34 entities captured) Pass 2 — Density compression (231 words, 34/34 entities retained) Pass 3 — Polish & readability (198 words, final) ## Summary: Global Supply Chain Resilience Report 2025 - Supply chain disruptions decreased 28% YoY (ITC data), though semiconductor lead times remain at 18 weeks (vs. pre-2020 baseline of 12 weeks). The $2.1T reshoring investment across US, EU, and Japan created 340K manufacturing jobs. ASEAN emerged as the primary "China+1" destination with 43% of redirected FDI, led by Vietnam ($28B) and Indonesia ($19B). AI-driven demand forecasting reduced inventory waste by 31% among early adopters (McKinsey, n=850 firms). Climate events caused $67B in logistics losses, prompting 62% of Fortune 500 companies to add climate risk to supplier evaluation criteria. The WTO projects 3.2% trade volume growth in 2026, contingent on no further geopolitical escalation. **Source:** long-report.pdf | **Length:** 15,230 words | **CoD:** 3-pass | **Processed:** 2026-03-01
— Comparison modesummarize <src1> <src2> --compare
> summarize analyst-report-a.pdf analyst-report-b.pdf --compare 📄 Reading: analyst-report-a.pdf (Goldman Sachs) 📄 Reading: analyst-report-b.pdf (Morgan Stanley) ## Comparison: Goldman Sachs vs Morgan Stanley — AI Market Outlook ### Agreement - Both project AI market to exceed $900B by 2028 - Both identify healthcare and finance as top adoption sectors - Both expect open-source models to capture 35-40% market share ### Disagreement - **Growth rate**: GS projects 34% CAGR, MS projects 28% CAGR - **Regulation impact**: GS sees minimal drag (−2%), MS sees significant drag (−8%) - **China market**: GS bullish ($180B by 2028), MS cautious ($120B) ### Unique to Goldman Sachs - Detailed analysis of AI infrastructure spending ($200B capex forecast) - Semiconductor supply chain risk assessment ### Unique to Morgan Stanley - Consumer AI adoption survey data (n=12,000) - AI talent shortage projections by region ### Synthesis Both reports agree on strong AI market growth but diverge on pace and risk factors. Goldman Sachs takes a more optimistic view driven by capex data, while Morgan Stanley emphasizes regulatory and talent headwinds. **Sources:** analyst-report-a.pdf, analyst-report-b.pdf | **Processed:** 2026-03-01
— Show cached summariessummarize cache list
> summarize cache list 📦 Cached Summaries (7 entries) | # | Source | Mode | Depth | Lang | Cached | Expires | Size | |---|--------|------|-------|------|--------|---------|------| | 1 | ai-trends-2026 (web) | bullets | standard | en | Mar 01 | Mar 08 | 1.2 KB | | 2 | quarterly-report.pdf | executive | standard | en | Mar 01 | Mar 08 | 2.1 KB | | 3 | research-paper.pdf | bullets | shallow | en | Feb 28 | Mar 07 | 0.4 KB | | 4 | long-report.pdf | bullets | deep | en | Feb 27 | Mar 06 | 3.8 KB | | 5 | press-release (web) | bullets | standard | ja | Feb 27 | Mar 06 | 1.5 KB | | 6 | analyst-a.pdf | bullets | standard | en | Feb 25 | Mar 04 | 1.8 KB | | 7 | analyst-b.pdf | bullets | standard | en | Feb 25 | Mar 04 | 1.9 KB | Total: 7 entries | 12.7 KB | Oldest expires: Mar 04
— Clear cachesummarize cache clear
> summarize cache clear 🗑 Clearing summary cache... Deleted: 7 cached summaries Freed: 12.7 KB Cache directory: .summarize-pro/cache/ ✅ Cache cleared. Next summarization will generate fresh results.
Quality Standards
Every summary MUST:
- Preserve all factual claims, numbers, dates, and names
- Never introduce information not present in the source
- Include source attribution and processing metadata
- Be grammatically correct in the output language
- Follow the exact template for the selected mode
Why Summarize Pro vs Summarize?
| Feature | Summarize | Summarize Pro |
|---|---|---|
| Output modes | 1 (plain text) | 7 (bullets, exec, academic, mindmap, table, timeline, qa) |
| Languages | English only | 8 languages with auto-detection |
| Compression | Basic truncation | Chain-of-Density (3-pass) |
| Depth control | Length parameter | 4 semantic depth levels |
| Comparison | No | Yes (multi-source) |
| Caching | No | SHA-256 content-hash caching |
| Code files | No | Yes (logic flow summarization) |
Comparison with Alternatives
| Feature | ChatGPT Summarize | TLDR This | Wordtune Read | Summarize Pro |
|---|---|---|---|---|
| Output modes | 1 (plain text) | 1 (bullets) | 1 (bullets) | 7 modes (bullets/exec/academic/mindmap/table/timeline/qa) |
| Languages | Multi (no control) | English only | English only | 8 languages with auto-detect + override |
| Compression method | Single-pass | Extractive | Extractive | Chain-of-Density (3-pass, 40-60% more entities) |
| Depth control | Manual prompt | Short/long | None | 4 semantic levels (shallow/standard/deep/exhaustive) |
| Multi-source comparison | No | No | No | Yes (side-by-side analysis) |
| Caching | No | Server-side | Server-side | Local SHA-256 content-hash cache |
| Source types | Text/URL | URL only | URL only | PDF, URL, YouTube, code, text, markdown |
| Code summarization | Generic | No | No | Logic flow analysis |
| Privacy | Cloud-processed | Cloud-processed | Cloud-processed | Local LLM (no external service) |
| Cost | $20/mo (ChatGPT+) | Free / $3.99/mo | $9.99/mo | Free (uses your existing LLM) |
| Offline support | No | No | No | Yes (local files) |
| Template enforcement | No | No | No | Strict mode templates |
FAQ
Q: How much does each summarization cost in tokens? A: Token usage depends on the depth level and source size.
shallow summaries use 100-300 tokens of output. standard uses 300-600 tokens. deep with Chain-of-Density uses 800-1500 tokens (3 passes). exhaustive can use 2000-4000 tokens. Input tokens depend entirely on the source content length. Cached summaries cost zero tokens on subsequent requests.
Q: Does Chain-of-Density slow down summarization? A: Yes, moderately. CoD runs 3 sequential passes (extract, compress, polish), taking roughly 3x longer than a single-pass summary. For
standard depth, CoD is not used. It activates automatically for deep and exhaustive depths, or when explicitly requested with --chain-of-density. The quality improvement is significant — summaries retain 40-60% more key entities.
Q: Can I summarize private or sensitive documents? A: Yes. All processing happens within your current LLM session. Cached summaries are stored locally in
.summarize-pro/cache/ on your filesystem. No content is sent to external services beyond your configured LLM provider. You can disable caching entirely by passing --no-cache to prevent any local storage of sensitive content.
Q: What are the limitations on source size? A: The main limitation is your LLM's context window. For very large documents (100+ pages), the skill automatically chunks the content and summarizes in segments before creating a unified summary. YouTube transcripts are limited by the availability of captions. JavaScript-heavy web pages may return incomplete content since the fetcher does not execute client-side scripts.
Q: Can I use Summarize Pro with other skills? A: Yes. Common combinations include: Summarize Pro + Self-Learning Agent (summarize error logs for pattern detection), Summarize Pro + Humanize AI Pro (summarize then humanize for natural-sounding briefs), and Summarize Pro + Nano Banana Ultra (summarize content then generate visual summaries). Each skill operates independently on the output of the other.
Q: Does it support offline summarization? A: The summarization engine itself runs through your LLM, so it works wherever your LLM works. URL fetching (web pages, YouTube) requires network access. Local file summarization (PDFs, code, text) works fully offline. Cached summaries are served from local disk without any network dependency.
Q: How does the caching system work? A: Caching uses a SHA-256 hash of the source content combined with the mode, depth, and language settings as the cache key. If the same content is summarized with the same settings, the cached result is returned instantly. Cache entries expire after 7 days by default (configurable in
.summarize-pro/config.json). Use summarize cache list to inspect cached entries and summarize cache clear to purge.
Q: Can I customize the output templates? A: The 7 built-in modes (bullets, executive, academic, mindmap, table, timeline, qa) use fixed templates to ensure consistent formatting. You cannot add custom templates directly, but you can post-process any mode's output. For specialized formats, use
executive or qa mode as a starting point and modify the output in a follow-up instruction.
Q: How does multi-language summarization handle technical terms? A: When translating, the skill maintains technical terms in their original language with a translation in parentheses on first occurrence. For example, summarizing an English paper into Japanese would render "machine learning" as "machine learning(機械学習)" on first use, then use the translated term thereafter. This preserves searchability and precision.
Q: What happens if the source URL is behind a paywall or requires authentication? A: The skill will fetch whatever content is publicly accessible at the URL. If the page returns a login wall or paywall content, the summary will be based on whatever text is available (often just the headline and preview). The skill will note this limitation in the output metadata. For paywalled content, save the article locally and summarize the local file instead.
Error Handling
| Error | Cause | Agent Action |
|---|---|---|
| URL fetch fails (404/500) | Target web page is unavailable or URL is invalid | Report the HTTP status code and URL. Suggest verifying the URL or trying again later. Do not attempt to summarize partial error page content. |
| YouTube transcript unavailable | Video has no captions or transcript is disabled by creator | Report that no transcript is available. Suggest checking if the video has auto-generated captions enabled, or provide the video URL for manual transcript extraction. |
| PDF read error | Corrupted PDF, password-protected, or scanned image-only PDF | Report the specific error (corruption, password, or image-based). For password-protected PDFs, ask user for the password. For image-based PDFs, note that OCR is required and suggest an OCR tool. |
| Context window exceeded | Source content is too large for a single LLM pass | Automatically chunk the content into segments that fit within the context window. Summarize each chunk independently, then create a unified meta-summary. Log the chunking in output metadata. |
| Cache read failure | Corrupted cache file or incompatible cache format | Delete the corrupted cache entry, log the event, and re-generate the summary from source. Cache corruption does not block summarization. |
| Unsupported language code | User specified a code not in the supported 8 languages | Report the unsupported code and list all supported language codes (en, ja, zh, ko, es, fr, de, pt). Do not attempt to summarize in an unsupported language. |
| Comparison mode with single source | User used but provided only one source | Report that comparison mode requires 2 or more sources. Fall back to standard summarization of the single source and inform the user. |
| Chain-of-Density pass failure | One of the 3 CoD passes produces empty or degraded output | Retry the failed pass once. If it fails again, fall back to single-pass summarization and note the degradation in output. Never return an empty summary. |
| File encoding error | Source file uses an encoding other than UTF-8 | Attempt to detect encoding (UTF-16, Shift-JIS, ISO-8859-1). If detection succeeds, convert and proceed. If detection fails, report the encoding issue and suggest converting the file to UTF-8. |
| Rate limit on URL fetching | Too many URL fetches in a short period | Wait 5 seconds and retry once. If still rate-limited, report the issue and suggest fetching the URL manually or providing the content as a local file. |