capsulate
Turn Gmail into structured dashboards — extract and track subscriptions, packages, invoices, job applications, and more.
```sh
git clone --depth=1 https://github.com/imekenni/capsulate-skill ~/.claude/skills/imekenni-capsulate-skill-capsulate
```

Capsulate — Email Intelligence Skill
Transform your inbox into structured, queryable dashboards using AI. No backend required — everything runs locally.
What This Does
- You describe what you want to track from your emails
- Claude generates an extraction schema and Gmail search query
- Your emails are searched, filtered, and deduplicated
- Subagents extract structured data from batches of emails in parallel, saving after each round
- A self-contained HTML dashboard is generated and opened in your browser
Prerequisites Check
Before starting, verify the user has the required tools. Run these silently:
```sh
which gws
gws auth status
```
If `gws` is not installed, tell the user to install the Google Workspace CLI:

```sh
npm install -g @googleworkspace/cli
```

Then authenticate:

```sh
gws auth login -s gmail
```

Then run this skill again.
If `gws` is installed but the auth check fails for any reason, show the user the actual error output and tell them: Run `gws auth login -s gmail` to connect your Gmail account, then try again.
Re-running an Existing Canvas
If the user runs `/capsulate` and canvases already exist at `~/.capsulate/`, list them and ask:

You have existing canvases:
- subscription-trials (47 items, last updated Mar 28)
- package-tracking (12 items, last updated Mar 25)

What would you like to do?
- Refresh an existing canvas (fetch only new emails)
- Open an existing canvas dashboard
- Update schema for an existing canvas (re-extract all items with new fields)
- Create a new canvas
Option: Open
Run `open ~/.capsulate/<CANVAS_NAME>.html` and exit.
Option: Refresh
Skip Step 1. Load the saved canvas JSON to get the existing query, schema, known IDs, and `last_updated` timestamp. Proceed to Step 2a in incremental mode.
Option: Update schema
Ask the user what fields to add, remove, or change. Show the current schema and the proposed new schema for confirmation. Then re-run Steps 2a–4 against all emails (not incremental), replacing all existing items. Merge by `_id`: overwrite existing records, and keep any records that weren't in the new result set.
Option: Create
Proceed from Step 1 as normal.
Step 1 — Define the Canvas
Ask the user two questions:
What do you want to track from your emails? Examples: "subscription trials and renewals", "package deliveries", "invoices and receipts", "job applications", "event invitations"
How far back should I look? Options: last 30 days, last 3 months, last 6 months, last year, or all time
Convert their answer to an `after:YYYY/MM/DD` date string based on today's date. Treat "all time" as no date filter. Store this as the canvas's time range.
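The time-range conversion above can be sketched as a small helper. This is illustrative only — the day counts chosen for "last 3 months" and "last 6 months" (90 and 180 days) are assumptions, not something the skill specifies:

```python
from datetime import date, timedelta

# Assumed day counts for each named range; "all time" means no filter.
RANGES = {
    "last 30 days": 30,
    "last 3 months": 90,
    "last 6 months": 180,
    "last year": 365,
}

def time_range_filter(choice: str, today: date) -> str:
    """Return an 'after:YYYY/MM/DD' clause for Gmail search, or '' for all time."""
    days = RANGES.get(choice)
    if days is None:  # "all time" -> omit the date filter entirely
        return ""
    start = today - timedelta(days=days)
    return f"after:{start.strftime('%Y/%m/%d')}"
```

Note that Gmail's `after:` operator uses slashes (`YYYY/MM/DD`), not ISO hyphens.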
Based on their answers, generate three things and confirm with the user before proceeding:
1a. Canvas Name
A short slug, e.g. `subscription-trials`, `package-tracking`, `invoices`.
Sanitize the slug: lowercase, replace spaces and special characters with hyphens, and strip anything that is not `a-z`, `0-9`, or `-`. Example: "Q1 invoices / March" → `q1-invoices-march`.
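The sanitization rule above can be sketched in a few lines (a minimal version; collapsing runs of disallowed characters into a single hyphen is an assumption consistent with the example):

```python
import re

def sanitize_slug(name: str) -> str:
    """Lowercase, turn runs of non-alphanumerics into one hyphen, trim edges."""
    slug = name.lower()
    slug = re.sub(r"[^a-z0-9]+", "-", slug)  # spaces, slashes, etc. -> "-"
    return slug.strip("-")
```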
1b. Gmail Search Query
A Gmail search string that will find the most relevant emails. Be specific to avoid noise. Examples:
- Subscriptions: `subject:(trial OR subscription OR renewal OR "free trial") -category:promotions`
- Packages: `subject:(shipped OR delivered OR tracking OR "out for delivery") -category:promotions`
- Invoices: `subject:(invoice OR receipt OR "order confirmation" OR payment) -category:promotions`
- Job apps: `subject:(application OR interview OR "thank you for applying" OR offer) -category:promotions`
1c. Extraction Schema
A JSON object defining what to extract from each email. Keep fields concise and typed. Example for subscriptions:
```json
{
  "company": "string — name of the company or service",
  "plan": "string — plan or tier name if mentioned",
  "trial_end_date": "date (YYYY-MM-DD) — when trial ends, null if not found",
  "amount": "string — price or amount, include currency symbol",
  "status": "enum: trial | active | cancelled | expired | unknown",
  "action_required": "boolean — does this email require the user to take action",
  "notes": "string — any other relevant detail in one sentence, null if nothing notable"
}
```
Show the canvas name, query, time range, and schema to the user and ask: "Does this look right? I'll search your Gmail and extract data based on this."
Step 2a — Search Gmail
Once confirmed, build the query and search Gmail for matching emails.
Promotions hint: If the canvas type is unlikely to involve promotional emails (e.g. package tracking, invoices, job applications), append `-category:promotions` to exclude Gmail's Promotions tab and reduce noise. Skip this for canvases explicitly about deals, offers, or subscriptions, where promotional emails are relevant.
Incremental mode (refreshing an existing canvas): use the canvas's `last_updated` timestamp as the date filter so only new emails are fetched.

```sh
gws gmail +triage --query "<GENERATED_QUERY> after:YYYY/MM/DD" --format json --max 500
```

Full mode (new canvas or schema update): apply the time range chosen in Step 1. If the user chose "all time", omit the `after:` filter.

```sh
gws gmail +triage --query "<GENERATED_QUERY> after:YYYY/MM/DD" --format json --max 500
```
This returns a JSON object with a `messages` array, where each item has `id`, `from`, `subject`, and `date`.
- If 0 results (incremental): tell the user "No new emails since last refresh." Re-open the dashboard and exit.
- If 0 results (full): tell the user and suggest either a broader query or a wider time range. Offer to try again.
- If the result contains 500 items, warn the user that there may be more emails within the selected time range that were not included, and offer to re-run with `--max 1000`.
Do not proceed to extraction. Always run Step 2b next to filter and deduplicate the results before spawning any subagents.
Step 2b — Filter & Deduplicate
Before spawning any subagents, reduce the message list in two passes:
Deduplicate: If refreshing, load the existing canvas JSON and collect all known `_id` values. Remove any messages already in the saved canvas. Tell the user: "Found X new emails to process (Y already extracted, skipping)."
Pre-filter irrelevant emails: Scan the triage metadata (subject and from — no body fetch needed) and discard clearly irrelevant messages. Apply in order:
- Drop messages whose subject matches generic noise: `Re:`, `Fwd:`, `[Automated]`, `Out of Office`, `Delivery Status Notification`, `Undelivered Mail`
- Drop messages from bulk-mail senders (`noreply@`, `mailer-daemon@`, `bounce@`, `postmaster@`, `no-reply@`) unless the sender domain matches a brand relevant to the canvas (e.g. a shipping carrier for package tracking)
- Apply a canvas-type relevance heuristic to the subject — if the subject clearly cannot contain data matching the schema (e.g. "Weekly digest" for a package-tracking canvas), drop it
Tell the user: "Pre-filtered X of Y emails as irrelevant (Z remaining to extract)."
Extract the `id` field from each remaining message to build the final list of IDs to process.
Do not proceed to Step 3 until both deduplication and pre-filtering are complete and the final ID list is ready.
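The two filtering passes above can be sketched as one function. The regex and the exact sender prefixes mirror the rules listed above; the `allowed_domains` parameter and the function name are illustrative assumptions:

```python
import re

# Noise patterns from the pre-filter rules above.
NOISE_SUBJECT = re.compile(
    r"^(Re:|Fwd:)|\[Automated\]|Out of Office|"
    r"Delivery Status Notification|Undelivered Mail",
    re.IGNORECASE,
)
BULK_SENDERS = ("noreply@", "no-reply@", "mailer-daemon@", "bounce@", "postmaster@")

def filter_messages(messages, known_ids, allowed_domains=()):
    """Drop already-extracted IDs, noise subjects, and irrelevant bulk senders."""
    kept = []
    for m in messages:
        if m["id"] in known_ids:
            continue  # already in the saved canvas (dedup pass)
        if NOISE_SUBJECT.search(m["subject"]):
            continue  # generic noise subject
        sender = m["from"].lower()
        if any(b in sender for b in BULK_SENDERS) and not any(
            d in sender for d in allowed_domains
        ):
            continue  # bulk mailer with no canvas-relevant brand domain
        kept.append(m)
    return kept
```

The canvas-type relevance heuristic (the third rule) is left to model judgment and is not encoded here.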
Step 3 — Extract Data (Parallel Subagents)
Ensure the data directory exists before the first round:
```sh
mkdir -p ~/.capsulate
```
Group the email IDs into batches of 5 emails per subagent, running 4 subagents concurrently (20 emails per round). Pause 2 seconds between rounds to respect rate limits.
When building each batch, include the `subject` and `from` metadata from the triage results alongside each ID — the subagent needs these to decide whether to skip an email without fetching its body.
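The batching scheme above (5 emails per subagent, 4 subagents per round) can be sketched as a pure planning function; the name `plan_rounds` is illustrative:

```python
BATCH_SIZE = 5    # emails per subagent
CONCURRENCY = 4   # subagents per round (20 emails per round)

def plan_rounds(ids):
    """Split IDs into batches of 5, then group batches into rounds of 4."""
    batches = [ids[i:i + BATCH_SIZE] for i in range(0, len(ids), BATCH_SIZE)]
    return [batches[i:i + CONCURRENCY] for i in range(0, len(batches), CONCURRENCY)]
```

The 2-second pause between rounds happens at execution time, between the entries this function returns.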
After each round completes:
- Collect the subagent results, flatten into a list, discard `_skip: true` items (count them), and discard JSON parse failures (count them)
- Save immediately to `~/.capsulate/<CANVAS_NAME>.json`: merge new items with any existing items, deduplicate by `_id` with new records overwriting old ones, and update `last_updated` and `total`. Use this format:

```json
{
  "canvas": "<CANVAS_NAME>",
  "description": "<USER'S ORIGINAL DESCRIPTION>",
  "query": "<GMAIL_QUERY>",
  "time_range": "<e.g. last 30 days | last 3 months | all time>",
  "schema": "<SCHEMA_JSON>",
  "last_updated": "<ISO_8601_TIMESTAMP>",
  "total": <NUMBER_OF_ITEMS>,
  "items": [ ...all extracted items so far... ]
}
```

- Report progress to the user: "Round X/Y complete — processed emails Z–W. Saved N items so far (S skipped as irrelevant, F failed)."
Each subagent receives this prompt:
Subagent prompt:
You are extracting structured data from emails for a Capsulate canvas.
Canvas: <CANVAS_NAME>
Schema: <SCHEMA_JSON>
Your task: For each email below (given as `id · subject · from`), do the following:
- First, decide from the subject and sender alone whether this email is likely to contain data matching the schema for a `<CANVAS_NAME>` canvas. If it clearly cannot (e.g. a digest, a reply chain, an unsubscribe confirmation), return `{"_id": "<ID>", "_skip": true}` immediately — do NOT fetch the body.
- If potentially relevant, fetch the email: `gws gmail +read --id <EMAIL_ID> --format json --headers`
gws gmail +read --id <EMAIL_ID> --format json --headers - Read the full email body (decode if needed)
- Extract ONLY the fields defined in the schema
- Use `null` for fields not found in the email — do not invent or guess data
- Include these metadata fields in each result:
  - `_id`: the email ID
  - `_subject`: the email subject
  - `_from`: the sender email address
  - `_date`: the email date in ISO 8601 format
Emails to process (format: `id · subject · from`):
<EMAIL_ID_1> · <SUBJECT_1> · <FROM_1>
<EMAIL_ID_2> · <SUBJECT_2> · <FROM_2>
<EMAIL_ID_3> · <SUBJECT_3> · <FROM_3>
<EMAIL_ID_4> · <SUBJECT_4> · <FROM_4>
<EMAIL_ID_5> · <SUBJECT_5> · <FROM_5>
Return ONLY a valid JSON array of objects, one per email, in the same order as the IDs above. No explanation, no markdown — just the JSON array. Skipped emails return `{"_id": "...", "_skip": true}`. If an email cannot be read, include the item with all schema fields as `null` and add `"_error": "Could not read email body"`.
Collect all subagent responses for the round. Each response should be a JSON array — flatten all arrays into a single list. Discard items with `_skip: true` (irrelevant, body not fetched). Discard items that fail JSON parsing, but keep a count. After all rounds complete, tell the user: "Done — extracted data from X of Y emails (S skipped as irrelevant, Z failed)." If more than 20% failed (excluding skips), warn the user and suggest re-running.
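The collection step above can be sketched as follows. This is a simplified sketch: it counts one failure per unparseable response, whereas a response that parses but contains malformed items would need finer-grained handling:

```python
import json

def collect_round(responses):
    """Flatten subagent JSON arrays; count skipped items and parse failures."""
    items, skipped, failed = [], 0, 0
    for raw in responses:
        try:
            batch = json.loads(raw)
        except json.JSONDecodeError:
            failed += 1  # whole response was not valid JSON
            continue
        for item in batch:
            if item.get("_skip"):
                skipped += 1  # subagent judged it irrelevant, body never fetched
            else:
                items.append(item)
    return items, skipped, failed
```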
Step 4 — Generate Dashboard
Generate a self-contained HTML file at `~/.capsulate/<CANVAS_NAME>.html`.
Design Requirements
- Single HTML file, no external dependencies (all CSS and JS inline)
- Clean, minimal design: white background, subtle shadows, Inter or system-ui font
- Header showing: canvas name, description, item count, last updated time
- Summary stats bar below the header: compute and display 2–4 useful aggregate stats from the data. Examples:
- Subscriptions: "3 active trials · $47.99/mo total · 2 expiring within 7 days"
- Packages: "5 in transit · 3 delivered this week"
- Invoices: "12 invoices · $1,240.00 total"
- For any canvas: count non-null items per enum value if an enum field exists
- Two views: Card view (default) and Table view — toggle button in header
- Search bar that filters items client-side across all fields
- Date range filter: if any schema field contains "date", show From/To date pickers that filter items client-side
- Group by dropdown: if any field is an enum, allow grouping cards/rows by that field
- Each card/row shows all schema fields with smart formatting:
- Dates formatted as "Mar 28, 2026"
- Booleans as colored badges ("Yes" green / "No" gray)
- Null values shown as "—" in muted gray
- Status enums as colored badges (customize colors per canvas type)
- Sort by any column in table view (click header to cycle asc/desc)
- CSV export button in the header: clicking it downloads all currently-filtered items as a `.csv` file named `<canvas-name>-<date>.csv`
- Footer: "Generated by Capsulate · <TIMESTAMP>"
- The JSON data is embedded directly in the HTML as a JS variable — no file reading needed
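One of the stats-bar requirements above — counting non-null items per enum value — can be sketched language-agnostically; shown here in Python for brevity (the dashboard itself computes this in inline JS):

```python
from collections import Counter

def enum_stats(items, field):
    """Count non-null values of an enum field, formatted for the stats bar."""
    counts = Counter(i[field] for i in items if i.get(field) is not None)
    return " · ".join(f"{n} {value}" for value, n in counts.most_common())
```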
Data Embedding
Embed the extracted data in a JSON script tag and parse it at runtime, to avoid XSS if email content contains `</script>`:

```html
<script type="application/json" id="canvas-data">
  <JSON_DATA>
</script>
<script>
  const CANVAS_DATA = JSON.parse(document.getElementById('canvas-data').textContent);
</script>
```
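Note that embedding alone is not quite enough: a literal `</script>` inside `<JSON_DATA>` would still terminate the data tag early. A standard fix, sketched here (the helper name is illustrative), is to escape `</` as `<\/` when serializing — this is a no-op for `JSON.parse`, since `\/` is a valid JSON escape for `/`:

```python
import json

def embed_canvas_json(data):
    """Serialize data for a <script type="application/json"> tag.

    Escaping "</" prevents email text containing "</script>" from closing
    the tag early; JSON.parse decodes "\\/" back to "/" unchanged.
    """
    return json.dumps(data).replace("</", "<\\/")
```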
After writing the file, open it:
```sh
open ~/.capsulate/<CANVAS_NAME>.html
```
Tell the user:
"Dashboard generated at `~/.capsulate/<CANVAS_NAME>.html`"
"Run `/capsulate` again anytime to refresh with new emails."
Error Handling
- gws rate limit: the 2-second pause between rounds should prevent this. If a rate limit error still occurs, pause 10 seconds and retry the current batch once before discarding.
- Email body empty or unreadable: include the item with all schema fields as `null` and `_error: "Could not read email body"`
- gws auth expired: tell the user to run `gws auth login -s gmail` and try again
- Disk write error: tell the user to check permissions on `~/.capsulate/`