Gbrain ingest
Route content to specialized ingestion skills. Detects input type and delegates.
```bash
# Clone the full repo
git clone https://github.com/garrytan/gbrain

# Or install just the ingest skill into ~/.claude/skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/garrytan/gbrain "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/ingest" ~/.claude/skills/garrytan-gbrain-ingest && rm -rf "$T"
```
`skills/ingest/SKILL.md`

Ingest Skill
Ingest meetings, articles, media, documents, and conversations into the brain.
Filing rule: Read `skills/_brain-filing-rules.md` before creating any new page.
Contract
- Every fact written to a brain page carries an inline `[Source: ...]` citation with date and provenance.
- Every entity mention creates a back-link from the entity's page to the page mentioning them (Iron Law).
- Raw sources are preserved for provenance via `gbrain files upload-raw` with automatic size routing.
- State sections are rewritten with current best understanding, never appended to.
- Entity detection fires on every inbound message; notable entities get pages or updates.
Iron Law: Back-Linking (MANDATORY)
Every mention of a person or company with a brain page MUST create a back-link FROM that entity's page TO the page mentioning them. An unlinked mention is a broken brain. See `skills/_brain-filing-rules.md` for format.
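For intuition, a back-link is a line on the entity's page that points at the page mentioning them. A minimal sketch only -- the canonical format lives in `skills/_brain-filing-rules.md`, and the wiki-link syntax, heading, and slug below are invented for illustration:

```markdown
<!-- people/alice-chen.md (hypothetical page) -->
## Mentioned in
- [[meetings/2026-04-11-acme-intro]] -- intro call where Alice outlined Acme's Q3 plan
```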
Citation Requirements (MANDATORY)
Every fact written to a brain page must carry an inline `[Source: ...]` citation.
- User's statements: `[Source: User, {context}, YYYY-MM-DD]`
- Meeting data: `[Source: Meeting "{title}", YYYY-MM-DD]`
- Email/message: `[Source: email from {name} re: {subject}, YYYY-MM-DD]`
- Web content: `[Source: {publication}, {URL}, YYYY-MM-DD]`
- Social media: `[Source: X/@handle, YYYY-MM-DD](URL)` (include link)
- Synthesis: `[Source: compiled from {sources}]`
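Filled in, a cited fact on a page reads like this (the fact, meeting title, and date are invented for illustration):

```markdown
Acme closed its Series B at a $200M valuation. [Source: Meeting "Acme board sync", 2026-04-11]
```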
Phases
Router note: This skill is a router. For specialized ingestion, see: idea-ingest, media-ingest, meeting-ingestion.
- Parse the source. Extract people, companies, dates, and events from the input.
- For each entity mentioned:
- Read the entity's page from gbrain to check if it exists
- If exists: update compiled_truth (rewrite State section with new info, don't append)
- If new: check notability gate, then store the page in gbrain with the appropriate type and slug
- Append to timeline. Add a timeline entry in gbrain for each event, with date, summary, and source citation.
- Create cross-reference links. Link entities in gbrain for every entity pair mentioned together, using the appropriate relationship type.
- Back-link all entities. Update EVERY mentioned entity's page with a back-link to this page (Iron Law).
- Timeline merge. The same event appears on ALL mentioned entities' timelines. If Alice met Bob at Acme Corp, the event goes on Alice's page, Bob's page, and Acme Corp's page.
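As a sketch of the timeline-merge rule with the Alice/Bob/Acme example (the slugs and entry layout are illustrative; see the filing rules for the canonical format), the same cited entry lands on all three pages:

```markdown
<!-- people/alice.md, people/bob.md, AND companies/acme-corp.md each get: -->
## Timeline
- 2026-04-11 -- Alice met Bob at Acme Corp. [Source: Meeting "Acme intro", 2026-04-11]
```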
Entity Detection on Every Message
Production agents should detect entity mentions on EVERY inbound message. This is the signal detection loop that makes the brain compound over time.
Protocol
- Scan the message for entity mentions: people, companies, concepts, original thinking. Fire on every message (no exceptions unless purely operational).
- For each entity detected (shell sketch after this list):
  - `gbrain search "name"` -- does a page already exist?
  - If yes: load context with `gbrain get <slug>`. Use the compiled truth to inform your response. Update the page if the message contains new information.
  - If no: assess notability (see `skills/_brain-filing-rules.md`). If the entity is worth tracking, create a new page with `gbrain put <type/slug>` and populate with what you know.
- After creating or updating pages, sync to gbrain: `gbrain sync --no-pull --no-embed`
- Don't block the conversation. Entity detection and enrichment should happen alongside the response, not before it. The user shouldn't wait for brain writes to get an answer.
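Put together as a shell sketch (the entity name and slug are invented for illustration, and only commands documented in this skill are shown; how page content is supplied to `gbrain put` is covered by the filing rules, not here):

```bash
# 1. Does a page already exist for the detected entity?
gbrain search "Alice Chen"

# 2a. If yes: load the compiled truth for context before responding
gbrain get people/alice-chen

# 2b. If no, and the entity passes the notability gate: create a page
gbrain put people/alice-chen

# 3. Sync without pulling or re-embedding
gbrain sync --no-pull --no-embed
```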
What counts as notable
- People the user interacts with or discusses (not random mentions)
- Companies relevant to the user's work or interests
- Concepts or frameworks the user references or creates
- The user's own original thinking (ideas, theses, observations) -- highest value
- See `skills/_brain-filing-rules.md` for the full notability gate
What to capture from the user's own thinking
Original thinking is the most valuable signal. Capture exact phrasing -- the user's language IS the insight. Don't paraphrase.
- Novel observations or theses
- Frameworks, mental models, heuristics
- Connections between ideas that others miss
- Contrarian positions with reasoning
- Strong reactions to external stimuli (what triggered it and why)
Media Workflows
Content the user encounters should be captured in the brain. File by PRIMARY SUBJECT, not by format (see `skills/_brain-filing-rules.md`).
Articles & Web Content
Input: URL shared by user, or article mentioned in conversation.
Process:
- Fetch content (`web_fetch` or equivalent)
- Extract: title, author, publication, date, full text
- Summarize: executive summary + key arguments (not a rehash)
- Extract entities: people, companies, concepts mentioned
- Save raw source for provenance (see Raw Source Preservation below)
- Analyze for the user: don't just summarize. What's interesting given what you know about them? Flag connections, contradictions, content opportunities.
Write to: appropriate directory per filing rules (about a person -> `people/`, about a company -> `companies/`, reusable framework -> `concepts/`, raw data -> `sources/`).
Videos & Podcasts
Input: URL (YouTube, podcast, etc.) or local audio/video file.
Process:
- Get transcript -- speaker-diarized if possible (services like Diarize.io provide speaker-labeled, word-level timing)
- Save raw transcript (both JSON and human-readable TXT)
- Analyze: executive summary, key ideas, key quotes with speaker attribution, notable stories/anecdotes, people and companies mentioned
- Extract and cross-reference all entities mentioned
- HARD RULE: every video/podcast brain page MUST link to the raw diarized transcript. A page without transcript links is incomplete.
Write to: `media/videos/` or `media/podcasts/` with back-links to all entities.
Quality bar:
- Compelling headline (not "This video discusses...")
- Executive summary that makes you want to watch/listen
- Key Ideas as actual insights, not topic labels
- Verbatim quotes with real speaker names (not "speaker_0")
- All entities extracted with context and back-linked
PDFs & Documents
Input: File path or URL.
Process:
- Extract text (OCR if scanned/image PDF)
- Save raw source for provenance
- Summarize: executive summary + key sections + notable data
- Extract entities
- Cross-reference from entity pages
Write to: per filing rules (file by primary subject, not format).
Screenshots & Images
Input: Image file.
Process:
- Analyze content (OCR for text-heavy images, description for photos)
- If tweet screenshot: extract text, author, date, route to social media workflow
- If article screenshot: extract text, route to article workflow
- If data/chart: extract data points, describe findings
Write to: depends on content -- route to the appropriate workflow above.
Meeting Transcripts
Input: Transcript from meeting recording service, or manual notes.
Process:
- Pull full transcript (source of truth -- AI summaries are medium-low trust)
- Save raw transcript for provenance
- Write meeting page with YOUR analysis above the line, raw transcript below
- Entity propagation (MANDATORY): for each attendee and company discussed:
- Update their brain page State section if new info surfaced
- Append to their Timeline with link to the meeting page
- Create page if person/company is notable and has no page yet
- A meeting is NOT fully ingested until all entity pages are updated
Write to: `meetings/YYYY-MM-DD-short-description.md`
What makes a good meeting page:
- Reveals the real crux, not a bullet dump
- Connects to existing brain pages (people, companies, deals)
- Flags what changed (status, decisions, new info)
- Names tension or what was left unsaid
- Captures actual dynamic, not performative summary
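A skeletal meeting page under these rules might look like the following (the slug, headings, and content are invented; the binding requirements are your analysis above the line, the raw transcript below it, and citations throughout):

```markdown
<!-- meetings/2026-04-11-acme-intro.md (hypothetical) -->
# Acme intro: Alice wants a design partner, not a customer
Crux: Alice's real ask was engineering access, not a discount.
[Source: Meeting "Acme intro", 2026-04-11]

Changed: deal status moved from "evaluating" to "pilot scoped".

---
## Raw transcript
[00:00] Alice Chen: Thanks for making time...
```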
Social Media Content
Input: Tweet, thread, or social media post.
Process:
- Fetch full content (thread, quote tweets, context)
- If images present: OCR via vision model for full text extraction
- Summarize: what's being said, why it matters, who's involved
- Extract entities and update brain pages
- Include direct link to the original post (MANDATORY for citations)
Write to: `media/x/` for daily aggregation, or entity-specific directories if the post is primarily about a person/company.
Raw Source Preservation
Every ingested item must have its raw source preserved for provenance.
Use `gbrain files upload-raw` for automatic size routing:

```bash
gbrain files upload-raw <file> --page <page-slug> --type <type>
```
- < 100 MB text/PDF: stays in git (brain repo `raw/` sidecar directories).
- >= 100 MB OR media (video, audio, images): uploaded to cloud storage via TUS resumable upload, `redirect.yaml` pointer left in the brain repo.
The `redirect.yaml` pointer format:

```yaml
target: supabase://brain-files/page-slug/filename.mp4
bucket: brain-files
storage_path: page-slug/filename.mp4
size: 524288000
size_human: 500 MB
hash: sha256:abc123...
mime: video/mp4
uploaded: 2026-04-11T...
type: transcript
```
Accessing stored files:
- `gbrain files signed-url <storage-path>` -- generate 1-hour signed URL for viewing/sharing
- `gbrain files restore <dir>` -- download back to local from cloud storage
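An end-to-end sketch for a large recording (the file name and page slug are invented; only the commands documented above are used):

```bash
# A 500 MB video exceeds the git threshold: upload-raw routes it to cloud
# storage and leaves a redirect.yaml pointer in the brain repo
gbrain files upload-raw board-meeting.mp4 --page 2026-04-11-board-meeting --type transcript

# Generate a 1-hour signed URL for viewing/sharing
gbrain files signed-url 2026-04-11-board-meeting/board-meeting.mp4

# Download cloud-stored files back to local when needed
gbrain files restore media/videos
```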
Use `put_raw_data` in gbrain to store raw API responses and metadata (JSON, not binary).
Test Before Bulk
When processing multiple items (batch video ingestion, bulk meeting processing, etc.):
- Test on 3-5 items first. Run in test mode if available.
- Read the actual output. Is the quality good? Are titles compelling (not "This video discusses...")? Are entities extracted and back-linked? Is the format clean?
- Fix what's wrong in the approach/skill, not via one-off patches.
- Only then: bulk execute with throttling, commits every 5-10 items.
The marginal cost of testing 3 items first is near zero. The cost of cleaning up 100 bad pages is enormous.
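A hedged sketch of that sequencing in shell (`ingest_one` is a placeholder for whatever per-item ingestion step you run, not a gbrain command, and `transcripts/` and `~/brain` are invented paths):

```bash
#!/usr/bin/env bash
# Bulk phase only: assumes a 3-5 item sample has already been reviewed by hand.
i=0
for f in transcripts/*.txt; do
  ingest_one "$f"            # placeholder: your per-item ingestion step
  i=$((i + 1))
  if [ $((i % 5)) -eq 0 ]; then
    # Commit every 5 items so a bad batch is cheap to roll back
    git -C ~/brain add -A
    git -C ~/brain commit -m "batch ingest: ${i} items"
  fi
  sleep 2                    # throttle between items
done
```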
Quality Rules
- Executive summary in compiled_truth must be updated, not just timeline appended
- State section is REWRITTEN, not appended to. Current best understanding only.
- Timeline entries are reverse-chronological (newest first)
- Every person/company mentioned gets a page if notable (see filing rules)
- Link types: knows, works_at, invested_in, founded, met_at, discussed
- Source attribution: every timeline entry includes [Source: ...] citation
- Back-links: every entity mention creates a back-link (Iron Law)
- Filing: file by primary subject, not format or source (see filing rules)
Anti-Patterns
- Appending to State sections. State is rewritten with the current best understanding on every update. Append-only State sections grow stale and contradictory.
- Ingesting without back-links. An unlinked mention is a broken brain. Every entity mentioned must have a back-link from their page to the page mentioning them.
- Skipping raw source preservation. Every ingested item must have its raw source preserved. A brain page without provenance is unverifiable.
- Bulk processing without sample test. Test on 3-5 items first. Fix quality issues in the approach, not via one-off patches.
- Paraphrasing the user's original thinking. The user's exact language IS the insight. Capture verbatim phrasing for ideas, theses, and frameworks.
Output Format
```
INGESTED: [title]
==================
Page: [slug]
Type: [person / company / meeting / media / concept]
Source: [source description]
Entities detected: N
- [entity] -> [created / updated] ([slug])
Back-links created: N
Timeline entries: N
Raw source: [preserved at path / uploaded to cloud]
```
Tools Used
- Read a page from gbrain (`get_page`)
- Store/update a page in gbrain (`put_page`)
- Add a timeline entry in gbrain (`add_timeline_entry`)
- Link entities in gbrain (`add_link`)
- List tags for a page (`get_tags`)
- Tag a page in gbrain (`add_tag`)
- Store raw data in gbrain (`put_raw_data`)
- Check backlinks in gbrain (`get_backlinks`)