Gbrain ingest

Route content to specialized ingestion skills. Detects input type and delegates.

install

source · Clone the upstream repo

git clone https://github.com/garrytan/gbrain

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/garrytan/gbrain "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/ingest" ~/.claude/skills/garrytan-gbrain-ingest && rm -rf "$T"

manifest: skills/ingest/SKILL.md

source content

Ingest Skill

Ingest meetings, articles, media, documents, and conversations into the brain.

Filing rule: Read
skills/_brain-filing-rules.md
before creating any new page.

Contract

- Every fact written to a brain page carries an inline `[Source: ...]` citation with date and provenance.
- Every entity mention creates a back-link from the entity's page to the page mentioning them (Iron Law).
- Raw sources are preserved for provenance via `gbrain files upload-raw` with automatic size routing.
- State sections are rewritten with current best understanding, never appended to.
- Entity detection fires on every inbound message; notable entities get pages or updates.

Iron Law: Back-Linking (MANDATORY)

Every mention of a person or company with a brain page MUST create a back-link FROM that entity's page TO the page mentioning them. An unlinked mention is a broken brain. See `skills/_brain-filing-rules.md` for format.

Citation Requirements (MANDATORY)

Every fact written to a brain page must carry an inline `[Source: ...]` citation.

- User's statements: `[Source: User, {context}, YYYY-MM-DD]`
- Meeting data: `[Source: Meeting "{title}", YYYY-MM-DD]`
- Email/message: `[Source: email from {name} re: {subject}, YYYY-MM-DD]`
- Web content: `[Source: {publication}, {URL}, YYYY-MM-DD]`
- Social media: `[Source: X/@handle, YYYY-MM-DD](URL)` (include link)
- Synthesis: `[Source: compiled from {sources}]`
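The citation formats above can be produced by small helpers so that dates and punctuation stay consistent. This is a minimal sketch, not part of gbrain: the function names are hypothetical, and only the output strings follow the formats listed in this section.

```python
from datetime import date

# Hypothetical helpers; only the returned strings mirror the documented formats.

def cite_user(context: str, when: date) -> str:
    return f"[Source: User, {context}, {when.isoformat()}]"

def cite_meeting(title: str, when: date) -> str:
    return f'[Source: Meeting "{title}", {when.isoformat()}]'

def cite_email(name: str, subject: str, when: date) -> str:
    return f"[Source: email from {name} re: {subject}, {when.isoformat()}]"

def cite_web(publication: str, url: str, when: date) -> str:
    return f"[Source: {publication}, {url}, {when.isoformat()}]"

def cite_social(handle: str, url: str, when: date) -> str:
    # Social citations carry the link to the original post.
    return f"[Source: X/@{handle}, {when.isoformat()}]({url})"

def cite_synthesis(sources: list[str]) -> str:
    return f"[Source: compiled from {', '.join(sources)}]"
```

Using `date.isoformat()` guarantees the `YYYY-MM-DD` shape without hand-formatting.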

Phases

Router note: This skill is a router. For specialized ingestion, see: idea-ingest, media-ingest, meeting-ingestion.

Parse the source. Extract people, companies, dates, and events from the input.
For each entity mentioned:
- Read the entity's page from gbrain to check if it exists
- If exists: update compiled_truth (rewrite State section with new info, don't append)
- If new: check notability gate, then store the page in gbrain with the appropriate type and slug
Append to timeline. Add a timeline entry in gbrain for each event, with date, summary, and source citation.
Create cross-reference links. Link entities in gbrain for every entity pair mentioned together, using the appropriate relationship type.
Back-link all entities. Update EVERY mentioned entity's page with a back-link to this page (Iron Law).
Timeline merge. The same event appears on ALL mentioned entities' timelines. If Alice met Bob at Acme Corp, the event goes on Alice's page, Bob's page, and Acme Corp's page.
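The phases above can be sketched against an in-memory stand-in for the brain. Everything here is an assumption made for illustration: the `Page` class, the `ingest_event` function, and the default `met_at` link type are hypothetical and are not gbrain's API, and the notability gate is left out.

```python
from dataclasses import dataclass, field
from itertools import combinations

@dataclass
class Page:
    slug: str
    state: str = ""
    timeline: list = field(default_factory=list)  # (date, summary, citation) tuples
    backlinks: set = field(default_factory=set)   # slugs of pages mentioning this entity

def ingest_event(brain, links, source_slug, event_date, summary, citation,
                 entities, link_type="met_at"):
    """entities maps entity slug -> new State text, or None if nothing new surfaced."""
    for slug, new_state in entities.items():
        page = brain.setdefault(slug, Page(slug))  # notability gate omitted in this sketch
        if new_state is not None:
            page.state = new_state                 # State is rewritten, never appended to
        # Timeline merge: the same event lands on every mentioned entity's page.
        page.timeline.append((event_date, summary, citation))
        page.timeline.sort(key=lambda entry: entry[0], reverse=True)  # newest first
        page.backlinks.add(source_slug)            # Iron Law: back-link to the source page
    # Cross-reference every pair of entities mentioned together.
    for a, b in combinations(sorted(entities), 2):
        links.add((a, b, link_type))
```

With Alice, Bob, and Acme Corp in one meeting, a single call leaves one timeline entry and one back-link on each of the three pages, plus three pairwise links.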

Entity Detection on Every Message

Production agents should detect entity mentions on EVERY inbound message. This is the signal detection loop that makes the brain compound over time.

Protocol

Scan the message for entity mentions: people, companies, concepts, original thinking. Fire on every message (no exceptions unless purely operational).
For each entity detected:
- `gbrain search "name"` -- does a page already exist?
- If yes: load context with `gbrain get <slug>`. Use the compiled truth to inform your response. Update the page if the message contains new information.
- If no: assess notability (see `skills/_brain-filing-rules.md`). If the entity is worth tracking, create a new page with `gbrain put <type/slug>` and populate with what you know.

After creating or updating pages, sync to gbrain:

gbrain sync --no-pull --no-embed

Don't block the conversation. Entity detection and enrichment should happen alongside the response, not before it. The user shouldn't wait for brain writes to get an answer.
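One way to keep brain writes off the reply path is to hand enrichment to a background worker. The sketch below is an assumption about how an agent might be wired, not how gbrain works: `handle_message`, `answer`, and `enrich` are hypothetical names, and `enrich` stands in for the search, get, put, and sync steps above.

```python
from concurrent.futures import ThreadPoolExecutor

# A single worker keeps brain writes ordered; the pool size is an arbitrary choice.
_executor = ThreadPoolExecutor(max_workers=1)

def handle_message(message, answer, enrich):
    """Start enrichment in the background and return the reply without waiting for it."""
    pending = _executor.submit(enrich, message)  # entity detection and brain writes
    reply = answer(message)                      # the user gets this immediately
    return reply, pending                        # caller may check pending.result() later
```

Returning the future lets the caller log failures from enrichment without delaying the response.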

What counts as notable

People the user interacts with or discusses (not random mentions)
Companies relevant to the user's work or interests
Concepts or frameworks the user references or creates
The user's own original thinking (ideas, theses, observations) -- highest value
See `skills/_brain-filing-rules.md` for the full notability gate.

What to capture from the user's own thinking

Original thinking is the most valuable signal. Capture exact phrasing -- the user's language IS the insight. Don't paraphrase.

Novel observations or theses
Frameworks, mental models, heuristics
Connections between ideas that others miss
Contrarian positions with reasoning
Strong reactions to external stimuli (what triggered it and why)

Media Workflows

Content the user encounters should be captured in the brain. File by PRIMARY SUBJECT, not by format (see `skills/_brain-filing-rules.md`).

Articles & Web Content

Input: URL shared by user, or article mentioned in conversation.

Process:

Fetch content (`web_fetch` or equivalent)
Extract: title, author, publication, date, full text
Summarize: executive summary + key arguments (not a rehash)
Extract entities: people, companies, concepts mentioned
Save raw source for provenance (see Raw Source Preservation below)
Analyze for the user: don't just summarize. What's interesting given what you know about them? Flag connections, contradictions, content opportunities.

Write to: appropriate directory per filing rules (about a person -> `people/`, about a company -> `companies/`, reusable framework -> `concepts/`, raw data -> `sources/`)
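The subject-to-directory mapping above can be expressed as a lookup table. This is a sketch under stated assumptions: the subject keys, the `filing_path` function, and the `.md` extension are hypothetical, and the authoritative rules live in `skills/_brain-filing-rules.md`.

```python
# Directory names come from the filing rule above; the keys are hypothetical labels.
FILING_DIRS = {
    "person": "people/",
    "company": "companies/",
    "framework": "concepts/",
    "raw_data": "sources/",
}

def filing_path(primary_subject: str, slug: str) -> str:
    """Pick the directory from what the content is about, not from its format."""
    if primary_subject not in FILING_DIRS:
        raise ValueError(f"no filing rule for subject {primary_subject!r}")
    return f"{FILING_DIRS[primary_subject]}{slug}.md"  # assumption: pages are .md files
```

An article about a person therefore lands under `people/`, even though its format is "article".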

Videos & Podcasts

Input: URL (YouTube, podcast, etc.) or local audio/video file.

Process:

Get transcript -- speaker-diarized if possible (services like Diarize.io provide speaker-labeled, word-level timing)
Save raw transcript (both JSON and human-readable TXT)
Analyze: executive summary, key ideas, key quotes with speaker attribution, notable stories/anecdotes, people and companies mentioned
Extract and cross-reference all entities mentioned
HARD RULE: every video/podcast brain page MUST link to the raw diarized transcript. A page without transcript links is incomplete.

Write to: `media/videos/` or `media/podcasts/`, with back-links to all entities.

Quality bar:

Compelling headline (not "This video discusses...")
Executive summary that makes you want to watch/listen
Key Ideas as actual insights, not topic labels
Verbatim quotes with real speaker names (not "speaker_0")
All entities extracted with context and back-linked

PDFs & Documents

Input: File path or URL.

Process:

Extract text (OCR if scanned/image PDF)
Save raw source for provenance
Summarize: executive summary + key sections + notable data
Extract entities
Cross-reference from entity pages

Write to: per filing rules (file by primary subject, not format).

Screenshots & Images

Input: Image file.

Process:

Analyze content (OCR for text-heavy images, description for photos)
If tweet screenshot: extract text, author, date, route to social media workflow
If article screenshot: extract text, route to article workflow
If data/chart: extract data points, describe findings

Write to: depends on content -- route to the appropriate workflow above.
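The routing decision for images can be sketched as a small dispatch table. The image kinds and workflow labels below are hypothetical names chosen for illustration; classifying the image in the first place (for example with a vision model) is assumed to happen upstream.

```python
# Hypothetical labels; the mapping mirrors the three routing rules above.
IMAGE_ROUTES = {
    "tweet_screenshot": "social_media",
    "article_screenshot": "article",
    "chart": "data_extraction",
}

def route_image(kind: str) -> str:
    """Return the workflow for a classified image; plain photos are only described."""
    return IMAGE_ROUTES.get(kind, "describe")
```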

Meeting Transcripts

Input: Transcript from meeting recording service, or manual notes.

Process:

Pull full transcript (source of truth -- AI summaries are medium-low trust)
Save raw transcript for provenance
Write meeting page with YOUR analysis above the line, raw transcript below
Entity propagation (MANDATORY): for each attendee and company discussed:
- Update their brain page State section if new info surfaced
- Append to their Timeline with link to the meeting page
- Create page if person/company is notable and has no page yet
A meeting is NOT fully ingested until all entity pages are updated

Write to: `meetings/YYYY-MM-DD-short-description.md`
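A meeting path in this shape can be derived from the date and a short description. The helper below is a hypothetical sketch; the slug rule (lowercase, runs of non-alphanumerics collapsed to hyphens) is an assumption, not something the source specifies.

```python
import re
from datetime import date

def meeting_path(when: date, description: str) -> str:
    # Assumed slug rule: lowercase, collapse anything non-alphanumeric to "-".
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    return f"meetings/{when.isoformat()}-{slug}.md"
```

For example, `meeting_path(date(2026, 4, 11), "Acme Corp intro call")` yields `meetings/2026-04-11-acme-corp-intro-call.md`.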

What makes a good meeting page:

Reveals the real crux, not a bullet dump
Connects to existing brain pages (people, companies, deals)
Flags what changed (status, decisions, new info)
Names tension or what was left unsaid
Captures actual dynamic, not performative summary

Social Media Content

Input: Tweet, thread, or social media post.

Process:

Fetch full content (thread, quote tweets, context)
If images present: OCR via vision model for full text extraction
Summarize: what's being said, why it matters, who's involved
Extract entities and update brain pages
Include direct link to the original post (MANDATORY for citations)

Write to: `media/x/` for daily aggregation, or entity-specific directories if the post is primarily about a person/company.

Raw Source Preservation

Every ingested item must have its raw source preserved for provenance.

Use `gbrain files upload-raw` for automatic size routing:

gbrain files upload-raw <file> --page <page-slug> --type <type>

- < 100 MB text/PDF: stays in git (brain repo `.raw/` sidecar directories)
- >= 100 MB OR media (video, audio, images): uploaded to cloud storage via TUS resumable upload, `.redirect.yaml` pointer left in the brain repo
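The routing rule can be sketched as a pure function. This illustrates the rule as stated and is not the `gbrain files upload-raw` implementation. Two assumptions: "MB" is read as 1024 * 1024 bytes, which matches 524288000 bytes being shown as 500 MB in the pointer example in this section, and small files that are neither text/PDF nor media are treated as staying in git.

```python
MB = 1024 * 1024              # assumption: binary megabytes
THRESHOLD = 100 * MB
MEDIA_PREFIXES = ("video/", "audio/", "image/")

def storage_route(size_bytes: int, mime: str) -> str:
    """Return "cloud" for media or anything at or above the threshold, else "git"."""
    if mime.startswith(MEDIA_PREFIXES) or size_bytes >= THRESHOLD:
        return "cloud"        # uploaded; a .redirect.yaml pointer stays in the repo
    return "git"              # kept in the .raw/ sidecar directory
```

Note that media goes to cloud storage regardless of size, so a 1 KB PNG is routed the same way as a 500 MB video.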

The `.redirect.yaml` pointer format:

target: supabase://brain-files/page-slug/filename.mp4
bucket: brain-files
storage_path: page-slug/filename.mp4
size: 524288000
size_human: 500 MB
hash: sha256:abc123...
mime: video/mp4
uploaded: 2026-04-11T...
type: transcript
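The fields of the pointer can be derived from the uploaded bytes. The sketch below is hypothetical and only mirrors the field names shown above; the `human_size` rounding and the timestamp format are assumptions, and gbrain writes the real pointer itself during upload.

```python
import hashlib
from datetime import datetime, timezone

def human_size(size_bytes: int) -> str:
    # Assumption: whole binary megabytes, so 524288000 bytes reads as "500 MB".
    return f"{size_bytes / (1024 * 1024):.0f} MB"

def build_pointer(data: bytes, page_slug: str, filename: str, mime: str,
                  raw_type: str, bucket: str = "brain-files") -> dict:
    storage_path = f"{page_slug}/{filename}"
    return {
        "target": f"supabase://{bucket}/{storage_path}",
        "bucket": bucket,
        "storage_path": storage_path,
        "size": len(data),
        "size_human": human_size(len(data)),
        "hash": "sha256:" + hashlib.sha256(data).hexdigest(),
        "mime": mime,
        "uploaded": datetime.now(timezone.utc).isoformat(),
        "type": raw_type,
    }
```

Storing the SHA-256 digest alongside the size lets a later restore verify that the downloaded file matches what was uploaded.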

Accessing stored files:

- `gbrain files signed-url <storage-path>` -- generate 1-hour signed URL for viewing/sharing
- `gbrain files restore <dir>` -- download back to local from cloud storage

Use `put_raw_data` in gbrain to store raw API responses and metadata (JSON, not binary).

Test Before Bulk

When processing multiple items (batch video ingestion, bulk meeting processing, etc.):

Test on 3-5 items first. Run in test mode if available.
Read the actual output. Is the quality good? Are titles compelling (not "This video discusses...")? Are entities extracted and back-linked? Is the format clean?
Fix what's wrong in the approach/skill, not via one-off patches.
Only then: bulk execute with throttling, commits every 5-10 items.

The marginal cost of testing 3 items first is near zero. The cost of cleaning up 100 bad pages is enormous.
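The sample-then-bulk procedure can be sketched as a gate in front of the batch loop. All names here are hypothetical: `ingest`, `approve_sample`, and `commit` are callables the caller supplies, with `approve_sample` standing in for a human or agent actually reading the sample output.

```python
import time

def bulk_ingest(items, ingest, approve_sample, commit,
                sample_size=3, commit_every=5, delay_seconds=0.0):
    """Ingest a small sample, stop if it fails review, then process the rest in batches."""
    sample, rest = items[:sample_size], items[sample_size:]
    sample_pages = [ingest(item) for item in sample]
    if not approve_sample(sample_pages):
        # Fix the approach or the skill, not the individual pages.
        raise RuntimeError("sample failed review; fix the approach before bulk run")
    commit(sample_pages)

    pending = []
    for item in rest:
        pending.append(ingest(item))
        time.sleep(delay_seconds)          # throttling between items
        if len(pending) >= commit_every:
            commit(pending)
            pending = []
    if pending:
        commit(pending)                    # final partial batch
```

Because nothing is committed until the sample passes, a rejected sample leaves no pages to clean up.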

Quality Rules

Executive summary in compiled_truth must be updated, not just timeline appended
State section is REWRITTEN, not appended to. Current best understanding only.
Timeline entries are reverse-chronological (newest first)
Every person/company mentioned gets a page if notable (see filing rules)
Link types: knows, works_at, invested_in, founded, met_at, discussed
Source attribution: every timeline entry includes [Source: ...] citation
Back-links: every entity mention creates a back-link (Iron Law)
Filing: file by primary subject, not format or source (see filing rules)
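Several of these rules are mechanical enough to check automatically. The checker below is a hypothetical sketch covering three of them (timeline order, per-entry citations, allowed link types); the data shapes are assumptions, and ISO `YYYY-MM-DD` dates are assumed so that string comparison matches chronological order.

```python
# Link types as listed in the Quality Rules.
LINK_TYPES = {"knows", "works_at", "invested_in", "founded", "met_at", "discussed"}

def check_page(timeline, links):
    """timeline: list of (iso_date, text); links: list of (target_slug, link_type).

    Returns a list of human-readable rule violations (empty if none).
    """
    problems = []
    dates = [entry_date for entry_date, _ in timeline]
    if dates != sorted(dates, reverse=True):
        problems.append("timeline is not newest-first")
    for entry_date, text in timeline:
        if "[Source:" not in text:
            problems.append(f"timeline entry {entry_date} has no [Source: ...] citation")
    for target, link_type in links:
        if link_type not in LINK_TYPES:
            problems.append(f"link to {target} uses unknown type {link_type!r}")
    return problems
```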

Anti-Patterns

Appending to State sections. State is rewritten with the current best understanding on every update. Append-only State sections grow stale and contradictory.
Ingesting without back-links. An unlinked mention is a broken brain. Every entity mentioned must have a back-link from their page to the page mentioning them.
Skipping raw source preservation. Every ingested item must have its raw source preserved. A brain page without provenance is unverifiable.
Bulk processing without sample test. Test on 3-5 items first. Fix quality issues in the approach, not via one-off patches.
Paraphrasing the user's original thinking. The user's exact language IS the insight. Capture verbatim phrasing for ideas, theses, and frameworks.

Output Format

INGESTED: [title]
==================

Page: [slug]
Type: [person / company / meeting / media / concept]
Source: [source description]

Entities detected: N
- [entity] -> [created / updated] ([slug])

Back-links created: N
Timeline entries: N
Raw source: [preserved at path / uploaded to cloud]
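A report in this format can be assembled from the ingestion results. The function below is a hypothetical sketch; its parameter names are assumptions, and only the line layout follows the template above.

```python
def render_report(title, slug, page_type, source, entities,
                  backlinks, timeline_entries, raw_source):
    """entities: list of (name, "created" or "updated", entity_slug) tuples."""
    lines = [
        f"INGESTED: {title}",
        "==================",
        "",
        f"Page: {slug}",
        f"Type: {page_type}",
        f"Source: {source}",
        "",
        f"Entities detected: {len(entities)}",
    ]
    lines += [f"- {name} -> {action} ({entity_slug})"
              for name, action, entity_slug in entities]
    lines += [
        "",
        f"Back-links created: {backlinks}",
        f"Timeline entries: {timeline_entries}",
        f"Raw source: {raw_source}",
    ]
    return "\n".join(lines)
```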

Tools Used

Read a page from gbrain (get_page)
Store/update a page in gbrain (put_page)
Add a timeline entry in gbrain (add_timeline_entry)
Link entities in gbrain (add_link)
List tags for a page (get_tags)
Tag a page in gbrain (add_tag)
Store raw data in gbrain (put_raw_data)
Check backlinks in gbrain (get_backlinks)