Vibecosystem layered-recall
Progressive memory recall with 4 scope layers and 3 depth layers. Scope: identity > project > room > deep. Depth: IDs only > summary > full. 10-50x token savings through a fetch-on-confirmation pattern.
git clone https://github.com/vibeeval/vibecosystem
T=$(mktemp -d) && git clone --depth=1 https://github.com/vibeeval/vibecosystem "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/layered-recall" ~/.claude/skills/vibeeval-vibecosystem-layered-recall && rm -rf "$T"
skills/layered-recall/SKILL.md
Layered Recall
Progressive memory system with two orthogonal dimensions of lazy loading:
- Scope layers - what is relevant (identity, project, room, deep)
- Depth layers - how much detail to fetch (IDs, summary, full)
Combined savings: 10-50x tokens vs eager loading.
Depth Pattern (Fetch-on-Confirmation)
Instead of loading full memory entries upfront, agents fetch in 3 depths:
- Depth 1: IDs only (~10 tokens per match). The agent decides which are worth investigating.
- Depth 2: Summary (~50 tokens per match): room, type, preview (first 80 chars). The agent confirms relevance.
- Depth 3: Full content (~500+ tokens per match). Fetched only for confirmed matches.
Example flow:
1. Agent searches "auth refresh token"
2. Depth 1 returns 8 IDs: d-abc123, d-def456, ...
3. Agent requests Depth 2 for IDs 1-3
4. Sees room=authentication, type=decision, preview="Chose JWT..."
5. Agent confirms IDs 1 and 3 are relevant
6. Requests Depth 3 only for those 2 entries
7. Gets full content for ~1000 tokens instead of 4000+
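The fetch-on-confirmation pattern can be sketched as three progressively heavier accessors over a store. This is a minimal illustration, not the skill's actual API: the in-memory `ENTRIES` dict, the function names, and the substring matching are all assumptions standing in for the real backend.

```python
# Hypothetical memory store; real entries would live in files or a database.
ENTRIES = {
    "d-abc123": {"room": "authentication", "type": "decision",
                 "content": "Chose JWT refresh tokens with 15-minute expiry for the auth service."},
    "d-def456": {"room": "database", "type": "note",
                 "content": "Indexed the session tokens table to speed up lookups."},
}

def search_ids(query: str) -> list[str]:
    """Depth 1: matching IDs only (~10 tokens each); naive substring match."""
    return [eid for eid, e in ENTRIES.items() if query.lower() in e["content"].lower()]

def summarize(eid: str) -> dict:
    """Depth 2: room, type, and an 80-char preview (~50 tokens)."""
    e = ENTRIES[eid]
    return {"id": eid, "room": e["room"], "type": e["type"], "preview": e["content"][:80]}

def fetch_full(eid: str) -> str:
    """Depth 3: full content (~500+ tokens), only for confirmed matches."""
    return ENTRIES[eid]["content"]

# Agent flow: broad search, inspect cheap summaries, fetch only what's confirmed.
ids = search_ids("token")
summaries = [summarize(i) for i in ids]
confirmed = [s["id"] for s in summaries if s["room"] == "authentication"]
full = {i: fetch_full(i) for i in confirmed}
```

The point of the sketch is the shape of the flow: the expensive Depth 3 call happens only after a cheap Depth 2 look confirms relevance.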
The 4 Layers
- Layer 1: Identity (always loaded, ~200 tokens). Who is the user? What are their preferences?
- Layer 2: Critical Facts (per-project, ~500 tokens). Hard constraints, active decisions, blockers.
- Layer 3: Room Recall (on-demand, ~1-2K tokens). Relevant memories for the current task domain.
- Layer 4: Deep Search (when needed, ~2-5K tokens). Full semantic search across all memories.
Layer Details
Layer 1: Identity (~200 tokens, ALWAYS loaded)
Loaded at every session start. Contains:
- User preferences (language, style, autonomy level)
- Global constraints (no emojis, Turkish responses, etc.)
- Tool preferences (which editors, which terminal)
Source:
~/.claude/projects/*/memory/user_*.md
Layer 2: Critical Facts (~500 tokens, per-project)
Loaded when entering a project directory. Contains:
- Active architectural decisions
- Known blockers and constraints
- Current sprint/milestone goals
- Tech stack and versions
Source:
~/.claude/projects/*/memory/project_*.md + thoughts/CONTEXT.md
Layer 3: Room Recall (~1-2K tokens, on-demand)
Loaded when task domain is detected (auth, database, deploy, etc.). Contains:
- Previous decisions in this domain
- Past errors and fixes
- Patterns that worked
- Patterns that failed
Source: Memory palace rooms +
mature-instincts.json filtered by domain
Trigger: Intent classifier detects domain (e.g., "fix the login bug" -> room: authentication)
Layer 4: Deep Search (~2-5K tokens, explicit)
Only loaded when explicitly needed or when Layers 1-3 don't have enough context. Contains:
- Full semantic search results
- Cross-project pattern matches
- Historical error resolutions
- Archived decisions
Source: PostgreSQL vector search + palace cross-wing search
Trigger: Agent explicitly queries, or user asks "have we done this before?"
Recall Flow
Session Start -> Load Layer 1 (identity) -> Detect project -> Load Layer 2 (facts) -> User sends prompt -> Classify intent/domain -> Load Layer 3 (room) -> If insufficient context -> Load Layer 4 (deep)
Token Budget
| Layer | Tokens | When |
|---|---|---|
| L1 | ~200 | Always |
| L2 | ~500 | Per project |
| L3 | ~1-2K | Per task domain |
| L4 | ~2-5K | On demand |
| Total max | ~8K | Worst case |
vs. loading everything eagerly: ~30-50K tokens
Savings: ~4-6x even in the worst case (all four layers loaded); typical sessions that stop at Layers 1-2 reach the 10-50x range quoted above.
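The worst-case figure in the table checks out arithmetically, taking the upper bound of each layer's range:

```python
# Upper bounds from the token budget table: L1 + L2 + L3 + L4.
worst_case = 200 + 500 + 2000 + 5000   # 7700 tokens, i.e. ~8K
eager_low, eager_high = 30_000, 50_000

# Worst-case savings ratio against eager loading: roughly 3.9x to 6.5x.
print(eager_low / worst_case, eager_high / worst_case)
```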
Integration
With Existing Hooks
- instinct-loader -> feeds Layer 2 and Layer 3
- smart-memory-recall -> implements Layer 3 scoring
- intent-classifier -> triggers Layer 3 room selection
- graph-indexer -> powers Layer 4 deep search
With Memory Palace
- Layer 2 pulls from palace wing index
- Layer 3 pulls from palace room drawers
- Layer 4 searches across all wings
With Agents
- Agents inherit parent's Layer 1-2 context
- Each agent can request Layer 3-4 for their domain
- Agent memories feed back into palace for future recall