Samara-main archive-search

Search raw session transcripts for technical details and reasoning traces

install

source · Clone the upstream repo

git clone https://github.com/claudeaceae/samara-main

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/claudeaceae/samara-main "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.claude/skills/archive-search" ~/.claude/skills/claudeaceae-samara-main-archive-search && rm -rf "$T"

manifest: .claude/skills/archive-search/SKILL.md

source content

Archive Search

Search through raw Claude Code session transcripts to recover technical details, reasoning traces, and thinking blocks that were compressed during episode distillation.

What You'll Get

The archive contains:

✅ Complete thinking blocks (raw reasoning process)
✅ User messages (original intent and questions)
✅ Assistant explanations (first 1000 chars)
❌ Tool results (filtered out - too noisy)
❌ System messages (no semantic value)

Best for:

Technical archaeology: "How did we implement X?"
Decision rationale: "Why did we choose Y over Z?"
Reasoning evolution: "How has our approach to X changed?"
Debugging context: "What was the exact error we saw?"

How Archive Differs from Regular Memory

Memory Type	Content	Use Case
Episodes ( `/episode` )	Curated summaries (2-8 bullets per section)	What happened today
Learnings ( `/search-memory` )	Distilled insights	General knowledge
Archive ( `/archive-search` )	Raw transcripts with thinking blocks	Technical depth, exact discussions

The archive captures ~5-10% of raw transcript bytes by selecting only high-value content (thinking blocks, user questions, assistant reasoning). It's indexed separately to prevent noise pollution of curated memory.

When to Use This Skill

Good use cases:

/archive-search "model fallback implementation"
→ Find thinking blocks about ModelFallbackChain design

/archive-search "Swift vs Python for memory database"
→ Recover exact trade-off analysis

/archive-search "context tracking approaches"
→ See evolution of ContextTracker across sessions

Not helpful:

/archive-search "who is Lucy"
→ Use /person instead (person profiles)

/archive-search "what did we decide about X"
→ Use /search-memory or check decisions.md

Search Tips

Thinking blocks are highest value:

Archive preserves full reasoning traces
Includes technical discussion and trade-offs
Most detailed context for implementation decisions

Be specific:

Good: "memory database indexing strategy"
Too broad: "memory" (will match thousands of chunks)

Look for technical terms:

Good: "ModelFallbackChain", "ContextTracker", "sanitizeResponse"
Generic: "error handling" (use more specific context)

Coverage

Time range: Last 90 days of sessions Project: samara-main only (for now) Update frequency: Incremental sync during dream cycles (3 AM)

Current state: Run

/archive-search stats

to see:

Total indexed chunks
Session count
Thinking block count

Example Session

User:

/archive-search "sanitization logic"

Claude:

Found 3 relevant sessions:

================================================================================
Result 1 | Distance: 0.18 | Role: thinking
Session: 5399ef7a-... | Time: 2026-01-05 15:42:33
--------------------------------------------------------------------------------
The user is asking about the sanitization issue. Let me think through what we
know:

1. Thinking blocks are leaking into messages
2. Session IDs appearing in user-visible text
3. This happens in multi-stream scenarios (webcam + web fetch + conversation)

The root cause is likely in ClaudeInvoker.swift where we parse JSON output.
If parseJsonOutput() falls back to raw text when JSON parsing fails, it would
include everything - thinking blocks, session IDs, etc.

We need a three-layer defense:
- parseJsonOutput() should never fall back to raw
- sanitizeResponse() should strip known patterns
- MessageBus should coordinate all sends through single channel

[...1247 more characters]

Session file: ~/.claude/projects/.../5399ef7a-...jsonl

This gives you the exact reasoning process, not just the summary that ended up in episode logs.

Commands

Once you invoke this skill with

/archive-search <query>

, you can:

Search by query:
- ```
archive-search "your query here"
```
- Returns top 5 most relevant chunks
Filter by role:
- ```
--role thinking
```
  - Only thinking blocks (highest signal)
- ```
--role user
```
  - Only user messages
- ```
--role assistant
```
  - Only assistant text
Adjust result count:
- ```
--n 10
```
  - Get more results
Check index stats:
- ```
stats
```
  - See what's indexed
Quality check:
- ```
sample
```
  - Random sample of 3 chunks
- ```
sample --session-id abc123
```
  - Sample from specific session

Implementation Notes

For Claude instances:

When a user invokes

/archive-search <query>

, you should:

Run:

~/.claude-mind/system/bin/archive-index search "<query>"

Parse and present results clearly
Include session IDs for reference
Explain what the user is seeing (thinking blocks vs user messages)

First-time setup:

If the archive hasn't been built yet:

# Build initial index (takes 5-10 minutes for 90 days)
~/.claude-mind/system/bin/archive-index rebuild

# Check stats
~/.claude-mind/system/bin/archive-index stats

# Test search
~/.claude-mind/system/bin/archive-index search "model fallback"

After initial build, dream cycles handle incremental sync automatically.