# Claude History Ingest — Conversation Mining

From [Ar9av/obsidian-wiki](https://github.com/Ar9av/obsidian-wiki), skill file `.skills/claude-history-ingest/SKILL.md`. Clone the whole repo:

```sh
git clone https://github.com/Ar9av/obsidian-wiki
```

Or install just this skill into `~/.claude/skills`:

```sh
T=$(mktemp -d) && git clone --depth=1 https://github.com/Ar9av/obsidian-wiki "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.skills/claude-history-ingest" ~/.claude/skills/ar9av-obsidian-wiki-claude-history-ingest && rm -rf "$T"
```
You are extracting knowledge from the user's past Claude Code conversations and distilling it into the Obsidian wiki. Conversations are rich but messy — your job is to find the signal and compile it.
This skill can be invoked directly or via the `wiki-history-ingest` router (`/wiki-history-ingest claude`).
## Before You Start
- Read `.env` to get `OBSIDIAN_VAULT_PATH` and `CLAUDE_HISTORY_PATH` (defaults to `~/.claude`)
- Read `.manifest.json` at the vault root to check what's already been ingested
- Read `index.md` at the vault root to know what the wiki already contains
## Ingest Modes

### Append Mode (default)
Check `.manifest.json` for each source file (conversation JSONL, memory file). Only process:
- Files not in the manifest (new conversations, new memory files, new projects)
- Files whose modification time is newer than their `ingested_at` in the manifest
This is usually what you want — the user ran a few new sessions and wants to capture the delta.
### Full Mode
Process everything regardless of the manifest. Use after a `wiki-rebuild` or if the user explicitly asks.
## Claude Code Data Layout
Claude Code stores data in two locations. Scan both.
### Source 1: `~/.claude/` (CLI sessions)

```
~/.claude/
├── projects/                        # Per-project directories
│   ├── -Users-name-project-a/       # Path-derived name (slashes → dashes)
│   │   ├── <session-uuid>.jsonl     # Conversation data (JSONL)
│   │   └── memory/                  # Structured memories
│   │       ├── MEMORY.md            # Memory index
│   │       ├── user_*.md            # User profile memories
│   │       ├── feedback_*.md        # Workflow feedback memories
│   │       └── project_*.md         # Project context memories
│   └── -Users-name-project-b/
│       └── ...
├── sessions/                        # Session metadata (JSON)
│   └── <pid>.json                   # {pid, sessionId, cwd, startedAt, kind, entrypoint}
├── history.jsonl                    # Global session history
├── tasks/                           # Subagent task data
├── plans/                           # Saved plans
└── settings.json
```
### Source 2: `~/Library/Application Support/Claude/local-agent-mode-sessions/` (Desktop app agent sessions)

The Claude desktop app stores local agent mode sessions here. The structure is deeply nested:

```
~/Library/Application Support/Claude/local-agent-mode-sessions/
└── <outer-uuid>/
    └── <inner-uuid>/
        ├── local_<session-uuid>.json      # Session metadata
        └── local_<session-uuid>/
            ├── audit.jsonl                # Audit log — tool calls, file reads, commands run
            └── .claude/
                └── projects/
                    └── <path-encoded-name>/   # Same path-encoding as ~/.claude/projects/
                        └── <uuid>.jsonl       # Conversation transcript (same JSONL format as CLI)
```
How to find all local-agent-mode sessions:
```sh
# Find all session metadata files
find ~/Library/Application\ Support/Claude/local-agent-mode-sessions -maxdepth 4 -name "local_*.json"

# Find all audit logs
find ~/Library/Application\ Support/Claude/local-agent-mode-sessions -name "audit.jsonl"

# Find all conversation transcripts
find ~/Library/Application\ Support/Claude/local-agent-mode-sessions -name "*.jsonl" -path "*/.claude/projects/*"
```
**Session metadata** (`local_<uuid>.json`) — JSON file with fields like `sessionId`, `cwd`, `startedAt`, `model`, `title`. Read this first to understand the session context before opening the transcript.

**Audit log** (`audit.jsonl`) — Each line is a JSON record of one agent action: tool calls (Read, Write, Bash, Edit), file accesses, shell commands executed, MCP calls. Useful for understanding what the agent actually did — often richer signal than the conversation text alone. Fields: `type`, `toolName`, `input`, `output`, `timestamp`, `sessionId`.

**Conversation transcript** (`.claude/projects/.../<uuid>.jsonl`) — Identical format to CLI conversation JSONL. Parse the same way as `~/.claude/projects/*/*.jsonl`.
Key data sources ranked by value (both locations combined):
- Memory files (`~/.claude/projects/*/memory/*.md`) — Pre-distilled, already wiki-friendly. Gold.
- Conversation JSONL (both `~/.claude/projects/*/*.jsonl` and desktop app transcripts) — Full conversation transcripts. Rich but noisy.
- Audit logs (`audit.jsonl` in desktop sessions) — Tool-call-level record of what was done. Useful for extracting concrete actions, file patterns, and command patterns even when the conversation is sparse.
- Session metadata (`sessions/*.json` and `local_*.json`) — Tells you which project, when, and what CWD.
## Step 1: Survey and Compute Delta
Scan both data locations and compare against `.manifest.json`:
```sh
# --- Source 1: CLI sessions (~/.claude) ---
# Find all projects
Glob: ~/.claude/projects/*/
# Find memory files (highest value)
Glob: ~/.claude/projects/*/memory/*.md
# Find conversation JSONL files
Glob: ~/.claude/projects/*/*.jsonl

# --- Source 2: Desktop app local-agent-mode sessions ---
DESKTOP_SESSIONS="$HOME/Library/Application Support/Claude/local-agent-mode-sessions"
# Session metadata
find "$DESKTOP_SESSIONS" -maxdepth 4 -name "local_*.json"
# Audit logs
find "$DESKTOP_SESSIONS" -name "audit.jsonl"
# Conversation transcripts
find "$DESKTOP_SESSIONS" -name "*.jsonl" -path "*/.claude/projects/*"
```
Build a unified inventory and classify each file:
- New — not in manifest → needs ingesting
- Modified — in manifest but file is newer → needs re-ingesting
- Unchanged — in manifest and not modified → skip in append mode
Report to the user: "Found X CLI projects, Y desktop sessions. Memory files: A. Conversations: B. Audit logs: C. Delta: D new, E modified."
## Step 2: Ingest Memory Files First
Memory files are already structured with YAML frontmatter:
```
---
name: memory-name
description: one-line description
type: user|feedback|project|reference
---

Memory content here.
```
For each memory file:
- Read it and parse the frontmatter
- `user` type → feeds into an entity page about the user, or concept pages about their domain
- `feedback` type → feeds into skills pages (workflow patterns, what works, what doesn't)
- `project` type → feeds into entity pages for the project
- `reference` type → feeds into reference pages pointing to external resources
The `MEMORY.md` index file in each project is a quick summary — read it first to decide which individual memory files are worth reading in full.
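A minimal frontmatter parse plus type routing might look like this. The parser assumes simple `key: value` frontmatter lines, and the routing target names are assumptions based on the page types listed above, not confirmed vault directory names.

```python
def parse_memory_file(text):
    """Split a memory file into frontmatter fields and body.
    Sketch: assumes flat 'key: value' frontmatter, no nested YAML."""
    assert text.startswith("---")
    head, _, body = text[3:].partition("\n---")
    fields = {}
    for line in head.strip().splitlines():
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields, body.strip()

# Hypothetical routing from memory type to wiki page kind:
ROUTES = {"user": "entity", "feedback": "skills",
          "project": "entity", "reference": "reference"}
```

Use `fields["type"]` to pick the destination, then distill `body` into the matching page.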
## Step 3: Parse Conversation JSONL
Each JSONL file is one conversation session. Each line is a JSON object:
```json
{
  "type": "user|assistant|progress|file-history-snapshot",
  "message": {"role": "user|assistant", "content": "text string"},
  "uuid": "...",
  "timestamp": "2026-03-15T10:30:00.000Z",
  "sessionId": "...",
  "cwd": "/path/to/project",
  "version": "2.1.59"
}
```
For assistant messages, `content` may be an array of content blocks:
```json
{
  "content": [
    {"type": "thinking", "text": "..."},
    {"type": "text", "text": "The actual response..."},
    {"type": "tool_use", "name": "Read", "input": {...}}
  ]
}
```
What to extract from conversations:
- Filter to `type: "user"` and `type: "assistant"` entries only
- For assistant entries, extract `text` blocks (skip `thinking` and `tool_use` — those are noise)
- The `cwd` field tells you which project this conversation belongs to
- The project directory name (e.g., `-Users-name-Documents-projects-my-app`) tells you the project path
Skip these:
- `type: "progress"` — internal agent progress updates
- `type: "file-history-snapshot"` — file state tracking
- Subagent conversations (under `subagents/` subdirectories) — unless the user specifically asks
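The extraction rules above can be sketched as a small generator over the JSONL lines:

```python
import json

def extract_turns(jsonl_lines):
    """Yield (role, text) pairs, keeping only user/assistant entries
    and only 'text' content blocks."""
    for line in jsonl_lines:
        rec = json.loads(line)
        if rec.get("type") not in ("user", "assistant"):
            continue  # skip progress / file-history-snapshot entries
        content = rec["message"]["content"]
        if isinstance(content, str):
            yield rec["type"], content
        else:  # list of blocks: keep text, drop thinking / tool_use
            for block in content:
                if block.get("type") == "text":
                    yield rec["type"], block["text"]
```

Feed the resulting turns into topic clustering rather than writing them out verbatim.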
## Step 3b: Parse Audit Logs (desktop sessions only)
For each `audit.jsonl` found under `local-agent-mode-sessions/`, read it line by line. Each line is a JSON record of one agent action:
```json
{
  "type": "tool_call",
  "toolName": "Bash",
  "input": {"command": "npm test"},
  "output": "...",
  "timestamp": "2026-04-10T14:22:00Z",
  "sessionId": "..."
}
```
What to extract from audit logs:
- File access patterns — which files does the agent repeatedly Read or Edit? These are the high-value files in the project. Note them as project references.
- Shell commands — recurring Bash commands reveal the project's build/test/deploy workflow. Distill these into a `skills/` page (e.g. "how this project is built and tested").
- Error patterns — failed tool calls (non-zero exit codes, error outputs) reveal pain points, known rough edges, or recurring bugs.
- MCP tool calls — calls to MCP tools reveal which external services and APIs the project integrates with.
Skip from audit logs:
- Routine file reads with no pattern (e.g. reading config files once)
- Tool outputs that are just noise (long stack traces, verbose logs) — summarize the error class, not the full output
- Anything that looks like secrets, tokens, or credentials in command arguments or outputs
Cross-reference with the conversation transcript: The audit log tells you what happened; the conversation tells you why. When both are available for the same session, use them together — the audit log grounds the conversation in concrete actions.
Read the paired `local_<uuid>.json` session metadata before processing the audit log — it gives you `cwd`, `startedAt`, and `title` to contextualize the actions.
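Mining recurring commands and file accesses from an audit log might look like this. Field names follow the record shape shown above; the `file_path` input key for Read/Edit/Write calls is an assumption about the tool-input schema.

```python
import json
from collections import Counter

def mine_audit(lines):
    """Tally recurring shell commands and file accesses
    from audit.jsonl lines (schema assumed as above)."""
    commands, files = Counter(), Counter()
    for line in lines:
        rec = json.loads(line)
        if rec.get("type") != "tool_call":
            continue
        tool, inp = rec.get("toolName"), rec.get("input", {})
        if tool == "Bash":
            commands[inp.get("command", "")] += 1
        elif tool in ("Read", "Edit", "Write"):
            files[inp.get("file_path", "")] += 1
    return commands, files
```

`commands.most_common()` then surfaces the build/test workflow; `files.most_common()` surfaces the high-value files.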
## Step 4: Cluster by Topic
Don't create one wiki page per conversation. Instead:
- Group extracted knowledge by topic across conversations
- A single conversation about "debugging auth + setting up CI" → two separate topics
- Three conversations across different days about "React performance" → one merged topic
- The project directory name gives you a natural first-level grouping
## Step 5: Distill into Wiki Pages
Each Claude project maps to a project directory in the vault. The project directory name from `~/.claude/projects/` encodes the original path — decode it to get a clean project name:
```
-Users-name-Documents-projects-my-project  → my-project
-Users-name-Documents-projects-another-app → another-app
```
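Because the dash-encoding is lossy (a dash may be a path separator or part of a directory name), the safest decode cross-checks the `cwd` field from any session line in that project; a sketch:

```python
def decode_project_name(encoded, cwd=None):
    """Best-effort decode of a path-encoded project directory name.
    Prefer the cwd from a session line; the encoding can be verified
    by re-applying slash→dash and comparing."""
    if cwd and encoded == cwd.replace("/", "-"):
        return cwd.rstrip("/").rsplit("/", 1)[-1]
    # Fallback heuristic: last dash-separated segment
    # (may split a hyphenated project name — treat as a guess)
    return encoded.rsplit("-", 1)[-1]
```

When no `cwd` is available, flag the decoded name as a guess rather than trusting the fallback.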
### Project-specific vs. global knowledge
| What you found | Where it goes | Example |
|---|---|---|
| Project architecture decisions | | |
| Project-specific debugging | | |
| General concept the user learned | (global) | |
| Recurring problem across projects | (global) | |
| A tool/service used | (global) | |
| Patterns across many conversations | (global) | |
For each project with content, create or update the project overview page at `projects/<name>/<name>.md` — named after the project, not `_project.md`. Obsidian's graph view uses the filename as the node label, so `_project.md` makes every project show up as `_project` in the graph. Naming it `<name>.md` gives each project a distinct, readable node name.
Important: Distill the knowledge, not the conversation. Don't write "In a conversation on March 15, the user asked about X." Write the knowledge itself, with the conversation as a source attribution.
Write a `summary:` frontmatter field on every new/updated page — 1–2 sentences, ≤200 chars, answering "what is this page about?" for a reader who hasn't opened it. `wiki-query`'s cheap retrieval path reads this field to avoid opening page bodies.
Mark provenance per the convention in `llm-wiki` (Provenance Markers section):
- Memory files are mostly extracted — the user wrote them by hand and they're already distilled. Treat memory-derived claims as extracted unless you're stitching together claims from multiple memory files.
- Conversation distillation is mostly inferred. You're synthesizing a coherent claim from many turns of dialogue, often filling in implicit reasoning. Apply `^[inferred]` liberally to synthesized patterns, generalizations across sessions, and "what the user really meant" interpretations.
- Use `^[ambiguous]` when the user changed their mind across sessions or when assistant and user contradicted each other and the resolution is unclear.
- Write a `provenance:` frontmatter block on every new/updated page summarizing the rough mix.
## Step 6: Update Manifest, Journal, and Special Files
### Update `.manifest.json`

For each source file processed, add/update its entry with:
- `ingested_at`, `modified_at`, `size_bytes`
- `source_type`: one of `"claude_conversation"`, `"claude_memory"`, `"claude_desktop_session"`, `"claude_audit_log"`
- `project`: the decoded project name
- `pages_created` and `pages_updated` lists
Also update the `projects` section of the manifest:
```json
{
  "project-name": {
    "source_path": "~/.claude/projects/-Users-...",
    "vault_path": "projects/project-name",
    "last_ingested": "TIMESTAMP",
    "conversations_ingested": 5,
    "conversations_total": 8,
    "memory_files_ingested": 3,
    "desktop_sessions_ingested": 2,
    "audit_logs_ingested": 2
  }
}
```
### Create journal entry + update special files
Update `index.md` and `log.md` per the standard process:
```
- [TIMESTAMP] CLAUDE_HISTORY_INGEST projects=N conversations=M desktop_sessions=D audit_logs=A pages_updated=X pages_created=Y mode=append|full
```
**hot.md** — Read `$OBSIDIAN_VAULT_PATH/hot.md` (create from the template in `wiki-ingest` if missing). Update Recent Activity with a one-line summary — e.g. "Ingested 5 Claude conversations across 2 projects; surfaced patterns in API design and testing strategy." Keep the last 3 operations. Update Active Threads if any ongoing project is now better understood. Update the `updated` timestamp.
## Privacy
- Distill and synthesize — don't copy raw conversation text verbatim
- Skip anything that looks like secrets, API keys, passwords, tokens
- If you encounter personal/sensitive content, ask the user before including it
- The user's conversations may reference other people — be thoughtful about what goes in the wiki
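A screening pass for obvious credentials might look like the sketch below. The patterns are illustrative, not exhaustive — a match means "keep it out and ask the user," never "everything else is safe."

```python
import re

# Hypothetical screening patterns — extend for your environment.
SECRET_PATTERNS = [
    re.compile(r"(?i)(api[_-]?key|token|password|secret)\s*[=:]\s*\S+"),
    re.compile(r"\bsk-[A-Za-z0-9]{20,}\b"),        # OpenAI-style keys
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),           # AWS access key IDs
    re.compile(r"\bgh[pousr]_[A-Za-z0-9]{36,}\b"), # GitHub tokens
]

def looks_sensitive(text):
    """Return True if a line should be kept out of the wiki."""
    return any(p.search(text) for p in SECRET_PATTERNS)
```

Apply it to command arguments and tool outputs before distilling them into pages.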
## Reference
See `references/claude-data-format.md` for more details on the data structures.