Obsidian-wiki codex-history-ingest
git clone https://github.com/Ar9av/obsidian-wiki
T=$(mktemp -d) && git clone --depth=1 https://github.com/Ar9av/obsidian-wiki "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.skills/codex-history-ingest" ~/.claude/skills/ar9av-obsidian-wiki-codex-history-ingest && rm -rf "$T"
.skills/codex-history-ingest/SKILL.mdCodex History Ingest — Conversation Mining
You are extracting knowledge from the user's past Codex sessions and distilling it into the Obsidian wiki. Session logs are rich but noisy: focus on durable knowledge, not operational telemetry.
This skill can be invoked directly or via the
wiki-history-ingest router (/wiki-history-ingest codex).
Before You Start
- Read
to get.env
andOBSIDIAN_VAULT_PATH
(default toCODEX_HISTORY_PATH
if unset)~/.codex - Read
at the vault root to check what has already been ingested.manifest.json - Read
at the vault root to understand what the wiki already containsindex.md
Ingest Modes
Append Mode (default)
Check
.manifest.json for each source file. Only process:
- Files not in the manifest (new session rollouts, new index files)
- Files whose modification time is newer than
in the manifestingested_at
Use this mode for regular syncs.
Full Mode
Process everything regardless of manifest. Use after
wiki-rebuild or if the user explicitly asks for a full re-ingest.
Codex Data Layout
Codex stores local artifacts under
~/.codex/.
~/.codex/ ├── sessions/ # Session rollout logs by date │ └── YYYY/MM/DD/ │ └── rollout-<timestamp>-<id>.jsonl ├── archived_sessions/ # Archived rollout logs ├── session_index.jsonl # Lightweight index of thread id/name/updated_at ├── history.jsonl # Local transcript history (if persistence enabled) ├── config.toml # User config (contains history settings) └── state_*.sqlite / logs_*.sqlite # Runtime DBs (usually skip)
Key data sources ranked by value
— best inventory source for IDs, titles, and freshnesssession_index.jsonl
— rich structured transcript eventssessions/**/rollout-*.jsonl
— useful fallback/timeline aid if enabledhistory.jsonl
Avoid ingesting SQLite internals unless the user explicitly asks.
Step 1: Survey and Compute Delta
Scan
CODEX_HISTORY_PATH and compare against .manifest.json:
~/.codex/session_index.jsonl~/.codex/sessions/**/rollout-*.jsonl
(optional; only if user asks for archived history)~/.codex/archived_sessions/**
(optional fallback)~/.codex/history.jsonl
Classify each file:
- New — not in manifest
- Modified — in manifest but file is newer than
ingested_at - Unchanged — already ingested and unchanged
Report a concise delta summary before deep parsing.
Step 2: Parse Session Index First
session_index.jsonl typically has entries like:
{"id":"...","thread_name":"...","updated_at":"..."}
Use it to:
- Build a canonical session inventory
- Prioritize recent/high-signal sessions
- Map rollout IDs to human-readable thread names
Step 3: Parse Rollout JSONL Safely
Each
rollout-*.jsonl line is an event envelope with:
{ "timestamp": "...", "type": "session_meta|turn_context|event_msg|response_item", "payload": { ... } }
Extraction rules
- Prioritize user intent and assistant-visible outputs
- Favor
records with user/assistant message contentresponse_item - Use
selectively for meaningful milestones; ignore pure telemetryevent_msg - Treat
as metadata (cwd, model, ids), not user knowledgesession_meta
Skip/noise filters
- Token accounting events
- Tool plumbing with no semantic content
- Raw command output unless it contains reusable decisions/patterns
- Repeated plan snapshots unless they add novel decisions
Critical privacy filter
Rollout logs can include injected instructions, tool payloads, and sensitive text. Do not ingest verbatim system/developer prompts or secrets.
- Remove API keys, tokens, passwords, credentials
- Redact private identifiers unless relevant and approved
- Summarize instead of quoting raw transcripts
Step 4: Cluster by Topic
Do not create one wiki page per session.
- Group by stable topics across many sessions
- Split mixed sessions into separate themes
- Merge recurring concepts across dates/projects
- Use
from metadata to infer project scopecwd
Step 5: Distill into Wiki Pages
Route extracted knowledge using existing wiki conventions:
- Project-specific architecture/process ->
projects/<name>/... - General concepts ->
concepts/ - Recurring techniques/debug playbooks ->
skills/ - Tools/services ->
entities/ - Cross-session patterns ->
synthesis/
For each impacted project, create/update
projects/<name>/<name>.md (project name as filename, never _project.md).
Writing rules
- Distill knowledge, not chronology
- Avoid "on date X we discussed..." unless date context is essential
- Add
frontmatter on each new/updated page (1-2 sentences, <= 200 chars)summary: - Add provenance markers:
when directly grounded in explicit session content^[extracted]
when synthesizing patterns across events/sessions^[inferred]
when sessions conflict^[ambiguous]
- Add/update
frontmatter mix for each changed pageprovenance:
Step 6: Update Manifest, Log, and Index
Update .manifest.json
.manifest.jsonFor each processed source file:
,ingested_at
,size_bytesmodified_at
:source_type
|codex_rollout
|codex_indexcodex_history
: inferred project name (when applicable)project
,pages_createdpages_updated
Add/update a top-level project/session summary block:
{ "project-name": { "source_path": "~/.codex/sessions/...", "last_ingested": "TIMESTAMP", "sessions_ingested": 12, "sessions_total": 40, "index_updated_at": "TIMESTAMP" } }
Update special files
Update
index.md and log.md:
- [TIMESTAMP] CODEX_HISTORY_INGEST sessions=N pages_updated=X pages_created=Y mode=append|full
— Read hot.md
$OBSIDIAN_VAULT_PATH/hot.md (create from the template in wiki-ingest if missing). Update Recent Activity with a one-line summary — e.g. "Ingested 12 Codex sessions; surfaced recurring patterns in CLI tooling and shell scripting." Keep the last 3 operations. Update updated timestamp.
Privacy and Compliance
- Distill and synthesize; avoid raw transcript dumps
- Default to redaction for anything that looks sensitive
- Ask the user before storing personal/sensitive details
- Keep references to other people minimal and purpose-bound
Reference
See
references/codex-data-format.md for field-level parsing notes and extraction guidance.