Obsidian-wiki codex-history-ingest

install

source · Clone the upstream repo

git clone https://github.com/Ar9av/obsidian-wiki

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/Ar9av/obsidian-wiki "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.skills/codex-history-ingest" ~/.claude/skills/ar9av-obsidian-wiki-codex-history-ingest && rm -rf "$T"

manifest: .skills/codex-history-ingest/SKILL.md

source content

Codex History Ingest — Conversation Mining

You are extracting knowledge from the user's past Codex sessions and distilling it into the Obsidian wiki. Session logs are rich but noisy: focus on durable knowledge, not operational telemetry.

This skill can be invoked directly or via the

wiki-history-ingest

router (

/wiki-history-ingest codex

Before You Start

Read

.env

to get

OBSIDIAN_VAULT_PATH

and

CODEX_HISTORY_PATH

(default to

~/.codex

if unset)

Read
```
.manifest.json
```
at the vault root to check what has already been ingested
Read
```
index.md
```
at the vault root to understand what the wiki already contains

Ingest Modes

Append Mode (default)

Check

.manifest.json

for each source file. Only process:

Files not in the manifest (new session rollouts, new index files)
Files whose modification time is newer than
```
ingested_at
```
in the manifest

Use this mode for regular syncs.

Full Mode

Process everything regardless of manifest. Use after

wiki-rebuild

or if the user explicitly asks for a full re-ingest.

Codex Data Layout

Codex stores local artifacts under

~/.codex/

~/.codex/
├── sessions/                          # Session rollout logs by date
│   └── YYYY/MM/DD/
│       └── rollout-<timestamp>-<id>.jsonl
├── archived_sessions/                 # Archived rollout logs
├── session_index.jsonl                # Lightweight index of thread id/name/updated_at
├── history.jsonl                      # Local transcript history (if persistence enabled)
├── config.toml                        # User config (contains history settings)
└── state_*.sqlite / logs_*.sqlite     # Runtime DBs (usually skip)

Key data sources ranked by value

```
session_index.jsonl
```
— best inventory source for IDs, titles, and freshness
```
sessions/**/rollout-*.jsonl
```
— rich structured transcript events
```
history.jsonl
```
— useful fallback/timeline aid if enabled

Avoid ingesting SQLite internals unless the user explicitly asks.

Step 1: Survey and Compute Delta

Scan

CODEX_HISTORY_PATH

and compare against

.manifest.json

```
~/.codex/session_index.jsonl
```
```
~/.codex/sessions/**/rollout-*.jsonl
```
```
~/.codex/archived_sessions/**
```
(optional; only if user asks for archived history)
```
~/.codex/history.jsonl
```
(optional fallback)

Classify each file:

New — not in manifest
Modified — in manifest but file is newer than
```
ingested_at
```
Unchanged — already ingested and unchanged

Report a concise delta summary before deep parsing.

Step 2: Parse Session Index First

session_index.jsonl

typically has entries like:

{"id":"...","thread_name":"...","updated_at":"..."}

Use it to:

Build a canonical session inventory
Prioritize recent/high-signal sessions
Map rollout IDs to human-readable thread names

Step 3: Parse Rollout JSONL Safely

Each

rollout-*.jsonl

line is an event envelope with:

{
  "timestamp": "...",
  "type": "session_meta|turn_context|event_msg|response_item",
  "payload": { ... }
}

Extraction rules

Prioritize user intent and assistant-visible outputs
Favor
```
response_item
```
records with user/assistant message content
Use
```
event_msg
```
selectively for meaningful milestones; ignore pure telemetry
Treat
```
session_meta
```
as metadata (cwd, model, ids), not user knowledge

Skip/noise filters

Token accounting events
Tool plumbing with no semantic content
Raw command output unless it contains reusable decisions/patterns
Repeated plan snapshots unless they add novel decisions

Critical privacy filter

Rollout logs can include injected instructions, tool payloads, and sensitive text. Do not ingest verbatim system/developer prompts or secrets.

Remove API keys, tokens, passwords, credentials
Redact private identifiers unless relevant and approved
Summarize instead of quoting raw transcripts

Step 4: Cluster by Topic

Do not create one wiki page per session.

Group by stable topics across many sessions
Split mixed sessions into separate themes
Merge recurring concepts across dates/projects
Use
```
cwd
```
from metadata to infer project scope

Step 5: Distill into Wiki Pages

Route extracted knowledge using existing wiki conventions:

Project-specific architecture/process ->
```
projects/<name>/...
```
General concepts ->
```
concepts/
```
Recurring techniques/debug playbooks ->
```
skills/
```
Tools/services ->
```
entities/
```
Cross-session patterns ->
```
synthesis/
```

For each impacted project, create/update

projects/<name>/<name>.md

(project name as filename, never

_project.md

Writing rules

Distill knowledge, not chronology
Avoid "on date X we discussed..." unless date context is essential
Add
```
summary:
```
frontmatter on each new/updated page (1-2 sentences, <= 200 chars)
Add provenance markers:
- ```
^[extracted]
```
  when directly grounded in explicit session content
- ```
^[inferred]
```
  when synthesizing patterns across events/sessions
- ```
^[ambiguous]
```
  when sessions conflict
Add/update
```
provenance:
```
frontmatter mix for each changed page

Step 6: Update Manifest, Log, and Index

Update

.manifest.json

For each processed source file:

```
ingested_at
```
,
```
size_bytes
```
,
```
modified_at
```

source_type

codex_rollout

codex_index

codex_history

```
project
```
: inferred project name (when applicable)
```
pages_created
```
,
```
pages_updated
```

Add/update a top-level project/session summary block:

{
  "project-name": {
    "source_path": "~/.codex/sessions/...",
    "last_ingested": "TIMESTAMP",
    "sessions_ingested": 12,
    "sessions_total": 40,
    "index_updated_at": "TIMESTAMP"
  }
}

Update special files

Update

index.md

and

log.md

- [TIMESTAMP] CODEX_HISTORY_INGEST sessions=N pages_updated=X pages_created=Y mode=append|full

hot.md

— Read

$OBSIDIAN_VAULT_PATH/hot.md

(create from the template in

wiki-ingest

if missing). Update Recent Activity with a one-line summary — e.g. "Ingested 12 Codex sessions; surfaced recurring patterns in CLI tooling and shell scripting." Keep the last 3 operations. Update

updated

timestamp.

Privacy and Compliance

Distill and synthesize; avoid raw transcript dumps
Default to redaction for anything that looks sensitive
Ask the user before storing personal/sensitive details
Keep references to other people minimal and purpose-bound

Reference

See

references/codex-data-format.md

for field-level parsing notes and extraction guidance.