install
source · Clone the upstream repo
git clone https://github.com/openai/symphony
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/openai/symphony "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.codex/skills/debug" ~/.claude/skills/openai-symphony-debug && rm -rf "$T"
manifest: .codex/skills/debug/SKILL.md
Debug
Goals
- Find why a run is stuck, retrying, or failing.
- Correlate Linear issue identity to a Codex session quickly.
- Read the right logs in the right order to isolate root cause.
Log Sources
- Primary runtime log: log/symphony.log
  - Default comes from SymphonyElixir.LogFile (log/symphony.log).
  - Includes orchestrator, agent runner, and Codex app-server lifecycle logs.
- Rotated runtime logs: log/symphony.log*
  - Check these when the relevant run is older.
Correlation Keys
- issue_identifier: human ticket key (example: MT-625)
- issue_id: Linear UUID (stable internal ID)
- session_id: Codex thread-turn pair (<thread_id>-<turn_id>)
elixir/docs/logging.md requires these fields for issue/session lifecycle logs. Use
them as your join keys during debugging.
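A minimal sketch of pulling the three join keys out of a single log line. It assumes the fields are written as key=value with values ending at whitespace or a semicolon, which is the shape the rg patterns in this document rely on. The sample line and the helper name correlation_keys are made up for illustration.

```python
import re

# Assumption: fields look like key=value and a value ends at whitespace or ";".
FIELD_RE = re.compile(r"\b(issue_identifier|issue_id|session_id)=([^\s;]+)")


def correlation_keys(line: str) -> dict[str, str]:
    """Return the correlation fields found in one log line."""
    return {key: value for key, value in FIELD_RE.findall(line)}


# Hypothetical log line, not copied from a real Symphony log.
sample = (
    "12:00:01 [info] Codex session started issue_identifier=MT-625 "
    "issue_id=abc-123 session_id=thread1-turn1"
)
print(correlation_keys(sample))
```

Lines that carry none of the fields return an empty dict, so the helper can be mapped over a whole log slice.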
Quick Triage (Stuck Run)
- Confirm scheduler/worker symptoms for the ticket.
- Find recent lines for the ticket (issue_identifier first).
- Extract session_id from matching lines.
- Trace that session_id across start, stream, completion/failure, and stall handling logs.
- Decide class of failure: timeout/stall, app-server startup failure, turn failure, or orchestrator retry loop.
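The "find the ticket, then extract session_id" steps can be sketched as a small filter. This assumes issue_identifier and session_id appear on the same line as key=value fields. The function name sessions_for_ticket is hypothetical.

```python
import re

# Assumption: a session_id value ends at whitespace or ";".
SESSION_RE = re.compile(r"\bsession_id=([^\s;]+)")


def sessions_for_ticket(lines, issue_identifier):
    """Return session IDs seen on the ticket's lines, in first-seen order."""
    # Anchor the end of the key so MT-62 does not also match MT-625.
    ticket_re = re.compile(
        rf"\bissue_identifier={re.escape(issue_identifier)}(?=[\s;]|$)"
    )
    seen = {}
    for line in lines:
        if ticket_re.search(line):
            for session_id in SESSION_RE.findall(line):
                seen.setdefault(session_id, None)
    return list(seen)
```

Passing an open file object as lines works, since the function only iterates over it once.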
Commands
# 1) Narrow by ticket key (fastest entry point)
rg -n "issue_identifier=MT-625" log/symphony.log*

# 2) If needed, narrow by Linear UUID
rg -n "issue_id=<linear-uuid>" log/symphony.log*

# 3) Pull session IDs seen for that ticket
rg -o "session_id=[^ ;]+" log/symphony.log* | sort -u

# 4) Trace one session end-to-end
rg -n "session_id=<thread>-<turn>" log/symphony.log*

# 5) Focus on stuck/retry signals
rg -n "Issue stalled|scheduling retry|turn_timeout|turn_failed|Codex session failed|Codex session ended with error" log/symphony.log*
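When the stuck/retry search returns too many lines to read, a per-signal tally across the current and rotated logs can show which signal dominates. The sketch below reuses the signal strings from command 5. The function name count_signals and the choice to tally rather than print matches are illustrative assumptions.

```python
import glob
import re
from collections import Counter

# Signal strings taken from command 5 above.
SIGNALS = [
    "Issue stalled",
    "scheduling retry",
    "turn_timeout",
    "turn_failed",
    "Codex session failed",
    "Codex session ended with error",
]
SIGNAL_RE = re.compile("|".join(re.escape(signal) for signal in SIGNALS))


def count_signals(pattern="log/symphony.log*"):
    """Count occurrences of each stuck/retry signal across matching log files."""
    counts = Counter()
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8", errors="replace") as handle:
            for line in handle:
                counts.update(SIGNAL_RE.findall(line))
    return counts
```

Calling count_signals() with no argument scans log/symphony.log* relative to the current directory, and most_common() on the result lists the loudest signal first.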
Investigation Flow
- Locate the ticket slice:
  - Search by issue_identifier=<KEY>.
  - If noise is high, add issue_id=<UUID>.
- Establish timeline:
  - Identify first Codex session started ... session_id=....
  - Follow with Codex session completed, ended with error, or worker exit lines.
- Classify the problem:
  - Stall loop: Issue stalled ... restarting with backoff.
  - App-server startup: Codex session failed ....
  - Turn execution failure: turn_failed, turn_cancelled, turn_timeout, or ended with error.
  - Worker crash: Agent task exited ... reason=....
- Validate scope:
  - Check whether failures are isolated to one issue/session or repeating across multiple tickets.
- Capture evidence:
  - Save key log lines with timestamps, issue_identifier, issue_id, and session_id.
  - Record probable root cause and the exact failing stage.
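The "classify the problem" step amounts to a marker lookup, sketched below. The marker strings come from the list above. The class labels, rule order, and function name classify are illustrative, and a plain substring test is assumed to be enough.

```python
# Each rule pairs a failure class with the log markers listed for it above.
RULES = [
    ("stall loop", ("Issue stalled",)),
    ("app-server startup", ("Codex session failed",)),
    (
        "turn execution failure",
        ("turn_failed", "turn_cancelled", "turn_timeout", "ended with error"),
    ),
    ("worker crash", ("Agent task exited",)),
]


def classify(line):
    """Return the failure class for one log line, or None if no marker matches."""
    for label, markers in RULES:
        if any(marker in line for marker in markers):
            return label
    return None
```

Running this over a ticket's slice and counting labels per ticket also supports the "validate scope" step, since a class that repeats across many tickets points away from a single bad issue.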
Reading Codex Session Logs
In Symphony, Codex session diagnostics are emitted into
log/symphony.log and
keyed by session_id. Read them as a lifecycle:
- Codex session started ... session_id=...
- Session stream/lifecycle events for the same session_id
- Terminal event:
  - Codex session completed ..., or
  - Codex session ended with error ..., or
  - Issue stalled ... restarting with backoff
For one specific session investigation, keep the trace narrow:
- Capture one session_id for the ticket.
- Build a timestamped slice for only that session:
rg -n "session_id=<thread>-<turn>" log/symphony.log*
- Mark the exact failing stage:
  - Startup failure before stream events (Codex session failed ...).
  - Turn/runtime failure after stream events (turn_* / ended with error).
  - Stall recovery (Issue stalled ... restarting with backoff).
- Pair findings with issue_identifier and issue_id from nearby lines to confirm you are not mixing concurrent retries.
Always pair session findings with
issue_identifier/issue_id to avoid mixing
concurrent runs.
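A sketch of reading one session as a lifecycle: match the session_id exactly, note whether a start line was seen, and report the first terminal marker. It assumes every relevant line, including the stall line, carries session_id as a key=value field. The function name session_lifecycle is hypothetical.

```python
import re

# Assumption: a session_id value ends at whitespace or ";".
SESSION_RE = re.compile(r"\bsession_id=([^\s;]+)")

# Terminal markers as listed in the lifecycle above.
TERMINAL_MARKERS = (
    "Codex session completed",
    "Codex session ended with error",
    "Issue stalled",
)


def session_lifecycle(lines, session_id):
    """Return (started, terminal) for exactly one session.

    started is True if a start line was seen; terminal is the first
    terminal marker found for the session, or None if it never ended.
    """
    started = False
    terminal = None
    for line in lines:
        match = SESSION_RE.search(line)
        # Exact comparison keeps a-1 separate from a-10.
        if match is None or match.group(1) != session_id:
            continue
        if "Codex session started" in line:
            started = True
        if terminal is None:
            terminal = next((m for m in TERMINAL_MARKERS if m in line), None)
    return started, terminal
```

A result of (True, None) means the session started but no terminal event was logged, which is worth checking against the rotated logs before treating it as a hang.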
Notes
- Prefer rg over grep for speed on large logs.
- Check rotated logs (log/symphony.log*) before concluding data is missing.
- If required context fields are missing in new log statements, align with elixir/docs/logging.md conventions.
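To spot log statements that lack the required context fields, a check like the following can be run over lifecycle lines. The required set is the three correlation keys named in this document. Deciding which lines count as lifecycle lines is left to the caller, and the function name missing_fields is hypothetical.

```python
import re

# The three correlation keys this document treats as required.
REQUIRED_FIELDS = ("issue_identifier", "issue_id", "session_id")


def missing_fields(line):
    """Return the required fields that do not appear as key=value in the line."""
    return [
        field
        for field in REQUIRED_FIELDS
        if re.search(rf"\b{field}=[^\s;]", line) is None
    ]
```

An empty list means the line can be joined on all three keys.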