Symphony debug

Debug

install

source · Clone the upstream repo

```
git clone https://github.com/openai/symphony
```

Claude Code · Install into ~/.claude/skills/

```
T=$(mktemp -d) \
  && git clone --depth=1 https://github.com/openai/symphony "$T" \
  && mkdir -p ~/.claude/skills \
  && cp -r "$T/.codex/skills/debug" ~/.claude/skills/openai-symphony-debug \
  && rm -rf "$T"
```

manifest: .codex/skills/debug/SKILL.md


Debug

Goals

- Find why a run is stuck, retrying, or failing.
- Correlate Linear issue identity to a Codex session quickly.
- Read the right logs in the right order to isolate root cause.

Log Sources

Primary runtime log: `log/symphony.log`
- The default (`log/symphony.log`) comes from `SymphonyElixir.LogFile`.
- Includes orchestrator, agent runner, and Codex app-server lifecycle logs.

Rotated runtime logs: `log/symphony.log*`
- Check these when the relevant run is older.
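Before searching, it can help to confirm which of these files actually exist. The following is a minimal shell function for bash or dash, written under two assumptions: the base path is the default `log/symphony.log`, and rotated files share that prefix, as the `log/symphony.log*` glob implies. The name `list_symphony_logs` is hypothetical.

```shell
# Hypothetical helper: print the current log and any rotated siblings that exist.
# Assumes the default base path log/symphony.log; pass another base as $1
# if your SymphonyElixir.LogFile setting points elsewhere.
list_symphony_logs() {
  local base="${1:-log/symphony.log}" found=1 f
  for f in "$base"*; do
    # With no match the glob stays literal, so skip names that do not exist.
    [ -e "$f" ] || continue
    printf '%s\n' "$f"
    found=0
  done
  if [ "$found" -ne 0 ]; then
    echo "no logs matching ${base}*" >&2
    return 1
  fi
}
```

A non-zero exit here usually means the working directory or the `SymphonyElixir.LogFile` path is wrong. It does not mean the run is missing.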

Correlation Keys

- `issue_identifier`: human ticket key (example: `MT-625`)
- `issue_id`: Linear UUID (stable internal ID)
- `session_id`: Codex thread-turn pair (`<thread_id>-<turn_id>`)

`elixir/docs/logging.md` requires these fields for issue/session lifecycle logs. Use them as your join keys during debugging.
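You can pull the join keys out of any single log line as `key=value` pairs. The sketch below assumes the keys are logged as `key=value` tokens separated by spaces or semicolons, which is the shape the `session_id=[^ ;]+` search pattern in Commands implies. `extract_keys` is a hypothetical helper name.

```shell
# Hypothetical helper: print the correlation keys present on one log line.
# Assumes key=value tokens separated by spaces or semicolons.
extract_keys() {
  local line="$1" key
  for key in issue_identifier issue_id session_id; do
    # (^|[ ;]) anchors each key at a token boundary; sed drops that separator.
    printf '%s\n' "$line" | grep -oE "(^|[ ;])${key}=[^ ;]+" | sed -E 's/^[ ;]//'
  done
}
```

For example, `extract_keys "$(rg -m1 --no-filename "issue_identifier=MT-625" log/symphony.log)"` shows which keys the first matching line carries.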

Quick Triage (Stuck Run)

1. Confirm the scheduler/worker symptoms for the ticket.
2. Find recent lines for the ticket (search `issue_identifier` first).
3. Extract `session_id` from the matching lines.
4. Trace that `session_id` across start, stream, completion/failure, and stall-handling logs.
5. Decide the failure class: timeout/stall, app-server startup failure, turn failure, or orchestrator retry loop.

Commands

```
# 1) Narrow by ticket key (fastest entry point)
rg -n "issue_identifier=MT-625" log/symphony.log*

# 2) If needed, narrow by Linear UUID
rg -n "issue_id=<linear-uuid>" log/symphony.log*

# 3) Pull session IDs seen for that ticket (filter by ticket first;
#    --no-filename lets sort -u deduplicate across rotated files)
rg --no-filename "issue_identifier=MT-625" log/symphony.log* | rg -o "session_id=[^ ;]+" | sort -u

# 4) Trace one session end-to-end
rg -n "session_id=<thread>-<turn>" log/symphony.log*

# 5) Focus on stuck/retry signals
rg -n "Issue stalled|scheduling retry|turn_timeout|turn_failed|turn_cancelled|Codex session failed|Codex session ended with error" log/symphony.log*
```
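Commands 1, 3 and 5 can be combined into one function per ticket. This sketch is not part of Symphony, and it rests on a few assumptions:
- `search` and `triage_ticket` are hypothetical names.
- Stall and retry lines are assumed to carry `issue_identifier`.
- The function falls back to `grep -E` only where `rg` is not installed.

The trailing `([ ;]|$)` boundary stops `MT-625` from also matching `MT-6250`. The plain pattern in command 1 would match both.

```shell
# Prefer ripgrep (see Notes); fall back to grep -E where rg is not installed.
search() {
  if command -v rg >/dev/null 2>&1; then
    rg --no-filename "$@"
  else
    grep -hE "$@"
  fi
}

# Hypothetical helper: session IDs and stuck/retry signals for one ticket key.
# Usage: triage_ticket MT-625 log/symphony.log*
triage_ticket() {
  local key="$1"
  shift
  # Require a token boundary after the key so MT-625 does not match MT-6250.
  local ticket="issue_identifier=${key}([ ;]|\$)"
  local signals='Issue stalled|scheduling retry|turn_timeout|turn_failed|turn_cancelled|Codex session failed|Codex session ended with error'

  echo "== session IDs for ${key} =="
  search -e "$ticket" -- "$@" | search -o -e 'session_id=[^ ;]+' | sort -u

  echo "== stuck/retry signals for ${key} =="
  search -e "$ticket" -- "$@" | search -e "$signals"
}
```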

Investigation Flow

1. Locate the ticket slice:
   - Search by `issue_identifier=<KEY>`.
   - If results are noisy, narrow further with `issue_id=<UUID>`.
2. Establish the timeline:
   - Identify the first `Codex session started ... session_id=...` line.
   - Follow it with `Codex session completed`, `ended with error`, or worker exit lines.
3. Classify the problem:
   - Stall loop: `Issue stalled ... restarting with backoff`.
   - App-server startup: `Codex session failed ...`.
   - Turn execution failure: `turn_failed`, `turn_cancelled`, `turn_timeout`, or `ended with error`.
   - Worker crash: `Agent task exited ... reason=...`.
4. Validate scope:
   - Check whether failures are isolated to one issue/session or repeat across multiple tickets.
5. Capture evidence:
   - Save key log lines with timestamps, `issue_identifier`, `issue_id`, and `session_id`.
   - Record the probable root cause and the exact failing stage.
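The classification step maps message substrings to failure classes, and a small `awk` filter can apply it to a whole slice at once. This sketch assumes the substrings quoted above match the wording in your Symphony version. `classify_lines` and the class labels are hypothetical. The rules are checked in order, so each line gets the first class it matches.

```shell
# Hypothetical helper: tag each line of a slice (read from stdin) with a
# failure class; lines matching none of the substrings are dropped.
classify_lines() {
  awk '
    /Issue stalled/ && /restarting with backoff/ { print "stall_loop: " $0; next }
    /Codex session failed/ { print "app_server_startup: " $0; next }
    /turn_failed|turn_cancelled|turn_timeout|ended with error/ { print "turn_failure: " $0; next }
    /Agent task exited/ { print "worker_crash: " $0; next }
  '
}
```

To count lines per class for one ticket, run `rg --no-filename "issue_identifier=MT-625" log/symphony.log* | classify_lines | cut -d: -f1 | sort | uniq -c`.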

Reading Codex Session Logs

In Symphony, Codex session diagnostics are emitted into `log/symphony.log` and keyed by `session_id`. Read them as a lifecycle:

1. `Codex session started ... session_id=...`
2. Session stream/lifecycle events for the same `session_id`.
3. Terminal event: `Codex session completed ...`, `Codex session ended with error ...`, or `Issue stalled ... restarting with backoff`.
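The lifecycle above can be condensed into a three-line summary per session. This sketch makes two assumptions:
- The session slice arrives on stdin in log order. If it spans rotated files, put the lines in timestamp order first.
- Every line other than the first start and the terminal events counts as a stream event. This is a simplification.

`summarize_session` is a hypothetical name.

```shell
# Hypothetical helper: summarize one session slice (stdin) as
# start line / number of intermediate events / last terminal event.
summarize_session() {
  awk '
    /Codex session started/ && !started { print "start:    " $0; started = 1; next }
    /Codex session completed|Codex session ended with error|Issue stalled/ { terminal = $0; next }
    { events++ }
    END {
      printf "events:   %d\n", events
      if (terminal != "") print "terminal: " terminal
      else print "terminal: none found (still running, or in another log file)"
    }
  '
}
```

Typical use: `rg --no-filename "session_id=<thread>-<turn>" log/symphony.log* | summarize_session`.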

When investigating one specific session, keep the trace narrow:

1. Capture one `session_id` for the ticket.
2. Build a timestamped slice for only that session: `rg -n "session_id=<thread>-<turn>" log/symphony.log*`
3. Mark the exact failing stage:
   - Startup failure before stream events (`Codex session failed ...`).
   - Turn/runtime failure after stream events (`turn_*` / `ended with error`).
   - Stall recovery (`Issue stalled ... restarting with backoff`).
4. Pair findings with `issue_identifier` and `issue_id` from nearby lines to confirm you are not mixing concurrent retries.
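To check for mixed concurrent retries, group a ticket slice by `session_id` and list the `issue_id` values seen with each. This sketch uses the same `key=value` token assumption as above. `sessions_by_issue` is a hypothetical name. Unlike the manual check, it only pairs keys that appear on the same line. How to read the output:
- A session listed with two `issue_id` values means the slice mixes runs.
- Several sessions sharing one `issue_id` usually indicate successive retries of the same ticket.

```shell
# Hypothetical helper: for each session_id on stdin, print its line count and
# the distinct issue_id values seen on those same lines, sorted by session.
sessions_by_issue() {
  awk '
    {
      sid = ""; iid = ""
      n = split($0, tok, "[ ;]+")
      for (i = 1; i <= n; i++) {
        if (tok[i] ~ /^session_id=/) sid = substr(tok[i], 12)    # strip "session_id="
        else if (tok[i] ~ /^issue_id=/) iid = substr(tok[i], 10) # strip "issue_id="
      }
      if (sid == "") next
      lines[sid]++
      if (iid != "" && !((sid, iid) in seen)) {
        seen[sid, iid] = 1
        ids[sid] = ids[sid] " " iid
      }
    }
    END { for (s in lines) printf "%s lines=%d issue_id:%s\n", s, lines[s], ids[s] }
  ' | sort
}
```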

Always pair session findings with `issue_identifier` and `issue_id` to avoid mixing concurrent runs.

Notes

- Prefer `rg` over `grep` for speed on large logs.
- Check rotated logs (`log/symphony.log*`) before concluding data is missing.
- If new log statements lack the required context fields, add them following the `elixir/docs/logging.md` conventions.