Continuous-Claude-v3 braintrust-tracing

Braintrust tracing for Claude Code - hook architecture, sub-agent correlation, debugging

install
source · Clone the upstream repo
git clone https://github.com/parcadei/Continuous-Claude-v3
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/parcadei/Continuous-Claude-v3 "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.claude/skills/braintrust-tracing" ~/.claude/skills/parcadei-continuous-claude-v3-braintrust-tracing && rm -rf "$T"
manifest: .claude/skills/braintrust-tracing/SKILL.md
source content

Braintrust Tracing for Claude Code

Comprehensive guide to tracing Claude Code sessions in Braintrust, including sub-agent correlation.

Architecture Overview

                         PARENT SESSION
                    +---------------------+
                    |  SessionStart       |
                    |  (creates root)     |
                    +----------+----------+
                               |
                    +----------v----------+
                    |  UserPromptSubmit   |
                    |  (creates Turn)     |
                    +----------+----------+
                               |
          +--------------------+--------------------+
          |                    |                    |
+---------v--------+  +--------v--------+  +--------v--------+
| PostToolUse      |  | PostToolUse     |  | PreToolUse      |
| (Read span)      |  | (Edit span)     |  | (Task - inject) |
+------------------+  +-----------------+  +--------+--------+
                                                    |
                                         +----------v----------+
                                         |   SUB-AGENT         |
                                         |   SessionStart      |
                                         |   (NEW root_span_id)|
                                         +----------+----------+
                                                    |
                                         +----------v----------+
                                         |   SubagentStop      |
                                         |   (has session_id)  |
                                         +---------------------+

Hook Event Flow

HookTriggerCreatesKey Fields
SessionStartSession beginsRoot span
session_id
,
root_span_id
UserPromptSubmitUser sends promptTurn span
prompt
,
turn_number
PreToolUseBefore tool runs(modifies Task prompts)
tool_input.prompt
PostToolUseAfter tool runsTool span
tool_name
,
input
,
output
StopTurn completesLLM spans
model
,
tokens
,
tool_calls
SubagentStopSub-agent finishes(no span)
session_id
of sub-agent
SessionEndSession ends(finalizes root)
turn_count
,
tool_count

Trace Hierarchy

Session (task span) - root_span_id = session_id
|
+-- Turn 1 (task span)
|   |
|   +-- claude-sonnet (llm span) - model call with tool_use
|   +-- Read (tool span)
|   +-- Edit (tool span)
|   +-- claude-sonnet (llm span) - response after tools
|
+-- Turn 2 (task span)
|   |
|   +-- claude-sonnet (llm span)
|   +-- Task (tool span) -----> [Sub-agent session - SEPARATE trace]
|   +-- claude-sonnet (llm span)
|
+-- Turn 3 ...

Sub-Agent Tracing: What Works and What Doesn't

What Doesn't Work

SessionStart doesn't receive the Task prompt.

We tried injecting trace context into Task prompts via PreToolUse:

# PreToolUse hook injects:
[BRAINTRUST_TRACE_CONTEXT]
{"root_span_id": "abc", "parent_span_id": "xyz", "project_id": "123"}
[/BRAINTRUST_TRACE_CONTEXT]

But SessionStart only receives session metadata, not the modified prompt. The injected context is lost.

What DOES Work

Task spans in parent session contain everything:

  • agentId
    - identifier for the sub-agent run
  • totalTokens
    ,
    totalToolUseCount
    - metrics
  • content
    - full agent response/summary
  • tool_input.prompt
    - original task prompt
  • tool_input.subagent_type
    - agent type (e.g., "oracle")

SubagentStop hook receives the sub-agent's

session_id
:

  • This equals the sub-agent's orphaned trace
    root_span_id
  • Allows correlation between parent Task span and child trace

The Correlation Pattern

Current state: Sub-agents create orphaned traces (new

root_span_id
).

Correlation method:

  1. Query parent session's Task spans for agent metadata
  2. Match
    agentId
    or timing with orphaned traces
  3. Sub-agent's
    session_id
    = its trace's
    root_span_id

Future solution (not yet implemented):

SubagentStop fires -> writes session_id to temp file
PostToolUse (Task) -> reads temp file -> adds child_session_id to Task span metadata

This would link:

Task.agentId
+
Task.child_session_id
-> orphaned trace
root_span_id

State Management

Per-Session State Files

~/.claude/state/braintrust_sessions/
  {session_id}.json       # Per-session state

Each session file contains:

{
  "root_span_id": "abc-123",
  "project_id": "proj-456",
  "turn_count": 5,
  "tool_count": 23,
  "current_turn_span_id": "turn-789",
  "current_turn_start": 1703456789,
  "started": "2025-12-24T10:00:00.000Z",
  "is_subagent": false
}

Global State

~/.claude/state/braintrust_global.json   # Cached project_id
~/.claude/state/braintrust_hook.log      # Debug log

Debugging Commands

Check if Tracing is Active

# View hook logs in real-time
tail -f ~/.claude/state/braintrust_hook.log

# Check if session has state
cat ~/.claude/state/braintrust_sessions/*.json | jq -s '.'

# Verify environment
echo "TRACE_TO_BRAINTRUST=$TRACE_TO_BRAINTRUST"
echo "BRAINTRUST_API_KEY=${BRAINTRUST_API_KEY:+set}"

Query Braintrust Directly

# List recent sessions
uv run python -m runtime.harness scripts/braintrust_analyze.py --sessions 5

# Analyze last session
uv run python -m runtime.harness scripts/braintrust_analyze.py --last-session

# Replay specific session
uv run python -m runtime.harness scripts/braintrust_analyze.py --replay <session-id>

# Find sub-agent traces (orphaned roots)
uv run python -m runtime.harness scripts/braintrust_analyze.py --agent-stats

Debug Hook Execution

# Enable verbose logging
export BRAINTRUST_CC_DEBUG=true

# Test hooks manually
echo '{"session_id":"test-123","type":"resume"}' | \
  bash "$CLAUDE_PROJECT_DIR/.claude/plugins/braintrust-tracing/hooks/session_start.sh"

# Test PreToolUse (Task injection)
echo '{"session_id":"test-123","tool_name":"Task","tool_input":{"prompt":"test"}}' | \
  bash "$CLAUDE_PROJECT_DIR/.claude/plugins/braintrust-tracing/hooks/pre_tool_use.sh"

Troubleshooting Checklist

  1. No traces appearing:

    • Check
      TRACE_TO_BRAINTRUST=true
      in
      .claude/settings.local.json
    • Verify API key:
      echo $BRAINTRUST_API_KEY
    • Check logs:
      tail -20 ~/.claude/state/braintrust_hook.log
  2. Sub-agents not linking:

    • This is expected - sub-agents create orphaned traces
    • Use
      --agent-stats
      to find agent activity
    • Correlate via timing or
      agentId
      in parent Task span
  3. Missing spans:

    • Check
      current_turn_span_id
      in session state
    • Ensure Stop hook runs (turn finalization)
    • Look for "Failed to create" errors in log
  4. State corruption:

    • Remove session state:
      rm ~/.claude/state/braintrust_sessions/*.json
    • Clear global cache:
      rm ~/.claude/state/braintrust_global.json

Key Files

FilePurpose
.claude/plugins/braintrust-tracing/hooks/common.sh
Shared utilities, API, state management
.claude/plugins/braintrust-tracing/hooks/session_start.sh
Creates root span, handles sub-agent context
.claude/plugins/braintrust-tracing/hooks/user_prompt_submit.sh
Creates Turn spans per user message
.claude/plugins/braintrust-tracing/hooks/pre_tool_use.sh
Injects trace context into Task prompts
.claude/plugins/braintrust-tracing/hooks/post_tool_use.sh
Creates tool spans, captures agent/skill metadata
.claude/plugins/braintrust-tracing/hooks/stop_hook.sh
Creates LLM spans, finalizes Turns
.claude/plugins/braintrust-tracing/hooks/session_end.sh
Finalizes session, triggers learning extraction
scripts/braintrust_analyze.py
Query and analyze traced sessions
~/.claude/state/braintrust_sessions/
Per-session state files
~/.claude/state/braintrust_hook.log
Debug log

Environment Variables

VariableRequiredDefaultDescription
TRACE_TO_BRAINTRUST
Yes-Set to
"true"
to enable
BRAINTRUST_API_KEY
Yes-API key for Braintrust
BRAINTRUST_CC_PROJECT
No
claude-code
Project name
BRAINTRUST_CC_DEBUG
No
false
Verbose logging
BRAINTRUST_API_URL
No
https://api.braintrust.dev
API endpoint

Session Learnings

What We Learned About Sub-Agent Tracing (Dec 2025)

Attempted: Inject trace context via PreToolUse into Task prompts.

Result: Failed - SessionStart only receives session metadata, not the prompt.

Discovery: Task spans already contain rich sub-agent data:

  • metadata.agent_type
    - agent type from
    subagent_type
  • metadata.skill_name
    - skill from Skill tool
  • tool_input
    - full prompt sent to agent
  • tool_output
    - agent response

Current correlation path:

  1. Parent session Task span has
    agentId
    and timing
  2. Sub-agent creates orphaned trace with
    root_span_id = session_id
  3. SubagentStop provides the sub-agent's
    session_id
  4. Manual correlation: match timing or use
    session_id
    link

Future work: Write

child_session_id
to Task span metadata from PostToolUse after SubagentStop.

What We Learned About Sub-Agent Correlation

The Problem

  • Sub-agents spawned via Task tool create orphaned Braintrust traces
  • Parent session has Task spans with
    agentId
    , sub-agent has separate
    session_id
  • No built-in link between them

What DOESN'T Work

1. Prompt injection via PreToolUse

SessionStart hook only receives session metadata (

session_id
,
type
,
cwd
), NOT the prompt. Injected trace context is never seen.

The hook receives:

{
  "session_id": "...",
  "type": "start|resume|compact|clear",
  "cwd": "...",
  "env": {...}
}

No prompt field exists - context injection is impossible at SessionStart.

2. SubagentStop → PostToolUse file handoff

Race condition. These are independent async hooks with no timing guarantees:

  • SubagentStop fires when sub-agent session ends
  • PostToolUse (Task) fires when Task tool completes
  • No ordering guarantee between them
  • Writing to a correlation file creates a race

3. PreToolUse correlation files

SessionStart can't access the

task_span_id
because it has no context about which Task spawned it. PreToolUse modifies prompts but doesn't create a reliably accessible state file that SessionStart can find.

What DOES Work

Post-hoc matching for dataset building:

Parent session Task spans contain:

  • agentId
    - identifier for the sub-agent run
  • totalTokens
    ,
    totalToolUseCount
    - aggregated metrics
  • content
    - full agent response/summary
  • tool_input.prompt
    - original task prompt
  • tool_input.subagent_type
    - agent type (e.g., "oracle")
  • Start/end timestamps

Sub-agent sessions contain:

  • session_id
    (equals orphaned trace
    root_span_id
    )
  • Start/end timestamps
  • All internal spans and tool calls

Correlation strategy:

  1. Export parent session traces (query parent
    root_span_id
    )
  2. Export sub-agent traces (query all sessions created within parent's time window)
  3. Match by:
    • Timing: Task span end ≈ sub-agent session end
    • Metadata:
      subagent_type
      from Task prompt
    • IDs: SubagentStop hook provides
      session_id
      (can be captured and logged)

Architecture Insight

SessionStart input is intentionally minimal - it contains no prompt or tool context:

interface SessionStartInput {
  session_id: string;
  type: "start" | "resume" | "compact" | "clear";
  cwd: string;
  env: { [key: string]: string };
  // NO: prompt, tool_context, task_span_id, parent_span_id
}

This design boundary prevents real-time correlation at hook time.

Recommendation

For building agent run datasets with sub-agent correlation:

  1. In-session logging: Capture SubagentStop
    session_id
    in logs or state
  2. Post-session export: Query Braintrust API for parent and sub-agent traces
  3. Offline correlation: Match traces by timing and metadata in a script
  4. Don't try real-time linking: Hooks don't have necessary context

Example script pattern:

# 1. Export parent session
braintrust_analyze.py --replay <parent-session-id> > parent_traces.json

# 2. Query for orphaned sub-agent traces (those created during parent's time window)
braintrust_analyze.py --agent-stats > all_agent_traces.json

# 3. Correlate in Python:
#    - Parent Task spans -> agentId, timestamps, subagent_type
#    - Orphaned traces -> root_span_id, timestamps
#    - Match by timing and type

This approach is reliable, testable, and doesn't require hooks to maintain implicit state.