Continuous-Claude-v3 braintrust-tracing
Braintrust tracing for Claude Code - hook architecture, sub-agent correlation, debugging
git clone https://github.com/parcadei/Continuous-Claude-v3
T=$(mktemp -d) && git clone --depth=1 https://github.com/parcadei/Continuous-Claude-v3 "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.claude/skills/braintrust-tracing" ~/.claude/skills/parcadei-continuous-claude-v3-braintrust-tracing && rm -rf "$T"
Braintrust Tracing for Claude Code
Comprehensive guide to tracing Claude Code sessions in Braintrust, including sub-agent correlation.
Architecture Overview

                        PARENT SESSION
                        +---------------------+
                        |    SessionStart     |
                        |   (creates root)    |
                        +----------+----------+
                                   |
                        +----------v----------+
                        |  UserPromptSubmit   |
                        |   (creates Turn)    |
                        +----------+----------+
                                   |
              +--------------------+--------------------+
              |                    |                    |
    +---------v--------+  +--------v--------+  +--------v--------+
    |   PostToolUse    |  |   PostToolUse   |  |   PreToolUse    |
    |   (Read span)    |  |   (Edit span)   |  | (Task - inject) |
    +------------------+  +-----------------+  +--------+--------+
                                                        |
                                             +----------v----------+
                                             |      SUB-AGENT      |
                                             |    SessionStart     |
                                             | (NEW root_span_id)  |
                                             +----------+----------+
                                                        |
                                             +----------v----------+
                                             |    SubagentStop     |
                                             |  (has session_id)   |
                                             +---------------------+
Hook Event Flow
| Hook | Trigger | Creates | Key Fields |
|---|---|---|---|
| SessionStart | Session begins | Root span | , |
| UserPromptSubmit | User sends prompt | Turn span | , |
| PreToolUse | Before tool runs | (modifies Task prompts) | |
| PostToolUse | After tool runs | Tool span | , , |
| Stop | Turn completes | LLM spans | , , |
| SubagentStop | Sub-agent finishes | (no span) | `session_id` of sub-agent |
| SessionEnd | Session ends | (finalizes root) | , |
Trace Hierarchy

    Session (task span) - root_span_id = session_id
    |
    +-- Turn 1 (task span)
    |   |
    |   +-- claude-sonnet (llm span) - model call with tool_use
    |   +-- Read (tool span)
    |   +-- Edit (tool span)
    |   +-- claude-sonnet (llm span) - response after tools
    |
    +-- Turn 2 (task span)
    |   |
    |   +-- claude-sonnet (llm span)
    |   +-- Task (tool span) -----> [Sub-agent session - SEPARATE trace]
    |   +-- claude-sonnet (llm span)
    |
    +-- Turn 3 ...
Sub-Agent Tracing: What Works and What Doesn't
What Doesn't Work
SessionStart doesn't receive the Task prompt.
We tried injecting trace context into Task prompts via PreToolUse:

    # PreToolUse hook injects:
    [BRAINTRUST_TRACE_CONTEXT]
    {"root_span_id": "abc", "parent_span_id": "xyz", "project_id": "123"}
    [/BRAINTRUST_TRACE_CONTEXT]
But SessionStart only receives session metadata, not the modified prompt. The injected context is lost.
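To make the failure concrete, here is a minimal sketch of what a consumer would need to do to recover the injected block. It only works where the modified prompt is visible, which SessionStart is not. The function name `extract_trace_context` is hypothetical and not part of the plugin; only the marker format comes from the example above.

```python
import json
import re
from typing import Optional

# Marker format copied from the injected block shown above.
_CONTEXT_RE = re.compile(
    r"\[BRAINTRUST_TRACE_CONTEXT\]\s*(\{.*?\})\s*\[/BRAINTRUST_TRACE_CONTEXT\]",
    re.DOTALL,
)


def extract_trace_context(prompt: str) -> Optional[dict]:
    """Return the injected trace context, or None if the prompt carries no block.

    Hypothetical helper: it needs the full prompt text, which the
    SessionStart hook never receives.
    """
    match = _CONTEXT_RE.search(prompt)
    if match is None:
        return None
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        return None  # block present but not valid JSON
```

Because SessionStart is handed only session metadata, there is no `prompt` string to pass to a helper like this, so the injected IDs never reach the sub-agent's root span.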
What DOES Work
Task spans in parent session contain everything:
- `agentId`: identifier for the sub-agent run
- `totalTokens`, `totalToolUseCount`: metrics
- `content`: full agent response/summary
- `tool_input.prompt`: original task prompt
- `tool_input.subagent_type`: agent type (e.g., "oracle")
SubagentStop hook receives the sub-agent's `session_id`:
- This equals the sub-agent's orphaned trace `root_span_id`
- Allows correlation between parent Task span and child trace
The Correlation Pattern
Current state: Sub-agents create orphaned traces (new `root_span_id`).
Correlation method:
- Query parent session's Task spans for agent metadata
- Match `agentId` or timing with orphaned traces
- Sub-agent's `session_id` = its trace's `root_span_id`
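Future solution (not yet implemented; the ordering of these two hooks is not guaranteed, see the race-condition note under "What DOESN'T Work" in Session Learnings):

    SubagentStop fires       -> writes session_id to temp file
    PostToolUse (Task) fires -> reads temp file -> adds child_session_id to Task span metadata

This would link:

    Task.agentId + Task.child_session_id -> orphaned trace root_span_id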
State Management
Per-Session State Files

    ~/.claude/state/braintrust_sessions/
      {session_id}.json    # Per-session state
Each session file contains:

    {
      "root_span_id": "abc-123",
      "project_id": "proj-456",
      "turn_count": 5,
      "tool_count": 23,
      "current_turn_span_id": "turn-789",
      "current_turn_start": 1703456789,
      "started": "2025-12-24T10:00:00.000Z",
      "is_subagent": false
    }
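For inspecting state outside the hooks, a small reader along the following lines may help. This is a sketch that assumes the directory layout and field names shown above; `load_session_states` and `subagent_sessions` are hypothetical helper names, not functions shipped with the plugin.

```python
import json
from pathlib import Path

# Directory taken from the layout described above.
STATE_DIR = Path.home() / ".claude" / "state" / "braintrust_sessions"


def load_session_states(state_dir: Path = STATE_DIR) -> dict:
    """Map session_id (the file stem) to its parsed state dict.

    Unreadable or malformed files are skipped rather than raising,
    since a hook may be mid-write or the file may be corrupted.
    """
    states = {}
    if not state_dir.is_dir():
        return states
    for path in sorted(state_dir.glob("*.json")):
        try:
            data = json.loads(path.read_text())
        except (OSError, json.JSONDecodeError):
            continue
        if isinstance(data, dict):
            states[path.stem] = data
    return states


def subagent_sessions(states: dict) -> list:
    """Return the session ids whose state has is_subagent set to true."""
    return [sid for sid, state in states.items() if state.get("is_subagent")]
```

Calling `subagent_sessions(load_session_states())` lists the sessions flagged as sub-agents, which are the ones expected to show up as orphaned traces.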
Global State

    ~/.claude/state/braintrust_global.json    # Cached project_id
    ~/.claude/state/braintrust_hook.log       # Debug log
Debugging Commands
Check if Tracing is Active

    # View hook logs in real-time
    tail -f ~/.claude/state/braintrust_hook.log

    # Check if session has state
    cat ~/.claude/state/braintrust_sessions/*.json | jq -s '.'

    # Verify environment
    echo "TRACE_TO_BRAINTRUST=$TRACE_TO_BRAINTRUST"
    echo "BRAINTRUST_API_KEY=${BRAINTRUST_API_KEY:+set}"
Query Braintrust Directly

    # List recent sessions
    uv run python -m runtime.harness scripts/braintrust_analyze.py --sessions 5

    # Analyze last session
    uv run python -m runtime.harness scripts/braintrust_analyze.py --last-session

    # Replay specific session
    uv run python -m runtime.harness scripts/braintrust_analyze.py --replay <session-id>

    # Find sub-agent traces (orphaned roots)
    uv run python -m runtime.harness scripts/braintrust_analyze.py --agent-stats
Debug Hook Execution

    # Enable verbose logging
    export BRAINTRUST_CC_DEBUG=true

    # Test hooks manually
    echo '{"session_id":"test-123","type":"resume"}' | \
      bash "$CLAUDE_PROJECT_DIR/.claude/plugins/braintrust-tracing/hooks/session_start.sh"

    # Test PreToolUse (Task injection)
    echo '{"session_id":"test-123","tool_name":"Task","tool_input":{"prompt":"test"}}' | \
      bash "$CLAUDE_PROJECT_DIR/.claude/plugins/braintrust-tracing/hooks/pre_tool_use.sh"
Troubleshooting Checklist
- No traces appearing:
  - Check `TRACE_TO_BRAINTRUST=true` in `.claude/settings.local.json`
  - Verify API key: `echo $BRAINTRUST_API_KEY`
  - Check logs: `tail -20 ~/.claude/state/braintrust_hook.log`
- Sub-agents not linking:
  - This is expected: sub-agents create orphaned traces
  - Use `--agent-stats` to find agent activity
  - Correlate via timing or `agentId` in parent Task span
- Missing spans:
  - Check `current_turn_span_id` in session state
  - Ensure Stop hook runs (turn finalization)
  - Look for "Failed to create" errors in log
- State corruption:
  - Remove session state: `rm ~/.claude/state/braintrust_sessions/*.json`
  - Clear global cache: `rm ~/.claude/state/braintrust_global.json`
Key Files
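| File | Purpose |
|---|---|
| | Shared utilities, API, state management |
| `.claude/plugins/braintrust-tracing/hooks/session_start.sh` | Creates root span, handles sub-agent context |
| | Creates Turn spans per user message |
| `.claude/plugins/braintrust-tracing/hooks/pre_tool_use.sh` | Injects trace context into Task prompts |
| | Creates tool spans, captures agent/skill metadata |
| | Creates LLM spans, finalizes Turns |
| | Finalizes session, triggers learning extraction |
| `scripts/braintrust_analyze.py` | Query and analyze traced sessions |
| `~/.claude/state/braintrust_sessions/` | Per-session state files |
| `~/.claude/state/braintrust_hook.log` | Debug log |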
Environment Variables
| Variable | Required | Default | Description |
|---|---|---|---|
| `TRACE_TO_BRAINTRUST` | Yes | - | Set to `true` to enable |
| `BRAINTRUST_API_KEY` | Yes | - | API key for Braintrust |
| | No | | Project name |
| `BRAINTRUST_CC_DEBUG` | No | | Verbose logging |
| | No | | API endpoint |
Session Learnings
What We Learned About Sub-Agent Tracing (Dec 2025)
Attempted: Inject trace context via PreToolUse into Task prompts.
Result: Failed - SessionStart only receives session metadata, not the prompt.
Discovery: Task spans already contain rich sub-agent data:
- `metadata.agent_type`: agent type from `subagent_type`
- `metadata.skill_name`: skill from Skill tool
- `tool_input`: full prompt sent to agent
- `tool_output`: agent response
Current correlation path:
- Parent session Task span has `agentId` and timing
- Sub-agent creates orphaned trace with `root_span_id = session_id`
- SubagentStop provides the sub-agent's `session_id`
- Manual correlation: match timing or use `session_id` link
Future work: Write `child_session_id` to Task span metadata from PostToolUse after SubagentStop.
What We Learned About Sub-Agent Correlation
The Problem
- Sub-agents spawned via Task tool create orphaned Braintrust traces
- Parent session has Task spans with `agentId`, sub-agent has separate `session_id`
- No built-in link between them
What DOESN'T Work
1. Prompt injection via PreToolUse
SessionStart hook only receives session metadata (`session_id`, `type`, `cwd`), NOT the prompt. Injected trace context is never seen.
The hook receives:

    {
      "session_id": "...",
      "type": "start|resume|compact|clear",
      "cwd": "...",
      "env": {...}
    }
No prompt field exists - context injection is impossible at SessionStart.
2. SubagentStop → PostToolUse file handoff
Race condition. These are independent async hooks with no timing guarantees:
- SubagentStop fires when sub-agent session ends
- PostToolUse (Task) fires when Task tool completes
- No ordering guarantee between them
- Writing to a correlation file creates a race
3. PreToolUse correlation files
SessionStart can't access the `task_span_id` because it has no context about which Task spawned it. PreToolUse modifies prompts but doesn't create a reliably accessible state file that SessionStart can find.
What DOES Work
Post-hoc matching for dataset building:
Parent session Task spans contain:
- `agentId`: identifier for the sub-agent run
- `totalTokens`, `totalToolUseCount`: aggregated metrics
- `content`: full agent response/summary
- `tool_input.prompt`: original task prompt
- `tool_input.subagent_type`: agent type (e.g., "oracle")
- Start/end timestamps
Sub-agent sessions contain:
- `session_id` (equals orphaned trace `root_span_id`)
- Start/end timestamps
- All internal spans and tool calls
Correlation strategy:
- Export parent session traces (query parent `root_span_id`)
- Export sub-agent traces (query all sessions created within parent's time window)
- Match by:
  - Timing: Task span end ≈ sub-agent session end
  - Metadata: `subagent_type` from Task prompt
  - IDs: SubagentStop hook provides `session_id` (can be captured and logged)
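The matching step can be sketched as a greedy nearest-end-time pairing. The record shapes below (`TaskSpan`, `SubagentTrace`), the `max_skew` tolerance, and the optional `agent_type` on the trace are assumptions made for illustration; the real export format from Braintrust may differ and would need to be mapped into these fields first.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskSpan:
    """Hypothetical view of a parent-session Task span."""
    agent_id: str
    subagent_type: str
    end: float  # epoch seconds


@dataclass
class SubagentTrace:
    """Hypothetical view of an orphaned sub-agent trace."""
    root_span_id: str  # equals the sub-agent's session_id
    end: float  # epoch seconds
    agent_type: Optional[str] = None  # None when the type is unknown


def correlate(tasks, traces, max_skew=5.0):
    """Pair each Task span with the unclaimed trace whose end time is closest.

    A trace is a candidate only if its end is within max_skew seconds of the
    Task span end and its agent_type, when known, matches subagent_type.
    Returns a dict of agent_id -> root_span_id.
    """
    pairs = {}
    unclaimed = list(traces)
    for task in sorted(tasks, key=lambda t: t.end):
        candidates = [
            tr for tr in unclaimed
            if abs(tr.end - task.end) <= max_skew
            and tr.agent_type in (None, task.subagent_type)
        ]
        if not candidates:
            continue  # leave this Task span unmatched
        best = min(candidates, key=lambda tr: abs(tr.end - task.end))
        pairs[task.agent_id] = best.root_span_id
        unclaimed.remove(best)
    return pairs
```

Greedy matching is the simplest option and can mispair sub-agents that finish within the same tolerance window; where SubagentStop `session_id` values were logged, they are the stronger signal and timing is best used only to break ties.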
Architecture Insight
SessionStart input is intentionally minimal - it contains no prompt or tool context:

    interface SessionStartInput {
      session_id: string;
      type: "start" | "resume" | "compact" | "clear";
      cwd: string;
      env: { [key: string]: string };
      // NO: prompt, tool_context, task_span_id, parent_span_id
    }
This design boundary prevents real-time correlation at hook time.
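A Python mirror of that input shape makes the boundary easy to check in tests. This is a sketch based only on the fields listed above; `parse_session_start` is a hypothetical helper, and treating `cwd` and `env` as optional is an assumption drawn from the manual test payload in the debugging section, which omits them.

```python
import json
from dataclasses import dataclass, field

VALID_TYPES = {"start", "resume", "compact", "clear"}


@dataclass(frozen=True)
class SessionStartInput:
    """Fields the SessionStart hook receives; note there is no prompt."""
    session_id: str
    type: str
    cwd: str = ""
    env: dict = field(default_factory=dict)


def parse_session_start(raw: str) -> SessionStartInput:
    """Parse the hook's JSON payload, rejecting unknown session types."""
    data = json.loads(raw)
    if data.get("type") not in VALID_TYPES:
        raise ValueError(f"unexpected session type: {data.get('type')!r}")
    return SessionStartInput(
        session_id=data["session_id"],
        type=data["type"],
        cwd=data.get("cwd", ""),
        env=dict(data.get("env", {})),
    )
```

Nothing in the parsed object identifies the spawning Task span, so any parent link has to be established after the fact.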
Recommendation
For building agent run datasets with sub-agent correlation:
- In-session logging: Capture SubagentStop `session_id` in logs or state
- Post-session export: Query Braintrust API for parent and sub-agent traces
- Offline correlation: Match traces by timing and metadata in a script
- Don't try real-time linking: Hooks don't have necessary context
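The in-session logging step could look like the sketch below: append one JSON line per SubagentStop event so the offline script has exact `session_id` values to join on. The function name and the log location are hypothetical; the only thing assumed about the hook payload is that it carries `session_id`, as described above.

```python
import json
import time
from pathlib import Path


def record_subagent_stop(raw_payload: str, log_path: Path) -> dict:
    """Append the sub-agent's session_id to a JSON Lines log and return the entry.

    raw_payload is the JSON text the SubagentStop hook receives on stdin.
    log_path is a caller-chosen file (hypothetical; not a plugin path).
    """
    payload = json.loads(raw_payload)
    entry = {
        "session_id": payload["session_id"],  # equals the orphaned trace root_span_id
        "logged_at": time.time(),  # epoch seconds, useful for timing-based matching
    }
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return entry
```

A hook wrapper would call this with `sys.stdin.read()` and a path under `~/.claude/state/`. Appending one line per event avoids the read-modify-write cycle that makes a shared correlation file race-prone.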
Example script pattern:

    # 1. Export parent session
    braintrust_analyze.py --replay <parent-session-id> > parent_traces.json

    # 2. Query for orphaned sub-agent traces (those created during parent's time window)
    braintrust_analyze.py --agent-stats > all_agent_traces.json

    # 3. Correlate in Python:
    #    - Parent Task spans -> agentId, timestamps, subagent_type
    #    - Orphaned traces -> root_span_id, timestamps
    #    - Match by timing and type
This approach is reliable, testable, and doesn't require hooks to maintain implicit state.