Agentic-context-engine kayba-stage-2-domain-context
Gather domain context about the repository and agent — system prompt, tool definitions, domain docs, and behavior patterns from traces. Trigger when the user says "run stage 2", "gather context", "domain context", or when invoked by the kayba-pipeline orchestrator.
git clone https://github.com/kayba-ai/agentic-context-engine
T=$(mktemp -d) && git clone --depth=1 https://github.com/kayba-ai/agentic-context-engine "$T" && mkdir -p ~/.claude/skills && cp -r "$T/ace/cli/skills/kayba-pipeline/stage-2-domain-context" ~/.claude/skills/kayba-ai-agentic-context-engine-kayba-stage-2-domain-context-c4dc12 && rm -rf "$T"
ace/cli/skills/kayba-pipeline/stage-2-domain-context/SKILL.md
Stage 2: Domain Context Gathering
Understand the agent's world — what it does, what tools it has, and what "success" looks like.
Inputs
`TRACES_FOLDER` — path to directory containing trace JSON files
Process
0. Detect trace format
Before reading traces, identify the framework that produced them. Read 1 trace file and check:
| Signal | Framework |
|---|---|
| `simulation.messages[]`, `termination_reason`, `info.environment_info.*` | tau2-bench |
| LangSmith-style run records with nested child runs | LangChain / LangSmith |
| LlamaIndex event/span records | LlamaIndex |
| Bare chat-completion request/response objects at the top level | Raw OpenAI API logs |
| Span records with trace/span attributes | OpenTelemetry / Arize / Langfuse |
Record the detected format in the output under Trace Format. All subsequent trace-reading steps use the field paths appropriate for that format.
If the format is unrecognized, note the top-level keys and structure, then proceed best-effort with field names found in the data.
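As a rough illustration of this step, the sketch below reads one trace file and guesses the framework from its top-level keys. The `*.json` layout under `TRACES_FOLDER` and the key-to-framework mapping are assumptions to adapt, not the definitive detection logic.

```python
import json
from pathlib import Path

def detect_trace_format(traces_folder: str) -> str:
    """Best-effort guess at the framework that produced the traces.

    Reads one trace file and inspects its top-level keys. The signal
    lists below are illustrative assumptions, not a complete mapping.
    """
    sample = next(Path(traces_folder).glob("*.json"))
    trace = json.loads(sample.read_text())
    keys = set(trace.keys()) if isinstance(trace, dict) else set()

    if {"simulation", "termination_reason"} & keys or "info" in keys:
        return "tau2-bench (assumed from simulation/info fields)"
    if "child_runs" in keys or "run_type" in keys:
        return "LangChain / LangSmith (assumed from run fields)"
    if "messages" in keys or "choices" in keys:
        return "raw OpenAI API log (assumed from top-level messages/choices)"
    if "spans" in keys or "resourceSpans" in keys:
        return "OpenTelemetry-style spans (assumed)"
    # Unrecognized: report the keys so downstream steps can proceed best-effort.
    return f"unknown (top-level keys: {sorted(keys)})"
```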
1. Detect architecture
Read 2-3 traces and determine if this is a single-agent or multi-agent system:
- Single agent: one `agent_info` entry, one conversation thread, tool calls from one identity
- Multi-agent / router: look for multiple `agent_info` entries, routing tool calls (e.g., `transfer_to_*`, `delegate_to_*`), sub-conversation arrays, or distinct system prompts per agent identity
If multi-agent: document each agent separately (name, role, tools, handoff triggers) and note the routing logic. The remaining steps apply per-agent.
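A heuristic sketch of this check, assuming tau2-bench-style traces with messages under `simulation.messages[]` (or a top-level `messages[]`); the routing-prefix list is illustrative and should follow the detected format.

```python
import json
from pathlib import Path

def detect_architecture(trace_path: str) -> str:
    """Classify one trace as single-agent or multi-agent (heuristic).

    Assumes messages live under simulation.messages[] or messages[];
    other formats need their own field paths.
    """
    trace = json.loads(Path(trace_path).read_text())
    messages = trace.get("simulation", {}).get("messages") or trace.get("messages", [])

    system_prompts = set()
    routing_calls = []
    for msg in messages:
        if msg.get("role") == "system":
            system_prompts.add(msg.get("content", ""))
        for call in msg.get("tool_calls") or []:
            name = call.get("function", {}).get("name") or call.get("name", "")
            if name.startswith(("transfer_to_", "delegate_to_")):
                routing_calls.append(name)

    # Multiple distinct system prompts or routing tool calls suggest a router/multi-agent setup.
    if routing_calls or len(system_prompts) > 1:
        return f"multi-agent (routing calls: {sorted(set(routing_calls))})"
    return "single-agent"
```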
2. Find the system prompt
Use a fallback chain — stop at the first hit:
- Config files — grep for keys: `system_prompt`, `system_message`, `instructions`, `AGENT_INSTRUCTION`, `SYSTEM_PROMPT` in YAML/JSON/TOML/Python/JS files
- Source code — search for prompt template strings, f-strings, or `format()` calls that build the system message (look in agent implementation files).
- Trace extraction — read 3 trace files from `{TRACES_FOLDER}`:
  - Check `info.environment_info.policy` (tau2-bench format)
  - Check first message with `role: "system"` in the messages array
  - Check `raw_data` fields for system-level content
- Not found — if none of the above yields a system prompt, explicitly record `SYSTEM_PROMPT_STATUS: NOT_FOUND` in the output and flag this for the orchestrator. Do not fabricate or guess.
When found, record both the prompt content and its source location (file path + line, or trace field path).
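The fallback chain could look roughly like the sketch below. The key names, file extensions, and trace field paths (`info.environment_info.policy`, `role: "system"`) come from the steps above; the function name and return shape are illustrative.

```python
import json
import re
from pathlib import Path

# Config/source keys that commonly hold the system prompt (from the fallback chain above).
PROMPT_KEYS = re.compile(
    r"(system_prompt|system_message|instructions|AGENT_INSTRUCTION|SYSTEM_PROMPT)\s*[:=]"
)

def find_system_prompt(repo_root: str, traces_folder: str) -> dict:
    """Walk the fallback chain and stop at the first hit, reporting the source location."""
    # 1. Config / source files — report the first candidate hit for manual review.
    for path in Path(repo_root).rglob("*"):
        if path.suffix in {".yaml", ".yml", ".json", ".toml", ".py", ".js"} and path.is_file():
            try:
                text = path.read_text(errors="ignore")
            except OSError:
                continue
            match = PROMPT_KEYS.search(text)
            if match:
                line_no = text[: match.start()].count("\n") + 1
                return {"status": "found_in_source", "source": f"{path}:{line_no}"}

    # 2. Traces — check the tau2-bench policy field, then role:"system" messages.
    for trace_path in sorted(Path(traces_folder).glob("*.json"))[:3]:
        trace = json.loads(trace_path.read_text())
        policy = trace.get("info", {}).get("environment_info", {}).get("policy")
        if policy:
            return {"status": "found_in_trace",
                    "source": f"{trace_path}: info.environment_info.policy",
                    "content": policy}
        messages = trace.get("simulation", {}).get("messages") or trace.get("messages", [])
        for msg in messages:
            if msg.get("role") == "system" and msg.get("content"):
                return {"status": "found_in_trace",
                        "source": f"{trace_path}: role=system",
                        "content": msg["content"]}

    # 3. Nothing found — record the status instead of guessing.
    return {"status": "SYSTEM_PROMPT_STATUS: NOT_FOUND"}
```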
3. Extract tool definitions
Two-pass approach: source code first (ground truth), then traces (usage evidence).
Pass 1 — Source code discovery:
- Search for tool/function definition patterns: `@tool`, `@is_tool`, `def tool_`, function schema arrays, OpenAPI specs, `tools=[]` arguments
- For each tool, extract from source:
- Name
- Input parameters with types and defaults
- Return type / output schema (document the structure, not just "returns a dict")
- Side effects: READ (no state change), WRITE (mutates state), GENERIC (neither)
- Validation rules the tool does NOT enforce (critical — grep for comments like "API does not check", "agent must enforce")
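A minimal sketch of the Pass 1 scan, assuming Python sources and the decorator/naming patterns listed above; real repositories may register tools through other mechanisms.

```python
import re
from pathlib import Path

# Decorator / naming patterns that commonly mark tool definitions (illustrative, not exhaustive).
TOOL_PATTERN = re.compile(r"@(tool|is_tool)\b|def (tool_\w+)|tools\s*=\s*\[", re.MULTILINE)

def scan_source_for_tools(repo_root: str) -> list[dict]:
    """Pass 1: list candidate tool-definition sites in the source tree for manual extraction."""
    hits = []
    for path in Path(repo_root).rglob("*.py"):
        text = path.read_text(errors="ignore")
        for match in TOOL_PATTERN.finditer(text):
            line_no = text[: match.start()].count("\n") + 1
            hits.append({"file": str(path), "line": line_no, "match": match.group(0)})
    return hits
```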
Pass 2 — Trace usage evidence:
- Read ALL traces (if <= 20) or a stratified sample (see step 5 for sampling)
- Extract every unique `tool_calls[].name` from assistant messages
- Extract every `role: "tool"` response to document actual output shapes
- For each tool, record one example input/output pair from traces
Reconcile the two passes:
- Tools in source but NOT in traces = "available but unused" — flag these; they may be relevant for edge cases the agent should handle
- Tools in traces but NOT in source = possible dynamic tools or external APIs — investigate
Output the full tool inventory as a table with columns: Name, Category, Input Schema, Output Schema, Observed in Traces (Y/N), Unvalidated Rules.
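Pass 2 and the reconciliation might be sketched as follows, assuming OpenAI-style `tool_calls[]` and `role: "tool"` messages; the field paths should follow the format detected in step 0.

```python
import json
from collections import defaultdict
from pathlib import Path

def tools_observed_in_traces(traces_folder: str) -> dict[str, dict]:
    """Pass 2: collect every tool name seen in traces, with one example call/response each.

    Assumes messages live under simulation.messages[] or messages[], and that tool
    responses carry a name field (otherwise join on tool_call_id).
    """
    observed: dict[str, dict] = defaultdict(dict)
    for trace_path in Path(traces_folder).glob("*.json"):
        trace = json.loads(trace_path.read_text())
        messages = trace.get("simulation", {}).get("messages") or trace.get("messages", [])
        for msg in messages:
            for call in msg.get("tool_calls") or []:
                name = call.get("function", {}).get("name") or call.get("name", "")
                observed[name].setdefault("example_input", call.get("function", {}).get("arguments"))
            if msg.get("role") == "tool":
                # Tool responses document the actual output shape.
                observed[msg.get("name", "unknown")].setdefault("example_output", msg.get("content"))
    return dict(observed)

def reconcile(source_tools: set[str], trace_tools: set[str]) -> dict[str, set[str]]:
    """Split the inventory into used, available-but-unused, and trace-only tools."""
    return {
        "used": source_tools & trace_tools,
        "available_but_unused": source_tools - trace_tools,  # flag: edge cases the agent should handle
        "trace_only": trace_tools - source_tools,             # investigate: dynamic tools / external APIs
    }
```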
4. Find domain documentation
- READMEs, product docs, wiki links
- Policy files (e.g., `data/*/policy.md`, domain-specific docs)
- Inline code comments explaining business logic
- Test files that describe expected behavior
- Anything that explains what the agent does and what "success" means for its users
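A small sketch of this documentation sweep; the glob patterns are assumptions based on the examples above and should be extended to the repository's actual layout.

```python
from pathlib import Path

# Globs that typically hold domain documentation (illustrative; extend per repo layout).
DOC_GLOBS = ["README*", "docs/**/*.md", "data/*/policy.md", "**/*policy*.md", "tests/**/*.py"]

def find_domain_docs(repo_root: str) -> list[str]:
    """List candidate domain-documentation files for manual review."""
    root = Path(repo_root)
    found: set[str] = set()
    for pattern in DOC_GLOBS:
        found.update(str(p) for p in root.glob(pattern) if p.is_file())
    return sorted(found)
```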
5. Catalogue agent behavior patterns
Trace selection — stratified sampling (do not just grab "5-10 random traces"):
- Count total traces in `{TRACES_FOLDER}`. If <= 20, read ALL of them.
- If > 20, select a stratified sample (see the sketch after this list):
  - Sort by `termination_reason` — include at least 2 per unique reason
  - Sort by conversation length (message count) — include shortest, longest, and 2 median
  - Sort by tool call count — include lowest and highest
  - If task outcomes are available (pass/fail), include at least 3 of each
  - Target: ~15 traces total, or 30% of the corpus, whichever is larger
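A minimal sampling sketch under the strata above, assuming tau2-bench-style `termination_reason` and `simulation.messages[]` fields; swap in the field paths for the detected format.

```python
import json
from pathlib import Path

def stratified_sample(traces_folder: str) -> list[Path]:
    """Pick traces along the strata above: termination reason, length, tool-call count."""
    paths = sorted(Path(traces_folder).glob("*.json"))
    if len(paths) <= 20:
        return paths  # small corpus: read everything

    meta = []
    for path in paths:
        trace = json.loads(path.read_text())
        messages = trace.get("simulation", {}).get("messages") or trace.get("messages", [])
        tool_calls = sum(len(m.get("tool_calls") or []) for m in messages)
        meta.append({"path": path,
                     "reason": trace.get("termination_reason", "unknown"),
                     "length": len(messages),
                     "tools": tool_calls})

    selected: set[Path] = set()

    # At least 2 traces per unique termination reason.
    by_reason: dict[str, list] = {}
    for m in meta:
        by_reason.setdefault(m["reason"], []).append(m)
    for group in by_reason.values():
        selected.update(m["path"] for m in group[:2])

    # Shortest, longest, and 2 median by conversation length.
    by_len = sorted(meta, key=lambda m: m["length"])
    mid = len(by_len) // 2
    selected.update(m["path"] for m in [by_len[0], by_len[-1], *by_len[mid - 1: mid + 1]])

    # Lowest and highest tool-call count.
    by_tools = sorted(meta, key=lambda m: m["tools"])
    selected.update([by_tools[0]["path"], by_tools[-1]["path"]])

    # Pad to the target size: max(15, 30% of the corpus).
    target = max(15, round(0.3 * len(paths)))
    for m in meta:
        if len(selected) >= target:
            break
        selected.add(m["path"])
    return sorted(selected)
```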
For each selected trace, document:
- Function call frequency — which tools are called most, in what order
- Tool call sequences — common tool chains (e.g., get_user -> get_reservation -> cancel)
- Success patterns — what does a thread that accomplishes its goal look like?
- Failure patterns — what does a thread that fails or gets stuck look like?
- Error patterns — what error strings appear in tool outputs? Group by root cause
- Policy violation patterns — where does the agent break its own rules? (e.g., multiple tool calls per turn, acting without confirmation)
- User feedback signals — reverts, ratings, explicit corrections, escalations, stop tokens, transfer tokens
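To make the cataloguing concrete, a sketch like the one below tallies tool-call frequency, common call sequences, and error strings across the selected traces; the message paths and the substring-based error check are assumptions, not a complete pattern analysis.

```python
import json
from collections import Counter
from pathlib import Path

def tool_usage_patterns(trace_paths: list[str]) -> dict:
    """Tally per-tool call frequency, call sequences, and error strings across traces.

    Assumes OpenAI-style assistant messages with tool_calls[]; error detection is a
    naive substring check on tool outputs and is only a starting point for grouping
    by root cause.
    """
    frequency: Counter = Counter()
    sequences: Counter = Counter()
    errors: Counter = Counter()

    for trace_path in trace_paths:
        trace = json.loads(Path(trace_path).read_text())
        messages = trace.get("simulation", {}).get("messages") or trace.get("messages", [])
        chain = []
        for msg in messages:
            for call in msg.get("tool_calls") or []:
                name = call.get("function", {}).get("name") or call.get("name", "")
                frequency[name] += 1
                chain.append(name)
            if msg.get("role") == "tool" and "error" in str(msg.get("content", "")).lower():
                errors[str(msg.get("content"))[:80]] += 1
        sequences[" -> ".join(chain)] += 1

    return {
        "frequency": frequency.most_common(),
        "common_sequences": sequences.most_common(5),
        "error_strings": errors.most_common(),
    }
```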
6. Write findings
Write all findings to `eval/stage2_domain_context.md`:
# Domain Context

## Trace Format
- Framework: [detected framework name]
- Key field paths: [e.g., simulation.messages[], info.environment_info.policy]

## Architecture
- Type: [single-agent | multi-agent]
- [If multi-agent: agent roster with roles and handoff triggers]

## Agent Purpose
[1-2 sentence summary of what this agent does]

## System Prompt
- **Source**: [file path + line, or trace field path, or NOT_FOUND]
- **Status**: [verbatim | reconstructed | not_found]

[The system prompt content, or "NOT_FOUND — downstream stages should account for missing system prompt"]

## Tools
| Tool | Category | Input Schema | Output Schema | In Traces? | Unvalidated Rules |
|------|----------|--------------|---------------|------------|-------------------|
| tool_name | READ/WRITE/GENERIC | `{param: type}` | `{field: type}` | Y/N | "API does not check X" |

### Tools available but never called in traces
- [tool_name — why it matters]

## Domain Rules
[Key business rules, constraints, policies the agent must follow]

## Behavior Patterns

### Success patterns
- [pattern 1]

### Failure patterns
- [pattern 1]

### Policy violation patterns
- [violation with frequency: N/M turns]

### Error patterns
| Error | Frequency | Root cause |
|-------|-----------|------------|
| error string | N traces | cause |

### User feedback signals
- [signal 1]
Outputs
eval/stage2_domain_context.md