Agentic-context-engine kayba-pipeline
End-to-end agent evaluation and improvement pipeline. Takes a traces folder and optional HITL flag, then orchestrates sub-agents through 7 stages — each stage is its own skill invoked by a dedicated sub-agent. Trigger when the user says "run the pipeline", "kayba pipeline", "evaluate and fix", "full eval", "analyze traces and fix", or provides a traces folder with intent to improve their agent.
git clone https://github.com/kayba-ai/agentic-context-engine
T=$(mktemp -d) && git clone --depth=1 https://github.com/kayba-ai/agentic-context-engine "$T" && mkdir -p ~/.claude/skills && cp -r "$T/ace/cli/skills/kayba-pipeline" ~/.claude/skills/kayba-ai-agentic-context-engine-kayba-pipeline-1f97e4 && rm -rf "$T"
ace/cli/skills/kayba-pipeline/SKILL.mdkayba-pipeline
End-to-end pipeline: analyze traces → define metrics → build rubric → plan fixes → implement fixes.
Each stage is a separate skill file that can be run independently or as part of this pipeline.
Inputs
The user provides two things:
— path to a directory containing trace JSON filesTRACES_FOLDER
—HITL
ortrue
— whether to pause for human review before implementing fixesfalse
If the user doesn't specify HITL, default to
true (safe default).
Pipeline overview
┌─────────────────────────────────────────────────────────────────────┐ │ Stage 1: Kayba API Analysis → skill: kayba-pipeline:stage-1-api-analysis │ │ Stage 2: Domain Context Gathering → skill: kayba-pipeline:stage-2-domain-context │ │ ─── stages 1 & 2 run in parallel ─── │ │ Stage 3: Metrics & Analysis → skill: kayba-pipeline:stage-3-metrics │ │ Stage 4: Rubric Definition → skill: kayba-pipeline:stage-4-rubric │ │ Stage 5: Action Plan → skill: kayba-pipeline:stage-5-action-plan │ │ Stage 6: HITL Gate → skill: kayba-pipeline:stage-6-hitl │ │ Stage 7: Fix Implementation → skill: kayba-pipeline:stage-7-fixer │ └─────────────────────────────────────────────────────────────────────┘
Orchestration instructions
You are the orchestrator. Your job is to:
- Create the
directory andeval/eval/pipeline_log.md - Spawn sub-agents that invoke stage skills via the Skill tool
- Coordinate stage ordering and handle the HITL gate
Setup
Create
eval/ directory and initialize eval/pipeline_log.md:
# Pipeline Log | Stage | Name | Status | Started | Completed | Notes | |-------|------|--------|---------|-----------|-------| | 1 | Kayba API Analysis | pending | | | | | 2 | Domain Context | pending | | | | | 3 | Metrics & Analysis | pending | | | | | 4 | Rubric Definition | pending | | | | | 5 | Action Plan | pending | | | | | 6 | HITL Gate | pending | | | | | 7 | Fix Implementation | pending | | | |
Stages 1 & 2 — run in parallel
Spawn two sub-agents in parallel using the Agent tool:
Agent 1:
- Name:
api-analyst - Type:
general-purpose - Prompt:
Invoke the skill "kayba-pipeline:stage-1-api-analysis" using the Skill tool. The traces folder is: {TRACES_FOLDER}. Follow the skill instructions completely.
Agent 2:
- Name:
domain-scout - Type:
general-purpose - Prompt:
Invoke the skill "kayba-pipeline:stage-2-domain-context" using the Skill tool. The traces folder is: {TRACES_FOLDER}. Follow the skill instructions completely.
Wait for both to complete before proceeding.
Stage 3 — sequential
Spawn one sub-agent after stages 1 & 2 complete:
- Name:
metric-engineer - Type:
general-purpose - Prompt:
Invoke the skill "kayba-pipeline:stage-3-metrics" using the Skill tool. The traces folder is: {TRACES_FOLDER}. Follow the skill instructions completely — this includes iterating on the metrics until you're satisfied.
Stage 4 — sequential
Spawn one sub-agent after stage 3 completes:
- Name:
rubric-builder - Type:
general-purpose - Prompt:
Invoke the skill "kayba-pipeline:stage-4-rubric" using the Skill tool. Follow the skill instructions completely.
Stage 5 — sequential
Spawn one sub-agent after stage 4 completes:
- Name:
action-planner - Type:
general-purpose - Prompt:
Invoke the skill "kayba-pipeline:stage-5-action-plan" using the Skill tool. Follow the skill instructions completely.
Stage 6 — HITL Gate
If
is HITL
:true
Spawn one sub-agent after stage 5 completes:
- Name:
hitl-reviewer - Type:
general-purpose - Prompt:
Invoke the skill "kayba-pipeline:stage-6-hitl" using the Skill tool. Follow the skill instructions completely. Present the full review to the user and collect their decision before proceeding.
Wait for the sub-agent to complete. Check
eval/stage6_decision.md for the outcome:
- If decision is "Approve all" or "Approve with modifications" — proceed to Stage 7
- If decision is "Reject" — re-run Stage 5 with the user feedback recorded in
, then re-run Stage 6eval/stage6_decision.md - Only proceed to Stage 7 after a clear approval is recorded
If
is HITL
:false
- Skip to Stage 7
- Log "HITL skipped" in
eval/pipeline_log.md
Stage 7 — sequential
Spawn one sub-agent after stage 6 completes (or is skipped):
- Name:
fixer - Type:
general-purpose - Prompt:
Invoke the skill "kayba-pipeline:stage-7-fixer" using the Skill tool. Follow the skill instructions completely.
Error handling
- If any stage fails, log the failure in
with the stage number and erroreval/pipeline_log.md - Do not proceed to dependent stages if a prerequisite failed
- If Stage 1 fails (kayba CLI issues), ask the user whether to proceed without API insights — if yes, skip Stage 1 and have Stage 3 work from domain context + raw traces only
After completion
Update
eval/pipeline_log.md with final status for all stages. Report to the user:
- How many stages completed successfully
- Summary of metrics (from rubric)
- Summary of fixes applied (from changes log)