Agentforce-adlc observing-agentforce

Analyze production Agentforce agent behavior using session traces and Data Cloud. TRIGGER when: user queries STDM session data or Data Cloud trace records; investigates production agent failures, regressions, or performance issues; asks about session traces, conversation logs, or agent metrics; wants to reproduce a reported production issue in preview; runs findSessions or trace analysis queries. DO NOT TRIGGER when: user creates, modifies, or debugs .agent files during development (use developing-agentforce); writes or runs test specs (use testing-agentforce); uses sf agent preview for local development iteration; deploys or publishes agents.

install
source · Clone the upstream repo
git clone https://github.com/SalesforceAIResearch/agentforce-adlc
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/SalesforceAIResearch/agentforce-adlc "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/observing-agentforce" ~/.claude/skills/salesforceairesearch-agentforce-adlc-observing-agentforce && rm -rf "$T"
manifest: skills/observing-agentforce/SKILL.md
source content

Agentforce Observability

Improve Agentforce agents using session trace data and live preview testing.

Three-phase workflow:

  • Observe -- Query STDM sessions from Data Cloud (if available), OR run test suites + preview with local traces as fallback
  • Reproduce -- Use
    sf agent preview
    to simulate problematic conversations live
  • Improve -- Edit the
    .agent
    file directly, validate, publish, verify

Platform Notes

  • Shell examples below use bash syntax. On Windows, use PowerShell equivalents or Git Bash.
  • Replace
    python3
    with
    python
    on Windows.
  • Replace
    /tmp/
    with
    $env:TEMP\
    (PowerShell) or
    %TEMP%\
    (cmd).
  • Replace
    jq
    with
    python -c "import json,sys; ..."
    if jq is not installed.

Routing

Gather these inputs before starting:

  • Org alias (required)
  • Agent API name (required for preview and deploy; ask if not provided)
  • Agent file path (optional) -- path to the
    .agent
    file, typically
    force-app/main/default/aiAuthoringBundles/<AgentName>/<AgentName>.agent
    . Auto-detect if not provided.
  • Session IDs (optional) -- analyze specific sessions; if absent, query last 7 days
  • Days to look back (optional, default 7)

Determine intent from user input:

  • No specific action -> run all three phases: Observe -> surface issues -> ask if user wants to Reproduce and/or Improve
  • "analyze" / "sessions" / "what's wrong" -> Phase 1 only, then suggest next steps
  • "reproduce" / "test" / "preview" -> Phase 2 (run Phase 1 first if no issues in hand)
  • "fix" / "improve" / "update" -> Phase 3 (run Phase 1 first if no issues in hand)

Resolve agent name

Before any STDM query, resolve the user-provided agent name against the org to get the exact

MasterLabel
and
DeveloperName
:

sf data query --json \
  --query "SELECT Id, MasterLabel, DeveloperName FROM GenAiPlannerDefinition WHERE MasterLabel LIKE '%<user-provided-name>%' OR DeveloperName LIKE '%<user-provided-name>%'" \
  -o <org>
  • MasterLabel
    = display name used by STDM
    findSessions
    and Agent Builder UI (e.g. "Order Service")
  • DeveloperName
    = API name with version suffix used in metadata (e.g. "OrderService_v9")
  • The
    --api-name
    flag for
    sf agent preview/activate/publish
    uses
    DeveloperName
    without the
    _vN
    suffix (e.g. "OrderService")

Store these values:

  • AGENT_MASTER_LABEL
    -- for
    findSessions()
    agent filter
  • AGENT_API_NAME
    --
    DeveloperName
    without
    _vN
    suffix, for
    sf agent
    CLI commands
  • PLANNER_ID
    -- the Salesforce record ID for this agent

Locate the .agent file

Step 1 -- Search locally:

find <project-root>/force-app/main/default/aiAuthoringBundles -name "*.agent" 2>/dev/null

If the user provided an agent file path, use that directly. Otherwise, search for files matching

AGENT_API_NAME
.

Step 2 -- If not found locally, retrieve from the org:

sf project retrieve start --json --metadata "AiAuthoringBundle:<AGENT_API_NAME>" -o <org>

Known bug:

sf project retrieve start
creates a double-nested path:
force-app/main/default/main/default/aiAuthoringBundles/...
. Fix it immediately after retrieve:

if [ -d "force-app/main/default/main/default/aiAuthoringBundles" ]; then
  mkdir -p force-app/main/default/aiAuthoringBundles
  cp -r force-app/main/default/main/default/aiAuthoringBundles/* \
    force-app/main/default/aiAuthoringBundles/
  rm -rf force-app/main/default/main
fi

Step 3 -- Validate the retrieved file:

Read the

.agent
file and verify it has proper Agent Script structure:

  • system:
    block with
    instructions:
  • config:
    block with
    developer_name:
  • start_agent
    or
    topic
    blocks with
    reasoning: instructions:
  • Each topic should have distinct
    instructions:
    content (not identical across topics)

Store the resolved path as

AGENT_FILE
for Phase 3.


Phase 0: Discover Data Space

Before running any STDM query, determine the correct Data Cloud Data Space API name.

sf api request rest "/services/data/v63.0/ssot/data-spaces" -o <org>

Note:

sf api request rest
is a beta command -- do not add
--json
(that flag is unsupported and causes an error).

The response shape is:

{
  "dataSpaces": [
    {
      "id": "0vhKh000000g3DjIAI",
      "label": "default",
      "name": "default",
      "status": "Active",
      "description": "Your org's default data space."
    }
  ],
  "totalSize": 1
}

The

name
field is the API name to pass to
AgentforceOptimizeService
.

Decision logic:

  • If the command fails (e.g. 404 or permission error), fall back to
    'default'
    and note it as an assumption.
  • Filter to only
    status: "Active"
    entries.
  • If exactly one active Data Space exists, use it automatically and confirm to the user: "Using Data Space:
    <name>
    ".
  • If multiple active Data Spaces exist, show the list (label + name) and ask the user which to use.

Store the selected

name
value as
DATA_SPACE
for all subsequent steps.

Prerequisite check: STDM DMOs

After deploying the helper class (step 1.0), run a quick probe to verify the STDM Data Model Objects exist in Data Cloud:

sf apex run -o <org> -f /dev/stdin << 'APEX'
ConnectApi.CdpQueryInput qi = new ConnectApi.CdpQueryInput();
qi.sql = 'SELECT ssot__Id__c FROM "ssot__AiAgentSession__dlm" LIMIT 1';
try {
    ConnectApi.CdpQueryOutputV2 out = ConnectApi.CdpQuery.queryAnsiSqlV2(qi, '<DATA_SPACE>');
    System.debug('STDM_CHECK:OK rows=' + (out.data != null ? out.data.size() : 0));
} catch (Exception e) {
    System.debug('STDM_CHECK:FAIL ' + e.getMessage());
}
APEX

If

STDM_CHECK:FAIL
: STDM is not activated. Inform the user and switch to Phase 1-ALT:

STDM (Session Trace Data Model) is not available in this org. To enable: Setup -> Data Cloud -> Data Streams and verify "Agentforce Activity" is active. Proceeding with fallback: test suites + local traces.

If

STDM_CHECK:OK
, proceed to Phase 1 (STDM path).


Phase 1-ALT: Observe Without STDM (Fallback Path)

When STDM is not available, use test suites and

sf agent preview --authoring-bundle
with local trace analysis.

Data sourceWhen to useProsCons
STDM (Phase 1)Historical production analysisReal user data, volumeRequires Data Cloud, 15-min lag
Test suites + local traces (Phase 1-ALT)Dev iteration, orgs without STDMInstant, full LLM prompt, variable statePreview only, no real user data

1-ALT.1 Run existing test suite (if available)

sf agent test list --json -o <org>
sf agent test run --json --api-name <TestSuiteName> --wait 10 --result-format json -o <org> | tee /tmp/test_run.json
JOB_ID=$(python3 -c "import json; print(json.load(open('/tmp/test_run.json'))['result']['runId'])")
sf agent test results --json --job-id "$JOB_ID" --result-format json -o <org>

1-ALT.2 Derive test utterances from .agent file (if no test suite)

If no test suite exists, derive utterances: one per non-entry topic (from

description:
keywords), one per key action, one guardrail test, one multi-turn test.

1-ALT.3 Preview with
--authoring-bundle
(local traces)

Run each test utterance through preview to generate local trace files:

sf agent preview start --json --authoring-bundle <BundleName> -o <org> | tee /tmp/preview_start.json
SESSION_ID=$(python3 -c "import json; print(json.load(open('/tmp/preview_start.json'))['result']['sessionId'])")

sf agent preview send --json --session-id "$SESSION_ID" --authoring-bundle <BundleName> \
  --utterance "$UTT" -o <org> | tee /tmp/preview_response.json

sf agent preview end --json --session-id "$SESSION_ID" --authoring-bundle <BundleName> -o <org>

Trace file location:

.sfdx/agents/{BundleName}/sessions/{sessionId}/traces/{planId}.json

1-ALT.4 Local trace diagnosis

Issue typeTrace command
Topic misroute
jq -r '.plan[] | select(.type=="NodeEntryStateStep") | .data.agent_name' "$TRACE"
Action not called
jq -r '.plan[] | select(.type=="EnabledToolsStep") | .data.enabled_tools[]' "$TRACE"
LOW adherence
jq -r '.plan[] | select(.type=="ReasoningStep") | {category, reason}' "$TRACE"
Variable capture fail
jq -r '.plan[] | select(.type=="VariableUpdateStep") | .data.variable_updates[]' "$TRACE"
Vague instructions
jq -r '.plan[] | select(.type=="LLMStep") | .data.messages_sent[0].content' "$TRACE"

DefaultTopic trace quirk: With

--authoring-bundle
, the root
.topic
field often shows
"DefaultTopic"
even when routing works. Always use
NodeEntryStateStep.data.agent_name
for the real topic chain.

Entry answering directly (SMALL_TALK pattern): If

start_agent
trace shows
SMALL_TALK
grounding and transition tools visible but none invoked, add "You are a router only. Do NOT answer questions directly." to
start_agent
instructions.

1-ALT.5 Classify and present

Classify issues using the categories in

references/issue-classification.md
. After presenting findings, automatically proceed to agent config evidence analysis.


Phase 1: Observe -- Query STDM

Full STDM query details, Apex service deployment, and response parsing: see

references/stdm-queries.md

1.0 Deploy helper class (once per org)

Deploy

AgentforceOptimizeService
Apex class to the org. Check if already deployed first:

sf data query --json --query "SELECT Id, Name FROM ApexClass WHERE Name = 'AgentforceOptimizeService'" -o <org>

If not deployed, copy from skill directory and deploy. See

references/stdm-queries.md
for full steps.

1.1 Find sessions

Query recent sessions using

findSessions()
. Parse
DEBUG|STDM_RESULT:
from the Apex debug log. If
findSessions
returns empty, switch to Phase 1-ALT.

1.2 Get conversation details

Use

getMultipleConversationDetails()
for up to 5 sessions (most recent first). Returns turn-by-turn data with messages, steps, topics, and action results.

1.2b Get LLM prompt/response (optional)

When LOW adherence detected, use

getLlmStepDetails()
to get the actual LLM prompt and response.

1.2c Get aggregated metrics (recommended first step)

Use

getAggregatedMetrics()
for high-level health dashboard: session rates, top intents, quality distribution, RAG averages.

1.2d Get moment insights (per-session detail)

Use

getMomentInsights()
for intent summaries, quality scores (1-5), and retriever metrics per session.

1.2e Run observability queries (RAG deep-dive)

Use

runObservabilityQuery()
for targeted RAG analysis: KnowledgeGap, Hallucination, RetrievalQuality, AnswerRelevancy, Leaderboard.

1.3 Reconstruct conversations

Render turn-by-turn timeline from

ConversationData
JSON for each session.

1.4 Identify issues

Full issue pattern table and classification categories: see

references/issue-classification.md

Check each session for: action errors, topic misroutes, missing actions, wrong inputs, variable capture failures, no transitions, slow actions, LOW adherence, abandoned sessions, dead topics, publish drift, dead hub anti-pattern, entry answering directly, and safety issues.

Priority: P1 = action errors, misroutes, LOW adherence; P2 = missing actions, variable bugs, knowledge gaps; P3 = performance, abandoned sessions.

1.5 Present findings and agent config evidence

Present sessions analyzed, issues grouped by root cause category, and uplift estimate. Then automatically proceed to analyze the

.agent
file to confirm root causes.

Full structural analysis checks, cross-reference procedures, and publish drift detection: see

references/issue-classification.md

Retrieve the

.agent
file from the org, run automated checks (topic count vs action blocks, dead hub detection, orphan actions, cross-topic variable dependencies), and cross-reference STDM symptoms against the file structure.


Phase 2: Reproduce -- Live Preview

Full preview procedures, trace diagnosis commands, and classification criteria: see

references/reproduce-reference.md

Build one test scenario per confirmed issue from Phase 1. Run each through

sf agent preview
with
--authoring-bundle
(generates local traces). Run each scenario 3 times and classify:

VerdictCriteria
[CONFIRMED]
Same failure in 3/3 runs
[INTERMITTENT]
Failure in 1-2 of 3 runs
[NOT REPRODUCED]
Passes in 3/3 runs

Only

[CONFIRMED]
and
[INTERMITTENT]
issues proceed to Phase 3.

Key commands:

sf agent preview start --json --authoring-bundle <Name> -o <org>
sf agent preview send --json --session-id "$SID" --utterance "<text>" --authoring-bundle <Name> -o <org>
sf agent preview end --json --session-id "$SID" --authoring-bundle <Name> -o <org>

Trace location:

.sfdx/agents/{Name}/sessions/{sessionId}/traces/{planId}.json


Phase 3: Improve -- Edit .agent File Directly

Full procedures for pre-flight checks, fix mapping, instruction principles, regression prevention, deployment chain, verification, safety re-verification, and test case creation: see

references/improve-reference.md

3.0 Pre-flight

Verify all action targets exist and are registered in the org before editing. If targets are missing, present options: deploy stubs, remove actions, register via UI, or proceed with routing-only fixes.

3.1-3.3 Map issue, edit, and follow instruction principles

Map each confirmed issue to a fix location in the

.agent
file (description, instructions, actions, bindings, transitions). Use the Edit tool for targeted changes. Follow instruction principles: name actions explicitly, state pre-conditions, scope tightly, keep persona in
system:
only.

3.4 Regression prevention

Establish baseline before editing. Make minimal edits. Test immediately after each edit. One fix per publish cycle. Check cross-topic dependencies. Test adjacent topics.

3.5 Apply fixes

Read the

.agent
file, edit with the Edit tool (tabs for indentation), show the diff.

3.6 Validate, deploy, publish, activate

# Validate (dry run)
sf agent validate authoring-bundle --json --api-name <AGENT_API_NAME> -o <org>

# Publish (compile + deploy + activate)
sf agent publish authoring-bundle --json --api-name <AGENT_API_NAME> -o <org>

If publish fails, use deploy + activate fallback (note: incomplete -- does not propagate

reasoning: actions:
to live metadata).

3.7 Verify

Run Phase 2 scenarios post-fix. Check trace for correct routing, grounding, tools, and variables. After 24-48 hours, re-run Phase 1 to compare against baseline.

3.7b Safety re-verification (required)

Re-run safety review (

Section 15 of /developing-agentforce
) on the modified
.agent
file. Revert any changes that introduce BLOCK findings.

3.8 Update Testing Center test cases

Create regression test cases from confirmed issues in Testing Center YAML format. Deploy with

sf agent test create
and verify all previously-broken scenarios pass.


Reference Files

ReferenceContents
references/stdm-queries.md
STDM query procedures, Apex service deployment, response parsing
references/issue-classification.md
Issue pattern table, root cause categories, structural analysis checks
references/reproduce-reference.md
Phase 2 preview procedures, trace diagnosis, classification criteria
references/improve-reference.md
Phase 3 editing, deployment chain, verification, safety, test cases
references/stdm-schema.md
DMO field schemas, data hierarchy, quality notes, agent name resolution