Metabase add-tracing

Add OpenTelemetry tracing spans to Clojure code following Metabase tracing conventions. Use when instrumenting backend code with trace coverage.

install
source · Clone the upstream repo
git clone https://github.com/metabase/metabase
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/metabase/metabase "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.claude/skills/add-tracing" ~/.claude/skills/metabase-metabase-add-tracing && rm -rf "$T"
manifest: .claude/skills/add-tracing/SKILL.md

Add Tracing Spans to Clojure Code

This skill helps you add OpenTelemetry (OTel) tracing spans to the Metabase backend codebase using the custom `tracing/with-span` macro.

Reference Files

  • `src/metabase/tracing/core.clj` - `with-span` macro, group registry, SDK lifecycle, `best-effort-sanitize-sql`, Pyroscope integration
  • `src/metabase/task/impl.clj` - `defjob` macro that wraps Quartz jobs with root spans
  • `.clj-kondo/config/modules/config.edn` - Module boundary configuration

Module Architecture

The tracing module has a deliberately minimal API surface. Only two namespaces are public (listed in `:api` in the module config):

| Namespace | Role | Status |
| --- | --- | --- |
| `tracing.core` | Primary API: `with-span`, groups, SDK lifecycle, Pyroscope, MDC, `best-effort-sanitize-sql` | Public API |
| `tracing.init` | Side-effect loader for `quartz` and `settings` | Public API (init convention) |
| `tracing.attributes` | `best-effort-sanitize-sql` implementation (re-exported via `tracing.core`) | Internal |
| `tracing.settings` | Setting definitions (`MB_TRACING_*` env vars) | Internal |
| `tracing.quartz` | Quartz JDBC proxy + JobListener | Internal |

Rules:

  • Only require `[metabase.tracing.core :as tracing]` from outside the module. `tracing/best-effort-sanitize-sql` and all other public functions are available from this single namespace.
  • Do not add new API namespaces. Add new public functions to `tracing.core` instead.
  • Do not require internal namespaces (`tracing.attributes`, `tracing.settings`, `tracing.quartz`) from outside the module.
  • `:uses :any` on the `core` module does NOT bypass the target module's `:api` check; internal namespaces are still enforced.

Cyclic Dependency Avoidance

`tracing/core.clj` is required by many modules across the codebase. It must NOT compile-time require `tracing.settings`, as this creates transitive cyclic load dependencies (e.g., `settings/core -> tracing/settings -> tracing/core -> events/impl -> events/core`).

Instead, `tracing/core.clj` uses `requiring-resolve` for settings access:

;; CORRECT — lazy runtime resolution, no compile-time dependency
((requiring-resolve 'metabase.tracing.settings/tracing-enabled))

;; WRONG — creates cyclic load dependency
(require '[metabase.tracing.settings :as settings])
(settings/tracing-enabled)

External library namespaces (clj-otel API, SDK, exporters) are safe to require normally — they don't participate in Metabase namespace cycles.

Important: `requiring-resolve` must use literal quoted symbols. Kondo hooks validate that `required-namespaces` are all simple symbols, so dynamic construction fails:

;; CORRECT — literal quoted symbol
(requiring-resolve 'metabase.tracing.settings/tracing-endpoint)

;; WRONG — kondo hook rejects this: "Assert failed: (every? simple-symbol? required-namespaces)"
(requiring-resolve (symbol "metabase.tracing.settings" "tracing-endpoint"))
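
The lazy-resolution pattern can be tried in isolation at a REPL. The sketch below is only an illustration: `clojure.string` stands in for `metabase.tracing.settings`, and `shout` is a hypothetical function name. The point is that the target namespace appears only as a quoted symbol inside the function body, so the calling namespace has no `:require` entry for it.

```clojure
;; Illustration only: clojure.string stands in for metabase.tracing.settings,
;; and `shout` is a made-up name. There is no :require for the target namespace;
;; it is loaded (if needed) and resolved when the function is first called.
(defn shout
  [s]
  ;; requiring-resolve returns a var; the outer parens invoke it.
  ((requiring-resolve 'clojure.string/upper-case) s))

(shout "tracing")
```

Because the symbol is a literal, static tools can still see which var is being referenced, which is the property the kondo hook relies on.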

Quick Checklist

When adding tracing spans:

  • Module has `tracing` in its `:uses` set in `.clj-kondo/config/modules/config.edn`
  • Added `[metabase.tracing.core :as tracing]` to ns requires (alphabetically sorted)
  • Span wraps a meaningful I/O boundary (not pure computation)
  • Group matches the domain (check `src/metabase/tracing/core.clj` for registered groups; add a new one if none fit)
  • Span name follows dot-notation convention (`"domain.subsystem.operation"`)
  • Attributes use namespaced keywords (`:search/query-length`, `:db/id`)
  • No sensitive data in attributes (use `best-effort-sanitize-sql` for HoneySQL, never raw SQL)
  • No new tracing namespaces created (add to `tracing.core` instead)
  • No `DO_NOT_ADD_NEW_FILES_HERE.txt` violations in the target directory
  • Run `clj-kondo --lint <files>` to verify 0 errors, 0 warnings
  • Add or update tests in the corresponding `test/` path (see Testing section below)
  • Run tests: `clojure -X:dev:test :only <test-ns>`

The `with-span` Macro

(tracing/with-span group span-name attrs & body)
  • group - A keyword selecting which trace group this span belongs to (e.g., `:tasks`, `:sync`)
  • span-name - A string identifying the span in traces (e.g., `"search.execute"`)
  • attrs - A map of span attributes (e.g., `{:db/id 42}`)
  • body - The code to execute inside the span

When the group is disabled, overhead is negligible: a single atom deref and a boolean check, after which the body runs directly. When the group is enabled, the macro creates an OTel span and injects `trace_id` / `span_id` into the Log4j2 MDC for log-to-trace correlation.
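
To make the disabled and enabled paths concrete, here is a minimal sketch of a group-gated span macro. It is not Metabase's implementation: `enabled-groups`, `recorded-spans`, `run-in-span`, and `with-span-sketch` are hypothetical names, and a vector in an atom stands in for the OTel exporter. It only mirrors the argument shape of `tracing/with-span` and the "check an atom, then either wrap or run directly" behavior described above.

```clojure
;; Hypothetical stand-ins: the real group registry and span creation live in
;; metabase.tracing.core and use the OpenTelemetry SDK.
(def enabled-groups (atom #{}))   ; groups currently switched on
(def recorded-spans (atom []))    ; where this sketch "exports" finished spans

(defn run-in-span
  "Runs thunk, then records a span map with its name, attributes, and duration."
  [span-name attrs thunk]
  (let [start (System/nanoTime)]
    (try
      (thunk)
      (finally
        (swap! recorded-spans conj {:name        span-name
                                    :attrs       attrs
                                    :duration-ns (- (System/nanoTime) start)})))))

(defmacro with-span-sketch
  "Same argument shape as tracing/with-span. When the group is off, the body
  runs without a span and the attrs map is never evaluated."
  [group span-name attrs & body]
  `(let [thunk# (fn [] ~@body)]
     (if (contains? @enabled-groups ~group)
       (run-in-span ~span-name ~attrs thunk#)
       (thunk#))))
```

With `enabled-groups` empty, `(with-span-sketch :search "search.execute" {} (+ 1 2))` simply returns the body's value; after `(reset! enabled-groups #{:search})` the same call also appends one entry to `recorded-spans`.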

Trace Groups

Groups are registered in `src/metabase/tracing/core.clj`. Check that file for the current list. The general rule: match the group to the domain, not the call site. If code runs inside a Quartz job but is logically search work, use `:search`, not `:tasks`.

To add a new group:

;; In src/metabase/tracing/core.clj
(register-group! :my-domain "Description of what this covers")

Users enable groups via `MB_TRACING_GROUPS=tasks,search,sync` (comma-separated, or `"all"`).
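
As a rough illustration of how such a value could be interpreted, the sketch below turns the env var string into either `:all` or a set of group keywords. `parse-groups` is a hypothetical helper, not a function from `metabase.tracing.core`; treating a blank value as `:all` is an assumption based on the documented default.

```clojure
(require '[clojure.string :as str])

(defn parse-groups
  "Hypothetical helper: parses an MB_TRACING_GROUPS-style string.
  Returns :all for \"all\" or a blank/missing value, otherwise a set of keywords."
  [s]
  (let [tokens (->> (str/split (or s "") #",")
                    (map str/trim)
                    (remove str/blank?)
                    (map str/lower-case))]
    (if (or (empty? tokens) (some #{"all"} tokens))
      :all
      (into #{} (map keyword) tokens))))

(parse-groups "tasks,search,sync")
```

Trimming and lower-casing each token makes the parser tolerant of values like `"Tasks, sync"`; whether the real setting is that lenient is not stated here.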

Naming Conventions

Span Names

Use dot-separated hierarchical names: `"domain.subsystem.operation"`. The domain prefix should match the group name:

search.execute              -- `:search` group
sync.fingerprint.table      -- `:sync` group
task.session-cleanup.delete -- `:tasks` group
db-app.collection-items     -- `:db-app` group

Attributes

Use namespaced keywords. The namespace groups related attributes:

:db/id              -- Database ID (integer)
:db/engine          -- Database engine name (string)
:db/statement       -- Sanitized SQL (string, via best-effort-sanitize-sql)
:search/engine      -- Search engine name (string)
:search/query-length -- Query string length (integer)
:sync/table         -- Table name (string)
:sync/step          -- Sync step name (string)
:task/name          -- Task name (string)
:http/method        -- HTTP method (string)
:http/url           -- Request URL (string)

Invent new namespaced attributes as needed (e.g., `:pulse/id`, `:transform/count`). Keep values as primitives (strings, numbers, booleans) -- no maps or collections.
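
These two rules (namespaced keys, primitive values) are easy to check mechanically. The following is a small sketch of such a check; `attr-problems` is a hypothetical helper for use in a REPL or a test, not part of the tracing module.

```clojure
(defn attr-problems
  "Hypothetical helper: returns a seq of [key problem] pairs for attributes
  that break the conventions above; empty when the map is fine."
  [attrs]
  (for [[k v] attrs
        :let [problem (cond
                        (not (qualified-keyword? k))
                        :key-not-namespaced

                        (not (or (string? v) (number? v) (boolean? v)))
                        :value-not-primitive)]
        :when problem]
    [k problem]))

(attr-problems {:db/id 42 :db/engine "postgres"})
(attr-problems {:id 42 :sync/tables ["orders" "people"]})
```

The first call returns an empty sequence; the second flags `:id` for lacking a namespace and `:sync/tables` for carrying a collection.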

Step-by-Step: Adding a Span

1. Check module boundaries

Look up the module for your namespace in `.clj-kondo/config/modules/config.edn`. If `tracing` is not in the module's `:uses` set, add it (keep alphabetically sorted):

my-module
{:team "MyTeam"
 :uses #{analytics config tracing util}}

2. Add the require

(ns metabase.my-module.thing
  (:require
   [metabase.tracing.core :as tracing]
   [metabase.util :as u]))

`best-effort-sanitize-sql` is available from `tracing.core`, so no additional require is needed.

3. Identify the I/O boundary

Only wrap code at meaningful I/O boundaries:

DO trace:

  • External API calls (embedding APIs, metabot, webhooks)
  • Database queries (both app DB and user DB)
  • Network requests (HTTP calls to external services)
  • Heavy batch processing (batch indexing, batch embedding)
  • Top-level orchestration functions that coordinate multiple sub-operations

DO NOT trace:

  • Pure computation (sorting, filtering, mapping)
  • Simple single-row lookups (`t2/select-one :model/Setting :key k`)
  • Every function in a call chain (only boundaries matter)
  • Trivial operations (string formatting, hash calculations)

4. Wrap with `with-span`

;; Simple span (no attributes needed)
(tracing/with-span :search "search.init-index" {}
  (do-expensive-thing))

;; Span with static attributes
(tracing/with-span :sync "sync.fingerprint.table"
                   {:db/id (:db_id table)
                    :sync/table (:name table)}
  (fingerprint-fields! table fields))

;; Span with computed attributes
(tracing/with-span :search "search.execute"
                   {:search/engine       (name (:search-engine ctx))
                    :search/query-length (count (:search-string ctx))}
  (search.engine/results ctx))

;; Span with sanitized SQL (for dynamic HoneySQL queries)
(let [hsql {:delete-from [(t2/table-name :model/Session)]
            :where [:< :created_at oldest-allowed]}]
  (tracing/with-span :tasks "task.session-cleanup.delete"
                     {:db/statement (tracing/best-effort-sanitize-sql hsql)}
    (t2/query-one hsql)))

;; Sub-spans breaking a function into I/O phases
(let [embedding (tracing/with-span :search "search.semantic.embedding"
                                   {:search.semantic/provider (:provider model)}
                  (get-embedding model search-string))
      results   (tracing/with-span :search "search.semantic.db-query" {}
                  (into [] xform reducible))]
  (process results))

;; Per-item iteration spans
(doseq [e (search.engine/active-engines)]
  (tracing/with-span :search "search.ingestion.update" {:search/engine (name e)}
    (search.engine/update! e batch)))

5. Add tests

Create or update tests in the corresponding `test/` path. Follow the patterns in existing tracing tests:

  • Reference tests: `test/metabase/tracing/quartz_test.clj`, `test/metabase/server/middleware/trace_test.clj`
  • Use `tracing/init-enabled-groups!` / `tracing/shutdown-groups!` with `try` / `finally` to manage group lifecycle
  • Test both enabled and disabled paths (verify the body runs without a span when the group is off)
  • Use `reify` mocks for Java interfaces (Connection, PreparedStatement, JobListener, etc.)
  • Add `(set! *warn-on-reflection* true)` and type-hint proxy/reify calls to avoid reflection warnings

(deftest my-span-enabled-test
  (testing "when group is enabled, span is created"
    (try
      (tracing/init-enabled-groups! "my-group" "INFO")
      ;; ... test that span behavior occurs ...
      (finally
        (tracing/shutdown-groups!)))))

(deftest my-span-disabled-test
  (testing "when group is disabled, code runs without tracing"
    (tracing/shutdown-groups!)
    ;; ... test that code still works, no wrapping applied ...
    ))
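
For the `reify` and type-hint bullets, a mock might look roughly like the sketch below. `mock-connection` is a hypothetical helper, not taken from the Metabase test suite; it implements only the two `java.sql.Connection` methods this example calls, and any other method would throw if invoked.

```clojure
(set! *warn-on-reflection* true)

(import '(java.sql Connection PreparedStatement))

(defn mock-connection
  "Hypothetical test helper: returns a Connection that appends every SQL string
  passed to prepareStatement onto the `calls` atom."
  ^Connection [calls]
  (reify Connection
    (^PreparedStatement prepareStatement [_ ^String sql]
      (swap! calls conj sql)
      nil)                       ; a fuller mock would return a reified PreparedStatement
    (close [_] nil)))

(def calls (atom []))
(def conn (mock-connection calls))

;; The ^Connection hint lets the compiler pick the method without reflection.
(.prepareStatement ^Connection conn "SELECT 1")
@calls
```

After the call, `@calls` holds the SQL that was prepared, which a test can assert against.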

6. Lint and run tests

# Lint modified source and test files — expect 0 errors, 0 warnings
clj-kondo --lint path/to/modified/file.clj path/to/test/file.clj

# Run tests (requires Java 21+)
clojure -X:dev:test :only my-ns.test-ns

Expect: all tests pass, 0 failures, 0 errors, no reflection warnings from your files.

Sanitizing SQL for Attributes

When including SQL in span attributes, always use `tracing/best-effort-sanitize-sql`. This converts HoneySQL maps to parameterized SQL strings in which values become `?` placeholders, so no data values end up in the trace.

(let [hsql {:delete-from [:core_session]
            :where [:< :created_at some-timestamp]}]
  (tracing/with-span :tasks "task.cleanup.delete"
                     {:db/statement (tracing/best-effort-sanitize-sql hsql)}
    (t2/query-one hsql)))
;; Trace attribute: db/statement = "DELETE FROM core_session WHERE created_at < ?"

Rules:

  • Never put raw SQL strings or user-provided values in attributes
  • Use `best-effort-sanitize-sql` only for app DB (HoneySQL) queries
  • For external/user DB queries, trace only timing and counts, not SQL content

Defjob and Root Spans

The `defjob` macro in `metabase.task.impl` automatically wraps every Quartz job with a `:tasks` root span:

(task/defjob ^{DisallowConcurrentExecution true} SessionCleanup [_]
  (cleanup-sessions!))
;; Automatically creates span: "task.SessionCleanup" {:task/name "SessionCleanup"}

You do NOT need a root span inside `defjob` bodies. Add child spans for I/O inside the job.

For code on plain `Thread`s (not Quartz), add the root span manually:

(defn init! []
  (tracing/with-span :search "search.task.init" {}
    (search/init-index!)))

What NOT to Do

Span Usage Mistakes

;; WRONG - pure computation, no I/O
(tracing/with-span :search "search.format-results" {}
  (map format-result results))

;; WRONG - trivial single-row lookup
(tracing/with-span :db-app "db-app.get-setting" {}
  (t2/select-one :model/Setting :key "my-setting"))

;; WRONG - raw SQL in attributes (data leak)
(tracing/with-span :tasks "task.cleanup" {:db/statement raw-sql-string}
  (execute! raw-sql-string))

;; WRONG - wrong group (search work should use :search, not :tasks)
(tracing/with-span :tasks "search.execute" {} ...)

;; WRONG - redundant nesting (do-search already has a span)
(tracing/with-span :search "search.process" {}
  (let [results (do-search ctx)]
    (tracing/with-span :search "search.format" {}
      (format-results results))))

Architecture Mistakes

;; WRONG - creating a new tracing namespace
(ns metabase.tracing.my-feature ...)

;; WRONG - requiring internal tracing namespaces from outside the module
(ns metabase.my-module.thing
  (:require [metabase.tracing.attributes :as trace-attrs]      ;; internal!
            [metabase.tracing.settings :as tracing.settings]   ;; internal!
            [metabase.tracing.quartz :as tracing.quartz]))     ;; internal!

;; WRONG - adding a compile-time require of tracing.settings to tracing/core.clj
;; This creates cyclic load dependencies because tracing/core is widely required
(ns metabase.tracing.core
  (:require [metabase.tracing.settings :as settings]))  ;; causes cycle!

;; WRONG - dynamic symbol construction with requiring-resolve (kondo rejects it)
(requiring-resolve (symbol "metabase.tracing.settings" "tracing-enabled"))

Configuration

All settings are env-var-only (defined in `src/metabase/tracing/settings.clj`):

# Core
MB_TRACING_ENABLED=true              # Enable tracing (default: false)
MB_TRACING_ENDPOINT=host:4317        # OTLP collector endpoint (default: http://localhost:4317)
MB_TRACING_GROUPS=tasks,search,sync  # Comma-separated groups or "all" (default: all)
MB_TRACING_SERVICE_NAME=metabase     # Service name in traces (default: hostname)
MB_TRACING_LOG_LEVEL=DEBUG           # Log threshold for traced threads: TRACE/DEBUG/INFO (default: INFO)

# Batch span processor tuning
MB_TRACING_MAX_QUEUE_SIZE=2048       # Max spans queued for export; drops when full (default: 2048)
MB_TRACING_EXPORT_TIMEOUT_MS=10000   # Max wait for batch export to complete (default: 10000)
MB_TRACING_SCHEDULE_DELAY_MS=5000    # Delay between consecutive batch exports (default: 5000)
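
When debugging a deployment it can help to see which values would apply once the defaults listed above are filled in. The sketch below does that for a map of environment variables. It is an illustration only: `tracing-defaults` and `effective-tracing-config` are hypothetical names, the real settings are defined in `src/metabase/tracing/settings.clj`, and `MB_TRACING_SERVICE_NAME` is left out because its default is the hostname.

```clojure
;; Defaults copied from the comments in the listing above.
(def tracing-defaults
  {"MB_TRACING_ENABLED"           "false"
   "MB_TRACING_ENDPOINT"          "http://localhost:4317"
   "MB_TRACING_GROUPS"            "all"
   "MB_TRACING_LOG_LEVEL"         "INFO"
   "MB_TRACING_MAX_QUEUE_SIZE"    "2048"
   "MB_TRACING_EXPORT_TIMEOUT_MS" "10000"
   "MB_TRACING_SCHEDULE_DELAY_MS" "5000"})

(defn effective-tracing-config
  "Hypothetical helper: merges an env map over the documented defaults and
  parses booleans and integers. `env` can be any map, e.g. (System/getenv)."
  [env]
  (let [value (fn [k] (or (get env k) (get tracing-defaults k)))]
    {:enabled?          (Boolean/parseBoolean (value "MB_TRACING_ENABLED"))
     :endpoint          (value "MB_TRACING_ENDPOINT")
     :groups            (value "MB_TRACING_GROUPS")
     :log-level         (value "MB_TRACING_LOG_LEVEL")
     :max-queue-size    (Long/parseLong (value "MB_TRACING_MAX_QUEUE_SIZE"))
     :export-timeout-ms (Long/parseLong (value "MB_TRACING_EXPORT_TIMEOUT_MS"))
     :schedule-delay-ms (Long/parseLong (value "MB_TRACING_SCHEDULE_DELAY_MS"))}))

(effective-tracing-config {"MB_TRACING_ENABLED" "true"
                           "MB_TRACING_GROUPS"  "tasks,search"})
```

Calling it with `(System/getenv)` instead of a literal map shows the values for the current process.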