Metabase add-tracing
Add OpenTelemetry tracing spans to Clojure code following Metabase tracing conventions. Use when instrumenting backend code with trace coverage.
T=$(mktemp -d) && git clone --depth=1 https://github.com/metabase/metabase "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.claude/skills/add-tracing" ~/.claude/skills/metabase-metabase-add-tracing && rm -rf "$T"
.claude/skills/add-tracing/SKILL.md
Add Tracing Spans to Clojure Code
This skill helps you add OpenTelemetry (OTel) tracing spans to the Metabase backend codebase using the custom
tracing/with-span macro.
Reference Files
- src/metabase/tracing/core.clj: with-span macro, group registry, SDK lifecycle, Pyroscope integration, best-effort-sanitize-sql
- src/metabase/task/impl.clj: defjob macro that wraps Quartz jobs with root spans
- .clj-kondo/config/modules/config.edn: Module boundary configuration
Module Architecture
The tracing module has a deliberately minimal API surface. Only 2 namespaces are public (listed in :api in the module config):
| Namespace | Role | Status |
|---|---|---|
| tracing.core | Primary API: with-span, groups, SDK lifecycle, Pyroscope, MDC, best-effort-sanitize-sql | Public API |
| … | Side-effect loader for … and … | Public API (init convention) |
| tracing.attributes | best-effort-sanitize-sql implementation (re-exported via tracing.core) | Internal |
| tracing.settings | Setting definitions (MB_TRACING_* env vars) | Internal |
| tracing.quartz | Quartz JDBC proxy + JobListener | Internal |
Rules:
- Only require [metabase.tracing.core :as tracing] from outside the module. tracing/best-effort-sanitize-sql and all other public functions are available from this single namespace.
- Do not add new API namespaces. Add new public functions to tracing.core instead.
- Do not require internal namespaces (tracing.attributes, tracing.settings, tracing.quartz) from outside the module.
- :uses :any on the core module does NOT bypass the target module's :api check; internal namespaces are still enforced.
Cyclic Dependency Avoidance
tracing/core.clj is required by many modules across the codebase. It must NOT compile-time require tracing.settings, as this creates transitive cyclic load dependencies (e.g., settings/core -> tracing/settings -> tracing/core -> events/impl -> events/core).
Instead, tracing/core.clj uses requiring-resolve for settings access:

    ;; CORRECT - lazy runtime resolution, no compile-time dependency
    ((requiring-resolve 'metabase.tracing.settings/tracing-enabled))

    ;; WRONG - creates cyclic load dependency
    (require '[metabase.tracing.settings :as settings])
    (settings/tracing-enabled)

External library namespaces (clj-otel API, SDK, exporters) are safe to require normally — they don't participate in Metabase namespace cycles.
Important: requiring-resolve must use literal quoted symbols. Kondo hooks validate that required-namespaces are all simple symbols, so dynamic construction fails:

    ;; CORRECT - literal quoted symbol
    (requiring-resolve 'metabase.tracing.settings/tracing-endpoint)

    ;; WRONG - kondo hook rejects this:
    ;; "Assert failed: (every? simple-symbol? required-namespaces)"
    (requiring-resolve (symbol "metabase.tracing.settings" "tracing-endpoint"))

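As a minimal, self-contained sketch of the lazy-resolution pattern, the snippet below uses clojure.string/upper-case as a stand-in for a settings var. The function name lazy-upper-case is hypothetical and not part of Metabase. The target namespace is resolved when the function is first called, not when the calling namespace is compiled.

```clojure
;; Stand-in example: clojure.string plays the role that
;; metabase.tracing.settings plays in tracing/core.clj (an assumption
;; made purely for illustration).
(defn lazy-upper-case
  "Resolves clojure.string/upper-case at call time, so the calling
  namespace needs no compile-time require on clojure.string."
  [s]
  ;; Literal quoted symbol, as the kondo hook requires.
  ((requiring-resolve 'clojure.string/upper-case) s))

(lazy-upper-case "otel") ;; => "OTEL"
```

Wrapping the call in a function keeps resolution out of namespace load time, which is what breaks the cycle.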
Quick Checklist
When adding tracing spans:
- Module has tracing in its :uses set in .clj-kondo/config/modules/config.edn
- Added [metabase.tracing.core :as tracing] to ns requires (alphabetically sorted)
- Span wraps a meaningful I/O boundary (not pure computation)
- Group matches the domain (check src/metabase/tracing/core.clj for registered groups; add a new one if none fit)
- Span name follows dot-notation convention ("domain.subsystem.operation")
- Attributes use namespaced keywords (:search/query-length, :db/id)
- No sensitive data in attributes (use best-effort-sanitize-sql for HoneySQL, never raw SQL)
- No new tracing namespaces created (add to tracing.core instead)
- No DO_NOT_ADD_NEW_FILES_HERE.txt violations in the target directory
- Run clj-kondo --lint <files> to verify 0 errors, 0 warnings
- Add or update tests in the corresponding test/ path (see step 5, Add tests, below)
- Run tests: clojure -X:dev:test :only <test-ns>
The with-span Macro

    (tracing/with-span group span-name attrs & body)

- group - A keyword selecting which trace group this span belongs to (e.g., :tasks, :sync)
- span-name - A string identifying the span in traces (e.g., "search.execute")
- attrs - A map of span attributes (e.g., {:db/id 42})
- body - The code to execute inside the span

When disabled: near-zero overhead -- a single atom deref plus a boolean check, then the body runs directly. When enabled: creates an OTel span AND injects trace_id/span_id into Log4j2 MDC for log-to-trace correlation.
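The enabled/disabled behavior can be illustrated with a toy macro. This is a sketch under stated assumptions, not Metabase's implementation: with-span-sketch, enabled-groups, recorded-spans, and run-in-span are hypothetical names, and spans are appended to an atom instead of being sent to OpenTelemetry.

```clojure
;; Hypothetical stand-ins: a set of enabled groups and an in-memory span log.
(def enabled-groups (atom #{}))
(def recorded-spans (atom []))

(defn run-in-span
  "Runs thunk and records a span map (name, attrs, duration) when it finishes,
  whether it returns normally or throws."
  [span-name attrs thunk]
  (let [start (System/nanoTime)]
    (try
      (thunk)
      (finally
        (swap! recorded-spans conj
               {:name        span-name
                :attrs       attrs
                :duration-ns (- (System/nanoTime) start)})))))

(defmacro with-span-sketch
  "Toy version of the gating idea: one atom deref and a set lookup decide
  whether the body is wrapped in a span or run directly."
  [group span-name attrs & body]
  `(if (contains? @enabled-groups ~group)
     (run-in-span ~span-name ~attrs (fn [] ~@body))
     (do ~@body)))

;; Group disabled: the body runs directly and nothing is recorded.
(with-span-sketch :search "search.execute" {} (+ 1 2)) ;; => 3

;; Group enabled: same return value, plus one recorded span.
(reset! enabled-groups #{:search})
(with-span-sketch :search "search.execute" {:db/id 42} (+ 1 2)) ;; => 3
```

Because the check lives in the macro expansion, the disabled path never allocates the closure or evaluates the attrs map.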
Trace Groups
Groups are registered in src/metabase/tracing/core.clj. Check that file for the current list. The general rule: match the group to the domain, not the call site. If code runs inside a Quartz job but is logically search work, use :search, not :tasks.
To add a new group:

    ;; In src/metabase/tracing/core.clj
    (register-group! :my-domain "Description of what this covers")

Users enable groups via MB_TRACING_GROUPS=tasks,search,sync (comma-separated, or "all").
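To make the MB_TRACING_GROUPS format concrete, here is one way such a value could be parsed. The function parse-groups is hypothetical; how Metabase actually parses the setting is not shown in this document, so treat this as an illustration of the format only.

```clojure
(require '[clojure.string :as str])

(defn parse-groups
  "Turns a comma-separated group string into a set of keywords,
  or :all when the literal \"all\" appears. Blank input yields #{}."
  [s]
  (let [names (->> (str/split (or s "") #",")
                   (map str/trim)
                   (remove str/blank?))]
    (if (some #{"all"} names)
      :all
      (set (map keyword names)))))

(parse-groups "tasks,search,sync") ;; => #{:tasks :search :sync}
(parse-groups "all")               ;; => :all
```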
Naming Conventions
Span Names
Use dot-separated hierarchical names: "domain.subsystem.operation". The domain prefix should match the group name:

    search.execute               -- :search group
    sync.fingerprint.table       -- :sync group
    task.session-cleanup.delete  -- :tasks group
    db-app.collection-items      -- :db-app group

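A quick way to sanity-check names against this convention is sketched below. Both helpers (well-formed-span-name? and span-domain) are hypothetical, and the regex is an assumption about what counts as a segment. The prefix is not required to equal the group keyword exactly, since the examples above pair the task prefix with the :tasks group.

```clojure
(require '[clojure.string :as str])

(defn well-formed-span-name?
  "True when span-name has at least two dot-separated segments made of
  letters, digits, or hyphens."
  [span-name]
  (boolean (re-matches #"[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+" span-name)))

(defn span-domain
  "Returns the first dot-separated segment of span-name."
  [span-name]
  (first (str/split span-name #"\.")))

(well-formed-span-name? "sync.fingerprint.table") ;; => true
(well-formed-span-name? "search")                 ;; => false
(span-domain "db-app.collection-items")           ;; => "db-app"
```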
Attributes
Use namespaced keywords. The namespace groups related attributes:

    :db/id                -- Database ID (integer)
    :db/engine            -- Database engine name (string)
    :db/statement         -- Sanitized SQL (string, via best-effort-sanitize-sql)
    :search/engine        -- Search engine name (string)
    :search/query-length  -- Query string length (integer)
    :sync/table           -- Table name (string)
    :sync/step            -- Sync step name (string)
    :task/name            -- Task name (string)
    :http/method          -- HTTP method (string)
    :http/url             -- Request URL (string)

Invent new namespaced attributes as needed (e.g., :pulse/id, :transform/count). Keep values as primitives (strings, numbers, booleans) -- no maps or collections.
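The two attribute rules (namespaced keys, primitive values) can be checked mechanically. The helper below, invalid-attrs, is a hypothetical sketch and not part of the tracing module; it returns the entries that break either rule.

```clojure
(defn invalid-attrs
  "Returns a map of the entries in attrs that violate the conventions:
  keys must be namespaced keywords, values must be strings, numbers,
  or booleans."
  [attrs]
  (into {}
        (remove (fn [[k v]]
                  (and (qualified-keyword? k)
                       (or (string? v) (number? v) (boolean? v)))))
        attrs))

(invalid-attrs {:db/id 42 :db/engine "postgres"}) ;; => {}
(invalid-attrs {:pulse/id 1 :ids [1 2]})          ;; => {:ids [1 2]}
```

A check like this could run in a test to catch a map or collection slipping into span attributes.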
Step-by-Step: Adding a Span
1. Check module boundaries
Look up the module for your namespace in .clj-kondo/config/modules/config.edn. If tracing is not in the module's :uses set, add it (keep alphabetically sorted):

    my-module
    {:team "MyTeam"
     :uses #{analytics config tracing util}}

2. Add the require

    (ns metabase.my-module.thing
      (:require
       [metabase.tracing.core :as tracing]
       [metabase.util :as u]))

best-effort-sanitize-sql is available from tracing.core — no additional require needed.
3. Identify the I/O boundary
Only wrap code at meaningful I/O boundaries:
DO trace:
- External API calls (embedding APIs, metabot, webhooks)
- Database queries (both app DB and user DB)
- Network requests (HTTP calls to external services)
- Heavy batch processing (batch indexing, batch embedding)
- Top-level orchestration functions that coordinate multiple sub-operations
DO NOT trace:
- Pure computation (sorting, filtering, mapping)
- Simple single-row lookups (t2/select-one :model/Setting :key k)
- Every function in a call chain (only boundaries matter)
- Trivial operations (string formatting, hash calculations)
4. Wrap with with-span

    ;; Simple span (no attributes needed)
    (tracing/with-span :search "search.init-index" {}
      (do-expensive-thing))

    ;; Span with static attributes
    (tracing/with-span :sync "sync.fingerprint.table"
      {:db/id      (:db_id table)
       :sync/table (:name table)}
      (fingerprint-fields! table fields))

    ;; Span with computed attributes
    (tracing/with-span :search "search.execute"
      {:search/engine       (name (:search-engine ctx))
       :search/query-length (count (:search-string ctx))}
      (search.engine/results ctx))

    ;; Span with sanitized SQL (for dynamic HoneySQL queries)
    (let [hsql {:delete-from [(t2/table-name :model/Session)]
                :where       [:< :created_at oldest-allowed]}]
      (tracing/with-span :tasks "task.session-cleanup.delete"
        {:db/statement (tracing/best-effort-sanitize-sql hsql)}
        (t2/query-one hsql)))

    ;; Sub-spans breaking a function into I/O phases
    (let [embedding (tracing/with-span :search "search.semantic.embedding"
                      {:search.semantic/provider (:provider model)}
                      (get-embedding model search-string))
          results   (tracing/with-span :search "search.semantic.db-query" {}
                      (into [] xform reducible))]
      (process results))

    ;; Per-item iteration spans
    (doseq [e (search.engine/active-engines)]
      (tracing/with-span :search "search.ingestion.update"
        {:search/engine (name e)}
        (search.engine/update! e batch)))

5. Add tests
Create or update tests in the corresponding test/ path. Follow the patterns in existing tracing tests:
- Reference tests: test/metabase/tracing/quartz_test.clj, test/metabase/server/middleware/trace_test.clj
- Use tracing/init-enabled-groups! / tracing/shutdown-groups! with try/finally to manage group lifecycle
- Test both enabled and disabled paths (verify zero overhead when group is off)
- Use reify mocks for Java interfaces (Connection, PreparedStatement, JobListener, etc.)
- Add (set! *warn-on-reflection* true) and type-hint proxy/reify calls to avoid reflection warnings

    (deftest my-span-enabled-test
      (testing "when group is enabled, span is created"
        (try
          (tracing/init-enabled-groups! "my-group" "INFO")
          ;; ... test that span behavior occurs ...
          (finally
            (tracing/shutdown-groups!)))))

    (deftest my-span-disabled-test
      (testing "when group is disabled, code runs without tracing"
        (tracing/shutdown-groups!)
        ;; ... test that code still works, no wrapping applied ...
        ))

6. Lint and run tests

    # Lint modified source and test files; expect 0 errors, 0 warnings
    clj-kondo --lint path/to/modified/file.clj path/to/test/file.clj

    # Run tests (requires Java 21+)
    clojure -X:dev:test :only my-ns.test-ns

Expect: all tests pass, 0 failures, 0 errors, no reflection warnings from your files.
Sanitizing SQL for Attributes
When including SQL in span attributes, always use tracing/best-effort-sanitize-sql. This converts HoneySQL maps to parameterized SQL strings where values become ? placeholders -- no data leaks.

    (let [hsql {:delete-from [:core_session]
                :where       [:< :created_at some-timestamp]}]
      (tracing/with-span :tasks "task.cleanup.delete"
        {:db/statement (tracing/best-effort-sanitize-sql hsql)}
        (t2/query-one hsql)))
    ;; Trace attribute: db/statement = "DELETE FROM core_session WHERE created_at < ?"

Rules:
- Never put raw SQL strings or user-provided values in attributes
- Use best-effort-sanitize-sql only for app DB (HoneySQL) queries
- For external/user DB queries, trace only timing and counts, not SQL content
Defjob and Root Spans
The defjob macro in metabase.task.impl automatically wraps every Quartz job with a :tasks root span:

    (task/defjob ^{DisallowConcurrentExecution true} SessionCleanup [_]
      (cleanup-sessions!))
    ;; Automatically creates span: "task.SessionCleanup" {:task/name "SessionCleanup"}

You do NOT need a root span inside defjob bodies. Add child spans for I/O inside the job.
For code on plain Threads (not Quartz), add the root span manually:

    (defn init! []
      (tracing/with-span :search "search.task.init" {}
        (search/init-index!)))

What NOT to Do
Span Usage Mistakes

    ;; WRONG - pure computation, no I/O
    (tracing/with-span :search "search.format-results" {}
      (map format-result results))

    ;; WRONG - trivial single-row lookup
    (tracing/with-span :db-app "db-app.get-setting" {}
      (t2/select-one :model/Setting :key "my-setting"))

    ;; WRONG - raw SQL in attributes (data leak)
    (tracing/with-span :tasks "task.cleanup" {:db/statement raw-sql-string}
      (execute! raw-sql-string))

    ;; WRONG - wrong group (search work should use :search, not :tasks)
    (tracing/with-span :tasks "search.execute" {} ...)

    ;; WRONG - redundant nesting (do-search already has a span)
    (tracing/with-span :search "search.process" {}
      (let [results (do-search ctx)]
        (tracing/with-span :search "search.format" {}
          (format-results results))))

Architecture Mistakes

    ;; WRONG - creating a new tracing namespace
    (ns metabase.tracing.my-feature ...)

    ;; WRONG - requiring internal tracing namespaces from outside the module
    (ns metabase.my-module.thing
      (:require
       [metabase.tracing.attributes :as trace-attrs]      ;; internal!
       [metabase.tracing.settings :as tracing.settings]   ;; internal!
       [metabase.tracing.quartz :as tracing.quartz]))     ;; internal!

    ;; WRONG - adding compile-time requires to tracing/core.clj for settings or SDK
    ;; This creates cyclic load dependencies because tracing/core is widely required
    (ns metabase.tracing.core
      (:require
       [metabase.tracing.settings :as settings]))         ;; causes cycle!

    ;; WRONG - dynamic symbol construction with requiring-resolve (kondo rejects it)
    (requiring-resolve (symbol "metabase.tracing.settings" "tracing-enabled"))

Configuration
All settings are env-var-only (defined in src/metabase/tracing/settings.clj):

    # Core
    MB_TRACING_ENABLED=true               # Enable tracing (default: false)
    MB_TRACING_ENDPOINT=host:4317         # OTLP collector endpoint (default: http://localhost:4317)
    MB_TRACING_GROUPS=tasks,search,sync   # Comma-separated groups or "all" (default: all)
    MB_TRACING_SERVICE_NAME=metabase      # Service name in traces (default: hostname)
    MB_TRACING_LOG_LEVEL=DEBUG            # Log threshold for traced threads: TRACE/DEBUG/INFO (default: INFO)

    # Batch span processor tuning
    MB_TRACING_MAX_QUEUE_SIZE=2048        # Max spans queued for export; drops when full (default: 2048)
    MB_TRACING_EXPORT_TIMEOUT_MS=10000    # Max wait for batch export to complete (default: 10000)
    MB_TRACING_SCHEDULE_DELAY_MS=5000     # Delay between consecutive batch exports (default: 5000)