# dotfiles · databricks-model-serving

Deploy and query Databricks Model Serving endpoints. Use when (1) deploying MLflow models or AI agents to endpoints, (2) creating ChatAgent/ResponsesAgent agents, (3) integrating UC Functions or Vector Search tools, (4) querying deployed endpoints, (5) checking endpoint status. Covers classical ML models, custom pyfunc, and GenAI agents.

## Install

Source: clone the upstream repo:

```shell
git clone https://github.com/msbaek/dotfiles
```

Claude Code: install into `~/.claude/skills/`:

```shell
T=$(mktemp -d) && git clone --depth=1 https://github.com/msbaek/dotfiles "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.claude/skills/databricks-model-serving" ~/.claude/skills/msbaek-dotfiles-databricks-model-serving && rm -rf "$T"
```

Manifest: `.claude/skills/databricks-model-serving/SKILL.md`

## Source content

# Databricks Model Serving

Deploy MLflow models and AI agents to scalable REST API endpoints.

## Quick Decision: What Are You Deploying?

| Model Type | Pattern | Reference |
|---|---|---|
| Traditional ML (sklearn, xgboost) | `mlflow.sklearn.autolog()` | 1-classical-ml.md |
| Custom Python model | `mlflow.pyfunc.PythonModel` | 2-custom-pyfunc.md |
| GenAI Agent (LangGraph, tool-calling) | `ResponsesAgent` | 3-genai-agents.md |

## Prerequisites

- DBR 16.1+ recommended (pre-installed GenAI packages)
- Unity Catalog enabled workspace
- Model Serving enabled

## Foundation Model API Endpoints

**ALWAYS** use exact endpoint names from this table. **NEVER** guess or abbreviate.

### Chat / Instruct Models

| Endpoint Name | Provider | Notes |
|---|---|---|
| `databricks-gpt-5-2` | OpenAI | Latest GPT, 400K context |
| `databricks-gpt-5-1` | OpenAI | Instant + Thinking modes |
| `databricks-gpt-5-1-codex-max` | OpenAI | Code-specialized (high perf) |
| `databricks-gpt-5-1-codex-mini` | OpenAI | Code-specialized (cost-opt) |
| `databricks-gpt-5` | OpenAI | 400K context, reasoning |
| `databricks-gpt-5-mini` | OpenAI | Cost-optimized reasoning |
| `databricks-gpt-5-nano` | OpenAI | High-throughput, lightweight |
| `databricks-gpt-oss-120b` | OpenAI | Open-weight, 128K context |
| `databricks-gpt-oss-20b` | OpenAI | Lightweight open-weight |
| `databricks-claude-opus-4-6` | Anthropic | Most capable, 1M context |
| `databricks-claude-sonnet-4-6` | Anthropic | Hybrid reasoning |
| `databricks-claude-sonnet-4-5` | Anthropic | Hybrid reasoning |
| `databricks-claude-opus-4-5` | Anthropic | Deep analysis, 200K context |
| `databricks-claude-sonnet-4` | Anthropic | Hybrid reasoning |
| `databricks-claude-opus-4-1` | Anthropic | 200K context, 32K output |
| `databricks-claude-haiku-4-5` | Anthropic | Fastest, cost-effective |
| `databricks-claude-3-7-sonnet` | Anthropic | Retiring April 2026 |
| `databricks-meta-llama-3-3-70b-instruct` | Meta | 128K context, multilingual |
| `databricks-meta-llama-3-1-405b-instruct` | Meta | Retiring May 2026 (PT) |
| `databricks-meta-llama-3-1-8b-instruct` | Meta | Lightweight, 128K context |
| `databricks-llama-4-maverick` | Meta | MoE architecture |
| `databricks-gemini-3-1-pro` | Google | 1M context, hybrid reasoning |
| `databricks-gemini-3-pro` | Google | 1M context, hybrid reasoning |
| `databricks-gemini-3-flash` | Google | Fast, cost-efficient |
| `databricks-gemini-2-5-pro` | Google | 1M context, Deep Think |
| `databricks-gemini-2-5-flash` | Google | 1M context, hybrid reasoning |
| `databricks-gemma-3-12b` | Google | 128K context, multilingual |
| `databricks-qwen3-next-80b-a3b-instruct` | Alibaba | Efficient MoE |

### Embedding Models

| Endpoint Name | Dimensions | Max Tokens | Notes |
|---|---|---|---|
| `databricks-gte-large-en` | 1024 | 8192 | English, not normalized |
| `databricks-bge-large-en` | 1024 | 512 | English, normalized |
| `databricks-qwen3-embedding-0-6b` | up to 1024 | ~32K | 100+ languages, instruction-aware |
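Because `databricks-gte-large-en` returns vectors that are not normalized (see the table above), normalize before comparing embeddings with a plain dot product. A minimal, self-contained sketch in pure Python, with no Databricks calls:

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (no-op for the zero vector)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)

def cosine_similarity(a, b):
    """Dot product of the unit-normalized vectors."""
    ua, ub = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(ua, ub))

# cosine_similarity is safe for both normalized (bge) and
# non-normalized (gte) embedding outputs.
```

For already-normalized models such as `databricks-bge-large-en`, the dot product alone is equivalent.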

### Common Defaults

- Agent LLM: `databricks-meta-llama-3-3-70b-instruct` (good balance of quality/cost)
- Embedding: `databricks-gte-large-en`
- Code tasks: `databricks-gpt-5-1-codex-mini` or `databricks-gpt-5-1-codex-max`

These are pay-per-token endpoints available in every workspace. For production, consider provisioned throughput mode. See supported models.
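Pay-per-token chat endpoints can also be called directly over REST at `POST {host}/serving-endpoints/{name}/invocations` with an OpenAI-style `messages` body. A hedged, standard-library-only sketch (the host and token below are placeholders for your workspace):

```python
import json
import urllib.request

def build_chat_request(host, endpoint, token, messages, max_tokens=256):
    """Build the invocation request for a chat serving endpoint."""
    url = f"{host}/serving-endpoints/{endpoint}/invocations"
    body = json.dumps({"messages": messages, "max_tokens": max_tokens}).encode()
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )

if __name__ == "__main__":
    req = build_chat_request(
        "https://my-workspace.cloud.databricks.com",  # hypothetical host
        "databricks-meta-llama-3-3-70b-instruct",
        "dapi...",  # personal access token (placeholder)
        [{"role": "user", "content": "Hello!"}],
    )
    # Sending requires real credentials, so it is left commented out:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp))
```

The same route works for the MCP `manage_serving_endpoint(action="query")` examples later in this document; the MCP tool simply wraps this call.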

## Reference Files

| Topic | File | When to Read |
|---|---|---|
| Classical ML | 1-classical-ml.md | sklearn, xgboost, autolog |
| Custom PyFunc | 2-custom-pyfunc.md | Custom preprocessing, signatures |
| GenAI Agents | 3-genai-agents.md | ResponsesAgent, LangGraph |
| Tools Integration | 4-tools-integration.md | UC Functions, Vector Search |
| Development & Testing | 5-development-testing.md | MCP workflow, iteration |
| Logging & Registration | 6-logging-registration.md | `mlflow.pyfunc.log_model` |
| Deployment | 7-deployment.md | Job-based async deployment |
| Querying Endpoints | 8-querying-endpoints.md | SDK, REST, MCP tools |
| Package Requirements | 9-package-requirements.md | DBR versions, pip |

## Quick Start: Deploy a GenAI Agent

### Step 1: Install Packages (in notebook or via MCP)

```python
%pip install -U mlflow==3.6.0 databricks-langchain langgraph==0.3.4 databricks-agents pydantic
dbutils.library.restartPython()
```

Or via MCP:

```python
execute_code(code="%pip install -U mlflow==3.6.0 databricks-langchain langgraph==0.3.4 databricks-agents pydantic")
```
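After the Python restart, it is worth confirming that the pinned versions actually resolved (mismatches here are the usual cause of the "Package not found" issue listed under Common Issues). A standalone sketch using only the standard library:

```python
from importlib import metadata

def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

# Pins copied from the %pip install line above.
for pkg, pinned in {"mlflow": "3.6.0", "langgraph": "0.3.4"}.items():
    got = installed_version(pkg)
    status = "OK" if got == pinned else f"MISMATCH (got {got})"
    print(f"{pkg}: want {pinned}, {status}")
```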

### Step 2: Create Agent File

Create `agent.py` locally with the `ResponsesAgent` pattern (see 3-genai-agents.md).

### Step 3: Upload to Workspace

```python
manage_workspace_files(
    action="upload",
    local_path="./my_agent",
    workspace_path="/Workspace/Users/you@company.com/my_agent"
)
```

### Step 4: Test Agent

```python
execute_code(
    file_path="./my_agent/test_agent.py",
    cluster_id="<cluster_id>"
)
```

### Step 5: Log Model

```python
execute_code(
    file_path="./my_agent/log_model.py",
    cluster_id="<cluster_id>"
)
```

### Step 6: Deploy (Async via Job)

See 7-deployment.md for job-based deployment that doesn't time out.

### Step 7: Query Endpoint

```python
manage_serving_endpoint(
    action="query",
    name="my-agent-endpoint",
    messages=[{"role": "user", "content": "Hello!"}]
)
```

## Quick Start: Deploy a Classical ML Model

```python
import mlflow
import mlflow.sklearn
from sklearn.linear_model import LogisticRegression

# Enable autolog with auto-registration to Unity Catalog
mlflow.sklearn.autolog(
    log_input_examples=True,
    registered_model_name="main.models.my_classifier"
)

# Train - model is logged and registered automatically.
# X_train, y_train are your training features and labels.
model = LogisticRegression()
model.fit(X_train, y_train)
```

Then deploy via UI or SDK. See 1-classical-ml.md.
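Once deployed, a tabular endpoint accepts input either as `{"dataframe_records": [...]}` (row-oriented) or `{"dataframe_split": {"columns": [...], "data": [...]}}` (column-oriented), per the MLflow scoring protocol. A small sketch converting between the two, standard library only:

```python
import json

def records_to_split(records):
    """Convert row-oriented records to MLflow's dataframe_split form."""
    columns = list(records[0]) if records else []
    return {
        "columns": columns,
        "data": [[row[c] for c in columns] for row in records],
    }

records = [{"age": 25, "income": 50000, "credit_score": 720}]
body = json.dumps({"dataframe_split": records_to_split(records)})
```

Either form can be POSTed to the endpoint's `/invocations` route; `dataframe_split` is more compact when sending many rows with the same columns.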


## MCP Tools

If MCP tools are not available, use the SDK/CLI examples in the reference files above.

### Development & Testing

| Tool | Purpose |
|---|---|
| `manage_workspace_files` (action="upload") | Upload agent files to workspace |
| `execute_code` | Install packages, test agent, log model |

### Deployment

| Tool | Purpose |
|---|---|
| `manage_jobs` (action="create") | Create deployment job (one-time) |
| `manage_job_runs` (action="run_now") | Kick off deployment (async) |
| `manage_job_runs` (action="get") | Check deployment job status |

### `manage_serving_endpoint` - Querying

| Action | Description | Required Params |
|---|---|---|
| `get` | Check endpoint status (READY/NOT_READY/NOT_FOUND) | `name` |
| `list` | List all endpoints | none (optional `limit`) |
| `query` | Send requests to endpoint | `name` + one of: `messages`, `inputs`, `dataframe_records` |

Example usage:

```python
# Check endpoint status
manage_serving_endpoint(action="get", name="my-agent-endpoint")

# List all endpoints
manage_serving_endpoint(action="list")

# Query a chat/agent endpoint
manage_serving_endpoint(
    action="query",
    name="my-agent-endpoint",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=500
)

# Query a traditional ML endpoint
manage_serving_endpoint(
    action="query",
    name="sklearn-classifier",
    dataframe_records=[{"age": 25, "income": 50000, "credit_score": 720}]
)
```

## Common Workflows

### Check Endpoint Status After Deployment

```python
manage_serving_endpoint(action="get", name="my-agent-endpoint")
```

Returns:

```json
{
    "name": "my-agent-endpoint",
    "state": "READY",
    "served_entities": [...]
}
```

### Query a Chat/Agent Endpoint

```python
manage_serving_endpoint(
    action="query",
    name="my-agent-endpoint",
    messages=[
        {"role": "user", "content": "What is Databricks?"}
    ],
    max_tokens=500
)
```

### Query a Traditional ML Endpoint

```python
manage_serving_endpoint(
    action="query",
    name="sklearn-classifier",
    dataframe_records=[
        {"age": 25, "income": 50000, "credit_score": 720}
    ]
)
```
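The two query styles return differently shaped payloads: ML endpoints respond with `{"predictions": [...]}` (MLflow scoring protocol), while chat-style endpoints return an OpenAI-style body with `choices[0].message.content`. A tolerant extractor sketch that handles both (the function name is ours, not part of any SDK):

```python
def extract_answer(response):
    """Pull the useful part out of either response shape, else None."""
    # Traditional ML endpoint: {"predictions": [...]}
    if "predictions" in response:
        return response["predictions"]
    # Chat/agent endpoint: OpenAI-style choices list
    choices = response.get("choices", [])
    if choices:
        return choices[0].get("message", {}).get("content")
    return None
```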

## Common Issues

| Issue | Solution |
|---|---|
| Invalid output format | Use `self.create_text_output_item(text, id)` - NOT raw dicts! |
| Endpoint NOT_READY | Deployment takes ~15 min. Poll with `manage_serving_endpoint(action="get")`. |
| Package not found | Specify exact versions in `pip_requirements` when logging the model |
| Tool timeout | Use job-based deployment, not synchronous calls |
| Auth error on endpoint | Ensure `resources` is specified in `log_model` for automatic auth passthrough |
| Model not found | Check the Unity Catalog path: `catalog.schema.model_name` |

## Critical: ResponsesAgent Output Format

**WRONG** - raw dicts don't work:

```python
return ResponsesAgentResponse(output=[{"role": "assistant", "content": "..."}])
```

**CORRECT** - use helper methods:

```python
return ResponsesAgentResponse(
    output=[self.create_text_output_item(text="...", id="msg_1")]
)
```

Available helper methods:

- `self.create_text_output_item(text, id)` - text responses
- `self.create_function_call_item(id, call_id, name, arguments)` - tool calls
- `self.create_function_call_output_item(call_id, output)` - tool results
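For debugging "Invalid output format" errors, it helps to know roughly what the helper emits. The sketch below hand-builds an *approximate* item shape following the OpenAI Responses API message format; this is for illustration only, and agent code must still call the `ResponsesAgent` helper methods rather than return dicts like this:

```python
def sketch_text_output_item(text, item_id):
    """Approximate shape of a Responses-API text output item (illustration only)."""
    return {
        "type": "message",
        "id": item_id,
        "role": "assistant",
        "content": [{"type": "output_text", "text": text}],
    }

item = sketch_text_output_item("Hello!", "msg_1")
```

If the endpoint rejects your output, compare what your agent returns against this nested structure: the common mistake (shown as WRONG above) is returning a flat `{"role": ..., "content": "..."}` chat message instead of a typed output item.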
