Deploy and query Databricks Model Serving endpoints. Use when (1) deploying MLflow models or AI agents to endpoints, (2) creating ChatAgent/ResponsesAgent agents, (3) integrating UC Functions or Vector Search tools, (4) querying deployed endpoints, (5) checking endpoint status. Covers classical ML models, custom pyfunc, and GenAI agents.
```bash
T=$(mktemp -d) && git clone --depth=1 https://github.com/msbaek/dotfiles "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.claude/skills/databricks-model-serving" ~/.claude/skills/msbaek-dotfiles-databricks-model-serving && rm -rf "$T"
```
Databricks Model Serving
Deploy MLflow models and AI agents to scalable REST API endpoints.
Quick Decision: What Are You Deploying?
| Model Type | Pattern | Reference |
|---|---|---|
| Traditional ML (sklearn, xgboost) | | 1-classical-ml.md |
| Custom Python model | | 2-custom-pyfunc.md |
| GenAI Agent (LangGraph, tool-calling) | | 3-genai-agents.md |
Prerequisites
- DBR 16.1+ recommended (pre-installed GenAI packages)
- Unity Catalog enabled workspace
- Model Serving enabled
Foundation Model API Endpoints
ALWAYS use exact endpoint names from this table. NEVER guess or abbreviate.
Chat / Instruct Models
| Endpoint Name | Provider | Notes |
|---|---|---|
| | OpenAI | Latest GPT, 400K context |
| | OpenAI | Instant + Thinking modes |
| | OpenAI | Code-specialized (high perf) |
| | OpenAI | Code-specialized (cost-opt) |
| | OpenAI | 400K context, reasoning |
| | OpenAI | Cost-optimized reasoning |
| | OpenAI | High-throughput, lightweight |
| | OpenAI | Open-weight, 128K context |
| | OpenAI | Lightweight open-weight |
| | Anthropic | Most capable, 1M context |
| | Anthropic | Hybrid reasoning |
| | Anthropic | Hybrid reasoning |
| | Anthropic | Deep analysis, 200K context |
| | Anthropic | Hybrid reasoning |
| | Anthropic | 200K context, 32K output |
| | Anthropic | Fastest, cost-effective |
| | Anthropic | Retiring April 2026 |
| | Meta | 128K context, multilingual |
| | Meta | Retiring May 2026 (PT) |
| | Meta | Lightweight, 128K context |
| | Meta | MoE architecture |
| | | 1M context, hybrid reasoning |
| | | 1M context, hybrid reasoning |
| | | Fast, cost-efficient |
| | | 1M context, Deep Think |
| | | 1M context, hybrid reasoning |
| | | 128K context, multilingual |
| | Alibaba | Efficient MoE |
Embedding Models
| Endpoint Name | Dimensions | Max Tokens | Notes |
|---|---|---|---|
| | 1024 | 8192 | English, not normalized |
| | 1024 | 512 | English, normalized |
| | up to 1024 | ~32K | 100+ languages, instruction-aware |
Common Defaults
- Agent LLM: `databricks-meta-llama-3-3-70b-instruct` (good balance of quality/cost)
- Embedding: `databricks-gte-large-en`
- Code tasks: `databricks-gpt-5-1-codex-max` or `databricks-gpt-5-1-codex-mini`
These are pay-per-token endpoints available in every workspace. For production, consider provisioned throughput mode. See supported models.
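For orientation, pay-per-token endpoints are invoked at `/serving-endpoints/<name>/invocations` with an OpenAI-style chat payload. A minimal sketch of constructing the request URL and body with the standard library (the workspace host is a placeholder; the endpoint name is the agent-LLM default above):

```python
import json

# Placeholder workspace host - substitute your own.
host = "https://my-workspace.cloud.databricks.com"
endpoint = "databricks-meta-llama-3-3-70b-instruct"

# Pay-per-token endpoints accept OpenAI-style chat completion requests.
url = f"{host}/serving-endpoints/{endpoint}/invocations"
payload = {
    "messages": [{"role": "user", "content": "What is Databricks?"}],
    "max_tokens": 256,
}
body = json.dumps(payload)

print(url)
print(body)
```

The actual call would POST `body` to `url` with a `Bearer` token; the SDK and MCP tools below wrap this for you.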
Reference Files
| Topic | File | When to Read |
|---|---|---|
| Classical ML | 1-classical-ml.md | sklearn, xgboost, autolog |
| Custom PyFunc | 2-custom-pyfunc.md | Custom preprocessing, signatures |
| GenAI Agents | 3-genai-agents.md | ResponsesAgent, LangGraph |
| Tools Integration | 4-tools-integration.md | UC Functions, Vector Search |
| Development & Testing | 5-development-testing.md | MCP workflow, iteration |
| Logging & Registration | 6-logging-registration.md | mlflow.pyfunc.log_model |
| Deployment | 7-deployment.md | Job-based async deployment |
| Querying Endpoints | 8-querying-endpoints.md | SDK, REST, MCP tools |
| Package Requirements | 9-package-requirements.md | DBR versions, pip |
Quick Start: Deploy a GenAI Agent
Step 1: Install Packages (in notebook or via MCP)
```python
%pip install -U mlflow==3.6.0 databricks-langchain langgraph==0.3.4 databricks-agents pydantic
dbutils.library.restartPython()
```
Or via MCP:
```python
execute_code(code="%pip install -U mlflow==3.6.0 databricks-langchain langgraph==0.3.4 databricks-agents pydantic")
```
Step 2: Create Agent File
Create `agent.py` locally with the ResponsesAgent pattern (see 3-genai-agents.md).
Step 3: Upload to Workspace
```python
manage_workspace_files(
    action="upload",
    local_path="./my_agent",
    workspace_path="/Workspace/Users/you@company.com/my_agent",
)
```
Step 4: Test Agent
```python
execute_code(
    file_path="./my_agent/test_agent.py",
    cluster_id="<cluster_id>",
)
```
Step 5: Log Model
```python
execute_code(
    file_path="./my_agent/log_model.py",
    cluster_id="<cluster_id>",
)
```
Step 6: Deploy (Async via Job)
See 7-deployment.md for job-based deployment that doesn't timeout.
Step 7: Query Endpoint
```python
manage_serving_endpoint(
    action="query",
    name="my-agent-endpoint",
    messages=[{"role": "user", "content": "Hello!"}],
)
```
Quick Start: Deploy a Classical ML Model
```python
import mlflow
import mlflow.sklearn
from sklearn.linear_model import LogisticRegression

# Enable autolog with auto-registration
mlflow.sklearn.autolog(
    log_input_examples=True,
    registered_model_name="main.models.my_classifier",
)

# Train - model is logged and registered automatically
model = LogisticRegression()
model.fit(X_train, y_train)
```
Then deploy via UI or SDK. See 1-classical-ml.md.
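On the wire, a deployed classical model receives JSON in one of Model Serving's tabular input formats; `dataframe_records` (a list of `{column: value}` dicts) is the simplest. A minimal sketch of building that request body with the standard library, using the hypothetical feature columns from the query examples below:

```python
import json

# Hypothetical feature row - column names must match the model's signature.
record = {"age": 25, "income": 50000, "credit_score": 720}

# dataframe_records: one dict per row, keyed by column name.
payload = {"dataframe_records": [record]}
body = json.dumps(payload)

print(body)
```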
MCP Tools
If MCP tools are not available, use the SDK/CLI examples in the reference files below.
Development & Testing
| Tool | Purpose |
|---|---|
| manage_workspace_files (action="upload") | Upload agent files to workspace |
| execute_code | Install packages, test agent, log model |
Deployment
| Tool | Purpose |
|---|---|
| (action="create") | Create deployment job (one-time) |
| (action="run_now") | Kick off deployment (async) |
| (action="get") | Check deployment job status |
manage_serving_endpoint - Querying
| Action | Description | Required Params |
|---|---|---|
| get | Check endpoint status (READY/NOT_READY/NOT_FOUND) | name |
| list | List all endpoints | (none, optional limit) |
| query | Send requests to endpoint | name + one of: messages, inputs, dataframe_records |
Example usage:
```python
# Check endpoint status
manage_serving_endpoint(action="get", name="my-agent-endpoint")

# List all endpoints
manage_serving_endpoint(action="list")

# Query a chat/agent endpoint
manage_serving_endpoint(
    action="query",
    name="my-agent-endpoint",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=500,
)

# Query a traditional ML endpoint
manage_serving_endpoint(
    action="query",
    name="sklearn-classifier",
    dataframe_records=[{"age": 25, "income": 50000, "credit_score": 720}],
)
```
Common Workflows
Check Endpoint Status After Deployment
```python
manage_serving_endpoint(action="get", name="my-agent-endpoint")
```
Returns:
```json
{
  "name": "my-agent-endpoint",
  "state": "READY",
  "served_entities": [...]
}
```
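Since deployment takes on the order of 15 minutes, the status check is usually wrapped in a polling loop. A minimal sketch, with a stub standing in for `manage_serving_endpoint(action="get")` (a real implementation would call the MCP tool or the Databricks SDK, and poll every 30-60 seconds rather than every 10 ms):

```python
import itertools
import time

# Stub: returns NOT_READY twice, then READY forever, to simulate a
# deployment finishing. Replace with a real status call.
_states = itertools.chain(["NOT_READY", "NOT_READY"], itertools.repeat("READY"))

def get_endpoint_state(name: str) -> str:
    return next(_states)

def wait_until_ready(name: str, timeout_s: float = 900, poll_s: float = 0.01) -> str:
    """Poll until the endpoint reports READY or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        state = get_endpoint_state(name)
        if state == "READY":
            return state
        time.sleep(poll_s)
    raise TimeoutError(f"{name} not READY after {timeout_s}s")

print(wait_until_ready("my-agent-endpoint"))  # READY
```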
Query a Chat/Agent Endpoint
```python
manage_serving_endpoint(
    action="query",
    name="my-agent-endpoint",
    messages=[
        {"role": "user", "content": "What is Databricks?"}
    ],
    max_tokens=500,
)
```
Query a Traditional ML Endpoint
```python
manage_serving_endpoint(
    action="query",
    name="sklearn-classifier",
    dataframe_records=[
        {"age": 25, "income": 50000, "credit_score": 720}
    ],
)
```
Common Issues
| Issue | Solution |
|---|---|
| Invalid output format | Use the ResponsesAgent helper methods (e.g. `create_text_output_item`) - NOT raw dicts! |
| Endpoint NOT_READY | Deployment takes ~15 min. Use `manage_serving_endpoint(action="get")` to poll. |
| Package not found | Specify exact versions in `pip_requirements` when logging the model |
| Tool timeout | Use job-based deployment, not synchronous calls |
| Auth error on endpoint | Ensure `resources` are specified in `log_model` for automatic auth passthrough |
| Model not found | Check the three-level Unity Catalog path: `catalog.schema.model_name` |
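For the last issue above, Unity Catalog model names must have exactly three dot-separated levels (`catalog.schema.model_name`). A tiny illustrative helper (not part of any SDK) for sanity-checking a name before registering or deploying:

```python
def is_valid_uc_model_name(name: str) -> bool:
    """True if name has exactly three non-empty dot-separated parts."""
    parts = name.split(".")
    return len(parts) == 3 and all(parts)

print(is_valid_uc_model_name("main.models.my_classifier"))  # True
print(is_valid_uc_model_name("my_classifier"))              # False
```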
Critical: ResponsesAgent Output Format
WRONG - raw dicts don't work:
```python
return ResponsesAgentResponse(output=[{"role": "assistant", "content": "..."}])
```
CORRECT - use helper methods:
```python
return ResponsesAgentResponse(
    output=[self.create_text_output_item(text="...", id="msg_1")]
)
```
Available helper methods:
- `self.create_text_output_item(text, id)` - text responses
- `self.create_function_call_item(id, call_id, name, arguments)` - tool calls
- `self.create_function_call_output_item(call_id, output)` - tool results
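Under the hood these helpers build plain dicts in the OpenAI Responses item format. A hand-rolled sketch of roughly what `create_text_output_item` produces (the shape is assumed from the Responses API; in real agent code, always use the helpers themselves):

```python
def text_output_item(text: str, item_id: str) -> dict:
    # Approximates ResponsesAgent.create_text_output_item: an assistant
    # "message" item whose content is a list of output_text parts.
    return {
        "type": "message",
        "role": "assistant",
        "id": item_id,
        "content": [{"type": "output_text", "text": text}],
    }

item = text_output_item("Hello!", "msg_1")
print(item["content"][0]["text"])  # Hello!
```

This is why raw `{"role": ..., "content": "..."}` dicts fail validation: the serving layer expects typed items, not bare chat messages.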
Related Skills
- databricks-agent-bricks - Pre-built agent tiles that deploy to model-serving endpoints
- databricks-vector-search - Create vector indexes used as retriever tools in agents
- databricks-genie - Genie Spaces can serve as agents in multi-agent setups
- databricks-mlflow-evaluation - Evaluate model and agent quality before deployment
- databricks-jobs - Job-based async deployment used for agent endpoints