Dotfiles databricks-app-python
Builds Python-based Databricks applications using Dash, Streamlit, Gradio, Flask, FastAPI, or Reflex. Handles OAuth authorization (app and user auth), app resources, SQL warehouse and Lakebase connectivity, model serving integration, foundation model APIs, LLM integration, and deployment. Use when building Python web apps, dashboards, ML demos, or REST APIs for Databricks, or when the user mentions Streamlit, Dash, Gradio, Flask, FastAPI, Reflex, or Databricks app.
git clone https://github.com/msbaek/dotfiles
T=$(mktemp -d) && git clone --depth=1 https://github.com/msbaek/dotfiles "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.claude/skills/databricks-app-python" ~/.claude/skills/msbaek-dotfiles-databricks-app-python && rm -rf "$T"
.claude/skills/databricks-app-python/SKILL.md
Databricks Python Application
Build Python-based Databricks applications. For full examples and recipes, see the Databricks Apps Cookbook.
Critical Rules (always follow)
- MUST confirm framework choice or use Framework Selection below
- MUST use SDK Config() for authentication (never hardcode tokens)
- MUST use app.yaml valueFrom for resources (never hardcode resource IDs)
- MUST use dash-bootstrap-components for Dash app layout and styling
- MUST use @st.cache_resource for Streamlit database connections
- MUST deploy Flask with Gunicorn, FastAPI with uvicorn (not dev servers)
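
The Streamlit rule exists because Streamlit reruns the script on every interaction, so an uncached connection would be reopened each time. The sketch below shows the same idea without Streamlit, using `functools.lru_cache` as a plain-Python stand-in for `@st.cache_resource`. `FakeConnection` and `get_connection` are hypothetical names used only for illustration.

```python
import functools


class FakeConnection:
    """Hypothetical stand-in for a real database connection."""

    opened = 0  # counts how many connections have been created

    def __init__(self) -> None:
        FakeConnection.opened += 1


@functools.lru_cache(maxsize=1)
def get_connection() -> FakeConnection:
    # The body runs once; later calls return the cached object,
    # which mirrors what @st.cache_resource does inside Streamlit.
    return FakeConnection()


first = get_connection()
second = get_connection()
print(first is second, FakeConnection.opened)
```

In a Streamlit app the decorator would be `@st.cache_resource` and the function body would open the real connection.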
Required Steps
Copy this checklist and verify each item:
- [ ] Framework selected
- [ ] Auth strategy decided: app auth, user auth, or both
- [ ] App resources identified (SQL warehouse, Lakebase, serving endpoint, etc.)
- [ ] Backend data strategy decided (SQL warehouse, Lakebase, or SDK)
- [ ] Deployment method: CLI or DABs
Framework Selection
| Framework | Best For | app.yaml Command |
|---|---|---|
| Dash | Production dashboards, BI tools, complex interactivity | |
| Streamlit | Rapid prototyping, data science apps, internal tools | |
| Gradio | ML demos, model interfaces, chat UIs | |
| Flask | Custom REST APIs, lightweight apps, webhooks | |
| FastAPI | Async APIs, auto-generated OpenAPI docs | |
| Reflex | Full-stack Python apps without JavaScript | |
Default: Recommend Streamlit for prototypes, Dash for production dashboards, FastAPI for APIs, Gradio for ML demos.
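
The default recommendation can be written as a small lookup, which is handy when framework choice is decided by a script or an agent. This is a minimal sketch: the use-case keys (`prototype`, `dashboard`, `api`, `ml_demo`) are hypothetical labels, and only the framework names come from the text above.

```python
# Defaults taken from the recommendation above; the keys are hypothetical labels.
DEFAULT_FRAMEWORK = {
    "prototype": "Streamlit",
    "dashboard": "Dash",
    "api": "FastAPI",
    "ml_demo": "Gradio",
}


def recommend_framework(use_case: str) -> str:
    """Return the default framework for a use case, or raise ValueError."""
    key = use_case.strip().lower()
    if key not in DEFAULT_FRAMEWORK:
        known = ", ".join(sorted(DEFAULT_FRAMEWORK))
        raise ValueError(f"unknown use case {use_case!r}; expected one of: {known}")
    return DEFAULT_FRAMEWORK[key]


print(recommend_framework("dashboard"))
```

An unknown use case raises instead of guessing, so the framework choice still has to be confirmed explicitly.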
Quick Reference
| Concept | Details |
|---|---|
| Runtime | Python 3.11, Ubuntu 22.04, 2 vCPU, 6 GB RAM |
| Pre-installed | Dash 2.18.1, Streamlit 1.38.0, Gradio 4.44.0, Flask 3.0.3, FastAPI 0.115.0 |
| Auth (app) | Service principal via Config(); DATABRICKS_CLIENT_ID / DATABRICKS_CLIENT_SECRET are auto-injected |
| Auth (user) | x-forwarded-access-token header; see 1-authorization.md |
| Resources | valueFrom in app.yaml; see 2-app-resources.md |
| Cookbook | https://apps-cookbook.dev/ |
| Docs | https://docs.databricks.com/aws/en/dev-tools/databricks-apps/ |
Detailed Guides
Authorization: Use 1-authorization.md when configuring app or user authorization — covers service principal auth, on-behalf-of user tokens, OAuth scopes, and per-framework code examples. (Keywords: OAuth, service principal, user auth, on-behalf-of, access token, scopes)
App resources: Use 2-app-resources.md when connecting your app to Databricks resources; covers SQL warehouses, Lakebase, model serving, secrets, volumes, and the valueFrom pattern. (Keywords: resources, valueFrom, SQL warehouse, model serving, secrets, volumes, connections)
Frameworks: See 3-frameworks.md for Databricks-specific patterns per framework — covers Dash, Streamlit, Gradio, Flask, FastAPI, and Reflex with auth integration, deployment commands, and Cookbook links. (Keywords: Dash, Streamlit, Gradio, Flask, FastAPI, Reflex, framework selection)
Deployment: Use 4-deployment.md when deploying your app — covers Databricks CLI, Asset Bundles (DABs), app.yaml configuration, and post-deployment verification. (Keywords: deploy, CLI, DABs, asset bundles, app.yaml, logs)
Lakebase: Use 5-lakebase.md when using Lakebase (PostgreSQL) as your app's data layer — covers auto-injected env vars, psycopg2/asyncpg patterns, and when to choose Lakebase vs SQL warehouse. (Keywords: Lakebase, PostgreSQL, psycopg2, asyncpg, transactional, PGHOST)
MCP tools: Use 6-mcp-approach.md for managing app lifecycle via MCP tools — covers creating, deploying, monitoring, and deleting apps programmatically. (Keywords: MCP, create app, deploy app, app logs)
Foundation Models: See examples/llm_config.py for calling Databricks foundation model APIs — covers OAuth M2M auth, OpenAI-compatible client wiring, and token caching. (Keywords: foundation model, LLM, OpenAI client, chat completions)
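
Token caching, mentioned in the Foundation Models entry, generally means reusing an access token until shortly before it expires. The sketch below is a minimal version of that idea and is not taken from examples/llm_config.py. `TokenCache`, `fetch`, `skew`, and `clock` are hypothetical names, and `fetch` is assumed to return a `(token, lifetime_seconds)` pair.

```python
import time
from typing import Callable, Optional, Tuple


class TokenCache:
    """Reuse a token until it is within `skew` seconds of expiring."""

    def __init__(
        self,
        fetch: Callable[[], Tuple[str, float]],  # hypothetical: returns (token, lifetime_seconds)
        skew: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._skew = skew
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        # Refresh when there is no token yet or the current one is about to expire.
        if self._token is None or now >= self._expires_at - self._skew:
            token, lifetime = self._fetch()
            self._token = token
            self._expires_at = now + lifetime
        return self._token
```

The injectable `clock` is a design choice that lets the refresh logic be tested without sleeping.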
Workflow
- Determine the task type:
  - New app from scratch? → Use Framework Selection, then read 3-frameworks.md
  - Setting up authorization? → Read 1-authorization.md
  - Connecting to data/resources? → Read 2-app-resources.md
  - Using Lakebase (PostgreSQL)? → Read 5-lakebase.md
  - Deploying to Databricks? → Read 4-deployment.md
  - Using MCP tools? → Read 6-mcp-approach.md
  - Calling foundation model/LLM APIs? → See examples/llm_config.py
- Follow the instructions in the relevant guide
- For full code examples, browse https://apps-cookbook.dev/
Core Architecture
All Python Databricks apps follow this pattern:
    app-directory/
    ├── app.py            # Main application (or framework-specific name)
    ├── models.py         # Pydantic data models
    ├── backend.py        # Data access layer
    ├── requirements.txt  # Additional Python dependencies
    ├── app.yaml          # Databricks Apps configuration
    └── README.md
Backend Toggle Pattern
    import os

    from databricks.sdk.core import Config

    # Default to the mock backend so the app runs locally without credentials.
    USE_MOCK = os.getenv("USE_MOCK_BACKEND", "true").lower() == "true"

    if USE_MOCK:
        from backend_mock import MockBackend as Backend
    else:
        from backend_real import RealBackend as Backend

    backend = Backend()
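
The `backend_mock` and `backend_real` modules are not shown here. Below is a minimal sketch of what `backend_mock.py` might contain, assuming both backends expose the same methods so the rest of the app does not care which one is loaded. The method names `create_entity` and `list_entities` are hypothetical.

```python
import itertools
from typing import Dict, List


class MockBackend:
    """In-memory backend for local development; needs no Databricks connection."""

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, str]] = {}
        self._ids = itertools.count(1)

    def create_entity(self, name: str, status: str = "pending") -> Dict[str, str]:
        if not name:
            raise ValueError("name must not be empty")
        row = {"id": str(next(self._ids)), "name": name, "status": status}
        self._rows[row["id"]] = row
        return dict(row)  # return a copy so callers cannot mutate stored state

    def list_entities(self) -> List[Dict[str, str]]:
        return [dict(row) for row in self._rows.values()]
```

A real backend would implement the same two methods against a SQL warehouse or Lakebase.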
SQL Warehouse Connection (shared across all frameworks)
    import os

    from databricks import sql
    from databricks.sdk.core import Config

    cfg = Config()  # Auto-detects credentials from environment

    conn = sql.connect(
        server_hostname=cfg.host,
        http_path=f"/sql/1.0/warehouses/{os.getenv('DATABRICKS_WAREHOUSE_ID')}",
        credentials_provider=lambda: cfg.authenticate,
    )
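
One pitfall in the connection snippet: if `DATABRICKS_WAREHOUSE_ID` is unset, `os.getenv` returns `None` and the f-string silently produces `/sql/1.0/warehouses/None`. A small helper can fail early instead. This is a sketch; `warehouse_http_path` is a hypothetical name, and the `env` parameter exists only to make the function testable.

```python
import os
from typing import Mapping, Optional


def warehouse_http_path(env: Optional[Mapping[str, str]] = None) -> str:
    """Build the SQL warehouse HTTP path, raising if the ID is missing."""
    env = os.environ if env is None else env
    warehouse_id = env.get("DATABRICKS_WAREHOUSE_ID", "").strip()
    if not warehouse_id:
        raise RuntimeError(
            "DATABRICKS_WAREHOUSE_ID is not set; add the SQL warehouse as an app resource"
        )
    return f"/sql/1.0/warehouses/{warehouse_id}"


print(warehouse_http_path({"DATABRICKS_WAREHOUSE_ID": "abc123"}))
```

The result can be passed as `http_path=` to `sql.connect`.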
Pydantic Models
    from datetime import datetime
    from enum import Enum

    from pydantic import BaseModel, Field


    class Status(str, Enum):
        ACTIVE = "active"
        PENDING = "pending"


    class EntityOut(BaseModel):
        id: str
        name: str
        status: Status
        created_at: datetime


    class EntityIn(BaseModel):
        name: str = Field(..., min_length=1)
        status: Status = Status.PENDING
Common Issues
| Issue | Solution |
|---|---|
| Connection exhausted | Use @st.cache_resource (Streamlit) or connection pooling |
| Auth token not found | Check the x-forwarded-access-token header; it is only available when deployed, not locally |
| App won't start | Check that the app.yaml command matches the framework; check the app logs |
| Resource not accessible | Add resource via UI, verify SP has permissions, use valueFrom in app.yaml |
| Import error on deploy | Add missing packages to requirements.txt (pre-installed packages don't need listing) |
| Lakebase app crashes on start | psycopg2 / asyncpg are NOT pre-installed; MUST add to requirements.txt |
| Port conflict | Apps must bind to the DATABRICKS_APP_PORT env var (defaults to 8000). Never use 8080. Streamlit is auto-configured; for other frameworks, read the env var in code or use 8000 in the app.yaml command |
| Streamlit: set_page_config error | st.set_page_config() must be the first Streamlit command |
| Dash: unstyled layout | Add dash-bootstrap-components; use it for layout and styling |
| Slow queries | Use Lakebase for transactional/low-latency; SQL warehouse for analytical queries |
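
For the port-conflict row, reading the port in code might look like the sketch below. It assumes the variable is named `DATABRICKS_APP_PORT`, as in the Databricks Apps documentation, and falls back to 8000 as stated in the table. `resolve_port` is a hypothetical helper name.

```python
import os
from typing import Mapping, Optional

DEFAULT_PORT = 8000  # default stated in the table above


def resolve_port(env: Optional[Mapping[str, str]] = None) -> int:
    """Return the port the app should bind to."""
    env = os.environ if env is None else env
    # Assumption: the platform exposes the port as DATABRICKS_APP_PORT.
    raw = env.get("DATABRICKS_APP_PORT", "").strip()
    return int(raw) if raw else DEFAULT_PORT


print(resolve_port({}))
```

A Flask or FastAPI entry point would pass the returned value to its server's port argument.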
Platform Constraints
| Constraint | Details |
|---|---|
| Runtime | Python 3.11, Ubuntu 22.04 LTS |
| Compute | 2 vCPUs, 6 GB memory (default) |
| Pre-installed frameworks | Dash, Streamlit, Gradio, Flask, FastAPI, Shiny |
| Custom packages | Add to requirements.txt in app root |
| Network | Apps can reach Databricks APIs; external access depends on workspace config |
| User auth | Public Preview — workspace admin must enable before adding scopes |
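
Once user auth is enabled, the app receives the user's token in a request header. The framework-agnostic sketch below reads it from any header mapping. It assumes the header is `x-forwarded-access-token`, as in the Databricks Apps documentation, and `get_user_token` is a hypothetical helper name.

```python
from typing import Mapping, Optional

# Assumption: header name as documented for Databricks Apps user authorization.
USER_TOKEN_HEADER = "x-forwarded-access-token"


def get_user_token(headers: Mapping[str, str]) -> Optional[str]:
    """Return the forwarded user token, or None when the header is absent."""
    for key, value in headers.items():
        # Header names are case-insensitive, so compare in lowercase.
        if key.lower() == USER_TOKEN_HEADER:
            return value.strip() or None
    return None


print(get_user_token({}))  # locally the header is absent
```

Returning `None` rather than raising lets the app fall back to app auth during local development, where the header is not set.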
Official Documentation
- Databricks Apps Overview — main docs hub
- Apps Cookbook — ready-to-use code snippets (Streamlit, Dash, Reflex, FastAPI)
- Authorization — app auth and user auth
- Resources — SQL warehouse, Lakebase, serving, secrets
- app.yaml Reference — command and env config
- System Environment — pre-installed packages, runtime details
Related Skills
- databricks-app-apx - full-stack apps with FastAPI + React
- databricks-bundles - deploying apps via DABs
- databricks-python-sdk - backend SDK integration
- databricks-lakebase-provisioned - adding persistent PostgreSQL state
- databricks-model-serving - serving ML models for app integration