Skillshub databricks-deploy-integration
Install
Source · Clone the upstream repo
git clone https://github.com/ComeOnOliver/skillshub
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ComeOnOliver/skillshub "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/jeremylongshore/claude-code-plugins-plus-skills/databricks-deploy-integration" ~/.claude/skills/comeonoliver-skillshub-databricks-deploy-integration && rm -rf "$T"
Manifest: skills/jeremylongshore/claude-code-plugins-plus-skills/databricks-deploy-integration/SKILL.md
Databricks Deploy Integration
Overview
Deploy Databricks jobs, DLT pipelines, and ML models using Databricks Asset Bundles (DABs). Bundles provide infrastructure-as-code, with `databricks.yml` defining resources, targets (dev/staging/prod), variables, and permissions. The CLI handles validation, deployment, and lifecycle management.
Prerequisites
- Databricks CLI v0.200+ (check with `databricks --version`; preflight sketch below)
- Workspace access, with a service principal for automated deploys
- `databricks.yml` bundle configuration at the project root
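A minimal preflight sketch, assuming authentication is already configured (via `databricks configure` or environment variables) and a CLI recent enough to include `databricks auth describe`:

```bash
# Confirm the CLI is the v0.200+ unified binary (bundles are not in the legacy CLI)
databricks --version

# Confirm which workspace and identity the CLI will authenticate as
databricks auth describe
```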
Instructions
Step 1: Initialize a Bundle
```bash
# Create from a template
databricks bundle init

# Available templates:
# - default-python: Python notebook project
# - default-sql: SQL project
# - mlops-stacks: Full MLOps template with feature engineering
```
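For scripted setups, the template name can be passed as a positional argument so `init` skips the interactive picker, and remaining prompts can be pre-answered with a JSON file via `--config-file`. A sketch, where `init-answers.json` is a hypothetical answers file:

```bash
# Scaffold a Python bundle non-interactively (init-answers.json is hypothetical)
databricks bundle init default-python --config-file init-answers.json
```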
Step 2: Configure databricks.yml
```yaml
# databricks.yml — single source of truth for project deployment
bundle:
  name: sales-etl-pipeline

workspace:
  host: ${DATABRICKS_HOST}

variables:
  catalog:
    description: Unity Catalog name
    default: dev_catalog
  alert_email:
    description: Alert notification email
    default: dev@company.com
  warehouse_size:
    default: "2X-Small"

include:
  - resources/*.yml

targets:
  dev:
    default: true
    mode: development  # dev mode auto-prefixes resources with [username] and enables debug
    workspace:
      root_path: /Users/${workspace.current_user.userName}/.bundle/${bundle.name}/dev
    variables:
      catalog: dev_catalog
  staging:
    workspace:
      root_path: /Shared/.bundle/${bundle.name}/staging
    variables:
      catalog: staging_catalog
      alert_email: staging-alerts@company.com
  prod:
    mode: production  # production mode prevents accidental destruction
    workspace:
      root_path: /Shared/.bundle/${bundle.name}/prod
    variables:
      catalog: prod_catalog
      alert_email: oncall@company.com
      warehouse_size: "Medium"
```
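To see how variables and paths resolve for a given target before deploying, `validate` can emit the resolved configuration as JSON; the exact output shape may vary by CLI version, so treat this as a sketch:

```bash
# Print the resolved bundle config for staging and pull out the variables
databricks bundle validate -t staging --output json | jq '.variables'
```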
Step 3: Define Resources
```yaml
# resources/jobs.yml
resources:
  jobs:
    daily_etl:
      name: "daily-etl-${bundle.target}"
      max_concurrent_runs: 1
      timeout_seconds: 14400
      schedule:
        quartz_cron_expression: "0 0 6 * * ?"
        timezone_id: "UTC"
      email_notifications:
        on_failure: ["${var.alert_email}"]
      tasks:
        - task_key: extract
          notebook_task:
            notebook_path: ./src/extract.py
            base_parameters:
              catalog: "${var.catalog}"
          job_cluster_key: etl
        - task_key: transform
          depends_on: [{task_key: extract}]
          notebook_task:
            notebook_path: ./src/transform.py
          job_cluster_key: etl
        - task_key: load
          depends_on: [{task_key: transform}]
          notebook_task:
            notebook_path: ./src/load.py
          job_cluster_key: etl
      job_clusters:
        - job_cluster_key: etl
          new_cluster:
            spark_version: "14.3.x-scala2.12"
            node_type_id: "i3.xlarge"
            autoscale:
              min_workers: 1
              max_workers: 4
            aws_attributes:
              availability: SPOT_WITH_FALLBACK
              first_on_demand: 1
```
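After a deploy, the generated job carries the target suffix in its name, so it can be located from the CLI. A quick check, assuming the dev target and `jq` installed:

```bash
# Find the job created by the dev deploy and print its ID
databricks jobs list --output json | \
  jq '.[] | select(.settings.name | contains("daily-etl-dev")) | .job_id'
```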
```yaml
# resources/pipelines.yml (DLT)
resources:
  pipelines:
    dlt_pipeline:
      name: "dlt-pipeline-${bundle.target}"
      target: "${var.catalog}.silver"
      catalog: "${var.catalog}"
      libraries:
        - notebook:
            path: ./src/dlt_pipeline.py
      continuous: false
      development: ${bundle.target == "dev"}
```
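`bundle run` works for pipelines as well as jobs; pointing it at the pipeline's resource key triggers a DLT update:

```bash
# Trigger an update of the DLT pipeline defined above
databricks bundle run dlt_pipeline -t dev
```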
Step 4: Deploy Lifecycle Commands
```bash
# Validate — checks YAML syntax, variable resolution, permissions
databricks bundle validate -t staging

# Deploy — creates/updates jobs, uploads notebooks, syncs config
databricks bundle deploy -t staging

# Summary — show what's deployed
databricks bundle summary -t staging

# Run — trigger a specific job/pipeline (waits and streams output by default)
databricks bundle run daily_etl -t staging

# Restart a run that is already in progress
databricks bundle run daily_etl -t staging --restart

# Sync — live-reload files during development
databricks bundle sync -t dev --watch

# Destroy — remove all deployed resources (dev only!)
databricks bundle destroy -t dev --auto-approve
```
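In CI these commands compose naturally; a sketch of gating deploy on a clean validate so a broken config never reaches the workspace:

```bash
# Deploy only if validation passes (a non-zero exit from validate aborts the chain)
databricks bundle validate -t staging && databricks bundle deploy -t staging
```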
Step 5: Promote Staging to Production
```bash
# 1. Validate staging is clean
databricks bundle validate -t staging

# 2. Deploy and test on staging
databricks bundle deploy -t staging
RUN=$(databricks bundle run daily_etl -t staging --output json | jq -r '.run_id')
databricks jobs get-run $RUN --output json | jq '.state.result_state'

# 3. After staging passes, deploy to production
databricks bundle validate -t prod
databricks bundle deploy -t prod

# 4. Verify production deployment
databricks bundle summary -t prod
databricks jobs list --output json | \
  jq '.[] | select(.settings.name | contains("daily-etl-prod"))'
```
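If a run is triggered asynchronously elsewhere (for example by the schedule), its terminal state can be polled instead. A sketch assuming `RUN` holds a run ID as in step 2 above:

```bash
# Poll until the run leaves its active lifecycle states, then report the result
while true; do
  STATE=$(databricks jobs get-run "$RUN" --output json | jq -r '.state.life_cycle_state')
  case "$STATE" in
    TERMINATED|INTERNAL_ERROR|SKIPPED) break ;;
  esac
  sleep 30
done
databricks jobs get-run "$RUN" --output json | jq -r '.state.result_state'
```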
Step 6: Permissions in Bundles
```yaml
# resources/jobs.yml — add permissions block
resources:
  jobs:
    daily_etl:
      name: "daily-etl-${bundle.target}"
      permissions:
        - group_name: data-engineers
          level: CAN_MANAGE
        - group_name: data-analysts
          level: CAN_VIEW
        - service_principal_name: cicd-service-principal
          level: CAN_MANAGE_RUN
```
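To confirm what was actually applied, the permissions can be read back through the CLI; `<job_id>` is a placeholder for the ID from `databricks jobs list`:

```bash
# Read back effective permissions on the deployed job
databricks permissions get jobs <job_id>
```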
Output
- `databricks.yml` with multi-target deployment (dev/staging/prod)
- Job and pipeline resources defined as code
- Environment-specific variables (catalog, alerts, sizing)
- Promotion workflow from staging to production
- Permissions managed declaratively in bundle config
Error Handling
| Issue | Cause | Solution |
|---|---|---|
| `validate` fails | Invalid YAML or unresolved variable | Check variable definitions and target config |
| Permission denied on deploy | Service principal lacks workspace access | Add SP to workspace in Account Console |
| Duplicate resource names | Resource name collision across targets | Bundle auto-prefixes in `development` mode |
| Cluster quota exceeded | Too many active clusters | Use instance pools or terminate idle clusters |
| `destroy` blocked in prod | `production` mode prevents accidental destroy | This is intentional — remove `mode: production` if destruction is truly required |
Examples
Override Variables per Target
```bash
# Override a variable at deploy time
databricks bundle deploy -t prod --var="warehouse_size=Large"
```
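The `--var` flag repeats, so several variables can be overridden in one deploy (assuming the flag behaves as in recent CLI versions):

```bash
# Override multiple variables at once
databricks bundle deploy -t prod \
  --var="warehouse_size=Large" \
  --var="alert_email=oncall@company.com"
```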
Clean Slate Redeploy (Dev Only)
```bash
databricks bundle destroy -t dev --auto-approve
databricks bundle deploy -t dev
```
Next Steps
For multi-environment setup, see `databricks-multi-env-setup`.