Claude-skill-registry dbt-transformations
ALWAYS USE when working with dbt models, SQL transformations, tests, snapshots, or macros. Use IMMEDIATELY when editing dbt_project.yml, profiles.yml, or creating SQL models. MUST be loaded before any transform-layer work. Enforces dbt owns SQL principle - never parse, validate, or transform SQL in Python.
git clone https://github.com/majiayu000/claude-skill-registry
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/dbt-skill" ~/.claude/skills/majiayu000-claude-skill-registry-dbt-transformations && rm -rf "$T"
skills/data/dbt-skill/SKILL.md

dbt Core Development (Research-Driven)
Constitution Alignment
This skill enforces project principles:
- I. Technology Ownership: dbt owns ALL SQL compilation, dialect translation - NEVER parse SQL in Python
- II. Plugin-First Architecture: dbt adapter is pluggable (DuckDB default, Snowflake, BigQuery alternatives)
- IV. Contract-Driven Integration: dbt profiles generated from CompiledArtifacts contract
- VIII. Observability By Default: dbt metadata (manifest.json) feeds lineage and governance
Philosophy
This skill does NOT prescribe specific SQL patterns or dbt project structures. Instead, it guides you to:
- Research the current dbt Core version and capabilities
- Discover existing dbt project patterns in the codebase
- Validate your implementations against dbt documentation
- Verify integration with Dagster orchestration and Iceberg storage
CRITICAL: dbt owns SQL. Never parse, validate, or transform SQL in Python. Let dbt handle all SQL dialect translation and execution.
Pre-Implementation Research Protocol
Step 1: Verify Runtime Environment
ALWAYS run this first:
```bash
dbt --version
python -c "import dbt.version; print(f'dbt-core {dbt.version.__version__}')"
```
Critical Questions to Answer:
- What version is installed? (1.x series recommended)
- What adapters are installed? (duckdb, snowflake, bigquery, etc.)
- Does it match the documented requirements?
Step 2: Research SDK State (if unfamiliar)
When to research: If you encounter unfamiliar dbt features or need to validate patterns
Research queries (use WebSearch):
- "dbt Core [feature] documentation 2025" (e.g., "dbt Core snapshots documentation 2025")
- "dbt programmatic invocation dbtRunner 2025"
- "dbt [adapter] configuration 2025" (e.g., "dbt duckdb configuration 2025")
Official documentation: https://docs.getdbt.com
Key documentation sections:
- Models: https://docs.getdbt.com/docs/build/models
- Tests: https://docs.getdbt.com/docs/build/data-tests
- Snapshots: https://docs.getdbt.com/docs/build/snapshots
- Programmatic invocations: https://docs.getdbt.com/reference/programmatic-invocations
Step 3: Discover Existing Patterns
BEFORE creating new dbt resources, search for existing implementations:
```bash
# Find dbt project directories
find . -name "dbt_project.yml"

# Find existing models
find . -path "*/models/*.sql" -o -path "*/models/*.py"

# Find existing tests
find . -path "*/tests/*.sql"

# Find macros
find . -path "*/macros/*.sql"

# Check profiles.yml location
echo $DBT_PROFILES_DIR
find . -name "profiles.yml"
```
Key questions:
- What dbt project structure exists?
- What naming conventions are used for models?
- What adapters are configured in profiles.yml?
- What macros or custom tests exist?
Step 4: Validate Against Architecture
Check architecture docs for integration requirements:
- Read `/docs/` for the CompiledArtifacts contract (dbt paths, profiles)
- Understand compute targets (how they map to dbt profiles)
- Verify Iceberg integration requirements
- Check governance requirements (classification metadata in model `meta`)
Implementation Guidance (Not Prescriptive)
dbt Project Structure
Core concept: dbt projects organize transformations into models, tests, snapshots, seeds, and macros
Research questions:
- Where should the dbt project live? (monorepo package structure)
- What directory structure makes sense? (staging, intermediate, marts)
- How should models be organized? (by source system, by domain)
- What naming conventions should be used?
SDK features to research:
- `dbt_project.yml`: Project configuration
- `profiles.yml`: Connection profiles for compute targets
- Model directories: `models/staging/`, `models/marts/`
- Resource paths: `analysis-paths`, `test-paths`, `macro-paths`
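To make these settings concrete, here is a minimal `dbt_project.yml` sketch; the project name, profile name, and materialization choices are illustrative placeholders, not values from this codebase:

```yaml
# Hypothetical minimal dbt_project.yml -- all names are illustrative
name: my_project
version: "1.0.0"
profile: my_project          # must match a profile in profiles.yml

model-paths: ["models"]
test-paths: ["tests"]
macro-paths: ["macros"]
analysis-paths: ["analyses"]

models:
  my_project:
    staging:
      +materialized: view    # staging models as lightweight views
    marts:
      +materialized: table   # marts materialized for query performance
```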
Models (SQL and Python)
Core concept: Models are SELECT statements that create tables/views in the data warehouse
Research questions:
- Should this be a SQL or Python model?
- What materialization? (table, view, incremental, ephemeral)
- What dependencies exist? (use `{{ ref('model_name') }}`)
- What sources? (use `{{ source('source_name', 'table_name') }}`)
SDK features to research:
- SQL models: `.sql` files with Jinja templating
- Python models: `.py` files with `dbt.ref()` and `dbt.source()`
- Materializations: `{{ config(materialized='table') }}`
- Incremental models: `is_incremental()`, incremental strategies
- Model configurations: tags, meta, pre/post hooks
Tests
Core concept: Tests assert data quality on models, sources, seeds, and snapshots
Research questions:
- What data quality assertions are needed?
- Should I use generic tests (unique, not_null) or singular tests?
- What custom tests are needed?
- How should test severity be configured? (warn vs error)
SDK features to research:
- Generic tests: `unique`, `not_null`, `accepted_values`, `relationships`
- Singular tests: SQL files in the `tests/` directory
- Custom generic tests: macros in `macros/` with `{% test %}`
- Test configurations: `severity`, `error_if`, `warn_if`
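As a sketch, generic tests and severity are declared in a schema YAML file alongside the models; the model, column, and accepted values below are hypothetical:

```yaml
# models/schema.yml (hypothetical model and column names)
version: 2

models:
  - name: fct_orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: status
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'returned']
              config:
                severity: warn   # report failures without failing the run
```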
Snapshots
Core concept: Snapshots capture Type 2 slowly changing dimensions (historical records)
Research questions:
- What tables need historical tracking?
- What snapshot strategy? (timestamp, check)
- How should snapshot metadata be named?
- What should happen to hard deletes?
SDK features to research:
- Snapshot strategies: `timestamp`, `check`
- YAML configuration (v1.9+): cleaner snapshot setup
- `snapshot_meta_column_names`: customize `dbt_valid_from`, `dbt_valid_to`
- `hard_deletes`: `ignore`, `invalidate`, `new_record`
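Assuming the v1.9+ YAML snapshot syntax, a timestamp-strategy snapshot might be sketched like this; the source, table, and key names are hypothetical:

```yaml
# snapshots/orders_snapshot.yml (hypothetical; requires dbt >= 1.9 YAML snapshots)
snapshots:
  - name: orders_snapshot
    relation: source('erp', 'orders')
    config:
      unique_key: order_id
      strategy: timestamp
      updated_at: updated_at
      hard_deletes: invalidate   # close out records deleted upstream
```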
Sources
Core concept: Sources represent raw tables in external systems
Research questions:
- What raw data sources exist?
- How should source freshness be validated?
- What source metadata is needed?
- Should source tests be added?
SDK features to research:
- `sources.yml`: Source definitions
- `{{ source('source_name', 'table_name') }}`: Referencing sources
- Source freshness: `loaded_at_field`, `freshness` checks
- Source tests: Generic tests on source columns
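These pieces come together in a source definition file; the sketch below uses a hypothetical `erp` source with invented schema and column names, and the freshness thresholds are illustrative:

```yaml
# models/staging/sources.yml (hypothetical source and table names)
version: 2

sources:
  - name: erp
    schema: raw_erp
    loaded_at_field: _loaded_at
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: orders
        columns:
          - name: order_id
            tests:
              - unique
              - not_null
```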
Macros
Core concept: Macros are reusable SQL snippets (Jinja templates)
Research questions:
- What SQL patterns are repeated?
- Should I create a custom generic test?
- Are there adapter-specific needs?
SDK features to research:
- Macro syntax: `{% macro name(args) %}`
- `{{ return() }}`: Returning values from macros
- Adapter dispatch: `adapter.dispatch()`
- Package macros: `dbt_utils`, custom packages
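A simple macro sketch, with a hypothetical name and a repeated SQL pattern pulled out for reuse:

```sql
-- macros/cents_to_dollars.sql (hypothetical macro)
{% macro cents_to_dollars(column_name, precision=2) %}
    round({{ column_name }} / 100.0, {{ precision }})
{% endmacro %}
```

It would then be called from a model as `select {{ cents_to_dollars('amount_cents') }} as amount`, letting dbt expand the Jinja at compile time.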
Programmatic Invocation (Python API)
Core concept: dbtRunner allows calling dbt commands from Python (for Dagster integration)
Research questions:
- What dbt commands need to be run from Python? (compile, run, test)
- How should errors be handled?
- Should manifest be cached for performance?
- What context is needed (profiles dir, project dir, target)?
SDK features to research:
- `dbtRunner`: Entry point for programmatic invocation
- `dbtRunnerResult`: Return object with results and exceptions
- `.invoke(cli_args)`: Execute dbt commands
- Manifest reuse: Avoid reparsing for performance
CRITICAL: dbt-core does NOT support safe parallel execution in the same process
Validation Workflow
Before Implementation
- ✅ Verified dbt Core version and adapter
- ✅ Searched for existing dbt project structure
- ✅ Read architecture docs for compute targets and profiles
- ✅ Identified data sources and transformation requirements
- ✅ Researched unfamiliar dbt features
During Implementation
- ✅ SQL models using Jinja (`{{ ref() }}`, `{{ source() }}`)
- ✅ Proper materialization configured for performance
- ✅ Tests added for data quality (generic and singular)
- ✅ Metadata added to model `meta` for governance (classifications)
- ✅ Dependencies correctly specified
- ✅ NEVER parse or validate SQL in Python (dbt owns SQL)
After Implementation
- ✅ Run `dbt debug` to verify connection
- ✅ Run `dbt compile` to verify SQL generation
- ✅ Run `dbt run --select model_name` to test the model
- ✅ Run `dbt test` to validate data quality
- ✅ Check `target/manifest.json` for metadata
- ✅ Verify integration with Dagster (if applicable)
Context Injection (For Future Claude Instances)
When this skill is invoked, you should:
- Verify runtime state (don't assume):

  ```bash
  dbt --version
  dbt debug  # Verify connection to target
  ```

- Discover existing patterns (don't invent):

  ```bash
  find . -name "dbt_project.yml"
  find . -path "*/models/*.sql"
  ```

- Research when uncertain (don't guess):
  - Use WebSearch for "dbt Core [feature] documentation 2025"
  - Check official docs: https://docs.getdbt.com
- Validate against architecture (don't assume requirements):
  - Read relevant architecture docs in `/docs/`
  - Understand compute targets and how they map to dbt profiles
  - Check CompiledArtifacts contract for dbt paths
- NEVER parse SQL (dbt owns SQL):
  - Don't validate SQL syntax in Python
  - Don't transform SQL dialects manually
  - Let dbt handle all SQL compilation and execution
Quick Reference: Common Research Queries
Use these WebSearch queries when encountering specific needs:
- Models: "dbt Core models materialization documentation 2025"
- Incremental models: "dbt Core incremental models strategies 2025"
- Tests: "dbt Core data tests custom tests 2025"
- Snapshots: "dbt Core snapshots YAML configuration 2025"
- Sources: "dbt Core sources freshness documentation 2025"
- Macros: "dbt Core macros Jinja examples 2025"
- Programmatic invocation: "dbt Core dbtRunner programmatic invocation 2025"
- Adapters: "dbt [adapter] configuration 2025" (e.g., "dbt duckdb configuration 2025")
- Python models: "dbt Core Python models documentation 2025"
- Governance: "dbt Core model meta tags documentation 2025"
Integration Points to Research
CompiledArtifacts → dbt profiles.yml
Key question: How does CompiledArtifacts generate profiles.yml?
Research areas:
- What compute targets are defined in CompiledArtifacts?
- How do compute types map to dbt adapters? (duckdb, snowflake, bigquery)
- Where should profiles.yml be written? (DBT_PROFILES_DIR)
- What connection parameters are needed for each adapter?
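One way to sketch the profile-generation step is below. The contract shape (`render_profiles` and its `name`/`type`/`path` fields) is a hypothetical stand-in, not the real CompiledArtifacts schema, and only a DuckDB target is covered; verify the actual contract against the architecture docs:

```python
from pathlib import Path

def render_profiles(project: str, target: dict) -> str:
    """Render a minimal profiles.yml body for a DuckDB compute target.

    `target` is a hypothetical contract fragment, e.g.
    {"type": "duckdb", "name": "dev", "path": "warehouse.duckdb"}.
    """
    assert target["type"] == "duckdb", "only the duckdb adapter is sketched here"
    return (
        f"{project}:\n"
        f"  target: {target['name']}\n"
        f"  outputs:\n"
        f"    {target['name']}:\n"
        f"      type: duckdb\n"
        f"      path: {target['path']}\n"
    )

def write_profiles(profiles_dir: str, body: str) -> Path:
    """Write the rendered body to DBT_PROFILES_DIR/profiles.yml."""
    out = Path(profiles_dir) / "profiles.yml"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(body)
    return out
```

The written directory would then be exported as `DBT_PROFILES_DIR` before invoking dbt.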
dbt → Dagster Integration
Key question: How are dbt models represented as Dagster assets?
Research areas:
- `dagster-dbt` library usage
- `load_assets_from_dbt_manifest()` vs `load_assets_from_dbt_project()`
- `dbtRunner` for programmatic invocation
- Manifest.json location (target/manifest.json)
- Metadata extraction (model meta, column classifications)
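The metadata-extraction step can be sketched with stdlib JSON parsing of `target/manifest.json`. Only the manifest keys used here (`nodes`, `resource_type`, `meta`) are assumed; the exact location of `meta` should be verified against the manifest schema for the installed dbt version:

```python
import json
from pathlib import Path

def model_classifications(manifest_path: str) -> dict:
    """Map model name -> its `meta` dict from a dbt manifest.json.

    Assumes top-level `nodes` entries carry `resource_type` and `meta`;
    check the manifest schema for your dbt version before relying on this.
    """
    manifest = json.loads(Path(manifest_path).read_text())
    return {
        node["name"]: node.get("meta", {})
        for node in manifest.get("nodes", {}).values()
        if node.get("resource_type") == "model"
    }
```

Dagster asset definitions could then attach these dicts as asset metadata for governance reporting.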
dbt → Iceberg Storage
Key question: How do dbt models write to Iceberg tables?
Research areas:
- dbt adapter for Iceberg (does one exist?)
- External tables pattern (dbt creates views, separate process writes Iceberg)
- Python models writing to PyIceberg
- Post-hooks for Iceberg table creation
Governance Metadata (dbt meta)
Key question: How should classifications be tagged in dbt models?
Research areas:
- Model-level meta tags
- Column-level meta tags
- Custom schema tests for governance
- Metadata extraction in CompiledArtifacts
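A sketch of how model- and column-level classifications might be tagged; the classification values and model names are illustrative, and the real scheme must come from the project's governance requirements:

```yaml
# models/schema.yml (hypothetical classification scheme)
version: 2

models:
  - name: dim_customers
    meta:
      owner: data-platform
      classification: internal
    columns:
      - name: email
        meta:
          classification: pii   # picked up downstream from manifest.json
```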
dbt Development Workflow
Local Development
```bash
# Install dbt with adapter
pip install dbt-core dbt-duckdb

# Verify installation
dbt --version

# Initialize project (if needed)
dbt init my_project

# Verify connection
dbt debug

# Compile models (generate SQL)
dbt compile

# Run models
dbt run

# Run specific model
dbt run --select my_model

# Test data quality
dbt test

# Generate documentation
dbt docs generate
dbt docs serve  # View at localhost:8080
```
Programmatic Invocation (Python)
```python
from dbt.cli.main import dbtRunner, dbtRunnerResult

# Initialize runner
dbt = dbtRunner()

# Run dbt command
cli_args = ["run", "--select", "tag:daily"]
res: dbtRunnerResult = dbt.invoke(cli_args)

# Check results
if res.success:
    for r in res.result:
        print(f"{r.node.name}: {r.status}")
else:
    print(f"Error: {res.exception}")
```
References
- dbt Documentation: Official documentation
- dbt Core on PyPI: Package information
- Programmatic Invocations: dbtRunner API
- Models: Model documentation
- Tests: Testing documentation
- Snapshots: Snapshot documentation
- GitHub Repository: dbt-core source
Remember: This skill provides research guidance, NOT prescriptive SQL patterns. Always:
- Verify the dbt version and adapter compatibility
- Discover existing dbt project structure and conventions
- Research dbt capabilities when needed (use WebSearch liberally)
- Validate against actual CompiledArtifacts contract requirements
- NEVER parse or validate SQL in Python - dbt owns SQL
- Test with `dbt run` and `dbt test` before considering work complete