Claude-skill-registry dbt-transformations
ALWAYS USE when working with dbt models, SQL transformations, tests, snapshots, or macros. Use IMMEDIATELY when editing dbt_project.yml, profiles.yml, or creating SQL models. MUST be loaded before any transform-layer work. Enforces dbt owns SQL principle - never parse, validate, or transform SQL in Python.
git clone https://github.com/majiayu000/claude-skill-registry
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/dbt-skill" ~/.claude/skills/majiayu000-claude-skill-registry-dbt-transformations && rm -rf "$T"
skills/data/dbt-skill/SKILL.md

dbt Core Development (Research-Driven)
Constitution Alignment
This skill enforces project principles:
- I. Technology Ownership: dbt owns ALL SQL compilation, dialect translation - NEVER parse SQL in Python
- II. Plugin-First Architecture: dbt adapter is pluggable (DuckDB default, Snowflake, BigQuery alternatives)
- IV. Contract-Driven Integration: dbt profiles generated from CompiledArtifacts contract
- VIII. Observability By Default: dbt metadata (manifest.json) feeds lineage and governance
Philosophy
This skill does NOT prescribe specific SQL patterns or dbt project structures. Instead, it guides you to:
- Research the current dbt Core version and capabilities
- Discover existing dbt project patterns in the codebase
- Validate your implementations against dbt documentation
- Verify integration with Dagster orchestration and Iceberg storage
CRITICAL: dbt owns SQL. Never parse, validate, or transform SQL in Python. Let dbt handle all SQL dialect translation and execution.
Pre-Implementation Research Protocol
Step 1: Verify Runtime Environment
ALWAYS run this first:
```bash
dbt --version
python -c "import dbt.version; print(f'dbt-core {dbt.version.__version__}')"
```
Critical Questions to Answer:
- What version is installed? (1.x series recommended)
- What adapters are installed? (duckdb, snowflake, bigquery, etc.)
- Does it match the documented requirements?
Step 2: Research SDK State (if unfamiliar)
When to research: If you encounter unfamiliar dbt features or need to validate patterns
Research queries (use WebSearch):
- "dbt Core [feature] documentation 2025" (e.g., "dbt Core snapshots documentation 2025")
- "dbt programmatic invocation dbtRunner 2025"
- "dbt [adapter] configuration 2025" (e.g., "dbt duckdb configuration 2025")
Official documentation: https://docs.getdbt.com
Key documentation sections:
- Models: https://docs.getdbt.com/docs/build/models
- Tests: https://docs.getdbt.com/docs/build/data-tests
- Snapshots: https://docs.getdbt.com/docs/build/snapshots
- Programmatic invocations: https://docs.getdbt.com/reference/programmatic-invocations
Step 3: Discover Existing Patterns
BEFORE creating new dbt resources, search for existing implementations:
```bash
# Find dbt project directories
find . -name "dbt_project.yml"

# Find existing models
find . -path "*/models/*.sql" -o -path "*/models/*.py"

# Find existing tests
find . -path "*/tests/*.sql"

# Find macros
find . -path "*/macros/*.sql"

# Check profiles.yml location
echo $DBT_PROFILES_DIR
find . -name "profiles.yml"
```
Key questions:
- What dbt project structure exists?
- What naming conventions are used for models?
- What adapters are configured in profiles.yml?
- What macros or custom tests exist?
Step 4: Validate Against Architecture
Check architecture docs for integration requirements:
- Read `/docs/` for the CompiledArtifacts contract (dbt paths, profiles)
- Understand compute targets (how they map to dbt profiles)
- Verify Iceberg integration requirements
- Check governance requirements (classification metadata in model `meta`)
Implementation Guidance (Not Prescriptive)
dbt Project Structure
Core concept: dbt projects organize transformations into models, tests, snapshots, seeds, and macros
Research questions:
- Where should the dbt project live? (monorepo package structure)
- What directory structure makes sense? (staging, intermediate, marts)
- How should models be organized? (by source system, by domain)
- What naming conventions should be used?
SDK features to research:
- `dbt_project.yml`: Project configuration
- `profiles.yml`: Connection profiles for compute targets
- Model directories: `models/staging/`, `models/marts/`
- Resource paths: `analysis-paths`, `test-paths`, `macro-paths`
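To make these settings concrete, here is a minimal `dbt_project.yml` sketch; the project name, profile name, and materialization choices are illustrative placeholders, not values from this codebase:

```yaml
# Hypothetical minimal dbt_project.yml -- all names are illustrative
name: my_project
version: "1.0.0"
profile: my_project          # must match a profile in profiles.yml

model-paths: ["models"]
test-paths: ["tests"]
macro-paths: ["macros"]
analysis-paths: ["analyses"]

models:
  my_project:
    staging:
      +materialized: view    # staging models as lightweight views
    marts:
      +materialized: table   # marts materialized for query performance
```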
Models (SQL and Python)
Core concept: Models are SELECT statements that create tables/views in the data warehouse
Research questions:
- Should this be a SQL or Python model?
- What materialization? (table, view, incremental, ephemeral)
- What dependencies exist? (use `{{ ref('model_name') }}`)
- What sources? (use `{{ source('source_name', 'table_name') }}`)
SDK features to research:
- SQL models: `.sql` files with Jinja templating
- Python models: `.py` files with `dbt.ref()` and `dbt.source()`
- Materializations: `{{ config(materialized='table') }}`
- Incremental models: `is_incremental()`, incremental strategies
- Model configurations: tags, meta, pre/post hooks
Tests
Core concept: Tests assert data quality on models, sources, seeds, and snapshots
Research questions:
- What data quality assertions are needed?
- Should I use generic tests (unique, not_null) or singular tests?
- What custom tests are needed?
- How should test severity be configured? (warn vs error)
SDK features to research:
- Generic tests: `unique`, `not_null`, `accepted_values`, `relationships`
- Singular tests: SQL files in the `tests/` directory
- Custom generic tests: macros in `macros/` with `{% test %}`
- Test configurations: `severity`, `error_if`, `warn_if`
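As a sketch, generic tests and severity are declared in a schema YAML file alongside the models; the model, column, and accepted values below are hypothetical:

```yaml
# models/schema.yml (hypothetical model and column names)
version: 2

models:
  - name: fct_orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: status
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'returned']
              config:
                severity: warn   # report failures without failing the run
```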
Snapshots
Core concept: Snapshots capture Type 2 slowly changing dimensions (historical records)
Research questions:
- What tables need historical tracking?
- What snapshot strategy? (timestamp, check)
- How should snapshot metadata be named?
- What should happen to hard deletes?
SDK features to research:
- Snapshot strategies: `timestamp`, `check`
- YAML configuration (v1.9+): cleaner snapshot setup
- `snapshot_meta_column_names`: customize `dbt_valid_from`, `dbt_valid_to`
- `hard_deletes`: `ignore`, `invalidate`, `new_record`
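Assuming the v1.9+ YAML snapshot syntax, a timestamp-strategy snapshot might be sketched like this; the source, table, and key names are hypothetical:

```yaml
# snapshots/orders_snapshot.yml (hypothetical; requires dbt >= 1.9 YAML snapshots)
snapshots:
  - name: orders_snapshot
    relation: source('erp', 'orders')
    config:
      unique_key: order_id
      strategy: timestamp
      updated_at: updated_at
      hard_deletes: invalidate   # close out records deleted upstream
```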
Sources
Core concept: Sources represent raw tables in external systems
Research questions:
- What raw data sources exist?
- How should source freshness be validated?
- What source metadata is needed?
- Should source tests be added?
SDK features to research:
- `sources.yml`: Source definitions
- `{{ source('source_name', 'table_name') }}`: Referencing sources
- Source freshness: `loaded_at_field`, `freshness` checks
- Source tests: Generic tests on source columns
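These pieces come together in a source definition file; the sketch below uses a hypothetical `erp` source with invented schema and column names, and the freshness thresholds are illustrative:

```yaml
# models/staging/sources.yml (hypothetical source and table names)
version: 2

sources:
  - name: erp
    schema: raw_erp
    loaded_at_field: _loaded_at
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: orders
        columns:
          - name: order_id
            tests:
              - unique
              - not_null
```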
Macros
Core concept: Macros are reusable SQL snippets (Jinja templates)
Research questions:
- What SQL patterns are repeated?
- Should I create a custom generic test?
- Are there adapter-specific needs?
SDK features to research:
- Macro syntax: `{% macro name(args) %}`
- `{{ return() }}`: Returning values from macros
- Adapter dispatch: `adapter.dispatch()`
- Package macros: `dbt_utils`, custom packages
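A simple macro sketch, with a hypothetical name and a repeated SQL pattern pulled out for reuse:

```sql
-- macros/cents_to_dollars.sql (hypothetical macro)
{% macro cents_to_dollars(column_name, precision=2) %}
    round({{ column_name }} / 100.0, {{ precision }})
{% endmacro %}
```

It would then be called from a model as `select {{ cents_to_dollars('amount_cents') }} as amount`, letting dbt expand the Jinja at compile time.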
Programmatic Invocation (Python API)
Core concept: dbtRunner allows calling dbt commands from Python (for Dagster integration)
Research questions:
- What dbt commands need to be run from Python? (compile, run, test)
- How should errors be handled?
- Should manifest be cached for performance?
- What context is needed (profiles dir, project dir, target)?
SDK features to research:
- `dbtRunner`: Entry point for programmatic invocation
- `dbtRunnerResult`: Return object with results and exceptions
- `.invoke(cli_args)`: Execute dbt commands
- Manifest reuse: Avoid reparsing for performance
CRITICAL: dbt-core does NOT support safe parallel execution in the same process
Validation Workflow
Before Implementation
- ✅ Verified dbt Core version and adapter
- ✅ Searched for existing dbt project structure
- ✅ Read architecture docs for compute targets and profiles
- ✅ Identified data sources and transformation requirements
- ✅ Researched unfamiliar dbt features
During Implementation
- ✅ SQL models using Jinja (`{{ ref() }}`, `{{ source() }}`)
- ✅ Proper materialization configured for performance
- ✅ Tests added for data quality (generic and singular)
- ✅ Metadata added to model `meta` for governance (classifications)
- ✅ Dependencies correctly specified
- ✅ NEVER parse or validate SQL in Python (dbt owns SQL)
After Implementation
- ✅ Run `dbt debug` to verify connection
- ✅ Run `dbt compile` to verify SQL generation
- ✅ Run `dbt run --select model_name` to test the model
- ✅ Run `dbt test` to validate data quality
- ✅ Check `target/manifest.json` for metadata
- ✅ Verify integration with Dagster (if applicable)
Context Injection (For Future Claude Instances)
When this skill is invoked, you should:
- Verify runtime state (don't assume):

  ```bash
  dbt --version
  dbt debug  # Verify connection to target
  ```

- Discover existing patterns (don't invent):

  ```bash
  find . -name "dbt_project.yml"
  find . -path "*/models/*.sql"
  ```

- Research when uncertain (don't guess):
  - Use WebSearch for "dbt Core [feature] documentation 2025"
  - Check official docs: https://docs.getdbt.com
- Validate against architecture (don't assume requirements):
  - Read relevant architecture docs in `/docs/`
  - Understand compute targets and how they map to dbt profiles
  - Check CompiledArtifacts contract for dbt paths
- NEVER parse SQL (dbt owns SQL):
  - Don't validate SQL syntax in Python
  - Don't transform SQL dialects manually
  - Let dbt handle all SQL compilation and execution
Quick Reference: Common Research Queries
Use these WebSearch queries when encountering specific needs:
- Models: "dbt Core models materialization documentation 2025"
- Incremental models: "dbt Core incremental models strategies 2025"
- Tests: "dbt Core data tests custom tests 2025"
- Snapshots: "dbt Core snapshots YAML configuration 2025"
- Sources: "dbt Core sources freshness documentation 2025"
- Macros: "dbt Core macros Jinja examples 2025"
- Programmatic invocation: "dbt Core dbtRunner programmatic invocation 2025"
- Adapters: "dbt [adapter] configuration 2025" (e.g., "dbt duckdb configuration 2025")
- Python models: "dbt Core Python models documentation 2025"
- Governance: "dbt Core model meta tags documentation 2025"
Integration Points to Research
CompiledArtifacts → dbt profiles.yml
Key question: How does CompiledArtifacts generate profiles.yml?
Research areas:
- What compute targets are defined in CompiledArtifacts?
- How do compute types map to dbt adapters? (duckdb, snowflake, bigquery)
- Where should profiles.yml be written? (DBT_PROFILES_DIR)
- What connection parameters are needed for each adapter?
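One way to sketch the profile-generation step is below. The contract shape (`render_profiles` and its `name`/`type`/`path` fields) is a hypothetical stand-in, not the real CompiledArtifacts schema, and only a DuckDB target is covered; verify the actual contract against the architecture docs:

```python
from pathlib import Path

def render_profiles(project: str, target: dict) -> str:
    """Render a minimal profiles.yml body for a DuckDB compute target.

    `target` is a hypothetical contract fragment, e.g.
    {"type": "duckdb", "name": "dev", "path": "warehouse.duckdb"}.
    """
    assert target["type"] == "duckdb", "only the duckdb adapter is sketched here"
    return (
        f"{project}:\n"
        f"  target: {target['name']}\n"
        f"  outputs:\n"
        f"    {target['name']}:\n"
        f"      type: duckdb\n"
        f"      path: {target['path']}\n"
    )

def write_profiles(profiles_dir: str, body: str) -> Path:
    """Write the rendered body to DBT_PROFILES_DIR/profiles.yml."""
    out = Path(profiles_dir) / "profiles.yml"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(body)
    return out
```

The written directory would then be exported as `DBT_PROFILES_DIR` before invoking dbt.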
dbt → Dagster Integration
Key question: How are dbt models represented as Dagster assets?
Research areas:
- `dagster-dbt` library usage
- `load_assets_from_dbt_manifest()` vs `load_assets_from_dbt_project()`
- `dbtRunner` for programmatic invocation
- Manifest.json location (target/manifest.json)
- Metadata extraction (model meta, column classifications)
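The metadata-extraction step can be sketched with stdlib JSON parsing of `target/manifest.json`. Only the manifest keys used here (`nodes`, `resource_type`, `meta`) are assumed; the exact location of `meta` should be verified against the manifest schema for the installed dbt version:

```python
import json
from pathlib import Path

def model_classifications(manifest_path: str) -> dict:
    """Map model name -> its `meta` dict from a dbt manifest.json.

    Assumes top-level `nodes` entries carry `resource_type` and `meta`;
    check the manifest schema for your dbt version before relying on this.
    """
    manifest = json.loads(Path(manifest_path).read_text())
    return {
        node["name"]: node.get("meta", {})
        for node in manifest.get("nodes", {}).values()
        if node.get("resource_type") == "model"
    }
```

Dagster asset definitions could then attach these dicts as asset metadata for governance reporting.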
dbt → Iceberg Storage
Key question: How do dbt models write to Iceberg tables?
Research areas:
- dbt adapter for Iceberg (does one exist?)
- External tables pattern (dbt creates views, separate process writes Iceberg)
- Python models writing to PyIceberg
- Post-hooks for Iceberg table creation
Governance Metadata (dbt meta)
Key question: How should classifications be tagged in dbt models?
Research areas:
- Model-level meta tags
- Column-level meta tags
- Custom schema tests for governance
- Metadata extraction in CompiledArtifacts
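A sketch of how model- and column-level classifications might be tagged; the classification values and model names are illustrative, and the real scheme must come from the project's governance requirements:

```yaml
# models/schema.yml (hypothetical classification scheme)
version: 2

models:
  - name: dim_customers
    meta:
      owner: data-platform
      classification: internal
    columns:
      - name: email
        meta:
          classification: pii   # picked up downstream from manifest.json
```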
dbt Development Workflow
Local Development
```bash
# Install dbt with adapter
pip install dbt-core dbt-duckdb

# Verify installation
dbt --version

# Initialize project (if needed)
dbt init my_project

# Verify connection
dbt debug

# Compile models (generate SQL)
dbt compile

# Run models
dbt run

# Run specific model
dbt run --select my_model

# Test data quality
dbt test

# Generate documentation
dbt docs generate
dbt docs serve  # View at localhost:8080
```
Programmatic Invocation (Python)
```python
from dbt.cli.main import dbtRunner, dbtRunnerResult

# Initialize runner
dbt = dbtRunner()

# Run dbt command
cli_args = ["run", "--select", "tag:daily"]
res: dbtRunnerResult = dbt.invoke(cli_args)

# Check results
if res.success:
    for r in res.result:
        print(f"{r.node.name}: {r.status}")
else:
    print(f"Error: {res.exception}")
```
References
- dbt Documentation: Official documentation
- dbt Core on PyPI: Package information
- Programmatic Invocations: dbtRunner API
- Models: Model documentation
- Tests: Testing documentation
- Snapshots: Snapshot documentation
- GitHub Repository: dbt-core source
Remember: This skill provides research guidance, NOT prescriptive SQL patterns. Always:
- Verify the dbt version and adapter compatibility
- Discover existing dbt project structure and conventions
- Research dbt capabilities when needed (use WebSearch liberally)
- Validate against actual CompiledArtifacts contract requirements
- NEVER parse or validate SQL in Python - dbt owns SQL
- Test with `dbt run` and `dbt test` before considering work complete