Claude-skill-registry data-quality

Data quality testing with dbt tests, Great Expectations, and monitoring.

install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/data-quality" ~/.claude/skills/majiayu000-claude-skill-registry-data-quality && rm -rf "$T"
manifest: skills/data/data-quality/SKILL.md

Data Quality

Quality Dimensions

| Dimension | Description | Test |
|---|---|---|
| Completeness | No missing values | NOT NULL, count checks |
| Uniqueness | No duplicates | UNIQUE, distinct counts |
| Validity | Values in range | Range checks, regex |
| Consistency | Matches across sources | Cross-table checks |
| Timeliness | Data is fresh | Freshness checks |
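To make the dimensions concrete, here is a minimal standard-library sketch of one check per dimension. The sample rows, the column names (`order_id`, `amount`, `loaded_at`), the upstream row count, and the 24-hour freshness window are all assumptions made up for illustration, not part of the skill itself.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical sample data; column names and values are assumptions for illustration.
NOW = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
orders = [
    {"order_id": 1, "amount": 120.0, "loaded_at": NOW - timedelta(hours=1)},
    {"order_id": 2, "amount": None, "loaded_at": NOW - timedelta(hours=2)},
    {"order_id": 2, "amount": -5.0, "loaded_at": NOW - timedelta(hours=30)},
]
source_row_count = 4  # hypothetical count reported by the upstream system


def completeness(rows, column):
    """Fraction of rows whose value in `column` is not None."""
    return sum(r[column] is not None for r in rows) / len(rows)


def duplicate_values(rows, column):
    """Set of values that appear more than once in `column`."""
    seen, dupes = set(), set()
    for r in rows:
        if r[column] in seen:
            dupes.add(r[column])
        seen.add(r[column])
    return dupes


def out_of_range(rows, column, low, high):
    """Rows whose non-null value in `column` falls outside [low, high]."""
    return [r for r in rows if r[column] is not None and not (low <= r[column] <= high)]


def is_fresh(rows, column, now, max_age):
    """True if the newest timestamp in `column` is at most `max_age` old."""
    return now - max(r[column] for r in rows) <= max_age


report = {
    "completeness": completeness(orders, "amount"),            # share of non-null amounts
    "uniqueness": duplicate_values(orders, "order_id"),        # duplicated keys
    "validity": out_of_range(orders, "amount", 0, 1_000_000),  # rows outside the range
    "consistency": len(orders) == source_row_count,            # cross-source row count
    "timeliness": is_fresh(orders, "loaded_at", NOW, timedelta(hours=24)),
}
print(report)
```

Each function returns the evidence (a ratio, the duplicated keys, the offending rows) rather than a bare pass/fail, which tends to make failures easier to triage.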

dbt Tests

Schema Tests

models:
  - name: fct_orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: status
        tests:
          - accepted_values:
              values: ['pending', 'completed', 'cancelled']
      - name: amount
        tests:
          - not_null
          - dbt_utils.accepted_range:
              min_value: 0
              max_value: 1000000

Custom Tests

-- tests/assert_positive_revenue.sql
-- Singular test: dbt fails the test if this query returns any rows.
select *
from {{ ref('fct_orders') }}
where amount < 0
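A singular dbt test passes when its query returns zero rows. The sketch below mimics that rule outside dbt with an in-memory SQLite database; the table contents are invented for illustration, and the `ref()` call is replaced by a plain table name because no dbt project is involved.

```python
import sqlite3

# In-memory stand-in for the warehouse; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("create table fct_orders (order_id integer, amount real)")
conn.executemany(
    "insert into fct_orders values (?, ?)",
    [(1, 25.0), (2, 0.0), (3, -4.5)],
)

# Same predicate as the singular test: select the rows that violate the rule.
failing_rows = conn.execute(
    "select order_id, amount from fct_orders where amount < 0"
).fetchall()

# The test passes only when no violating rows come back.
test_passed = len(failing_rows) == 0
print("passed" if test_passed else f"failed with {len(failing_rows)} row(s): {failing_rows}")
```

Writing the query so that it selects the violations, rather than the valid rows, means the failing records are immediately available for debugging.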

Relationship Tests

- name: customer_id
  tests:
    - relationships:
        to: ref('dim_customer')
        field: customer_id
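Conceptually, a relationships test is an anti-join: it looks for non-null child keys that have no match in the parent table. The following sketch reproduces that logic with SQLite; the table contents are hypothetical, and the query is an approximation of the idea rather than the SQL dbt generates.

```python
import sqlite3

# Hypothetical parent and child tables for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table dim_customer (customer_id integer);
    create table fct_orders (order_id integer, customer_id integer);
    insert into dim_customer values (101), (102);
    insert into fct_orders values (1, 101), (2, 102), (3, 999), (4, null);
    """
)

# Anti-join: keep orders whose customer_id is set but missing from dim_customer.
orphans = conn.execute(
    """
    select o.order_id, o.customer_id
    from fct_orders as o
    left join dim_customer as c on o.customer_id = c.customer_id
    where o.customer_id is not null
      and c.customer_id is null
    """
).fetchall()

print(orphans)
```

Rows with a null `customer_id` are skipped here; pairing the relationships test with `not_null` on the same column covers that case.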

Great Expectations

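import great_expectations as gx

# Assumes the fluent API of the GX 0.16-0.18 releases, where read_csv returns a Validator
context = gx.get_context()

validator = context.sources.pandas_default.read_csv("data.csv")

# Each expect_* call checks the data and adds the expectation to the validator's suite
validator.expect_column_values_to_not_be_null(column="order_id")
validator.expect_column_values_to_be_unique(column="order_id")
validator.expect_column_values_to_be_between(column="amount", min_value=0, max_value=1000000)

# Run the accumulated suite; results.success is True only if every expectation passed
results = validator.validate()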

Monitoring

  • Row count trends
  • Null percentage trends
  • Schema drift detection
  • Freshness SLAs
  • Anomaly detection
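The monitoring signals above can be sketched with the standard library alone. The function names, the three-sigma threshold, the 24-hour SLA, and the sample numbers are all assumptions chosen for illustration; a production setup would more likely read these metrics from warehouse metadata or a dedicated observability tool.

```python
from datetime import datetime, timedelta, timezone
from statistics import mean, stdev


def row_count_is_anomalous(history, today, threshold=3.0):
    """Flag today's count if it lies more than `threshold` sample standard
    deviations from the mean of the historical counts (a simple z-score rule)."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return today != mu
    return abs(today - mu) / sigma > threshold


def null_percentage(values):
    """Percentage of None entries; 0.0 for an empty column."""
    if not values:
        return 0.0
    return 100.0 * sum(v is None for v in values) / len(values)


def schema_drift(expected, actual):
    """Compare two column-name -> type mappings and report the differences."""
    shared = set(expected) & set(actual)
    return {
        "added": sorted(set(actual) - set(expected)),
        "removed": sorted(set(expected) - set(actual)),
        "retyped": sorted(c for c in shared if expected[c] != actual[c]),
    }


def breaches_freshness_sla(last_loaded_at, now, sla):
    """True if the most recent load is older than the agreed SLA."""
    return now - last_loaded_at > sla


# Hypothetical daily metrics.
history = [1000, 1020, 980, 1010, 990]
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)

alerts = {
    "row_count": row_count_is_anomalous(history, today=400),
    "null_pct": null_percentage([1, None, 3, None]),
    "drift": schema_drift(
        {"order_id": "int", "amount": "float"},
        {"order_id": "int", "amount": "str", "coupon": "str"},
    ),
    "stale": breaches_freshness_sla(now - timedelta(hours=30), now, timedelta(hours=24)),
}
print(alerts)
```

A z-score on a short history is a crude detector; it is used here only because it is easy to reason about, and seasonality-aware methods may fit real traffic better.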