Claude-skill-registry Data Contracts

Data contracts for defining the schema, quality expectations, and SLAs between data producers and consumers.

install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/data-contracts" ~/.claude/skills/majiayu000-claude-skill-registry-data-contracts && rm -rf "$T"
manifest: skills/data/data-contracts/SKILL.md
source content

Data Contracts

Overview

Data contracts define the schema, quality expectations, and SLAs for data shared between producers and consumers, which makes the data layer reliable.

Why This Matters

  • Trust: Consumers know the data format will not change unexpectedly
  • Quality: Expectations are defined clearly
  • Decoupling: Producers and consumers evolve independently
  • Discovery: Know which data exists and in what format

Data Contract Template

# contracts/users.contract.yaml
name: users
version: 1.1.0
owner: user-team
description: User profile data
status: active

schema:
  type: object
  properties:
    id:
      type: string
      format: uuid
      description: Unique user identifier
    email:
      type: string
      format: email
      description: User email address
    name:
      type: string
      description: User full name
    created_at:
      type: string
      format: date-time
      description: Account creation timestamp
    status:
      type: string
      enum: [active, inactive, suspended]
      description: Account status
  required: [id, email, created_at, status]

quality:
  - name: no_null_emails
    check: email IS NOT NULL
    threshold: 100%
    severity: critical
  
  - name: valid_email_format
    check: email LIKE '%@%.%'
    threshold: 99%
    severity: high
  
  - name: unique_emails
    check: COUNT(DISTINCT email) = COUNT(*)
    threshold: 100%
    severity: critical
  
  - name: recent_data
    check: created_at > NOW() - INTERVAL '7 days'
    threshold: 95%
    severity: medium

sla:
  freshness: 1 hour  # Data updated within 1 hour
  availability: 99.9%  # Uptime guarantee
  latency_p95: 100ms  # 95th percentile query time
  completeness: 99%  # No missing required fields

consumers:
  - analytics-team
  - marketing-team
  - billing-service

producer:
  team: user-team
  service: user-api
  contact: user-team@example.com

changelog:
  - version: 1.0.0
    date: 2024-01-01
    changes: Initial contract
  - version: 1.1.0
    date: 2024-01-15
    changes: Added status field (non-breaking)
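
To make the `schema` section above more concrete, here is a minimal sketch of how a single record could be checked against it. It is not the behaviour of any particular validation library: `USER_SCHEMA`, `check_record`, and `_is_uuid` are hypothetical names, the schema is copied by hand into a dict rather than parsed from YAML, and only `required`, `type: string`, `enum`, and `format: uuid` are handled.

```python
import uuid

# Hand-copied subset of the `schema` section of users.contract.yaml (assumption:
# a real implementation would parse the YAML file instead).
USER_SCHEMA = {
    "properties": {
        "id": {"type": "string", "format": "uuid"},
        "email": {"type": "string", "format": "email"},
        "name": {"type": "string"},
        "created_at": {"type": "string", "format": "date-time"},
        "status": {"type": "string", "enum": ["active", "inactive", "suspended"]},
    },
    "required": ["id", "email", "created_at", "status"],
}


def _is_uuid(value: str) -> bool:
    """True if `value` parses as a UUID."""
    try:
        uuid.UUID(value)
        return True
    except ValueError:
        return False


def check_record(record: dict, schema: dict) -> list[str]:
    """Return a list of violations; an empty list means the record conforms."""
    errors = []
    for field in schema["required"]:
        if field not in record:
            errors.append(f"missing required field: {field}")
    for field, value in record.items():
        spec = schema["properties"].get(field)
        if spec is None:
            continue  # fields unknown to the schema are ignored in this sketch
        if spec["type"] == "string" and not isinstance(value, str):
            errors.append(f"{field}: expected string")
            continue
        if "enum" in spec and value not in spec["enum"]:
            errors.append(f"{field}: {value!r} not in {spec['enum']}")
        if spec.get("format") == "uuid" and not _is_uuid(value):
            errors.append(f"{field}: not a valid UUID")
    return errors
```

Returning a list of violations instead of raising on the first one lets a caller report everything wrong with a record in one pass.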

Contract Validation

Python Example

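# NOTE: `Contract` and `validate` are assumed helper names; adapt the import
# to the validation library you actually use.
from datacontract import Contract, validate


class DataQualityError(Exception):
    """Raised when data violates its contract."""


# Load contract
contract = Contract.load('contracts/users.contract.yaml')

# Validate data (`data` is the batch or record produced upstream)
result = validate(data, contract)

if not result.passed:
    print("Validation failed:")
    for failure in result.failures:
        print(f"- {failure.check}: {failure.message}")
    raise DataQualityError(result.failures)

print("✓ Data meets contract requirements")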

SQL Example

-- Validate quality checks, each against its own contract threshold
WITH quality_checks AS (
  SELECT
    'no_null_emails' AS check_name,
    COUNT(*) FILTER (WHERE email IS NULL) AS failures,
    COUNT(*) AS total,
    100.0 AS threshold
  FROM users

  UNION ALL

  SELECT
    'valid_email_format',
    COUNT(*) FILTER (WHERE email NOT LIKE '%@%.%'),
    COUNT(*),
    99.0
  FROM users
),
scored AS (
  SELECT
    check_name, failures, total, threshold,
    -- NULLIF avoids a division-by-zero error on an empty table
    -- (pass_rate is then NULL)
    (1 - failures::float / NULLIF(total, 0)) * 100 AS pass_rate
  FROM quality_checks
)
SELECT
  check_name,
  failures,
  total,
  pass_rate,
  CASE
    WHEN pass_rate < threshold THEN 'FAIL'
    ELSE 'PASS'
  END AS status
FROM scored;
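
The same idea can be driven from Python, keeping each check's threshold next to its predicate. This is only a sketch: it uses the standard-library `sqlite3` module in place of a real warehouse, and `CHECKS` and `run_checks` are hypothetical names. Treating an empty table as a 100% pass rate is an assumption that a real pipeline may want to reverse.

```python
import sqlite3

# (check name, SQL predicate marking a row as FAILING, required pass rate in %)
CHECKS = [
    ("no_null_emails", "email IS NULL", 100.0),
    ("valid_email_format", "email NOT LIKE '%@%.%'", 99.0),
]


def run_checks(conn: sqlite3.Connection, table: str = "users") -> list[dict]:
    """Run every check against `table` and compare it with its own threshold."""
    results = []
    for name, failing_predicate, threshold in CHECKS:
        query = (
            f"SELECT COALESCE(SUM(CASE WHEN {failing_predicate} THEN 1 ELSE 0 END), 0), "
            f"COUNT(*) FROM {table}"
        )
        failures, total = conn.execute(query).fetchone()
        # Assumption: an empty table counts as fully passing.
        pass_rate = 100.0 if total == 0 else (1 - failures / total) * 100
        results.append(
            {
                "check": name,
                "failures": failures,
                "total": total,
                "pass_rate": pass_rate,
                "threshold": threshold,
                "status": "PASS" if pass_rate >= threshold else "FAIL",
            }
        )
    return results
```

The predicates are interpolated into the SQL text, so they should come only from trusted contract files, never from user input.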

Breaking Change Detection

# Compare contract versions
datacontract diff v1.1.0 v2.0.0

# Output:
# BREAKING CHANGES:
# - Removed field 'age' (was required)
# - Changed type of 'phone' from string to number
# 
# COMPATIBLE CHANGES:
# - Added optional field 'address'
# - Added new quality check 'valid_status'

Breaking vs Non-Breaking

# BREAKING (requires consumer updates):
- Remove required field
- Change field type
- Rename field
- Add new required field
- Stricter validation

# NON-BREAKING (backward compatible):
- Add optional field
- Remove optional field (only safe if no consumer reads it)
- Relax validation
- Add new quality check
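
The schema-related rules above can be sketched as a small comparison function. This is an illustration, not the logic of any existing tool: `classify_changes` is a hypothetical name, it looks only at the `properties` and `required` keys of two `schema` sections, and it cannot recognise a rename (which shows up as one removal plus one addition).

```python
def classify_changes(old: dict, new: dict) -> dict[str, list[str]]:
    """Compare two contract `schema` sections and sort the differences."""
    breaking, compatible = [], []
    old_props, new_props = old["properties"], new["properties"]
    old_required = set(old.get("required", []))
    new_required = set(new.get("required", []))

    # Fields that existed before: removed, or changed type?
    for name, spec in old_props.items():
        if name not in new_props:
            if name in old_required:
                breaking.append(f"Removed required field '{name}'")
            else:
                compatible.append(f"Removed optional field '{name}'")
        elif spec.get("type") != new_props[name].get("type"):
            breaking.append(
                f"Changed type of '{name}' from {spec.get('type')} "
                f"to {new_props[name].get('type')}"
            )

    # Fields that are new in this version.
    for name in new_props:
        if name not in old_props:
            if name in new_required:
                breaking.append(f"Added required field '{name}'")
            else:
                compatible.append(f"Added optional field '{name}'")

    return {"breaking": breaking, "compatible": compatible}
```

A CI job could fail the build whenever the `breaking` list is non-empty and the major version has not been bumped.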

Contract Registry

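// contracts/registry.ts
// `Contract` is assumed to come from a local loader module (hypothetical path).
import { Contract } from './contract';

interface ContractEntry {
  version: string;
  path: string;
  owner: string;
  consumers: string[];
}

export const contracts: Record<string, ContractEntry> = {
  users: {
    version: '1.1.0',
    path: 'contracts/users.contract.yaml',
    owner: 'user-team',
    consumers: ['analytics', 'marketing']
  },
  orders: {
    version: '2.0.0',
    path: 'contracts/orders.contract.yaml',
    owner: 'order-team',
    consumers: ['billing', 'shipping']
  }
};

// Get contract; falls back to the registered version when none is given
export function getContract(name: string, version?: string) {
  const contract = contracts[name];
  if (!contract) {
    throw new Error(`Contract ${name} not found`);
  }
  return Contract.load(contract.path, version ?? contract.version);
}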

CI/CD Integration

# .github/workflows/contract-validation.yml
name: Contract Validation
on: [pull_request]

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      
      - name: Validate Contract Schema
        run: |
          datacontract validate contracts/*.yaml
      
      - name: Check Breaking Changes
        run: |
          # Steps run under `bash -e`, so test the exit status directly
          # instead of reading $? on the following line.
          if ! datacontract diff main HEAD; then
            echo "Breaking changes detected!"
            exit 1
          fi
      
      - name: Test Data Quality
        run: |
          python scripts/test_contracts.py
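
The workflow above calls `scripts/test_contracts.py`, which is not shown in this document. One possible core for such a script is a function that turns check results into a process exit status. The result shape (dicts with `check`, `pass_rate`, and `threshold` keys) and the name `exit_code` are assumptions made for this sketch.

```python
import sys


def exit_code(results: list[dict]) -> int:
    """Return 0 if every check met its threshold, 1 otherwise."""
    failed = [r for r in results if r["pass_rate"] < r["threshold"]]
    for r in failed:
        # Report each failing check on stderr so it shows up in the CI log.
        print(
            f"FAIL {r['check']}: {r['pass_rate']:.1f}% < {r['threshold']:.1f}%",
            file=sys.stderr,
        )
    return 1 if failed else 0
```

The script would end with `sys.exit(exit_code(results))`, so that a failed check fails the GitHub Actions step.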

Monitoring

# Monitor contract SLAs
# Assumed helpers, defined elsewhere: get_contract, get_last_update_time
# (returns a Unix timestamp), run_quality_check (returns a score), alert.
import time
from prometheus_client import Gauge

# Metrics
freshness_gauge = Gauge('data_freshness_seconds', 'Data freshness', ['dataset'])
quality_gauge = Gauge('data_quality_score', 'Quality score', ['dataset', 'check'])

def monitor_contract(contract_name: str):
    contract = get_contract(contract_name)

    # Check freshness (seconds since the dataset was last updated)
    last_update = get_last_update_time(contract_name)
    freshness = time.time() - last_update
    freshness_gauge.labels(dataset=contract_name).set(freshness)

    # Check quality
    for check in contract.quality:
        score = run_quality_check(contract_name, check)
        quality_gauge.labels(
            dataset=contract_name,
            check=check.name
        ).set(score)

        # Alert if below threshold. check.threshold must be numeric and on the
        # same scale as score; parse strings such as "99%" when loading.
        if score < check.threshold:
            alert(f"{contract_name}: {check.name} below threshold")
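
The contract template writes thresholds and SLAs as strings such as `99%` and `1 hour`, while the monitoring code compares numbers. A minimal sketch of the conversion step follows. The names `parse_percent`, `parse_duration`, and `sla_violations` are hypothetical, and the duration grammar (an integer followed by second, minute, hour, or day) is an assumption rather than a defined contract format.

```python
import re

_UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}


def parse_percent(text: str) -> float:
    """Convert a string such as '99.9%' to the float 99.9."""
    return float(text.strip().rstrip("%"))


def parse_duration(text: str) -> int:
    """Convert a string such as '1 hour' or '7 days' to seconds."""
    match = re.fullmatch(r"\s*(\d+)\s*(second|minute|hour|day)s?\s*", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def sla_violations(sla: dict, freshness_s: float, completeness: float) -> list[str]:
    """Compare observed values with the `freshness` and `completeness` SLAs."""
    violations = []
    limit = parse_duration(sla["freshness"])
    if freshness_s > limit:
        violations.append(f"freshness: {freshness_s:.0f}s exceeds {limit}s")
    minimum = parse_percent(sla["completeness"])
    if completeness < minimum:
        violations.append(f"completeness: {completeness}% below {minimum}%")
    return violations
```

Parsing once when the contract is loaded keeps the monitoring loop free of string handling.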

Best Practices

1. Version Semantically

1.0.0 → 1.0.1: Bug fix (patch)
1.0.0 → 1.1.0: New optional field (minor)
1.0.0 → 2.0.0: Breaking change (major)
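
The bump rule above is mechanical enough to automate. A minimal sketch, assuming plain `major.minor.patch` strings without pre-release or build suffixes; `bump_version` and the change labels `fix`, `compatible`, and `breaking` are hypothetical names chosen for this example.

```python
def bump_version(version: str, change: str) -> str:
    """Return the next version for a change of kind fix, compatible, or breaking."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "breaking":
        return f"{major + 1}.0.0"
    if change == "compatible":
        return f"{major}.{minor + 1}.0"
    if change == "fix":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change kind: {change!r}")
```

For example, `bump_version("1.0.0", "compatible")` returns `"1.1.0"`.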

2. Document Changes

changelog:
  - version: 2.0.0
    date: 2024-01-20
    changes: |
      BREAKING: Removed 'age' field
      Reason: Privacy compliance
      Migration: Use 'birth_year' instead

3. Notify Consumers

Before breaking change:
1. Announce in #data-platform
2. Email all consumers
3. Provide migration guide
4. Set deprecation timeline (30 days)

4. Test Contracts

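# Same assumed `Contract` / `validate` interface as the validation example
from datacontract import Contract, validate


def test_user_contract():
    contract = Contract.load('contracts/users.contract.yaml')

    # Test valid data (id must be a real UUID to satisfy `format: uuid`)
    valid_data = {
        'id': '123e4567-e89b-12d3-a456-426614174000',
        'email': 'test@example.com',
        'created_at': '2024-01-16T12:00:00Z',
        'status': 'active'
    }
    assert validate(valid_data, contract).passed

    # Test invalid data
    invalid_data = {'id': '123'}  # Not a UUID, and missing required fields
    assert not validate(invalid_data, contract).passed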

Summary

Data Contracts: define schema, quality, and SLAs

Components:

  • Schema (fields, types, required)
  • Quality checks (validation rules)
  • SLAs (freshness, availability, latency)
  • Ownership (producer, consumers)

Versioning:

  • Semantic versioning (major.minor.patch)
  • Breaking vs non-breaking changes
  • Changelog documentation

Enforcement:

  • Validation in CI/CD
  • Quality monitoring
  • SLA tracking
  • Consumer notifications

Benefits:

  • Trust in data
  • Clear expectations
  • Independent evolution
  • Early error detection