Claude-skill-registry Data Contracts
Data contracts for defining schema, quality expectations, and SLAs between data producers and consumers
install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/data-contracts" ~/.claude/skills/majiayu000-claude-skill-registry-data-contracts && rm -rf "$T"
manifest:
skills/data/data-contracts/SKILL.md
Data Contracts
Overview
Data contracts define the schema, quality expectations, and SLAs for data shared between producers and consumers, which makes the data layer reliable.
Why This Matters
- Trust: Consumers know the data format stays stable
- Quality: Expectations are defined explicitly
- Decoupling: Producers and consumers evolve independently
- Discovery: Teams can see what data exists and in which format
Data Contract Template
# contracts/users.contract.yaml
name: users
version: 1.1.0  # matches the latest changelog entry
owner: user-team
description: User profile data
status: active

schema:
  type: object
  properties:
    id:
      type: string
      format: uuid
      description: Unique user identifier
    email:
      type: string
      format: email
      description: User email address
    name:
      type: string
      description: User full name
    created_at:
      type: string
      format: date-time
      description: Account creation timestamp
    status:
      type: string
      enum: [active, inactive, suspended]
      description: Account status
  required: [id, email, created_at, status]

quality:
  - name: no_null_emails
    check: email IS NOT NULL
    threshold: 100%
    severity: critical
  - name: valid_email_format
    check: email LIKE '%@%.%'
    threshold: 99%
    severity: high
  - name: unique_emails
    check: COUNT(DISTINCT email) = COUNT(*)
    threshold: 100%
    severity: critical
  - name: recent_data
    check: created_at > NOW() - INTERVAL '7 days'
    threshold: 95%
    severity: medium

sla:
  freshness: 1 hour      # Data updated within 1 hour
  availability: 99.9%    # Uptime guarantee
  latency_p95: 100ms     # 95th percentile query time
  completeness: 99%      # No missing required fields

consumers:
  - analytics-team
  - marketing-team
  - billing-service

producer:
  team: user-team
  service: user-api
  contact: user-team@example.com

changelog:
  - version: 1.0.0
    date: 2024-01-01
    changes: Initial contract
  - version: 1.1.0
    date: 2024-01-15
    changes: Added status field (non-breaking)
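To make the schema section concrete, here is a minimal sketch of checking one record against it using only the Python standard library. `USERS_SCHEMA`, `validate_record`, and `_format_ok` are illustrative names rather than part of any contract library, and the format checks are deliberately simplified.

```python
import re
from datetime import datetime
from uuid import UUID

# Schema fragment hand-copied from the users contract (not loaded from YAML).
USERS_SCHEMA = {
    "properties": {
        "id": {"type": "string", "format": "uuid"},
        "email": {"type": "string", "format": "email"},
        "name": {"type": "string"},
        "created_at": {"type": "string", "format": "date-time"},
        "status": {"type": "string", "enum": ["active", "inactive", "suspended"]},
    },
    "required": ["id", "email", "created_at", "status"],
}


def _format_ok(value: str, fmt: str) -> bool:
    """Return True if value matches the named format (simplified checks)."""
    if fmt == "uuid":
        try:
            UUID(value)
            return True
        except ValueError:
            return False
    if fmt == "email":
        # Loose on purpose, similar in spirit to the LIKE '%@%.%' quality check.
        return re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", value) is not None
    if fmt == "date-time":
        try:
            datetime.fromisoformat(value.replace("Z", "+00:00"))
            return True
        except ValueError:
            return False
    return True  # Unknown formats are not checked.


def validate_record(record: dict, schema: dict) -> list[str]:
    """Return a list of violations; an empty list means the record is valid."""
    errors = []
    for field in schema["required"]:
        if field not in record or record[field] is None:
            errors.append(f"missing required field: {field}")
    for field, spec in schema["properties"].items():
        if field not in record or record[field] is None:
            continue
        value = record[field]
        if spec.get("type") == "string" and not isinstance(value, str):
            errors.append(f"{field}: expected string")
            continue
        if "enum" in spec and value not in spec["enum"]:
            errors.append(f"{field}: {value!r} not in {spec['enum']}")
        if "format" in spec and not _format_ok(value, spec["format"]):
            errors.append(f"{field}: invalid {spec['format']}")
    return errors
```

Returning a list of violations instead of raising on the first one lets a producer report every problem in a record at once.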
Contract Validation
Python Example
from datacontract import Contract, validate


class DataQualityError(Exception):
    """Raised when data violates its contract."""


# Load contract
contract = Contract.load('contracts/users.contract.yaml')

# Validate data (`data` is the batch of records produced upstream)
result = validate(data, contract)

if not result.passed:
    print("Validation failed:")
    for failure in result.failures:
        print(f"- {failure.check}: {failure.message}")
    raise DataQualityError(result.failures)

print("✓ Data meets contract requirements")
SQL Example
-- Validate quality checks, each against its own contract threshold
WITH quality_checks AS (
  SELECT
    'no_null_emails' AS check_name,
    COUNT(*) FILTER (WHERE email IS NULL) AS failures,
    COUNT(*) AS total,
    100.0 AS threshold
  FROM users
  UNION ALL
  SELECT
    'valid_email_format',
    COUNT(*) FILTER (WHERE email NOT LIKE '%@%.%'),
    COUNT(*),
    99.0
  FROM users
)
SELECT
  check_name,
  failures,
  total,
  (1 - failures::float / NULLIF(total, 0)) * 100 AS pass_rate,
  CASE
    WHEN (1 - failures::float / NULLIF(total, 0)) * 100 >= threshold THEN 'PASS'
    ELSE 'FAIL'  -- also covers an empty table, where pass_rate is NULL
  END AS status
FROM quality_checks;
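The pass-rate arithmetic behind the query can also be sketched in Python, with the threshold taken from each check's contract entry. `parse_percent` and `evaluate_check` are hypothetical helper names, and treating an empty table as a failure is an assumption, not something the contract specifies.

```python
def parse_percent(text: str) -> float:
    """Convert a contract threshold such as '99%' into the float 99.0."""
    return float(text.strip().rstrip("%"))


def evaluate_check(failures: int, total: int, threshold: str) -> tuple[float, str]:
    """Return (pass_rate, status) for one quality check."""
    if total == 0:
        # Assumption: no rows means the check cannot pass, so empty loads get noticed.
        return 0.0, "FAIL"
    pass_rate = (1 - failures / total) * 100
    status = "PASS" if pass_rate >= parse_percent(threshold) else "FAIL"
    return pass_rate, status
```

With this shape, 5 bad rows out of 1000 pass a `99%` threshold, while a single bad row fails a `100%` threshold.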
Breaking Change Detection
# Compare contract versions
datacontract diff v1.0.0 v1.1.0

# Output:
# BREAKING CHANGES:
# - Removed field 'age' (was required)
# - Changed type of 'phone' from string to number
#
# COMPATIBLE CHANGES:
# - Added optional field 'address'
# - Added new quality check 'valid_status'
Breaking vs Non-Breaking
# BREAKING (requires consumer updates):
- Remove required field
- Change field type
- Rename field
- Add new required field
- Stricter validation

# NON-BREAKING (backward compatible):
- Add optional field
- Remove optional field
- Relax validation
- Add new quality check
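The schema-level rules in this list can be sketched as a small comparison over two schema dictionaries. `classify_changes` is an illustrative function, not the `datacontract diff` implementation, and it covers only field additions, removals, type changes, and fields that become required.

```python
def classify_changes(old: dict, new: dict) -> dict[str, list[str]]:
    """Sort the differences between two schema dicts into breaking and compatible."""
    breaking, compatible = [], []
    old_props, new_props = old["properties"], new["properties"]
    old_req = set(old.get("required", []))
    new_req = set(new.get("required", []))

    # Fields that disappeared.
    for name in old_props.keys() - new_props.keys():
        if name in old_req:
            breaking.append(f"removed required field '{name}'")
        else:
            compatible.append(f"removed optional field '{name}'")

    # Fields that appeared.
    for name in new_props.keys() - old_props.keys():
        if name in new_req:
            breaking.append(f"added required field '{name}'")
        else:
            compatible.append(f"added optional field '{name}'")

    # Fields present in both versions.
    for name in old_props.keys() & new_props.keys():
        old_type = old_props[name].get("type")
        new_type = new_props[name].get("type")
        if old_type != new_type:
            breaking.append(f"changed type of '{name}' from {old_type} to {new_type}")
        if name in new_req and name not in old_req:
            breaking.append(f"field '{name}' became required")

    return {"breaking": sorted(breaking), "compatible": sorted(compatible)}
```

A rename is not detected as such here; it shows up as one removal plus one addition, which still lands in the breaking list when the old field was required.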
Contract Registry
// contracts/registry.ts
// `Contract` is assumed to be exported by the project's contract library.

interface ContractEntry {
  version: string;
  path: string;
  owner: string;
  consumers: string[];
}

export const contracts: Record<string, ContractEntry> = {
  users: {
    version: '1.1.0',
    path: 'contracts/users.contract.yaml',
    owner: 'user-team',
    consumers: ['analytics', 'marketing']
  },
  orders: {
    version: '2.0.0',
    path: 'contracts/orders.contract.yaml',
    owner: 'order-team',
    consumers: ['billing', 'shipping']
  }
};

// Get contract
export function getContract(name: string, version?: string) {
  const contract = contracts[name];
  if (!contract) {
    throw new Error(`Contract ${name} not found`);
  }
  return Contract.load(contract.path, version);
}
CI/CD Integration
# .github/workflows/contract-validation.yml
name: Contract Validation

on: [pull_request]

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Validate Contract Schema
        run: |
          datacontract validate contracts/*.yaml

      - name: Check Breaking Changes
        run: |
          # Test the command directly: the default shell runs with -e, so a
          # separate `$?` check after a failing command would never execute.
          if ! datacontract diff main HEAD; then
            echo "Breaking changes detected!"
            exit 1
          fi

      - name: Test Data Quality
        run: |
          python scripts/test_contracts.py
Monitoring
# Monitor contract SLAs
import time

from prometheus_client import Gauge

# Metrics
freshness_gauge = Gauge('data_freshness_seconds', 'Data freshness', ['dataset'])
quality_gauge = Gauge('data_quality_score', 'Quality score', ['dataset', 'check'])


def monitor_contract(contract_name: str):
    # get_contract, get_last_update_time, run_quality_check and alert are
    # project-specific helpers that are not shown here.
    contract = get_contract(contract_name)

    # Check freshness (last_update is a Unix timestamp in seconds)
    last_update = get_last_update_time(contract_name)
    freshness = time.time() - last_update
    freshness_gauge.labels(dataset=contract_name).set(freshness)

    # Check quality
    for check in contract.quality:
        score = run_quality_check(contract_name, check)
        quality_gauge.labels(
            dataset=contract_name,
            check=check.name
        ).set(score)

        # Alert if below threshold (score and threshold must be numbers
        # on the same scale, e.g. both percentages)
        if score < check.threshold:
            alert(f"{contract_name}: {check.name} below threshold")
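The contract states its freshness SLA as text such as `1 hour`, while the freshness gauge is measured in seconds. A minimal sketch of bridging the two is shown here. `parse_duration` and `freshness_breached` are hypothetical helpers, and the supported units are an assumption about how SLA values are written.

```python
import re

# Assumed unit vocabulary for SLA durations such as "1 hour" or "7 days".
_UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}


def parse_duration(text: str) -> int:
    """Convert a duration like '1 hour' or '30 minutes' into seconds."""
    match = re.fullmatch(r"\s*(\d+)\s*(second|minute|hour|day)s?\s*", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def freshness_breached(last_update: float, now: float, sla: str) -> bool:
    """Return True when the data is older than the SLA allows."""
    return (now - last_update) > parse_duration(sla)
```

Passing `now` explicitly rather than calling `time.time()` inside the function keeps the check deterministic and easy to test.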
Best Practices
1. Version Semantically
1.0.0 → 1.0.1: Bug fix (patch)
1.0.0 → 1.1.0: New optional field (minor)
1.0.0 → 2.0.0: Breaking change (major)
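The bump rules above can be sketched as a small function. `bump_version` and its change labels (`fix`, `feature`, `breaking`) are illustrative names chosen for this example.

```python
def bump_version(version: str, change: str) -> str:
    """Return the next semantic version for a given kind of change."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "breaking":
        return f"{major + 1}.0.0"
    if change == "feature":  # e.g. a new optional field
        return f"{major}.{minor + 1}.0"
    if change == "fix":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")
```

For example, `bump_version("1.0.0", "feature")` yields `1.1.0`, and a breaking change resets the minor and patch parts to zero.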
2. Document Changes
changelog:
  - version: 2.0.0
    date: 2024-01-20
    changes: |
      BREAKING: Removed 'age' field
      Reason: Privacy compliance
      Migration: Use 'birth_year' instead
3. Notify Consumers
Before breaking change:
1. Announce in #data-platform
2. Email all consumers
3. Provide migration guide
4. Set deprecation timeline (30 days)
4. Test Contracts
from datacontract import Contract, validate


def test_user_contract():
    contract = Contract.load('contracts/users.contract.yaml')

    # Test valid data (id must be a UUID to satisfy the schema)
    valid_data = {
        'id': '123e4567-e89b-12d3-a456-426614174000',
        'email': 'test@example.com',
        'created_at': '2024-01-16T12:00:00Z',
        'status': 'active'
    }
    assert validate(valid_data, contract).passed

    # Test invalid data
    invalid_data = {'id': '123'}  # Missing required fields
    assert not validate(invalid_data, contract).passed
Summary
Data Contracts: Define schema, quality, and SLAs
Components:
- Schema (fields, types, required)
- Quality checks (validation rules)
- SLAs (freshness, availability, latency)
- Ownership (producer, consumers)
Versioning:
- Semantic versioning (major.minor.patch)
- Breaking vs non-breaking changes
- Changelog documentation
Enforcement:
- Validation in CI/CD
- Quality monitoring
- SLA tracking
- Consumer notifications
Benefits:
- Trust in data
- Clear expectations
- Independent evolution
- Early error detection