Claude-skill-registry data-design
Data modeling, schema design, and data architecture
install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/data-design" ~/.claude/skills/majiayu000-claude-skill-registry-data-design && rm -rf "$T"
manifest:
skills/data/data-design/SKILL.mdsource content
Data Design
Overview
Principles for designing data structures, schemas, and data flows that are efficient, maintainable, and scalable.
Data Modeling
Entity-Relationship Diagrams
┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ User │ │ Order │ │ Product │ ├─────────────┤ ├─────────────┤ ├─────────────┤ │ id (PK) │──┐ │ id (PK) │ ┌──│ id (PK) │ │ email │ │ │ user_id(FK) │←───┘ │ name │ │ name │ └───→│ status │ │ price │ │ created_at │ │ total │ │ stock │ └─────────────┘ │ created_at │ └─────────────┘ └─────────────┘ │ │ │ ┌──────┴──────┐ │ ↓ ↓ │ ┌─────────────┐ │ │ OrderItem │ │ ├─────────────┤ │ │ id (PK) │ │ │ order_id(FK)│ │ │ product_id │─────────────────────┘ │ quantity │ │ price │ └─────────────┘
Relationship Types
| Type | Description | Example |
|---|---|---|
| 1:1 | One to one | User ↔ Profile |
| 1:N | One to many | User → Orders |
| M:N | Many to many | Students ↔ Courses |
-- 1:1 (profile extends user) CREATE TABLE users ( id SERIAL PRIMARY KEY, email VARCHAR(255) UNIQUE ); CREATE TABLE profiles ( user_id INTEGER PRIMARY KEY REFERENCES users(id), bio TEXT, avatar_url VARCHAR(255) ); -- 1:N (user has many orders) CREATE TABLE orders ( id SERIAL PRIMARY KEY, user_id INTEGER REFERENCES users(id), total DECIMAL(10,2) ); -- M:N (students ↔ courses via junction table) CREATE TABLE enrollments ( student_id INTEGER REFERENCES students(id), course_id INTEGER REFERENCES courses(id), enrolled_at TIMESTAMP DEFAULT NOW(), PRIMARY KEY (student_id, course_id) );
Normalization
Normal Forms
| Form | Rule | Example Violation |
|---|---|---|
| 1NF | Atomic values, no repeating groups | |
| 2NF | 1NF + no partial dependencies | Non-key depends on part of composite key |
| 3NF | 2NF + no transitive dependencies | in orders table |
| BCNF | Every determinant is a candidate key | Rare edge cases |
-- ❌ Violates 1NF (non-atomic) CREATE TABLE products ( id SERIAL PRIMARY KEY, name VARCHAR(255), tags VARCHAR(255) -- "electronics,sale,featured" ); -- ✅ 1NF compliant CREATE TABLE products ( id SERIAL PRIMARY KEY, name VARCHAR(255) ); CREATE TABLE product_tags ( product_id INTEGER REFERENCES products(id), tag VARCHAR(50), PRIMARY KEY (product_id, tag) ); -- ❌ Violates 3NF (transitive dependency) CREATE TABLE orders ( id SERIAL PRIMARY KEY, customer_zip VARCHAR(10), customer_city VARCHAR(100) -- Depends on zip, not order ); -- ✅ 3NF compliant CREATE TABLE customers ( id SERIAL PRIMARY KEY, zip VARCHAR(10), city VARCHAR(100) ); CREATE TABLE orders ( id SERIAL PRIMARY KEY, customer_id INTEGER REFERENCES customers(id) );
Denormalization
When to Denormalize
Normalize for: ✅ Write-heavy workloads ✅ Data integrity requirements ✅ Storage efficiency ✅ Flexibility in queries Denormalize for: ✅ Read-heavy workloads ✅ Complex joins hurting performance ✅ Reporting/analytics ✅ Known access patterns
Denormalization Patterns
-- Computed columns CREATE TABLE orders ( id SERIAL PRIMARY KEY, items JSONB, item_count INTEGER GENERATED ALWAYS AS (jsonb_array_length(items)) STORED, total DECIMAL(10,2) ); -- Duplicated data for read performance CREATE TABLE posts ( id SERIAL PRIMARY KEY, author_id INTEGER REFERENCES users(id), author_name VARCHAR(100), -- Duplicated from users author_avatar VARCHAR(255), -- Duplicated from users content TEXT ); -- Materialized view for complex queries CREATE MATERIALIZED VIEW monthly_sales AS SELECT DATE_TRUNC('month', created_at) as month, product_id, SUM(quantity) as units_sold, SUM(total) as revenue FROM order_items GROUP BY 1, 2; -- Refresh periodically REFRESH MATERIALIZED VIEW CONCURRENTLY monthly_sales;
Schema Design Patterns
Soft Deletes
CREATE TABLE users ( id SERIAL PRIMARY KEY, email VARCHAR(255), deleted_at TIMESTAMP NULL, -- Partial unique index CONSTRAINT unique_active_email UNIQUE (email) WHERE deleted_at IS NULL ); -- Query active users only SELECT * FROM users WHERE deleted_at IS NULL;
Audit Trail
CREATE TABLE audit_log ( id SERIAL PRIMARY KEY, table_name VARCHAR(100), record_id INTEGER, action VARCHAR(10), -- INSERT, UPDATE, DELETE old_data JSONB, new_data JSONB, changed_by INTEGER REFERENCES users(id), changed_at TIMESTAMP DEFAULT NOW() ); -- Trigger for automatic auditing CREATE OR REPLACE FUNCTION audit_trigger() RETURNS TRIGGER AS $$ BEGIN INSERT INTO audit_log (table_name, record_id, action, old_data, new_data, changed_by) VALUES ( TG_TABLE_NAME, COALESCE(NEW.id, OLD.id), TG_OP, CASE WHEN TG_OP != 'INSERT' THEN to_jsonb(OLD) END, CASE WHEN TG_OP != 'DELETE' THEN to_jsonb(NEW) END, current_setting('app.user_id', true)::INTEGER ); RETURN COALESCE(NEW, OLD); END; $$ LANGUAGE plpgsql;
Multi-Tenancy
-- Row-level security CREATE TABLE organizations ( id SERIAL PRIMARY KEY, name VARCHAR(255) ); CREATE TABLE projects ( id SERIAL PRIMARY KEY, org_id INTEGER REFERENCES organizations(id), name VARCHAR(255) ); -- Enable RLS ALTER TABLE projects ENABLE ROW LEVEL SECURITY; CREATE POLICY org_isolation ON projects USING (org_id = current_setting('app.org_id')::INTEGER); -- Set org context per request SET app.org_id = 123; SELECT * FROM projects; -- Only sees org 123's projects
Versioning / History
-- Version table pattern CREATE TABLE documents ( id SERIAL PRIMARY KEY, current_version_id INTEGER ); CREATE TABLE document_versions ( id SERIAL PRIMARY KEY, document_id INTEGER REFERENCES documents(id), version INTEGER, content TEXT, created_at TIMESTAMP DEFAULT NOW(), created_by INTEGER REFERENCES users(id), UNIQUE (document_id, version) ); -- Temporal tables (PostgreSQL) CREATE TABLE products ( id SERIAL PRIMARY KEY, name VARCHAR(255), price DECIMAL(10,2), valid_from TIMESTAMP DEFAULT NOW(), valid_to TIMESTAMP DEFAULT 'infinity' ); -- Query historical state SELECT * FROM products WHERE valid_from <= '2024-01-01' AND valid_to > '2024-01-01';
NoSQL Schema Design
Document Store (MongoDB)
// Embedded vs Referenced // ✅ Embed when: data is accessed together, 1:few relationship { _id: ObjectId("..."), title: "Blog Post", author: { name: "John", email: "john@example.com" }, comments: [ { user: "Jane", text: "Great post!", date: ISODate("...") } ] } // ✅ Reference when: data is accessed independently, 1:many or M:N { _id: ObjectId("..."), title: "Blog Post", authorId: ObjectId("..."), // Reference to users collection commentIds: [ObjectId("..."), ObjectId("...")] } // ❌ Anti-pattern: Unbounded arrays { _id: ObjectId("..."), logs: [...] // Can grow to millions, hits 16MB limit } // ✅ Better: Bucket pattern { _id: ObjectId("..."), sensorId: "sensor-123", date: ISODate("2024-01-15"), readings: [...] // Max ~1000 per document }
Key-Value Store (Redis)
# Naming conventions user:123 # User object user:123:sessions # User's sessions (set) user:123:orders # User's orders (list) order:456 # Order object orders:pending # Queue of pending orders (list) products:category:electronics # Products in category (set) # Expiration patterns session:{token} # Expires after 30 min rate_limit:ip:1.2.3.4 # Expires after 1 min cache:api:/users/123 # Expires after 5 min
Data Pipeline Design
ETL vs ELT
ETL (Extract, Transform, Load): Source → Transform (external) → Data Warehouse Use: Traditional, when transformation is complex ELT (Extract, Load, Transform): Source → Data Lake/Warehouse → Transform (in-place) Use: Modern, leverages warehouse compute power
Event Sourcing
// Events are the source of truth interface Event { id: string; aggregateId: string; type: string; payload: unknown; timestamp: Date; version: number; } // Event store class EventStore { async append(aggregateId: string, events: Event[]) { await db.events.insertMany(events); } async getEvents(aggregateId: string): Promise<Event[]> { return db.events .find({ aggregateId }) .sort({ version: 1 }) .toArray(); } } // Rebuild state from events function rebuildAccount(events: Event[]): Account { return events.reduce((account, event) => { switch (event.type) { case 'AccountOpened': return { balance: 0, ...event.payload }; case 'MoneyDeposited': return { ...account, balance: account.balance + event.payload.amount }; case 'MoneyWithdrawn': return { ...account, balance: account.balance - event.payload.amount }; default: return account; } }, {} as Account); }
Data Governance
Data Quality Dimensions
| Dimension | Description | Example Check |
|---|---|---|
| Accuracy | Correct values | Email format validation |
| Completeness | No missing data | Required fields present |
| Consistency | Same across systems | User ID matches in all tables |
| Timeliness | Up to date | Last updated within 24h |
| Uniqueness | No duplicates | Unique email per user |
Schema Evolution
-- Safe migrations -- ✅ Adding nullable column ALTER TABLE users ADD COLUMN phone VARCHAR(20) NULL; -- ✅ Adding column with default ALTER TABLE users ADD COLUMN status VARCHAR(20) DEFAULT 'active'; -- ⚠️ Making column non-null (multi-step) -- Step 1: Add with default ALTER TABLE users ADD COLUMN verified BOOLEAN DEFAULT false; -- Step 2: Backfill data UPDATE users SET verified = true WHERE email_verified_at IS NOT NULL; -- Step 3: Add constraint ALTER TABLE users ALTER COLUMN verified SET NOT NULL; -- ❌ Dangerous: Renaming column -- Instead: Add new, migrate data, remove old (over multiple deploys)
Related Skills
- [[database]] - Database implementation
- [[architecture-patterns]] - Data architecture patterns
- [[api-design]] - Data in APIs