Awesome-omni-skill database-architect
Database design and optimization specialist. Schema design, query optimization, indexing strategies, data modeling, and migration planning for relational and NoSQL databases.
git clone https://github.com/diegosouzapw/awesome-omni-skill
T=$(mktemp -d) && git clone --depth=1 https://github.com/diegosouzapw/awesome-omni-skill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data-ai/database-architect" ~/.claude/skills/diegosouzapw-awesome-omni-skill-database-architect-63f022 && rm -rf "$T"
skills/data-ai/database-architect/SKILL.md
Database Architect Skill
<identity>
Database Architect Skill - Designs efficient database schemas, optimizes queries, plans indexes, and creates migration strategies for both relational (PostgreSQL, MySQL) and NoSQL (MongoDB, Redis) databases.
</identity>

<capabilities>
- Designing normalized and denormalized schemas
- Query optimization and execution plan analysis
- Index strategy planning
- Data modeling (ER diagrams, relationships)
- Migration planning and versioning
- Performance troubleshooting
</capabilities>

<instructions>
<execution_process>

Step 1: Understand Data Requirements
Gather requirements:
- Entities: What data needs to be stored?
- Relationships: How do entities relate (1:1, 1:N, N:M)?
- Access Patterns: How will data be queried?
- Volume: Expected data size and growth rate
- Consistency: ACID requirements vs eventual consistency
Step 2: Design Schema
For Relational Databases:
- Normalize: Start with 3NF to reduce redundancy
- Define Primary Keys: Use surrogate keys (UUID/SERIAL) or natural keys
- Define Foreign Keys: Establish referential integrity
- Consider Denormalization: Only for proven performance needs
For NoSQL Databases:
- Model for Queries: Design documents/collections around access patterns
- Embed vs Reference: Embed for 1:1/1:few, reference for 1:many
- Shard Key Selection: Choose keys that distribute evenly
Step 3: Plan Indexes
Index strategy based on query patterns:
```sql
-- Example: Users table with common queries
CREATE INDEX idx_users_email ON users(email);                -- Exact match
CREATE INDEX idx_users_name ON users(last_name, first_name); -- Range/sort
CREATE INDEX idx_users_created ON users(created_at DESC);    -- Ordering
```
Index Guidelines:
- Index columns used in WHERE, JOIN, ORDER BY
- Consider composite indexes for multi-column queries
- Avoid over-indexing (slows writes)
- Use covering indexes for read-heavy queries
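
For example, a covering index can serve a read-heavy lookup entirely from the index. A minimal sketch, assuming an orders table like the one in the schema example at the end of this skill (`INCLUDE` requires PostgreSQL 11+):

```sql
-- Covering index: non-key columns stored in the index leaf pages via INCLUDE
CREATE INDEX idx_orders_user_recent
    ON orders (user_id, created_at DESC)
    INCLUDE (status, total_amount);

-- Can be answered with an index-only scan, avoiding heap fetches
SELECT status, total_amount
FROM orders
WHERE user_id = $1
ORDER BY created_at DESC
LIMIT 20;
```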
Step 4: Plan Migrations
Create versioned migrations:
```
migrations/
  001_create_users.sql
  002_add_email_index.sql
  003_create_orders.sql
```
Migration Best Practices:
- Always include up and down migrations
- Test migrations on production-like data
- Plan for zero-downtime migrations
- Backup before running migrations
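
As a sketch of the up/down convention described above (file names are illustrative; adapt to your migration tool):

```sql
-- migrations/004_add_users_email_index.up.sql
-- CONCURRENTLY avoids blocking writes, but cannot run inside a transaction block
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_users_email ON users (email);

-- migrations/004_add_users_email_index.down.sql
DROP INDEX CONCURRENTLY IF EXISTS idx_users_email;
```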
Step 5: Optimize Queries
Analyze and improve slow queries:
- Use EXPLAIN ANALYZE: Understand execution plans
- Identify Table Scans: Replace with index scans
- Optimize JOINs: Ensure indexes on join columns
- Batch Operations: Use bulk inserts/updates
- Connection Pooling: Reduce connection overhead
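
A minimal sketch of this workflow, assuming the users/orders schema from the example at the end of this skill:

```sql
-- Inspect the plan and runtime; BUFFERS shows I/O behaviour
EXPLAIN (ANALYZE, BUFFERS)
SELECT o.id, o.total_amount
FROM orders o
JOIN users u ON u.id = o.user_id
WHERE u.email = 'alice@example.com'
ORDER BY o.created_at DESC
LIMIT 20;
-- A "Seq Scan" on orders here suggests a missing index on orders(user_id);
-- after adding one, the plan should switch to an Index Scan / Nested Loop.
```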
Step 6: PostgreSQL 17 Features (2024–2026)
Leverage PostgreSQL 17 capabilities where applicable:
Performance improvements:
- New `VACUUM` memory management — up to 20x lower memory footprint; vacuum now runs faster on busy systems
- Streaming I/O interface accelerates sequential scans on large datasets
- `BRIN` indexes support parallel builds
- B-tree indexes are more efficient for `IN` clause queries
- Optimized CTE (Common Table Expression) planning
SQL/JSON enhancements (PG 17):
- `JSON_TABLE()` — converts JSON data into relational table representation
- JSON constructors and identity functions (`JSON()`, `JSON_SCALAR()`, `JSON_ARRAY()`, `JSON_OBJECT()`)
- Use `jsonpath` for expressive path-based queries over JSONB columns
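
For instance, a minimal `JSON_TABLE()` sketch (PostgreSQL 17), assuming a hypothetical `orders` table with a JSONB `payload` column containing an `items` array:

```sql
-- Flatten a JSONB array of line items into relational rows
SELECT o.id, jt.sku, jt.qty
FROM orders o,
     JSON_TABLE(
         o.payload, '$.items[*]'
         COLUMNS (
             sku TEXT    PATH '$.sku',
             qty INTEGER PATH '$.qty'
         )
     ) AS jt;
```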
Incremental backups:
- `pg_basebackup` supports incremental backup; combine with `pg_upgrade` for zero-data-loss major version upgrades
Logical replication improvements:
- Failover control for logical replication slots
- `pg_createsubscriber` creates logical replicas from physical standbys
- `pg_upgrade` now preserves logical replication slots across major version upgrades
Security:
- New `MAINTAIN` privilege — grants targeted maintenance rights without full superuser access
- `sslnegotiation=direct` client option for direct TLS handshake (avoids a round trip)
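
A brief sketch of the `MAINTAIN` privilege (role and table names are illustrative):

```sql
-- Allows VACUUM, ANALYZE, REINDEX, and similar maintenance on the table
-- without granting broader read/write or superuser rights
CREATE ROLE maintenance_bot LOGIN;
GRANT MAINTAIN ON TABLE events TO maintenance_bot;
```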
COPY improvements:
- `COPY ... ON_ERROR ignore` — continues import on row-level errors instead of aborting
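
For example (table name and file path are illustrative):

```sql
-- Rows that fail input conversion are skipped instead of aborting the load
COPY staging_events FROM '/tmp/events.csv'
WITH (FORMAT csv, HEADER true, ON_ERROR ignore);
```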
Step 7: pgvector for AI Embeddings
Store and query vector embeddings alongside relational data to avoid a separate vector database:
```sql
-- Install extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Table with embedding column (1536 dims for OpenAI text-embedding-3-small)
CREATE TABLE documents (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    content TEXT NOT NULL,
    embedding vector(1536),
    created_at TIMESTAMPTZ DEFAULT NOW()
);

-- IVFFlat index for approximate nearest neighbor (ANN) search
-- lists = sqrt(row_count) is a good starting value
CREATE INDEX idx_documents_embedding ON documents
    USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);

-- HNSW index (faster queries, more memory; preferred for < 1M vectors)
CREATE INDEX idx_documents_embedding_hnsw ON documents
    USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 64);

-- Similarity search (cosine distance)
SELECT id, content, 1 - (embedding <=> $1::vector) AS similarity
FROM documents
ORDER BY embedding <=> $1::vector
LIMIT 10;
```
When to use pgvector vs. dedicated vector DB:
- Up to ~10M vectors: pgvector is sufficient (sub-50ms queries with HNSW index)
- Above 10M vectors or requiring specialized ANN algorithms: consider Pinecone, Weaviate, or Qdrant
- pgvector advantage: same backups, replication, and connection pooling as the rest of PostgreSQL
Step 8: Table Partitioning Strategies
Use declarative partitioning for tables expected to exceed available RAM:
```sql
-- Range partitioning by date (common for time-series / logs)
CREATE TABLE events (
    id BIGSERIAL,
    created_at TIMESTAMPTZ NOT NULL,
    event_type TEXT NOT NULL,
    payload JSONB
) PARTITION BY RANGE (created_at);

-- Monthly partitions
CREATE TABLE events_2025_01 PARTITION OF events
    FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
CREATE TABLE events_2025_02 PARTITION OF events
    FOR VALUES FROM ('2025-02-01') TO ('2025-03-01');

-- Hash partitioning for even distribution (e.g., multi-tenant)
CREATE TABLE orders (
    id UUID NOT NULL,
    tenant_id UUID NOT NULL,
    total DECIMAL(12,2)
) PARTITION BY HASH (tenant_id);

CREATE TABLE orders_p0 PARTITION OF orders FOR VALUES WITH (modulus 4, remainder 0);
CREATE TABLE orders_p1 PARTITION OF orders FOR VALUES WITH (modulus 4, remainder 1);
CREATE TABLE orders_p2 PARTITION OF orders FOR VALUES WITH (modulus 4, remainder 2);
CREATE TABLE orders_p3 PARTITION OF orders FOR VALUES WITH (modulus 4, remainder 3);
```
Partition pruning: PostgreSQL automatically skips irrelevant partitions when the partition key appears in `WHERE`. Always include the partition key in queries.
Index on partitioned tables: Indexes created on the parent table are automatically created on all child partitions.
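
To confirm pruning, check the plan for a query that filters on the partition key (building on the `events` table above):

```sql
-- Only events_2025_02 should appear in the plan; other partitions are pruned
EXPLAIN (COSTS OFF)
SELECT count(*)
FROM events
WHERE created_at >= '2025-02-01' AND created_at < '2025-03-01';
```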
Step 9: JSONB Patterns at Scale
```sql
-- Generated columns promote hot JSONB fields to indexed native columns
CREATE TABLE customers (
    id BIGSERIAL PRIMARY KEY,
    data JSONB NOT NULL,
    -- Promote frequently filtered fields to B-tree indexed generated columns
    country TEXT GENERATED ALWAYS AS (data->>'country') STORED,
    signup_date DATE GENERATED ALWAYS AS ((data->>'signup_date')::DATE) STORED
);

CREATE INDEX idx_customers_country ON customers (country);
CREATE INDEX idx_customers_signup ON customers (signup_date);

-- GIN index for containment / key-existence queries
CREATE INDEX idx_customers_data_gin ON customers USING GIN (data);

-- Partial GIN index for large tables (index only active records)
CREATE INDEX idx_customers_data_active ON customers USING GIN (data)
    WHERE (data->>'status') = 'active';

-- jsonpath query example (PG 17)
SELECT * FROM customers
WHERE data @? '$.tags[*] ? (@ == "premium")';
```
Step 10: Connection Pooling
Use a connection pooler in front of PostgreSQL for all production deployments:
PgBouncer (lightweight, battle-tested):
```ini
; pgbouncer.ini
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
pool_mode = transaction   ; transaction pooling for stateless apps
max_client_conn = 1000
default_pool_size = 25
min_pool_size = 5         ; keep a few warm server connections
```
Supavisor (cloud-native, multi-tenant, Elixir-based):
- Designed for serverless / edge functions with thousands of short-lived connections
- Supports both session and transaction pooling modes
- Used by default in Supabase deployments; available as self-hosted
Pooling modes:
| Mode | Use Case | Notes |
|---|---|---|
| Session | Long-running connections, session-level features | 1 client = 1 server connection |
| Transaction | Stateless APIs (recommended default) | Most efficient; breaks session state such as `SET` and (by default) prepared statements |
| Statement | Rarely needed | Each statement can use a different server connection |
</execution_process>
<best_practices>
- Normalize First: Optimize later based on data
- Index Thoughtfully: Based on actual query patterns
- Use Migrations: Never modify schema directly
- Monitor Performance: Use database profiling tools
- Plan for Scale: Consider partitioning for large tables
- Upgrade to PostgreSQL 17: Benefit from new VACUUM memory manager, SQL/JSON functions, and incremental backup support
- Use pgvector for AI: Store embeddings in PostgreSQL with HNSW or IVFFlat indexes before reaching for a dedicated vector database
- JSONB at Scale: Promote hot JSONB fields to generated columns with B-tree indexes; use GIN for containment queries
- Always Pool Connections: Use PgBouncer (transaction mode) or Supavisor for all production PostgreSQL deployments
- Partition Large Tables: Apply range (time-series), list (category), or hash (multi-tenant) partitioning for tables projected to exceed server RAM
</best_practices> </instructions>
<examples>
<usage_example>
**Schema Design Request**: Design a schema for an e-commerce orders system
Example Response:
```sql
-- Users table
CREATE TABLE users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    email VARCHAR(255) UNIQUE NOT NULL,
    created_at TIMESTAMP DEFAULT NOW()
);

-- Orders table
CREATE TABLE orders (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID REFERENCES users(id),
    status VARCHAR(20) DEFAULT 'pending',
    total_amount DECIMAL(10,2),
    created_at TIMESTAMP DEFAULT NOW()
);

-- Order items (1:N relationship)
CREATE TABLE order_items (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    order_id UUID REFERENCES orders(id) ON DELETE CASCADE,
    product_id UUID NOT NULL,
    quantity INTEGER NOT NULL,
    unit_price DECIMAL(10,2)
);

-- Indexes for common queries
CREATE INDEX idx_orders_user ON orders(user_id);
CREATE INDEX idx_orders_status ON orders(status);
CREATE INDEX idx_order_items_order ON order_items(order_id);
```
</usage_example> </examples>
Rules
- Always justify denormalization with performance data
- Include rollback strategy for all migrations
- Document relationships and constraints
Related Workflow
This skill has a corresponding workflow for complex multi-agent scenarios:
- Workflow: .claude/workflows/database-architect-skill-workflow.md
- When to use workflow: For comprehensive database design including requirements analysis, schema design, query optimization, migration planning, and testing (multi-phase, multi-agent)
- When to use skill directly: For quick schema reviews or single-agent database tasks
Iron Laws
- NEVER make schema changes without versioned migrations that include both UP and DOWN scripts — manual DDL in production is not recoverable.
- ALWAYS normalize to at least 3NF before considering denormalization — never prematurely optimize without measured performance evidence.
- ALWAYS plan indexes based on actual query patterns from EXPLAIN ANALYZE — never add indexes speculatively before profiling real workloads.
- NEVER test or deploy a migration without running it against production-like data first — schema issues surface under realistic volume, not on empty tables.
- ALWAYS use connection pooling (Supavisor or PgBouncer) in production — direct connections from serverless functions exhaust the database connection limit under load.
Anti-Patterns
| Anti-Pattern | Why It Fails | Correct Approach |
|---|---|---|
| Manual DDL directly on production | No rollback path; breaks migration history | Always use versioned migrations with DOWN scripts |
| Premature denormalization | Adds complexity before profiling; often no measurable gain | Normalize first, denormalize only after EXPLAIN ANALYZE reveals bottleneck |
| Indexing every column | Slows writes; wastes storage; misleads query planner | Index only columns that appear in WHERE, JOIN, and ORDER BY clauses |
| Adding NOT NULL column without default | Locks entire table during migration on large datasets | Add nullable column, backfill in batches, then add NOT NULL constraint (see the sketch after this table) |
| Direct connections from serverless functions | Connection limit exhausted under load spike | Use PgBouncer or Supavisor in transaction-pooling mode |
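
As a sketch of the safe pattern for the "NOT NULL column" row above (column name and batch size are illustrative):

```sql
-- 1. Add the column as nullable: metadata-only change, no table rewrite
ALTER TABLE users ADD COLUMN country TEXT;

-- 2. Backfill in small batches to avoid long-held locks; repeat until 0 rows updated
UPDATE users
SET country = 'unknown'
WHERE id IN (
    SELECT id FROM users WHERE country IS NULL LIMIT 10000
);

-- 3. Enforce the constraint once the backfill is complete (single validation scan)
ALTER TABLE users ALTER COLUMN country SET NOT NULL;
```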
Memory Protocol (MANDATORY)
Before starting:
cat .claude/context/memory/learnings.md
After completing:
- New pattern -> .claude/context/memory/learnings.md
- Issue found -> .claude/context/memory/issues.md
- Decision made -> .claude/context/memory/decisions.md
ASSUME INTERRUPTION: Your context may reset. If it's not in memory, it didn't happen.