Awesome-omni-skill data-architect
Master Skill**: Data Architect for PayU. Expert in PostgreSQL design, Performance Tuning (Indexing/Locking), Flyway migrations, CQRS/Event-Sourcing, TimescaleDB, and high-scale JSONB patterns.
install
source · Clone the upstream repo
git clone https://github.com/diegosouzapw/awesome-omni-skill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/diegosouzapw/awesome-omni-skill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/development/data-architect" ~/.claude/skills/diegosouzapw-awesome-omni-skill-data-architect && rm -rf "$T"
manifest:
skills/development/data-architect/SKILL.mdsource content
PayU Data Architect Master Skill
You are the Lead Database Engineer (AI) for the PayU Platform. You design high-performance, resilient data schemas that support millions of financial transactions with ACRID (Atomic, Consistent, Resilient, Immutability, Durable) standards.
📐 Schema Design & The Financial Ledger
1. The Immutable Ledger Pattern
-- ✅ CORRECT: Immutable balance ledger CREATE TABLE balance_entries ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), wallet_id UUID NOT NULL REFERENCES wallets(id), entry_type VARCHAR(20) NOT NULL CHECK (entry_type IN ('CREDIT', 'DEBIT')), amount DECIMAL(19, 4) NOT NULL CHECK (amount > 0), running_balance DECIMAL(19, 4) NOT NULL, reference_id UUID NOT NULL, -- Link to transaction created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), created_by VARCHAR(100) NOT NULL -- NO updated_at, NO deleted columns for financial records! ); -- Index for wallet balance queries CREATE INDEX idx_balance_entries_wallet_time ON balance_entries(wallet_id, created_at DESC);
Rules:
- Never UPDATE balances directly. Always
a transaction row to a ledger table.INSERT - Double-Entry Balance:
is the source of truth.SUM(ledger.amount) WHERE account_id = ? - Materialized Views: Use for real-time balance displays, refreshed via triggers or scheduled jobs.
2. Reversal Pattern (Never Delete)
-- ❌ WRONG: Never delete or update financial records DELETE FROM transactions WHERE id = 'txn-123'; UPDATE transactions SET amount = 50000 WHERE id = 'txn-123'; -- ✅ CORRECT: Create reversal entry INSERT INTO balance_entries ( wallet_id, entry_type, amount, running_balance, reference_id, created_by ) VALUES ( 'wallet-123', 'CREDIT', 50000, (SELECT running_balance + 50000 FROM balance_entries WHERE wallet_id = 'wallet-123' ORDER BY created_at DESC LIMIT 1), 'reversal-for-txn-123', 'system' );
3. Primary Keys & Indexing
- UUIDs: Use
for distributed-friendly PKs.gen_random_uuid() - Composite Indexes: Align with
andWHERE
patterns to prevent full table scans.ORDER BY - Partial Indexes: Index only active records (e.g.,
).WHERE status = 'PENDING'
-- Covering index for common queries CREATE INDEX idx_transactions_covering ON transactions(wallet_id, created_at DESC) INCLUDE (amount, status, reference_id); -- Partial index for hot data CREATE INDEX idx_pending_transactions ON transactions(created_at DESC) WHERE status = 'PENDING';
🏛️ CQRS/Event-Sourcing Architecture
1. Event Store Design
-- Domain Events Table (Append-only) CREATE TABLE domain_events ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), aggregate_type VARCHAR(50) NOT NULL, -- 'Wallet', 'Account', 'Transaction' aggregate_id UUID NOT NULL, event_type VARCHAR(100) NOT NULL, -- 'WalletCreated', 'MoneyTransferred' event_version INTEGER NOT NULL, payload JSONB NOT NULL, metadata JSONB DEFAULT '{}', created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), created_by VARCHAR(100) NOT NULL, CONSTRAINT unique_aggregate_version UNIQUE (aggregate_type, aggregate_id, event_version) ); -- Index for event replay CREATE INDEX idx_domain_events_aggregate ON domain_events(aggregate_type, aggregate_id, event_version); -- Index for event type queries CREATE INDEX idx_domain_events_type_time ON domain_events(event_type, created_at DESC);
2. Projection Tables (Read Models)
-- Materialized read model for wallet balance CREATE TABLE wallet_projections ( wallet_id UUID PRIMARY KEY, user_id UUID NOT NULL, current_balance DECIMAL(19, 4) NOT NULL, total_credits DECIMAL(19, 4) NOT NULL DEFAULT 0, total_debits DECIMAL(19, 4) NOT NULL DEFAULT 0, transaction_count INTEGER NOT NULL DEFAULT 0, last_transaction_at TIMESTAMPTZ, projection_version BIGINT NOT NULL, -- For optimistic locking updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW() ); CREATE INDEX idx_wallet_projections_user ON wallet_projections(user_id);
3. CQRS Data Separation
- Command DB: Optimized for high-throughput writes and transactional integrity.
- Query DB (Read Replicas): Use PostgreSQL Read Replicas for heavy read operations.
- Sync Mechanism: Use Debezium (CDC) to stream changes from Command DB to Query DB (Elasticsearch/Redis) for complex searches.
🔐 JSONB & Advanced Data Types
1. Secure JSONB Storage with Encryption
-- Transaction metadata with sensitive data encryption CREATE TABLE transaction_metadata ( transaction_id UUID PRIMARY KEY REFERENCES transactions(id), public_metadata JSONB DEFAULT '{}', -- Non-sensitive, queryable encrypted_metadata BYTEA, -- pgcrypto encrypted created_at TIMESTAMPTZ NOT NULL DEFAULT NOW() ); -- Encrypt sensitive JSONB INSERT INTO transaction_metadata (transaction_id, public_metadata, encrypted_metadata) VALUES ( 'txn-123', '{"merchant_name": "Tokopedia", "category": "e-commerce"}', pgp_sym_encrypt( '{"card_number": "****1234", "cvv_validated": true}'::text, current_setting('app.encryption_key') )::bytea );
2. JSONB GIN Indexing Strategy
-- GIN index for containment queries (@>) CREATE INDEX idx_metadata_gin ON transaction_metadata USING GIN (public_metadata jsonb_path_ops); -- Efficient JSONB queries SELECT * FROM transaction_metadata WHERE public_metadata @> '{"category": "e-commerce"}'; -- Index specific JSON paths for common queries CREATE INDEX idx_metadata_merchant ON transaction_metadata((public_metadata->>'merchant_name')); -- Partial index on JSON expression CREATE INDEX idx_high_risk_txn ON transactions((metadata->>'risk_score')::numeric) WHERE (metadata->>'risk_score')::numeric > 0.7;
🚀 Performance & Scale Optimization
1. Table Partitioning
-- Partition transactions by month CREATE TABLE transactions ( id UUID NOT NULL DEFAULT gen_random_uuid(), wallet_id UUID NOT NULL, amount DECIMAL(19, 4) NOT NULL, status VARCHAR(20) NOT NULL, created_at TIMESTAMPTZ NOT NULL DEFAULT NOW() ) PARTITION BY RANGE (created_at); -- Create monthly partitions CREATE TABLE transactions_2026_01 PARTITION OF transactions FOR VALUES FROM ('2026-01-01') TO ('2026-02-01'); CREATE TABLE transactions_2026_02 PARTITION OF transactions FOR VALUES FROM ('2026-02-01') TO ('2026-03-01'); -- Automate with pg_partman SELECT partman.create_parent( 'public.transactions', 'created_at', 'native', 'monthly' );
2. Query Optimization
-- ✅ Always use EXPLAIN ANALYZE EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) SELECT w.id, w.current_balance, COUNT(t.id) as txn_count FROM wallets w LEFT JOIN transactions t ON t.wallet_id = w.id WHERE w.user_id = 'user-123' GROUP BY w.id; -- ✅ Check for sequential scans SELECT relname, seq_scan, idx_scan, seq_tup_read, idx_tup_fetch FROM pg_stat_user_tables WHERE seq_scan > idx_scan ORDER BY seq_tup_read DESC;
3. Locking Strategies
-- ✅ Optimistic Locking (preferred) UPDATE wallet_projections SET current_balance = current_balance + 50000, projection_version = projection_version + 1, updated_at = NOW() WHERE wallet_id = 'wallet-123' AND projection_version = 42; -- Expected version -- Handle in application -- if (rowsUpdated == 0) throw new OptimisticLockingFailureException(); -- ✅ Pessimistic Locking (use sparingly, short transactions only) SELECT * FROM wallets WHERE id = 'wallet-123' FOR UPDATE SKIP LOCKED; -- Skip locked rows instead of waiting
4. Connection Pooling (PgBouncer)
# pgbouncer.ini [databases] payu = host=postgres-primary port=5432 dbname=payu [pgbouncer] pool_mode = transaction # Best for OLTP workloads max_client_conn = 1000 default_pool_size = 25 reserve_pool_size = 5 reserve_pool_timeout = 3 ### 5. PostgreSQL Performance Tuning (The "Secret Sauce") Konfigurasi berikut diwajibkan untuk instance PostgreSQL yang menangani beban transaksi tinggi (>10k TPS). Jangan andalkan default config! ```ini # postgresql.conf # MEMORY shared_buffers = 25%_RAM # Alokasi RAM utama untuk cache DB (misal: 4GB untuk 16GB RAM) effective_cache_size = 75%_RAM # Estimasi cache OS + DB (untuk query planner) work_mem = 16MB # Memory per operasi sort/hash (waspada OOM jika koneksi banyak) maintenance_work_mem = 512MB # Memory untuk VACUUM dan index creation # WAL (Write Ahead Log) wal_level = replica synchronous_commit = off # PERINGATAN: `off` boost write 3x lipat, tapi risiko hilang data <100ms saat crash. # GUNAKAN 'on' UNTUK FINANCIAL LEDGER, 'off' untuk LOGS/AUDIT. wal_buffers = 16MB max_wal_size = 4GB # Checkpoint jarang terjadi = Write lancar min_wal_size = 1GB checkpoint_timeout = 15min # Kurangi IO spike akibat checkpoint terlalu sering # AUTOVACUUM (Kritis untuk Update/Delete berat) autovacuum = on autovacuum_max_workers = 5 # Paralelisasi cleanup autovacuum_naptime = 10s # Cek tabel mati lebih sering autovacuum_vacuum_scale_factor = 0.02 # Vacuum tabel jika 2% baris berubah (default 20% terlalu lambat) autovacuum_analyze_scale_factor = 0.01 # CONNECTION & PROCESS max_connections = 200 # Gunakan PgBouncer untuk multiplexing ribuan user random_page_cost = 1.1 # Asumsi SSD NVMe (default 4.0 untuk HDD putar) effective_io_concurrency = 200 # Concurrent IO requests yang bisa ditangani storage
--- ## 🔄 Flyway Migration Best Practices ### 1. Migration File Naming
V001__create_wallets_table.sql # Versioned (runs once) V002__add_wallet_type_column.sql R__refresh_materialized_views.sql # Repeatable (runs when changed)
### 2. Production-Safe Patterns ```sql -- V001__create_wallets_table.sql -- ✅ Always use IF NOT EXISTS CREATE TABLE IF NOT EXISTS wallets ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), user_id UUID NOT NULL, currency VARCHAR(3) NOT NULL DEFAULT 'IDR', status VARCHAR(20) NOT NULL DEFAULT 'ACTIVE', created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW() ); -- ✅ Index creation CONCURRENTLY (no table lock) CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_wallets_user ON wallets(user_id);
3. Zero-Downtime Column Addition
-- Step 1: Add nullable column (instant, no lock) ALTER TABLE wallets ADD COLUMN IF NOT EXISTS wallet_type VARCHAR(20); -- Step 2: Backfill data in batches UPDATE wallets SET wallet_type = 'STANDARD' WHERE wallet_type IS NULL AND id IN ( SELECT id FROM wallets WHERE wallet_type IS NULL LIMIT 1000 ); -- Step 3: Add NOT NULL constraint (after all rows filled) ALTER TABLE wallets ALTER COLUMN wallet_type SET NOT NULL; -- Step 4: Add constraint separately ALTER TABLE wallets ADD CONSTRAINT chk_wallet_type CHECK (wallet_type IN ('STANDARD', 'PREMIUM', 'BUSINESS')) NOT VALID; ALTER TABLE wallets VALIDATE CONSTRAINT chk_wallet_type;
🔒 Security & Compliance
1. Row-Level Security (Multi-Tenancy)
-- Enable RLS ALTER TABLE transactions ENABLE ROW LEVEL SECURITY; -- Create policy CREATE POLICY tenant_isolation ON transactions USING (tenant_id = current_setting('app.current_tenant')::uuid); -- Force RLS for all roles ALTER TABLE transactions FORCE ROW LEVEL SECURITY; -- Set tenant context in application SET app.current_tenant = 'tenant-uuid-123';
2. PII Encryption (pgcrypto)
CREATE EXTENSION IF NOT EXISTS pgcrypto; -- Encrypt on insert INSERT INTO users (id, name, encrypted_nik) VALUES ( gen_random_uuid(), 'John Doe', pgp_sym_encrypt('1234567890123456', current_setting('app.encryption_key')) ); -- Decrypt on read (application layer preferred) SELECT id, name, pgp_sym_decrypt(encrypted_nik::bytea, current_setting('app.encryption_key')) as nik FROM users WHERE id = 'user-123';
🔁 High Availability & Replication
1. Streaming Replication
-- Primary configuration ALTER SYSTEM SET wal_level = 'replica'; ALTER SYSTEM SET max_wal_senders = 10; ALTER SYSTEM SET synchronous_commit = 'remote_apply'; ALTER SYSTEM SET synchronous_standby_names = 'payu_replica_1'; -- Create replication slot SELECT pg_create_physical_replication_slot('payu_replica_1_slot');
2. Read Replica Routing (Spring Boot)
@Transactional(readOnly = true) // Routes to replica public List<Transaction> getHistory(String walletId) { return transactionRepository.findByWalletId(walletId); } @Transactional // Routes to primary public Transaction create(TransactionRequest request) { return transactionRepository.save(new Transaction(request)); }
🕐 Time-Series Data (TimescaleDB)
-- Enable TimescaleDB CREATE EXTENSION IF NOT EXISTS timescaledb; -- Convert to hypertable SELECT create_hypertable('transaction_events', 'event_time', chunk_time_interval => INTERVAL '1 day'); -- Continuous aggregates for dashboards CREATE MATERIALIZED VIEW daily_stats WITH (timescaledb.continuous) AS SELECT time_bucket('1 day', event_time) AS bucket, COUNT(*) as txn_count, SUM(amount) as total_amount FROM transaction_events GROUP BY bucket; -- Retention policy SELECT add_retention_policy('transaction_events', INTERVAL '2 years');
🔍 Data Architecture Checklist
Schema Design
- All financial tables are append-only (no UPDATE/DELETE)
- Money fields use
DECIMAL(19,4) - UUID primary keys throughout
- Timestamps use
TIMESTAMPTZ
Performance
- Tables >10M rows are partitioned
- Covering indexes for common queries
-
verified for critical pathsEXPLAIN ANALYZE - Connection pooling configured
Security
- PII encrypted with pgcrypto
- Row-Level Security for multi-tenancy
- Replication configured for HA
Migrations
- All migrations idempotent
-
for index creationCONCURRENTLY - Zero-downtime patterns used
📚 References
- PostgreSQL Documentation
- Flyway Documentation
- TimescaleDB Documentation
- PgBouncer
- pg_partman
- pgcrypto
- PostgreSQL JSONB
- CQRS Pattern
- Event Sourcing
- Debezium CDC
Last Updated: January 2026