Claude-skill-registry data-systems-architecture
Use when designing databases for data-heavy applications, making schema decisions for performance, choosing between normalization and denormalization, selecting storage/indexing strategies, planning for scale, or evaluating OLTP vs OLAP trade-offs. Also use when encountering N+1 queries, ORM issues, or concurrency problems.
install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/data-systems-architecture" ~/.claude/skills/majiayu000-claude-skill-registry-data-systems-architecture && rm -rf "$T"
manifest:
skills/data/data-systems-architecture/SKILL.mdsource content
Data Systems Architecture
Overview
Core principle: Good data system architecture balances reliability (correct operation under faults), scalability (handling growth gracefully), and maintainability (enabling productive change over time). Every architectural decision involves trade-offs between these concerns.
This skill synthesizes knowledge from three foundational texts:
- Designing Data-Intensive Applications (Kleppmann) - distributed systems, storage engines, scaling
- The Art of PostgreSQL (Fontaine) - PostgreSQL-specific patterns, SQL as programming
- PostgreSQL Query Optimization (Dombrovskaya et al.) - execution plans, performance tuning
When to Use
| Symptom | Start With |
|---|---|
| Designing a new database/schema | |
| Normalization vs denormalization decisions | |
| Need to understand OLTP vs OLAP | |
| Slow queries, index selection | |
| Planning for growth, read replicas | |
| Race conditions, deadlocks, isolation issues | |
| N+1 queries, ORM problems, application integration | |
Navigation
Reference Files (Load as needed)
01-foundational-principles.md - Reliability/Scalability/Maintainability, load parameters 02-data-modeling.md - Normalization, denormalization, schema design patterns 03-storage-engines.md - B-trees, LSM-trees, OLTP vs OLAP, PostgreSQL internals 04-indexing.md - Index types, compound indexes, covering indexes, maintenance 05-scaling-patterns.md - Replication, partitioning, sharding strategies 06-transactions-concurrency.md - ACID, isolation levels, MVCC, locking patterns 07-application-integration.md - ORM pitfalls, N+1, business logic placement, batch processing
Quick Decision Framework
New system design? ├─ Yes → Read 01, then 02 for data model └─ No → What's the problem? ├─ "Queries are slow" → Read 04 (indexing) + 03 (storage patterns) ├─ "Data is inconsistent" → Read 02 (modeling) + 06 (transactions) ├─ "Can't handle the load" → Read 05 (scaling) + 03 (OLTP vs OLAP) ├─ "App makes too many queries" → Read 07 (N+1, ORM patterns) └─ "Race conditions/deadlocks" → Read 06 (concurrency)
Core Concepts (Quick Reference)
The Three Pillars
| Concern | Definition | Key Question |
|---|---|---|
| Reliability | System works correctly under faults | What happens when things fail? |
| Scalability | Handles growth gracefully | What's 10x load look like? |
| Maintainability | Easy to operate and evolve | Can new engineers understand this? |
Data Model Selection
| Model | Best For | Avoid When |
|---|---|---|
| Relational | Many-to-many relationships, joins, consistency | Highly hierarchical data, constant schema changes |
| Document | Self-contained docs, tree structures | Need for joins, many-to-many |
| Graph | Highly connected data, recursive queries | Simple CRUD, no relationship traversal |
OLTP vs OLAP
| Aspect | OLTP | OLAP |
|---|---|---|
| Query pattern | Point lookups, few rows | Aggregates, many rows |
| Optimization | Index everything used in WHERE | Fewer indexes, full scans OK |
| Storage | Row-oriented | Consider column-oriented |
Index Type Quick Reference
| Type | Use Case | PostgreSQL |
|---|---|---|
| B-tree | Equality, range, sorting | Default, most queries |
| Hash | Equality only | Faster for exact match |
| GIN | Arrays, JSONB, full-text | , operators |
| GiST | Geometric, range types | PostGIS, nearest-neighbor |
| BRIN | Large, naturally ordered tables | Time-series data |
Isolation Levels
| Level | Prevents | PostgreSQL Default? |
|---|---|---|
| Read Committed | Dirty reads | Yes |
| Repeatable Read | + Non-repeatable reads | No |
| Serializable | All anomalies | No (uses SSI) |
Design Checklist
Before finalizing a data architecture:
- Identified load parameters (read/write ratio, data volume, latency requirements)
- Chose appropriate data model (relational/document/graph hybrid?)
- Normalized to 3NF first, denormalized only with measured justification
- Designed indexes for actual query patterns (not hypothetical)
- Considered 10x growth scenario
- Established isolation level requirements
- Defined where business logic lives (app vs DB vs both)
- Planned for operations (backups, monitoring, migrations)
References
- Kleppmann, M. Designing Data-Intensive Applications (O'Reilly, 2017)
- Fontaine, D. The Art of PostgreSQL (2nd ed., 2020)
- Dombrovskaya, H., Novikov, B., Bailliekova, A. PostgreSQL Query Optimization (Apress, 2021)