Software_development_department design-system

Decomposes a product concept into architectural components, domain systems, data models, and integration boundaries. Use when starting system architecture or when the user mentions system design or component breakdown.

install
source · Clone the upstream repo
git clone https://github.com/tranhieutt/software_development_department
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/tranhieutt/software_development_department "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.claude/skills/design-system" ~/.claude/skills/tranhieutt-software-development-department-design-system && rm -rf "$T"
manifest: .claude/skills/design-system/SKILL.md

System Design

Phase 1: Clarify requirements (always do this first)

Ask before designing:

  1. Scale: How many users/requests/day? Read-heavy or write-heavy?
  2. Consistency: Strong (banking) or eventual (social feed)?
  3. Availability target: 99.9% (~8.8h/yr downtime) or 99.99% (~53min/yr)?
  4. Latency budget: p99 < 100ms? < 1s?
  5. Geography: Single region or multi-region?
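
The downtime figures attached to an availability target follow from simple arithmetic. A minimal sketch, assuming a 365-day year; the function name `allowed_downtime_hours` is illustrative and not part of the skill:

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def allowed_downtime_hours(availability_pct: float) -> float:
    """Return the yearly downtime budget, in hours, for an availability target."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


if __name__ == "__main__":
    for target in (99.9, 99.99):
        hours = allowed_downtime_hours(target)
        print(f"{target}% -> {hours:.2f} h/yr ({hours * 60:.1f} min/yr)")
```

Each extra "nine" cuts the budget by a factor of ten, which is why the target is worth agreeing on before choosing a redundancy strategy.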

Capacity estimation shortcuts

1M requests/day (e.g., 1M daily active users at ~1 request each) → ~12 req/s avg, ~120 req/s peak (10x)
1KB per request → 1M req/day = ~1GB/day = ~365GB/year
Read:write ratio 10:1 (typical social) → optimize read path first
1 server handles ~1000 req/s (rule of thumb for I/O-bound services)
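
The shortcuts above can be folded into one back-of-the-envelope helper. This is a rough sketch under the same assumptions as the list (10x peak factor, 1KB per request, ~1000 req/s per server); `estimate_capacity` and its parameter names are hypothetical:

```python
import math

SECONDS_PER_DAY = 86_400


def estimate_capacity(requests_per_day, bytes_per_request=1_000,
                      peak_factor=10, server_rps=1_000):
    """Back-of-the-envelope traffic, storage, and server-count estimate."""
    avg_rps = requests_per_day / SECONDS_PER_DAY
    peak_rps = avg_rps * peak_factor
    gb_per_day = requests_per_day * bytes_per_request / 1e9
    return {
        "avg_rps": avg_rps,
        "peak_rps": peak_rps,
        "gb_per_day": gb_per_day,
        "gb_per_year": gb_per_day * 365,
        # Round up: a fraction of a server still means one more instance.
        "servers_at_peak": math.ceil(peak_rps / server_rps),
    }


if __name__ == "__main__":
    for name, value in estimate_capacity(1_000_000).items():
        print(f"{name}: {value:.2f}")
```

The result is a sizing floor, not a plan: redundancy and headroom usually push the real server count above `servers_at_peak`.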

Component breakdown template

Client layer  → Web / Mobile / API consumers
CDN           → Static assets, edge caching
API Gateway   → Rate limiting, auth, routing, SSL termination
Services      → Domain-specific services (User, Order, Payment, Notification)
Cache         → Redis for hot data (sessions, rate limits, computed results)
Database      → Primary DB + Read replicas
Message queue → Async operations, event-driven decoupling
Storage       → Object storage for files (S3/GCS)
Monitoring    → Metrics, logs, traces, alerts
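
As one illustration of the rate limiting the API Gateway row refers to, here is a minimal in-process token bucket. It is a sketch only: a real gateway would typically keep the counters in a shared store such as the Redis cache listed above, and the class name `TokenBucket` and its parameters are assumptions:

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self, cost=1.0):
        """Return True and consume tokens if the request fits in the budget."""
        now = self.clock()
        # Refill in proportion to the elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A caller would keep one bucket per client key and reject a request (for example with HTTP 429) whenever `allow()` returns False.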

Database selection guide

Need                           Choose
ACID transactions, relations   PostgreSQL
High-scale document store      MongoDB
Key-value, cache, pub/sub      Redis
Time-series data               TimescaleDB / InfluxDB
Graph relationships            Neo4j
Full-text search               Elasticsearch
Analytical/OLAP                ClickHouse / BigQuery

Caching strategies

Cache-aside (read):  App checks cache → miss → DB → write to cache
Write-through:        Write to cache AND DB simultaneously (consistent, slower writes)
Write-behind:         Write to cache → async flush to DB (fast writes, risk of loss)
Read-through:         Cache handles DB reads automatically

TTL guidelines:
- Sessions: 15-30 min
- User profile: 5 min
- Product catalog: 1 hour
- Config/settings: 24 hours
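
The cache-aside read path with a TTL can be sketched as follows. An in-memory dictionary stands in for Redis so the example is self-contained; `TTLCache`, `get_user_profile`, and `load_from_db` are hypothetical names, and the 300-second default mirrors the 5-minute user profile guideline:

```python
import time


class TTLCache:
    """Tiny in-memory stand-in for a cache such as Redis, with per-key expiry."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._store = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._store[key] = (self._clock() + ttl_seconds, value)


def get_user_profile(user_id, cache, load_from_db, ttl_seconds=300):
    """Cache-aside read: check the cache, fall back to the DB, then populate."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    profile = load_from_db(user_id)
    cache.set(key, profile, ttl_seconds)
    return profile
```

Note that the application, not the cache, owns the fallback to the database here; that is the distinction between cache-aside and read-through.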

Message queue patterns

When to use queues:
✓ Async processing (email, PDF generation, notifications)
✓ Rate-limiting downstream services
✓ Decoupling services (order → payment → shipping)
✓ Fan-out (1 event → multiple consumers)

Queue selection:
- RabbitMQ: complex routing, request-reply, low latency
- Kafka: high throughput, event log/replay, stream processing
- SQS: managed, simple, AWS-native, at-least-once delivery (standard queues)
- Redis Streams: lightweight, same infra as cache
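
The fan-out pattern (1 event → multiple consumers) can be illustrated in-process with the standard library. This is only a sketch of the shape of the pattern, not a substitute for any of the brokers above; `FanOutBroker` and the subscriber names are made up for the example:

```python
import queue


class FanOutBroker:
    """Deliver every published event to each subscriber's own queue."""

    def __init__(self):
        self._subscribers = {}  # subscriber name -> queue.Queue

    def subscribe(self, name):
        q = queue.Queue()
        self._subscribers[name] = q
        return q

    def publish(self, event):
        # Each consumer gets its own copy of the reference, so a slow
        # consumer does not block or steal work from the others.
        for q in self._subscribers.values():
            q.put(event)


if __name__ == "__main__":
    broker = FanOutBroker()
    email_q = broker.subscribe("email")
    analytics_q = broker.subscribe("analytics")
    broker.publish({"type": "order_placed", "order_id": 42})
    print(email_q.get_nowait(), analytics_q.get_nowait())
```

With a real broker the same idea maps to one topic or exchange with a separate queue or consumer group per downstream service.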

API design decisions

REST:    Standard CRUD, simple clients, team familiarity (default choice)
GraphQL: Multiple clients with different data needs, reduce over-fetching
gRPC:    Internal service-to-service, binary protocol, streaming needed
WebSocket: Real-time bidirectional (chat, live updates, collaborative tools)

Scaling patterns

Vertical (scale up):   More CPU/RAM — quick, limited ceiling
Horizontal (scale out): More instances — requires stateless services
Database read replicas: Offload read traffic (good for 80%+ read workloads)
Database sharding:      Shard by user_id, geography — last resort, complex
CQRS:                   Separate read/write models — when read/write patterns diverge heavily
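
Sharding by user_id needs a routing function that is stable across processes. A minimal sketch, assuming string user IDs and a fixed shard count; `shard_for` is a hypothetical helper:

```python
import hashlib


def shard_for(user_id: str, num_shards: int) -> int:
    """Map a user ID to a shard index in [0, num_shards)."""
    # Use a cryptographic digest rather than built-in hash(), which is
    # randomized per process for strings and so is not stable across restarts.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


if __name__ == "__main__":
    for uid in ("alice", "bob", "carol"):
        print(uid, "-> shard", shard_for(uid, 4))
```

Plain modulo routing remaps most keys when `num_shards` changes, which is part of why the list above treats sharding as a last resort; consistent hashing or a lookup table are common ways to reduce that churn.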

Common design mistakes

Mistake                                     Better approach
Over-engineering for scale you don't have   Start monolith, extract services at clear pain points
Synchronous calls to all dependencies       Use async queues for non-critical paths
No caching strategy                         Cache at API layer + DB query results
Storing sessions in DB                      Use Redis; DB sessions don't scale horizontally
Single point of failure                     Redundancy at every critical layer