Babysitter distributed-caching
Expert skill for distributed cache design, implementation, and optimization using Redis and Memcached. Design cache architectures, configure eviction policies, implement caching patterns (cache-aside, write-through, write-behind), monitor cache performance, and optimize memory usage.
git clone https://github.com/a5c-ai/babysitter
T=$(mktemp -d) && git clone --depth=1 https://github.com/a5c-ai/babysitter "$T" && mkdir -p ~/.claude/skills && cp -r "$T/library/specializations/performance-optimization/skills/distributed-caching" ~/.claude/skills/a5c-ai-babysitter-distributed-caching && rm -rf "$T"
library/specializations/performance-optimization/skills/distributed-caching/SKILL.md
distributed-caching
You are distributed-caching - a specialized skill for distributed cache architecture and optimization. This skill provides expert capabilities for designing, implementing, and maintaining high-performance caching layers using Redis, Memcached, and related technologies.
Overview
This skill enables AI-powered caching operations including:
- Designing Redis data structures and access patterns
- Configuring Redis Cluster and Sentinel for high availability
- Implementing caching patterns (cache-aside, write-through, write-behind)
- Configuring eviction policies (LRU, LFU, TTL-based)
- Monitoring cache hit rates and memory usage
- Debugging cache invalidation issues
- Optimizing memory efficiency
Prerequisites
- Redis 6.0+ (7.0+ recommended for advanced features)
- Or Memcached 1.6+
- redis-cli and memcached utilities
- Optional: Redis Stack for JSON, Search, and Time Series
- Optional: Redis Enterprise for production deployments
Capabilities
1. Redis Data Structure Design
Design optimal data structures for use cases:
    # String - Simple key-value caching
    SET user:1001:profile '{"name":"John","email":"john@example.com"}' EX 3600
    GET user:1001:profile

    # Hash - Structured data with partial updates
    HSET product:5001 name "Widget" price 29.99 stock 150
    HGET product:5001 price
    HINCRBY product:5001 stock -1

    # Sorted Set - Leaderboards and ranking
    ZADD leaderboard 1500 "player:1" 2200 "player:2" 1800 "player:3"
    # Top 10
    ZREVRANGE leaderboard 0 9 WITHSCORES
    ZRANK leaderboard "player:1"

    # List - Message queues and activity feeds
    LPUSH notifications:user:1001 '{"type":"order","id":"ord-123"}'
    # Latest 20
    LRANGE notifications:user:1001 0 19
    # Keep only 100
    LTRIM notifications:user:1001 0 99

    # Set - Tags, unique visitors, relationships
    SADD product:5001:tags "electronics" "sale" "featured"
    # Common interests
    SINTER user:1001:interests product:5001:tags

    # HyperLogLog - Cardinality estimation
    PFADD daily:visitors:20260124 "user:1001" "user:1002" "guest:abc"
    PFCOUNT daily:visitors:20260124

    # Stream - Event sourcing and message streaming
    XADD orders * action "created" order_id "ord-123" total "99.99"
    XREAD COUNT 10 STREAMS orders 0
    XGROUP CREATE orders order-processors $ MKSTREAM
    XREADGROUP GROUP order-processors worker-1 COUNT 10 STREAMS orders >
2. Caching Patterns Implementation
Implement common caching patterns:
    import hashlib
    import json
    import time
    from functools import wraps

    import redis

    r = redis.Redis(host='localhost', port=6379, decode_responses=True)

    # NOTE: `database` and `recommendation_service` are placeholders for your
    # own data-access and service layers; they are not defined here.

    # Cache-Aside Pattern (Lazy Loading)
    def get_user(user_id):
        cache_key = f"user:{user_id}"
        # Try cache first
        cached = r.get(cache_key)
        if cached is not None:
            return json.loads(cached)
        # Cache miss - fetch from database
        user = database.get_user(user_id)
        # Populate cache with TTL
        r.setex(cache_key, 3600, json.dumps(user))
        return user

    # Write-Through Pattern
    def update_user(user_id, data):
        cache_key = f"user:{user_id}"
        # Update database first
        database.update_user(user_id, data)
        # Update cache immediately
        r.setex(cache_key, 3600, json.dumps(data))
        return data

    # Write-Behind (Write-Back) Pattern
    def update_user_async(user_id, data):
        cache_key = f"user:{user_id}"
        # Update cache immediately
        r.setex(cache_key, 3600, json.dumps(data))
        # Queue database write
        r.lpush("write_queue", json.dumps({
            "operation": "update_user",
            "user_id": user_id,
            "data": data,
            "timestamp": time.time()
        }))

    # Read-Through with Cache-Aside decorator
    def cached(ttl=3600, prefix="cache"):
        def decorator(func):
            @wraps(func)
            def wrapper(*args, **kwargs):
                # Generate cache key from function and arguments.
                # Built-in hash() of strings is randomized per process, so a
                # stable digest is used to keep keys identical across instances.
                raw = json.dumps([args, kwargs], sort_keys=True, default=str)
                digest = hashlib.sha256(raw.encode("utf-8")).hexdigest()
                key = f"{prefix}:{func.__name__}:{digest}"
                cached_value = r.get(key)
                if cached_value is not None:
                    return json.loads(cached_value)
                result = func(*args, **kwargs)
                r.setex(key, ttl, json.dumps(result))
                return result
            return wrapper
        return decorator

    @cached(ttl=300, prefix="products")
    def get_product_recommendations(user_id, category):
        return recommendation_service.get_recommendations(user_id, category)
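The write-behind example queues writes on the `write_queue` list but does not show the consumer side. A minimal sketch of a worker loop follows, assuming a client object that exposes `rpop` (as redis-py does) and a hypothetical `apply_write` callback that persists one queued item to the database. The function name `drain_write_queue` and its parameters are illustrative, not part of the skill.

```python
import json
from typing import Any, Callable, Dict


def drain_write_queue(
    client: Any,
    apply_write: Callable[[Dict[str, Any]], None],
    queue: str = "write_queue",
    max_items: int = 100,
) -> int:
    """Pop up to max_items queued writes, oldest first, and apply each one.

    `client` is assumed to behave like redis.Redis (only rpop is used).
    `apply_write` is a hypothetical callback that writes to the database.
    Returns the number of items processed.
    """
    processed = 0
    while processed < max_items:
        # Producers LPUSH to the head, so RPOP from the tail gives FIFO order.
        raw = client.rpop(queue)
        if raw is None:
            break  # queue is empty
        apply_write(json.loads(raw))
        processed += 1
    return processed
```

Note that an item popped here is lost if the process crashes before `apply_write` completes, so a production worker would likely need a more durable hand-off (for example a processing list or a stream with acknowledgements).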
3. Cache Invalidation Strategies
Implement robust cache invalidation:
    # Time-based invalidation (TTL); `session_data` is a placeholder value
    r.setex("session:abc123", 1800, session_data)  # 30 minutes

    # Event-driven invalidation
    def on_user_updated(user_id):
        # Delete specific cache entries
        r.delete(f"user:{user_id}")
        r.delete(f"user:{user_id}:profile")
        # Delete pattern-matched keys (use with caution). SCAN iterates
        # incrementally, whereas KEYS blocks the server on large keyspaces.
        keys = list(r.scan_iter(match=f"user:{user_id}:*", count=500))
        if keys:
            r.delete(*keys)

    # Tag-based invalidation
    def set_with_tags(key, value, ttl, tags):
        pipe = r.pipeline()
        pipe.setex(key, ttl, value)
        for tag in tags:
            pipe.sadd(f"tag:{tag}", key)
        pipe.execute()

    def invalidate_by_tag(tag):
        keys = r.smembers(f"tag:{tag}")
        if keys:
            pipe = r.pipeline()
            pipe.delete(*keys)
            pipe.delete(f"tag:{tag}")
            pipe.execute()

    # Version-based invalidation
    def get_with_version(key, version_key):
        version = r.get(version_key) or "1"
        versioned_key = f"{key}:v{version}"
        return r.get(versioned_key)

    def invalidate_version(version_key):
        # Increment version; old keys expire naturally only if they were
        # written with a TTL
        r.incr(version_key)
4. Redis Cluster Configuration
Configure Redis Cluster for scalability:
    # redis-cluster.conf
    port 7000
    cluster-enabled yes
    cluster-config-file nodes-7000.conf
    cluster-node-timeout 5000
    appendonly yes
    appendfsync everysec

    # Memory management
    maxmemory 4gb
    maxmemory-policy allkeys-lru

    # Persistence
    save 900 1
    save 300 10
    save 60 10000

    # Replication
    replica-read-only yes
    min-replicas-to-write 1
    min-replicas-max-lag 10

Create and inspect the cluster with redis-cli:

    # Create cluster
    redis-cli --cluster create \
      127.0.0.1:7000 127.0.0.1:7001 127.0.0.1:7002 \
      127.0.0.1:7003 127.0.0.1:7004 127.0.0.1:7005 \
      --cluster-replicas 1

    # Check cluster status
    redis-cli -c -p 7000 cluster info
    redis-cli -c -p 7000 cluster nodes

    # Rebalance slots
    redis-cli --cluster rebalance 127.0.0.1:7000
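Redis Cluster maps every key to one of 16384 hash slots, which is why multi-key commands only work when all keys live in the same slot. The sketch below reproduces the slot computation as described in the Redis Cluster specification (CRC16/XMODEM of the key modulo 16384, honoring `{hash tags}`). The function name `key_slot` is illustrative; cluster-aware clients already do this internally, so this is for reasoning about key placement rather than for production routing.

```python
import binascii

CLUSTER_SLOTS = 16384  # fixed number of hash slots in Redis Cluster


def key_slot(key: str) -> int:
    """Return the cluster hash slot for a key, honoring {hash tags}."""
    data = key.encode("utf-8")
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        # Only a non-empty tag between the first '{' and the next '}' counts.
        if end != -1 and end > start + 1:
            data = data[start + 1:end]
    # binascii.crc_hqx with an initial value of 0 is the CRC16/XMODEM variant.
    return binascii.crc_hqx(data, 0) % CLUSTER_SLOTS
```

Keys that share a hash tag, such as `{user:1001}:profile` and `{user:1001}:orders`, land in the same slot and can therefore be used together in pipelines, transactions, and multi-key commands.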
5. Redis Sentinel for High Availability
Configure Sentinel for automatic failover:
    # sentinel.conf
    sentinel monitor mymaster 127.0.0.1 6379 2
    sentinel auth-pass mymaster <password>
    sentinel down-after-milliseconds mymaster 5000
    sentinel failover-timeout mymaster 60000
    sentinel parallel-syncs mymaster 1

    # Notification scripts
    sentinel notification-script mymaster /opt/redis/notify.sh
    sentinel client-reconfig-script mymaster /opt/redis/reconfig.sh

Connect through Sentinel from a Python client:

    # Python client with Sentinel
    from redis.sentinel import Sentinel

    sentinel = Sentinel([
        ('sentinel1.example.com', 26379),
        ('sentinel2.example.com', 26379),
        ('sentinel3.example.com', 26379)
    ], socket_timeout=0.1)

    # Get master
    master = sentinel.master_for('mymaster', socket_timeout=0.1)
    master.set('key', 'value')

    # Get replica for reads
    replica = sentinel.slave_for('mymaster', socket_timeout=0.1)
    value = replica.get('key')
6. Eviction Policy Configuration
Configure optimal eviction policies:
    # Set exactly one maxmemory-policy; the entries below are alternatives.

    # LRU - Least Recently Used (general purpose)
    maxmemory-policy allkeys-lru

    # LFU - Least Frequently Used (hot data scenarios)
    maxmemory-policy allkeys-lfu
    lfu-log-factor 10
    lfu-decay-time 1

    # Volatile - Only evict keys with TTL
    maxmemory-policy volatile-lru
    maxmemory-policy volatile-lfu
    maxmemory-policy volatile-ttl

    # No eviction - Return errors when full
    maxmemory-policy noeviction
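To make the LRU behavior concrete, here is a small in-process sketch of exact LRU eviction. It is an illustration only: Redis approximates LRU by sampling keys and bounds memory in bytes rather than entry count, and the class name `LRUCache` and its `capacity` parameter are assumptions made for this example.

```python
from collections import OrderedDict
from typing import Any, Hashable, Optional


class LRUCache:
    """Exact LRU over a fixed number of entries (illustrative, not Redis)."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.evicted = 0  # comparable in spirit to Redis' evicted_keys counter
        self._data: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def set(self, key: Hashable, value: Any) -> None:
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # drop the least recently used entry
            self.evicted += 1
```

With a capacity of 2, writing `a` and `b`, reading `a`, and then writing `c` evicts `b`, because `a` was touched more recently.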
7. Cache Performance Monitoring
Monitor cache health and performance:
    # Redis INFO command
    redis-cli INFO stats
    redis-cli INFO memory
    redis-cli INFO replication
    redis-cli INFO clients

    # Key metrics to monitor
    # - hit_rate: keyspace_hits / (keyspace_hits + keyspace_misses)
    # - memory_usage: used_memory / maxmemory
    # - evicted_keys: Number of keys evicted
    # - connected_clients: Current client connections
    # - blocked_clients: Clients waiting on blocking operations

The same metrics can be read from Python:

    # Calculate cache hit rate
    info = r.info('stats')
    hits = info['keyspace_hits']
    misses = info['keyspace_misses']
    hit_rate = hits / (hits + misses) * 100 if (hits + misses) > 0 else 0
    print(f"Cache hit rate: {hit_rate:.2f}%")

    # Memory analysis
    memory_info = r.info('memory')
    print(f"Used memory: {memory_info['used_memory_human']}")
    print(f"Peak memory: {memory_info['used_memory_peak_human']}")
    print(f"Fragmentation ratio: {memory_info['mem_fragmentation_ratio']}")
MCP Server Integration
This skill can leverage the following MCP servers:
| Server | Description | Installation |
|---|---|---|
| mcp-redis (Official) | Redis data management | GitHub |
| Redis Cloud Admin API | Cloud Redis management | See Redis documentation |
Best Practices
Cache Design
- Key naming conventions - Use consistent, hierarchical naming (e.g., entity:id:attribute)
- TTL strategy - Always set TTLs to prevent unbounded growth
- Serialization - Use efficient formats (MessagePack, Protocol Buffers)
- Hot key handling - Shard hot keys or use local caching
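The naming and hot-key points above can be sketched with a couple of small helpers. This is one possible approach under stated assumptions: the function names (`make_key`, `shard_keys`, `pick_shard`) and the `:shard:N` suffix are hypothetical conventions, and the sharding shown is the replicate-and-read-randomly variant, where a writer stores the same value under every shard key and readers pick one at random to spread load.

```python
import random
from typing import List, Optional


def make_key(entity: str, entity_id: object, attribute: Optional[str] = None) -> str:
    """Build a hierarchical key such as 'user:1001:profile'."""
    parts = [entity, str(entity_id)]
    if attribute:
        parts.append(attribute)
    return ":".join(parts)


def shard_keys(base_key: str, shards: int) -> List[str]:
    """All replica keys for a hot key; a writer would update every one."""
    if shards <= 0:
        raise ValueError("shards must be positive")
    return [f"{base_key}:shard:{i}" for i in range(shards)]


def pick_shard(base_key: str, shards: int) -> str:
    """Pick one replica key at random so reads spread across shards."""
    if shards <= 0:
        raise ValueError("shards must be positive")
    return f"{base_key}:shard:{random.randrange(shards)}"
```

In a Redis Cluster deployment the shard suffix changes the hash slot, which is what lets the replicas land on different nodes.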
Data Consistency
- Cache-aside for reads - Safest pattern for most use cases
- Write-through for consistency - When consistency is critical
- Eventual consistency - Accept staleness for performance
- Version tagging - Track data versions for invalidation
Performance
- Pipeline commands - Batch multiple operations
- Connection pooling - Reuse connections
- Avoid large keys - Keep values under 100KB
- Use appropriate data structures - Hashes over JSON strings for partial updates
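As an illustration of the pipelining advice, the sketch below writes many keys in fixed-size batches so each batch costs one network round trip. It assumes a client with redis-py's `pipeline(transaction=False)`, `setex`, and `execute` methods; the helper names `chunked` and `set_many` and the default batch size are assumptions for this example.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List


def chunked(items: Iterable[Any], size: int) -> Iterator[List[Any]]:
    """Yield successive lists of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def set_many(client: Any, items: Dict[str, str], ttl: int, batch_size: int = 500) -> int:
    """Write all items with a TTL using one pipeline per batch.

    `client` is assumed to behave like redis.Redis.
    Returns the number of round trips (pipeline executions) performed.
    """
    round_trips = 0
    for batch in chunked(items.items(), batch_size):
        # transaction=False sends the batch without wrapping it in MULTI/EXEC.
        pipe = client.pipeline(transaction=False)
        for key, value in batch:
            pipe.setex(key, ttl, value)
        pipe.execute()
        round_trips += 1
    return round_trips
```

Bounding the batch size keeps any single pipeline from buffering an unbounded amount of data on either the client or the server.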
Process Integration
This skill integrates with the following processes:
- caching-strategy-design.js - Cache architecture planning
- Application-level cache optimization workflows
- Performance tuning recommendations
Output Format
When executing operations, provide structured output:
    {
      "operation": "analyze-cache",
      "status": "success",
      "metrics": {
        "hitRate": 94.5,
        "missRate": 5.5,
        "evictionRate": 0.02,
        "memoryUsage": {
          "used": "3.2GB",
          "peak": "3.8GB",
          "maxmemory": "4GB",
          "utilizationPercent": 80
        },
        "connections": {
          "current": 45,
          "blocked": 0,
          "maxClients": 10000
        }
      },
      "recommendations": [
        {
          "category": "memory",
          "issue": "High memory utilization",
          "action": "Consider increasing maxmemory or enabling LFU eviction",
          "priority": "medium"
        }
      ]
    }
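One way to produce a report in roughly this shape is to derive it from the dictionaries returned by `INFO stats`, `INFO memory`, and `INFO clients`. The sketch below is an assumption-laden illustration: `build_report` is a hypothetical helper, the 80% alert threshold is chosen for the example rather than taken from Redis, and `evictionRate` and `maxClients` are omitted because they need data beyond these three sections.

```python
from typing import Any, Dict, List

HIGH_MEMORY_PERCENT = 80  # assumed alert threshold, not a Redis default


def build_report(stats: Dict[str, Any], memory: Dict[str, Any],
                 clients: Dict[str, Any]) -> Dict[str, Any]:
    """Turn INFO-style dictionaries into the structured output format."""
    hits = int(stats.get("keyspace_hits", 0))
    misses = int(stats.get("keyspace_misses", 0))
    total = hits + misses
    hit_rate = round(hits / total * 100, 2) if total else 0.0
    miss_rate = round(100 - hit_rate, 2) if total else 0.0

    used = int(memory.get("used_memory", 0))
    maxmemory = int(memory.get("maxmemory", 0))  # 0 means no limit configured
    utilization = round(used / maxmemory * 100, 1) if maxmemory else None

    recommendations: List[Dict[str, str]] = []
    if utilization is not None and utilization >= HIGH_MEMORY_PERCENT:
        recommendations.append({
            "category": "memory",
            "issue": "High memory utilization",
            "action": "Consider increasing maxmemory or enabling LFU eviction",
            "priority": "medium",
        })

    return {
        "operation": "analyze-cache",
        "status": "success",
        "metrics": {
            "hitRate": hit_rate,
            "missRate": miss_rate,
            "memoryUsage": {
                "used": memory.get("used_memory_human"),
                "peak": memory.get("used_memory_peak_human"),
                "maxmemory": memory.get("maxmemory_human"),
                "utilizationPercent": utilization,
            },
            "connections": {
                "current": clients.get("connected_clients"),
                "blocked": clients.get("blocked_clients"),
            },
        },
        "recommendations": recommendations,
    }
```

With a live connection this could be called as `build_report(r.info('stats'), r.info('memory'), r.info('clients'))`, where `r` is a redis-py client.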
Error Handling
Common Issues
| Error | Cause | Resolution |
|---|---|---|
| OOM | Memory limit reached | Increase maxmemory or enable eviction |
| CLUSTERDOWN | Cluster not available | Check cluster health, majority nodes |
| MOVED | Key on different node | Use cluster-aware client |
| BUSY | Lua script running | Wait or kill script with SCRIPT KILL |
| LOADING | Redis loading from disk | Wait for load to complete |
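The resolutions in this table can be looked up programmatically from a server error message. The sketch below assumes the standard Redis error prefixes (`OOM`, `CLUSTERDOWN`, `MOVED`, `BUSY`, `LOADING`) as the first word of the message; the helper name `suggest_resolution` and the mapping constant are hypothetical.

```python
from typing import Dict, Optional

# Resolutions copied from the table; keys are assumed Redis error prefixes.
RESOLUTIONS: Dict[str, str] = {
    "OOM": "Increase maxmemory or enable eviction",
    "CLUSTERDOWN": "Check cluster health, majority nodes",
    "MOVED": "Use cluster-aware client",
    "BUSY": "Wait or kill script with SCRIPT KILL",
    "LOADING": "Wait for load to complete",
}


def suggest_resolution(message: str) -> Optional[str]:
    """Return the suggested resolution for a Redis error message, if known."""
    if not message:
        return None
    prefix = message.split(" ", 1)[0].upper()
    return RESOLUTIONS.get(prefix)
```

For example, passing the text of a caught `redis.exceptions.ResponseError` (via `str(exc)`) yields the matching advice, or `None` for errors outside this table.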
Constraints
- Monitor memory usage to prevent OOM conditions
- Use connection pooling in applications
- Implement circuit breakers for cache unavailability
- Test cache invalidation thoroughly
- Consider cache stampede prevention
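For the stampede point, one common mitigation is "single flight": when many callers miss on the same key at once, only one of them runs the expensive loader while the rest wait for its result. The sketch below shows the idea inside a single process using per-key locks. It is an assumption-based illustration (the class name `SingleFlightCache` is hypothetical, there is no TTL handling, and coordinating across multiple application instances would need a distributed lock instead).

```python
import threading
from typing import Any, Callable, Dict, Hashable


class SingleFlightCache:
    """In-process cache where each missing key is loaded at most once."""

    def __init__(self) -> None:
        self._values: Dict[Hashable, Any] = {}
        self._locks: Dict[Hashable, threading.Lock] = {}
        self._guard = threading.Lock()  # protects the lock table itself

    def get(self, key: Hashable, loader: Callable[[], Any]) -> Any:
        # Fast path: value already cached.
        if key in self._values:
            return self._values[key]
        # Obtain (or create) the lock dedicated to this key.
        with self._guard:
            lock = self._locks.setdefault(key, threading.Lock())
        with lock:
            # Re-check: another caller may have loaded it while we waited.
            if key in self._values:
                return self._values[key]
            value = loader()
            self._values[key] = value
            return value
```

If eight threads request the same missing key simultaneously, the loader runs once and all eight receive the same value.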