System Design Cheat Sheet
This guide is designed to help you master the fundamental concepts of distributed systems architecture, whether you're preparing for technical interviews, architecting scalable applications, or simply expanding your understanding of large-scale systems.
🎯 What You'll Learn
System design is the art and science of building large-scale distributed systems that can handle millions of users, process massive amounts of data, and maintain high availability. This cheat sheet covers the essential building blocks that power the world's most successful applications - from social media platforms to e-commerce giants.
🎯 Table of Contents
- 📊 Database & Storage
- ⚡ Scaling & Load Management
- 🗄️ Caching Systems
- 📨 Messaging & Communication
- 🔍 Monitoring & Reliability
📊 Database & Storage
CAP Theorem
What It Means: Trade-off between Consistency, Availability, and Partition Tolerance
❌ Common Confusion: Thinking you can have all three at once
✅ How to Understand: Partition tolerance is non-negotiable over an unreliable network; during a partition you choose Consistency or Availability based on business needs (e.g., banking favors Consistency)
Database Sharding
What It Means: Splitting a database into smaller parts
❌ Common Confusion: Confusing it with replication
✅ How to Understand: Sharding = scaling out data; Replication = making copies for reliability
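A minimal sketch of hash-based shard routing (the function name and shard count are illustrative, not from any particular database):

```python
import hashlib

NUM_SHARDS = 4

def shard_for(key):
    # Hash-based routing: a stable digest (unlike Python's salted built-in
    # hash()) sends the same key to the same shard in every process.
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# Every lookup for the same key lands on the same shard.
assert shard_for("user:42") == shard_for("user:42")
```

Note the drawback of plain modulo routing: changing `NUM_SHARDS` remaps almost every key, which is exactly the problem the Consistent Hashing entry addresses.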
Replication
What It Means: Copying data across nodes
❌ Common Confusion: Assuming replication improves writes
✅ How to Understand: It's great for reads and fault tolerance, but increases write complexity
Strong vs Eventual Consistency
What It Means: Strong = always up-to-date; Eventual = eventually up-to-date
❌ Common Confusion: Believing eventual is "inconsistent"
✅ How to Understand: Eventual is fine for social feeds; Strong is needed for transactions
NoSQL vs SQL
What It Means: NoSQL = flexible schema; SQL = structured tables
❌ Common Confusion: Thinking NoSQL is always better for scale
✅ How to Understand: Use SQL for structured data; NoSQL for flexibility and unstructured data
Database Indexing
What It Means: Speeds up data lookup
❌ Common Confusion: Adding too many indexes blindly
✅ How to Understand: Index the read-heavy, high-selectivity fields only
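You can see the effect of an index directly with SQLite's query planner; a small sketch (table and column names are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, bio TEXT)")
conn.executemany("INSERT INTO users (email, bio) VALUES (?, ?)",
                 [(f"u{i}@example.com", "...") for i in range(1000)])

# Index only the read-heavy, high-selectivity column.
conn.execute("CREATE INDEX idx_users_email ON users(email)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM users WHERE email = ?",
    ("u500@example.com",)).fetchone()
# The plan's detail column now mentions idx_users_email (an index search)
# instead of a full table scan.
```

The same `EXPLAIN`-style check exists in most databases and is the honest way to confirm an index is actually used before adding more of them.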
Data Partitioning
What It Means: Dividing data for performance and scalability
❌ Common Confusion: Confusing with sharding
✅ How to Understand: Sharding is a type of partitioning — often based on range, hash, or geo
Read vs Write Optimization
What It Means: Design optimized for reading or writing
❌ Common Confusion: Trying to optimize both equally
✅ How to Understand: Choose based on the system's access pattern
Data Compaction
What It Means: Merging small files/logs for efficiency
❌ Common Confusion: Treating compaction as optional; skipping it leads to disk and read bloat
✅ How to Understand: Use with write-heavy systems like LSM Trees or log-structured storage
Data Deduplication
What It Means: Avoiding storing same data multiple times
❌ Common Confusion: Confusing with compression
✅ How to Understand: Deduplication avoids storing repeated copies of data; compression shrinks each stored copy. They complement each other
Idempotency
What It Means: Same request = same result (safe to retry)
❌ Common Confusion: Ignoring it in APIs
✅ How to Understand: Critical for payment systems, retries, distributed transactions
Bloom Filter
What It Means: Space-efficient probabilistic data structure to test membership
❌ Common Confusion: Assuming 100% accuracy
✅ How to Understand: Use to reduce unnecessary DB hits; allows false positives, no false negatives
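A toy Bloom filter to make the "false positives, never false negatives" property concrete (sizes and the hashing scheme are simplified for illustration):

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: k hash functions over an m-bit array."""
    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = [False] * m

    def _positions(self, item):
        # Derive k positions by salting one hash function k ways.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = True

    def might_contain(self, item):
        # If any bit is unset, the item was definitely never added.
        return all(self.bits[p] for p in self._positions(item))

bf = BloomFilter()
bf.add("user:42")
```

Use it as a gate in front of the database: a "definitely not present" answer skips the DB hit entirely; a "maybe present" answer falls through to the real lookup.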
Quorum in Distributed DBs
What It Means: Minimum number of nodes to agree for read/write
❌ Common Confusion: Assuming majority is always quorum
✅ How to Understand: Tune quorum (e.g., W+R > N) for your consistency vs availability balance
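The W+R > N rule reduces to a one-line overlap check; a sketch:

```python
# With N replicas, a write acked by W nodes and a read that contacts R nodes
# are guaranteed to share at least one replica exactly when W + R > N.
def reads_see_latest_write(n, w, r):
    return w + r > n

# Classic Dynamo-style setting: N=3, W=2, R=2 overlaps, so reads see the
# newest acknowledged write.
assert reads_see_latest_write(3, 2, 2)

# W=1, R=1 favors availability and latency, but a read may hit a replica
# that missed the newest write.
assert not reads_see_latest_write(3, 1, 1)
```

Tuning W down speeds up writes, tuning R down speeds up reads; the inequality tells you when you've traded away read-your-writes consistency.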
⚡ Scaling & Load Management
Load Balancer
What It Means: Distributes incoming traffic across multiple servers
❌ Common Confusion: Mixing it up with API Gateway
✅ How to Understand: LB is about traffic distribution; Gateway is about routing, auth, versioning, etc.
Horizontal vs Vertical Scaling
What It Means: Horizontal = add more machines; Vertical = upgrade existing machine
❌ Common Confusion: Assuming vertical is always better
✅ How to Understand: Horizontal gives you better fault tolerance and future scalability
Rate Limiter
What It Means: Restricts number of requests per user/time
❌ Common Confusion: Thinking it's only for APIs
✅ How to Understand: Also protects from spam, abuse, and DDoS
Throttling
What It Means: Limits how many requests a user/system can make
❌ Common Confusion: Mixing it with rate limiting
✅ How to Understand: Throttling slows down; Rate limiting blocks
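A minimal token-bucket limiter, the most common way to implement both ideas (names and parameters are illustrative):

```python
import time

class TokenBucket:
    """Allow `rate` requests/sec on average, with bursts up to `capacity`."""
    def __init__(self, rate, capacity):
        self.rate, self.capacity = rate, capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self):
        # Refill tokens in proportion to elapsed time, capped at capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False   # rate limiting: reject outright

bucket = TokenBucket(rate=5, capacity=2)
results = [bucket.allow() for _ in range(4)]   # burst of 4 back-to-back calls
```

The burst exhausts the bucket after two requests. The same structure becomes a throttler if, instead of returning `False`, the caller sleeps until a token is available.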
Failover / Redundancy
What It Means: Backup systems take over when primary fails
❌ Common Confusion: Forgetting to test failovers
✅ How to Understand: Practice chaos engineering to make sure they really work
Microservices vs Monolith
What It Means: Microservices = independent deployable units; Monolith = one big app
❌ Common Confusion: Thinking microservices = automatic scalability
✅ How to Understand: Microservices add complexity — use when needed, not blindly
Leader Election
What It Means: Picking one node to coordinate or lead
❌ Common Confusion: Not knowing when it's needed
✅ How to Understand: Use in distributed systems that need coordination (e.g., master DB node)
Service Discovery
What It Means: Locating instances of a service dynamically
❌ Common Confusion: Hardcoding IPs instead
✅ How to Understand: Use tools like Consul, Eureka, or DNS-based discovery
Consistent Hashing
What It Means: Evenly distributes load/data across nodes, minimizes rebalancing
❌ Common Confusion: Thinking plain modulo hashing (hash % N) is equivalent; changing N remaps almost every key
✅ How to Understand: Great for sharding, CDN caches, and partitioning systems
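A minimal hash ring with virtual nodes, as a sketch of the idea (md5 and the vnode count are arbitrary choices for illustration):

```python
import bisect
import hashlib

class HashRing:
    """Consistent-hash ring; vnodes smooth the key distribution."""
    def __init__(self, nodes, vnodes=100):
        self.ring = sorted(
            (self._hash(f"{node}#{v}"), node)
            for node in nodes for v in range(vnodes)
        )
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def node_for(self, key):
        # Walk clockwise to the first vnode at or after the key's hash.
        i = bisect.bisect(self.keys, self._hash(key)) % len(self.keys)
        return self.ring[i][1]

ring = HashRing(["cache-a", "cache-b", "cache-c"])
owner = ring.node_for("user:42")
# Adding or removing a node moves only the keys adjacent to its vnodes,
# roughly 1/N of the keyspace, instead of remapping everything.
```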
Cold Start Problem
What It Means: Initial delay before app or function is ready
❌ Common Confusion: Assuming it only affects serverless; any auto-scaled or freshly deployed instance can start cold
✅ How to Understand: Pre-warm containers, use provisioned concurrency
🗄️ Caching Systems
CDN (Content Delivery Network)
What It Means: Caches static content closer to the user
❌ Common Confusion: Thinking CDNs work for all kinds of data
✅ How to Understand: Works best for static content like images, CSS, JS
Cache (e.g., Redis)
What It Means: In-memory store to reduce DB hits
❌ Common Confusion: Not knowing when or what to cache
✅ How to Understand: Cache frequent reads, slow queries, or expensive computations
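The usual pattern here is cache-aside; a sketch with an in-process dict standing in for Redis (the TTL and names are illustrative):

```python
import time

_cache = {}          # key -> (expiry_timestamp, value), Redis stand-in
TTL = 60.0
db_hits = 0

def slow_db_lookup(key):
    global db_hits
    db_hits += 1     # stand-in for an expensive query
    return f"value-for-{key}"

def get(key):
    # Cache-aside: check cache, fall back to the DB, then populate the cache.
    entry = _cache.get(key)
    if entry and entry[0] > time.monotonic():
        return entry[1]                          # hit: no DB round-trip
    value = slow_db_lookup(key)                  # miss
    _cache[key] = (time.monotonic() + TTL, value)
    return value

get("user:42")
get("user:42")   # second call is served from cache; the DB is hit once
```

The TTL is your staleness bound: the shorter it is, the fresher the data and the less the cache helps.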
Write Amplification
What It Means: Extra writes due to replication/indexing
❌ Common Confusion: Not considering performance impact
✅ How to Understand: Minimize by batching writes, avoiding too many indexes
Backpressure
What It Means: Controlling data flow to prevent overload
❌ Common Confusion: Ignoring it in stream processing
✅ How to Understand: Use buffering, retries, or discarding strategies
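The simplest backpressure tool is a bounded queue; a sketch of the load-shedding (discard) strategy:

```python
import queue

# When the consumer falls behind, producers hit a full queue and must
# block, slow down, or drop -- instead of exhausting memory.
buf = queue.Queue(maxsize=3)

dropped = 0
for event in range(5):
    try:
        buf.put_nowait(event)   # non-blocking producer
    except queue.Full:
        dropped += 1            # shed load: discard the overflow
```

Swapping `put_nowait` for a blocking `put` turns discard into producer slowdown, which is the other common strategy.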
📨 Messaging & Communication
Queue (e.g., Kafka, SQS)
What It Means: Stores and processes tasks asynchronously
❌ Common Confusion: Confusing with cache
✅ How to Understand: Cache = fast reads; Queue = async processing for load decoupling
Pub/Sub
What It Means: Publishers send messages; subscribers receive them
❌ Common Confusion: Assuming it's always real-time
✅ How to Understand: It's eventual, but great for decoupling services
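The decoupling is easiest to see in a toy in-memory version: publishers and subscribers never reference each other, only the topic name.

```python
from collections import defaultdict

subscribers = defaultdict(list)   # topic -> list of handler callables

def subscribe(topic, handler):
    subscribers[topic].append(handler)

def publish(topic, message):
    # The publisher knows nothing about who (or how many) will receive this.
    for handler in subscribers[topic]:
        handler(message)

received = []
subscribe("orders", received.append)
subscribe("orders", lambda m: None)   # any number of independent subscribers
publish("orders", "order#1 created")
```

A real broker (Kafka, SNS, Redis Pub/Sub) adds durability, delivery across processes, and retries, but the topology is the same.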
Session Management
What It Means: Managing user state across requests
❌ Common Confusion: Confusing cookies, tokens, sticky sessions
✅ How to Understand: Use JWT + stateless sessions for scale; Redis/session store for short-term login info
🔍 Monitoring & Reliability
Heartbeat & Health Checks
What It Means: Used to detect if services are up and running
❌ Common Confusion: Over-engineering them
✅ How to Understand: Lightweight checks are often enough — don't make them a bottleneck
Circuit Breaker
What It Means: Stops calling a failing service temporarily
❌ Common Confusion: Assuming retries solve all failures
✅ How to Understand: Circuit breaker avoids cascading failures by "tripping" open
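A toy breaker showing the closed / open / half-open cycle (thresholds and names are illustrative):

```python
import time

class CircuitBreaker:
    """Open after `threshold` consecutive failures; half-open after `reset_after` s."""
    def __init__(self, threshold=3, reset_after=30.0):
        self.threshold, self.reset_after = threshold, reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, fn):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                # Open: fail fast instead of piling load on a sick service.
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None   # half-open: let one trial call through
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()   # trip the breaker
            raise
        self.failures = 0           # success closes the circuit again
        return result

breaker = CircuitBreaker(threshold=2)
```

After two consecutive failures this breaker rejects further calls immediately, which is what stops one slow dependency from cascading into the whole call chain.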
Latency vs Throughput
What It Means: Latency = delay; Throughput = amount processed
❌ Common Confusion: Using them interchangeably
✅ How to Understand: Optimize latency for user experience; throughput for batch processing
Latency Budget
What It Means: Max allowed delay per system/component
❌ Common Confusion: Not distributing time wisely
✅ How to Understand: Divide latency among tiers (e.g., LB = 10ms, App = 50ms, DB = 30ms)
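The budget arithmetic is trivial but worth making explicit; a sketch using the tier numbers above against an assumed 100 ms end-to-end SLO:

```python
# Per-tier budgets must fit inside the end-to-end latency target.
SLO_MS = 100
budget = {"load_balancer": 10, "app": 50, "db": 30}

spent = sum(budget.values())
assert spent <= SLO_MS, "tiers overspend the latency budget"
headroom = SLO_MS - spent   # 10 ms left for network hops and serialization
```

The headroom matters: if every tier consumes its full budget and nothing is reserved for the network, the SLO is already blown.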
🎯 Quick Reference Guide
When to Use What?
| Scenario | Best Choice | Why |
|---|---|---|
| High read traffic | Read replicas + Cache | Distribute load, reduce DB hits |
| High write traffic | Sharding + Queue | Spread writes, async processing |
| Global users | CDN + Regional DBs | Reduce latency worldwide |
| Microservices | API Gateway + Service Discovery | Centralized routing, dynamic scaling |
| Real-time features | WebSockets + Pub/Sub | Instant communication |
| Financial transactions | ACID DB + Idempotency | Data integrity, safe retries |
| Analytics workload | Data warehouse + Batch processing | Optimized for complex queries |
| Mobile apps | REST API + CDN | Simple, cacheable, fast |
Common Anti-Patterns to Avoid
- ❌ Premature optimization - Don't over-engineer from day one
- ❌ Distributed monolith - Microservices that are tightly coupled
- ❌ Cache everything - Only cache what's actually accessed frequently
- ❌ Single point of failure - Always have backups and redundancy
- ❌ Ignoring monitoring - You can't fix what you can't measure
- ❌ Not testing failure scenarios - Chaos engineering is your friend
- ❌ Synchronous everything - Use async processing where possible
Key Principles to Remember
- Start simple, scale gradually - Begin with a monolith, split when needed
- Design for failure - Everything will fail eventually
- Monitor everything - Metrics, logs, and traces are essential
- Automate deployments - Manual processes don't scale
- Choose consistency model wisely - Not everything needs strong consistency
- Cache strategically - Cache hot data, not everything
- Design for your actual use case - Don't copy someone else's architecture blindly