System Design Cheat Sheet
This guide is designed to help you master the fundamental concepts of distributed systems architecture, whether you're preparing for technical interviews, architecting scalable applications, or simply expanding your understanding of large-scale systems.
🎯 What You'll Learn
System design is the art and science of building large-scale distributed systems that can handle millions of users, process massive amounts of data, and maintain high availability. This cheat sheet covers the essential building blocks that power the world's most successful applications - from social media platforms to e-commerce giants.
🎯 Table of Contents
- 📊 Database & Storage
- ⚡ Scaling & Load Management
- 🗄️ Caching Systems
- 📨 Messaging & Communication
- 🔍 Monitoring & Reliability
📊 Database & Storage
CAP Theorem
What It Means: Trade-off between Consistency, Availability, and Partition Tolerance
❌ Common Confusion: Thinking you can have all three at once
✅ How to Understand: Partition tolerance is non-negotiable over an unreliable network; during a partition you choose Consistency or Availability based on business needs (e.g., banking favors Consistency)
Database Sharding
What It Means: Splitting a database into smaller parts
❌ Common Confusion: Confusing it with replication
✅ How to Understand: Sharding = scaling out data; Replication = making copies for reliability
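A minimal sketch of hash-based shard routing (the function name and shard count are illustrative, not from any particular database):

```python
import hashlib

NUM_SHARDS = 4

def shard_for(key):
    # Hash-based routing: a stable digest (unlike Python's salted built-in
    # hash()) sends the same key to the same shard in every process.
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# Every lookup for the same key lands on the same shard.
assert shard_for("user:42") == shard_for("user:42")
```

Note the drawback of plain modulo routing: changing `NUM_SHARDS` remaps almost every key, which is exactly the problem the Consistent Hashing entry addresses.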
Replication
What It Means: Copying data across nodes
❌ Common Confusion: Assuming replication improves writes
✅ How to Understand: It's great for reads and fault tolerance, but increases write complexity
Strong vs Eventual Consistency
What It Means: Strong = always up-to-date; Eventual = eventually up-to-date
❌ Common Confusion: Believing eventual is "inconsistent"
✅ How to Understand: Eventual is fine for social feeds; Strong is needed for transactions
NoSQL vs SQL
What It Means: NoSQL = flexible schema; SQL = structured tables
❌ Common Confusion: Thinking NoSQL is always better for scale
✅ How to Understand: Use SQL for structured data; NoSQL for flexibility and unstructured data
Database Indexing
What It Means: Speeds up data lookup
❌ Common Confusion: Adding too many indexes blindly
✅ How to Understand: Index the read-heavy, high-selectivity fields only
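You can see the effect of an index directly with SQLite's query planner; a small sketch (table and column names are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, bio TEXT)")
conn.executemany("INSERT INTO users (email, bio) VALUES (?, ?)",
                 [(f"u{i}@example.com", "...") for i in range(1000)])

# Index only the read-heavy, high-selectivity column.
conn.execute("CREATE INDEX idx_users_email ON users(email)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM users WHERE email = ?",
    ("u500@example.com",)).fetchone()
# The plan's detail column now mentions idx_users_email (an index search)
# instead of a full table scan.
```

The same `EXPLAIN`-style check exists in most databases and is the honest way to confirm an index is actually used before adding more of them.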
Data Partitioning
What It Means: Dividing data for performance and scalability
❌ Common Confusion: Confusing with sharding
✅ How to Understand: Sharding is a type of partitioning — often based on range, hash, or geo
Read vs Write Optimization
What It Means: Design optimized for reading or writing
❌ Common Confusion: Trying to optimize both equally
✅ How to Understand: Choose based on the system's access pattern
Data Compaction
What It Means: Merging small files/logs for efficiency
❌ Common Confusion: Treating compaction as optional; skipping it leads to disk and read bloat
✅ How to Understand: Use with write-heavy systems like LSM Trees or log-structured storage
Data Deduplication
What It Means: Avoiding storing same data multiple times
❌ Common Confusion: Confusing with compression
✅ How to Understand: Deduplication avoids storing repeated copies of data; compression shrinks each stored copy. They complement each other
Idempotency
What It Means: Same request = same result (safe to retry)
❌ Common Confusion: Ignoring it in APIs
✅ How to Understand: Critical for payment systems, retries, distributed transactions
Bloom Filter
What It Means: Space-efficient probabilistic data structure to test membership
❌ Common Confusion: Assuming 100% accuracy
✅ How to Understand: Use to reduce unnecessary DB hits; allows false positives, no false negatives
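A toy Bloom filter to make the "false positives, never false negatives" property concrete (sizes and the hashing scheme are simplified for illustration):

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: k hash functions over an m-bit array."""
    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = [False] * m

    def _positions(self, item):
        # Derive k positions by salting one hash function k ways.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = True

    def might_contain(self, item):
        # If any bit is unset, the item was definitely never added.
        return all(self.bits[p] for p in self._positions(item))

bf = BloomFilter()
bf.add("user:42")
```

Use it as a gate in front of the database: a "definitely not present" answer skips the DB hit entirely; a "maybe present" answer falls through to the real lookup.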
Quorum in Distributed DBs
What It Means: Minimum number of nodes to agree for read/write
❌ Common Confusion: Assuming majority is always quorum
✅ How to Understand: Tune quorum (e.g., W+R > N) for your consistency vs availability balance
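The W+R > N rule reduces to a one-line overlap check; a sketch:

```python
# With N replicas, a write acked by W nodes and a read that contacts R nodes
# are guaranteed to share at least one replica exactly when W + R > N.
def reads_see_latest_write(n, w, r):
    return w + r > n

# Classic Dynamo-style setting: N=3, W=2, R=2 overlaps, so reads see the
# newest acknowledged write.
assert reads_see_latest_write(3, 2, 2)

# W=1, R=1 favors availability and latency, but a read may hit a replica
# that missed the newest write.
assert not reads_see_latest_write(3, 1, 1)
```

Tuning W down speeds up writes, tuning R down speeds up reads; the inequality tells you when you've traded away read-your-writes consistency.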
⚡ Scaling & Load Management
Load Balancer
What It Means: Distributes incoming traffic across multiple servers
❌ Common Confusion: Mixing it up with API Gateway
✅ How to Understand: LB is about traffic distribution; Gateway is about routing, auth, versioning, etc.
Horizontal vs Vertical Scaling
What It Means: Horizontal = add more machines; Vertical = upgrade existing machine
❌ Common Confusion: Assuming vertical is always better
✅ How to Understand: Horizontal gives you better fault tolerance and future scalability
Rate Limiter
What It Means: Restricts number of requests per user/time
❌ Common Confusion: Thinking it's only for APIs
✅ How to Understand: Also protects from spam, abuse, and DDoS
Throttling
What It Means: Limits how many requests a user/system can make
❌ Common Confusion: Mixing it with rate limiting
✅ How to Understand: Throttling slows down; Rate limiting blocks
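A minimal token-bucket limiter, the most common way to implement both ideas (names and parameters are illustrative):

```python
import time

class TokenBucket:
    """Allow `rate` requests/sec on average, with bursts up to `capacity`."""
    def __init__(self, rate, capacity):
        self.rate, self.capacity = rate, capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self):
        # Refill tokens in proportion to elapsed time, capped at capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False   # rate limiting: reject outright

bucket = TokenBucket(rate=5, capacity=2)
results = [bucket.allow() for _ in range(4)]   # burst of 4 back-to-back calls
```

The burst exhausts the bucket after two requests. The same structure becomes a throttler if, instead of returning `False`, the caller sleeps until a token is available.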
Failover / Redundancy
What It Means: Backup systems take over when primary fails
❌ Common Confusion: Forgetting to test failovers
✅ How to Understand: Practice chaos engineering to make sure they really work
Microservices vs Monolith
What It Means: Microservices = independent deployable units; Monolith = one big app
❌ Common Confusion: Thinking microservices = automatic scalability
✅ How to Understand: Microservices add complexity — use when needed, not blindly
Leader Election
What It Means: Picking one node to coordinate or lead
❌ Common Confusion: Not knowing when it's needed
✅ How to Understand: Use in distributed systems that need coordination (e.g., master DB node)
Service Discovery
What It Means: Locating instances of a service dynamically
❌ Common Confusion: Hardcoding IPs instead
✅ How to Understand: Use tools like Consul, Eureka, or DNS-based discovery
Consistent Hashing
What It Means: Evenly distributes load/data across nodes, minimizes rebalancing
❌ Common Confusion: Thinking plain modulo hashing (hash % N) is equivalent; changing N remaps almost every key
✅ How to Understand: Great for sharding, CDN caches, and partitioning systems
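A minimal hash ring with virtual nodes, as a sketch of the idea (md5 and the vnode count are arbitrary choices for illustration):

```python
import bisect
import hashlib

class HashRing:
    """Consistent-hash ring; vnodes smooth the key distribution."""
    def __init__(self, nodes, vnodes=100):
        self.ring = sorted(
            (self._hash(f"{node}#{v}"), node)
            for node in nodes for v in range(vnodes)
        )
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def node_for(self, key):
        # Walk clockwise to the first vnode at or after the key's hash.
        i = bisect.bisect(self.keys, self._hash(key)) % len(self.keys)
        return self.ring[i][1]

ring = HashRing(["cache-a", "cache-b", "cache-c"])
owner = ring.node_for("user:42")
# Adding or removing a node moves only the keys adjacent to its vnodes,
# roughly 1/N of the keyspace, instead of remapping everything.
```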
Cold Start Problem
What It Means: Initial delay before app or function is ready
❌ Common Confusion: Assuming it only affects serverless; any auto-scaled or freshly deployed instance can start cold
✅ How to Understand: Pre-warm containers, use provisioned concurrency
🗄️ Caching Systems
CDN (Content Delivery Network)
What It Means: Caches static content closer to the user
❌ Common Confusion: Thinking CDNs work for all kinds of data
✅ How to Understand: Works best for static content like images, CSS, JS
Cache (e.g., Redis)
What It Means: In-memory store to reduce DB hits
❌ Common Confusion: Not knowing when or what to cache
✅ How to Understand: Cache frequent reads, slow queries, or expensive computations
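The usual pattern here is cache-aside; a sketch with an in-process dict standing in for Redis (the TTL and names are illustrative):

```python
import time

_cache = {}          # key -> (expiry_timestamp, value), Redis stand-in
TTL = 60.0
db_hits = 0

def slow_db_lookup(key):
    global db_hits
    db_hits += 1     # stand-in for an expensive query
    return f"value-for-{key}"

def get(key):
    # Cache-aside: check cache, fall back to the DB, then populate the cache.
    entry = _cache.get(key)
    if entry and entry[0] > time.monotonic():
        return entry[1]                          # hit: no DB round-trip
    value = slow_db_lookup(key)                  # miss
    _cache[key] = (time.monotonic() + TTL, value)
    return value

get("user:42")
get("user:42")   # second call is served from cache; the DB is hit once
```

The TTL is your staleness bound: the shorter it is, the fresher the data and the less the cache helps.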
Write Amplification
What It Means: Extra writes due to replication/indexing
❌ Common Confusion: Not considering performance impact
✅ How to Understand: Minimize by batching writes, avoiding too many indexes
Backpressure
What It Means: Controlling data flow to prevent overload
❌ Common Confusion: Ignoring it in stream processing
✅ How to Understand: Use buffering, retries, or discarding strategies
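The simplest backpressure tool is a bounded queue; a sketch of the load-shedding (discard) strategy:

```python
import queue

# When the consumer falls behind, producers hit a full queue and must
# block, slow down, or drop -- instead of exhausting memory.
buf = queue.Queue(maxsize=3)

dropped = 0
for event in range(5):
    try:
        buf.put_nowait(event)   # non-blocking producer
    except queue.Full:
        dropped += 1            # shed load: discard the overflow
```

Swapping `put_nowait` for a blocking `put` turns discard into producer slowdown, which is the other common strategy.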
📨 Messaging & Communication
Queue (e.g., Kafka, SQS)
What It Means: Stores and processes tasks asynchronously
❌ Common Confusion: Confusing with cache
✅ How to Understand: Cache = fast reads; Queue = async processing for load decoupling
Pub/Sub
What It Means: Publishers send messages; subscribers receive them
❌ Common Confusion: Assuming it's always real-time
✅ How to Understand: It's eventual, but great for decoupling services
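The decoupling is easiest to see in a toy in-memory version: publishers and subscribers never reference each other, only the topic name.

```python
from collections import defaultdict

subscribers = defaultdict(list)   # topic -> list of handler callables

def subscribe(topic, handler):
    subscribers[topic].append(handler)

def publish(topic, message):
    # The publisher knows nothing about who (or how many) will receive this.
    for handler in subscribers[topic]:
        handler(message)

received = []
subscribe("orders", received.append)
subscribe("orders", lambda m: None)   # any number of independent subscribers
publish("orders", "order#1 created")
```

A real broker (Kafka, SNS, Redis Pub/Sub) adds durability, delivery across processes, and retries, but the topology is the same.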
Session Management
What It Means: Managing user state across requests
❌ Common Confusion: Confusing cookies, tokens, sticky sessions
✅ How to Understand: Use JWT + stateless sessions for scale; Redis/session store for short-term login info
🔍 Monitoring & Reliability
Heartbeat & Health Checks
What It Means: Used to detect if services are up and running
❌ Common Confusion: Over-engineering them
✅ How to Understand: Lightweight checks are often enough — don't make them a bottleneck
Circuit Breaker
What It Means: Stops calling a failing service temporarily
❌ Common Confusion: Assuming retries solve all failures
✅ How to Understand: Circuit breaker avoids cascading failures by "tripping" open
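A toy breaker showing the closed / open / half-open cycle (thresholds and names are illustrative):

```python
import time

class CircuitBreaker:
    """Open after `threshold` consecutive failures; half-open after `reset_after` s."""
    def __init__(self, threshold=3, reset_after=30.0):
        self.threshold, self.reset_after = threshold, reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, fn):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                # Open: fail fast instead of piling load on a sick service.
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None   # half-open: let one trial call through
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()   # trip the breaker
            raise
        self.failures = 0           # success closes the circuit again
        return result

breaker = CircuitBreaker(threshold=2)
```

After two consecutive failures this breaker rejects further calls immediately, which is what stops one slow dependency from cascading into the whole call chain.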
Latency vs Throughput
What It Means: Latency = delay; Throughput = amount processed
❌ Common Confusion: Using them interchangeably
✅ How to Understand: Optimize latency for user experience; throughput for batch processing
Latency Budget
What It Means: Max allowed delay per system/component
❌ Common Confusion: Not distributing time wisely
✅ How to Understand: Divide latency among tiers (e.g., LB = 10ms, App = 50ms, DB = 30ms)
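The budget arithmetic is trivial but worth making explicit; a sketch using the tier numbers above against an assumed 100 ms end-to-end SLO:

```python
# Per-tier budgets must fit inside the end-to-end latency target.
SLO_MS = 100
budget = {"load_balancer": 10, "app": 50, "db": 30}

spent = sum(budget.values())
assert spent <= SLO_MS, "tiers overspend the latency budget"
headroom = SLO_MS - spent   # 10 ms left for network hops and serialization
```

The headroom matters: if every tier consumes its full budget and nothing is reserved for the network, the SLO is already blown.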
🎯 Quick Reference Guide
When to Use What?
| Scenario | Best Choice | Why |
|---|---|---|
| High read traffic | Read replicas + Cache | Distribute load, reduce DB hits |
| High write traffic | Sharding + Queue | Spread writes, async processing |
| Global users | CDN + Regional DBs | Reduce latency worldwide |
| Microservices | API Gateway + Service Discovery | Centralized routing, dynamic scaling |
| Real-time features | WebSockets + Pub/Sub | Instant communication |
| Financial transactions | ACID DB + Idempotency | Data integrity, safe retries |
| Analytics workload | Data warehouse + Batch processing | Optimized for complex queries |
| Mobile apps | REST API + CDN | Simple, cacheable, fast |
Common Anti-Patterns to Avoid
- ❌ Premature optimization - Don't over-engineer from day one
- ❌ Distributed monolith - Microservices that are tightly coupled
- ❌ Cache everything - Only cache what's actually accessed frequently
- ❌ Single point of failure - Always have backups and redundancy
- ❌ Ignoring monitoring - You can't fix what you can't measure
- ❌ Not testing failure scenarios - Chaos engineering is your friend
- ❌ Synchronous everything - Use async processing where possible
Key Principles to Remember
- Start simple, scale gradually - Begin with a monolith, split when needed
- Design for failure - Everything will fail eventually
- Monitor everything - Metrics, logs, and traces are essential
- Automate deployments - Manual processes don't scale
- Choose consistency model wisely - Not everything needs strong consistency
- Cache strategically - Cache hot data, not everything
- Design for your actual use case - Don't copy someone else's architecture blindly