System Design Cheat Sheet

This guide is designed to help you master the fundamental concepts of distributed systems architecture, whether you're preparing for technical interviews, architecting scalable applications, or simply expanding your understanding of large-scale systems.

🎯 What You'll Learn

System design is the art and science of building large-scale distributed systems that can handle millions of users, process massive amounts of data, and maintain high availability. This cheat sheet covers the essential building blocks that power the world's most successful applications - from social media platforms to e-commerce giants.


📊 Database & Storage

CAP Theorem

What It Means: Trade-off between Consistency, Availability, and Partition Tolerance

❌ Common Confusion: Thinking you can have all three at once

✅ How to Understand: During a network partition you must choose between Consistency and Availability — pick based on business needs (e.g., banking favors Consistency)


Database Sharding

What It Means: Splitting a database into smaller parts

❌ Common Confusion: Confusing it with replication

✅ How to Understand: Sharding = scaling out data; Replication = making copies for reliability
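
To make the idea concrete, here's a minimal sketch of hash-based shard routing; `shard_for` is a hypothetical helper, and real systems usually layer consistent hashing on top so resharding moves fewer keys:

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Route a key to a shard via a stable hash (hypothetical helper)."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_shards

# The same key always lands on the same shard; different keys spread out.
shard = shard_for("user:42", num_shards=4)
```

Note that with plain modulo hashing, changing `num_shards` remaps most keys — one reason consistent hashing (covered below) exists.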


Replication

What It Means: Copying data across nodes

❌ Common Confusion: Assuming replication improves writes

✅ How to Understand: It's great for reads and fault tolerance, but increases write complexity


Strong vs Eventual Consistency

What It Means: Strong = always up-to-date; Eventual = eventually up-to-date

❌ Common Confusion: Believing eventual is "inconsistent"

✅ How to Understand: Eventual is fine for social feeds; Strong is needed for transactions


NoSQL vs SQL

What It Means: NoSQL = flexible schema; SQL = structured tables

❌ Common Confusion: Thinking NoSQL is always better for scale

✅ How to Understand: Use SQL for structured data; NoSQL for flexibility and unstructured data


Database Indexing

What It Means: Speeds up data lookup

❌ Common Confusion: Adding too many indexes blindly

✅ How to Understand: Index the read-heavy, high-selectivity fields only


Data Partitioning

What It Means: Dividing data for performance and scalability

❌ Common Confusion: Confusing with sharding

✅ How to Understand: Sharding is a type of partitioning — often based on range, hash, or geo


Read vs Write Optimization

What It Means: Design optimized for reading or writing

❌ Common Confusion: Trying to optimize both equally

✅ How to Understand: Choose based on the system's access pattern


Data Compaction

What It Means: Merging small files/logs for efficiency

❌ Common Confusion: Treating compaction as optional — skipping it leads to disk and read bloat

✅ How to Understand: Use with write-heavy systems like LSM Trees or log-structured storage


Data Deduplication

What It Means: Avoiding storing same data multiple times

❌ Common Confusion: Confusing with compression

✅ How to Understand: Deduplication saves storage, not bandwidth


Idempotency

What It Means: Same request = same result (safe to retry)

❌ Common Confusion: Ignoring it in APIs

✅ How to Understand: Critical for payment systems, retries, distributed transactions


Bloom Filter

What It Means: Space-efficient probabilistic data structure to test membership

❌ Common Confusion: Assuming 100% accuracy

✅ How to Understand: Use to reduce unnecessary DB hits; allows false positives, no false negatives
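
A minimal Bloom filter sketch (sizes and hash counts here are illustrative; production filters size the bit array from the expected item count and target false-positive rate):

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: false positives possible, false negatives never."""

    def __init__(self, size: int = 1024, hashes: int = 3):
        self.size = size
        self.hashes = hashes
        self.bits = [False] * size

    def _positions(self, item: str):
        # Derive k independent positions by salting the hash input.
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item: str) -> bool:
        # All bits set -> "maybe present"; any bit clear -> definitely absent.
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter()
bf.add("user:42")
```

A typical use: check the filter before hitting the database, and skip the query entirely when the filter says "definitely absent".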


Quorum in Distributed DBs

What It Means: Minimum number of nodes to agree for read/write

❌ Common Confusion: Assuming majority is always quorum

✅ How to Understand: Tune quorum (e.g., W+R > N) for your consistency vs availability balance
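
The W+R > N rule can be sketched as a one-line check — when write and read quorums overlap on at least one node, every read sees the latest write:

```python
def is_strongly_consistent(n: int, w: int, r: int) -> bool:
    """W + R > N guarantees the read quorum overlaps the write quorum."""
    return w + r > n

# N=3: W=2, R=2 overlaps (strong reads); W=1, R=1 does not (eventual reads).
strong = is_strongly_consistent(3, 2, 2)
eventual = is_strongly_consistent(3, 1, 1)
```

Lowering W or R trades consistency for availability and latency, which is exactly the tuning knob the entry above describes.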


⚡ Scaling & Load Management

Load Balancer

What It Means: Distributes incoming traffic across multiple servers

❌ Common Confusion: Mixing it up with API Gateway

✅ How to Understand: LB is about traffic distribution; Gateway is about routing, auth, versioning, etc.


Horizontal vs Vertical Scaling

What It Means: Horizontal = add more machines; Vertical = upgrade existing machine

❌ Common Confusion: Assuming vertical is always better

✅ How to Understand: Horizontal gives you better fault tolerance and future scalability


Rate Limiter

What It Means: Restricts number of requests per user/time

❌ Common Confusion: Thinking it's only for APIs

✅ How to Understand: Also protects from spam, abuse, and DDoS


Throttling

What It Means: Limits how many requests a user/system can make

❌ Common Confusion: Mixing it with rate limiting

✅ How to Understand: Throttling slows down; Rate limiting blocks


Failover / Redundancy

What It Means: Backup systems take over when primary fails

❌ Common Confusion: Forgetting to test failovers

✅ How to Understand: Practice chaos engineering to make sure they really work


Microservices vs Monolith

What It Means: Microservices = independent deployable units; Monolith = one big app

❌ Common Confusion: Thinking microservices = automatic scalability

✅ How to Understand: Microservices add complexity — use when needed, not blindly


Leader Election

What It Means: Picking one node to coordinate or lead

❌ Common Confusion: Not knowing when it's needed

✅ How to Understand: Use in distributed systems that need coordination (e.g., master DB node)


Service Discovery

What It Means: Locating instances of a service dynamically

❌ Common Confusion: Hardcoding IPs instead

✅ How to Understand: Use tools like Consul, Eureka, or DNS-based discovery


Consistent Hashing

What It Means: Evenly distributes load/data across nodes, minimizes rebalancing

❌ Common Confusion: Assuming plain modulo hashing is enough — it reshuffles most keys whenever a node is added or removed

✅ How to Understand: Great for sharding, CDN caches, and partitioning systems
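
A minimal hash ring sketch with virtual nodes (node names and the vnode count are illustrative; each key is owned by the first node clockwise from its hash):

```python
import bisect
import hashlib

class HashRing:
    """Consistent hash ring with virtual nodes (a minimal sketch)."""

    def __init__(self, nodes, vnodes: int = 100):
        self.ring = []  # sorted list of (hash, node) points on the ring
        for node in nodes:
            for i in range(vnodes):
                self.ring.append((self._hash(f"{node}#{i}"), node))
        self.ring.sort()

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash (wrapping around).
        idx = bisect.bisect(self.ring, (self._hash(key), "")) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["node-a", "node-b", "node-c"])
owner = ring.node_for("user:42")  # stable: same key maps to the same node
```

When a node joins or leaves, only the keys on its arcs of the ring move — roughly 1/N of the data instead of nearly all of it.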


Cold Start Problem

What It Means: Initial delay before app or function is ready

❌ Common Confusion: Assuming it only affects serverless — auto-scaling systems hit it too when new instances spin up

✅ How to Understand: Pre-warm containers, use provisioned concurrency


🗄️ Caching Systems

CDN (Content Delivery Network)

What It Means: Caches static content closer to the user

❌ Common Confusion: Thinking CDNs work for all kinds of data

✅ How to Understand: Works best for static content like images, CSS, JS


Cache (e.g., Redis)

What It Means: In-memory store to reduce DB hits

❌ Common Confusion: Not knowing when or what to cache

✅ How to Understand: Cache frequent reads, slow queries, or expensive computations
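
The usual pattern here is cache-aside: check the cache first, fall back to the database on a miss, then populate the cache. A sketch with hypothetical in-memory stand-ins for `db` and `cache`:

```python
# Hypothetical stand-ins: in a real system `cache` would be Redis and
# `db` a database client, with a TTL on cached entries.
db = {"user:1": {"name": "Ada"}}
cache = {}

def get_user(key: str):
    if key in cache:
        return cache[key]        # cache hit: no DB round trip
    value = db.get(key)          # cache miss: read from the database
    if value is not None:
        cache[key] = value       # populate the cache for subsequent reads
    return value

user = get_user("user:1")        # miss, then cached
user_again = get_user("user:1")  # hit
```

In production you'd also set a TTL and think about invalidation on writes — stale cache entries are the classic failure mode of this pattern.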


Write Amplification

What It Means: Extra writes due to replication/indexing

❌ Common Confusion: Not considering performance impact

✅ How to Understand: Minimize by batching writes, avoiding too many indexes


Backpressure

What It Means: Controlling data flow to prevent overload

❌ Common Confusion: Ignoring it in stream processing

✅ How to Understand: Use buffering, retries, or discarding strategies
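
The simplest form of backpressure is a bounded buffer: when consumers fall behind, producers block or fail fast instead of letting memory grow without limit. A sketch using Python's standard `queue` module:

```python
import queue

# A bounded queue applies backpressure: once full, producers must wait,
# retry, or drop — the buffer cannot grow unbounded.
buf = queue.Queue(maxsize=2)

buf.put("event-1")
buf.put("event-2")
try:
    buf.put("event-3", block=False)  # queue full: fail fast instead of buffering
    overflowed = False
except queue.Full:
    overflowed = True                # producer now decides: retry, shed, or slow down
```

Blocking (`block=True`), rejecting (`block=False`), and discarding oldest items are the three strategies the entry above alludes to.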


📨 Messaging & Communication

Queue (e.g., Kafka, SQS)

What It Means: Stores and processes tasks asynchronously

❌ Common Confusion: Confusing with cache

✅ How to Understand: Cache = fast reads; Queue = async processing for load decoupling


Pub/Sub

What It Means: Publishers send messages; subscribers receive them

❌ Common Confusion: Assuming it's always real-time

✅ How to Understand: It's eventual, but great for decoupling services


Session Management

What It Means: Managing user state across requests

❌ Common Confusion: Confusing cookies, tokens, sticky sessions

✅ How to Understand: Use JWT + stateless sessions for scale; Redis/session store for short-term login info


🔍 Monitoring & Reliability

Heartbeat & Health Checks

What It Means: Used to detect if services are up and running

❌ Common Confusion: Over-engineering them

✅ How to Understand: Lightweight checks are often enough — don't make them a bottleneck


Circuit Breaker

What It Means: Stops calling a failing service temporarily

❌ Common Confusion: Assuming retries solve all failures

✅ How to Understand: Circuit breaker avoids cascading failures by "tripping" open
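
A minimal sketch of the tripping behavior (real implementations add a half-open state that probes the service after a cooldown; the threshold here is illustrative):

```python
class CircuitBreaker:
    """Minimal circuit breaker: trips open after `threshold` consecutive failures."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.failures = 0
        self.open = False

    def call(self, fn):
        if self.open:
            raise RuntimeError("circuit open: failing fast")  # no call to the service
        try:
            result = fn()
            self.failures = 0          # any success resets the counter
            return result
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.open = True       # stop hammering the failing service
            raise

def flaky():
    raise ConnectionError("service down")

breaker = CircuitBreaker(threshold=2)
for _ in range(2):
    try:
        breaker.call(flaky)
    except ConnectionError:
        pass                           # two failures: the breaker trips open
```

Once open, callers fail immediately instead of piling up timed-out requests — that's what prevents the cascade.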


Latency vs Throughput

What It Means: Latency = delay; Throughput = amount processed

❌ Common Confusion: Using them interchangeably

✅ How to Understand: Optimize latency for user experience; throughput for batch processing


Latency Budget

What It Means: Max allowed delay per system/component

❌ Common Confusion: Not distributing time wisely

✅ How to Understand: Divide latency among tiers (e.g., LB = 10ms, App = 50ms, DB = 30ms)
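
The arithmetic is simple but worth making explicit — per-tier budgets must sum to less than the end-to-end target, with headroom for network hops and serialization (tier names and numbers below mirror the illustrative example above):

```python
# Illustrative per-tier latency budgets, in milliseconds.
budget_ms = {"load_balancer": 10, "app": 50, "db": 30}
target_ms = 100  # end-to-end latency target

total = sum(budget_ms.values())
headroom = target_ms - total  # left over for network, serialization, etc.
```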



🎯 Quick Reference Guide

When to Use What?

| Scenario | Best Choice | Why |
| --- | --- | --- |
| High read traffic | Read replicas + Cache | Distribute load, reduce DB hits |
| High write traffic | Sharding + Queue | Spread writes, async processing |
| Global users | CDN + Regional DBs | Reduce latency worldwide |
| Microservices | API Gateway + Service Discovery | Centralized routing, dynamic scaling |
| Real-time features | WebSockets + Pub/Sub | Instant communication |
| Financial transactions | ACID DB + Idempotency | Data integrity, safe retries |
| Analytics workload | Data warehouse + Batch processing | Optimized for complex queries |
| Mobile apps | REST API + CDN | Simple, cacheable, fast |

Common Anti-Patterns to Avoid

  • Premature optimization - Don't over-engineer from day one
  • Distributed monolith - Microservices that are tightly coupled
  • Cache everything - Only cache what's actually accessed frequently
  • Single point of failure - Always have backups and redundancy
  • Ignoring monitoring - You can't fix what you can't measure
  • Not testing failure scenarios - Chaos engineering is your friend
  • Synchronous everything - Use async processing where possible

Key Principles to Remember

  1. Start simple, scale gradually - Begin with a monolith, split when needed
  2. Design for failure - Everything will fail eventually
  3. Monitor everything - Metrics, logs, and traces are essential
  4. Automate deployments - Manual processes don't scale
  5. Choose consistency model wisely - Not everything needs strong consistency
  6. Cache strategically - Cache hot data, not everything
  7. Design for your actual use case - Don't copy someone else's architecture blindly