System design starts with understanding what properties a distributed system must balance. There is rarely a single "correct" design — only trade-offs that depend on requirements.
Key Properties
- Availability — The fraction of time the system is operational. Measured as "nines": 99.9% ≈ 8.7 hrs downtime/year; 99.99% ≈ 52 min/year.
- Reliability — The system performs its intended function correctly over time. A system can be available but unreliable (returns wrong results).
- Scalability — Ability to handle growing load — more users, requests, or data — without degrading performance.
- Maintainability — How easy it is for engineers to operate, evolve, and fix the system. Often the most costly long-term factor.
- Fault Tolerance — The system continues to operate correctly despite partial failures. Achieved via redundancy, replication, and graceful degradation.
- Consistency — Every read receives the most recent write or an error. Comes in strong, eventual, and causal varieties.
Back-of-the-Envelope Estimation
Always estimate scale before designing. Key numbers every engineer should know:
| Operation | Approx. Latency |
|---|---|
| L1 cache reference | 0.5 ns |
| L2 cache reference | 7 ns |
| RAM access | 100 ns |
| SSD random read | 150 µs |
| HDD seek | 10 ms |
| Datacenter RTT | 500 µs |
| Cross-continent RTT | 150 ms |
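As a worked example of the estimation step, here is a quick sizing calculation in Python for a hypothetical service; every input number (DAU, requests per user, payload size) is an assumption chosen for illustration:

```python
# Back-of-the-envelope sizing for a hypothetical service with
# 10M daily active users, each making 20 requests/day.
DAU = 10_000_000
REQUESTS_PER_USER = 20
SECONDS_PER_DAY = 86_400
PAYLOAD_BYTES = 1_000          # assumed average response size

avg_qps = DAU * REQUESTS_PER_USER / SECONDS_PER_DAY
peak_qps = avg_qps * 3         # rule of thumb: peak is 2-3x average
storage_per_day_gb = DAU * REQUESTS_PER_USER * PAYLOAD_BYTES / 1e9

print(f"avg QPS: {avg_qps:,.0f}")                   # ~2,315
print(f"peak QPS: {peak_qps:,.0f}")                 # ~6,944
print(f"storage/day: {storage_per_day_gb:.0f} GB")  # 200 GB
```

Interviewers care about the order of magnitude, not the exact figure; round aggressively.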
Horizontal vs Vertical Scaling
| Dimension | Vertical (Scale Up) | Horizontal (Scale Out) |
|---|---|---|
| Method | Bigger machine (more CPU/RAM) | More machines |
| Complexity | Low — single node | High — distributed state |
| Limit | Hardware ceiling exists | Nearly unlimited |
| Downtime | Requires restart | Rolling updates possible |
| Cost | Expensive at high-end | Commodity hardware |
Load Balancing
A load balancer distributes incoming requests across a pool of servers. It is the gateway to horizontal scaling.
- Round Robin — Requests distributed evenly in turn. Simple but ignores server capacity differences.
- Least Connections — Routes to the server with the fewest active connections. Better for long-lived connections.
- IP Hash — Client IP determines the server. Useful for session stickiness without shared state.
- Consistent Hashing — Maps keys to a ring. Adding/removing nodes only remaps a small fraction of keys. Essential for distributed caches and DBs.
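A minimal sketch of the consistent-hashing ring described above. The hash function (MD5), vnode count, and node names are arbitrary choices for the demo, not a reference implementation:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Hash ring with virtual nodes: each physical node owns many
    points on the ring, which evens out the key distribution."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash, node)
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node: str):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node: str):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def get(self, key: str) -> str:
        # walk clockwise to the first vnode at or after the key's hash
        idx = bisect.bisect(self._ring, (self._hash(key), "")) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
before = {k: ring.get(k) for k in map(str, range(1000))}
ring.remove("cache-b")
moved = sum(before[k] != ring.get(k) for k in before)
print(moved)  # roughly a third of the 1000 keys remap, not all of them
```

Only keys that lived on the removed node move; everything else keeps its placement, which is exactly the property a naive `hash(key) % N` scheme lacks.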
Rate Limiting
Protects services from abuse and runaway clients. Common algorithms:
- Token Bucket — tokens added at constant rate, consumed per request. Allows bursts.
- Leaky Bucket — requests processed at fixed outflow rate. Smooths traffic, rejects overflow.
- Fixed Window Counter — count per time window. Simple but has boundary burst issue.
- Sliding Window Log — timestamps stored; precise but memory-intensive.
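The token bucket can be sketched in a few lines. This is an illustrative single-process version; a distributed limiter would keep the bucket state in shared storage such as Redis:

```python
import time

class TokenBucket:
    """Token-bucket rate limiter: refills at `rate` tokens/sec up to
    `capacity`; each request consumes one token, so bursts up to
    `capacity` are allowed."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=5, capacity=10)
burst = [bucket.allow() for _ in range(15)]
print(burst.count(True))  # the initial burst of ~10 passes; the rest are limited
```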
SQL vs NoSQL
| Attribute | SQL (Relational) | NoSQL |
|---|---|---|
| Schema | Fixed, typed columns | Flexible / schemaless |
| ACID | Full ACID support | Usually eventual consistency |
| Scaling | Harder to scale out | Designed for horizontal scale |
| Joins | Native, performant | Avoided; denormalise instead |
| Best For | Complex queries, financial data | High-throughput, flexible schema |
| Examples | PostgreSQL, MySQL, SQLite | MongoDB, Cassandra, DynamoDB, Redis |
ACID Properties
- Atomicity — A transaction is all-or-nothing. If any step fails, the whole transaction rolls back.
- Consistency — A transaction brings the database from one valid state to another; constraints are never violated.
- Isolation — Concurrent transactions produce the same result as if they ran sequentially.
- Durability — Once committed, data survives crashes. Achieved via a WAL (Write-Ahead Log).
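A small demonstration of atomicity using Python's built-in sqlite3 (SQLite is fully ACID, per the SQL vs NoSQL table above): a transfer that violates a constraint rolls back both writes, not just the failing one. Table and account names are made up for the demo:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE accounts ("
           "name TEXT PRIMARY KEY, "
           "balance INTEGER CHECK (balance >= 0))")
db.executemany("INSERT INTO accounts VALUES (?, ?)",
               [("alice", 100), ("bob", 50)])
db.commit()

try:
    with db:  # opens a transaction; commits on success, rolls back on exception
        db.execute("UPDATE accounts SET balance = balance - 200 WHERE name = 'alice'")
        db.execute("UPDATE accounts SET balance = balance + 200 WHERE name = 'bob'")
except sqlite3.IntegrityError:
    pass  # CHECK constraint violated, so the whole transaction rolled back

balances = dict(db.execute("SELECT name, balance FROM accounts"))
print(balances)  # {'alice': 100, 'bob': 50}: neither write survived
```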
Indexing
Indexes trade write speed and storage for fast reads. Without an index, every query is a full table scan O(n).
- B-Tree Index — default in most RDBMS; ideal for range queries and equality
- Hash Index — O(1) exact lookups; no range support
- Composite Index — index on multiple columns; follow left-prefix rule
- Covering Index — index contains all needed columns; query never touches table
- Full-Text Index — tokenised text search (e.g., Elasticsearch)
Replication & Sharding
- Primary-Replica Replication — One primary handles writes; replicas serve reads. Increases read throughput. Failover: promote a replica.
- Multi-Primary Replication — Multiple nodes accept writes. Allows geographic distribution. Conflict resolution required.
- Horizontal Sharding — Data partitioned across multiple DB instances (e.g., users 1–1M on shard A, 1M–2M on shard B). Enables massive scale; cross-shard queries are expensive.
- Vertical Partitioning — Different tables on different databases. Simpler, but limited by table size on each node.
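A toy hash-based router for the horizontal sharding described above; the shard names and shard count are made up for the example. Note that plain modulo routing remaps most keys whenever the shard count changes, which is why consistent hashing is often used instead:

```python
import hashlib

SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]

def shard_for(user_id: int) -> str:
    """Deterministically map a user id to a shard."""
    # hash the id rather than using it directly, so sequential
    # auto-increment ids spread evenly instead of clustering
    digest = hashlib.sha256(str(user_id).encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

print(shard_for(42))  # the same id always routes to the same shard
```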
Caching
Caching stores expensive computation results or frequently accessed data in fast memory, reducing latency and database load dramatically.
Cache Patterns
- Cache-Aside (Lazy Loading) — App checks cache → miss → reads DB → writes to cache. Cache only populated on demand. Most common pattern.
- Read-Through — Cache sits in front of DB. On miss, cache itself fetches from DB and stores result. Transparent to app.
- Write-Through — Every write goes to cache AND DB synchronously. No stale data; higher write latency.
- Write-Back (Write-Behind) — Writes go to cache only; flushed to DB asynchronously. Low latency writes, risk of data loss on crash.
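The cache-aside pattern can be sketched as follows. Here `db_get` is a stand-in for a real database call, and an in-process dict stands in for Redis/Memcached; both are assumptions for the demo:

```python
import time

cache: dict = {}   # key -> (value, cached_at); stands in for Redis
TTL = 60.0

def db_get(key):
    """Placeholder for an expensive database read."""
    return f"value-for-{key}"

def get(key):
    hit = cache.get(key)
    if hit is not None and time.monotonic() - hit[1] < TTL:
        return hit[0]                       # cache hit
    value = db_get(key)                     # miss: read the DB...
    cache[key] = (value, time.monotonic())  # ...and populate the cache
    return value

def put(key, value):
    # write to the DB (elided here), then invalidate the cached entry
    # rather than updating it, to avoid write-write races
    cache.pop(key, None)

print(get("user:1"))  # value-for-user:1
```

Invalidate-on-write (rather than update-on-write) is the usual companion to cache-aside because it sidesteps ordering races between concurrent writers.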
Eviction Policies
- LRU (Least Recently Used) — evicts item not accessed longest. Most commonly used.
- LFU (Least Frequently Used) — evicts item accessed fewest times. Better for skewed access patterns.
- FIFO — evicts oldest inserted item regardless of use.
- TTL (Time-To-Live) — entries expire after fixed duration.
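LRU, the most common of these policies, falls out naturally from Python's `OrderedDict`; a minimal sketch:

```python
from collections import OrderedDict

class LRUCache:
    """LRU eviction: entries move to the end on access, so the
    least recently used entry sits at the front and is evicted first."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used

lru = LRUCache(2)
lru.put("a", 1)
lru.put("b", 2)
lru.get("a")         # touching "a" makes "b" the least recently used
lru.put("c", 3)      # over capacity: "b" is evicted
print(lru.get("b"))  # None
```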
Cache Problems
| Problem | Description | Solution |
|---|---|---|
| Cache Stampede | Many requests miss at same time after expiry → DB overwhelmed | Mutex/lock, staggered TTLs, probabilistic early expiry |
| Cache Penetration | Requests for non-existent keys always miss → DB hit every time | Cache null results, Bloom filter at gateway |
| Cache Avalanche | Many keys expire at once → spike on DB | Jitter on TTL, persistent cache, circuit breaker |
| Hotspot / Hot Key | Single key gets massive traffic | Replicate hot key to multiple shards |
API Styles Compared
| Style | Protocol | Best For | Drawback |
|---|---|---|---|
| REST | HTTP/1.1+ | Public APIs, CRUD resources | Over/under-fetching |
| GraphQL | HTTP | Flexible frontends, mobile | Complex caching, N+1 queries |
| gRPC | HTTP/2 | Internal microservices, high throughput | Not browser-native, binary |
| WebSocket | TCP (WS) | Real-time: chat, games, feeds | Stateful, harder to scale |
| Webhooks | HTTP callback | Event-driven integrations | Delivery guarantees needed |
DNS Resolution Flow
Typical resolution path: browser cache → OS cache → recursive resolver → root nameserver → TLD nameserver → authoritative nameserver. Low DNS TTL = faster propagation; high TTL = better cache hit rate and reduced DNS load.
HTTP Status Codes to Know
| Code | Meaning | System Design Context |
|---|---|---|
| 200 OK | Success | Standard response |
| 201 Created | Resource created | POST endpoint |
| 301 / 302 | Redirect | URL shortener response |
| 400 | Bad request | Client-side validation failure |
| 401 / 403 | Unauth / Forbidden | Auth gate; 401 = missing, 403 = denied |
| 429 | Too Many Requests | Rate limiter response |
| 503 | Service Unavailable | Circuit breaker open / overload |
Message Queues
Asynchronous communication decouples producers from consumers, enabling resilience and independent scaling.
Why Queues?
- Decoupling — producer and consumer don't need to know about each other
- Load levelling — absorbs traffic bursts; consumer processes at its own pace
- Durability — messages persist even if consumer is temporarily down
- Fan-out — one event consumed by multiple downstream services
Key Technologies
- Apache Kafka — Distributed log. Topics with partitions. Immutable, replayable. Excellent for event sourcing, analytics pipelines, millions of events/sec.
- RabbitMQ — Traditional broker with exchanges and queues. Supports complex routing (topic, direct, fanout). Better for task queues and RPC.
- AWS SQS — Managed queue service. Two types: Standard (at-least-once) and FIFO (exactly-once, ordered). Scales automatically.
- Redis Pub/Sub — In-memory, real-time. Messages not persisted — fire-and-forget. Good for notifications, live scores.
Delivery Semantics
| Semantic | Description | Risk |
|---|---|---|
| At-most-once | Message sent once, may be lost | Data loss |
| At-least-once | Message retried until ack; may duplicate | Duplicate processing |
| Exactly-once | No loss, no duplicate | Expensive; requires idempotency + 2PC |
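With at-least-once delivery, the standard defence against duplicates is an idempotent consumer. A minimal in-memory sketch; a real system would persist the processed-id set (e.g. in Redis or the database) atomically with the side effect:

```python
processed = set()  # message ids already handled; stands in for durable storage
balance = 0

def handle(message: dict) -> bool:
    """Apply a credit exactly once per message id."""
    global balance
    if message["id"] in processed:
        return False               # duplicate: acknowledge but skip
    balance += message["amount"]   # the side effect
    processed.add(message["id"])
    return True

# the broker redelivers msg-1 because the first ack was lost
for msg in [{"id": "msg-1", "amount": 10},
            {"id": "msg-1", "amount": 10},
            {"id": "msg-2", "amount": 5}]:
    handle(msg)

print(balance)  # 15, not 25: the duplicate was a no-op
```

Deduplication by message id is what lets at-least-once transport behave like exactly-once processing without two-phase commit.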
CAP Theorem
A distributed system can guarantee at most two of the following three properties simultaneously:
- Consistency — Every read returns the most recent write or an error. All nodes see the same data at the same time.
- Availability — Every request receives a response (not necessarily the most recent data). The system never rejects requests.
- Partition Tolerance — The system operates despite network partitions (dropped messages between nodes). Partitions are unavoidable in practice, so the real choice is between consistency and availability.
Consistency Models
| Model | Guarantee | Example |
|---|---|---|
| Strong Consistency | Read always returns latest write (linearisability) | Spanner, etcd |
| Sequential Consistency | Operations in some global sequential order | Zookeeper |
| Causal Consistency | Causally related operations seen in order | MongoDB sessions |
| Eventual Consistency | Replicas converge given no new updates | DynamoDB, Cassandra |
| Read-your-writes | A user always sees their own writes | Sticky sessions, read-from-primary |
Consensus Algorithms
How distributed nodes agree on a single value despite failures:
- Paxos — Foundational consensus algorithm. Notoriously hard to implement correctly. Used in Google Spanner, Chubby.
- Raft — Designed for understandability. Leader election + log replication. Used in etcd, CockroachDB, TiKV.
- PBFT — Byzantine Fault Tolerance; used in blockchain-adjacent systems where nodes may act maliciously.
Distributed Patterns
- Circuit Breaker — Prevents calls to a failing service. States: Closed → Open (on failure threshold) → Half-Open (probe). Popularised by Netflix Hystrix.
- Saga — Manages distributed transactions without 2PC. Sequence of local transactions with compensating actions on failure.
- Sidecar — Deploy a helper process alongside each service (e.g., Envoy proxy). Handles logging, auth, tracing without changing app code.
- Service Mesh — Network of sidecars (Istio, Linkerd). Provides mTLS, retries, observability between microservices automatically.
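A bare-bones sketch of the circuit-breaker state machine; the threshold and reset timeout are arbitrary, and real implementations (Hystrix, resilience4j) add rolling failure-rate windows and metrics:

```python
import time

class CircuitBreaker:
    """Closed -> Open after `threshold` consecutive failures;
    Open -> Half-Open after `reset_after` seconds; a successful
    probe call re-closes the circuit."""

    def __init__(self, threshold=3, reset_after=30.0):
        self.threshold = threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, *args):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            # enough time has passed: half-open, let one probe through
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()  # trip the breaker
            raise
        self.failures = 0
        self.opened_at = None  # success closes the circuit
        return result

def flaky():
    raise TimeoutError("upstream down")

breaker = CircuitBreaker(threshold=3, reset_after=30.0)
for _ in range(3):
    try:
        breaker.call(flaky)
    except TimeoutError:
        pass
# further calls now fail fast without touching the broken service
```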
Storage Types
| Type | Description | Example | Use Case |
|---|---|---|---|
| Block Storage | Raw volumes; OS sees as disk | AWS EBS, GCP Persistent Disk | Databases, OS volumes |
| File Storage | Hierarchical filesystem | AWS EFS, NFS | Shared filesystems, home dirs |
| Object Storage | Flat key-value for blobs | AWS S3, GCS | Media, backups, static assets |
| In-Memory | RAM; fastest, volatile | Redis, Memcached | Caching, sessions, queues |
Content Delivery Networks (CDN)
A CDN is a globally distributed network of edge servers that cache and serve content close to users, reducing latency and origin load.
- Push CDN — you proactively upload content to CDN. Good for static, predictable content.
- Pull CDN — CDN fetches from origin on first request, caches. Simpler but cold misses hit origin.
- Use for: Images, video, JS/CSS, static HTML. Not suitable for highly personalised or frequently changing data.
Interview Framework
Use this structured framework for any system design question. Allocate your 45 minutes carefully.
Step-by-Step Process
| # | Step | Time | Key Questions |
|---|---|---|---|
| 1 | Clarify Requirements | 5 min | Read vs write ratio? DAU? Latency SLA? Consistency needs? Global? |
| 2 | Capacity Estimation | 5 min | QPS, storage/day, bandwidth, peak load × 2–3 |
| 3 | High-Level Design | 10 min | Draw boxes: clients, LB, servers, DB, cache, queue, CDN |
| 4 | Deep Dive | 15 min | Most complex component — schema, API, data flow |
| 5 | Bottlenecks & Trade-offs | 10 min | Single points of failure, hot spots, consistency vs availability |
Standard Component Checklist
- ☐ DNS + CDN for static assets
- ☐ Load balancer (L4 or L7)
- ☐ Stateless web servers (horizontally scalable)
- ☐ Primary + read replicas for DB
- ☐ Cache layer (Redis/Memcached)
- ☐ Async processing via message queue
- ☐ Object store for media/blobs (S3)
- ☐ Search index if full-text needed (Elasticsearch)
- ☐ Monitoring + alerting (Prometheus + Grafana)
- ☐ Auth service (JWT / OAuth 2.0)
URL Shortener (like Bitly)
Characteristics: read-heavy, low latency.
- Key decision: How to generate short codes? Base62 of auto-increment ID, or random hash with collision check
- Storage: SQL or KV store (key = short code, value = long URL)
- Scaling reads: Cache frequently accessed short codes in Redis
- Redirect: HTTP 301 (permanent, browser caches) vs 302 (temporary, server can track clicks)
- DB schema: id, short_code, long_url, user_id, created_at, expiry
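The Base62-of-auto-increment option can be sketched directly; 62^7 is roughly 3.5 trillion codes at only 7 characters:

```python
import string

# 0-9, a-z, A-Z: 62 symbols
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase

def encode(n: int) -> str:
    """Base62-encode a non-negative auto-increment id into a short code."""
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, rem = divmod(n, 62)
        out.append(ALPHABET[rem])
    return "".join(reversed(out))

def decode(code: str) -> int:
    """Invert encode() to recover the database id from a short code."""
    n = 0
    for ch in code:
        n = n * 62 + ALPHABET.index(ch)
    return n

print(encode(125))  # "21"
```

Sequential ids leak creation order and volume; adding a random offset or bijective scrambling of the id before encoding is a common mitigation.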
Twitter / Social Feed
Characteristics: write-heavy, fan-out problem.
- Fan-out on write: When user tweets, push to all followers' timelines immediately. Fast reads, slow writes. Bad for celebrities (10M followers).
- Fan-out on read (pull): On timeline load, fetch tweets from all followed users. Always fresh; heavy on reads.
- Hybrid: Push for regular users; pull for celebrities. Twitter uses this.
- Storage: Tweets in DB + timeline feeds in Redis (sorted sets by timestamp)
WhatsApp / Chat System
Characteristics: real-time, high availability.
- Connection: WebSocket for persistent bidirectional connection per client
- Message flow: Client A → Chat Server → Message Queue → Chat Server → Client B
- Offline users: Messages stored in DB; push notification via APNs/FCM; delivered on reconnect
- Message ordering: Logical timestamps / sequence numbers per conversation
- Read receipts: Separate lightweight event sent back through same path
YouTube / Video Platform
Characteristics: storage-intensive, CDN-dependent.
- Upload: Client → Upload Service → Raw Storage (S3) → Transcoding Queue (Kafka) → Transcoder Workers → Multiple resolution outputs → CDN
- Transcoding: Convert to multiple resolutions (360p, 720p, 1080p, 4K) and codecs (H.264, VP9)
- Streaming: Adaptive bitrate (HLS/DASH) — player selects quality based on bandwidth
- Metadata: PostgreSQL for video info; Elasticsearch for search
Ride-Sharing (Uber-like)
Characteristics: geospatial, real-time matching.
- Location updates: Drivers send GPS every 4–5s → Location Service → Redis Geo (sorted set with geohash)
- Geohash: Encode lat/lng as a short string; nearby points share prefixes, so a nearby search becomes a string prefix search
- Matching: On ride request, find nearest N drivers, send offer to closest available
- Trip events: Kafka for ride requested → matched → started → completed
References & Resources
Designing Data-Intensive Applications
Martin Kleppmann — O'Reilly, 2017. The essential reference. Deep dives on storage engines, replication, consistency.
System Design Interview Vol. 1 & 2
Alex Xu. Interview-focused, covers 30+ real systems with step-by-step breakdown and diagrams.
ByteByteGo
Alex Xu's channel. Excellent animated explainers. Free on YouTube, newsletter at bytebytego.com.
youtube.com/@ByteByteGo

Gaurav Sen
Deep conceptual explanations of consistent hashing, distributed systems, microservices.
youtube.com/@gkcs

Hussein Nasser
Backend engineering, database internals, proxy servers. Very practical and hands-on.
youtube.com/@hnasr

High Scalability Blog
Real-world architecture breakdowns of how companies scaled their systems.
highscalability.com

Grokking System Design
Educative.io. Structured course covering 20+ systems with interactive diagrams.
educative.io

System Design Primer
Donne Martin's massive open-source repo. 270k+ stars. Covers everything with diagrams.
github.com/donnemartin

Google Bigtable (2006)
Foundational NoSQL paper describing the storage system behind Google's web indexing.
research.google

Amazon Dynamo (2007)
Seminal paper on highly available key-value storage. Introduced consistent hashing + vector clocks to mainstream.
allthingsdistributed.com

Raft Consensus Algorithm
Understandable distributed consensus. Diego Ongaro & John Ousterhout, 2014.
raft.github.io

Engineering Blogs
Netflix Tech Blog, Uber Engineering, Discord Blog, Airbnb Engineering — real architecture decisions explained.
netflixtechblog.com