System design starts with understanding what properties a distributed system must balance. There is rarely a single "correct" design — only trade-offs that depend on requirements.
Key Properties
- Availability — The fraction of time the system is operational. Measured as "nines": 99.9% ≈ 8.7 hrs downtime/year; 99.99% ≈ 52 min/year.
- Reliability — The system performs its intended function correctly over time. A system can be available but unreliable (returns wrong results).
- Scalability — Ability to handle growing load — more users, requests, or data — without degrading performance.
- Maintainability — How easy it is for engineers to operate, evolve, and fix the system. Often the most costly long-term factor.
- Fault Tolerance — The system continues to operate correctly despite partial failures. Achieved via redundancy, replication, and graceful degradation.
- Consistency — Every read receives the most recent write or an error. Comes in strong, eventual, and causal varieties.
Back-of-the-Envelope Estimation
Always estimate scale before designing. Key numbers every engineer should know:
| Operation | Approx. Latency |
|---|---|
| L1 cache reference | 0.5 ns |
| L2 cache reference | 7 ns |
| RAM access | 100 ns |
| SSD random read | 150 µs |
| HDD seek | 10 ms |
| Datacenter RTT | 500 µs |
| Cross-continent RTT | 150 ms |
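As a worked example of the estimation step, here is a quick sizing calculation in Python for a hypothetical service; every input number (DAU, requests per user, payload size) is an assumption chosen for illustration:

```python
# Back-of-the-envelope sizing for a hypothetical service with
# 10M daily active users, each making 20 requests/day.
DAU = 10_000_000
REQUESTS_PER_USER = 20
SECONDS_PER_DAY = 86_400
PAYLOAD_BYTES = 1_000          # assumed average response size

avg_qps = DAU * REQUESTS_PER_USER / SECONDS_PER_DAY
peak_qps = avg_qps * 3         # rule of thumb: peak is 2-3x average
storage_per_day_gb = DAU * REQUESTS_PER_USER * PAYLOAD_BYTES / 1e9

print(f"avg QPS: {avg_qps:,.0f}")                   # ~2,315
print(f"peak QPS: {peak_qps:,.0f}")                 # ~6,944
print(f"storage/day: {storage_per_day_gb:.0f} GB")  # 200 GB
```

Interviewers care about the order of magnitude, not the exact figure; round aggressively.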
Horizontal vs Vertical Scaling
| Dimension | Vertical (Scale Up) | Horizontal (Scale Out) |
|---|---|---|
| Method | Bigger machine (more CPU/RAM) | More machines |
| Complexity | Low — single node | High — distributed state |
| Limit | Hardware ceiling exists | Nearly unlimited |
| Downtime | Requires restart | Rolling updates possible |
| Cost | Expensive at high-end | Commodity hardware |
Load Balancing
A load balancer distributes incoming requests across a pool of servers. It is the gateway to horizontal scaling.
- Round Robin — Requests distributed evenly in turn. Simple but ignores server capacity differences.
- Least Connections — Routes to the server with the fewest active connections. Better for long-lived connections.
- IP Hash — Client IP determines the server. Useful for session stickiness without shared state.
- Consistent Hashing — Maps keys to a ring. Adding/removing nodes only remaps a small fraction of keys. Essential for distributed caches and DBs.
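A minimal sketch of the consistent-hashing ring described above. The hash function (MD5), vnode count, and node names are arbitrary choices for the demo, not a reference implementation:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Hash ring with virtual nodes: each physical node owns many
    points on the ring, which evens out the key distribution."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash, node)
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node: str):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node: str):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def get(self, key: str) -> str:
        # walk clockwise to the first vnode at or after the key's hash
        idx = bisect.bisect(self._ring, (self._hash(key), "")) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
before = {k: ring.get(k) for k in map(str, range(1000))}
ring.remove("cache-b")
moved = sum(before[k] != ring.get(k) for k in before)
print(moved)  # roughly a third of the 1000 keys remap, not all of them
```

Only keys that lived on the removed node move; everything else keeps its placement, which is exactly the property a naive `hash(key) % N` scheme lacks.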
Rate Limiting
Protects services from abuse and runaway clients. Common algorithms:
- Token Bucket — tokens added at constant rate, consumed per request. Allows bursts.
- Leaky Bucket — requests processed at fixed outflow rate. Smooths traffic, rejects overflow.
- Fixed Window Counter — count per time window. Simple but has boundary burst issue.
- Sliding Window Log — timestamps stored; precise but memory-intensive.
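The token bucket can be sketched in a few lines. This is an illustrative single-process version; a distributed limiter would keep the bucket state in shared storage such as Redis:

```python
import time

class TokenBucket:
    """Token-bucket rate limiter: refills at `rate` tokens/sec up to
    `capacity`; each request consumes one token, so bursts up to
    `capacity` are allowed."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=5, capacity=10)
burst = [bucket.allow() for _ in range(15)]
print(burst.count(True))  # the initial burst of ~10 passes; the rest are limited
```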
SQL vs NoSQL
| Attribute | SQL (Relational) | NoSQL |
|---|---|---|
| Schema | Fixed, typed columns | Flexible / schemaless |
| ACID | Full ACID support | Usually eventual consistency |
| Scaling | Harder to scale out | Designed for horizontal scale |
| Joins | Native, performant | Avoided; denormalise instead |
| Best For | Complex queries, financial data | High-throughput, flexible schema |
| Examples | PostgreSQL, MySQL, SQLite | MongoDB, Cassandra, DynamoDB, Redis |
ACID Properties
- Atomicity — A transaction is all-or-nothing. If any step fails, the whole transaction rolls back.
- Consistency — A transaction brings the database from one valid state to another; constraints are never violated.
- Isolation — Concurrent transactions produce the same result as if they ran sequentially.
- Durability — Once committed, data survives crashes. Achieved via a WAL (Write-Ahead Log).
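A small demonstration of atomicity using Python's built-in sqlite3 (SQLite is fully ACID, per the SQL vs NoSQL table above): a transfer that violates a constraint rolls back both writes, not just the failing one. Table and account names are made up for the demo:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE accounts ("
           "name TEXT PRIMARY KEY, "
           "balance INTEGER CHECK (balance >= 0))")
db.executemany("INSERT INTO accounts VALUES (?, ?)",
               [("alice", 100), ("bob", 50)])
db.commit()

try:
    with db:  # opens a transaction; commits on success, rolls back on exception
        db.execute("UPDATE accounts SET balance = balance - 200 WHERE name = 'alice'")
        db.execute("UPDATE accounts SET balance = balance + 200 WHERE name = 'bob'")
except sqlite3.IntegrityError:
    pass  # CHECK constraint violated, so the whole transaction rolled back

balances = dict(db.execute("SELECT name, balance FROM accounts"))
print(balances)  # {'alice': 100, 'bob': 50}: neither write survived
```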
Indexing
Indexes trade write speed and storage for fast reads. Without an index, every query is a full table scan O(n).
- B-Tree Index — default in most RDBMS; ideal for range queries and equality
- Hash Index — O(1) exact lookups; no range support
- Composite Index — index on multiple columns; follow left-prefix rule
- Covering Index — index contains all needed columns; query never touches table
- Full-Text Index — tokenised text search (e.g., Elasticsearch)
Replication & Sharding
- Primary-Replica Replication — One primary handles writes; replicas serve reads. Increases read throughput. Failover: promote a replica.
- Multi-Primary Replication — Multiple nodes accept writes. Allows geographic distribution. Conflict resolution required.
- Horizontal Sharding — Data partitioned across multiple DB instances (e.g., users 1–1M on shard A, 1M–2M on shard B). Enables massive scale; cross-shard queries are expensive.
- Vertical Partitioning — Different tables on different databases. Simpler, but limited by table size on each node.
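A toy hash-based router for the horizontal sharding described above; the shard names and shard count are made up for the example. Note that plain modulo routing remaps most keys whenever the shard count changes, which is why consistent hashing is often used instead:

```python
import hashlib

SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]

def shard_for(user_id: int) -> str:
    """Deterministically map a user id to a shard."""
    # hash the id rather than using it directly, so sequential
    # auto-increment ids spread evenly instead of clustering
    digest = hashlib.sha256(str(user_id).encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

print(shard_for(42))  # the same id always routes to the same shard
```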
Caching
Caching stores expensive computation results or frequently accessed data in fast memory, reducing latency and database load dramatically.
Cache Patterns
- Cache-Aside (Lazy Loading) — App checks cache → miss → reads DB → writes to cache. Cache only populated on demand. Most common pattern.
- Read-Through — Cache sits in front of DB. On miss, cache itself fetches from DB and stores result. Transparent to app.
- Write-Through — Every write goes to cache AND DB synchronously. No stale data; higher write latency.
- Write-Back (Write-Behind) — Writes go to cache only; flushed to DB asynchronously. Low latency writes, risk of data loss on crash.
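The cache-aside pattern can be sketched as follows. Here `db_get` is a stand-in for a real database call, and an in-process dict stands in for Redis/Memcached; both are assumptions for the demo:

```python
import time

cache: dict = {}   # key -> (value, cached_at); stands in for Redis
TTL = 60.0

def db_get(key):
    """Placeholder for an expensive database read."""
    return f"value-for-{key}"

def get(key):
    hit = cache.get(key)
    if hit is not None and time.monotonic() - hit[1] < TTL:
        return hit[0]                       # cache hit
    value = db_get(key)                     # miss: read the DB...
    cache[key] = (value, time.monotonic())  # ...and populate the cache
    return value

def put(key, value):
    # write to the DB (elided here), then invalidate the cached entry
    # rather than updating it, to avoid write-write races
    cache.pop(key, None)

print(get("user:1"))  # value-for-user:1
```

Invalidate-on-write (rather than update-on-write) is the usual companion to cache-aside because it sidesteps ordering races between concurrent writers.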
Eviction Policies
- LRU (Least Recently Used) — evicts item not accessed longest. Most commonly used.
- LFU (Least Frequently Used) — evicts item accessed fewest times. Better for skewed access patterns.
- FIFO — evicts oldest inserted item regardless of use.
- TTL (Time-To-Live) — entries expire after fixed duration.
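LRU, the most common of these policies, falls out naturally from Python's `OrderedDict`; a minimal sketch:

```python
from collections import OrderedDict

class LRUCache:
    """LRU eviction: entries move to the end on access, so the
    least recently used entry sits at the front and is evicted first."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used

lru = LRUCache(2)
lru.put("a", 1)
lru.put("b", 2)
lru.get("a")         # touching "a" makes "b" the least recently used
lru.put("c", 3)      # over capacity: "b" is evicted
print(lru.get("b"))  # None
```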
Cache Problems
| Problem | Description | Solution |
|---|---|---|
| Cache Stampede | Many requests miss at same time after expiry → DB overwhelmed | Mutex/lock, staggered TTLs, probabilistic early expiry |
| Cache Penetration | Requests for non-existent keys always miss → DB hit every time | Cache null results, Bloom filter at gateway |
| Cache Avalanche | Many keys expire at once → spike on DB | Jitter on TTL, persistent cache, circuit breaker |
| Hotspot / Hot Key | Single key gets massive traffic | Replicate hot key to multiple shards |
API Styles Compared
| Style | Protocol | Best For | Drawback |
|---|---|---|---|
| REST | HTTP/1.1+ | Public APIs, CRUD resources | Over/under-fetching |
| GraphQL | HTTP | Flexible frontends, mobile | Complex caching, N+1 queries |
| gRPC | HTTP/2 | Internal microservices, high throughput | Not browser-native, binary |
| WebSocket | TCP (WS) | Real-time: chat, games, feeds | Stateful, harder to scale |
| Webhooks | HTTP callback | Event-driven integrations | Delivery guarantees needed |
DNS Resolution Flow
Typical resolution path: browser cache → OS cache → recursive resolver → root nameserver → TLD nameserver → authoritative nameserver. Low DNS TTL = faster propagation; high TTL = better cache hit rate and reduced DNS load.
HTTP Status Codes to Know
| Code | Meaning | System Design Context |
|---|---|---|
| 200 OK | Success | Standard response |
| 201 Created | Resource created | POST endpoint |
| 301 / 302 | Redirect | URL shortener response |
| 400 | Bad request | Client-side validation failure |
| 401 / 403 | Unauth / Forbidden | Auth gate; 401 = missing, 403 = denied |
| 429 | Too Many Requests | Rate limiter response |
| 503 | Service Unavailable | Circuit breaker open / overload |
Message Queues
Asynchronous communication decouples producers from consumers, enabling resilience and independent scaling.
Why Queues?
- Decoupling — producer and consumer don't need to know about each other
- Load levelling — absorbs traffic bursts; consumer processes at its own pace
- Durability — messages persist even if consumer is temporarily down
- Fan-out — one event consumed by multiple downstream services
Key Technologies
- Apache Kafka — Distributed log. Topics with partitions. Immutable, replayable. Excellent for event sourcing, analytics pipelines, millions of events/sec.
- RabbitMQ — Traditional broker with exchanges and queues. Supports complex routing (topic, direct, fanout). Better for task queues and RPC.
- AWS SQS — Managed queue service. Two types: Standard (at-least-once) and FIFO (exactly-once, ordered). Scales automatically.
- Redis Pub/Sub — In-memory, real-time. Messages not persisted — fire-and-forget. Good for notifications, live scores.
Delivery Semantics
| Semantic | Description | Risk |
|---|---|---|
| At-most-once | Message sent once, may be lost | Data loss |
| At-least-once | Message retried until ack; may duplicate | Duplicate processing |
| Exactly-once | No loss, no duplicate | Expensive; requires idempotency + 2PC |
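With at-least-once delivery, the standard defence against duplicates is an idempotent consumer. A minimal in-memory sketch; a real system would persist the processed-id set (e.g. in Redis or the database) atomically with the side effect:

```python
processed = set()  # message ids already handled; stands in for durable storage
balance = 0

def handle(message: dict) -> bool:
    """Apply a credit exactly once per message id."""
    global balance
    if message["id"] in processed:
        return False               # duplicate: acknowledge but skip
    balance += message["amount"]   # the side effect
    processed.add(message["id"])
    return True

# the broker redelivers msg-1 because the first ack was lost
for msg in [{"id": "msg-1", "amount": 10},
            {"id": "msg-1", "amount": 10},
            {"id": "msg-2", "amount": 5}]:
    handle(msg)

print(balance)  # 15, not 25: the duplicate was a no-op
```

Deduplication by message id is what lets at-least-once transport behave like exactly-once processing without two-phase commit.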
CAP Theorem
A distributed system can guarantee at most two of the following three properties simultaneously:
- Consistency — Every read returns the most recent write or an error. All nodes see the same data at the same time.
- Availability — Every request receives a response (not necessarily the most recent data). The system never rejects requests.
- Partition Tolerance — The system operates despite network partitions (dropped messages between nodes). Partitions are unavoidable in practice, so the real choice is between consistency and availability.
Consistency Models
| Model | Guarantee | Example |
|---|---|---|
| Strong Consistency | Read always returns latest write (linearisability) | Spanner, etcd |
| Sequential Consistency | Operations in some global sequential order | Zookeeper |
| Causal Consistency | Causally related operations seen in order | MongoDB sessions |
| Eventual Consistency | Replicas converge given no new updates | DynamoDB, Cassandra |
| Read-your-writes | A user always sees their own writes | Sticky sessions, read-from-primary |
Consensus Algorithms
How distributed nodes agree on a single value despite failures:
- Paxos — Foundational consensus algorithm. Notoriously hard to implement correctly. Used in Google Spanner, Chubby.
- Raft — Designed for understandability. Leader election + log replication. Used in etcd, CockroachDB, TiKV.
- PBFT — Byzantine Fault Tolerance; used in blockchain-adjacent systems where nodes may act maliciously.
Distributed Patterns
- Circuit Breaker — Prevents calls to a failing service. States: Closed → Open (on failure threshold) → Half-Open (probe). Popularised by Netflix Hystrix.
- Saga — Manages distributed transactions without 2PC. Sequence of local transactions with compensating actions on failure.
- Sidecar — Deploy a helper process alongside each service (e.g., Envoy proxy). Handles logging, auth, tracing without changing app code.
- Service Mesh — Network of sidecars (Istio, Linkerd). Provides mTLS, retries, observability between microservices automatically.
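A bare-bones sketch of the circuit-breaker state machine; the threshold and reset timeout are arbitrary, and real implementations (Hystrix, resilience4j) add rolling failure-rate windows and metrics:

```python
import time

class CircuitBreaker:
    """Closed -> Open after `threshold` consecutive failures;
    Open -> Half-Open after `reset_after` seconds; a successful
    probe call re-closes the circuit."""

    def __init__(self, threshold=3, reset_after=30.0):
        self.threshold = threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, *args):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            # enough time has passed: half-open, let one probe through
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()  # trip the breaker
            raise
        self.failures = 0
        self.opened_at = None  # success closes the circuit
        return result

def flaky():
    raise TimeoutError("upstream down")

breaker = CircuitBreaker(threshold=3, reset_after=30.0)
for _ in range(3):
    try:
        breaker.call(flaky)
    except TimeoutError:
        pass
# further calls now fail fast without touching the broken service
```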
Storage Types
| Type | Description | Example | Use Case |
|---|---|---|---|
| Block Storage | Raw volumes; OS sees as disk | AWS EBS, GCP Persistent Disk | Databases, OS volumes |
| File Storage | Hierarchical filesystem | AWS EFS, NFS | Shared filesystems, home dirs |
| Object Storage | Flat key-value for blobs | AWS S3, GCS | Media, backups, static assets |
| In-Memory | RAM; fastest, volatile | Redis, Memcached | Caching, sessions, queues |
Content Delivery Networks (CDN)
A CDN is a globally distributed network of edge servers that cache and serve content close to users, reducing latency and origin load.
- Push CDN — you proactively upload content to CDN. Good for static, predictable content.
- Pull CDN — CDN fetches from origin on first request, caches. Simpler but cold misses hit origin.
- Use for: Images, video, JS/CSS, static HTML. Not suitable for highly personalised or frequently changing data.
Interview Framework
Use this structured framework for any system design question. Allocate your 45 minutes carefully.
Step-by-Step Process
| # | Step | Time | Key Questions |
|---|---|---|---|
| 1 | Clarify Requirements | 5 min | Read vs write ratio? DAU? Latency SLA? Consistency needs? Global? |
| 2 | Capacity Estimation | 5 min | QPS, storage/day, bandwidth, peak load × 2–3 |
| 3 | High-Level Design | 10 min | Draw boxes: clients, LB, servers, DB, cache, queue, CDN |
| 4 | Deep Dive | 15 min | Most complex component — schema, API, data flow |
| 5 | Bottlenecks & Trade-offs | 10 min | Single points of failure, hot spots, consistency vs availability |
Standard Component Checklist
- ☐ DNS + CDN for static assets
- ☐ Load balancer (L4 or L7)
- ☐ Stateless web servers (horizontally scalable)
- ☐ Primary + read replicas for DB
- ☐ Cache layer (Redis/Memcached)
- ☐ Async processing via message queue
- ☐ Object store for media/blobs (S3)
- ☐ Search index if full-text needed (Elasticsearch)
- ☐ Monitoring + alerting (Prometheus + Grafana)
- ☐ Auth service (JWT / OAuth 2.0)
URL Shortener (like Bitly)
Characteristics: read-heavy, low latency.
- Key decision: How to generate short codes? Base62 of auto-increment ID, or random hash with collision check
- Storage: SQL or KV store (key = short code, value = long URL)
- Scaling reads: Cache frequently accessed short codes in Redis
- Redirect: HTTP 301 (permanent, browser caches) vs 302 (temporary, server can track clicks)
- DB schema: id, short_code, long_url, user_id, created_at, expiry
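The Base62-of-auto-increment option can be sketched directly; 62^7 is roughly 3.5 trillion codes at only 7 characters:

```python
import string

# 0-9, a-z, A-Z: 62 symbols
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase

def encode(n: int) -> str:
    """Base62-encode a non-negative auto-increment id into a short code."""
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, rem = divmod(n, 62)
        out.append(ALPHABET[rem])
    return "".join(reversed(out))

def decode(code: str) -> int:
    """Invert encode() to recover the database id from a short code."""
    n = 0
    for ch in code:
        n = n * 62 + ALPHABET.index(ch)
    return n

print(encode(125))  # "21"
```

Sequential ids leak creation order and volume; adding a random offset or bijective scrambling of the id before encoding is a common mitigation.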
Twitter / Social Feed
Characteristics: write-heavy, fan-out problem.
- Fan-out on write: When user tweets, push to all followers' timelines immediately. Fast reads, slow writes. Bad for celebrities (10M followers).
- Fan-out on read (pull): On timeline load, fetch tweets from all followed users. Always fresh; heavy on reads.
- Hybrid: Push for regular users; pull for celebrities. Twitter uses this.
- Storage: Tweets in DB + timeline feeds in Redis (sorted sets by timestamp)
WhatsApp / Chat System
Characteristics: real-time, high availability.
- Connection: WebSocket for persistent bidirectional connection per client
- Message flow: Client A → Chat Server → Message Queue → Chat Server → Client B
- Offline users: Messages stored in DB; push notification via APNs/FCM; delivered on reconnect
- Message ordering: Logical timestamps / sequence numbers per conversation
- Read receipts: Separate lightweight event sent back through same path
YouTube / Video Platform
Characteristics: storage-intensive, CDN-dependent.
- Upload: Client → Upload Service → Raw Storage (S3) → Transcoding Queue (Kafka) → Transcoder Workers → Multiple resolution outputs → CDN
- Transcoding: Convert to multiple resolutions (360p, 720p, 1080p, 4K) and codecs (H.264, VP9)
- Streaming: Adaptive bitrate (HLS/DASH) — player selects quality based on bandwidth
- Metadata: PostgreSQL for video info; Elasticsearch for search
Ride-Sharing (Uber-like)
Characteristics: geospatial, real-time matching.
- Location updates: Drivers send GPS every 4–5s → Location Service → Redis Geo (sorted set with geohash)
- Geohash: Encode lat/lng as a short string; nearby points share prefixes, so a nearby search becomes a string prefix search
- Matching: On ride request, find nearest N drivers, send offer to closest available
- Trip events: Kafka for ride requested → matched → started → completed
References & Resources
Designing Data-Intensive Applications
Martin Kleppmann — O'Reilly, 2017. The essential reference. Deep dives on storage engines, replication, consistency.
System Design Interview Vol. 1 & 2
Alex Xu. Interview-focused, covers 30+ real systems with step-by-step breakdown and diagrams.
ByteByteGo
Alex Xu's channel. Excellent animated explainers. Free on YouTube, newsletter at bytebytego.com.
youtube.com/@ByteByteGo

Gaurav Sen
Deep conceptual explanations of consistent hashing, distributed systems, microservices.
youtube.com/@gkcs

Hussein Nasser
Backend engineering, database internals, proxy servers. Very practical and hands-on.
youtube.com/@hnasr

High Scalability Blog
Real-world architecture breakdowns of how companies scaled their systems.
highscalability.com

Grokking System Design
Educative.io. Structured course covering 20+ systems with interactive diagrams.
educative.io

System Design Primer
Donne Martin's massive open-source repo. 270k+ stars. Covers everything with diagrams.
github.com/donnemartin

Google Bigtable (2006)
Foundational NoSQL paper describing the storage system behind Google's web indexing.
research.google

Amazon Dynamo (2007)
Seminal paper on highly available key-value storage. Introduced consistent hashing + vector clocks to mainstream.
allthingsdistributed.com

Raft Consensus Algorithm
Understandable distributed consensus. Diego Ongaro & John Ousterhout, 2014.
raft.github.io

Engineering Blogs
Netflix Tech Blog, Uber Engineering, Discord Blog, Airbnb Engineering — real architecture decisions explained.
netflixtechblog.com