Engineering Study Notes

System Design

A comprehensive reference covering core concepts, patterns, trade-offs, and real-world architectures — with curated resources.

Contents

01 Fundamentals · 02 Scalability · 03 Databases · 04 Caching · 05 Networking & APIs · 06 Message Queues · 07 Distributed Systems · 08 Storage & CDN · 09 Design Framework · 10 Classic Problems · References

System design starts with understanding what properties a distributed system must balance. There is rarely a single "correct" design — only trade-offs that depend on requirements.

Key Properties

Availability

The fraction of time the system is operational. Measured as "nines": 99.9% = 8.7 hrs downtime/year; 99.99% = 52 min/year.

Reliability

The system performs its intended function correctly over time. A system can be available but unreliable (returns wrong results).

Scalability

Ability to handle growing load — more users, requests, or data — without degrading performance.

Maintainability

How easy it is for engineers to operate, evolve, and fix the system. Often the most costly long-term factor.

Fault Tolerance

The system continues to operate correctly despite partial failures. Achieved via redundancy, replication, and graceful degradation.

Consistency

Every read receives the most recent write or an error. Comes in strong, eventual, and causal varieties.

Back-of-the-Envelope Estimation

Always estimate scale before designing. Key numbers every engineer should know:

Operation                Approx. Latency
L1 cache reference       0.5 ns
L2 cache reference       7 ns
RAM access               100 ns
SSD random read          150 µs
HDD seek                 10 ms
Datacenter RTT           500 µs
Cross-continent RTT      150 ms
Rule of Thumb 1M requests/day ≈ 12 req/sec. Twitter's record peak: ~143k tweets/sec (2013). YouTube: 500 hours of video uploaded per minute.
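
The day-to-QPS conversion is simple division by 86,400 seconds; a quick sketch (the 2–3× peak multiplier is the usual provisioning rule of thumb):

```python
# Back-of-the-envelope: convert a daily request count to queries per second.
SECONDS_PER_DAY = 24 * 60 * 60          # 86,400

def daily_to_qps(requests_per_day: int) -> float:
    return requests_per_day / SECONDS_PER_DAY

avg = daily_to_qps(1_000_000)           # ≈ 11.6 req/sec
peak = avg * 3                          # provision for peak ≈ 2-3x average
print(f"avg {avg:.1f} req/sec, peak budget {peak:.0f} req/sec")
```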

Horizontal vs Vertical Scaling

Dimension   Vertical (Scale Up)             Horizontal (Scale Out)
Method      Bigger machine (more CPU/RAM)   More machines
Complexity  Low — single node               High — distributed state
Limit       Hardware ceiling exists         Nearly unlimited
Downtime    Requires restart                Rolling updates possible
Cost        Expensive at the high end       Commodity hardware

Load Balancing

A load balancer distributes incoming requests across a pool of servers. It is the gateway to horizontal scaling.

Client → [Load Balancer] → Server A
                         → Server B
                         → Server C
Round Robin

Requests distributed evenly in turn. Simple but ignores server capacity differences.

Least Connections

Routes to the server with fewest active connections. Better for long-lived connections.

IP Hash

Client IP determines server. Useful for session stickiness without shared state.

Consistent Hashing

Maps keys to a ring. Adding/removing nodes only remaps a small fraction of keys. Essential for distributed caches and DBs.
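
A minimal ring with virtual nodes can be sketched as follows; node names and the replica count are illustrative, and MD5 stands in for whatever hash the real system uses:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Consistent-hash ring sketch: virtual nodes smooth out key distribution."""

    def __init__(self, replicas: int = 100):
        self.replicas = replicas      # virtual nodes per physical node
        self.ring = []                # sorted hash positions on the ring
        self.owner = {}               # hash position -> node name

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node: str) -> None:
        for i in range(self.replicas):
            h = self._hash(f"{node}:{i}")
            bisect.insort(self.ring, h)
            self.owner[h] = node

    def remove_node(self, node: str) -> None:
        for i in range(self.replicas):
            h = self._hash(f"{node}:{i}")
            self.ring.remove(h)
            del self.owner[h]

    def get_node(self, key: str) -> str:
        # A key belongs to the first virtual node clockwise from its hash.
        h = self._hash(key)
        idx = bisect.bisect(self.ring, h) % len(self.ring)   # wrap around
        return self.owner[self.ring[idx]]

ring = ConsistentHashRing()
for n in ("cache-a", "cache-b", "cache-c"):
    ring.add_node(n)
print(ring.get_node("user:42"))   # one of the three caches
```

Removing a node only remaps the keys that node owned; every other key keeps its assignment, which is the whole point versus naive `hash(key) % N`.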

Tip Prefer stateless services so any server can handle any request. Externalise session state to Redis or a DB.

Rate Limiting

Protects services from abuse and runaway clients. Common algorithms:

  • Token Bucket — tokens added at constant rate, consumed per request. Allows bursts.
  • Leaky Bucket — requests processed at fixed outflow rate. Smooths traffic, rejects overflow.
  • Fixed Window Counter — count per time window. Simple but has boundary burst issue.
  • Sliding Window Log — timestamps stored; precise but memory-intensive.
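
The token bucket is the most commonly implemented of these; a minimal single-process sketch (rate and capacity are illustrative — a real deployment would keep this state in Redis or similar):

```python
import time

class TokenBucket:
    """Token-bucket sketch: refill at `rate` tokens/sec, burst up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity            # start full so bursts are allowed
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=5, capacity=10)    # 5 req/sec steady, bursts of 10
results = [bucket.allow() for _ in range(12)]
print(results.count(True))                   # the burst of 10 passes, then rejects
```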

SQL vs NoSQL

Attribute  SQL (Relational)                 NoSQL
Schema     Fixed, typed columns             Flexible / schemaless
ACID       Full ACID support                Usually eventual consistency
Scaling    Harder to scale out              Designed for horizontal scale
Joins      Native, performant               Avoided; denormalise instead
Best For   Complex queries, financial data  High-throughput, flexible schema
Examples   PostgreSQL, MySQL, SQLite        MongoDB, Cassandra, DynamoDB, Redis

ACID Properties

Atomicity

A transaction is all-or-nothing. If any step fails, the whole transaction rolls back.

Consistency

A transaction brings the database from one valid state to another; constraints are never violated.

Isolation

Concurrent transactions do not interfere with each other; at the strictest level (serializable), the outcome is the same as if they had run one after another.

Durability

Once committed, data survives crashes. Achieved via WAL (Write-Ahead Log).

Indexing

Indexes trade write speed and storage for fast reads. Without an index, every query is a full table scan O(n).

  • B-Tree Index — default in most RDBMS; ideal for range queries and equality
  • Hash Index — O(1) exact lookups; no range support
  • Composite Index — index on multiple columns; follow left-prefix rule
  • Covering Index — index contains all needed columns; query never touches table
  • Full-Text Index — tokenised text search (e.g., Elasticsearch)
Warning Over-indexing slows writes. Index only columns used in WHERE, JOIN, or ORDER BY clauses under high read load.

Replication & Sharding

Primary-Replica

One primary handles writes; replicas serve reads. Increases read throughput. Failover: promote a replica.

Multi-Primary

Multiple nodes accept writes. Allows geographic distribution. Conflict resolution required.

Horizontal Sharding

Data partitioned across multiple DB instances (e.g., users 1–1M on shard A, the next 1M on shard B). Enables massive scale; cross-shard queries are expensive.
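
The routing logic for both range and hash sharding fits in a few lines; a sketch with hypothetical shard names:

```python
SHARDS = ["shard-a", "shard-b", "shard-c", "shard-d"]

def route_by_range(user_id: int, shard_size: int = 1_000_000) -> str:
    """Range sharding: contiguous ID blocks per shard.
    Easy range scans, but new users pile onto the newest shard (hotspot risk)."""
    return SHARDS[(user_id // shard_size) % len(SHARDS)]

def route_by_hash(user_id: int) -> str:
    """Hash sharding: even spread, but range queries must fan out to all shards.
    Use a stable hash (e.g. CRC32) in production; Python's hash of ints is
    stable, but its string hash is randomised per process."""
    return SHARDS[hash(user_id) % len(SHARDS)]

print(route_by_range(1_500_000))   # shard-b
```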

Vertical Sharding

Different tables on different databases. Simpler, but limited by table size on each node.

Caching stores expensive computation results or frequently accessed data in fast memory, reducing latency and database load dramatically.

Cache Patterns

Cache-Aside (Lazy)

App checks cache → miss → reads DB → writes to cache. Cache only populated on demand. Most common pattern.
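
The cache-aside read path, sketched with in-process dicts standing in for Redis and the database (keys, TTL, and data are illustrative):

```python
import time

db = {"user:1": {"name": "Ada"}}    # stand-in for the real database
cache = {}                          # stand-in for Redis: key -> (value, expires_at)
TTL = 60                            # seconds

def get_user(key):
    """Cache-aside: check the cache, fall back to the DB, populate on miss."""
    entry = cache.get(key)
    if entry is not None and entry[1] > time.monotonic():
        return entry[0]                                  # cache hit
    value = db.get(key)                                  # miss -> source of truth
    if value is not None:
        cache[key] = (value, time.monotonic() + TTL)     # populate for next time
    return value

print(get_user("user:1"))   # first call misses and fills the cache
print(get_user("user:1"))   # second call is served from the cache
```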

Read-Through

Cache sits in front of DB. On miss, cache itself fetches from DB and stores result. Transparent to app.

Write-Through

Every write goes to cache AND DB synchronously. No stale data; higher write latency.

Write-Back (Write-Behind)

Writes go to cache only; flushed to DB asynchronously. Low latency writes, risk of data loss on crash.

Eviction Policies

  • LRU (Least Recently Used) — evicts item not accessed longest. Most commonly used.
  • LFU (Least Frequently Used) — evicts item accessed fewest times. Better for skewed access patterns.
  • FIFO — evicts oldest inserted item regardless of use.
  • TTL (Time-To-Live) — entries expire after fixed duration.
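
LRU is straightforward to sketch with Python's `OrderedDict`, whose insertion order doubles as recency order:

```python
from collections import OrderedDict

class LRUCache:
    """LRU eviction sketch: most recently used entries live at the end."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)         # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")           # "a" is now the most recently used
cache.put("c", 3)        # evicts "b", the least recently used
print(cache.get("b"))    # None
```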

Cache Problems

  • Cache Stampede — many requests miss at the same moment after an expiry and overwhelm the DB. Mitigate with a mutex/lock, staggered TTLs, or probabilistic early expiry.
  • Cache Penetration — requests for non-existent keys always miss and hit the DB every time. Mitigate by caching null results or placing a Bloom filter at the gateway.
  • Cache Avalanche — many keys expire at once, spiking load on the DB. Mitigate with jitter on TTLs, a persistent cache, or a circuit breaker.
  • Hotspot / Hot Key — a single key receives massive traffic. Mitigate by replicating the hot key across multiple cache shards.

API Styles Compared

Style      Protocol       Best For                                 Drawback
REST       HTTP/1.1+      Public APIs, CRUD resources              Over/under-fetching
GraphQL    HTTP           Flexible frontends, mobile               Complex caching, N+1 queries
gRPC       HTTP/2         Internal microservices, high throughput  Not browser-native; binary
WebSocket  TCP (WS)       Real-time: chat, games, feeds            Stateful, harder to scale
Webhooks   HTTP callback  Event-driven integrations                Delivery guarantees needed

DNS Resolution Flow

Browser → Recursive Resolver → Root Nameserver → TLD Nameserver (.com) → Authoritative Nameserver
Authoritative Nameserver returns the IP → Browser connects to the IP (cached per TTL)

Low DNS TTL = faster propagation; High TTL = better cache hit rate and reduced DNS load.

HTTP Status Codes to Know

Code         Meaning                   System Design Context
200 OK       Success                   Standard response
201 Created  Resource created          POST endpoint
301 / 302    Redirect                  URL shortener response
400          Bad Request               Client-side validation failure
401 / 403    Unauthorized / Forbidden  Auth gate; 401 = missing credentials, 403 = denied
429          Too Many Requests         Rate limiter response
503          Service Unavailable       Circuit breaker open / overload

Asynchronous communication decouples producers from consumers, enabling resilience and independent scaling.

Why Queues?

  • Decoupling — producer and consumer don't need to know about each other
  • Load levelling — absorbs traffic bursts; consumer processes at its own pace
  • Durability — messages persist even if consumer is temporarily down
  • Fan-out — one event consumed by multiple downstream services

Key Technologies

Apache Kafka

Distributed log. Topics with partitions. Immutable, replayable. Excellent for event sourcing, analytics pipelines, millions of events/sec.

RabbitMQ

Traditional broker with exchanges and queues. Supports complex routing (topic, direct, fanout). Better for task queues and RPC.

Amazon SQS

Managed queue service. Two types: Standard (at-least-once) and FIFO (exactly-once, ordered). Scales automatically.

Redis Pub/Sub

In-memory, real-time. Messages not persisted — fire-and-forget. Good for notifications, live scores.

Delivery Semantics

Semantic       Description                                 Risk
At-most-once   Message sent once; may be lost              Data loss
At-least-once  Message retried until acked; may duplicate  Duplicate processing
Exactly-once   No loss, no duplicates                      Expensive; requires idempotency + 2PC
Best Practice Design consumers to be idempotent (safe to call multiple times with same input). This makes at-least-once semantics safe in practice.
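
An idempotent consumer can be as simple as a dedupe check on a message ID; a sketch (in production the `processed` set would be a durable store such as a DB table with a unique key — the names here are illustrative):

```python
processed = set()   # stand-in for a durable dedupe store

def handle_payment(message: dict) -> str:
    """Idempotent consumer: a redelivered message is detected and skipped."""
    msg_id = message["id"]
    if msg_id in processed:
        return "skipped-duplicate"
    # ... apply the side effect exactly once (charge card, write ledger row) ...
    processed.add(msg_id)
    return "processed"

msg = {"id": "pay-123", "amount": 50}
print(handle_payment(msg))   # processed
print(handle_payment(msg))   # skipped-duplicate: at-least-once redelivery is now safe
```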

CAP Theorem

A distributed system can guarantee at most two of the following three properties simultaneously:

Consistency (C)

Every read returns the most recent write or an error. All nodes see the same data at the same time.

Availability (A)

Every request receives a response (not necessarily the most recent data). System never rejects requests.

Partition Tolerance (P)

System operates despite network partitions (dropped messages between nodes). Always required in practice.

Key Insight Network partitions always occur in distributed systems. So you always sacrifice either C or A during a partition. CP systems (e.g. HBase, ZooKeeper) choose accuracy over uptime. AP systems (e.g. Cassandra, DynamoDB) choose uptime over accuracy.

Consistency Models

Model                   Guarantee                                               Example
Strong Consistency      Reads always return the latest write (linearisability)  Spanner, etcd
Sequential Consistency  Operations appear in some global sequential order       ZooKeeper
Causal Consistency      Causally related operations are seen in order           MongoDB sessions
Eventual Consistency    Replicas converge given no new updates                  DynamoDB, Cassandra
Read-your-writes        A user always sees their own writes                     Sticky sessions, read-from-primary

Consensus Algorithms

How distributed nodes agree on a single value despite failures:

  • Paxos — Foundational consensus algorithm. Notoriously hard to implement correctly. Used in Google Spanner, Chubby.
  • Raft — Designed for understandability. Leader election + log replication. Used in etcd, CockroachDB, TiKV.
  • PBFT — Byzantine Fault Tolerance; used in blockchain-adjacent systems where nodes may act maliciously.

Distributed Patterns

Circuit Breaker

Prevents calls to a failing service. States: Closed → Open (on failure threshold) → Half-Open (probe). Popularised by Netflix Hystrix.
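
The state machine is small enough to sketch directly; the threshold and cooldown values here are illustrative:

```python
import time

class CircuitBreaker:
    """Sketch: Closed -> Open after `threshold` failures -> Half-Open after cooldown."""

    def __init__(self, threshold: int = 3, cooldown: float = 30.0):
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None            # None means the circuit is closed

    def call(self, fn):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.cooldown:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None        # Half-Open: let one probe through
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()   # trip the breaker
            raise
        self.failures = 0                # a success closes the circuit again
        return result

breaker = CircuitBreaker(threshold=2, cooldown=30.0)

def flaky():
    raise ConnectionError("backend down")

for _ in range(2):                       # two failures trip the breaker
    try:
        breaker.call(flaky)
    except ConnectionError:
        pass

try:
    breaker.call(flaky)                  # now fails fast, backend untouched
except RuntimeError as exc:
    print(exc)                           # circuit open: failing fast
```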

Saga Pattern

Manages distributed transactions without 2PC. Sequence of local transactions with compensating actions on failure.

Sidecar

Deploy helper process alongside each service (e.g., Envoy proxy). Handles logging, auth, tracing without changing app code.

Service Mesh

Network of sidecars (Istio, Linkerd). Provides mTLS, retries, observability between microservices automatically.

Storage Types

Type            Description                     Example                       Use Case
Block Storage   Raw volumes; OS sees a disk     AWS EBS, GCP Persistent Disk  Databases, OS volumes
File Storage    Hierarchical filesystem         AWS EFS, NFS                  Shared filesystems, home dirs
Object Storage  Flat key–value store for blobs  AWS S3, GCS                   Media, backups, static assets
In-Memory       RAM; fastest, volatile          Redis, Memcached              Caching, sessions, queues

Content Delivery Networks (CDN)

A CDN is a globally distributed network of edge servers that cache and serve content close to users, reducing latency and origin load.

User (Pakistan) → [CDN Edge — Dubai] → cached CSS/JS/images in ~10 ms (vs ~200 ms to a US origin)
  • Push CDN — you proactively upload content to CDN. Good for static, predictable content.
  • Pull CDN — CDN fetches from origin on first request, caches. Simpler but cold misses hit origin.
  • Use for: Images, video, JS/CSS, static HTML. Not suitable for highly personalised or frequently changing data.

Use this structured framework for any system design question. Allocate your 45 minutes carefully.

Step-by-Step Process

#  Step                      Time    Key Questions
1  Clarify Requirements      5 min   Read vs write ratio? DAU? Latency SLA? Consistency needs? Global?
2  Capacity Estimation       5 min   QPS, storage/day, bandwidth, peak load × 2–3
3  High-Level Design         10 min  Draw boxes: clients, LB, servers, DB, cache, queue, CDN
4  Deep Dive                 15 min  Most complex component — schema, API, data flow
5  Bottlenecks & Trade-offs  10 min  Single points of failure, hot spots, consistency vs availability

Standard Component Checklist

  • ☐ DNS + CDN for static assets
  • ☐ Load balancer (L4 or L7)
  • ☐ Stateless web servers (horizontally scalable)
  • ☐ Primary + read replicas for DB
  • ☐ Cache layer (Redis/Memcached)
  • ☐ Async processing via message queue
  • ☐ Object store for media/blobs (S3)
  • ☐ Search index if full-text needed (Elasticsearch)
  • ☐ Monitoring + alerting (Prometheus + Grafana)
  • ☐ Auth service (JWT / OAuth 2.0)

URL Shortener (like Bitly)

Read-heavy · Low latency

  • Key decision: How to generate short codes? Base62 of auto-increment ID, or random hash with collision check
  • Storage: SQL or KV store (key = short code, value = long URL)
  • Scaling reads: Cache frequently accessed short codes in Redis
  • Redirect: HTTP 301 (permanent, browser caches) vs 302 (temporary, server can track clicks)
  • DB schema: id, short_code, long_url, user_id, created_at, expiry
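
The Base62 route is a short pair of functions; a sketch mapping an auto-increment ID to a short code and back:

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"  # Base62

def encode_base62(n: int) -> str:
    """Map an auto-increment DB id to a short, URL-safe code."""
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, rem = divmod(n, 62)
        out.append(ALPHABET[rem])
    return "".join(reversed(out))        # most significant digit first

def decode_base62(code: str) -> int:
    """Invert the encoding to look the row back up by id."""
    n = 0
    for ch in code:
        n = n * 62 + ALPHABET.index(ch)
    return n

print(encode_base62(123456789))          # a 5-6 character code
```

Seven Base62 characters cover 62^7 ≈ 3.5 trillion ids, which is why short links stay short.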

Twitter / Social Feed

Write-heavy · Fan-out problem

  • Fan-out on write: When user tweets, push to all followers' timelines immediately. Fast reads, slow writes. Bad for celebrities (10M followers).
  • Fan-out on read (pull): On timeline load, fetch tweets from all followed users. Always fresh; heavy on reads.
  • Hybrid: Push for regular users; pull for celebrities. Twitter uses this.
  • Storage: Tweets in DB + timeline feeds in Redis (sorted sets by timestamp)
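
The hybrid decision is a simple branch on follower count; a sketch with made-up users, an in-memory timeline store standing in for Redis, and an illustrative celebrity threshold:

```python
CELEBRITY_THRESHOLD = 10_000    # illustrative cut-off for push vs pull

followers = {
    "alice": ["bob", "carol"],
    "celeb": [f"fan{i}" for i in range(20_000)],
}
timelines = {}                  # user -> list of tweet ids (newest first)

def post_tweet(author: str, tweet_id: str) -> str:
    fans = followers.get(author, [])
    if len(fans) > CELEBRITY_THRESHOLD:
        return "pull"           # too many followers: merge at read time instead
    for follower in fans:       # fan-out on write: push into each timeline
        timelines.setdefault(follower, []).insert(0, tweet_id)
    return "push"

print(post_tweet("alice", "t1"))   # push
print(post_tweet("celeb", "t2"))   # pull
```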

WhatsApp / Chat System

Real-time · High availability

  • Connection: WebSocket for persistent bidirectional connection per client
  • Message flow: Client A → Chat Server → Message Queue → Chat Server → Client B
  • Offline users: Messages stored in DB; push notification via APNs/FCM; delivered on reconnect
  • Message ordering: Logical timestamps / sequence numbers per conversation
  • Read receipts: Separate lightweight event sent back through same path

YouTube / Video Platform

Storage-intensive · CDN-dependent

  • Upload: Client → Upload Service → Raw Storage (S3) → Transcoding Queue (Kafka) → Transcoder Workers → Multiple resolution outputs → CDN
  • Transcoding: Convert to multiple resolutions (360p, 720p, 1080p, 4K) and codecs (H.264, VP9)
  • Streaming: Adaptive bitrate (HLS/DASH) — player selects quality based on bandwidth
  • Metadata: PostgreSQL for video info; Elasticsearch for search

Ride-Sharing (Uber-like)

Geospatial · Real-time matching

  • Location updates: Drivers send GPS every 4–5s → Location Service → Redis Geo (sorted set with geohash)
  • Geohash: Encode lat/lng as string prefix → nearby search becomes string prefix search
  • Matching: On ride request, find nearest N drivers, send offer to closest available
  • Trip events: Kafka for ride requested → matched → started → completed
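
The prefix property is the whole trick: a standard geohash encoder fits in about 25 lines. The coordinates below are sample values for two nearby points in one city:

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"    # geohash alphabet (no a, i, l, o)

def geohash(lat: float, lon: float, precision: int = 6) -> str:
    """Encode lat/lng; nearby points share a common string prefix."""
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    code, bits, ch, even = [], 0, 0, True      # bits interleave: lon, lat, lon, ...
    while len(code) < precision:
        rng, val = (lon_range, lon) if even else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        ch = ch << 1
        if val >= mid:
            ch |= 1
            rng[0] = mid                       # keep the upper half
        else:
            rng[1] = mid                       # keep the lower half
        even = not even
        bits += 1
        if bits == 5:                          # 5 bits -> one base32 character
            code.append(BASE32[ch])
            bits, ch = 0, 0
    return "".join(code)

a = geohash(31.5204, 74.3587)   # two points ~1.6 km apart in the same city...
b = geohash(31.5290, 74.3440)
print(a, b)                     # ...share a common prefix
```

Because close points share prefixes, "drivers near me" becomes a prefix-range scan over sorted geohashes rather than a full distance computation against every driver.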

References & Resources

📖 Book

Designing Data-Intensive Applications

Martin Kleppmann — O'Reilly, 2017. The essential reference. Deep dives on storage engines, replication, consistency.

📖 Book

System Design Interview Vol. 1 & 2

Alex Xu. Interview-focused, covers 30+ real systems with step-by-step breakdown and diagrams.

🎬 YouTube

ByteByteGo

Alex Xu's channel. Excellent animated explainers. Free on YouTube, newsletter at bytebytego.com.

youtube.com/@ByteByteGo

🎬 YouTube

Gaurav Sen

Deep conceptual explanations of consistent hashing, distributed systems, microservices.

youtube.com/@gkcs

🎬 YouTube

Hussein Nasser

Backend engineering, database internals, proxy servers. Very practical and hands-on.

youtube.com/@hnasr

🌐 Website

High Scalability Blog

Real-world architecture breakdowns of how companies scaled their systems.

highscalability.com

📚 Course

Grokking System Design

Educative.io. Structured course covering 20+ systems with interactive diagrams.

educative.io

🌐 GitHub

System Design Primer

Donne Martin's massive open-source repo. 270k+ stars. Covers everything with diagrams.

github.com/donnemartin

📄 Paper

Google Bigtable (2006)

Foundational NoSQL paper describing the storage system behind Google's web indexing.

research.google

📄 Paper

Amazon Dynamo (2007)

Seminal paper on highly available key-value storage. Introduced consistent hashing + vector clocks to mainstream.

allthingsdistributed.com

📄 Paper

Raft Consensus Algorithm

Understandable distributed consensus. Diego Ongaro & John Ousterhout, 2014.

raft.github.io

🌐 Blog

Engineering Blogs

Netflix Tech Blog, Uber Engineering, Discord Blog, Airbnb Engineering — real architecture decisions explained.

netflixtechblog.com