Before designing anything, understand the properties you're optimising for and estimate the scale you're working at. Numbers ground your decisions.
The Core Properties
Availability — % uptime. 99.9% ("three nines") = 8.77 hrs downtime/year. 99.99% = 52 min/year. 99.999% = 5.26 min/year. SLAs are commitments; SLOs are internal targets; SLIs are actual measurements.
Reliability — performs correctly, not just responds. A service returning stale data is available but not reliable. MTBF (Mean Time Between Failures) and MTTR (Mean Time To Recovery) are key metrics.
Scalability — handles growing load without degrading. Define your load parameters first: RPS, concurrent users, data volume, read/write ratio. Twitter handles ~6k tweets/sec on average, 150k at peak.
Consistency — all nodes agree on the same data. Strong consistency = every read sees the latest write. Eventual = replicas converge over time. The right model depends on the business (bank balance vs likes count).
Durability — committed data survives crashes. Achieved via WAL (Write-Ahead Logs), replication, and periodic snapshots. S3 offers 99.999999999% (11 nines) durability.
Latency & Throughput — latency = time for one request; throughput = requests/second. Often a trade-off: batching improves throughput but increases latency. Target both P50 and P99 latencies.
Numbers Every Engineer Should Know
| Operation | Latency | Intuition |
|---|---|---|
| L1 cache access | 0.5 ns | 1 step across a room |
| L2 cache access | 7 ns | 14 steps |
| RAM read | 100 ns | 200 steps across a field |
| SSD random read (4K) | 150 µs | Drive to the corner store |
| HDD seek + read | 10 ms | A walk around the block |
| Same datacenter RTT | 0.5 ms | Ping next building |
| Cross-continent (US↔EU) | 150 ms | Speed of light delay |
| Reading 1 MB from RAM | 250 µs | — |
| Reading 1 MB from SSD | 1 ms | 4× slower than RAM |
| Reading 1 MB from disk | 20 ms | 80× slower than RAM |
Back-of-Envelope: Twitter Example
Given: 300M monthly users, 50% daily active = 150M DAU. Each user reads 100 tweets/day, posts 1 tweet every 2 days.
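Carrying the arithmetic through (the ~300-byte average tweet size in the storage line is an assumption, not from the source):

```python
SECONDS_PER_DAY = 86_400

dau = 150_000_000               # 50% of 300M monthly users
reads_per_user_per_day = 100
writes_per_user_per_day = 0.5   # 1 tweet every 2 days

read_qps = dau * reads_per_user_per_day / SECONDS_PER_DAY
write_qps = dau * writes_per_user_per_day / SECONDS_PER_DAY

print(f"Read QPS:  ~{read_qps:,.0f}")                       # ~173,611
print(f"Write QPS: ~{write_qps:,.0f}")                      # ~868
print(f"Read/write ratio: ~{read_qps / write_qps:.0f}:1")   # ~200:1

# Storage: assume ~300 bytes per tweet (text + metadata)
tweets_per_day = dau * writes_per_user_per_day
storage_per_year_tb = tweets_per_day * 300 * 365 / 1e12
print(f"Tweet storage/year: ~{storage_per_year_tb:.1f} TB")  # ~8.2 TB
```

Round numbers like these are the point: ~175k read QPS tells you immediately that a cache tier is mandatory, while ~900 write QPS is well within a single primary's reach.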
Scalability is about how your system behaves as load grows. The goal is to add resources in proportion to load without redesigning the system.
Horizontal vs Vertical Scaling
| Attribute | Vertical (Scale Up) | Horizontal (Scale Out) |
|---|---|---|
| Method | Bigger machine (more CPU/RAM) | More machines of same type |
| Complexity | Simple — no distributed concerns | Complex — must handle distributed state |
| Ceiling | Hardware limit exists (~96 cores) | Nearly unlimited (Google has millions of servers) |
| Downtime | Requires restart to resize | Rolling deploys; zero downtime |
| Failure | Single point of failure | Partial failure tolerance |
| Cost curve | Superlinear — 2× RAM costs 4× | Linear — commodity hardware |
| Real example | Early Instagram: one fat Postgres | Airbnb: thousands of identical app nodes |
Load Balancing Deep Dive
A load balancer distributes traffic across a server pool. It can operate at Layer 4 (TCP) or Layer 7 (HTTP, aware of content).
Round Robin — requests go to each server in turn: A → B → C → A... Best when servers are identical and requests are homogeneous.
Weighted Round Robin — assign weights by capacity. A server with 2× RAM gets 2× the requests. Used when servers have different specs.
Least Connections — route to the server with the fewest active connections. Best for long-lived connections (WebSocket, streaming).
Consistent Hashing — hash(key) maps to a point on a ring. Each server owns an arc. Adding a server only remaps ~1/n keys. Essential for caches (Redis Cluster, Cassandra).
IP Hash — the same client always goes to the same server. Enables sticky sessions without a shared store. Risk: hot spots if one client is very active.
Random — statistically balanced without state. Good for short-lived stateless requests. AWS ALB supports a weighted-random algorithm (round robin is the default).
Rate Limiting
Protect services from abuse, ensure fair use, and prevent cascade failures. Implemented at API gateway or per-service.
| Algorithm | How it Works | Burst Allowed? | Use When |
|---|---|---|---|
| Token Bucket | Bucket holds N tokens, refills at rate R. Each request consumes 1 token. | Yes (up to bucket size) | Most APIs — allows bursty traffic |
| Leaky Bucket | Requests enter queue, processed at fixed rate. Overflow is dropped. | No — strictly smooth | Payment systems, smooth traffic |
| Fixed Window | Count per time window (e.g., 100 req/min). Resets at boundary. | 2× burst at boundary | Simple quota enforcement |
| Sliding Window Log | Store timestamp of each request. Count within rolling window. | Precise no-burst | Accurate enforcement, high memory |
| Sliding Window Counter | Blend current + previous window weighted by position. | Approximate | Best balance of accuracy vs memory |
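A minimal in-process token bucket sketch. Production limiters usually live in Redis or at the gateway so the count is shared across instances; this single-process version just shows the mechanics:

```python
import time

class TokenBucket:
    """Token-bucket rate limiter: capacity N, refills at R tokens/sec.
    A request passes if a token is available, so bursts up to N pass."""

    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = float(capacity)        # start full
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=5, refill_rate=2.0)   # burst of 5, then 2 req/sec
results = [bucket.allow() for _ in range(7)]
print(results)   # the first 5 pass (the burst); the next 2 are rejected
```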
Auto-Scaling
Automatically add or remove servers based on metrics. Cloud providers (AWS, GCP, Azure) all offer managed auto-scaling groups.
- Reactive scaling — trigger on CPU > 70% for 5 min. Lag means brief overload before new instances are warm.
- Predictive scaling — ML predicts load (e.g., scale up at 8 AM before morning traffic). AWS now offers this natively.
- Schedule-based — scale up Friday evenings, down weekday nights. Known traffic patterns (e-commerce flash sales).
- Rule of thumb: scale out aggressively, scale in slowly (conservative scale-in avoids thrashing).
Database choice is often the most consequential decision in a system. Understand the trade-offs before committing — migrations are expensive.
SQL vs NoSQL
| Attribute | SQL (Relational) | NoSQL |
|---|---|---|
| Schema | Fixed, enforced by DB | Flexible, app-enforced |
| Transactions | Full ACID | Varies (Mongo 4.0+ has multi-doc ACID) |
| Scaling | Vertical primary; sharding complex | Designed for horizontal scale |
| Joins | Native; performant with indexes | Avoid; denormalise data |
| Query language | SQL — expressive, standardised | Varies by DB |
| Best for | Complex queries, financial, relational data | High-write, flexible schema, huge scale |
NoSQL Database Types
Key-Value — Examples: Redis, DynamoDB, Riak
Fast O(1) lookup by key. No complex queries. Best for: sessions, caches, user preferences, shopping carts.
Document — Examples: MongoDB, CouchDB, Firestore
JSON/BSON documents, flexible schema. Good for: user profiles, content management, catalogs with varying attributes.
Wide-Column — Examples: Cassandra, HBase, BigTable
Rows + dynamic columns. Optimised for time-series, write-heavy. Good for: IoT, analytics, activity feeds. Cassandra handles 1M writes/sec.
Graph — Examples: Neo4j, Neptune, JanusGraph
Nodes + edges + properties. Relationships are first-class. Good for: social graphs, fraud detection, recommendation engines.
Time-Series — Examples: InfluxDB, TimescaleDB, Prometheus
Optimised for timestamp-indexed data. Automatic downsampling and retention. Good for: metrics, monitoring, IoT sensor data.
Search — Examples: Elasticsearch, OpenSearch, Solr
Inverted index, full-text search, fuzzy matching, aggregations. Not a primary DB — sync from main DB via events.
Indexing
Without an index, every query is a full table scan O(n). With an index, it's O(log n) for B-Tree or O(1) for hash.
- B-Tree — default; great for range queries, equality, ORDER BY
- Hash — O(1) equality only; no range support
- GIN (Generalized Inverted Index) — arrays, JSONB, full-text search in Postgres
- Partial Index — index only rows matching a condition, e.g. WHERE status = 'active'
- Covering Index — includes all SELECT columns; the query never touches the heap table
Replication & Sharding Strategies
Single-Leader (primary-replica) — one primary accepts all writes. Changes replicate to read replicas, which can serve reads and offload the primary. Used by: most RDBMS, Redis Sentinel.
- Sync replication: write confirmed when replica acknowledges — no data loss, but latency doubles
- Async replication: write confirmed immediately — fast, but replica lag means potential data loss on failover
Multi-Leader — multiple nodes accept writes, each replicating to the others. Needed for multi-datacenter active-active. Used by: CouchDB, Google Docs real-time sync.
- Write conflicts must be resolved (LWW, CRDTs, custom merge)
- Complex to reason about; avoid if possible
Leaderless (Quorum) — any node accepts reads/writes. Uses quorums: write to W nodes, read from R nodes. If W + R > N, you have strong consistency.
- Cassandra default: N=3, W=1, R=1 (eventual consistency)
- Strong consistency: N=3, W=2, R=2 (quorum)
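The quorum rule as a one-line check (illustrative only):

```python
def is_strongly_consistent(n: int, w: int, r: int) -> bool:
    """Quorum reads and writes must overlap on at least one node,
    which holds exactly when W + R > N."""
    return w + r > n

print(is_strongly_consistent(3, 1, 1))   # False — Cassandra's eventual default
print(is_strongly_consistent(3, 2, 2))   # True  — quorum reads and writes
```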
Sharding (Partitioning) — split data across multiple DB instances. Each shard is an independent DB holding a subset of the data.
- Range sharding: user A-M on shard 1, N-Z on shard 2. Risk: hot shards
- Hash sharding: hash(user_id) mod N. Even distribution but no range queries
- Directory-based: lookup service maps key to shard. Flexible but adds a hop
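A quick simulation of why plain hash-mod-N sharding hurts at resharding time (the key names are hypothetical): adding one shard changes the shard of most keys.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Hash sharding: stable hash of the key, mod the shard count."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % num_shards

keys = [f"user:{i}" for i in range(10_000)]
before = {k: shard_for(k, 4) for k in keys}
after = {k: shard_for(k, 5) for k in keys}   # add one shard: 4 → 5

moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved / len(keys):.0%} of keys moved")   # roughly 4 in 5 keys change shards
```

That mass reshuffle is the problem consistent hashing solves, as covered later in the consistent-hashing section.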
Database Selection Guide
| Need | Best Choice | Why |
|---|---|---|
| User accounts, orders, payments | PostgreSQL / MySQL | ACID, joins, complex queries |
| Session store, rate limiting | Redis | In-memory, O(1), TTL support |
| Product catalog, CMS | MongoDB | Flexible schema, nested docs |
| Social graph, recommendations | Neo4j / Neptune | Graph traversals are native |
| Write-heavy IoT / activity logs | Cassandra | 1M+ writes/sec, no single point of failure |
| Full-text search, autocomplete | Elasticsearch | Inverted index, relevance scoring |
| Metrics, monitoring dashboards | InfluxDB / Prometheus | Built-in downsampling, fast range queries |
| Data warehouse, analytics | BigQuery / Redshift / Snowflake | Columnar, massively parallel |
Caching is the single highest-leverage optimisation in distributed systems. A well-placed cache can reduce DB load by 90%+.
Cache Hierarchy
Each layer should only be needed when the layer above it misses. A well-tuned system sees 90%+ of traffic absorbed by CDN + Redis.
Cache Patterns
Cache-Aside — app checks cache → miss → reads DB → writes cache → returns. The cache only contains accessed data. Most common pattern. Risk: stale data if the DB is updated without invalidating the cache.
Read-Through — the cache is the only data source from the app's perspective. On a miss, the cache itself fetches and stores the value. Transparent to the app. Used by: DynamoDB Accelerator (DAX).
Write-Through — every write goes to cache AND DB synchronously. No stale data ever, but write latency doubles. Good for: user preferences, settings (infrequent writes, frequent reads).
Write-Back — write to cache, return immediately, flush to DB asynchronously. Low write latency. Risk: data loss if the cache crashes before the flush. Good for: likes/views counters, analytics.
Refresh-Ahead — the cache proactively refreshes entries before they expire, based on predicted access patterns. Complex but eliminates cold misses for hot items. Used by: Netflix for content metadata.
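A cache-aside sketch with plain dicts standing in for Redis and the database; the delete-on-write invalidation addresses the stale-data risk noted above. In production the cache write would also carry a TTL:

```python
db = {"user:1": {"name": "Ada"}}   # stand-in for the database
cache = {}                          # stand-in for Redis

def get_user(key: str):
    if key in cache:                # 1. check cache first
        return cache[key]
    value = db.get(key)             # 2. miss → read the DB
    if value is not None:
        cache[key] = value          # 3. populate the cache
    return value                    # 4. return to the caller

def update_user(key: str, value: dict):
    db[key] = value
    cache.pop(key, None)            # invalidate so the next read refills

print(get_user("user:1"))           # miss → DB → cached
print(get_user("user:1"))           # hit
update_user("user:1", {"name": "Ada L."})
print(get_user("user:1"))           # miss again → fresh value
```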
Cache Problems & Solutions
| Problem | Description | Real Example | Solutions |
|---|---|---|---|
| Cache Stampede | Many requests miss simultaneously after TTL expiry → DB flooded | Reddit front page after a popular post's cache expires | Mutex/lock on miss, probabilistic early expiry, background refresh |
| Cache Penetration | Requests for keys that will never exist in DB (e.g., invalid IDs) → always miss cache, always hit DB | Attacker querying random non-existent user IDs | Cache null results with short TTL, Bloom filter at gateway |
| Cache Avalanche | Many keys expire at same time → large spike on DB | Black Friday: product catalog cache all set to 1-hour TTL at deploy time | Jitter on TTL (±20%), persistent cache (no global TTL), staggered deploys |
| Hot Key | One key gets massive traffic (e.g., a celebrity's profile) | Justin Bieber problem on Twitter | Replicate hot key across N cache nodes, local in-process cache for ultra-hot keys |
| Cache Inconsistency | Cache and DB diverge — stale reads | User changes name but sees old name for 10 min | Write-through, event-driven invalidation, shorter TTL, version keys |
Redis Data Structures (with Use Cases)
| Structure | Operations | Use Case |
|---|---|---|
| String | GET, SET, INCR, EXPIRE | Sessions, counters, simple cache, rate limit counters |
| Hash | HGET, HSET, HMGET | User profile fields, product attributes |
| List | LPUSH, RPOP, LRANGE | Activity feed, job queues, chat history |
| Set | SADD, SISMEMBER, SUNION | Unique visitors, tags, friends list |
| Sorted Set (ZSet) | ZADD, ZRANGE, ZRANGEBYSCORE | Leaderboards, rate limiting (sliding window), priority queues |
| Bitmap | SETBIT, GETBIT, BITCOUNT | Daily active users, feature flags per user (512MB = 4B users) |
| HyperLogLog | PFADD, PFCOUNT | Approximate unique counts with <1% error, tiny memory (12KB at any cardinality) |
| Geo | GEOADD, GEORADIUS | Nearby drivers, location search |
| Stream | XADD, XREAD, XGROUP | Event log, lightweight Kafka alternative |
API Design Patterns Compared
| Style | Protocol | Payload | Best For | Avoid When |
|---|---|---|---|---|
| REST | HTTP/1.1+ | JSON/XML | Public APIs, simple CRUD, well-understood semantics | Need real-time; complex nested queries |
| GraphQL | HTTP | JSON | Mobile clients (minimise bytes), flexible frontends, aggregating multiple APIs | Simple CRUD; when caching is critical (no GET) |
| gRPC | HTTP/2 | Protobuf (binary) | Internal microservices, high throughput, streaming | Browser clients (needs grpc-web proxy) |
| WebSocket | TCP upgrade from HTTP | Any (text/binary) | Chat, live dashboards, multiplayer games, collaborative editing | Simple request-response; fire-and-forget |
| Server-Sent Events | HTTP | Text | One-way server push: notifications, live feeds, progress updates | Bidirectional; binary data |
| Webhooks | HTTP POST callback | JSON | Event notifications to third parties (Stripe payment events, GitHub PR hooks) | Real-time (client must have public endpoint) |
REST API Best Practices
Polling vs Long Polling vs WebSocket vs SSE
Short Polling — client asks "any updates?" every N seconds. Simple but wasteful: most responses are empty. Good for: email clients checking every 30s.
Long Polling — client asks, server holds the connection open until an update arrives, then responds; the client immediately re-polls. Simulates push over HTTP. Good for: notifications when WebSocket isn't available.
WebSocket — full-duplex persistent TCP connection after an HTTP upgrade handshake. Both client and server can send at any time. Best for: chat, games, live collaboration.
Server-Sent Events (SSE) — server pushes events to the client over a persistent HTTP GET. One-directional, with automatic reconnection built in. Good for: live timelines, sports scores, progress bars.
Async communication via queues is the backbone of scalable, resilient systems. It decouples services so they can fail and scale independently.
Why Queues? A Concrete Example
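As a minimal illustration (an in-process queue.Queue standing in for a real broker like Kafka or SQS): an order endpoint that enqueues a slow email send instead of doing it inline returns immediately, while a worker absorbs the latency in the background.

```python
import queue
import threading
import time

jobs = queue.Queue()    # stand-in for Kafka / RabbitMQ / SQS
sent = []

def send_email(order_id: int):
    time.sleep(0.05)    # pretend SMTP takes 50 ms
    sent.append(order_id)

def worker():
    while True:
        order_id = jobs.get()
        if order_id is None:        # shutdown sentinel
            break
        send_email(order_id)
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()

def place_order(order_id: int) -> str:
    jobs.put(order_id)  # enqueue and return — don't block on the email
    return "202 Accepted"

start = time.monotonic()
responses = [place_order(i) for i in range(10)]
elapsed = time.monotonic() - start
print(f"10 orders acknowledged in {elapsed * 1000:.1f} ms")  # ~0 ms, not ~500 ms
jobs.join()             # emails finish in the background
print(f"{len(sent)} confirmation emails sent")
```

The same decoupling also means the email worker can crash, redeploy, or fall behind without affecting order acceptance, which is the resilience argument made above.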
Kafka vs RabbitMQ vs SQS
| Feature | Apache Kafka | RabbitMQ | Amazon SQS |
|---|---|---|---|
| Model | Distributed log (topics/partitions) | Traditional broker (exchanges/queues) | Managed queue service |
| Message retention | Configurable (days/weeks), replayable | Until consumed | 14 days max |
| Throughput | Millions/sec per cluster | 20-50k msg/sec | Unlimited (managed) |
| Ordering | Per partition (strict) | Per queue | Standard: no; FIFO: yes |
| Routing | Simple (topic-based) | Complex (topic, direct, fanout, headers) | Simple |
| Consumer model | Pull; consumer groups share partitions | Push; competing consumers | Pull; long polling |
| Best for | Event sourcing, analytics pipelines, audit logs, stream processing | Task queues, RPC, complex routing, work distribution | Serverless, AWS ecosystem, simple queuing |
Kafka Deep Dive
CAP Theorem — The Real Story
During a network partition (unavoidable in any distributed system), you must choose:
CP (Consistency + Partition tolerance) — returns an error or waits rather than returning stale data. Sacrifices availability. Examples: HBase, ZooKeeper, etcd, MongoDB (with majority read/write concern). Use when: banking, inventory management (can't oversell).
AP (Availability + Partition tolerance) — returns the best available data (possibly stale). Sacrifices consistency. Examples: Cassandra, CouchDB, DynamoDB (default). Use when: social feeds, shopping carts, DNS — stale data is acceptable.
Distributed Patterns
Circuit Breaker — prevents calling a failing service. States: Closed (normal) → Open (fail fast after threshold) → Half-Open (probe after timeout). Prevents cascade failures. Used by: Netflix Hystrix, Resilience4j.
Saga — distributed transactions without 2-Phase Commit. A sequence of local transactions; on failure, run compensating transactions backwards. Two flavours: Choreography (events) or Orchestration (central coordinator).
Bulkhead — isolate failures like watertight compartments in a ship. Separate thread pools per dependency: if the payment service hangs, it doesn't exhaust threads for the search service.
Retry with Backoff — retry failed requests with exponential backoff + jitter. Without jitter, all clients retry in sync → stampede. Formula: wait = min(cap, base × 2^attempt) + rand()
Idempotency Keys — client sends a unique key per logical operation; the server deduplicates on that key. Critical for payment APIs — ensures a charge happens once even if the request is retried. Stripe and PayPal use this.
Transactional Outbox — write to the DB and an outbox table atomically. A separate process reads the outbox and publishes to the queue. Guarantees at-least-once message delivery without distributed transactions.
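The backoff-with-jitter formula above, sketched as a retry helper (the flaky function and its failure count are purely illustrative):

```python
import random
import time

def call_with_retry(fn, base=0.05, cap=2.0, max_attempts=5):
    """Retry fn on transient errors with exponential backoff plus jitter:
    wait = min(cap, base * 2^attempt) + rand()."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise                          # out of attempts: surface the error
            wait = min(cap, base * 2 ** attempt) + random.uniform(0, base)
            time.sleep(wait)

calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"

print(call_with_retry(flaky))   # "ok" — succeeds on the third attempt
```

The random term is what desynchronises clients; with "full jitter" variants the entire wait is randomised, which spreads retries even further.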
Consistent Hashing — Why It Matters
Virtual nodes (vnodes): each physical server owns multiple points on the ring. Distributes load evenly even with heterogeneous servers.
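A minimal ring with virtual nodes (MD5 for point placement is an arbitrary choice) demonstrating the ~1/n property: when a node joins, only the keys it takes over move, and they all move to the new node.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Hash ring with virtual nodes: each physical server owns many
    points, so load stays even and a new node steals only small arcs."""

    def __init__(self, servers, vnodes=100):
        self.ring = sorted(
            (self._hash(f"{server}#{v}"), server)
            for server in servers
            for v in range(vnodes)
        )
        self.points = [point for point, _ in self.ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def lookup(self, key: str) -> str:
        # First server clockwise from the key's position (wrapping around)
        i = bisect.bisect(self.points, self._hash(key)) % len(self.ring)
        return self.ring[i][1]

keys = [f"user:{i}" for i in range(10_000)]
ring3 = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
ring4 = ConsistentHashRing(["cache-a", "cache-b", "cache-c", "cache-d"])

moved = sum(1 for k in keys if ring3.lookup(k) != ring4.lookup(k))
print(f"{moved / len(keys):.0%} of keys moved")  # roughly 1/4, all to cache-d
```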
Storage Types
| Type | Description | Example | Use Case | Latency |
|---|---|---|---|---|
| In-Memory | RAM, volatile | Redis, Memcached | Cache, sessions, real-time leaderboards | ~1 µs |
| Block Storage | Raw volumes, OS-level | AWS EBS, GCP PD | Database storage, OS boot volumes | ~1 ms |
| File Storage (NAS) | Hierarchical filesystem | AWS EFS, NFS | Shared filesystems, CMS media libraries | ~5 ms |
| Object Storage | Flat key-value for blobs | AWS S3, GCS, R2 | Media, backups, ML datasets, static assets | ~50-100 ms |
| Cold/Archive | Tape/glacier, infrequent | S3 Glacier, GCS Archive | Compliance backups, audit logs | Minutes to hours |
CDN Architecture
- What to cache on CDN: Static assets (JS, CSS, images), HTML for SPAs, video segments, API responses with low variance
- What NOT to cache: Personalised responses, payment pages, admin dashboards, frequently changing data
- Cache-Control headers: max-age=31536000, immutable for versioned assets; no-cache for HTML
- Cache invalidation: content-hashed filenames (bundle.a3f9b2.js) are safe to cache forever and invalidate naturally on each deploy
Microservices split a system into small, independently deployable services. Not always better than a monolith — understand the trade-offs.
Monolith vs Microservices
| Dimension | Monolith | Microservices |
|---|---|---|
| Deployment | Deploy everything at once | Deploy services independently |
| Scaling | Scale everything or nothing | Scale only bottleneck services |
| Dev velocity (small team) | Fast — no network overhead | Slow — service contracts, infra complexity |
| Dev velocity (large team) | Slow — merge conflicts, coordination | Fast — teams own services independently |
| Failure isolation | One bug can take down everything | Failures contained to a service |
| Data | Shared DB — easy joins | Each service owns its DB — no direct joins |
| Debugging | Simple — single process | Hard — distributed traces needed |
| Start with | Always — extract later if needed | Only when team/scale demands it |
Service Communication
Synchronous (request/response) — Service A calls Service B and waits for the response. Simple, easy to reason about. Problem: if B is slow, A is slow. Creates temporal coupling.
Asynchronous (event-driven) — Service A publishes an event; Service B processes it when ready. Loose coupling: B can be down and catch up later. Problem: harder to trace, eventual consistency.
Service Mesh — a sidecar proxy handles mTLS, retries, circuit breaking, distributed tracing, and load balancing, transparently and without app code changes. The operations team's best friend.
API Gateway — single entry point: auth, rate limiting, routing, response aggregation, SSL termination. Clients don't need to know the internal service topology. Examples: Kong, AWS API Gateway, nginx.
Authentication Patterns
Session-Based — the server stores the session; the client sends a session ID cookie. Simple, revocable immediately. Problem: requires a shared session store (Redis) for horizontal scaling. Used by: traditional web apps.
JWT (JSON Web Tokens) — stateless: the server signs the payload, no storage needed. The client sends it in the Authorization header. Problem: can't revoke before expiry without a blocklist. Use short expiry (15 min) + refresh tokens.
OAuth 2.0 — delegated authorization. The "Login with Google" flow: the user authorises your app to access their Google data. A separate auth server issues tokens. Industry standard for third-party integrations.
API Keys — long-lived opaque tokens for machine-to-machine auth. Simple, no expiry complexity. Hash before storing (treat like passwords). Rate limit per key. Used by: Stripe, SendGrid, Twilio.
JWT Flow
Security Essentials Checklist
- Always use HTTPS — TLS everywhere, including internal services
- Hash passwords with bcrypt/Argon2 — never SHA1/MD5, never plain text
- Input validation — validate server-side, never trust client input
- SQL injection prevention — always use parameterised queries, never string concat
- Rate limiting on auth endpoints — prevent brute force; lock after N failures
- Secrets management — AWS Secrets Manager / Vault; never hardcode in code
- Principle of least privilege — DB user only has SELECT on read-only service
- CSRF protection — SameSite cookie attribute or CSRF tokens
You can't fix what you can't see. Observability is the ability to infer internal system state from external outputs.
The Three Pillars
Metrics — aggregated numbers over time: CPU%, RPS, error rate, P99 latency, cache hit rate. Tools: Prometheus (scraping) + Grafana (dashboards). Alert on SLO violations.
Logs — immutable, timestamped event records. Structured logs (JSON) are searchable. Ship to: ELK Stack (Elasticsearch + Logstash + Kibana) or Loki + Grafana. Include request IDs for correlation.
Traces — follow a single request across multiple services. Each span records: service, operation, duration, status, tags. Tools: Jaeger, Zipkin, AWS X-Ray. Critical for diagnosing microservice latency.
Key Metrics to Track
| Category | Metrics | Typical Alert Thresholds |
|---|---|---|
| Traffic | Requests/sec, active connections | Alert on sudden 2× spike or drop |
| Latency | P50, P95, P99 response time | P99 > 500ms for APIs, P99 > 200ms for search |
| Errors | 5xx rate, exception count, failed jobs | 5xx rate > 0.1% of traffic |
| Saturation | CPU%, memory%, disk%, queue depth | CPU > 80% sustained, queue > 1M items |
| Database | Query latency, connection pool usage, replication lag | Replica lag > 30s, pool usage > 90% |
| Cache | Hit rate, eviction rate, memory usage | Hit rate drops below 90% |
| Business | Orders/min, signups/hour, revenue/min | 50% drop in orders → page on-call |
These are the 12 most commonly asked system design problems. For each, understand the key decisions — not just the answer.
URL Shortener (Bitly)
Key characteristics: read-heavy (~100:1 read/write ratio), low latency required, writes ~100 URLs/sec.
- Key design choice: short code generation. Options: (a) auto-increment ID → base62 encode — simple, sequential but predictable; (b) random UUID → take 7 chars — unpredictable but needs collision check; (c) MD5(longURL) → take 7 chars — deterministic, same URL always same code
- Schema:
id, short_code (indexed), long_url, user_id, created_at, expires_at, click_count - Redirect type: 301 Permanent (browser caches → can't track clicks) vs 302 Temporary (always hits server → track every click) — Bitly uses 302
- Analytics: Async — write click events to Kafka, process in batches, store in Cassandra by (short_code, date)
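Short-code options (a) and (c) from the list above, sketched:

```python
import hashlib

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def base62_encode(n: int) -> str:
    """Option (a): auto-increment ID → base62. 62^7 ≈ 3.5 trillion codes."""
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, rem = divmod(n, 62)
        out.append(ALPHABET[rem])
    return "".join(reversed(out))

def hash_code(long_url: str, length: int = 7) -> str:
    """Option (c): deterministic MD5-derived code.
    Same URL always yields the same code; collisions must still be checked."""
    digest = int(hashlib.md5(long_url.encode()).hexdigest(), 16)
    return base62_encode(digest)[:length]

print(base62_encode(125))   # "21" (125 = 2*62 + 1)
print(hash_code("https://example.com/some/long/path"))   # 7-char code
```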
Twitter / Social Feed
Key challenges: the fan-out problem, the celebrity edge case, heavily read-dominated traffic.
- Timeline storage: Redis Sorted Set per user, score = timestamp, member = tweet_id. Fetch top 200 tweets → hydrate from tweet cache
- Who is a celebrity? Beyond a follower threshold, fan-out on write is too expensive, so switch those accounts to the pull model. A common interview answer is > 1M followers; Twitter reportedly used a much lower internal cutoff (~10K).
- Tweet storage: MySQL with Vitess sharding (Twitter's actual stack), sharded by tweet_id
- Media: Images/videos → S3 → CDN (CloudFront). Never stored in DB.
WhatsApp / Chat System
Key challenges: real-time delivery, offline handling, end-to-end encryption.
- Connection management: each user holds a WebSocket to one chat server. To route messages, store which server owns each user's connection in Redis: user_id → server_id
- Read receipts: a small event sent back through the queue: {type: "read", msg_id: X, user_id: Y, ts: T}
- Message ordering: logical timestamps / sequence IDs per conversation. Cassandra for message storage: partition key = conversation_id, clustering key = message_id (time-ordered)
- E2E encryption: Signal Protocol — keys generated on device, server never sees plaintext
YouTube / Video Platform
Key challenges: storage-heavy workload, transcoding pipeline, CDN delivery.
- Resumable uploads: clients upload in chunks (e.g. 5MB). The server tracks progress, and a failed upload resumes from the last confirmed chunk. The open tus protocol standardises this pattern; YouTube's upload API implements its own resumable scheme.
- Adaptive bitrate: HLS/DASH splits video into segments. Manifest file (.m3u8) lists available quality tiers. Player switches dynamically — smooth even on variable connections.
- View count: Not real-time. Batch updated from Kafka stream via Flink. Prevents race conditions at scale.
- Recommendation: Separate ML service. Reads from Cassandra (watch history) + Redis (trending). Serves pre-computed recommendations.
Uber / Ride-Sharing
Key challenges: geospatial indexing, real-time matching, event-driven architecture.
- Geohash: Encode (lat, lng) as string. "ww8p1r4t8" → nearby cells share prefix. Range query = prefix search. Redis GEO uses sorted set with geohash score internally.
- Surge pricing: Real-time supply/demand ratio per geohash cell. Stream processing on Kafka with Flink.
- ETA calculation: Not just distance — real-time traffic data from map service + historical patterns per time of day.
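A from-scratch geohash encoder showing the prefix property described above (the two coordinates below are arbitrary San Francisco points):

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"   # geohash alphabet (no a, i, l, o)

def geohash(lat: float, lon: float, precision: int = 9) -> str:
    """Binary-search each coordinate range, interleaving bits (longitude
    first), then pack every 5 bits into one base32 character."""
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    code, bits, bit_count, even = [], 0, 0, True
    while len(code) < precision:
        rng, val = (lon_range, lon) if even else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        bits <<= 1
        if val >= mid:
            bits |= 1
            rng[0] = mid     # keep the upper half
        else:
            rng[1] = mid     # keep the lower half
        even = not even
        bit_count += 1
        if bit_count == 5:
            code.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(code)

castro = geohash(37.7609, -122.4350)   # two points ~2 km apart
mission = geohash(37.7599, -122.4148)
print(castro, mission)   # nearby points usually share a long prefix
```

One caveat worth naming in an interview: points on opposite sides of a cell boundary can be physically close yet share almost no prefix, so real services query the neighbouring cells too.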
Google Drive / File Storage
Key challenges: sync across devices, deduplication, real-time collaboration.
Rate Limiter Service
Key challenges: distributed state, the token bucket algorithm.
A repeatable process for any design question. Interviewers evaluate your thinking process, not just the final diagram.
The 5-Step Framework (45 minutes)
Clarify Requirements (5 min)
Never start designing before asking questions. Functional requirements: what does the system DO? Non-functional: scale, latency SLA, availability, consistency needs, geographic distribution?
Questions to ask: "How many daily active users? What's the read/write ratio? Do we need global distribution? What's the acceptable latency? Does the feed need to be real-time?"
Capacity Estimation (5 min)
Estimate QPS (write + read), storage per day/year, bandwidth. These numbers drive your architecture choices. Don't skip this — it reveals if you need 1 server or 1,000.
High-Level Design (10 min)
Draw the major boxes: clients, DNS/CDN, load balancer, app servers, cache, DB, queues, object store. Connect them with arrows and label data flows. Get alignment before diving deep.
Deep Dive (15 min)
Pick the most interesting or complex component. Design the DB schema, API endpoints, or the core algorithm. The interviewer often directs you here. Show depth in the area that matters most.
Scale & Trade-offs (10 min)
Identify bottlenecks. How does the design handle 10× traffic? What are the failure modes? What did you trade off — and why? Great engineers articulate trade-offs, not just solutions.
Standard Component Checklist
- ☐ DNS + CDN for static assets
- ☐ Load balancer (L4 or L7)
- ☐ Stateless app servers (scale-out)
- ☐ Primary DB + read replicas
- ☐ Cache layer (Redis)
- ☐ Async queue for heavy tasks
- ☐ Object store for media (S3)
- ☐ Search index (Elasticsearch)
- ☐ Auth service / API gateway
- ☐ Rate limiter
- ☐ Notification service (push/email)
- ☐ Monitoring (Prometheus + Grafana)
- ☐ Distributed tracing (Jaeger)
- ☐ Feature flags (LaunchDarkly)
Avoid By Default (over-engineering red flags)
- ☐ Microservices for small teams
- ☐ Multiple DB types unless needed
- ☐ Custom consensus algorithms
- ☐ ML pipelines before product-market fit
- ☐ Multi-region before 1M users
System design is as much about how you think out loud as what you design. This section is about developing the mental models and communication skills that separate great engineers from good ones.
The Mental Model Ladder
Move through these levels for any component you're designing:
What does it do? (Function)
State the job in one sentence. Don't overcomplicate. If you can't explain it simply, you don't understand it yet.
Why this choice? (Trade-offs)
Every choice excludes alternatives. Name what you're trading off. This shows you've thought beyond "I know Redis".
What breaks at scale? (Failure modes)
Every design has weaknesses. Name them before the interviewer does. This demonstrates systems-level thinking.
What would you change with 10× load? (Evolution path)
Good designs have a clear growth path. You shouldn't need to redesign from scratch to handle more load.
Articulation Patterns to Practice
"I'm choosing X over Y because our workload is [read/write/latency] heavy. The downside of X is [Z], which I'll mitigate by [solution]. If requirements change towards [different constraint], Y would be better."
"For our initial scale of [N users], [simple solution] works. As we grow to [10N], the bottleneck becomes [component] because [reason]. I'd address that by [next level solution]."
"Before I dive in — can I check: is the primary concern here latency, throughput, or consistency? That'll change my approach significantly. For a payment system I'd prioritise consistency; for a social feed I'd accept eventual consistency for lower latency."
"This design has a weakness: [specific failure mode]. In production I'd add [mitigation]. I'm leaving it simplified for now — should I detail the production-grade version?"
Common Thinking Traps
| Trap | What It Looks Like | Fix |
|---|---|---|
| Gold plating | Designing a Kubernetes cluster with 12 microservices for a URL shortener MVP | Ask scale first. Match complexity to requirements. |
| Buzzword dropping | "I'd use blockchain and Kubernetes and Kafka and GraphQL" without reasoning | Every technology should solve a stated problem. Name the problem first. |
| Skipping the obvious | Jumping to sharding without considering "can a single Postgres handle this?" | Start simple. Postgres handles millions of rows fine with good indexing. |
| Ignoring failure modes | Designing the happy path only | For every component, ask "what happens when this fails?" |
| Premature optimisation | "We'll need CDN, multi-region, eventual consistency..." for day-1 product | Design for current scale, show awareness of future scaling steps. |
| Silent designing | Drawing boxes without explaining why | Think out loud. Narrate your decisions. The interviewer evaluates process, not just output. |
Building Intuition: The 5-Whys Drill
When you encounter any design decision, drill into the "why" five levels deep. This builds permanent mental models:
"Why does Uber use consistent hashing for driver location?"
- Why 1: To distribute driver keys across multiple Redis nodes.
- Why 2: Because one Redis instance can't hold 5M driver locations.
- Why 3: Wait — RAM alone isn't the constraint: 5M drivers × 100 bytes = 500MB, which fits on one node. The real drivers are write throughput, fault tolerance, and room to scale horizontally.
- Why 4: Because consistent hashing means adding a new node only remaps 1/N drivers, not all of them.
- Why 5: Because if adding a node invalidated all keys, you'd get a thundering herd hitting the DB during scaling events — exactly when you need stability most.
Deliberate practice beats passive reading. Here's a structured roadmap from beginner to advanced.
Learning Roadmap
Month 1 — Foundations
Read DDIA Chapters 1–6 (storage, encoding, replication, partitioning). Design a URL shortener and a key-value store from scratch. Watch ByteByteGo's fundamentals playlist. Build latency intuition.
Month 2 — Core Systems
Design Twitter, WhatsApp, and YouTube. Read Dynamo and Bigtable papers. Study consistent hashing, CAP theorem, and caching patterns deeply. Start mock interviews with peers.
Month 3 — Advanced Patterns
Read DDIA Chapters 7–12 (transactions, distributed systems). Design Google Drive, Uber, and a search engine. Study Raft consensus. Read Netflix/Discord/Airbnb engineering blogs.
Month 4+ — Mastery
Design novel systems (TypeAhead/Autocomplete, Notification Service, Ad Click Aggregator). Do 2 timed mock interviews/week. Read primary sources: papers, engineering blogs. Teach concepts to others.
Weekly Practice Template
5 hours/week split
- Monday (1 hr): Read one chapter of DDIA or one engineering blog post. Take notes.
- Wednesday (1.5 hr): Design one system from scratch. Timer = 45 min. Then compare to reference solution.
- Friday (1 hr): Mock interview with a peer. One person designs, one asks clarifying questions.
- Weekend (1.5 hr): Deep dive into one component from the week's design (e.g., how does Redis cluster work internally?).
The Solo Practice Loop
Pick a system
Use this list: URL Shortener → KV Store → Twitter → Pastebin → Instagram → WhatsApp → Uber → YouTube → Google Drive → TypeAhead → Notification Service → Web Crawler → Distributed Cache
Set a 45-minute timer and design without references
Use Excalidraw or paper. Write out your requirements, estimation, high-level design, and one deep dive. Narrate out loud as if in an interview. This is uncomfortable — that's the point.
Compare with the reference solution
ByteByteGo, Alex Xu's book, or Grokking. Note every component you missed or over-engineered. Write down 3 specific gaps.
Deep dive one gap
Spend 30 minutes understanding one thing you got wrong. Read the relevant section of DDIA, a blog post, or watch a video. Build the mental model from first principles.
Spaced repetition
Revisit each system 1 week later and 1 month later. Can you reproduce the design in 15 minutes? True mastery = fast retrieval, not recognition.
Real Engineering Blog Reading Strategy
Read engineering blogs — but read them analytically, not passively:
- Before reading: "What problem is this company solving? What's their scale?"
- During reading: "What trade-off did they make? What alternatives did they consider? What surprised me?"
- After reading: Write 3 bullet points from memory. If you can't, you didn't absorb it.
Know the landscape. In real systems and interviews, naming the right tools (and knowing why) demonstrates production experience.
Diagramming & Design
Databases & Storage
Message Queues & Streaming
Infrastructure & Orchestration
Observability & Monitoring
API Gateways & Load Balancers
Cloud Providers Quick Reference
| Category | AWS | Google Cloud | Azure |
|---|---|---|---|
| Compute | EC2, Lambda, ECS/EKS | GCE, Cloud Run, GKE | VMs, Functions, AKS |
| Object Storage | S3 | Cloud Storage | Blob Storage |
| Relational DB | RDS, Aurora | Cloud SQL, Spanner | Azure SQL, CosmosDB |
| NoSQL / Cache | DynamoDB, ElastiCache | Firestore, Memorystore | CosmosDB, Cache for Redis |
| Queue / Streaming | SQS, Kinesis, MSK | Pub/Sub, Dataflow | Service Bus, Event Hubs |
| CDN | CloudFront | Cloud CDN | Azure CDN |
| Load Balancer | ALB / NLB | Cloud Load Balancing | Application Gateway |
| Search | OpenSearch Service | Vertex AI Search | Azure Cognitive Search |
References & Resources
Designing Data-Intensive Applications
Martin Kleppmann (O'Reilly, 2017). The definitive resource. Deep dives on storage engines, replication, consistency, and stream processing. Read once, reference forever.
dataintensive.net →
System Design Interview Vol. 1 & 2
Alex Xu. Vol 1 covers 16 systems; Vol 2 covers 13 more complex ones. Step-by-step with diagrams. Best book for interview preparation.
Building Microservices (2nd Ed)
Sam Newman (O'Reilly). Comprehensive guide to microservice patterns: decomposition, communication, data management, observability.
ByteByteGo
Alex Xu's channel. Animated explainers of system design concepts and architecture patterns. Free, high quality, beginner-friendly.
youtube.com/@ByteByteGo →
Gaurav Sen
Deep conceptual explanations. Especially good for consistent hashing, messaging systems, and distributed system fundamentals.
youtube.com/@gkcs →
Hussein Nasser
Backend engineering, database internals, proxy servers, networking. Very practical and hands-on. Covers HTTP/2, gRPC, WebSocket deeply.
youtube.com/@hnasr →
Grokking System Design — Educative
Structured course with 20+ systems, interactive diagrams, and design considerations. Good for structured learners.
educative.io →
System Design Primer
Donne Martin's massive open-source resource. 270k+ stars. Covers everything with diagrams, Anki flashcards, and coding problems.
github.com/donnemartin →
High Scalability Blog
Real-world architecture breakdowns of how companies (Pinterest, Twitter, Netflix) built and scaled their systems.
highscalability.com →
ByteByteGo Newsletter
Weekly system design posts. Covers new topics not in the book. Excellent diagrams. Free tier available.
blog.bytebytego.com →
Amazon Dynamo (2007)
Foundational paper on highly available key-value storage. Popularised consistent hashing and vector clocks. Basis for Cassandra and DynamoDB.
allthingsdistributed.com →
Google Bigtable (2006)
Describes the wide-column storage system behind Google's web indexing. Foundation for HBase and Cassandra's data model.
research.google →
Raft Consensus Algorithm
"In Search of an Understandable Consensus Algorithm" — Diego Ongaro & John Ousterhout, 2014. Powers etcd (the Kubernetes backbone), CockroachDB, and TiKV. Far more readable than Paxos.
raft.github.io →
Google Spanner (2012)
Globally distributed SQL database with external consistency. Uses TrueTime (GPS/atomic clocks) to achieve linearisability at global scale.
research.google →
Netflix Tech Blog
Chaos engineering, streaming at scale, microservices patterns, Hystrix. Essential reading for resilience and large-scale system design.
netflixtechblog.com →
Discord Engineering Blog
Running WebSockets at millions of concurrent connections, migrating from Cassandra to ScyllaDB, storing billions of messages. Excellent case studies.
discord.com/blog →