
System Design Fundamentals: Complete Terminology Guide for Beginners

107 min read
Pawan Kumar
#System Design #Terminology #Fundamentals #Architecture #Interview Prep #Distributed Systems

I remember my first system design interview. The interviewer asked, “How would you design Instagram?” I froze. Not because I didn’t use Instagram daily, but because I didn’t know where to start. Should I talk about databases? Load balancers? Microservices? The terminology alone felt like a foreign language.

I nodded along when the interviewer mentioned “eventual consistency” and “horizontal scaling,” pretending I understood. I didn’t get the job. That failure taught me something valuable: system design isn’t about memorizing solutions—it’s about understanding the vocabulary and knowing when to use each concept.

Three years later, I’m now the one conducting these interviews. I see the same confusion in candidates’ eyes that I once had. Here’s what I wish someone had told me: system design has a finite set of building blocks. Once you understand these core concepts and their terminology, designing any system becomes a matter of combining the right pieces.

This guide is your complete reference. We’ll cover every essential term, explain what it means in plain English, show you real-world examples, and help you understand when to use each concept. Think of this as your system design dictionary—bookmark it, reference it, and watch these terms become second nature.


What is System Design?

Let’s start with the basics. System design is the process of defining the architecture, components, modules, interfaces, and data for a system to satisfy specified requirements.

In simpler terms? It’s figuring out how to build software that works at scale. Not just for 100 users, but for millions. Not just for today, but for years to come.

Why does it matter?

When Netflix streams to 200 million subscribers simultaneously, that’s system design. When Google returns search results in 0.2 seconds from billions of web pages, that’s system design. When Uber matches you with a driver in seconds across a city of millions, that’s system design.

Companies don’t just want engineers who can write code—they want engineers who can architect systems that handle real-world complexity. That’s why system design interviews are standard at companies like Google, Amazon, Facebook, and Netflix.

What makes system design challenging?

You’re not building for perfect conditions. You’re building for:

  • Servers that crash
  • Networks that fail
  • Traffic that spikes unexpectedly
  • Data that grows exponentially
  • Users spread across the globe
  • Budgets that aren’t unlimited

System design is about making informed trade-offs. Every decision has consequences. Choose consistency over availability? Your system might go down during network partitions. Choose availability over consistency? Users might see stale data. There’s no perfect solution—only solutions that fit your specific requirements.

Let’s start building your vocabulary.


Requirements Analysis

🎯 Foundation of Every System

Before designing any system, you need to understand what you're building. Requirements fall into two categories: functional and non-functional.

Functional Requirements

What the System Should Do

Functional requirements define what the system should do. These are the features and behaviors users interact with.

Think of it as: The “what” of your system.

Examples for Twitter:

  • Users can post tweets (280 characters)
  • Users can follow other users
  • Users can see a timeline of tweets from people they follow
  • Users can like and retweet
  • Users can search for tweets and users

Examples for Uber:

  • Riders can request rides
  • Drivers can accept ride requests
  • Real-time location tracking
  • Fare calculation
  • Payment processing

Why it matters: Functional requirements determine your data model, APIs, and core features. Get these wrong and you’re building the wrong product.

Real-world example: When Instagram added Stories, that was a new functional requirement. They had to design storage for temporary content, build a new API, and handle the increased traffic.

Non-Functional Requirements

How Well the System Should Perform

Non-functional requirements define how the system should perform. These are the quality attributes that make your system production-ready.

Think of it as: The “how well” of your system.

Key Non-Functional Requirements:

1. Performance

  • Latency: How fast does the system respond? (Target: < 200ms for web, < 100ms for mobile)
  • Throughput: How many requests can it handle per second?

Example: Google Search must return results in under 0.5 seconds. That’s a performance requirement.

2. Scalability

  • Can the system handle growth?
  • 1,000 users today, 1 million next year?

Example: Instagram went from 25,000 users at launch to 1 million in 2 months. Their system had to scale 40x.

3. Availability

  • What percentage of time is the system operational?
📊 The Nines of Availability

  • 99.9% ("three nines"): 8.76 hours of downtime per year
  • 99.99% ("four nines"): 52.56 minutes of downtime per year
  • 99.999% ("five nines"): 5.26 minutes of downtime per year

Example: AWS promises 99.99% availability for S3. That’s their SLA (Service Level Agreement).
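The downtime figures above follow directly from the arithmetic. A small Python sketch to reproduce them (the helper name is illustrative):

```python
def downtime_per_year(availability_pct: float) -> str:
    """Convert an availability percentage into yearly downtime."""
    hours = (1 - availability_pct / 100) * 365 * 24
    if hours >= 1:
        return f"{hours:.2f} hours"
    return f"{hours * 60:.2f} minutes"

print(downtime_per_year(99.9))    # → 8.76 hours
print(downtime_per_year(99.99))   # → 52.56 minutes
print(downtime_per_year(99.999))  # → 5.26 minutes
```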

4. Reliability

  • Does the system work correctly even when things fail?
  • Can it recover from crashes?

Example: Netflix’s Chaos Monkey, part of its Simian Army suite, randomly terminates instances in production to test that services can withstand failures.

5. Consistency

  • Do all users see the same data?
  • How quickly do updates propagate?

Example: Bank transactions need strong consistency. If you transfer $100, both accounts must update or neither does.

6. Security

  • Is data protected from unauthorized access?
  • Are communications encrypted?

Example: WhatsApp uses end-to-end encryption. Even WhatsApp can’t read your messages.

7. Maintainability

  • How easy is it to fix bugs and add features?
  • Is the code well-organized?

Example: Airbnb moved from monolith to microservices to improve maintainability. Now teams can deploy independently.

Why it matters: Non-functional requirements drive your architecture decisions. Need low latency? You’ll need caching and CDNs. Need high availability? You’ll need redundancy and failover.

Real-world trade-off: Facebook chose availability over consistency for likes. When you like a post, it might not appear immediately to everyone. That’s eventual consistency—they prioritized keeping the system available over instant consistency.


Design Levels: HLD vs LLD

System design operates at two levels of abstraction. Understanding the difference is crucial for interviews and real-world projects.

High-Level Design (HLD)

What it is: The big picture architecture showing major components and how they interact.

Focus areas:

  • System components (servers, databases, caches, load balancers)
  • Data flow between components
  • Technology choices (SQL vs NoSQL, REST vs GraphQL)
  • Scalability patterns
  • Infrastructure layout

Think of it as: The blueprint of a house showing rooms, doors, and how they connect.

What you define in HLD:

  • Client applications (web, mobile)
  • API servers
  • Load balancers
  • Application servers
  • Caching layer
  • Database architecture
  • Message queues
  • External services (CDN, payment gateway)

Real-world example: Netflix’s HLD shows:

  • CDN for video delivery (CloudFront)
  • Microservices for different features
  • Cassandra for data storage
  • Kafka for event streaming
  • Elasticsearch for search
  • Redis for caching

When you need HLD:

  • System design interviews (80% of time spent here)
  • Architecture reviews
  • Planning new systems
  • Explaining system to stakeholders

HLD deliverables:

  • Architecture diagrams
  • Component interaction flows
  • Technology stack decisions
  • Capacity planning estimates

Low-Level Design (LLD)

What it is: Detailed design of individual components, including classes, methods, and algorithms.

Focus areas:

  • Class diagrams and relationships
  • API contracts and data models
  • Database schemas (tables, columns, indexes)
  • Algorithm implementations
  • Design patterns (Singleton, Factory, Observer)
  • Error handling strategies

Think of it as: The detailed electrical and plumbing plans for each room in the house.

What you define in LLD:

  • Class structures and inheritance
  • Method signatures and parameters
  • Data structures (arrays, hash maps, trees)
  • API endpoints and request/response formats
  • Database table schemas
  • Caching keys and expiration policies
  • Error codes and exception handling

Real-world example: For Netflix’s recommendation service, LLD defines:

  • RecommendationEngine class
  • getUserRecommendations(userId, limit) method
  • Collaborative filtering algorithm
  • UserPreference data model
  • Database schema for storing viewing history
  • Caching strategy for recommendations

When you need LLD:

  • Implementation planning
  • Code reviews
  • Technical specifications
  • Detailed documentation

LLD deliverables:

  • Class diagrams (UML)
  • Sequence diagrams
  • Database ER diagrams
  • API documentation
  • Pseudocode or actual code

HLD vs LLD: Key Differences

| Aspect | High-Level Design (HLD) | Low-Level Design (LLD) |
|---|---|---|
| Scope | Entire system | Individual components |
| Audience | Architects, stakeholders | Developers, engineers |
| Detail level | Abstract, conceptual | Concrete, implementation |
| Focus | What the components are and how they connect | How each component works internally |
| Time in interview | ~80% | ~20% |
| Example | "We'll use Redis for caching" | "Cache key format: user:{id}:timeline" |

💡 Interview tip: Start with HLD. Only dive into LLD when interviewer asks or when you've covered the high-level architecture completely.


Core System Design Concepts

🏗️ Essential Building Blocks

Now let's dive into the essential building blocks. Each concept solves a specific problem. Understanding when and why to use each one is key.

A. Scalability

Scalability is your system's ability to handle growth. Can it serve 10 users? Great. Can it serve 10 million? That's scalability.

[Diagram: vertical scaling (adding more power to one server) versus horizontal scaling (adding more servers)]

⬆️ Vertical Scaling

Scale Up - Add more power

✅ Pros:

  • Simple - no code changes
  • No coordination complexity
  • Easier to maintain

❌ Cons:

  • Physical limits
  • Expensive at high end
  • Single point of failure

↔️ Horizontal Scaling

Scale Out - Add more machines

✅ Pros:

  • Nearly unlimited scaling
  • No single point of failure
  • Cost-effective

❌ Cons:

  • More complex
  • Requires stateless architecture
  • Network overhead

Vertical Scaling (Scale Up)

What it is: Adding more power to your existing machine—more CPU, more RAM, faster disk.

How it works: You have one server with 4GB RAM. It’s slow. You upgrade to 32GB RAM. Same server, more power.

Real-world examples:

  • Stack Overflow ran on a single powerful server for years before needing multiple servers
  • Early-stage startups often start with vertical scaling—it’s simpler

Pros:

  • Simple—no code changes needed
  • No complexity in coordination
  • Works immediately
  • Easier to maintain (one machine)

Cons:

  • Physical limits—you can’t infinitely upgrade one machine
  • Expensive at high end (diminishing returns)
  • Single point of failure
  • Downtime during upgrades

When to use: Early stages, when traffic is predictable, when simplicity matters more than unlimited scale.

Cost example: AWS EC2 instance

  • t3.small (2GB RAM): $15/month
  • t3.xlarge (16GB RAM): $120/month
  • t3.2xlarge (32GB RAM): $240/month

Horizontal Scaling (Scale Out)

What it is: Adding more machines to handle increased load. Instead of one powerful server, use many smaller servers.

How it works: You have one server handling 1,000 requests/sec. Add 9 more servers and you can handle 10,000 requests/sec.

Real-world examples:

  • Netflix runs on thousands of AWS servers
  • Instagram uses hundreds of servers behind load balancers
  • Google has millions of servers worldwide

Pros:

  • Nearly unlimited scaling—just add more servers
  • No single point of failure
  • Cost-effective—use many cheap servers
  • Can scale gradually

Cons:

  • More complex—need load balancers, session management
  • Requires stateless architecture
  • Network overhead
  • More operational complexity

When to use: When you need to scale beyond one machine’s capacity, when you need high availability, when traffic is unpredictable.

Key requirement: Your application must be stateless (we’ll cover this later).

Auto-Scaling

What it is: Automatically adding or removing servers based on demand.

How it works:

  • Monitor metrics (CPU usage, request count)
  • When CPU > 70%, add more servers
  • When CPU < 30%, remove servers
  • Pay only for what you use

Real-world examples:

  • Uber auto-scales during rush hour (10x traffic spike)
  • E-commerce sites auto-scale during Black Friday
  • News sites auto-scale when breaking news hits

Pros:

  • Cost-efficient—don’t pay for idle servers
  • Handles unexpected traffic spikes
  • No manual intervention needed

Cons:

  • Requires careful configuration
  • Scaling takes time (1-5 minutes)
  • Can be expensive if misconfigured
  • Need to handle scaling events gracefully

Configuration example:

Min servers: 2
Max servers: 50
Scale up when: CPU > 70% for 5 minutes
Scale down when: CPU < 30% for 10 minutes
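The policy above can be sketched as a simple decision function. This is a minimal illustration, not any specific cloud provider's API; the thresholds and step sizes mirror the configuration shown:

```python
def scaling_decision(cpu_samples, current, min_servers=2, max_servers=50):
    """Return the new server count for a window of CPU readings (%).

    Scale up on sustained high CPU, scale down on sustained low CPU,
    and always stay within the configured bounds.
    """
    avg = sum(cpu_samples) / len(cpu_samples)
    if avg > 70:
        return min(current + 2, max_servers)   # add capacity aggressively
    if avg < 30:
        return max(current - 1, min_servers)   # shed capacity cautiously
    return current

print(scaling_decision([85, 90, 78], current=4))  # → 6
```

Real autoscalers also apply cooldown periods so a scale-up is not immediately undone by the next evaluation window.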

B. Load Distribution

When you have multiple servers, you need something to distribute traffic between them.

Load Balancer

What it is: A server that sits in front of your application servers and distributes incoming requests across them.

[Diagram: a load balancer distributing client requests across multiple application servers, handling round-robin distribution, health checks, and SSL termination]

How it works:

  1. Client sends request to load balancer
  2. Load balancer picks a server using an algorithm
  3. Request is forwarded to chosen server
  4. Server processes and responds
  5. Load balancer returns response to client

Load Balancing Algorithms:

🔄 Round Robin

Send request 1 to server A, request 2 to server B, request 3 to server C, repeat. Simple and fair.

📊 Least Connections

Send to server with fewest active connections. Better for long-lived connections.

⚡ Least Response Time

Send to server with fastest response time. Adapts to server performance.

🔑 IP Hash

Hash client IP to determine server. Same client always goes to same server.
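Three of these algorithms can be sketched in a few lines of Python. The server names and connection counts are made up for illustration:

```python
import hashlib
import itertools

servers = ["server-a", "server-b", "server-c"]

# Round robin: cycle through the servers in order.
rr = itertools.cycle(servers)
def round_robin():
    return next(rr)

# Least connections: pick the server with the fewest active connections.
active = {"server-a": 12, "server-b": 3, "server-c": 7}
def least_connections():
    return min(active, key=active.get)

# IP hash: the same client IP always maps to the same server.
def ip_hash(client_ip: str):
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]

print(least_connections())                               # → server-b
print(ip_hash("203.0.113.9") == ip_hash("203.0.113.9"))  # → True
```

Production load balancers (NGINX, HAProxy, AWS ELB) implement these same ideas with far more care around connection draining and health state.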

Real-world examples:

  • Netflix uses Elastic Load Balancing (AWS) to distribute across thousands of servers
  • Cloudflare load balances across global data centers
  • GitHub uses load balancers to handle millions of git operations

Health Checks: Load balancers ping servers every few seconds. If a server doesn’t respond, it’s removed from rotation.

Example health check:

Endpoint: /health
Interval: 5 seconds
Timeout: 2 seconds
Unhealthy threshold: 2 consecutive failures
Healthy threshold: 2 consecutive successes

Types of Load Balancers:

1. Layer 4 (Transport Layer)

  • Routes based on IP and port
  • Fast but less flexible
  • Can’t inspect HTTP headers

2. Layer 7 (Application Layer)

  • Routes based on HTTP headers, cookies, URL path
  • More flexible
  • Can do SSL termination
  • Slightly slower

Pros:

  • Distributes load evenly
  • Provides redundancy
  • Enables zero-downtime deployments
  • Can route based on rules

Cons:

  • Single point of failure (need redundant load balancers)
  • Adds latency (small)
  • Additional cost

Session Persistence Problem: User logs in on Server A. Next request goes to Server B. User appears logged out.

Solution: Sticky sessions (IP hash) or external session storage (Redis).


C. Data Management

How you store and retrieve data determines your system’s capabilities and limitations.

Database Types

🗄️ SQL (Relational)

Structured data with predefined schemas

Examples:

PostgreSQL, MySQL, Oracle, SQL Server

✅ When to use:

  • Complex relationships
  • Need ACID transactions
  • Structured, predictable data
  • Complex queries with JOINs

Real-world: Banks, E-commerce, SaaS apps

📦 NoSQL (Non-Relational)

Flexible schema optimized for specific use cases

Examples:

MongoDB, Redis, Cassandra, DynamoDB

✅ When to use:

  • Need horizontal scalability
  • Flexible/evolving schema
  • Simple access patterns
  • High write throughput

Real-world: Facebook, Netflix, Twitter

[Diagram: SQL vs NoSQL decision tree. Need ACID transactions or complex data relationships? Use SQL (PostgreSQL, MySQL). Need massive horizontal scale? Use NoSQL (Cassandra, MongoDB). Otherwise, either works.]

Types:

1. Document Stores (MongoDB, CouchDB)

  • Store JSON-like documents
  • Flexible schema
  • Good for content management

2. Key-Value Stores (Redis, DynamoDB)

  • Simple key-value pairs
  • Extremely fast
  • Good for caching, sessions

3. Column-Family (Cassandra, HBase)

  • Store data in columns
  • Good for time-series data
  • Scales horizontally easily

4. Graph Databases (Neo4j, Amazon Neptune)

  • Store relationships
  • Good for social networks
  • Fast relationship queries

Real-world examples:

  • Facebook uses Cassandra for messaging
  • Netflix uses Cassandra for viewing history
  • Twitter uses Manhattan (key-value) for tweets
  • LinkedIn uses Voldemort for member data

Pros:

  • Scales horizontally easily
  • Flexible schema
  • Optimized for specific use cases
  • High performance for simple queries

Cons:

  • Weaker consistency guarantees
  • Limited query flexibility
  • No JOINs (denormalize data)
  • Eventual consistency

Database Indexing

What it is: A data structure that improves query speed by creating a lookup table.

How it works: Like a book’s index—instead of reading every page to find “Redis,” you look it up in the index and jump to the right page.

Without index:

SELECT * FROM users WHERE email = 'user@example.com';
-- Scans all 10 million rows: 2000ms

With index:

CREATE INDEX idx_email ON users(email);
SELECT * FROM users WHERE email = 'user@example.com';
-- Uses B-tree index: 5ms (400x faster!)

Index types:

1. B-Tree Index (most common)

  • Balanced tree structure
  • Good for range queries
  • Default in most databases

2. Hash Index

  • Fast for exact matches
  • Can’t do range queries
  • Good for equality checks

3. Full-Text Index

  • For text search
  • Supports partial matches
  • Used by search engines

Real-world examples:

  • LinkedIn indexes profiles by name, company, skills
  • Amazon indexes products by category, price, rating
  • Gmail indexes emails for instant search

Pros:

  • Dramatically faster queries (10-1000x)
  • Essential for large datasets
  • Enables complex queries

Cons:

  • Slower writes (must update index)
  • Uses storage space
  • Need to choose columns carefully

Best practices:

  • Index columns used in WHERE clauses
  • Index foreign keys
  • Index columns used in ORDER BY
  • Don’t over-index (slows writes)

Database Replication

What it is: Copying data across multiple database servers.

[Diagram: primary-replica replication. App servers send writes to the primary database only; the primary replicates changes to three read-only replicas, which serve all reads. Benefits: scales reads, high availability. Trade-offs: replication lag, eventual consistency.]

Primary-Replica Pattern:

  • One primary database handles all writes
  • Multiple replicas handle reads
  • Primary replicates changes to replicas

How it works:

  1. Write goes to primary
  2. Primary updates its data
  3. Primary sends changes to replicas
  4. Replicas update their data
  5. Reads go to replicas

Real-world examples:

  • YouTube replicates video metadata globally
  • Instagram uses read replicas for timeline queries
  • Reddit uses replicas to handle millions of reads

Replication types:

1. Synchronous Replication

  • Primary waits for replica confirmation
  • Strong consistency
  • Slower writes

2. Asynchronous Replication

  • Primary doesn’t wait
  • Faster writes
  • Eventual consistency
  • Replication lag (milliseconds to seconds)

Pros:

  • Scales read capacity (add more replicas)
  • Provides backup if primary fails
  • Can place replicas near users (lower latency)

Cons:

  • Replication lag (replicas might be behind)
  • Doesn’t scale writes (still one primary)
  • Complexity in failover

Failover: If primary fails, promote a replica to primary.

Database Sharding

What it is: Splitting your database across multiple machines, each holding a subset of data.

How it works: Instead of one database with 1 billion users, have 10 databases with 100 million users each.

Sharding strategies:

1. Hash-Based Sharding

shard = hash(user_id) % num_shards
  • Even distribution
  • Hard to add shards later

2. Range-Based Sharding

Shard 1: users 0-100M
Shard 2: users 100M-200M
  • Easy to add shards
  • Risk of hotspots

3. Geographic Sharding

US users → US shard
EU users → EU shard
  • Lower latency
  • Uneven distribution
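The first two strategies can be sketched directly. The shard counts and ID ranges are illustrative, not from any particular system:

```python
def hash_shard(user_id: int, num_shards: int = 10) -> int:
    # Even distribution, but adding shards later remaps most keys.
    return hash(user_id) % num_shards

# Range-based: each shard owns a contiguous slice of the ID space.
RANGES = [
    (0, 100_000_000, "shard-1"),
    (100_000_000, 200_000_000, "shard-2"),
]

def range_shard(user_id: int) -> str:
    for low, high, shard in RANGES:
        if low <= user_id < high:
            return shard
    raise ValueError(f"no shard covers id {user_id}")

print(hash_shard(1234))            # → 4
print(range_shard(150_000_000))    # → shard-2
```

The remapping problem with naive `hash % N` is why real systems often use consistent hashing, which moves only a fraction of keys when a shard is added.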

Real-world examples:

  • Instagram shards by user ID
  • Discord shards by server ID
  • Uber shards by geographic region

Pros:

  • Scales writes horizontally
  • Breaks through single-database limits
  • Can handle massive datasets

Cons:

  • Complex queries across shards
  • Rebalancing is painful
  • Hotspots if data isn’t evenly distributed
  • Can’t do JOINs across shards

Challenges:

  • Cross-shard queries: Expensive, avoid if possible
  • Distributed transactions: Very complex
  • Resharding: Moving data between shards

D. Caching

What it is: Storing frequently accessed data in fast memory (RAM) to avoid slow database queries.

Why it matters: Database queries take 10-100ms. Cache lookups take 1ms. That’s 10-100x faster.

[Diagram: multi-layer caching. A request checks the browser cache (~0ms), then the CDN edge cache (20-50ms), then Redis on the server side (1-5ms), and only on a full miss queries the database (10-100ms).]

Cache hierarchy:

1. Client-Side Cache

  • Browser cache
  • Mobile app cache
  • Fastest (no network)

2. CDN Cache

  • Edge servers worldwide
  • Static content (images, videos, CSS)

3. Server-Side Cache

  • Redis, Memcached
  • Application data

4. Database Cache

  • Query result cache
  • Built into database

Caching strategies:

1. Cache-Aside (Lazy Loading)

1. Check cache
2. If miss, query database
3. Store in cache
4. Return data
  • Most common pattern
  • Cache only what’s needed

2. Write-Through

1. Write to cache
2. Write to database
3. Return success
  • Cache always consistent
  • Slower writes

3. Write-Back (Write-Behind)

1. Write to cache
2. Return success
3. Async write to database
  • Fastest writes
  • Risk of data loss

4. Write-Around

1. Write to database
2. Invalidate cache
3. Next read loads from DB
  • Avoids cache pollution
  • First read after write is slow
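Cache-aside, the most common of these strategies, looks like this in a minimal Python sketch. A plain dict stands in for Redis, and `db_get_user` is a hypothetical stand-in for a slow database query:

```python
import time

cache = {}           # in-memory stand-in for Redis
TTL_SECONDS = 300    # entries older than this count as misses

def db_get_user(user_id):
    # Stand-in for a slow database query.
    return {"id": user_id, "name": f"user-{user_id}"}

def get_user(user_id):
    """Cache-aside: check the cache first, fall back to the database."""
    entry = cache.get(user_id)
    if entry and time.time() - entry["cached_at"] < TTL_SECONDS:
        return entry["value"]                       # cache hit
    value = db_get_user(user_id)                    # cache miss: query the DB
    cache[user_id] = {"value": value, "cached_at": time.time()}
    return value

print(get_user(42))  # first call misses and populates the cache
print(get_user(42))  # second call is served from the cache
```

With a real Redis client the shape is the same: `GET`, and on a miss, query the database then `SET` with an expiry.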

Cache eviction policies:

1. LRU (Least Recently Used)

  • Remove least recently accessed items
  • Most common
  • Good for general use

2. LFU (Least Frequently Used)

  • Remove least frequently accessed items
  • Good for stable access patterns

3. FIFO (First In First Out)

  • Remove oldest items
  • Simple but not optimal

4. TTL (Time To Live)

  • Items expire after time
  • Good for time-sensitive data
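LRU, the most common of these policies, can be sketched with an ordered dictionary. This is a teaching sketch of the eviction idea, not a production cache:

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: evicts the least recently used key at capacity."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)        # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the LRU entry

lru = LRUCache(2)
lru.put("a", 1)
lru.put("b", 2)
lru.get("a")             # "a" is now most recently used
lru.put("c", 3)          # capacity exceeded: evicts "b"
print(lru.get("b"))      # → None
```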

Real-world examples:

  • Reddit caches front page in Redis
  • Twitter caches timelines
  • Amazon caches product pages
  • Netflix caches user preferences

Cache invalidation (the hard part):

Problem: How do you keep cache and database in sync?

Strategies:

  1. TTL: Cache expires after time (5 minutes)
  2. Event-based: Invalidate on updates
  3. Version-based: Include version in cache key

Famous quote: “There are only two hard things in Computer Science: cache invalidation and naming things.” - Phil Karlton

Pros:

  • Dramatically faster reads
  • Reduces database load
  • Improves user experience

Cons:

  • Cache invalidation complexity
  • Stale data risk
  • Memory is expensive
  • Added complexity

Cache hit ratio: Percentage of requests served from cache. Aim for 80%+.


E. Content Delivery

CDN (Content Delivery Network)

What it is: A network of servers distributed globally that cache and serve static content from locations close to users.

How it works:

  1. User in Tokyo requests image
  2. CDN routes to nearest edge server (Tokyo)
  3. If cached, serve immediately (20ms)
  4. If not cached, fetch from origin (200ms), cache, serve
  5. Next user gets cached version (20ms)

What CDNs cache:

  • Images, videos
  • CSS, JavaScript files
  • Fonts
  • Static HTML pages
  • API responses (sometimes)

Real-world examples:

  • Netflix stores popular shows on CDN servers in every major city
  • YouTube uses Google’s CDN for video delivery
  • Spotify caches popular songs on edge servers
  • Instagram serves images via CDN

CDN providers:

  • Cloudflare
  • AWS CloudFront
  • Akamai
  • Fastly
  • Google Cloud CDN

Pros:

  • Dramatically lower latency (10x faster)
  • Reduces origin server load
  • Handles traffic spikes
  • DDoS protection

Cons:

  • Costs money (per GB transferred)
  • Cache invalidation complexity
  • Not useful for dynamic content
  • Initial request is slow (cache miss)

Performance impact:

  • Without CDN: User in Australia → US server = 200ms
  • With CDN: User in Australia → Sydney edge = 20ms

Cache invalidation:

  • Set TTL (time to live)
  • Purge cache manually
  • Use versioned URLs (style.v2.css)

F. Communication Patterns

How services talk to each other matters.

REST APIs

What it is: HTTP-based communication using standard methods (GET, POST, PUT, DELETE).

How it works:

GET /users/123          → Get user
POST /users             → Create user
PUT /users/123          → Update user
DELETE /users/123       → Delete user

Real-world examples:

  • Stripe payment API
  • Twitter API
  • GitHub API
  • Most web APIs

Pros:

  • Universal standard
  • Stateless
  • Cacheable
  • Simple to understand

Cons:

  • Can be chatty (multiple requests)
  • Over-fetching or under-fetching data
  • No real-time support

GraphQL

What it is: Query language that lets clients request exactly the data they need.

How it works:

query {
  user(id: 123) {
    name
    email
    posts {
      title
      likes
    }
  }
}

Real-world examples:

  • GitHub API v4
  • Shopify API
  • Facebook (created GraphQL)

Pros:

  • Single request for related data
  • No over-fetching
  • Strong typing
  • Self-documenting

Cons:

  • More complex server implementation
  • Caching is harder
  • Can be abused (expensive queries)

WebSockets

What it is: Persistent two-way connection between client and server.

How it works:

  1. Client opens WebSocket connection
  2. Connection stays open
  3. Server can push data anytime
  4. Client can send data anytime

Real-world examples:

  • Slack real-time messaging
  • Trading platforms live price updates
  • Multiplayer games real-time state
  • Collaborative editing (Google Docs)

Pros:

  • Real-time communication
  • Low latency
  • Bi-directional
  • Efficient (no polling)

Cons:

  • Harder to scale (stateful)
  • More complex infrastructure
  • Firewall issues

gRPC

What it is: High-performance RPC framework using Protocol Buffers.

How it works:

  • Define service in .proto file
  • Generate client/server code
  • Binary protocol (faster than JSON)

Real-world examples:

  • Google internal services
  • Netflix microservices
  • Uber service communication

Pros:

  • Very fast (binary)
  • Strong typing
  • Bi-directional streaming
  • Code generation

Cons:

  • Not human-readable
  • Less browser support
  • Steeper learning curve


G. Asynchronous Processing

Not everything needs to happen immediately. Some tasks can wait.

Message Queues

What it is: A buffer that stores messages between services for asynchronous processing.

How it works:

  1. Producer sends message to queue
  2. Message waits in queue
  3. Consumer picks up message when ready
  4. Consumer processes message
  5. Consumer acknowledges completion
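The producer/consumer flow above can be sketched with Python's standard-library queue; a real deployment would use Kafka, RabbitMQ, or SQS, but the shape is the same:

```python
import queue
import threading

tasks = queue.Queue()

def producer():
    # Enqueue work and return immediately; processing happens elsewhere.
    for user_id in (1, 2, 3):
        tasks.put({"type": "send_email", "user_id": user_id})

def consumer(results):
    while True:
        msg = tasks.get()
        if msg is None:                     # sentinel: shut down
            break
        results.append(msg["user_id"])      # "process" the message
        tasks.task_done()                   # acknowledge completion

results = []
worker = threading.Thread(target=consumer, args=(results,))
worker.start()
producer()
tasks.join()       # block until every message is acknowledged
tasks.put(None)    # tell the worker to exit
worker.join()
print(results)     # → [1, 2, 3]
```

The acknowledgement step matters: brokers redeliver unacknowledged messages, which is how failed tasks get retried.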

Popular message queues:

  • Kafka - High throughput, distributed
  • RabbitMQ - Feature-rich, reliable
  • AWS SQS - Managed, simple
  • Redis - Fast, simple

Real-world examples:

  • YouTube queues video processing (transcoding, thumbnails)
  • Uber queues ride matching and notifications
  • Airbnb queues email sending
  • LinkedIn queues feed updates

Use cases:

  • Email sending
  • Image processing
  • Report generation
  • Data analytics
  • Notifications
  • Background jobs

Pros:

  • Decouples services
  • Handles traffic spikes (queue buffers)
  • Retry failed tasks
  • Scales independently

Cons:

  • Adds latency (not instant)
  • Requires queue management
  • Eventual consistency
  • More complex debugging

Patterns:

1. Point-to-Point

  • One producer, one consumer
  • Message consumed once

2. Pub/Sub (Publish-Subscribe)

  • One producer, multiple consumers
  • Message consumed by all subscribers

Example: User posts tweet

1. Save tweet to database (immediate)
2. Queue fan-out task (async)
3. Queue notification task (async)
4. Queue analytics task (async)
5. Return success to user (fast!)

Event-Driven Architecture

What it is: Services communicate by publishing and subscribing to events.

How it works:

  • Service A publishes “UserCreated” event
  • Services B, C, D subscribe to event
  • Each service reacts independently
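A toy in-process event bus shows the pattern. The event name and handlers are illustrative; in production the bus would be a broker such as Kafka or SNS:

```python
from collections import defaultdict

subscribers = defaultdict(list)

def subscribe(event_type, handler):
    subscribers[event_type].append(handler)

def publish(event_type, payload):
    # Every subscriber reacts independently to the same event.
    for handler in subscribers[event_type]:
        handler(payload)

log = []
subscribe("UserCreated", lambda e: log.append(f"send welcome email to {e['email']}"))
subscribe("UserCreated", lambda e: log.append(f"provision profile for user {e['id']}"))

publish("UserCreated", {"id": 7, "email": "a@example.com"})
print(log)   # both handlers ran, without the publisher knowing about either
```

The publisher never references its subscribers by name, which is exactly the loose coupling listed below.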

Real-world examples:

  • Netflix uses events for user actions
  • Amazon uses events for order processing
  • Uber uses events for ride lifecycle

Pros:

  • Loose coupling
  • Easy to add new features
  • Scales well

Cons:

  • Harder to debug
  • Eventual consistency
  • Complex error handling

H. Reliability & Fault Tolerance

Systems fail. Hardware crashes. Networks partition. Your system must handle failures gracefully.

Redundancy

What it is: Having backup components that take over when primary fails.

Types:

1. Active-Active

  • All components handle traffic
  • If one fails, others continue
  • No downtime

2. Active-Passive

  • Primary handles traffic
  • Backup waits on standby
  • Failover takes seconds

Real-world examples:

  • AWS runs multiple data centers per region
  • Google has redundant servers for every service
  • Netflix runs in multiple AWS regions

Pros:

  • Eliminates single points of failure
  • Improves availability
  • Enables maintenance without downtime

Cons:

  • Costs more (paying for backups)
  • More complex
  • Synchronization challenges

Failover

What it is: Automatically switching to backup when primary fails.

How it works:

  1. Monitor primary health
  2. Detect failure
  3. Promote backup to primary
  4. Route traffic to new primary

Failover time:

  • Automatic: 30 seconds - 5 minutes
  • Manual: Hours

Real-world examples:

  • Database failover: Promote replica to primary
  • Load balancer failover: Switch to backup load balancer
  • Region failover: Switch to different geographic region

Challenges:

  • Split-brain problem (two primaries)
  • Data loss during failover
  • Failover time

Circuit Breaker

What it is: Stops calling a failing service to prevent cascading failures.

How it works:

States:

  1. Closed: Normal operation, requests go through
  2. Open: Service is failing, requests fail fast
  3. Half-Open: Testing if service recovered

Example:

1. Recommendation service is down
2. After 5 failures, circuit opens
3. Stop calling recommendation service
4. Show cached recommendations instead
5. After 30 seconds, try again (half-open)
6. If success, close circuit
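The three states above can be sketched as a small class. This is a simplified illustration (real libraries like Hystrix or resilience4j add half-open probe counts, sliding windows, and metrics); the thresholds are the ones from the example:

```python
import time

class CircuitBreaker:
    """Closed -> Open after N failures; Open -> Half-Open after a timeout."""
    def __init__(self, max_failures=5, reset_timeout=30.0):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.state = "closed"
        self.opened_at = 0.0

    def call(self, func, fallback):
        if self.state == "open":
            if time.monotonic() - self.opened_at >= self.reset_timeout:
                self.state = "half-open"   # probe the service again
            else:
                return fallback()          # fail fast, no network call
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.state = "open"
                self.opened_at = time.monotonic()
            return fallback()
        self.failures = 0
        self.state = "closed"              # success closes the circuit
        return result

breaker = CircuitBreaker(max_failures=2, reset_timeout=30.0)

def flaky():
    raise RuntimeError("recommendation service down")

cached = lambda: "cached recommendations"
for _ in range(3):
    answer = breaker.call(flaky, cached)
print(breaker.state, answer)
```

After two failures the circuit opens; the third call never touches the failing service and serves the cached fallback instead.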

Real-world examples:

  • Spotify uses circuit breakers for recommendation service
  • Netflix Hystrix library implements circuit breakers
  • Amazon uses circuit breakers between microservices

Pros:

  • Prevents cascading failures
  • Fails fast (better UX)
  • Gives failing service time to recover

Cons:

  • Requires fallback strategies
  • Can hide underlying issues
  • Configuration complexity

Retry Mechanisms

What it is: Automatically retrying failed requests.

Strategies:

1. Immediate Retry

  • Retry right away
  • Good for transient failures

2. Exponential Backoff

  • Wait 1s, 2s, 4s, 8s between retries
  • Prevents overwhelming failing service

3. Jitter

  • Add randomness to backoff
  • Prevents thundering herd

Example:

Attempt 1: Fail → Wait 1s
Attempt 2: Fail → Wait 2s
Attempt 3: Fail → Wait 4s
Attempt 4: Success!
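The backoff-plus-jitter strategy above can be sketched as a helper function (delays are shortened so the demo runs instantly; production values would start around 1 second):

```python
import random
import time

def retry(func, max_attempts=4, base_delay=1.0):
    """Retry with exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise                                   # out of attempts
            backoff = base_delay * (2 ** attempt)       # 1s, 2s, 4s, ...
            time.sleep(random.uniform(0, backoff))      # jitter avoids thundering herd

calls = {"n": 0}
def unreliable():
    calls["n"] += 1
    if calls["n"] < 4:
        raise ConnectionError("transient failure")
    return "success"

result = retry(unreliable, max_attempts=4, base_delay=0.01)
print(result, calls["n"])
```

Three failures, three growing randomized waits, then success on the fourth attempt, exactly the trace shown above.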

Best practices:

  • Limit retry attempts (3-5)
  • Use exponential backoff
  • Add jitter
  • Only retry idempotent operations

Idempotent: Operation that can be repeated safely. GET is idempotent. POST might not be (could create duplicate).


I. Data Consistency

In distributed systems, keeping data consistent is challenging.

ACID Properties

What it is: Guarantees provided by traditional databases.

A - Atomicity

  • All or nothing
  • Transaction either completes fully or not at all

Example: Bank transfer

1. Deduct $100 from Account A
2. Add $100 to Account B
Both happen or neither happens
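The bank-transfer example can be demonstrated with `sqlite3` from the standard library: using the connection as a context manager makes both updates one transaction, and a constraint violation rolls the whole thing back (the table and CHECK constraint are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY,"
    " balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('A', 100), ('B', 0)")
conn.commit()

def transfer(amount):
    try:
        with conn:  # one transaction: both updates commit, or neither does
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = 'A'",
                (amount,))
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = 'B'",
                (amount,))
        return True
    except sqlite3.IntegrityError:
        return False  # rolled back automatically by the context manager

transfer(100)   # succeeds: A -> 0, B -> 100
transfer(50)    # would overdraw A: CHECK fails, whole transfer rolls back
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)
```

After the failed transfer, neither account changed: that is atomicity.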

C - Consistency

  • Data follows all rules
  • Constraints are enforced

Example: Foreign key constraints, unique constraints

I - Isolation

  • Concurrent transactions don’t interfere
  • Each transaction sees consistent state

Example: Two people booking last seat on flight—only one succeeds

D - Durability

  • Once committed, data persists
  • Survives crashes

Example: After “Payment successful,” data is saved permanently

Real-world examples:

  • Banks need ACID for transactions
  • E-commerce needs ACID for orders
  • Booking systems need ACID for reservations

CAP Theorem

⚖️ The Fundamental Trade-off

In a distributed system, you can only have two of three: Consistency, Availability, Partition Tolerance.

  • C - Consistency: All nodes see the same data at the same time
  • A - Availability: Every request gets a response (success or failure)
  • P - Partition Tolerance: System continues working despite network failures

🎯 The trade-off:

In a distributed system, network partitions will happen (P is mandatory). You must choose between C and A.

CP Systems (Consistency + Partition Tolerance)

Sacrifice availability during partitions

Examples:

MongoDB, HBase, Redis

Use case: Banking, inventory

AP Systems (Availability + Partition Tolerance)

Sacrifice consistency during partitions

Examples:

Cassandra, DynamoDB, CouchDB

Use case: Social media, analytics

Real-world example:

  • DynamoDB (AP): During network partition, you can still read/write, but different users might see different data temporarily
  • MongoDB (CP): During network partition, some nodes become unavailable to maintain consistency

Eventual Consistency

What it is: System will become consistent eventually, but might be temporarily inconsistent.

How it works:

  1. Write happens on one node
  2. Write propagates to other nodes
  3. Eventually (milliseconds to seconds), all nodes have same data

Real-world examples:

  • Instagram likes: Your like might not appear immediately to everyone
  • Facebook posts: Friends see your post at slightly different times
  • DNS updates: Takes time to propagate globally

Pros:

  • High availability
  • Better performance
  • Scales easily

Cons:

  • Temporary inconsistency
  • Complex conflict resolution
  • Harder to reason about

When to use: Social media, analytics, caching—where temporary inconsistency is acceptable.

Strong Consistency

What it is: All nodes see the same data immediately after a write.

How it works:

  1. Write happens
  2. System waits for all nodes to confirm
  3. Only then returns success

Real-world examples:

  • Bank transactions: Balance must be consistent
  • Inventory systems: Can’t oversell products
  • Booking systems: Can’t double-book

Pros:

  • Simple to reason about
  • No conflicts
  • Data always correct

Cons:

  • Slower writes
  • Lower availability
  • Harder to scale

When to use: Financial systems, inventory, anything where correctness is critical.


J. Security

Security isn’t optional. One breach can destroy a company.

Authentication vs Authorization

Authentication: Who are you?

  • Verifying identity
  • Login with username/password
  • Multi-factor authentication

Authorization: What can you do?

  • Determining permissions
  • Role-based access control
  • Resource-level permissions

Example:

  • Authentication: You log into Google with your password
  • Authorization: You can edit your own docs, view shared docs, but can’t edit others’ docs

Authentication methods:

1. Session-Based

  • Server stores session
  • Client gets session ID cookie
  • Traditional approach

2. Token-Based (JWT)

  • Server signs token
  • Client stores token
  • Stateless
  • Modern approach

3. OAuth 2.0

  • Third-party authentication
  • “Login with Google”
  • Delegated authorization

4. Multi-Factor Authentication (MFA)

  • Something you know (password)
  • Something you have (phone)
  • Something you are (fingerprint)
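The token-based approach (method 2) can be sketched with stdlib `hmac`. This is a deliberately simplified illustration, not a real JWT: production tokens also carry a header, an expiry claim, and a managed signing key.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"demo-secret"   # illustrative only; real systems use managed keys

def sign_token(payload):
    """Server signs the payload; no server-side session is stored."""
    body = base64.urlsafe_b64encode(json.dumps(payload).encode())
    sig = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    return body.decode() + "." + sig

def verify_token(token):
    body, sig = token.split(".")
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None   # tampered or forged token
    return json.loads(base64.urlsafe_b64decode(body))

token = sign_token({"user": "alice"})
tampered = token[:-1] + ("0" if token[-1] != "0" else "1")
print(verify_token(token), verify_token(tampered))
```

Because the token carries its own proof of integrity, any server holding the secret can verify it, which is what makes the scheme stateless.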

Real-world examples:

  • Gmail uses OAuth for third-party apps
  • Banking apps use MFA
  • AWS uses IAM for authorization

Rate Limiting

What it is: Restricting how many requests a user can make in a time period.

Why it matters:

  • Prevents abuse
  • Protects against DDoS
  • Ensures fair usage
  • Reduces costs

Algorithms:

1. Fixed Window

100 requests per minute
Reset at minute boundary
  • Simple
  • Burst at boundary

2. Sliding Window

100 requests per rolling 60 seconds
  • Smoother
  • More complex

3. Token Bucket

Bucket holds 100 tokens
Refill 10 tokens/second
Each request costs 1 token
  • Handles bursts
  • Most flexible

4. Leaky Bucket

Requests enter bucket
Process at fixed rate
Overflow is rejected
  • Smooth rate
  • No bursts
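The token bucket (algorithm 3) can be sketched as a small class; the capacity and refill rate here are illustrative, and the demo disables refill so its output is deterministic:

```python
import time

class TokenBucket:
    """Allow a request if a token is available; refill continuously over time."""
    def __init__(self, capacity=100, refill_rate=10.0):
        self.capacity = capacity
        self.refill_rate = refill_rate       # tokens added per second
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False   # caller should respond with HTTP 429

bucket = TokenBucket(capacity=3, refill_rate=0.0)   # no refill: deterministic demo
decisions = [bucket.allow() for _ in range(5)]
print(decisions)
```

The full bucket absorbs a burst of three requests, then rejects until refill catches up, which is exactly the "handles bursts" property noted above.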

Real-world examples:

  • Twitter API: 300 requests per 15 minutes
  • GitHub API: 5,000 requests per hour
  • Stripe API: 100 requests per second

Response when limited:

HTTP 429 Too Many Requests
Retry-After: 60

Encryption

What it is: Scrambling data so only authorized parties can read it.

Types:

1. Encryption at Rest

  • Data stored on disk
  • Database encryption
  • File encryption

2. Encryption in Transit

  • Data moving over network
  • HTTPS/TLS
  • VPN

Encryption methods:

1. Symmetric Encryption

  • Same key for encrypt/decrypt
  • Fast
  • Examples: AES, DES

2. Asymmetric Encryption

  • Public key encrypts
  • Private key decrypts
  • Slower
  • Examples: RSA, ECC

Real-world examples:

  • WhatsApp end-to-end encryption
  • HTTPS encrypts web traffic
  • AWS encrypts data at rest

Best practices:

  • Always use HTTPS
  • Encrypt sensitive data at rest
  • Use strong algorithms (AES-256)
  • Rotate keys regularly
  • Never store passwords in plain text (hash them)
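The last best practice (hash, never store, passwords) can be sketched with stdlib PBKDF2; the iteration count is illustrative, and in practice you would lean on your framework's defaults or a dedicated library like bcrypt or argon2:

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000   # illustrative; tune to your hardware

def hash_password(password, salt=None):
    """Return (salt, digest); a fresh random salt defeats rainbow tables."""
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest

def check_password(password, salt, digest):
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)   # constant-time comparison

salt, digest = hash_password("s3cret")
print(check_password("s3cret", salt, digest), check_password("wrong", salt, digest))
```

Even if the database leaks, the attacker gets salted, slow-to-brute-force digests rather than passwords.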

K. Monitoring & Observability

You can’t fix what you can’t see.

Logging

What it is: Recording events that happen in your system.

Log levels:

  • DEBUG: Detailed information for debugging
  • INFO: General information
  • WARN: Warning, something unusual
  • ERROR: Error occurred, but system continues
  • FATAL: Critical error, system might crash

What to log:

  • User actions
  • Errors and exceptions
  • Performance metrics
  • Security events
  • System state changes

Real-world examples:

  • Google logs every search query
  • Amazon logs every purchase
  • Netflix logs every video play

Best practices:

  • Use structured logging (JSON)
  • Include context (user ID, request ID)
  • Don’t log sensitive data (passwords, credit cards)
  • Use log aggregation (ELK stack, Splunk)
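Structured logging with context can be sketched with a custom `logging.Formatter` that emits one JSON object per line (the field names are illustrative):

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """One JSON object per log line: easy to parse, aggregate, and search."""
    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "message": record.getMessage(),
            "request_id": getattr(record, "request_id", None),
        })

logger = logging.getLogger("demo")
logger.setLevel(logging.INFO)
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)

# Attach context (request ID), never secrets (passwords, card numbers)
logger.info("user logged in", extra={"request_id": "req-42"})
```

Aggregators like the ELK stack can then index every field, so "show me all ERROR lines for request req-42" becomes a simple query.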

Metrics

What it is: Numerical measurements of system behavior over time.

Key metrics:

1. Latency

  • How long requests take
  • P50, P95, P99 percentiles

2. Throughput

  • Requests per second
  • Transactions per second

3. Error Rate

  • Percentage of failed requests
  • 4xx vs 5xx errors

4. Saturation

  • CPU usage
  • Memory usage
  • Disk usage
  • Network usage

Real-world examples:

  • Netflix tracks video start time
  • Uber tracks ride matching time
  • Stripe tracks payment success rate

Tools:

  • Prometheus
  • Grafana
  • Datadog
  • New Relic

Distributed Tracing

What it is: Tracking a request as it flows through multiple services.

How it works:

  1. Request gets unique trace ID
  2. Each service adds span (timing info)
  3. Spans linked by trace ID
  4. Visualize entire request flow

Why it matters: In microservices, one user request might touch 10+ services. When something fails, you need to know where.

Example:

User request → API Gateway → Auth Service → User Service → Database
                                          → Cache
                                          → Notification Service

Real-world examples:

  • Uber uses Jaeger for tracing
  • Twitter built Zipkin (inspired by Dapper, now open source)
  • Google uses Dapper

Tools:

  • Jaeger
  • Zipkin
  • AWS X-Ray
  • Google Cloud Trace

Alerting

What it is: Notifying engineers when something goes wrong.

Alert types:

1. Threshold Alerts

  • CPU > 80% for 5 minutes
  • Error rate > 1%

2. Anomaly Detection

  • Traffic 3x higher than normal
  • ML-based detection

Best practices:

  • Alert on symptoms, not causes
  • Reduce alert fatigue
  • Include runbooks
  • Set appropriate thresholds

Real-world example:

Alert: API latency P99 > 1000ms
Severity: High
Runbook: Check database connections, restart cache


Architecture Patterns

🏛️ System Organization Patterns

How you organize your system matters. Different patterns solve different problems.

Monolithic Architecture

What it is: One large application containing all functionality.

Structure:

  • Single codebase
  • Single deployment unit
  • Shared database
  • All features in one application

Real-world examples:

  • Early Twitter (before microservices)
  • Stack Overflow (still monolithic!)
  • Shopify core (monolith with services)

Pros:

  • Simple to develop initially
  • Easy to test (everything together)
  • Easy to deploy (one unit)
  • No network overhead
  • Easier debugging

Cons:

  • Hard to scale (must scale entire app)
  • Slow deployments (test everything)
  • Technology lock-in
  • Hard to understand as it grows
  • One bug can crash everything

When to use:

  • Small teams
  • Early-stage startups
  • Simple applications
  • When speed of development matters

Microservices Architecture

What it is: Application split into small, independent services.

[Diagram: Microservices architecture — mobile and web clients reach an API gateway (routing, auth, rate limiting, load balancing), which fronts independent user, order, payment, and notification services, each with its own database, coordinated through a message queue (Kafka/RabbitMQ) event bus.]

Structure:

  • Multiple codebases
  • Independent deployment
  • Separate databases (often)
  • Services communicate via APIs

Characteristics:

  • Each service does one thing
  • Independently deployable
  • Can use different technologies
  • Loosely coupled

Real-world examples:

  • Netflix (hundreds of microservices)
  • Uber (2000+ microservices)
  • Amazon (service-oriented since 2001)
  • Spotify (squad-based microservices)

⚖️ Monolithic vs Microservices Comparison

Aspect       | Monolithic            | Microservices
Codebase     | Single                | Multiple
Deployment   | All at once           | Independent
Scaling      | Scale entire app      | Scale services independently
Technology   | Single stack          | Multiple stacks
Complexity   | Low                   | High
Best for     | Small teams, startups | Large teams, scale
Example      | Stack Overflow        | Netflix, Uber

Pros:

  • Scale independently
  • Deploy independently
  • Technology flexibility
  • Team autonomy
  • Fault isolation

Cons:

  • Complex infrastructure
  • Network overhead
  • Distributed system challenges
  • Harder to debug
  • Data consistency issues

When to use:

  • Large teams
  • Need independent scaling
  • Different technology needs
  • Mature organizations

Microservices challenges:

1. Service Discovery

  • How services find each other
  • Tools: Consul, Eureka, Kubernetes

2. API Gateway

  • Single entry point
  • Routing, authentication
  • Tools: Kong, AWS API Gateway

3. Data Consistency

  • No distributed transactions
  • Eventual consistency
  • Saga pattern

4. Monitoring

  • Distributed tracing
  • Centralized logging
  • Tools: Jaeger, ELK

Service-Oriented Architecture (SOA)

What it is: Similar to microservices but with enterprise service bus (ESB).

Differences from microservices:

  • Larger services
  • Shared ESB for communication
  • More governance
  • Heavier protocols (SOAP)

Real-world examples:

  • Enterprise systems
  • Legacy modernization
  • Banking systems

When to use:

  • Enterprise environments
  • Need governance
  • Legacy integration

Event-Driven Architecture

What it is: Services communicate through events rather than direct calls.

How it works:

  1. Service A publishes event
  2. Event goes to message broker
  3. Interested services subscribe
  4. Each service reacts independently

Real-world examples:

  • Netflix user activity events
  • Uber ride lifecycle events
  • Amazon order processing

Pros:

  • Loose coupling
  • Easy to add features
  • Scales well
  • Asynchronous

Cons:

  • Harder to debug
  • Eventual consistency
  • Complex error handling

Serverless Architecture

What it is: Run code without managing servers. Cloud provider handles infrastructure.

How it works:

  • Write functions
  • Deploy to cloud
  • Pay per execution
  • Auto-scales

Real-world examples:

  • AWS Lambda
  • Google Cloud Functions
  • Azure Functions

Use cases:

  • API backends
  • Data processing
  • Scheduled tasks
  • Event handlers

Pros:

  • No server management
  • Auto-scaling
  • Pay per use
  • Fast development

Cons:

  • Cold start latency
  • Vendor lock-in
  • Limited execution time
  • Debugging challenges

Common System Design Patterns

Reusable solutions to common problems.

API Gateway

What it is: Single entry point for all client requests.

Responsibilities:

  • Routing to services
  • Authentication
  • Rate limiting
  • Request/response transformation
  • Caching
  • Logging

Real-world examples:

  • Netflix Zuul
  • AWS API Gateway
  • Kong

Pros:

  • Centralized control
  • Simplifies clients
  • Cross-cutting concerns

Cons:

  • Single point of failure
  • Can become bottleneck
  • Added latency

Service Mesh

What it is: Infrastructure layer handling service-to-service communication.

Features:

  • Load balancing
  • Service discovery
  • Circuit breaking
  • Retries
  • Timeouts
  • Metrics

Real-world examples:

  • Istio
  • Linkerd
  • Consul Connect

Pros:

  • Moves networking logic out of code
  • Consistent behavior
  • Observability

Cons:

  • Complex setup
  • Performance overhead
  • Learning curve

CQRS (Command Query Responsibility Segregation)

What it is: Separate models for reading and writing data.

How it works:

  • Write model: Handles commands (create, update, delete)
  • Read model: Handles queries (optimized for reads)
  • Sync between models (eventually consistent)

Real-world examples:

  • E-commerce (separate read/write for products)
  • Banking (transaction processing vs balance queries)

Pros:

  • Optimize reads and writes independently
  • Scale reads and writes separately
  • Simpler queries

Cons:

  • More complex
  • Eventual consistency
  • Sync overhead

Event Sourcing

What it is: Store all changes as sequence of events instead of current state.

How it works:

  • Don’t store current state
  • Store all events that led to state
  • Rebuild state by replaying events

Example: Instead of storing balance = $100, store:

1. AccountCreated: $0
2. Deposited: $50
3. Deposited: $75
4. Withdrew: $25
Current balance = $100
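The replay idea above can be sketched in a few lines of Python: the event list is the source of truth, and the current balance is just a fold over it (the event names mirror the example):

```python
# Events are the source of truth; current state is derived by replaying them
events = [
    ("AccountCreated", 0),
    ("Deposited", 50),
    ("Deposited", 75),
    ("Withdrew", 25),
]

def replay(events):
    balance = 0
    for kind, amount in events:
        if kind == "Deposited":
            balance += amount
        elif kind == "Withdrew":
            balance -= amount
    return balance

print(replay(events))  # → 100
```

Replaying a prefix of the list gives you the balance at any point in history for free, which is where the audit-trail benefit comes from.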

Real-world examples:

  • Banking (audit trail)
  • Version control (Git)
  • Collaborative editing

Pros:

  • Complete audit trail
  • Can rebuild any past state
  • Event replay for debugging

Cons:

  • More storage
  • Complex queries
  • Event versioning

Saga Pattern

What it is: Managing distributed transactions across microservices.

How it works:

  • Break transaction into steps
  • Each step has compensating action
  • If step fails, run compensating actions

Example: E-commerce order

1. Reserve inventory → Compensate: Release inventory
2. Charge payment → Compensate: Refund payment
3. Ship order → Compensate: Cancel shipment
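The orchestration variant of this saga can be sketched as a loop that runs compensations in reverse when a step fails (the step names mirror the e-commerce example):

```python
def run_saga(steps):
    """Run steps in order; on any failure, run the compensations in reverse."""
    done = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            for undo in reversed(done):
                undo()          # roll back everything that already succeeded
            return False
        done.append(compensate)
    return True

log = []

def ship_order():
    raise RuntimeError("carrier unavailable")

steps = [
    (lambda: log.append("reserve inventory"), lambda: log.append("release inventory")),
    (lambda: log.append("charge payment"),    lambda: log.append("refund payment")),
    (ship_order,                              lambda: log.append("cancel shipment")),
]
ok = run_saga(steps)
print(ok, log)
```

Shipping fails, so the payment is refunded and the inventory released, in reverse order, leaving the system consistent without a distributed transaction.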

Types:

1. Choreography

  • Services coordinate via events
  • No central controller

2. Orchestration

  • Central coordinator
  • Tells services what to do

Real-world examples:

  • Uber ride booking
  • Airbnb reservation
  • E-commerce checkout

Pros:

  • Handles distributed transactions
  • Maintains consistency
  • Fault tolerant

Cons:

  • Complex to implement
  • Hard to debug
  • Compensating actions needed

Performance Optimization

Making your system faster.

Database Query Optimization

Techniques:

1. Use Indexes

CREATE INDEX idx_user_email ON users(email);

2. Avoid SELECT *

-- Bad
SELECT * FROM users;

-- Good
SELECT id, name, email FROM users;

3. Use LIMIT

SELECT * FROM posts ORDER BY created_at DESC LIMIT 10;

4. Avoid N+1 Queries

-- Bad: 1 query + N queries
SELECT * FROM posts;
-- Then for each post:
SELECT * FROM users WHERE id = post.user_id;

-- Good: 1 query with JOIN
SELECT posts.*, users.name 
FROM posts 
JOIN users ON posts.user_id = users.id;

5. Use Query Explain

EXPLAIN SELECT * FROM users WHERE email = 'test@example.com';

Connection Pooling

What it is: Reusing database connections instead of creating new ones.

Why it matters:

  • Creating connection: 50ms
  • Reusing connection: 0.1ms
  • 500x faster!

How it works:

  1. Create pool of connections at startup
  2. Request needs database → Get connection from pool
  3. Request done → Return connection to pool
  4. Reuse for next request
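The cycle above can be sketched with a `queue.Queue` holding pre-created connections (the `fake_connect` factory stands in for a real database driver):

```python
import queue

class ConnectionPool:
    """Sketch: pre-create N connections, hand them out, take them back."""
    def __init__(self, size, connect):
        self.pool = queue.Queue()
        for _ in range(size):
            self.pool.put(connect())   # pay the connection cost once, at startup

    def acquire(self):
        return self.pool.get()         # blocks if all connections are in use

    def release(self, conn):
        self.pool.put(conn)

counter = {"created": 0}
def fake_connect():
    counter["created"] += 1
    return f"conn-{counter['created']}"

pool = ConnectionPool(size=2, connect=fake_connect)
for _ in range(100):                   # 100 requests reuse the same 2 connections
    conn = pool.acquire()
    pool.release(conn)
print(counter["created"])
```

One hundred requests, but only two connections ever created: that is the source of the ~500x saving quoted above.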

Configuration:

Min connections: 5
Max connections: 20
Idle timeout: 10 minutes

Real-world examples:

  • Shopify uses connection pooling for millions of stores
  • Twitter pools connections to handle billions of tweets

Batch Processing

What it is: Processing multiple items together instead of one at a time.

Example:

// Bad: 1000 database calls
for (user in users) {
  database.save(user);
}

// Good: 1 database call
database.batchSave(users);

Real-world examples:

  • Email sending: Batch 1000 emails
  • Data import: Batch insert rows
  • Image processing: Process multiple images

Pros:

  • Much faster
  • Reduces overhead
  • Better resource usage

Cons:

  • All-or-nothing (one failure affects batch)
  • Memory usage
  • Delayed feedback

Lazy Loading

What it is: Load data only when needed, not upfront.

Example:

// Eager loading: Load everything
user = getUser(id);
user.posts = getAllPosts(user.id);
user.comments = getAllComments(user.id);

// Lazy loading: Load on demand
user = getUser(id);
// Posts loaded only when accessed
if (needPosts) {
  user.posts = getPosts(user.id);
}

Real-world examples:

  • Facebook lazy loads images as you scroll
  • Netflix lazy loads video thumbnails
  • Gmail lazy loads old emails

Pros:

  • Faster initial load
  • Saves bandwidth
  • Better performance

Cons:

  • Delayed loading
  • Multiple requests
  • Complexity

Pagination

What it is: Breaking large result sets into pages.

Types:

1. Offset-Based

SELECT * FROM posts 
ORDER BY created_at DESC 
LIMIT 10 OFFSET 20;
  • Simple
  • Slow for large offsets

2. Cursor-Based

SELECT * FROM posts 
WHERE id < last_seen_id 
ORDER BY id DESC 
LIMIT 10;
  • Fast for any page
  • Consistent results

Real-world examples:

  • Twitter uses cursor-based pagination
  • Google Search uses offset-based
  • Instagram uses cursor-based for feed

Key Metrics & SLAs

📊 Numbers That Matter

Understanding and measuring system performance is critical for production systems.

Latency

What it is: Time between request and response.

Measurements:

  • P50 (Median): 50% of requests faster than this
  • P95: 95% of requests faster than this
  • P99: 99% of requests faster than this
  • P99.9: 99.9% of requests faster than this

Example:

P50: 50ms   (half of users see this)
P95: 200ms  (95% of users see this or better)
P99: 500ms  (99% of users see this or better)

Why percentiles matter: Average can be misleading. If 99% of requests take 50ms but 1% take 10 seconds, average is 150ms but user experience is bad.
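That exact scenario can be checked in a few lines (the percentile helper uses the nearest-rank convention, one of several in use):

```python
# 99 requests at 50ms plus one 10-second outlier: the mean misleads
latencies_ms = [50] * 99 + [10_000]

def percentile(values, p):
    """Nearest-rank percentile (one simple convention among several)."""
    ordered = sorted(values)
    k = max(0, round(p / 100 * len(ordered)) - 1)
    return ordered[k]

mean = sum(latencies_ms) / len(latencies_ms)
print(mean, percentile(latencies_ms, 50), percentile(latencies_ms, 99), max(latencies_ms))
```

The mean comes out near 150ms even though 99% of users saw 50ms, and with only one outlier in 100 samples even P99 stays at 50ms: only the max exposes it, which is why teams also track P99.9 at scale.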

Targets:

  • Web pages: < 200ms
  • Mobile apps: < 100ms
  • Real-time: < 50ms
  • Batch: seconds to minutes

Throughput

What it is: Number of requests processed per unit time.

Measurements:

  • RPS: Requests Per Second
  • QPS: Queries Per Second
  • TPS: Transactions Per Second

Real-world examples:

  • Google Search: 99,000 queries per second
  • Twitter: 6,000 tweets per second (peak)
  • Netflix: 1 billion hours watched per week

Availability

What it is: Percentage of time system is operational.

🎯 The Nines of Availability

Availability | Downtime per Year | Cost
99%          | 3.65 days         | $
99.9%        | 8.76 hours        | $$
99.99%       | 52.56 minutes     | $$$
99.999%      | 5.26 minutes      | $$$$

💰 Cost of nines: Each additional nine costs 10x more.

Real-world SLAs:

  • AWS S3: 99.99%
  • Google Cloud: 99.95%
  • Stripe: 99.99%

SLA vs SLO vs SLI

SLI (Service Level Indicator)

  • Metric you measure
  • Example: API latency, error rate

SLO (Service Level Objective)

  • Target for SLI
  • Example: 99.9% of requests < 200ms

SLA (Service Level Agreement)

  • Contract with consequences
  • Example: 99.9% uptime or refund

Estimation Techniques

Back-of-the-envelope calculations for interviews.

Traffic Estimation

Example: Design Twitter

Given:

  • 500 million users
  • 200 million daily active users (DAU)
  • Each user posts 2 tweets per day
  • Each user views 100 tweets per day

Calculations:

Writes:

200M DAU × 2 tweets/day = 400M tweets/day
400M / 86,400 seconds = 4,630 tweets/second
Peak (3x average) = 14,000 tweets/second

Reads:

200M DAU × 100 tweets/day = 20B tweet views/day
20B / 86,400 seconds = 231,000 reads/second
Peak = 700,000 reads/second

Read/Write Ratio: 50:1 (read-heavy)
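The arithmetic above is easy to script, and doing so is good practice for checking your own back-of-the-envelope numbers:

```python
# Back-of-the-envelope numbers from the Twitter example above
dau = 200_000_000
writes_per_day = dau * 2        # 2 tweets per user per day
reads_per_day = dau * 100       # 100 tweet views per user per day
seconds_per_day = 86_400

writes_per_sec = writes_per_day / seconds_per_day
reads_per_sec = reads_per_day / seconds_per_day
print(round(writes_per_sec), round(reads_per_sec), reads_per_day // writes_per_day)
```

This confirms roughly 4,600 writes/second, 231,000 reads/second, and the 50:1 read-heavy ratio, which is what justifies heavy caching and read replicas in the design.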

Storage Estimation

Example: Design Instagram

Given:

  • 500 million users
  • 100 million photos uploaded per day
  • Average photo size: 2MB

Calculations:

Daily storage:

100M photos × 2MB = 200TB per day

5-year storage:

200TB × 365 days × 5 years = 365PB

With replication (3x):

365PB × 3 = 1.1 Exabytes

Bandwidth Estimation

Example: Design YouTube

Given:

  • 1 billion hours watched per day
  • Average video quality: 5 Mbps

Calculations:

Bandwidth:

1B hours × 3600 seconds × 5 Mbps
= 18 Exabits per day
= 208 Terabits per second

Useful numbers to remember:

  • 1 million = 10^6
  • 1 billion = 10^9
  • 1 KB = 1,000 bytes
  • 1 MB = 1,000 KB
  • 1 GB = 1,000 MB
  • 1 TB = 1,000 GB
  • 1 day = 86,400 seconds
  • 1 month = 2.5M seconds (roughly)

Common Terminology Glossary

Quick reference for essential terms.

API (Application Programming Interface)

  • Interface for services to communicate
  • REST, GraphQL, gRPC

Latency

  • Time for request to complete
  • Lower is better

Throughput

  • Requests processed per second
  • Higher is better

Bandwidth

  • Data transfer capacity
  • Measured in Mbps or Gbps

RPS/QPS

  • Requests/Queries Per Second
  • Measure of load

SLA/SLO/SLI

  • Service Level Agreement/Objective/Indicator
  • Availability guarantees

Idempotency

  • Operation can be repeated safely
  • GET is idempotent, POST might not be

Stateless

  • Server doesn’t store session data
  • Each request is independent

Stateful

  • Server stores session data
  • Requests depend on previous state

Synchronous

  • Wait for response before continuing
  • Blocking

Asynchronous

  • Don’t wait for response
  • Non-blocking

Hot Data

  • Frequently accessed
  • Keep in cache

Warm Data

  • Occasionally accessed
  • Keep in fast storage

Cold Data

  • Rarely accessed
  • Archive to cheap storage

Read-Heavy System

  • More reads than writes
  • Example: Social media feeds

Write-Heavy System

  • More writes than reads
  • Example: Logging, analytics

Eventual Consistency

  • Data becomes consistent eventually
  • Temporary inconsistency OK

Strong Consistency

  • Data always consistent
  • All nodes see same data

Horizontal Scaling

  • Add more machines
  • Scale out

Vertical Scaling

  • Add more power to machine
  • Scale up

Sharding

  • Split data across machines
  • Horizontal partitioning

Replication

  • Copy data across machines
  • For redundancy and reads

Failover

  • Switch to backup when primary fails
  • Automatic recovery

Circuit Breaker

  • Stop calling failing service
  • Prevent cascading failures

Rate Limiting

  • Restrict requests per time period
  • Prevent abuse

CDN

  • Content Delivery Network
  • Serve content from edge servers

Load Balancer

  • Distribute traffic across servers
  • Improve availability

Message Queue

  • Buffer for async processing
  • Decouple services

Microservices

  • Small, independent services
  • Loosely coupled

Monolith

  • Single large application
  • Tightly coupled

Interview Framework: STAR Approach

⭐ Ace Your System Design Interview

How to tackle system design interviews with a proven framework.

  • S - Scope: clarify requirements (5-10 min)
  • T - Traffic: estimate scale (5 min)
  • A - Architecture: design the system (30-35 min)
  • R - Refinement: optimize and discuss trade-offs (10-15 min)

Total: 45-60 minutes

S - Scope (5-10 minutes)

Clarify requirements:

Functional:

  • What features?
  • What’s in scope?
  • What’s out of scope?

Non-functional:

  • How many users?
  • How much data?
  • How fast?
  • How available?

Example questions:

  • “Should we support video or just images?”
  • “Do we need real-time updates?”
  • “What’s the expected traffic?”
  • “Any specific latency requirements?”

T - Traffic (5 minutes)

Estimate scale:

Calculate:

  • Daily active users
  • Requests per second
  • Storage needed
  • Bandwidth required

Example:

100M users
10M DAU
Each user makes 10 requests/day
= 100M requests/day
= 1,157 requests/second
Peak (3x) = 3,500 requests/second

A - Architecture (30-35 minutes)

Design the system:

Start high-level:

  1. Draw basic components
  2. Show data flow
  3. Explain technology choices

Then dive deeper:

  1. Database schema
  2. API design
  3. Caching strategy
  4. Scaling approach

Example flow:

Client → Load Balancer → App Servers → Cache → Database
                                     → Message Queue → Workers

R - Refinement (10-15 minutes)

Identify bottlenecks:

  • What fails first as you scale?
  • How do you fix it?

Discuss trade-offs:

  • Why this choice over alternatives?
  • What are the downsides?

Address concerns:

  • Security
  • Monitoring
  • Deployment
  • Cost

Common Mistakes to Avoid

⚠️ Learn from Others' Errors

Avoid these common pitfalls in system design interviews and real-world projects.

1. Jumping to solutions

  • Don’t start designing before understanding requirements
  • Ask clarifying questions first

2. Over-engineering

  • Don’t use microservices for 1,000 users
  • Start simple, add complexity when needed

3. Ignoring trade-offs

  • Every decision has pros and cons
  • Discuss both sides

4. Forgetting non-functional requirements

  • Don’t just focus on features
  • Consider scalability, availability, latency

5. Not considering failures

  • Systems fail
  • Discuss redundancy, failover

6. Ignoring monitoring

  • You can’t fix what you can’t see
  • Include logging, metrics, alerts

7. Unrealistic estimates

  • Use reasonable numbers
  • Show your calculations

8. Not asking questions

  • Interviewers expect questions
  • Clarify ambiguities

9. Going too deep too fast

  • Start high-level
  • Dive deep only when asked

10. Not managing time

  • 45-60 minute interview
  • Allocate time wisely

Conclusion

🎯 You're Ready to Design Systems

System design isn't about memorizing solutions. It's about understanding building blocks and knowing when to use each one.

You now have the vocabulary. You understand the concepts. You know the trade-offs.

💡 Key Takeaways

Start simple. Every system begins with basic components. Add complexity only when you have a specific problem to solve.

Understand trade-offs. There's no perfect solution. Consistency vs availability. Latency vs throughput. Cost vs performance. Every decision has consequences.

Think in layers. Client, load balancer, application, cache, database. Each layer solves specific problems.

Scale incrementally. Don't design for a billion users on day one. Scale as problems emerge.

Practice. Design systems you use daily. How would you build Twitter? YouTube? Uber? Start simple, identify bottlenecks, add complexity.


Quick Reference Cheat Sheet

📋 System Design Quick Reference

Bookmark this section for quick lookups during interviews and design sessions

⚖️ Scalability

Vertical: Add more power (CPU, RAM)

Horizontal: Add more machines

Auto-scaling: Dynamic based on load

Use: Start vertical, scale horizontal

🗄️ Databases

SQL: ACID, relationships, structured

NoSQL: Scale, flexible, eventual consistency

Replication: Primary + Replicas for reads

Use: SQL for transactions, NoSQL for scale

⚡ Caching

Layers: Browser → CDN → Redis → DB

Typical latency: ~0ms → ~20ms → ~1ms → ~50ms

Strategies: Cache-aside, Write-through

Use: Cache hot data, set TTL
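The cache-aside strategy with a TTL can be sketched in a few lines. This is a toy illustration with a Python dict standing in for Redis; the class and names are hypothetical, not a real library API:

```python
import time

class CacheAside:
    """Cache-aside with TTL: check the cache first, fall back to the
    source of truth on a miss, then populate the cache for next time."""

    def __init__(self, ttl_seconds=60):
        self.cache = {}          # stand-in for Redis: key -> (value, expires_at)
        self.ttl = ttl_seconds

    def get(self, key, load_from_db):
        entry = self.cache.get(key)
        if entry and entry[1] > time.time():
            return entry[0]                               # cache hit
        value = load_from_db(key)                         # miss: read the DB
        self.cache[key] = (value, time.time() + self.ttl) # populate with TTL
        return value

cache = CacheAside(ttl_seconds=30)
db = {"user:1": "Alice"}
print(cache.get("user:1", db.get))  # miss -> loads "Alice" from the dict "DB"
print(cache.get("user:1", db.get))  # hit -> served from cache
```

The second `get` never touches the DB, which is the entire point: hot keys are served from memory until the TTL expires.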

🔄 Load Balancing

Algorithms: Round Robin, Least Connections

Types: Layer 4 (fast) vs Layer 7 (flexible)

Health Checks: Every 5s, 2 failures = out

Use: Distribute traffic, enable redundancy
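Round robin with health checks can be sketched like this. A minimal illustration, not a production balancer; real load balancers also handle weights, draining, and concurrent updates:

```python
class RoundRobinBalancer:
    """Round robin over healthy backends. A failed health check marks a
    server down; requests then skip it until it is marked up again."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(servers)
        self.i = 0

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        self.healthy.add(server)

    def next_server(self):
        # Scan at most one full rotation looking for a healthy backend.
        for _ in range(len(self.servers)):
            server = self.servers[self.i % len(self.servers)]
            self.i += 1
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy backends")

lb = RoundRobinBalancer(["app1", "app2", "app3"])
print([lb.next_server() for _ in range(4)])  # ['app1', 'app2', 'app3', 'app1']
lb.mark_down("app2")
print([lb.next_server() for _ in range(3)])  # app2 is skipped
```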

⚖️ CAP Theorem

CP: Consistency + Partition tolerance (e.g., MongoDB)

AP: Availability + Partition tolerance (e.g., Cassandra)

Trade-off: During a partition, you must give up either consistency or availability

Use: CP for banking, AP for social media

📬 Message Queues

Purpose: Async processing, decouple services

Tools: Kafka, RabbitMQ, AWS SQS

Patterns: Point-to-point, Pub/Sub

Use: Email, notifications, background jobs
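The pub/sub pattern from this row can be sketched with a toy in-memory broker. Everything here is illustrative; a real broker (Kafka, RabbitMQ, SQS) delivers asynchronously, durably, and across processes:

```python
from collections import defaultdict

class InMemoryBroker:
    """Toy pub/sub broker: publishers emit to a topic without knowing
    who consumes, which is what decouples the services."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> list of handlers

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, message):
        for handler in self.subscribers[topic]:
            handler(message)  # a real broker would do this async and durably

broker = InMemoryBroker()
inbox = []
broker.subscribe("user.signed_up", lambda m: inbox.append(f"welcome email to {m}"))
broker.subscribe("user.signed_up", lambda m: inbox.append(f"analytics event for {m}"))
broker.publish("user.signed_up", "alice@example.com")
print(inbox)  # one publish fanned out to both subscribers
```

Note that the signup service never references email or analytics: new consumers can subscribe later without touching the producer.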

📊 Availability

99.9%: 8.76 hours downtime/year

99.99%: 52 minutes downtime/year

99.999%: 5 minutes downtime/year

Cost: Each nine costs 10x more
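The downtime figures above follow directly from the arithmetic, which is worth being able to reproduce on a whiteboard:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def downtime_minutes_per_year(availability_pct):
    """Yearly downtime budget implied by an availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)

for nines in (99.9, 99.99, 99.999):
    mins = downtime_minutes_per_year(nines)
    print(f"{nines}%: {mins:.1f} min/year ({mins / 60:.2f} hours)")
# 99.9%:   525.6 min/year (8.76 hours)
# 99.99%:   52.6 min/year (0.88 hours)
# 99.999%:   5.3 min/year (0.09 hours)
```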

🔧 Microservices

Pros: Independent deploy, scale, tech

Cons: Complex, network overhead

Needs: API Gateway, Service Discovery

Use: Large teams, need independent scaling

🎯 Golden Rules for System Design

1. Start Simple: Don't over-engineer. Add complexity only when needed.

2. Know Trade-offs: Every decision has pros and cons. Discuss both.

3. Scale Incrementally: Design for current needs + 10x growth.

4. Plan for Failure: Everything fails. Design for redundancy.

5. Monitor Everything: You can't fix what you can't see.

6. Ask Questions: Clarify requirements before designing.


What’s Next?

🚀 Continue Your Learning Journey

This guide covered the fundamentals. Each concept deserves deeper exploration. In upcoming posts, we'll dive into:

💾 Caching Deep Dive

Strategies, invalidation, distributed caching

🗄️ Database Sharding

Consistent hashing, rebalancing, cross-shard queries

🔧 Microservices Patterns

Service mesh, API gateway, saga pattern

🏗️ Real System Designs

Twitter, Instagram, Uber, Netflix

📚 The best way to learn is to practice.

Pick a system and design it. Start with requirements, estimate scale, draw architecture, identify bottlenecks.

Resources for continued learning:

  • System Design Primer (GitHub)
  • Designing Data-Intensive Applications (Book)
  • Company engineering blogs (Netflix, Uber, Airbnb)
  • System design interview courses

Real-World Case Studies

🏢 How Tech Giants Use These Concepts

Real implementations from companies you know

Netflix: Microservices at Scale

200M+ subscribers, 1B+ hours watched weekly

Architecture Decisions:

  • Microservices: 700+ services for different features (recommendations, billing, streaming)
  • CDN: Open Connect CDN with servers in ISPs worldwide for low latency
  • Cassandra: NoSQL for viewing history (billions of records, eventual consistency OK)
  • Chaos Engineering: Chaos Monkey (part of Netflix's Simian Army) randomly terminates production instances to verify services survive failures
  • Auto-scaling: AWS auto-scaling handles traffic spikes during new releases

💡 Key Takeaway: Microservices enable independent scaling and deployment. Each team owns their service end-to-end.

📷 Instagram: Scaling Photo Storage

2B+ users, 100M+ photos uploaded daily

Architecture Decisions:

  • Sharding: PostgreSQL sharded by user ID (thousands of shards)
  • CDN: Facebook CDN serves images from edge locations worldwide
  • Caching: Memcached for feed data, Redis for real-time features
  • Async Processing: Celery queues for image processing (thumbnails, filters)
  • Read Replicas: Multiple replicas per shard for read scaling

💡 Key Takeaway: Sharding enables horizontal scaling of databases. CDN reduces latency for global users.

🚗 Uber: Real-Time Matching System

20M+ rides daily, sub-second matching

Architecture Decisions:

  • Geospatial Indexing: Custom geo-indexing for fast driver lookup by location
  • Kafka: Event streaming for real-time location updates
  • Redis: In-memory cache for active drivers and riders
  • Microservices: 2000+ services (matching, pricing, routing, payments)
  • Circuit Breakers: Prevent cascading failures between services

💡 Key Takeaway: Real-time systems need in-memory caching and event streaming. Geospatial indexing enables fast location queries.

🐦 Twitter: Timeline Generation

500M tweets daily, ~6,000 tweets/second on average (peaks far higher)

Architecture Decisions:

  • Fan-out on Write: Pre-compute timelines for followers when tweet posted
  • Redis: Cache timelines in memory for instant loading
  • Manhattan: Custom distributed database for tweets (key-value store)
  • Hybrid Approach: Fan-out for normal users, on-demand for celebrities (millions of followers)
  • Rate Limiting: Prevent abuse and ensure fair usage

💡 Key Takeaway: Pre-computation (fan-out) trades write cost for read speed. Hybrid approaches handle edge cases.
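The fan-out-on-write idea, including the celebrity escape hatch, can be sketched as follows. This is a toy model, not Twitter's actual implementation; the threshold and data structures are illustrative (dicts standing in for Redis and the follower graph):

```python
from collections import defaultdict, deque

class Timelines:
    """Fan-out on write: push each new tweet into every follower's cached
    timeline. Accounts over the follower threshold are skipped at write
    time and merged into timelines at read time instead."""

    CELEBRITY_THRESHOLD = 10_000  # illustrative cutoff

    def __init__(self):
        self.followers = defaultdict(set)    # author -> follower ids
        self.timeline = defaultdict(deque)   # user -> recent tweet ids (stand-in for Redis)
        self.celebrity_tweets = defaultdict(list)

    def follow(self, follower, author):
        self.followers[author].add(follower)

    def post(self, author, tweet_id):
        if len(self.followers[author]) >= self.CELEBRITY_THRESHOLD:
            self.celebrity_tweets[author].append(tweet_id)  # merge on read instead
            return
        for f in self.followers[author]:
            # Write amplification: one push per follower, so reads are O(1).
            self.timeline[f].appendleft(tweet_id)

t = Timelines()
t.follow("bob", "alice")
t.post("alice", "tweet-1")
print(list(t.timeline["bob"]))  # ['tweet-1']
```

The trade is explicit in `post`: a user with N followers costs N writes, but every follower's feed read is a single cache lookup.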


Practice Problems

💪 Test Your Knowledge

Try designing these systems using concepts from this guide

BEGINNER

Design a URL Shortener (like bit.ly)

Requirements:

  • Generate short URL from long URL
  • Redirect short URL to original URL
  • Track click analytics
  • Handle 100M URLs, 1000 requests/second
💡 Hints

• Use base62 encoding for short URLs (a-z, A-Z, 0-9)

• SQL database for URL mappings (small dataset)

• Redis cache for popular URLs

• Async queue for analytics processing
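The base62 hint is the core of this design, and it's small enough to write out. One common scheme (shown here in Python for illustration) encodes an auto-increment database ID:

```python
import string

# 62 characters: 0-9, a-z, A-Z
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase

def encode_base62(n):
    """Encode a non-negative integer ID as a short base62 string."""
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, r = divmod(n, 62)
        out.append(ALPHABET[r])
    return "".join(reversed(out))

print(encode_base62(100_000_000))  # '6LAze' — 5 chars for the 100M requirement
```

Since 62^7 ≈ 3.5 trillion, seven characters cover vastly more than the 100M URLs in the requirements.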

INTERMEDIATE

Design Instagram Feed

Requirements:

  • Users can post photos and follow others
  • Generate personalized feed of followed users' posts
  • Support likes and comments
  • Handle 1B users, 100M daily active users
💡 Hints

• Sharded PostgreSQL for user data and relationships

• CDN for image storage and delivery

• Redis for pre-computed feeds (fan-out on write)

• Cassandra for activity logs (likes, comments)

• Message queue for async feed generation

ADVANCED

Design Uber Ride Matching System

Requirements:

  • Match riders with nearby drivers in real-time
  • Track driver locations continuously
  • Calculate dynamic pricing (surge)
  • Handle 20M rides daily, sub-second matching
💡 Hints

• Geospatial indexing (QuadTree/S2) for location queries

• Redis for active driver/rider state (in-memory)

• Kafka for real-time location streaming

• Microservices: matching, pricing, routing, payments

• WebSockets for real-time updates to apps

• Circuit breakers between services
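To make the geospatial-indexing hint concrete, here is a toy grid index: bucket drivers into fixed-size cells so a "nearby drivers" query only scans the rider's cell and its 8 neighbors. Real systems use QuadTrees or Google S2 cells; the cell size and names here are illustrative:

```python
from collections import defaultdict

class GridIndex:
    """Toy geo index: a nearby() query scans 9 cells instead of all drivers."""

    def __init__(self, cell_size_deg=0.01):  # roughly 1 km near the equator
        self.cell_size = cell_size_deg
        self.cells = defaultdict(set)   # (cell_x, cell_y) -> driver ids
        self.locations = {}             # driver id -> (lat, lon)

    def _cell(self, lat, lon):
        return (int(lat // self.cell_size), int(lon // self.cell_size))

    def update(self, driver_id, lat, lon):
        old = self.locations.get(driver_id)
        if old:
            self.cells[self._cell(*old)].discard(driver_id)  # leave old cell
        self.locations[driver_id] = (lat, lon)
        self.cells[self._cell(lat, lon)].add(driver_id)

    def nearby(self, lat, lon):
        cx, cy = self._cell(lat, lon)
        found = set()
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                found |= self.cells[(cx + dx, cy + dy)]
        return found

idx = GridIndex()
idx.update("driver-1", 37.7749, -122.4194)  # San Francisco
idx.update("driver-2", 40.7128, -74.0060)   # New York, far away
print(idx.nearby(37.7750, -122.4195))       # only driver-1
```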

📝 How to Practice:

  1. Start with requirements - clarify functional and non-functional needs
  2. Estimate scale - calculate QPS, storage, bandwidth
  3. Draw high-level architecture - components and data flow
  4. Identify bottlenecks - what fails first as you scale?
  5. Optimize - add caching, sharding, replication as needed
  6. Discuss trade-offs - why this choice over alternatives?
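Step 2 (estimate scale) is just arithmetic, but interviewers want to see it written out. A back-of-envelope sketch for a photo-sharing service — all input numbers below are illustrative assumptions, not measured figures:

```python
# Assumed inputs for a hypothetical photo-sharing service
daily_active_users = 100_000_000
posts_per_user_per_day = 0.2           # i.e., 20M uploads/day
avg_photo_bytes = 2 * 1024 * 1024      # assume 2 MB per photo

uploads_per_day = daily_active_users * posts_per_user_per_day
write_qps = uploads_per_day / 86_400   # 86,400 seconds per day
peak_write_qps = write_qps * 3         # rule of thumb: peak is 2-3x average
storage_per_day_tb = uploads_per_day * avg_photo_bytes / 1e12

print(f"writes: ~{write_qps:.0f} QPS avg, ~{peak_write_qps:.0f} QPS peak")
print(f"storage: ~{storage_per_day_tb:.0f} TB/day, "
      f"~{storage_per_day_tb * 365 / 1000:.0f} PB/year")
```

Roughly 230 write QPS and ~42 TB/day: the QPS is modest, so the real bottleneck is blob storage and CDN bandwidth — exactly the kind of conclusion step 4 asks for.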

Let’s Connect

System design is a journey. I’m constantly learning from real-world systems and sharing discoveries.

Have questions about specific concepts? Designing a system and want feedback? Reach out—I love discussing architecture and trade-offs.

Remember: every massive system started simple. Twitter began as a basic web app. Instagram was just photo uploads. They evolved by solving one problem at a time.

You now have the foundation. Start designing, keep learning, and watch these concepts become second nature.

Happy designing!
