Master the infrastructure component that enables horizontal scaling. Learn how load balancers distribute traffic, enable high availability, and prevent server overload.
Load balancers sit between users and your servers, distributing requests evenly so no single server gets overwhelmed. They are the key to scaling from 100 users to 100 million users.
EXERCISE 5
Route requests to the server currently handling the fewest active connections. Smart for variable request durations.
When a client requests data, an HTTP connection opens between the load balancer and a server. The connection stays open until the server responds. Then it closes.
Connection Count = Current Server Load
Server with 50 active connections is busier than server with 10 active connections.
Load balancer tracks active connections to each server.
Current State: Server 2 has the fewest active connections (8).
A new request arrives. The load balancer routes it to Server 2.
After routing: Server 2 now has 9 active connections.
The next request arrives. It routes to Server 2 again (still the fewest at 9).
Load balancer continuously picks the server with minimum active connections.
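A minimal sketch of this selection step in Python (server names and connection counts are hypothetical, chosen to match the walkthrough above):

# Track active connections per server; pick the minimum on each request.
active_connections = {"server1": 12, "server2": 8, "server3": 10}

def route_request(connections):
    # Choose the server with the fewest active connections.
    server = min(connections, key=connections.get)
    connections[server] += 1  # a connection opens when the request is forwarded
    return server

print(route_request(active_connections))  # server2 (8 connections, the fewest)
print(route_request(active_connections))  # server2 again (now 9, still the fewest)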
Request Duration Varies: Some requests finish in 100ms. Others take 10 seconds.
Round Robin Fails: Could send 10 quick requests to Server 1 and 10 slow requests to Server 2. Server 2 becomes overloaded despite equal request count.
Least Connections Wins: Recognizes Server 2 is busy (many active connections) and stops sending more requests until it frees up.
Video Encoding: Some videos encode in 5 seconds. Others take 2 minutes. Huge variance.
Least connections prevents dumping multiple slow jobs on one server while others sit idle.
Database Queries: Simple queries (fetch user by ID) finish instantly. Complex reports take 30 seconds.
Least connections routes based on actual server load, not arbitrary request count.
API with Mixed Endpoints: /health responds in 10ms. /generate-report takes 20 seconds.
Least connections naturally balances heavy and light requests.
Uber: Ride matching algorithms have variable processing times. Simple match (rider and driver both nearby) is instant. Complex match (surge pricing, driver preferences, optimal routing) takes longer.
Least connections ensures servers handling complex matches do not get buried with more requests while others idle.
When request processing time is unpredictable, least connections dynamically balances load based on real server availability, not blind distribution.
EXERCISE 1
A single server has limits. When traffic exceeds capacity, response times spike, then the server crashes. Users see error pages. Your business loses money.
You built an e-commerce site. Launched successfully. Initially, 100 concurrent users. One server handles everything smoothly.
Black Friday arrives. Traffic explodes. 10,000 concurrent users hit your site.
What happens?
Your single server drowns. CPU hits 100%. Memory maxes out. Requests queue up. Response time jumps from 100ms to 30 seconds. Users get frustrated. Many leave. Server crashes entirely. Site goes down.
Revenue lost. Reputation damaged.
Limited Resources: Every server has finite CPU, memory, and network capacity. Exceed those limits and performance collapses.
Single Point of Failure: Hardware fails. Software crashes. Network issues occur. When your only server dies, your entire application dies.
Cannot Scale: Need more capacity? Upgrading one server (vertical scaling) hits physical limits and costs explode.
Add more servers! But now a new problem appears.
Multiple Servers, New Problems:
Users need to know which server to connect to. Do they manually choose? What if they pick the overloaded server while others sit idle?
How do servers coordinate? How do you ensure even distribution?
EXERCISE 2
A load balancer sits between users and servers, acting as the single point of contact. It intelligently distributes incoming requests across multiple servers.
Before Load Balancer:
User → Server 1
User → Server 2
User → Server 3
Users must know all server addresses. Manual selection. Chaos.
With Load Balancer:
User → Load Balancer → Server 1
→ Server 2
→ Server 3
Users only know the load balancer address. It handles everything.
Step 1: User types api.example.com in browser
Step 2: DNS resolves to load balancer IP address
Step 3: User request arrives at load balancer
Step 4: Load balancer picks one backend server (using an algorithm)
Step 5: Load balancer forwards request to chosen server
Step 6: Server processes request and responds to load balancer
Step 7: Load balancer sends response back to user
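A stripped-down sketch of steps 4 through 7 in Python (the backend addresses are made up, and a real load balancer does this in optimized network code rather than application code):

import random
import urllib.request

backends = ["http://10.0.0.1:8080", "http://10.0.0.2:8080", "http://10.0.0.3:8080"]

def handle_request(path):
    backend = random.choice(backends)                     # Step 4: pick one backend server
    with urllib.request.urlopen(backend + path) as resp:  # Step 5: forward the request
        return resp.read()                                # Steps 6-7: relay the response to the user

# handle_request("/api/products")  # the user only ever talks to the load balancer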
EXERCISE 3
The simplest distribution method. Requests cycle through servers in order: Server 1, Server 2, Server 3, back to Server 1. Fair and predictable.
Setup: You have 3 identical servers behind a load balancer.
Request Flow:
Request 1 → Server 1
Request 2 → Server 2
Request 3 → Server 3
Request 4 → Server 1 (cycle repeats)
Request 5 → Server 2
Request 6 → Server 3
Pattern continues forever. Each server gets equal requests.
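A minimal round robin picker in Python (server names are placeholders) that reproduces the pattern above:

servers = ["server1", "server2", "server3"]

def round_robin(servers):
    i = 0
    while True:
        yield servers[i]
        i = (i + 1) % len(servers)  # wrap back to the first server

picker = round_robin(servers)
for request_id in range(1, 7):
    print(f"Request {request_id} -> {next(picker)}")
# Request 1 -> server1, Request 2 -> server2, Request 3 -> server3, Request 4 -> server1 ...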
Uniform Distribution: Over time, each server receives approximately the same number of requests.
Simplicity: No complex calculations. Just cycle through the list.
Predictability: You know exactly which server handles the next request.
Identical Servers: All servers have same CPU, memory, and configuration.
Similar Requests: All requests take roughly the same time to process.
Stateless Applications: No need to route specific users to specific servers.
Example: Static content delivery. Serving images or CSS files. Each request is quick and similar.
CDN (Content Delivery Network): Serving static assets like images. All servers identical. All requests similar. Round robin distributes load perfectly.
EXERCISE 4
When servers have different capacities, assign weights. More powerful servers receive proportionally more requests.
You have 3 servers. Servers 1 and 3 are standard machines; Server 2 has twice their capacity.
Basic round robin sends equal requests to all three. But Server 2 can handle 2x more load!
Result: Server 2 underutilized. Servers 1 and 3 struggle. Inefficient.
Assign weights based on capacity: Server 1 = 1, Server 2 = 2, Server 3 = 1.
Request Distribution:
Request 1 → Server 1
Request 2 → Server 2
Request 3 → Server 2 (gets 2nd request due to weight)
Request 4 → Server 3
Request 5 → Server 1 (cycle repeats)
Pattern: 1 → 2 → 2 → 3 (ratio 1:2:1)
Server 2 receives twice as many requests. Perfect for its capacity.
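One simple way to sketch weighted round robin in Python is to repeat each server in the rotation according to its weight (weights follow the 1:2:1 example above; production implementations usually interleave more smoothly):

import itertools

weights = {"server1": 1, "server2": 2, "server3": 1}

# Expand the rotation: server1, server2, server2, server3, then repeat.
rotation = [name for name, weight in weights.items() for _ in range(weight)]
picker = itertools.cycle(rotation)

for request_id in range(1, 6):
    print(f"Request {request_id} -> {next(picker)}")
# server1, server2, server2, server3, server1 (ratio 1:2:1)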
Non-Uniform Infrastructure: Servers have different specs. Mix of old and new hardware.
Gradual Migration: Moving from old servers to new ones. Give new servers higher weight while phasing out old ones.
Cost Optimization: Use cheaper, smaller servers for light load. Reserve powerful servers for heavy lifting.
When a server crashes, how do users know to try a different one?
This chaos needs organization. Enter the load balancer.
Target (2013): Website crashed during Black Friday sale. Single point of failure cost millions in lost sales.
Healthcare.gov launch: Initial deployment could not handle traffic. Poor load distribution caused catastrophic failures.
Your application: Without load balancing, growth itself becomes your enemy. Success kills your service.
User never knows which server handled the request. Abstraction complete.
The load balancer has a static IP or domain name. Servers behind it can be added, removed, or replaced without users noticing.
Example:
Monday: 3 servers behind load balancer
Wednesday: 10 servers (traffic spike)
Friday: Back to 3 servers
Users connect to api.example.com the entire time. They see zero changes. Load balancer handles scaling invisibly.
Load balancers are not just for end users. Internal services use them too.
Authentication Service wants data from Profile Service:
Auth API → Profile Load Balancer → Profile Server 1
→ Profile Server 2
→ Profile Server 3
Auth service does not need to know about Profile servers. It talks to Profile load balancer. This decouples services beautifully.
Abstraction: Complexity hidden from users and other services
Flexibility: Add/remove servers without coordination
Resilience: Server fails? Load balancer routes around it
Simplicity: One address to remember, not dozens
Netflix uses round robin for serving video thumbnails across their edge servers.
Assumes Uniformity: If one server is slower or busier, round robin keeps sending requests anyway.
No Intelligence: Does not consider server health or current load.
These limitations lead us to smarter algorithms...
E-commerce Peak Hours:
Normal hours: 3 small servers (weight = 1 each)
Black Friday: Add 2 large servers (weight = 3 each)
Small servers handle 1x load each. Large servers handle 3x load each.
Configuration:
Server 1 (small): weight = 1
Server 2 (small): weight = 1
Server 3 (small): weight = 1
Server 4 (large): weight = 3
Server 5 (large): weight = 3
Total weight = 9. Large servers handle 6/9 (about 67%) of traffic with only 2/5 (40%) of the servers.
Match request distribution to server capacity. Weighted round robin ensures efficient resource utilization when infrastructure is not uniform.
EXERCISE 7
Load balancers enable horizontal scaling and high availability - the two pillars of modern internet infrastructure.
The Old Way (No Load Balancer):
You have 1 server handling 1000 requests/minute. Traffic doubles to 2000 requests/minute.
Options: upgrade to a bigger server (vertical scaling hits physical limits and costs explode), or watch response times degrade as the single server saturates.
The Load Balancer Way:
Add servers behind load balancer. Done.
Steps: provision new servers, register them with the load balancer, and traffic starts flowing to them immediately.
Zero downtime. Zero user impact. Instant capacity increase.
Startup Launch:
Day 1: 1 server behind load balancer (100 users)
Month 1: 3 servers (traffic growing)
Month 6: 10 servers (going viral)
Black Friday: Temporarily 50 servers
Week after: Scale back to 15 servers
Load balancer makes this trivial. Without it, this scaling would require complex coordination and would break user experience.
Cloud platforms (AWS, Google Cloud) combine load balancers with auto-scaling.
Traffic increases → Metrics trigger scale-up → New servers launch automatically → Load balancer adds them to pool
Traffic decreases → Servers terminate → Load balancer removes them
Zero human intervention. Infrastructure scales itself.
Example: Netflix scales from 5,000 servers during daytime to 15,000 servers during evening peak viewing, back to 5,000 overnight. Fully automated.
Server Failure Without Load Balancer:
Your single server crashes at 2 AM. Application goes down. Users see error pages. You get emergency calls. Rush to fix and restart server.
Downtime: 30 minutes to 2 hours. Lost revenue. Angry users.
Server Failure With Load Balancer:
You have 3 servers behind load balancer.
2 AM: Server 2 crashes.
What happens?
Load balancer health check detects Server 2 not responding. Instantly stops routing traffic to Server 2. All new requests go to Server 1 and Server 3.
Users experience zero downtime. They never know Server 2 crashed.
Next morning: You wake up, see alert, fix Server 2, add it back. Load balancer resumes sending traffic.
Total user-facing downtime: 0 minutes.
Load balancers continuously check server health.
Every 10 seconds: Send request to /health endpoint on each server.
Server responds: Healthy. Keep routing traffic.
Server fails to respond: Unhealthy. Stop routing traffic immediately.
Server recovers: Automatically added back to rotation.
This happens 24/7 automatically. No human needed.
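A sketch of that loop in Python (the /health path and 10-second interval mirror the description above; the addresses, timeout, and healthy-set bookkeeping are illustrative assumptions):

import time
import urllib.error
import urllib.request

backends = ["http://10.0.0.1:8080", "http://10.0.0.2:8080", "http://10.0.0.3:8080"]
healthy = set(backends)

def is_healthy(backend):
    try:
        with urllib.request.urlopen(backend + "/health", timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False  # no response within the timeout counts as unhealthy

def health_check_loop():
    while True:
        for backend in backends:
            if is_healthy(backend):
                healthy.add(backend)      # recovered servers rejoin the rotation
            else:
                healthy.discard(backend)  # unhealthy servers stop receiving traffic
        time.sleep(10)                    # repeat every 10 seconds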
Need to deploy new code? With load balancers, zero downtime.
Process: remove Server 1 from the pool, deploy the new code, add it back, then repeat for Server 2 and Server 3.
At any moment, 2 servers handle traffic while 1 updates. Users never experience downtime.
Before load balancers: Maintenance required scheduling downtime windows. "Site down for maintenance 2-4 AM." Unacceptable for modern applications.
Advanced: Load balancers can route traffic across multiple data centers.
US users → US load balancer → US servers
Europe users → Europe load balancer → Europe servers
One fails? Load balancer redirects all traffic to healthy data center.
Example: AWS uses this for services like S3. Multiple data centers per region. Seamless failover.
Before load balancers: Scaling was painful. Downtime was inevitable.
After load balancers: Scaling is trivial. Downtime is nearly eliminated.
Load balancers turned scaling and availability from hard problems into solved problems. This is why every major application uses them.
EXERCISE 6
Route users to the same server consistently using hashing. Critical for session management and caching strategies.
User logs into your application. Session data (shopping cart, preferences) stored in memory.
What if next request goes to a different server?
New server has no session data. User appears logged out. Cart is empty. Terrible experience.
Solution: Sticky sessions. Same user always routes to same server.
Step 1: Choose a routing key (user ID, IP address, session token)
Step 2: Hash the key through a hash function
Step 3: Modulo with number of servers
Step 4: Result determines which server handles request
Example:
3 servers. User ID = 12345.
hash(12345) = 8728
8728 % 3 = 1
→ Route to Server 1
Every request from user 12345 goes through same calculation. Always routes to Server 1.
Deterministic: Same input always produces same output. User 12345 always gets Server 1.
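A small Python sketch of the routing calculation (a stable digest like MD5 is used instead of Python's built-in hash(), which is randomized per process for strings; server names are placeholders):

import hashlib

servers = ["server0", "server1", "server2"]

def pick_server(routing_key, servers):
    digest = hashlib.md5(routing_key.encode()).hexdigest()
    index = int(digest, 16) % len(servers)  # same key -> same index, every time
    return servers[index]

print(pick_server("user:12345", servers))  # always the same server for this user
print(pick_server("user:12345", servers))  # identical result on every request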
EXERCISE 8
Modern applications use load balancers at multiple levels. Understanding production patterns helps you design better systems.
Enterprise applications use load balancers at multiple levels.
Example: E-Commerce Application
Internet
↓
[Global Load Balancer] - Routes by geography
↓
[Regional Load Balancer] - Routes to app servers
↓
[App Servers] - Business logic
↓
[Database Load Balancer] - Routes to read replicas
↓
[Database Servers]
Layer 1: Geographic distribution (US, Europe, Asia)
Layer 2: Application tier load balancing
Layer 3: Database tier load balancing
Each layer optimizes for different goals.
Application Load Balancer (Layer 7):
Understands HTTP. Can route based on URL path, headers, and other request attributes.
Example:
/api/products → Product Service
/api/users → User Service
/api/orders → Order Service
Same load balancer, different backend services based on URL.
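A rough sketch of that path matching in Python (the prefixes and service names follow the example above; real Layer 7 load balancers express this as routing rules, not application code):

routes = {
    "/api/products": "product-service",
    "/api/users": "user-service",
    "/api/orders": "order-service",
}

def route(path):
    for prefix, service in routes.items():
        if path.startswith(prefix):
            return service  # forward to the pool for this service
    return "default-service"  # fallback when no rule matches

print(route("/api/products/42"))  # product-service
print(route("/api/orders"))       # order-service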
Uniform Distribution: Hash functions spread users evenly across servers. No server gets disproportionate load.
E-commerce Shopping Cart:
User adds items to cart. Cart stored in Server 2 memory.
All subsequent requests from this user hash to Server 2.
User can add more items, modify quantities, checkout - all requests hit Server 2.
Session data always available. Smooth shopping experience.
Common strategy: Hash client IP address.
Advantage: No user ID needed. Works for anonymous users.
Disadvantage: Users behind same corporate proxy share IP. All route to same server (imbalanced).
Hashing the user ID is better for authenticated users:
hash(user_id) % server_count
Even distribution. Each user consistently routed to same server.
Beyond sessions, hash-based routing improves cache efficiency.
User Profile Data: User 12345 requests profile. Server 1 caches it in memory.
Future requests from user 12345 → Server 1 → Instant cache hit.
No database query needed. Lightning fast response.
Problem: Server count changes, modulo result changes.
3 servers: hash(12345) % 3 = 1 → Server 1
4 servers: hash(12345) % 4 = 0 → Server 0
User suddenly routes to different server. Session lost!
Solution: Consistent hashing (advanced topic, covered later).
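A quick Python check of how disruptive naive modulo routing is when the pool grows from 3 to 4 servers (the user IDs and MD5-based hash are arbitrary choices for illustration):

import hashlib

def server_index(user_id, server_count):
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % server_count

moved = sum(
    1 for uid in range(10_000)
    if server_index(uid, 3) != server_index(uid, 4)  # 3 servers -> 4 servers
)
print(f"{moved / 10_000:.0%} of users changed servers")  # roughly 75% get remapped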
Stateful Applications: Apps storing session data in server memory.
Caching Strategies: Maximizing cache hit rates through user locality.
Persistent Connections: Maintaining long-lived connections (such as WebSockets) to specific servers.
Real-world example: Chat applications use hash-based routing to keep user connections on same server, enabling instant message delivery without database queries.
Network Load Balancer (Layer 4):
Routes based on IP and port. Extremely fast. No HTTP inspection.
Best for raw performance. Used for TCP/UDP traffic like databases, gaming, video streaming.
Global Load Balancer (DNS-based):
Routes based on client location. Directs users to the nearest data center.
Example: Netflix uses this to route users to closest edge location.
AWS Elastic Load Balancer: Fully managed. Auto-scales. Integrates with EC2, ECS, Lambda.
Google Cloud Load Balancing: Global load balancing built-in. Routes across regions automatically.
Azure Load Balancer: Integrated with Azure services. Health probes included.
Key advantage: Managed services. No maintenance, automatic scaling, built-in monitoring.
Question: If load balancer handles all traffic, what happens when it fails?
Answer: Load balancers themselves are highly available.
How: they run as redundant instances (active-passive or active-active) with automatic failover between them.
Cloud providers handle this automatically. AWS ELB runs across multiple zones. One zone fails? Others take over instantly.
Latency: Load balancer adds minimal latency (typically 1-5ms). Negligible compared to network round-trip time.
Throughput: Modern load balancers handle millions of requests per second.
SSL Termination: Load balancers can handle HTTPS encryption/decryption, offloading this work from backend servers.
Load balancers provide valuable metrics:
Request rate: Requests per second to each backend
Error rate: Failed requests percentage
Latency: Response time distribution
Health status: Which servers are healthy/unhealthy
These metrics help identify issues before users notice.
Example: Sudden spike in errors to Server 2? Investigate immediately. Maybe disk is full, memory leak, or bad deployment.
Health check endpoints: Create dedicated /health endpoints. Should check database connectivity, critical downstream dependencies, and basic resource availability, not just whether the process is running.
Timeout settings: Balance between patience and responsiveness. 30-second timeout is common.
Connection limits: Prevent one server from accepting too many connections.
Session stickiness: Enable only when necessary. Stateless applications do not need it.
Cloud load balancers: Pay for capacity and data transferred. Typically $20-50/month for small applications. Scales with usage.
DIY load balancers (Nginx, HAProxy): Free software, but you manage infrastructure, updates, high availability.
Most teams choose managed services: The operational burden of running load balancers yourself is not worth the cost savings.
Load balancers transformed from specialized infrastructure to essential commodity. Every production application uses them.
They enable: horizontal scaling, high availability, zero-downtime deployments, and multi-region resilience.
Understanding load balancers deeply is fundamental to designing modern systems.