Master the infrastructure component that enables horizontal scaling. Learn how load balancers distribute traffic, enable high availability, and prevent server overload.
Load balancers sit between users and your servers, distributing requests evenly so no single server gets overwhelmed. They are the key to scaling from 100 users to 100 million users.
EXERCISE 5
Route requests to the server currently handling the fewest active connections. Smart for variable request durations.
When a client requests data, an HTTP connection opens between the load balancer and a server. The connection stays open until the server responds. Then it closes.
Connection Count = Current Server Load
Server with 50 active connections is busier than server with 10 active connections.
Load balancer tracks active connections to each server.
Current State: Server 2 has the fewest active connections (8).
A new request arrives. The load balancer routes it to Server 2.
After routing: Server 2 now has 9 active connections.
The next request arrives. It routes to Server 2 again (still the fewest at 9).
Load balancer continuously picks the server with minimum active connections.
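A minimal sketch of this selection step in Python (server names and connection counts are hypothetical, chosen to match the walkthrough above):

# Track active connections per server; pick the minimum on each request.
active_connections = {"server1": 12, "server2": 8, "server3": 10}

def route_request(connections):
    # Choose the server with the fewest active connections.
    server = min(connections, key=connections.get)
    connections[server] += 1  # a connection opens when the request is forwarded
    return server

print(route_request(active_connections))  # server2 (8 connections, the fewest)
print(route_request(active_connections))  # server2 again (now 9, still the fewest)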
Request Duration Varies: Some requests finish in 100ms. Others take 10 seconds.
Round Robin Fails: Could send 10 quick requests to Server 1 and 10 slow requests to Server 2. Server 2 becomes overloaded despite equal request count.
Least Connections Wins: Recognizes Server 2 is busy (many active connections) and stops sending more requests until it frees up.
Video Encoding: Some videos encode in 5 seconds. Others take 2 minutes. Huge variance.
Least connections prevents dumping multiple slow jobs on one server while others sit idle.
Database Queries: Simple queries (fetch user by ID) finish instantly. Complex reports take 30 seconds.
Least connections routes based on actual server load, not arbitrary request count.
API with Mixed Endpoints: /health responds in 10ms. /generate-report takes 20 seconds.
Least connections naturally balances heavy and light requests.
Uber: Ride matching algorithms have variable processing times. Simple match (rider and driver both nearby) is instant. Complex match (surge pricing, driver preferences, optimal routing) takes longer.
Least connections ensures servers handling complex matches do not get buried with more requests while others idle.
When request processing time is unpredictable, least connections dynamically balances load based on real server availability, not blind distribution.
EXERCISE 1
A single server has limits. When traffic exceeds capacity, response times spike, then the server crashes. Users see error pages. Your business loses money.
You built an e-commerce site. Launched successfully. Initially, 100 concurrent users. One server handles everything smoothly.
Black Friday arrives. Traffic explodes. 10,000 concurrent users hit your site.
What happens?
Your single server drowns. CPU hits 100%. Memory maxes out. Requests queue up. Response time jumps from 100ms to 30 seconds. Users get frustrated. Many leave. Server crashes entirely. Site goes down.
Revenue lost. Reputation damaged.
Limited Resources: Every server has finite CPU, memory, and network capacity. Exceed those limits and performance collapses.
Single Point of Failure: Hardware fails. Software crashes. Network issues occur. When your only server dies, your entire application dies.
Cannot Scale: Need more capacity? Upgrading one server (vertical scaling) hits physical limits and costs explode.
Add more servers! But now a new problem appears.
Multiple Servers, New Problems:
Users need to know which server to connect to. Do they manually choose? What if they pick the overloaded server while others sit idle?
How do servers coordinate? How do you ensure even distribution?
EXERCISE 2
A load balancer sits between users and servers, acting as the single point of contact. It intelligently distributes incoming requests across multiple servers.
Before Load Balancer:
User → Server 1
User → Server 2
User → Server 3
Users must know all server addresses. Manual selection. Chaos.
With Load Balancer:
User → Load Balancer → Server 1
→ Server 2
→ Server 3
Users only know the load balancer address. It handles everything.
Step 1: User types api.example.com in browser
Step 2: DNS resolves to load balancer IP address
Step 3: User request arrives at load balancer
Step 4: Load balancer picks one backend server (using an algorithm)
Step 5: Load balancer forwards request to chosen server
Step 6: Server processes request and responds to load balancer
Step 7: Load balancer sends response back to user
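A stripped-down sketch of steps 4 through 7 in Python (the backend addresses are made up, and a real load balancer does this in optimized network code rather than application code):

import random
import urllib.request

backends = ["http://10.0.0.1:8080", "http://10.0.0.2:8080", "http://10.0.0.3:8080"]

def handle_request(path):
    backend = random.choice(backends)                     # Step 4: pick one backend server
    with urllib.request.urlopen(backend + path) as resp:  # Step 5: forward the request
        return resp.read()                                # Steps 6-7: relay the response to the user

# handle_request("/api/products")  # the user only ever talks to the load balancer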
EXERCISE 3
The simplest distribution method. Requests cycle through servers in order: Server 1, Server 2, Server 3, back to Server 1. Fair and predictable.
Setup: You have 3 identical servers behind a load balancer.
Request Flow:
Request 1 → Server 1
Request 2 → Server 2
Request 3 → Server 3
Request 4 → Server 1 (cycle repeats)
Request 5 → Server 2
Request 6 → Server 3
Pattern continues forever. Each server gets equal requests.
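A minimal round robin picker in Python (server names are placeholders) that reproduces the pattern above:

servers = ["server1", "server2", "server3"]

def round_robin(servers):
    i = 0
    while True:
        yield servers[i]
        i = (i + 1) % len(servers)  # wrap back to the first server

picker = round_robin(servers)
for request_id in range(1, 7):
    print(f"Request {request_id} -> {next(picker)}")
# Request 1 -> server1, Request 2 -> server2, Request 3 -> server3, Request 4 -> server1 ...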
Uniform Distribution: Over time, each server receives approximately the same number of requests.
Simplicity: No complex calculations. Just cycle through the list.
Predictability: You know exactly which server handles the next request.
Identical Servers: All servers have same CPU, memory, and configuration.
Similar Requests: All requests take roughly the same time to process.
Stateless Applications: No need to route specific users to specific servers.
Example: Static content delivery. Serving images or CSS files. Each request is quick and similar.
CDN (Content Delivery Network): Serving static assets like images. All servers identical. All requests similar. Round robin distributes load perfectly.
EXERCISE 4
When servers have different capacities, assign weights. More powerful servers receive proportionally more requests.
You have 3 servers. Servers 1 and 3 are standard machines; Server 2 has twice their capacity.
Basic round robin sends equal requests to all three. But Server 2 can handle 2x more load!
Result: Server 2 underutilized. Servers 1 and 3 struggle. Inefficient.
Assign weights based on capacity: Server 1 = 1, Server 2 = 2, Server 3 = 1.
Request Distribution:
Request 1 → Server 1
Request 2 → Server 2
Request 3 → Server 2 (gets 2nd request due to weight)
Request 4 → Server 3
Request 5 → Server 1 (cycle repeats)
Pattern: 1 → 2 → 2 → 3 (ratio 1:2:1)
Server 2 receives twice as many requests. Perfect for its capacity.
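One simple way to sketch weighted round robin in Python is to repeat each server in the rotation according to its weight (weights follow the 1:2:1 example above; production implementations usually interleave more smoothly):

import itertools

weights = {"server1": 1, "server2": 2, "server3": 1}

# Expand the rotation: server1, server2, server2, server3, then repeat.
rotation = [name for name, weight in weights.items() for _ in range(weight)]
picker = itertools.cycle(rotation)

for request_id in range(1, 6):
    print(f"Request {request_id} -> {next(picker)}")
# server1, server2, server2, server3, server1 (ratio 1:2:1)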
Non-Uniform Infrastructure: Servers have different specs. Mix of old and new hardware.
Gradual Migration: Moving from old servers to new ones. Give new servers higher weight while phasing out old ones.
Cost Optimization: Use cheaper, smaller servers for light load. Reserve powerful servers for heavy lifting.
When a server crashes, how do users know to try a different one?
This chaos needs organization. Enter the load balancer.
Target (2013): Website crashed during Black Friday sale. Single point of failure cost millions in lost sales.
Healthcare.gov launch: Initial deployment could not handle traffic. Poor load distribution caused catastrophic failures.
Your application: Without load balancing, growth itself becomes your enemy. Success kills your service.
User never knows which server handled the request. Abstraction complete.
The load balancer has a static IP or domain name. Servers behind it can be added, removed, or replaced without users noticing.
Example:
Monday: 3 servers behind load balancer
Wednesday: 10 servers (traffic spike)
Friday: Back to 3 servers
Users connect to api.example.com the entire time. They see zero changes. Load balancer handles scaling invisibly.
Load balancers are not just for end users. Internal services use them too.
Authentication Service wants data from Profile Service:
Auth API → Profile Load Balancer → Profile Server 1
→ Profile Server 2
→ Profile Server 3
Auth service does not need to know about Profile servers. It talks to Profile load balancer. This decouples services beautifully.
Abstraction: Complexity hidden from users and other services
Flexibility: Add/remove servers without coordination
Resilience: Server fails? Load balancer routes around it
Simplicity: One address to remember, not dozens
Netflix uses round robin for serving video thumbnails across their edge servers.
Assumes Uniformity: If one server is slower or busier, round robin keeps sending requests anyway.
No Intelligence: Does not consider server health or current load.
These limitations lead us to smarter algorithms...
E-commerce Peak Hours:
Normal hours: 3 small servers (weight = 1 each)
Black Friday: Add 2 large servers (weight = 3 each)
Small servers handle 1x load each. Large servers handle 3x load each.
Configuration:
Server 1 (small): weight = 1
Server 2 (small): weight = 1
Server 3 (small): weight = 1
Server 4 (large): weight = 3
Server 5 (large): weight = 3
Total weight = 9. Large servers handle 6/9 (about 67%) of traffic with only 2/5 (40%) of the servers.
Match request distribution to server capacity. Weighted round robin ensures efficient resource utilization when infrastructure is not uniform.
EXERCISE 7
Load balancers enable horizontal scaling and high availability - the two pillars of modern internet infrastructure.
The Old Way (No Load Balancer):
You have 1 server handling 1000 requests/minute. Traffic doubles to 2000 requests/minute.
Options: upgrade to a bigger server (vertical scaling hits physical limits and costs explode), or watch response times degrade as the single server saturates.
The Load Balancer Way:
Add servers behind load balancer. Done.
Steps: provision new servers, register them with the load balancer, and traffic starts flowing to them immediately.
Zero downtime. Zero user impact. Instant capacity increase.
Startup Launch:
Day 1: 1 server behind load balancer (100 users)
Month 1: 3 servers (traffic growing)
Month 6: 10 servers (going viral)
Black Friday: Temporarily 50 servers
Week after: Scale back to 15 servers
Load balancer makes this trivial. Without it, this scaling would require complex coordination and would break user experience.
Cloud platforms (AWS, Google Cloud) combine load balancers with auto-scaling.
Traffic increases → Metrics trigger scale-up → New servers launch automatically → Load balancer adds them to pool
Traffic decreases → Servers terminate → Load balancer removes them
Zero human intervention. Infrastructure scales itself.
Example: Netflix scales from 5,000 servers during daytime to 15,000 servers during evening peak viewing, back to 5,000 overnight. Fully automated.
Server Failure Without Load Balancer:
Your single server crashes at 2 AM. Application goes down. Users see error pages. You get emergency calls. Rush to fix and restart server.
Downtime: 30 minutes to 2 hours. Lost revenue. Angry users.
Server Failure With Load Balancer:
You have 3 servers behind load balancer.
2 AM: Server 2 crashes.
What happens?
Load balancer health check detects Server 2 not responding. Instantly stops routing traffic to Server 2. All new requests go to Server 1 and Server 3.
Users experience zero downtime. They never know Server 2 crashed.
Next morning: You wake up, see alert, fix Server 2, add it back. Load balancer resumes sending traffic.
Total user-facing downtime: 0 minutes.
Load balancers continuously check server health.
Every 10 seconds: Send request to /health endpoint on each server.
Server responds: Healthy. Keep routing traffic.
Server fails to respond: Unhealthy. Stop routing traffic immediately.
Server recovers: Automatically added back to rotation.
This happens 24/7 automatically. No human needed.
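A sketch of that loop in Python (the /health path and 10-second interval mirror the description above; the addresses, timeout, and healthy-set bookkeeping are illustrative assumptions):

import time
import urllib.error
import urllib.request

backends = ["http://10.0.0.1:8080", "http://10.0.0.2:8080", "http://10.0.0.3:8080"]
healthy = set(backends)

def is_healthy(backend):
    try:
        with urllib.request.urlopen(backend + "/health", timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False  # no response within the timeout counts as unhealthy

def health_check_loop():
    while True:
        for backend in backends:
            if is_healthy(backend):
                healthy.add(backend)      # recovered servers rejoin the rotation
            else:
                healthy.discard(backend)  # unhealthy servers stop receiving traffic
        time.sleep(10)                    # repeat every 10 seconds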
Need to deploy new code? With load balancers, zero downtime.
Process: remove Server 1 from the pool, deploy the new code, add it back, then repeat for Server 2 and Server 3.
At any moment, 2 servers handle traffic while 1 updates. Users never experience downtime.
Before load balancers: Maintenance required scheduling downtime windows. "Site down for maintenance 2-4 AM." Unacceptable for modern applications.
Advanced: Load balancers can route traffic across multiple data centers.
US users → US load balancer → US servers
Europe users → Europe load balancer → Europe servers
One fails? Load balancer redirects all traffic to healthy data center.
Example: AWS uses this for services like S3. Multiple data centers per region. Seamless failover.
Before load balancers: Scaling was painful. Downtime was inevitable.
After load balancers: Scaling is trivial. Downtime is nearly eliminated.
Load balancers turned scaling and availability from hard problems into solved problems. This is why every major application uses them.
EXERCISE 6
Route users to the same server consistently using hashing. Critical for session management and caching strategies.
User logs into your application. Session data (shopping cart, preferences) stored in memory.
What if next request goes to a different server?
New server has no session data. User appears logged out. Cart is empty. Terrible experience.
Solution: Sticky sessions. Same user always routes to same server.
Step 1: Choose a routing key (user ID, IP address, session token)
Step 2: Hash the key through a hash function
Step 3: Modulo with number of servers
Step 4: Result determines which server handles request
Example:
3 servers. User ID = 12345.
hash(12345) = 8728
8728 % 3 = 1
→ Route to Server 1
Every request from user 12345 goes through same calculation. Always routes to Server 1.
Deterministic: Same input always produces same output. User 12345 always gets Server 1.
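A small Python sketch of the routing calculation (a stable digest like MD5 is used instead of Python's built-in hash(), which is randomized per process for strings; server names are placeholders):

import hashlib

servers = ["server0", "server1", "server2"]

def pick_server(routing_key, servers):
    digest = hashlib.md5(routing_key.encode()).hexdigest()
    index = int(digest, 16) % len(servers)  # same key -> same index, every time
    return servers[index]

print(pick_server("user:12345", servers))  # always the same server for this user
print(pick_server("user:12345", servers))  # identical result on every request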
EXERCISE 8
Modern applications use load balancers at multiple levels. Understanding production patterns helps you design better systems.
Enterprise applications use load balancers at multiple levels.
Example: E-Commerce Application
Internet
↓
[Global Load Balancer] - Routes by geography
↓
[Regional Load Balancer] - Routes to app servers
↓
[App Servers] - Business logic
↓
[Database Load Balancer] - Routes to read replicas
↓
[Database Servers]
Layer 1: Geographic distribution (US, Europe, Asia)
Layer 2: Application tier load balancing
Layer 3: Database tier load balancing
Each layer optimizes for different goals.
Application Load Balancer (Layer 7):
Understands HTTP. Can route based on URL path, headers, and other request attributes.
Example:
/api/products → Product Service
/api/users → User Service
/api/orders → Order Service
Same load balancer, different backend services based on URL.
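A rough sketch of that path matching in Python (the prefixes and service names follow the example above; real Layer 7 load balancers express this as routing rules, not application code):

routes = {
    "/api/products": "product-service",
    "/api/users": "user-service",
    "/api/orders": "order-service",
}

def route(path):
    for prefix, service in routes.items():
        if path.startswith(prefix):
            return service  # forward to the pool for this service
    return "default-service"  # fallback when no rule matches

print(route("/api/products/42"))  # product-service
print(route("/api/orders"))       # order-service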
Uniform Distribution: Hash functions spread users evenly across servers. No server gets disproportionate load.
E-commerce Shopping Cart:
User adds items to cart. Cart stored in Server 2 memory.
All subsequent requests from this user hash to Server 2.
User can add more items, modify quantities, checkout - all requests hit Server 2.
Session data always available. Smooth shopping experience.
Common strategy: Hash client IP address.
Advantage: No user ID needed. Works for anonymous users.
Disadvantage: Users behind same corporate proxy share IP. All route to same server (imbalanced).
Hashing the user ID is better for authenticated users:
hash(user_id) % server_count
Even distribution. Each user consistently routed to same server.
Beyond sessions, hash-based routing improves cache efficiency.
User Profile Data: User 12345 requests profile. Server 1 caches it in memory.
Future requests from user 12345 → Server 1 → Instant cache hit.
No database query needed. Lightning fast response.
Problem: Server count changes, modulo result changes.
3 servers: hash(12345) % 3 = 1 → Server 1
4 servers: hash(12345) % 4 = 0 → Server 0
User suddenly routes to different server. Session lost!
Solution: Consistent hashing (advanced topic, covered later).
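A quick Python check of how disruptive naive modulo routing is when the pool grows from 3 to 4 servers (the user IDs and MD5-based hash are arbitrary choices for illustration):

import hashlib

def server_index(user_id, server_count):
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % server_count

moved = sum(
    1 for uid in range(10_000)
    if server_index(uid, 3) != server_index(uid, 4)  # 3 servers -> 4 servers
)
print(f"{moved / 10_000:.0%} of users changed servers")  # roughly 75% get remapped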
Stateful Applications: Apps storing session data in server memory.
Caching Strategies: Maximizing cache hit rates through user locality.
Persistent Connections: Maintaining long-lived connections (such as WebSockets) to specific servers.
Real-world example: Chat applications use hash-based routing to keep user connections on same server, enabling instant message delivery without database queries.
Network Load Balancer (Layer 4):
Routes based on IP and port. Extremely fast. No HTTP inspection.
Best for raw performance. Used for TCP/UDP traffic like databases, gaming, video streaming.
Global Load Balancer (DNS-based):
Routes based on client location. Directs users to the nearest data center.
Example: Netflix uses this to route users to closest edge location.
AWS Elastic Load Balancer: Fully managed. Auto-scales. Integrates with EC2, ECS, Lambda.
Google Cloud Load Balancing: Global load balancing built-in. Routes across regions automatically.
Azure Load Balancer: Integrated with Azure services. Health probes included.
Key advantage: Managed services. No maintenance, automatic scaling, built-in monitoring.
Question: If load balancer handles all traffic, what happens when it fails?
Answer: Load balancers themselves are highly available.
How: they run as redundant instances (active-passive or active-active) with automatic failover between them.
Cloud providers handle this automatically. AWS ELB runs across multiple zones. One zone fails? Others take over instantly.
Latency: Load balancer adds minimal latency (typically 1-5ms). Negligible compared to network round-trip time.
Throughput: Modern load balancers handle millions of requests per second.
SSL Termination: Load balancers can handle HTTPS encryption/decryption, offloading this work from backend servers.
Load balancers provide valuable metrics:
Request rate: Requests per second to each backend
Error rate: Failed requests percentage
Latency: Response time distribution
Health status: Which servers are healthy/unhealthy
These metrics help identify issues before users notice.
Example: Sudden spike in errors to Server 2? Investigate immediately. Maybe disk is full, memory leak, or bad deployment.
Health check endpoints: Create dedicated /health endpoints. Should check database connectivity, critical downstream dependencies, and basic resource availability, not just whether the process is running.
Timeout settings: Balance between patience and responsiveness. 30-second timeout is common.
Connection limits: Prevent one server from accepting too many connections.
Session stickiness: Enable only when necessary. Stateless applications do not need it.
Cloud load balancers: Pay for capacity and data transferred. Typically $20-50/month for small applications. Scales with usage.
DIY load balancers (Nginx, HAProxy): Free software, but you manage infrastructure, updates, high availability.
Most teams choose managed services: The operational burden of running load balancers yourself is not worth the cost savings.
Load balancers transformed from specialized infrastructure to essential commodity. Every production application uses them.
They enable: horizontal scaling, high availability, zero-downtime deployments, and multi-region resilience.
Understanding load balancers deeply is fundamental to designing modern systems.