Learn the elegant algorithm that powers distributed systems. Discover how Netflix, Amazon, and every CDN handle billions of requests without breaking when servers go down or get added.
When you remove a node from your cluster, should you rehash and move 50% of your data? Or just 5%? Consistent hashing solves the painful problem of data ownership in distributed systems.
EXERCISE 1: Understand why traditional hash-based routing breaks when you scale up or down, and why this costs companies millions.
EXERCISE 2: Learn the elegant solution that changed how we build distributed systems. Visualize the ring, understand the algorithm.
EXERCISE 4: From Amazon to Netflix, see how consistent hashing powers the infrastructure you use every day.
You built a key-value store. Think of a single-server key-value database, but distributed across multiple storage nodes.
Setup:
Proxy (routes requests)
↓
Storage Node 0
Storage Node 1
Storage Node 2
Users send requests: "Store key K1 with value V1" or "Get value for key K3."
Question: Which node should store which key?
Algorithm:
hash(key) % number_of_nodes = node_index
Example:
hash("K1") % 3 = 2 → Store on Node 2
hash("K2") % 3 = 0 → Store on Node 0
hash("K3") % 3 = 1 → Store on Node 1
hash("K4") % 3 = 2 → Store on Node 2
hash("K5") % 3 = 1 → Store on Node 1
hash("K6") % 3 = 0 → Store on Node 0
Distribution:
Node 0: K2, K6
Node 1: K3, K5
Node 2: K1, K4
This works perfectly.
Every key has an owner. All requests for K3 always go to Node 1 because the hash function output never changes.
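Here is a minimal Python sketch of this traditional routing, assuming a stable hash from hashlib and hypothetical node names (the exact key-to-node assignments will differ from the worked example above, since a real hash function is used):
import hashlib

NODES = ["Node 0", "Node 1", "Node 2"]  # hypothetical node names

def stable_hash(key: str) -> int:
    # Stable across processes, unlike Python's built-in hash().
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def owner(key: str) -> str:
    # Traditional routing: hash the key, then take it modulo the node count.
    return NODES[stable_hash(key) % len(NODES)]

for key in ["K1", "K2", "K3", "K4", "K5", "K6"]:
    print(key, "->", owner(key))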
Stateless systems (like API servers):
Any server can handle any request. User logs in? Any server works. They hold no data.
If server crashes, send request to another server. No problem.
Stateful systems (like storage nodes):
Data lives on specific servers. K3 exists only on Node 1.
If you send a request for K3 to Node 0, it does not have the data. Miss.
Ownership matters.
Your system is over-provisioned. Traffic dropped. You want to remove Node 2 to save costs.
Before removal:
hash(key) % 3 = node
After removal (now 2 nodes):
hash(key) % 2 = node
The hash function changed.
Let us recalculate where every key belongs now:
hash("K1") % 2 = 0 → Now belongs to Node 0 (was Node 2)
hash("K2") % 2 = 0 → Still Node 0 (no change)
hash("K3") % 2 = 1 → Still Node 1 (no change)
hash("K4") % 2 = 0 → Now belongs to Node 0 (was Node 2)
hash("K5") % 2 = 1 → Still Node 1 (no change)
hash("K6") % 2 = 0 → Still Node 0 (no change)
Keys that moved: K1, K4 (2 out of 6 = 33%)
Keys that stayed: K2, K3, K5, K6 (4 out of 6 = 67%)
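You can measure this yourself. A quick sketch reusing the same stable-hash idea: with a uniformly distributed hash, going from 3 nodes to 2 typically moves about two thirds of all keys, so the toy example above is actually on the lucky side.
import hashlib

def stable_hash(key: str) -> int:
    # Stable across processes, unlike Python's built-in hash().
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

keys = [f"K{i}" for i in range(100_000)]  # synthetic keys, just for measurement
moved = sum(1 for k in keys if stable_hash(k) % 3 != stable_hash(k) % 2)
print(f"{moved / len(keys):.0%} of keys change owners")  # roughly 67%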
Before you can remove Node 2, you must:
Step 1: Iterate through ALL keys in the system
Step 2: Recalculate ownership with new hash function (% 2 instead of % 3)
Step 3: Move data between nodes (copy K1 and K4 from Node 2 to Node 0)
Step 4: Verify all data transferred correctly
Step 5: Only then remove Node 2
Scenario: 1 billion keys, each 1 KB.
Data to move: 333 million keys = 333 GB of data transfer.
Time: At 100 MB/s network speed = 55 minutes of intensive data movement.
During this time: System under heavy load. Risk of failures. Users may experience slowness.
Cost: Network bandwidth, CPU for hashing, disk I/O. Expensive.
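The back-of-the-envelope numbers are easy to verify:
keys_total = 1_000_000_000           # 1 billion keys
fraction_moved = 1 / 3               # from the recalculation above
key_size_bytes = 1_000               # 1 KB per key
network_bytes_per_sec = 100_000_000  # 100 MB/s

bytes_to_move = keys_total * fraction_moved * key_size_bytes
print(bytes_to_move / 1e9, "GB")                              # ~333 GB
print(bytes_to_move / network_bytes_per_sec / 60, "minutes")  # ~55 minutes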
Traditional hashing tightly couples ownership to the number of nodes.
Change node count → Hash function changes → Ownership changes globally → Massive data movement.
For stateless systems (API servers): No problem. They hold no data.
For stateful systems (databases, caches): This is catastrophic.
Can we change the number of nodes without rehashing everything?
Can we minimize data movement to only the affected keys, not ALL keys?
This is what consistent hashing solves.
Amazon DynamoDB: Billions of keys. Adding/removing nodes daily for auto-scaling.
Netflix CDN: Thousands of edge servers. Servers crash. New servers spin up constantly.
Redis Cluster: Distributed cache. Needs to scale without downtime.
All face the same problem: Traditional hashing forces massive data movement.
Consistent hashing: Moves only a tiny fraction of data when nodes change.
This saves millions in operational costs and prevents service disruptions.
Instead of hash values mapping directly to node numbers, we place both keys and nodes on the same "ring."
Keys find their owner by moving clockwise on the ring.
Step 1: Define the Hash Space
Pick a hash function. Let us use one that outputs 0 to 15 (simplified from real 0 to 2^128).
This range forms a circle:
          0
      15     1
    14         2
   13           3
   12           4
    11         5
      10     6
        9   7
          8
Numbers wrap around: After 15 comes 0 again. Hence, "ring."
Each storage node gets hashed and placed on the ring.
Node 0: hash(Node_0_IP) = 3 → Place at position 3
Node 1: hash(Node_1_IP) = 12 → Place at position 12
Node 2: hash(Node_2_IP) = 8 → Place at position 8
Ring now looks like:
Positions: 0, 1, 2, [3-Node0], 4, 5, 6, 7, [8-Node2], 9, 10, 11, [12-Node1], 13, 14, 15
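A minimal sketch of this placement step, hard-coding the positions from the example above (a real system would hash node IPs into a much larger space, such as 0 to 2^128):
RING_SIZE = 16  # toy hash space 0..15 from the example

# The ring is just a sorted list of (position, node) pairs.
ring = sorted([
    (3, "Node 0"),   # hash(Node_0_IP) = 3
    (12, "Node 1"),  # hash(Node_1_IP) = 12
    (8, "Node 2"),   # hash(Node_2_IP) = 8
])
print(ring)  # [(3, 'Node 0'), (8, 'Node 2'), (12, 'Node 1')]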
User wants to store key K1.
Hash K1: hash("K1") = 0 → Place at position 0
Who owns K1?
From position 0, move clockwise until you hit a node.
First node clockwise from 0? Node 0 at position 3.
K1 is owned by Node 0.
Key K2: hash("K2") = 10 → Position 10
Move clockwise from 10 → First node is Node 1 at position 12
K2 owned by Node 1.
Key K3: hash("K3") = 5 → Position 5
Move clockwise from 5 → First node is Node 2 at position 8
K3 owned by Node 2.
Key K4: hash("K4") = 14 → Position 14
Move clockwise from 14 → Pass 15, wrap to 0, 1, 2, hit Node 0 at position 3
K4 owned by Node 0.
Node 0 (position 3): Owns keys at positions 13, 14, 15, 0, 1, 2, 3
Node 2 (position 8): Owns keys at positions 4, 5, 6, 7, 8
Node 1 (position 12): Owns keys at positions 9, 10, 11, 12
Each node owns the keys from just after the previous node's position up to and including its own position (moving clockwise).
Ownership is determined by position on the ring, not by the number of nodes.
When you add or remove a node, only keys near that position are affected.
Most keys stay with their original owners.
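The clockwise walk translates directly into code. A sketch over the same toy ring, written as a linear scan for clarity (the binary-search version appears later in this lesson):
ring = [(3, "Node 0"), (8, "Node 2"), (12, "Node 1")]  # sorted by position

def owner(key_position: int) -> str:
    # The first node at or after the key's position owns the key.
    for position, node in ring:
        if position >= key_position:
            return node
    # Walked past the last node: wrap around to the first node on the ring.
    return ring[0][1]

print(owner(0))   # K1 at position 0  -> Node 0
print(owner(10))  # K2 at position 10 -> Node 1
print(owner(5))   # K3 at position 5  -> Node 2
print(owner(14))  # K4 at position 14 -> wraps around to Node 0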
Not a service. Not a magical distributed system component.
It is a function: Given a key, return which node owns it.
Implementation: A sorted array of nodes. Binary search to find the owner. That is it.
The proxy (the component routing requests) calls this function:
node = consistent_hash.get_owner("K1")
send_request_to(node)
Simple.
Visually intuitive: Easy to see how keys and nodes relate.
Wrapping behavior: After the last node comes the first node again.
But in code? Just a sorted array and binary search. No linked list. No actual ring structure.
The ring is a mental model, not an implementation detail.
Myth: Consistent hashing is a distributed service that magically handles data movement.
Reality: It is a function. A simple function that answers one question.
Question: "Given key K, which node owns it?"
Answer: "Node N."
That is all it does.
The consistent hashing function lives in the proxy/router component.
When a request comes in:
user sends: GET key "K1"
↓
proxy calls: node = consistent_hash.get_owner("K1")
↓
proxy forwards request to that node
↓
node returns value
↓
proxy sends response to user
No separate service. No external calls. Just a local function.
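In code, the proxy's routing step is only a few lines. Here consistent_hash, get_owner, and send_request_to are placeholder names, not a specific library:
def handle_get(key, consistent_hash, send_request_to):
    # 1. Local function call: which node owns this key?
    node = consistent_hash.get_owner(key)
    # 2. Forward the request to that node and relay its response.
    return send_request_to(node, "GET", key)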
Visualized as a ring for understanding.
Implemented as a sorted array.
Example:
nodes = [
{position: 3, node: Node_0},
{position: 8, node: Node_2},
{position: 12, node: Node_1}
]
Finding owner:
key_position = hash("K1") = 0
Binary search in nodes array for first position >= 0
→ Found position 3 → Node_0
Return Node_0
Time complexity: O(log N) where N = number of nodes.
Fast. Simple. No magic.
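Putting it all together, here is a minimal sketch of such an implementation in Python, using the standard bisect module for the O(log N) lookup. The MD5 hash and IP-style node names are illustrative choices, not requirements:
import bisect
import hashlib

class ConsistentHash:
    def __init__(self, nodes):
        # Parallel sorted lists: ring positions and the nodes at those positions.
        self._positions = []
        self._nodes = []
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def add_node(self, node: str) -> None:
        position = self._hash(node)
        index = bisect.bisect(self._positions, position)
        self._positions.insert(index, position)
        self._nodes.insert(index, node)

    def remove_node(self, node: str) -> None:
        index = self._nodes.index(node)
        del self._positions[index]
        del self._nodes[index]

    def get_owner(self, key: str) -> str:
        # Binary search for the first node position >= the key's position,
        # wrapping around to the first node if we run off the end of the ring.
        position = self._hash(key)
        index = bisect.bisect_left(self._positions, position)
        return self._nodes[index % len(self._nodes)]

ch = ConsistentHash(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
print(ch.get_owner("K1"))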
Problem: Distributed NoSQL database. Petabytes of data. Thousands of nodes.
Solution: Consistent hashing for partition placement.
How it works:
Each table partitioned by hash of partition key.
Partitions distributed across nodes using consistent hashing.
When scaling:
Add node → Only adjacent partition moves.
Remove node → Data migrates to next node.
Result: DynamoDB can scale to millions of requests/second without downtime.
Problem: 200 million users, thousands of edge servers worldwide.
Solution: Consistent hashing for routing video content.
How it works:
Each video file identified by unique ID.
Edge servers placed on consistent hash ring.
User requests video → System hashes video ID → Routes to nearest edge server on ring.
When server fails:
Hash ring updated → Affected videos reroute to next server.
Only that server range affected. Rest of system unaffected.
Problem: Cache too large for one server. Need to distribute.
Solution: Consistent hashing to partition cache keys.
How it works:
Application hashes cache key → Determines which cache server to query.
Example:
key = "user:12345:profile"
hash(key) = 8
→ Query cache server at position 8 on ring
When cache server added:
Only keys in affected range move to new server.
Cache hit rate barely affected.
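Using the ConsistentHash class sketched earlier in this lesson, application-side cache routing might look like this (the server addresses and memcache_clients mapping are hypothetical):
# Builds on the ConsistentHash class sketched earlier in this lesson.
cache_ring = ConsistentHash(["cache-1:11211", "cache-2:11211", "cache-3:11211"])

def get_profile(user_id, memcache_clients):
    key = f"user:{user_id}:profile"
    server = cache_ring.get_owner(key)        # pick the cache server for this key
    return memcache_clients[server].get(key)  # query only that server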
Problem: Billions of files, thousands of servers globally.
Solution: Consistent hashing to distribute content.
How it works:
File hashed → Stored on server owning that hash range.
User requests file → CDN routes to server owning that file.
When server added/removed:
Minimal content redistribution.
Users see no disruption.
All these systems rely on the same small function. But consistent hashing itself:
Does not transfer data for you.
You must implement data movement logic.
Does not detect failures.
You need health checks and failover logic.
Does not balance load perfectly.
Some nodes may get more keys than others (virtual nodes help with this).
Does not store anything.
It is stateless. Just a function call.
The operational pieces are still yours to build:
Data replication: Consistent hashing tells you the primary owner. You decide the replication strategy.
Failure handling: When node fails, you must trigger data movement to next node.
Monitoring: Track data distribution, detect hot spots.
Load balancing: Use virtual nodes (multiple positions per physical node) for even distribution.
Problem: With few nodes, distribution can be uneven.
Example: Node 0 owns range 13-3 (large). Node 1 owns range 9-12 (small).
Solution: Each physical node placed at multiple positions on ring.
Example:
Node 0 at positions: 3, 7, 15
Node 1 at positions: 5, 10, 12
Node 2 at positions: 1, 8, 14
Result: More even distribution of keys across physical nodes.
This is what production systems use.
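Virtual nodes barely change the code: each physical node is hashed several times under different labels. A minimal sketch, with a small replica count for readability (production systems often use 100 or more virtual nodes per server):
import bisect
import hashlib

def _hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

def build_ring(nodes, replicas=3):
    # Each physical node appears at `replicas` positions on the ring.
    ring = []
    for node in nodes:
        for i in range(replicas):
            ring.append((_hash(f"{node}#{i}"), node))
    ring.sort()
    return ring

def get_owner(ring, key):
    position = _hash(key)
    index = bisect.bisect_left(ring, (position, ""))
    return ring[index % len(ring)][1]

ring = build_ring(["Node 0", "Node 1", "Node 2"])
print(get_owner(ring, "K1"))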
Consistent hashing is not magic.
It is an elegant algorithm that solves one specific problem: minimizing data movement when cluster size changes.
It is a tool in your distributed systems toolbox.
Use it when you need to distribute data across many nodes and scale frequently.
But remember: You still need to implement the operational aspects—replication, failure handling, monitoring.
Consistent hashing just tells you who owns what. The rest is up to you.
EXERCISE 3: See the magic: adding or removing nodes affects only a small fraction of keys. This is why consistent hashing revolutionized distributed systems.
Ring with 3 nodes (positions 0-15):
Node 0 at position 3
Node 2 at position 8
Node 1 at position 12
Keys distributed:
K1 at position 0 → owned by Node 0
K2 at position 10 → owned by Node 1
K3 at position 5 → owned by Node 2
K4 at position 14 → owned by Node 0
K5 at position 7 → owned by Node 2
K6 at position 11 → owned by Node 1
Traffic increased. Need to add Node 3.
Step 1: Hash the new node
hash(Node_3_IP) = 1 → Place Node 3 at position 1
Step 2: Update the ring
Node 0 at position 3
Node 2 at position 8
Node 1 at position 12
Node 3 at position 1 (NEW)
Step 3: Determine affected keys
Only keys between Node 3 (position 1) and the previous node (Node 1 at position 12) are affected.
Range affected: Positions 13, 14, 15, 0, 1
Keys in this range: K1 (position 0), K4 (position 14)
Before: K1 and K4 owned by Node 0
After: K1 and K4 now owned by Node 3
Keys to move: K1, K4 (2 out of 6 keys = 33%)
Keys unchanged: K2, K3, K5, K6 (4 out of 6 keys = 67%)
Compare to traditional hashing: switching from % 3 to % 4 would typically reassign roughly three quarters of all keys.
Consistent hashing moved only 33%, and all of it came from a single node.
Step 1: Identify the node to the right of Node 3 → Node 0
Step 2: Take a snapshot of Node 0 (contains K1, K4)
Step 3: Create Node 3 using this snapshot
Step 4: Add Node 3 to the ring (update the sorted array in proxy)
Step 5: Delete keys from Node 3 that no longer belong to it (keys at positions 2 and 3, which still belong to Node 0)
Step 6: Start serving traffic
No downtime. No complex coordination. Simple copy-paste.
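The affected range can be computed mechanically. A sketch against the toy ring from this walkthrough, with Node 3 already added:
ring = [(1, "Node 3"), (3, "Node 0"), (8, "Node 2"), (12, "Node 1")]  # after adding Node 3

def affected_range(new_position: int, ring_size: int = 16):
    # Keys that move to the new node lie between the previous node's position
    # (exclusive) and the new node's position (inclusive), wrapping around the ring.
    positions = sorted(p for p, _ in ring)
    previous = positions[positions.index(new_position) - 1]  # index -1 wraps to the last node
    current = (previous + 1) % ring_size
    owned = []
    while True:
        owned.append(current)
        if current == new_position:
            return owned
        current = (current + 1) % ring_size

print(affected_range(1))  # [13, 14, 15, 0, 1]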
Traffic dropped. Over-provisioned. Remove Node 0.
Step 1: Identify affected range
Node 0 at position 3 is being removed.
Keys it owned: Positions 13, 14, 15, 0, 1, 2, 3
Step 2: Find the next node clockwise
Next node after position 3 → Node 2 at position 8
Step 3: Move data
Copy all keys from Node 0 to Node 2.
Keys moved: K1, K4
Step 4: Remove Node 0 from the ring
Update the sorted array in proxy. Remove Node 0.
Step 5: Shut down Node 0
Keys to move: K1, K4 (2 out of 6 keys = 33%)
Keys unchanged: K2, K3, K5, K6 (4 out of 6 keys = 67%)
Traditional hashing: Would have rehashed all 6 keys.
Consistent hashing: Moved only 2 keys.
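Removal is the mirror image: everything the departing node owned moves to its clockwise successor. A sketch against the same toy ring and key positions:
ring = [(3, "Node 0"), (8, "Node 2"), (12, "Node 1")]
keys = {"K1": 0, "K2": 10, "K3": 5, "K4": 14, "K5": 7, "K6": 11}

def owner(position: int) -> str:
    # First node at or after the key's position; wrap around if needed.
    for p, node in ring:
        if p >= position:
            return node
    return ring[0][1]

def plan_removal(node_to_remove: str):
    # The removed node's clockwise successor inherits everything it owned.
    names = [n for _, n in ring]
    successor = names[(names.index(node_to_remove) + 1) % len(names)]
    moved = [k for k, pos in keys.items() if owner(pos) == node_to_remove]
    return successor, moved

print(plan_removal("Node 0"))  # ('Node 2', ['K1', 'K4'])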
Amazon DynamoDB: 1 billion keys, remove 1 node out of 100.
Traditional hashing: Rehash all 1 billion keys; roughly 99% of them end up on a different node.
Consistent hashing: Move only ~10 million keys (1% of the total).
Roughly a 100× reduction in data movement.
Faster scaling: Minutes instead of hours.
Lower risk: Less data movement = less chance of errors.
No downtime: Gradual rebalancing while system serves traffic.
Cost savings: Less network bandwidth, less CPU, less operational complexity.
Adding or removing a node only affects keys in the range between that node and its predecessor on the ring.
All other keys remain untouched.
This locality is what makes consistent hashing revolutionary.