Splitting a database into smaller pieces (shards) distributed across multiple servers to handle more data and traffic.
Sharding splits a large database into smaller pieces (shards) stored on different servers. Each shard contains a subset of the data.
Think of it like splitting a phone book - A-M in one book, N-Z in another. Smaller books are faster to search.
Single Database Limits: One database can only handle so much data and traffic before slowing down or running out of storage.
Horizontal Scaling: Sharding lets you add more servers to handle growth instead of making one server bigger.
Decide how to split data (by user ID, location, date). Store each subset on a different server. The application routes each query to the shard that owns the data.
Example: User IDs 1-1000 on Shard A, 1001-2000 on Shard B, 2001-3000 on Shard C.
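The routing step above can be sketched as a simple lookup. This is a minimal illustration, not production code; the shard names and ranges are hypothetical, matching the example (1-1000 on A, 1001-2000 on B, 2001-3000 on C).

```python
# Range table: (low, high, shard) rows matching the example above.
RANGES = [
    (1, 1000, "shard_a"),
    (1001, 2000, "shard_b"),
    (2001, 3000, "shard_c"),
]

def shard_for(user_id: int) -> str:
    """Return the shard that owns user_id by scanning the range table."""
    for low, high, shard in RANGES:
        if low <= user_id <= high:
            return shard
    raise KeyError(f"no shard owns user_id {user_id}")
```

In a real system this table lives in config or a metadata service, and the application consults it before opening a connection.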
Range-Based: Split by ranges (IDs 1-1000, 1001-2000). Simple but can create uneven distribution.
Hash-Based: Hash the user ID to pick a shard. Even distribution, but adding shards later is painful - a naive modulo mapping reassigns most keys, forcing mass data movement.
Geographic: Store user data in region closest to them. Great for global apps.
Directory-Based: Lookup table maps keys to shards. Flexible but adds complexity.
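Hash-based routing, and the rebalancing cost it hides, can be sketched as follows. The shard count of 4 is an assumption for illustration; the last few lines estimate how many of 10,000 keys would move if a fifth shard were added under naive modulo hashing.

```python
import hashlib

NUM_SHARDS = 4  # assumed shard count, for illustration only

def hash_shard(user_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Map user_id to a shard index by hashing, giving an even spread."""
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % num_shards

# The rebalancing problem: going from 4 to 5 shards remaps most keys,
# so most rows would have to physically move between servers.
moved = sum(
    1 for uid in range(10_000)
    if hash_shard(uid, 4) != hash_shard(uid, 5)
)
```

Schemes like consistent hashing reduce this movement, at the cost of more routing complexity.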
Handle More Data: Each shard holds less data, stays fast.
Handle More Traffic: Distribute queries across multiple databases.
Geographic Performance: Store data near users for lower latency.
Fault Isolation: One shard fails, others keep working.
Complexity: Applications must know which shard to query.
Cross-Shard Queries: Joining data across shards is slow and complicated.
Rebalancing: Adding shards means moving data around.
Transactions: Distributed transactions across shards are complex.
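The cross-shard query problem can be made concrete with a scatter-gather sketch. The dicts below stand in for separate shard databases (an assumption for illustration); a "top orders by amount" query needs one round trip per shard plus an application-side merge, instead of a single SQL query.

```python
# Each dict stands in for one shard's database; in reality these are
# separate servers, so each loop iteration below is a network round trip.
shards = [
    {"orders": [("o1", 120), ("o4", 75)]},
    {"orders": [("o2", 300)]},
    {"orders": [("o3", 55), ("o5", 210)]},
]

def top_orders(n: int) -> list:
    """Fan out to every shard, then merge and sort in the application."""
    results = []
    for shard in shards:  # scatter: query each shard
        results.extend(shard["orders"])
    # gather: the sort that a single database would do for free now
    # happens in application code, over the combined result set
    return sorted(results, key=lambda o: o[1], reverse=True)[:n]
```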
Large Data: Database too big for one server.
High Traffic: Single database cannot handle query load.
Global Users: Need data close to users worldwide.
Not Yet: Most applications never need sharding. Vertical scaling and read replicas solve most problems.
Vertical Scaling: Bigger server with more RAM/CPU (simpler, limited).
Read Replicas: Multiple read-only copies of the database (easier than sharding).
Caching: Redis/Memcached reduce database load (simpler first step).
Better Queries: Optimize slow queries before sharding.
Instagram: Shards user data across thousands of databases.
Discord: Shards by server ID - each Discord server on a specific shard.
Uber: Geographic sharding - trip data stored near where trip occurred.
Most companies use managed databases (AWS Aurora, Google Cloud Spanner) that handle sharding automatically. Manual sharding requires careful planning.
Sharding is a last resort for scaling databases. Exhaust simpler options first - caching, read replicas, query optimization, bigger servers.
When you truly need sharding, plan carefully. The complexity is real, but the scalability benefits are worth it for truly massive applications.