Dividing a database into smaller, manageable pieces that can be distributed across multiple servers for better performance and scalability.
Partitioning splits a large database table into smaller pieces (partitions) while keeping the logical structure intact. Each partition contains a subset of data but applications still see one unified table.
Think of it like organizing a massive library into sections - fiction, non-fiction, reference. Finding books becomes faster because you only search relevant sections.
Performance: Queries search smaller datasets. Faster execution.
Manageability: Backup, restore, and maintain smaller pieces instead of massive tables.
Scalability: Distribute partitions across multiple servers. Handle more data and traffic.
Maintenance: Archive old partitions without affecting active data.
Horizontal Partitioning (Sharding): Split rows across partitions. Each partition has all columns but different rows.
Example: Users 1-10000 in Partition A, Users 10001-20000 in Partition B.
Vertical Partitioning: Split columns across partitions. Each partition has all rows but different columns.
Example: User IDs and emails in Partition A, User profiles and photos in Partition B.
Horizontal partitioning is far more common because it handles growth better.
No related topics found.
Range Partitioning: Split by value ranges.
Example: Orders from 2023 in Partition 1, Orders from 2024 in Partition 2.
Pros: Simple, intuitive for time-based data. Cons: Can create uneven distribution. Recent partitions get all writes.
Hash Partitioning: Hash key determines partition.
Example: Hash user ID modulo 4. Results 0-3 map to Partitions 0-3.
Pros: Even distribution. Cons: Cannot efficiently query ranges. Adding partitions requires data movement.
List Partitioning: Explicitly list values for each partition.
Example: USA users in Partition 1, Europe in Partition 2, Asia in Partition 3.
Pros: Control over distribution. Natural for geographic or categorical data. Cons: Manual management as data grows.
Composite Partitioning: Combine strategies.
Example: First partition by region (list), then by date (range) within each region.
Good Partition Key: Evenly distributes data, aligns with common query patterns, rarely changes.
Bad Partition Key: Creates hot spots (one partition gets all traffic), requires frequent rebalancing.
Example Good Keys: User ID (hash), Created date (range), Geographic region (list).
Example Bad Keys: Boolean fields (only 2 partitions), Status fields that change frequently.
Partitioning: Logical division, can be on same server.
Sharding: Physical division across multiple servers.
Sharding is partitioning taken to distributed systems. Terms often used interchangeably.
Partition Pruning: Database queries only relevant partitions.
Query: SELECT * FROM orders WHERE order_date = '2024-01-15'
If partitioned by date, database checks only 2024-01 partition. Skips others.
Cross-Partition Queries: Queries spanning multiple partitions are slower.
Query: SELECT * FROM orders WHERE total > 1000
Must check all partitions. Avoid these when possible.
Rebalancing: Adding partitions means moving data. Expensive operation.
Cross-Partition Joins: Joining data across partitions is slow and complex. Denormalize data to avoid this.
Transactions: Distributed transactions across partitions add complexity.
Complexity: Applications may need partition-awareness for optimal performance.
YouTube: Video metadata partitioned by video ID. Billions of videos distributed across thousands of partitions.
Uber: Trip data partitioned geographically. Trips in San Francisco on different servers than trips in New York.
E-commerce: Order history partitioned by date. Archive old partitions to cold storage.
Large Tables: Tables with hundreds of millions or billions of rows.
Performance Degradation: Queries becoming slow despite indexes.
Known Access Patterns: Most queries filter by specific columns (time, region, user).
Not Needed: Tables with millions of rows often do not need partitioning. Proper indexing usually sufficient.
Manual: Developer creates partitions and manages them.
Automatic: Database creates partitions automatically based on rules.
Modern databases support automatic partitioning. PostgreSQL, MySQL, MongoDB all offer this.
Cosmos DB: Automatic partitioning based on partition key.
DynamoDB: Hash and range key partitioning built-in.
BigQuery: Table partitioning by date for analytics.
Cloud databases handle complexity. Specify partition key, they manage the rest.
Choose partition key carefully: Hard to change later.
Monitor partition size: Detect imbalanced partitions early.
Align with queries: Partition by columns frequently used in WHERE clauses.
Test before production: Partitioning adds complexity. Validate performance gains.
Document strategy: Future developers need to understand partitioning scheme.
Partitioning is a powerful technique for scaling databases but adds complexity. Exhaust simpler options first - indexing, caching, read replicas.
When tables grow to hundreds of millions of rows and queries slow despite optimization, partitioning becomes necessary.
Choose partition keys aligned with query patterns. Design applications aware of partition boundaries. Monitor and maintain partitions over time.