Aarshdeep Singh Chadha

Posted on Dec 4

Database Sharding and Partitioning


As businesses scale and their applications attract more users, managing database performance becomes critical. Two key techniques often employed to address this challenge are sharding and partitioning. This blog will dive deep into these concepts, their differences, and their practical applications.

What Is Database Sharding?

Sharding is a method of distributing data across multiple machines. When you shard a database, you divide the data into smaller, more manageable chunks, called shards, each of which is stored on a separate database server. This approach helps distribute the workload and prevents any single server from becoming a bottleneck.
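A common way to decide which shard a row belongs to is to hash its shard key. The following is a minimal Python sketch that is not tied to any particular database. The shard names in SHARDS and the shard_for helper are hypothetical, and the shard key is assumed to be a string such as a user ID.

```python
import hashlib

# Hypothetical cluster: four database servers, identified only by name.
SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]


def shard_for(key: str, shards: list[str] = SHARDS) -> str:
    """Return the shard that owns `key`.

    Python's built-in hash() is randomized per process for strings, so a
    SHA-256 digest is used to get the same placement in every process.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


if __name__ == "__main__":
    for user_id in ["user:1", "user:2", "user:42"]:
        print(user_id, "->", shard_for(user_id))
```

Every application server running the same function sends a given key to the same shard, which is what lets each shard answer for its slice of the data independently.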

Advantages of Sharding:

Handles Large Reads and Writes: Distributes the load across multiple servers, improving performance.
Increases Overall Storage Capacity: Each shard adds its own storage capacity, enabling scalability.
Higher Availability: A failure in one shard makes only that shard's data unavailable; the other shards keep serving requests, which improves overall reliability.
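The availability point can be illustrated with a toy model: if one shard is down, only requests for keys that live on that shard fail. In this sketch the modulo placement, NUM_SHARDS, and read_user are hypothetical stand-ins for real routing and queries.

```python
NUM_SHARDS = 4  # hypothetical cluster size


def shard_of(user_id: int) -> int:
    """Simple modulo placement, used only for this illustration."""
    return user_id % NUM_SHARDS


def read_user(user_id: int, down_shards: set[int]) -> str:
    """Simulate a read that fails only if the owning shard is down."""
    shard = shard_of(user_id)
    if shard in down_shards:
        raise ConnectionError(f"shard {shard} is unavailable")
    return f"user {user_id} served by shard {shard}"


user_ids = range(1, 1001)
down = {2}  # pretend shard 2 has failed
served = 0
for uid in user_ids:
    try:
        read_user(uid, down)
        served += 1
    except ConnectionError:
        pass  # only users whose data lives on shard 2 end up here

print(f"{served}/{len(user_ids)} requests served")  # 750/1000 requests served
```

The data on the failed shard is still unreachable until it recovers, which is why shards are commonly replicated as well.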

Disadvantages of Sharding:

Operational Complexity: Managing multiple shards requires careful design and operational expertise.
Cross-Shard Queries: Queries spanning multiple shards must contact each shard and merge the results, which makes them slower and more expensive.
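To see why cross-shard queries cost more, consider a query that the shard key cannot route. The sketch below assumes a hypothetical orders table sharded by order ID (order_id % 3), with in-memory dictionaries standing in for three database servers. Asking for one customer's total spend has to fan out to every shard and merge the partial results, a pattern often called scatter-gather.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical orders table, placed on three shards by order_id % 3.
SHARD_DATA = [
    {3: {"customer": "carol", "total": 8}},                                          # shard 0
    {1: {"customer": "alice", "total": 30}, 4: {"customer": "bob", "total": 12}},    # shard 1
    {2: {"customer": "alice", "total": 55}, 5: {"customer": "alice", "total": 20}},  # shard 2
]


def query_shard(shard: dict, customer: str) -> list[int]:
    """Stand-in for a per-shard SQL query: one customer's order totals."""
    return [row["total"] for row in shard.values() if row["customer"] == customer]


def total_spent(customer: str) -> int:
    """Scatter the query to every shard in parallel, then gather and merge.

    Orders are sharded by order ID, not by customer, so no single shard can
    answer; the slowest shard bounds the overall latency.
    """
    with ThreadPoolExecutor(max_workers=len(SHARD_DATA)) as pool:
        partials = pool.map(lambda shard: query_shard(shard, customer), SHARD_DATA)
        return sum(sum(totals) for totals in partials)


if __name__ == "__main__":
    print(total_spent("alice"))  # 105 (30 + 55 + 20)
```

If the shard key were the customer ID instead, this query would hit a single shard, which is why the shard key is usually chosen around the most common access pattern.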

What Is Database Partitioning?

Partitioning refers to splitting a table's data into smaller subsets within the same database instance. Unlike sharding, partitioning doesn't distribute data across multiple machines; it organizes the data logically or physically within a single server.

Types of Partitioning:

Horizontal Partitioning: Splits a table by rows, so every partition has the same columns but holds a different subset of rows. For example, customer records with IDs 1–1000 might go into one partition, while IDs 1001–2000 go into another.
Vertical Partitioning: Splits a table by columns. For instance, user profile information might be in one partition, and login details in another.
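Both kinds of partitioning can be sketched inside a single database instance. Databases such as PostgreSQL support declarative range partitioning natively; SQLite, used here only because it ships with Python, does not, so this sketch emulates partitions with separate tables. The table names are hypothetical, and the ID boundaries follow the 1–1000 / 1001–2000 example.

```python
import sqlite3

# A single database instance (in-memory SQLite) holds every partition.
conn = sqlite3.connect(":memory:")

# Horizontal partitioning: identical columns, rows split by customer ID range.
conn.execute("CREATE TABLE customers_p1 (id INTEGER PRIMARY KEY, name TEXT)")  # IDs 1-1000
conn.execute("CREATE TABLE customers_p2 (id INTEGER PRIMARY KEY, name TEXT)")  # IDs 1001-2000


def customer_partition(customer_id: int) -> str:
    """Return the partition table that holds a given customer ID."""
    if 1 <= customer_id <= 1000:
        return "customers_p1"
    if 1001 <= customer_id <= 2000:
        return "customers_p2"
    raise ValueError(f"no partition covers customer {customer_id}")


def insert_customer(customer_id: int, name: str) -> None:
    # The table name comes from the fixed set above, so formatting it in is safe.
    table = customer_partition(customer_id)
    conn.execute(f"INSERT INTO {table} (id, name) VALUES (?, ?)", (customer_id, name))


# Vertical partitioning: one logical user split by columns into two tables
# that share the same primary key.
conn.execute("CREATE TABLE user_profile (user_id INTEGER PRIMARY KEY, display_name TEXT, bio TEXT)")
conn.execute("CREATE TABLE user_login (user_id INTEGER PRIMARY KEY, email TEXT, password_hash TEXT)")

insert_customer(42, "Alice")  # lands in customers_p1
insert_customer(1500, "Bob")  # lands in customers_p2
conn.execute("INSERT INTO user_profile VALUES (1, 'alice', 'Hello!')")
conn.execute("INSERT INTO user_login VALUES (1, 'alice@example.com', 'placeholder-hash')")

# Reassembling a full user row needs a join on the shared key.
row = conn.execute(
    "SELECT p.display_name, l.email FROM user_profile p "
    "JOIN user_login l ON p.user_id = l.user_id WHERE p.user_id = 1"
).fetchone()
print(row)  # ('alice', 'alice@example.com')
```

The trade-off shows up in the last query: reading a whole user now costs a join, in exchange for keeping rarely used or sensitive columns in their own table.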

A Real-World Example: Scaling Your Database

Let’s consider a scenario where your database is hosted on a single server and exposed via a port. Your application reads from and writes to it on behalf of its users.

Initial State:

Your server handles 200 WPS (Writes Per Second) efficiently.
All operations run smoothly.

Traffic Spike:

Usage increases, and your database now experiences 500 WPS.
The increased traffic slows down your system.

Step 1: Vertical Scaling

You decide to improve the server’s resources by:

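Adding More RAM and Disk Space: Upgrading a single machine's resources like this is known as vertical scaling.
Additionally, you add a read replica to handle the increased number of reads. Every write still goes to the primary server, however, so the replica does not raise write capacity.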
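With a read replica in place, the application (or a proxy in front of the database) has to decide where each statement goes. Below is a minimal sketch of read/write splitting. ReadWriteRouter and the host names are hypothetical, and the keyword check is deliberately naive; for example, it would send a SELECT ... FOR UPDATE to a replica even though it must run on the primary.

```python
import itertools


class ReadWriteRouter:
    """Send writes to the primary and spread reads across replicas.

    Targets are plain strings here; a real router would hold driver
    connections. Statement classification is a naive keyword check.
    """

    WRITE_KEYWORDS = ("INSERT", "UPDATE", "DELETE", "CREATE", "ALTER", "DROP")

    def __init__(self, primary: str, replicas: list[str]) -> None:
        self.primary = primary
        # Round-robin over replicas; fall back to the primary if there are none.
        self._replicas = itertools.cycle(replicas or [primary])

    def target_for(self, sql: str) -> str:
        statement = sql.lstrip().upper()
        if statement.startswith(self.WRITE_KEYWORDS):
            return self.primary  # every write still lands on the one primary
        return next(self._replicas)


if __name__ == "__main__":
    router = ReadWriteRouter("primary:5432", ["replica-1:5432"])
    print(router.target_for("INSERT INTO orders VALUES (1)"))  # primary:5432
    print(router.target_for("SELECT * FROM orders"))           # replica-1:5432
```

Replicas are often updated asynchronously, so a read routed to a replica right after a write may not see that write yet.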

Why Vertical Scaling May Fail

Vertical scaling involves enhancing the hardware capabilities of a single server, such as adding more RAM, CPU, or storage. While this can provide immediate relief for increasing traffic, it has inherent limitations:

Finite Hardware Capacity: Every machine has a physical limit to how much hardware can be added.
Cost Inefficiency: High-end hardware can be significantly more expensive.
Single Point of Failure: If the server goes down, the entire database becomes unavailable.
Diminishing Returns: Beyond a certain point, the performance gains from additional hardware are marginal.

Pros of Vertical Scaling:

Simple Implementation: Easier to implement than horizontal scaling.
No Data Redistribution: No need to redesign the database architecture.
Faster Read/Write Operations: Increased resources directly improve server performance.

Cons of Vertical Scaling:

Limited Scalability: Restricted by hardware limits.
Downtime Required: Upgrading hardware often requires taking the server offline.
High Cost: Advanced hardware configurations can be expensive.

Step 2: Horizontal Scaling

When vertical scaling is insufficient, you turn to horizontal scaling:

For instance, at 2000 WPS, you add another server.
The load is divided: assuming writes are spread evenly, each server handles about 1000 WPS.
Each data node that holds a portion of the data is referred to as a shard.
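The arithmetic behind this step generalizes: divide the target write rate by what one server can sustain and round up. The sketch below assumes, as the example does, that writes spread evenly and that one server sustains about 1000 WPS. The helper name and the optional headroom parameter, for keeping spare capacity on each server, are added assumptions.

```python
import math


def shards_needed(target_wps: float, per_shard_wps: float, headroom: float = 0.0) -> int:
    """Minimum number of shards so that each stays within its write capacity.

    `headroom` reserves a fraction of each server's capacity (0.3 keeps 30%
    spare). Assumes writes are distributed evenly across shards.
    """
    if per_shard_wps <= 0 or not 0 <= headroom < 1:
        raise ValueError("invalid capacity or headroom")
    usable = per_shard_wps * (1 - headroom)
    return math.ceil(target_wps / usable)


print(shards_needed(2000, 1000))       # 2 (1000 WPS each, as in the example)
print(shards_needed(2000, 1000, 0.3))  # 3 (about 667 WPS each)
```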

Why Horizontal Scaling Works

Horizontal scaling involves adding more servers to distribute the load. Each server, or shard, handles a subset of the total data, enabling the system to process higher traffic.

Pros of Horizontal Scaling:

Virtually Unlimited Scalability: Additional servers can be added as needed.
Fault Tolerance: Failure of one server doesn’t affect the entire system.
Cost-Effective: Commodity hardware can often be used instead of high-end servers.

Cons of Horizontal Scaling:

Complexity: Requires rearchitecting the database and managing multiple servers.
Data Distribution Challenges: Properly distributing data across shards is critical to performance.
Cross-Node Communication Overhead: Queries spanning multiple servers can slow down performance.
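One concrete distribution challenge is resharding. If rows are placed with a naive key % N rule, changing N relocates most of the data. The small experiment below counts how many integer keys change shards when a third shard is added; the helper names are hypothetical, and placement schemes such as consistent hashing exist largely to shrink this number.

```python
def shard_index(key: int, num_shards: int) -> int:
    """Naive modulo placement: a key lives on shard key % N."""
    return key % num_shards


def fraction_moved(keys: range, old_n: int, new_n: int) -> float:
    """Fraction of keys whose shard changes when the cluster goes from old_n to new_n shards."""
    moved = sum(1 for k in keys if shard_index(k, old_n) != shard_index(k, new_n))
    return moved / len(keys)


# Going from 2 to 3 shards: a key stays put only when key % 2 == key % 3,
# which holds for 2 out of every 6 consecutive keys.
print(f"{fraction_moved(range(6000), 2, 3):.1%}")  # 66.7%
```

In other words, adding a single server forces two thirds of the rows to migrate under this scheme.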

A Different Example of Horizontal Scaling:

Imagine an e-commerce platform experiencing a surge in traffic during a sale. To handle the increased load:

The product catalog is distributed across multiple shards. For example, one server handles products A-M, while another handles N-Z.
User sessions are load-balanced across multiple application servers.
Each shard contains only the relevant portion of data, ensuring quick read/write operations.
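The A-M / N-Z split is range-based sharding. A minimal sketch of the routing step follows; the shard names are hypothetical, and product names are assumed to start with an ASCII letter.

```python
import bisect

# Hypothetical range map: the i-th shard owns first letters up to and
# including RANGE_UPPER_BOUNDS[i].
RANGE_UPPER_BOUNDS = ["M", "Z"]
RANGE_SHARDS = ["catalog-shard-a-m", "catalog-shard-n-z"]


def catalog_shard(product_name: str) -> str:
    """Route a product to a shard by the first letter of its name."""
    first = product_name.strip()[:1].upper()
    if not ("A" <= first <= "Z"):
        raise ValueError(f"no shard covers product name {product_name!r}")
    # bisect_left finds the first range whose upper bound is >= the letter.
    return RANGE_SHARDS[bisect.bisect_left(RANGE_UPPER_BOUNDS, first)]


print(catalog_shard("Laptop"))    # catalog-shard-a-m
print(catalog_shard("notebook"))  # catalog-shard-n-z
```

Range sharding keeps alphabetically adjacent products together, which helps range scans, but if far more products start with A-M than N-Z, one shard ends up carrying more load than the other.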

Combining Sharding and Partitioning

In practice, sharding and partitioning are complementary.

Sharding is used to distribute data across multiple servers.
Partitioning organizes data within each shard for better performance and query optimization.
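Putting the two together, a lookup first picks the shard, then the partition inside it. The sketch assumes a hypothetical orders table sharded by a hash of the customer ID and range-partitioned by month inside each shard; NUM_SHARDS, the orders_YYYY_MM naming, and the helper names are illustrative.

```python
import hashlib
from datetime import date

NUM_SHARDS = 4  # hypothetical cluster size


def shard_for_customer(customer_id: int) -> int:
    """Step 1 (sharding): a stable hash of the customer ID picks the server."""
    digest = hashlib.sha256(str(customer_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def partition_for_order(order_date: date) -> str:
    """Step 2 (partitioning): the order month picks a table inside that server."""
    return f"orders_{order_date:%Y_%m}"


def locate_order(customer_id: int, order_date: date) -> tuple[int, str]:
    """Return (shard index, partition table) for one order."""
    return shard_for_customer(customer_id), partition_for_order(order_date)


print(partition_for_order(date(2024, 12, 4)))  # orders_2024_12
print(locate_order(1001, date(2024, 12, 4)))
```

A query for one customer's recent orders then touches a single server and, within it, only the partitions for the months requested.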

Conclusion

As your application scales, understanding and implementing database sharding and partitioning can help maintain performance and reliability. While these techniques offer significant benefits, they also come with operational challenges. Designing an optimal strategy depends on your application’s specific needs, traffic patterns, and growth trajectory.

By leveraging the strengths of sharding and partitioning, you can create a robust, scalable database architecture capable of handling increasing user demands effectively.

