Partitioning and Sharding | System Design Interview Concept

What is Partitioning?

Imagine your business is growing from 0 to millions of users, your database is filled up and you are running out of space quickly. What do you do? You have two choices

  • make the machine and database bigger, aka vertical scaling
  • slice the data into pieces and save into multiple databases and possibly multiple machines, aka horizontal scaling

Vertical scaling can only get you so far. You are limited by the memory and disk size of a single machine. And if something goes wrong with that machine, you are toast. That's why horizontal scaling is often more desirable for large web applications.

We can partition a large database into smaller databases(shards) according to certain rules. Naturally there are two ways to partition the database. We can partition the data by row or by column into smaller databases:

https://algomonster.s3.us-east-2.amazonaws.com/system_design/partition-types.png

Database Sharding

Horizontal database partition or sharding is the mostly commonly used partitioning method in SQL databases. Many modern databases have built-in sharding system.

A shard key is selected to decide which shard a data row should go into. Here's is a figure from MySQL's official documentation on shard key. Imagine a sales database, we can partition it by region, date or simply by hash of customer ids.

https://algomonster.s3.us-east-2.amazonaws.com/system_design/partition-2.gif

In the following sections, we will demo the concept of sharding with a SQL database and a NoSQL database.

Sharding a SQL database - MySQL(MariaDB)

To understand how data partitioning works with SQL database, let's try an real example with MySQL.

Suppose we have a comment system with a table of comments. We can partition the table by the year the comment was created. This method is called RANGE Partitioning.

1CREATE TABLE comments
2  (
3  comment_id INT NOT NULL,
4  page_id INT NOT NULL,
5  user_id INT NOT NULL,
6  content TEXT NOT NULL,
7  created_time DATETIME NOT NULL
8  )
9PARTITION BY RANGE (year(created_time))
10  (
11  PARTITION pold VALUES LESS THAN (2019),
12  PARTITION p19 VALUES LESS THAN (2020),
13  PARTITION p20 VALUES LESS THAN (2021),
14  PARTITION p21 VALUES LESS THAN (2022),
15  PARTITION p22 VALUES LESS THAN (2023)
16  );

Then insert 2 comments, the first comment is from 2009 and the second comment is from 2021.

1INSERT INTO `comments` (
2  `comment_id`, `page_id`, `user_id`,
3  `content`, `created_time`
4)
5VALUES
6  (
7    '1', '1', '1',
8    'The Times 03/Jan/2009 Chancellor on brink of second bailout for banks',
9    '2009-01-03 18:15:05'
10  ),
11  (
12    '2', '2', '2',
13    'Hello algo.monster',
14    '2021-10-11 18:45:02'
15  )

To check the result, we can use the following queries.

1SELECT * FROM comments;
2SELECT * FROM comments PARTITION (pold);
3SELECT * FROM comments PARTITION (p21);

The first query should print all 2 comment. The second query shows the comment from 2009, and the third query shows the comment from 2021. You can try the code here http://sqlfiddle.com/#!9/fccec/3.

In the real world, you would probably be

Sharding a NoSQL database - Redis Cluster

Redis is a key-value NoSQL database with builtin sharding support. Redis official website has a nice tutorial about Redis Cluster.

For scalability, Redis Cluster distribute data to nodes by the CRC16 of the key modulo 16384, so each node is responsible for a subset of the hash slots.

For redundancy, Redis Cluster simply use replica nodes to store copies the data in master nodes as failovers.

Here are the bash commands for running a minimal cluster with three masters and three replicas from the tutorial.

1mkdir redis-cluster
2cd redis-cluster
3mkdir 7000 7001 7002 7003 7004 7005
4
5# create redis config files
6cat >> ./7000/redis.conf << EOF
7port 7000
8daemonize yes
9cluster-enabled yes
10cluster-config-file nodes.conf
11cluster-node-timeout 5000
12appendonly yes
13EOF
14
15cat >> ./7001/redis.conf << EOF
16port 7001
17daemonize yes
18cluster-enabled yes
19cluster-config-file nodes.conf
20cluster-node-timeout 5000
21appendonly yes
22EOF
23
24cat >> ./7002/redis.conf << EOF
25port 7002
26daemonize yes
27cluster-enabled yes
28cluster-config-file nodes.conf
29cluster-node-timeout 5000
30appendonly yes
31EOF
32
33cat >> ./7003/redis.conf << EOF
34port 7003
35daemonize yes
36cluster-enabled yes
37cluster-config-file nodes.conf
38cluster-node-timeout 5000
39appendonly yes
40EOF
41
42cat >> ./7004/redis.conf << EOF
43port 7004
44daemonize yes
45cluster-enabled yes
46cluster-config-file nodes.conf
47cluster-node-timeout 5000
48appendonly yes
49EOF
50
51cat >> ./7005/redis.conf << EOF
52port 7005
53daemonize yes
54cluster-enabled yes
55cluster-config-file nodes.conf
56cluster-node-timeout 5000
57appendonly yes
58EOF
59
60# start redis servers
61cd 7000 && redis-server redis.conf && cd ..
62cd 7001 && redis-server redis.conf && cd ..
63cd 7002 && redis-server redis.conf && cd ..
64cd 7003 && redis-server redis.conf && cd ..
65cd 7004 && redis-server redis.conf && cd ..
66cd 7005 && redis-server redis.conf && cd ..
67
68# create redis cluster
69redis-cli --cluster create 127.0.0.1:7000 127.0.0.1:7001 127.0.0.1:7002 127.0.0.1:7003 127.0.0.1:7004 127.0.0.1:7005 --cluster-replicas 1

You should see a message [OK] All 16384 slots covered after everything is done.

1>>> Performing Cluster Check (using node 127.0.0.1:7000)
2M: 91cf3c5c135cf17c0202ac0041a1842d69633d13 127.0.0.1:7000
3  slots:[0-5460] (5461 slots) master
4  1 additional replica(s)
5M: b1d224ff349351dcc2b770e21226cf35697fffbd 127.0.0.1:7002
6  slots:[10923-16383] (5461 slots) master
7  1 additional replica(s)
8S: fef256a968b814eedaf65a0a4f397e97807d1f16 127.0.0.1:7003
9  slots: (0 slots) slave
10  replicates b1d224ff349351dcc2b770e21226cf35697fffbd
11S: 8dd78cf46e709045ef0b72dddf49ac4f91407309 127.0.0.1:7005
12  slots: (0 slots) slave
13  replicates 063b3ed1f0ea6888401025cc37f24291a631fc65
14S: 1a82aa2d68becbabb878913fbd380f2d288ac053 127.0.0.1:7004
15  slots: (0 slots) slave
16  replicates 91cf3c5c135cf17c0202ac0041a1842d69633d13
17M: 063b3ed1f0ea6888401025cc37f24291a631fc65 127.0.0.1:7001
18  slots:[5461-10922] (5462 slots) master
19  1 additional replica(s)
20[OK] All nodes agree about slots configuration.
21>>> Check for open slots...
22>>> Check slots coverage...
23[OK] All 16384 slots covered.

Then we can use redis-cli -c -p 7000 to connect to the first node of the cluster.

1$ redis-cli -c -p 7000
2127.0.0.1:7000> set algo monster
3OK
4127.0.0.1:7000> set hello world
5OK
6127.0.0.1:7000> set foo bar
7-> Redirected to slot [12182] located at 127.0.0.1:7002
8OK
9127.0.0.1:7002> set a a
10OK
11127.0.0.1:7002> set hello hello
12-> Redirected to slot [866] located at 127.0.0.1:7000
13OK

At first, both algo and hello fit in [0-5460], the hash slots of the node 7000. Then foo has hash 12182, which is covered by the node 7002, so we are redirected and the prompt is changed. When we set hello again, we are redirected back to the node 7000 because that's where the key hello is stored.

As the dataset grows, we might need to add new nodes to provide more storage. In this case, we need to reshard the cluster since we need to rearrange the hash slots.