Partitioning and Sharding | System Design Interview Concept


What is Partitioning?

Imagine your business is growing from zero to millions of users, and your database is filling up, quickly running out of space. What do you do? You have two choices:

  • make the machine and database bigger, a.k.a. vertical scaling.
  • slice the data into pieces and spread it across multiple databases and, possibly, multiple machines, a.k.a. horizontal scaling.

Vertical scaling can only get you so far. You are limited by the memory and disk size of a single machine. And if something goes wrong with that machine, you are toast. That's why horizontal scaling is often more desirable for large web applications.

We can partition a large database into smaller databases (shards) according to certain rules. Naturally, there are two ways to split the data: by row (horizontal partitioning) or by column (vertical partitioning).

https://algomonster.s3.us-east-2.amazonaws.com/system_design/partition-types.png
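To make the two directions concrete, here is a minimal Python sketch with a made-up users table: a row-wise (horizontal) split keeps every column but only some rows in each piece, while a column-wise (vertical) split keeps every row but only some columns.

# A tiny in-memory "table" of user rows.
users = [
    {"id": 1, "name": "Alice", "email": "alice@example.com", "bio": "long text..."},
    {"id": 2, "name": "Bob", "email": "bob@example.com", "bio": "long text..."},
    {"id": 3, "name": "Carol", "email": "carol@example.com", "bio": "long text..."},
]

# Horizontal partitioning (sharding): each piece holds complete rows,
# split here by a simple rule on the primary key.
shard_even = [row for row in users if row["id"] % 2 == 0]
shard_odd = [row for row in users if row["id"] % 2 == 1]

# Vertical partitioning: each piece holds every row but only some columns,
# e.g. frequently read columns in one store, the bulky bio column in another.
user_core = [{"id": r["id"], "name": r["name"], "email": r["email"]} for r in users]
user_bio = [{"id": r["id"], "bio": r["bio"]} for r in users]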

Database Sharding

Horizontal database partitioning, or sharding, is the most commonly used partitioning method in SQL databases. Many modern databases have built-in sharding systems.

A shard key is selected to determine which shard a data row should go into. Here is a figure from MySQL's official documentation on shard keys. Imagine a sales database; we can partition it by region, by date, or simply by a hash of customer IDs.

https://algomonster.s3.us-east-2.amazonaws.com/system_design/partition-2.gif
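The shard key is simply the input to a routing rule. The Python sketch below (with hypothetical shard names and a fixed shard count) illustrates the three choices mentioned above for a sales database: route by region, by date range, or by a hash of the customer ID.

import hashlib

NUM_SHARDS = 4

def shard_by_region(region: str) -> str:
    # Directory-style routing: an explicit mapping from shard key to shard.
    return {"NA": "shard_na", "EU": "shard_eu", "APAC": "shard_apac"}[region]

def shard_by_date(year: int) -> str:
    # Range-style routing: each shard owns a contiguous range of the key.
    return "shard_old" if year < 2020 else f"shard_{year}"

def shard_by_customer(customer_id: int) -> int:
    # Hash-style routing: a stable hash spreads customers evenly across shards.
    # (Python's built-in hash() is not stable across processes, so use a digest.)
    digest = hashlib.md5(str(customer_id).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

print(shard_by_region("EU"))       # shard_eu
print(shard_by_date(2009))         # shard_old
print(shard_by_customer(42))       # a number in 0..3, always the same for 42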

In the following sections, we will demonstrate the concept of sharding with a SQL database and a NoSQL database.

Sharding a SQL database - MySQL (MariaDB)

To understand how data partitioning works with a SQL database, let's try a real example with MySQL.

Suppose we have a comment system with a table of comments. We can partition the table by the year the comment was created. This method is called RANGE Partitioning.

CREATE TABLE comments
  (
  comment_id INT NOT NULL,
  page_id INT NOT NULL,
  user_id INT NOT NULL,
  content TEXT NOT NULL,
  created_time DATETIME NOT NULL
  )
PARTITION BY RANGE (year(created_time))
  (
  PARTITION pold VALUES LESS THAN (2019),
  PARTITION p19 VALUES LESS THAN (2020),
  PARTITION p20 VALUES LESS THAN (2021),
  PARTITION p21 VALUES LESS THAN (2022),
  PARTITION p22 VALUES LESS THAN (2023)
  );

Then insert two comments: the first is from 2009 and the second is from 2021.

INSERT INTO `comments` (
  `comment_id`, `page_id`, `user_id`,
  `content`, `created_time`
)
VALUES
  (
    1, 1, 1,
    'The Times 03/Jan/2009 Chancellor on brink of second bailout for banks',
    '2009-01-03 18:15:05'
  ),
  (
    2, 2, 2,
    'Hello algo.monster',
    '2021-10-11 18:45:02'
  );

To check the result, we can use the following queries.

SELECT * FROM comments;
SELECT * FROM comments PARTITION (pold);
SELECT * FROM comments PARTITION (p21);

The first query returns both comments. The second query shows the comment from 2009, and the third query shows the comment from 2021. You can try the code here: http://sqlfiddle.com/#!9/fccec/3.

Sharding a NoSQL database - Redis Cluster

Redis is a key-value NoSQL database with built-in sharding support. Redis' official website has a nice tutorial on Redis Cluster.

For scalability, Redis Cluster distributes keys across nodes by taking the CRC16 of the key modulo 16384, so each master node is responsible for a subset of the 16384 hash slots.
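As a sanity check on the slot formula, here is a small Python sketch of the CRC16 variant Redis documents (CCITT/XMODEM: polynomial 0x1021, initial value 0) followed by the modulo step. It is only an illustration of the mapping, it ignores hash tags ({...} in keys), and it is not how a client talks to a cluster in practice.

def crc16(data: bytes) -> int:
    # CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0x0000.
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def hash_slot(key: str) -> int:
    # Redis Cluster rule: HASH_SLOT = CRC16(key) mod 16384.
    return crc16(key.encode()) % 16384

print(hash_slot("foo"))  # 12182, matching the redirect in the session below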

For redundancy, Redis Cluster adds replica nodes that keep copies of each master's data and can take over if the master fails.

Here are the bash commands from the tutorial for running a minimal cluster with three masters and three replicas.

mkdir redis-cluster
cd redis-cluster
mkdir 7000 7001 7002 7003 7004 7005

# create a redis config file in each directory
for port in 7000 7001 7002 7003 7004 7005; do
  cat > ./$port/redis.conf << EOF
port $port
daemonize yes
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
appendonly yes
EOF
done

# start a redis server in each directory
for port in 7000 7001 7002 7003 7004 7005; do
  (cd $port && redis-server redis.conf)
done

# create redis cluster
redis-cli --cluster create 127.0.0.1:7000 127.0.0.1:7001 127.0.0.1:7002 127.0.0.1:7003 127.0.0.1:7004 127.0.0.1:7005 --cluster-replicas 1

You should see the message [OK] All 16384 slots covered after everything is done.

>>> Performing Cluster Check (using node 127.0.0.1:7000)
M: 91cf3c5c135cf17c0202ac0041a1842d69633d13 127.0.0.1:7000
  slots:[0-5460] (5461 slots) master
  1 additional replica(s)
M: b1d224ff349351dcc2b770e21226cf35697fffbd 127.0.0.1:7002
  slots:[10923-16383] (5461 slots) master
  1 additional replica(s)
S: fef256a968b814eedaf65a0a4f397e97807d1f16 127.0.0.1:7003
  slots: (0 slots) slave
  replicates b1d224ff349351dcc2b770e21226cf35697fffbd
S: 8dd78cf46e709045ef0b72dddf49ac4f91407309 127.0.0.1:7005
  slots: (0 slots) slave
  replicates 063b3ed1f0ea6888401025cc37f24291a631fc65
S: 1a82aa2d68becbabb878913fbd380f2d288ac053 127.0.0.1:7004
  slots: (0 slots) slave
  replicates 91cf3c5c135cf17c0202ac0041a1842d69633d13
M: 063b3ed1f0ea6888401025cc37f24291a631fc65 127.0.0.1:7001
  slots:[5461-10922] (5462 slots) master
  1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.

Then we can use redis-cli -c -p 7000 to connect to the first node of the cluster.

$ redis-cli -c -p 7000
127.0.0.1:7000> set algo monster
OK
127.0.0.1:7000> set hello world
OK
127.0.0.1:7000> set foo bar
-> Redirected to slot [12182] located at 127.0.0.1:7002
OK
127.0.0.1:7002> set a a
OK
127.0.0.1:7002> set hello hello
-> Redirected to slot [866] located at 127.0.0.1:7000
OK

At first, both algo and hello map to slots in [0-5460], the hash slot range of node 7000. Then foo hashes to slot 12182, which is owned by node 7002, so we are redirected and the prompt changes. When we set hello again, we are redirected back to node 7000 because that is where the key hello lives.

As the dataset grows, we might need to add new nodes for more storage. In that case, we have to reshard the cluster, that is, move some of the hash slots from the existing nodes to the new ones (redis-cli --cluster provides add-node and reshard subcommands for this).
