Load Balancer | System Design Interview Concept


What is a Load Balancer?

Load balancing is the practice of "efficiently distributing incoming network traffic across a group of backend servers". It is a means of scaling web applications horizontally.

Layer 4 vs Layer 7 Load Balancing

In terms of the OSI networking model, there are two ways to load balance: at Layer 4 and at Layer 7.

Layer 4 is the Transport Layer. Layer 4 load balancing operates at the packet level and requires access to lower-level routing devices. Layer 7 is the Application Layer, and Layer 7 load balancing is done entirely at the HTTP level. There are pros and cons to load balancing at each level. If you are interested, you can read more about it here.
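To make the difference concrete, here is a minimal Python sketch of the information each kind of balancer can base its decision on. It is an illustration only: the backend names, the /api/ and /static/ prefixes, the CRC32 hash, and both functions are hypothetical, and real Layer 4 and Layer 7 balancers are far more involved.

```python
import zlib
from typing import List

# Hypothetical backend pools, invented for this example.
L4_BACKENDS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
L7_POOLS = {
    "/api/": ["api-1", "api-2"],
    "/static/": ["static-1"],
}
DEFAULT_POOL = ["web-1", "web-2"]


def pick_l4(src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> str:
    """Layer 4: only addresses and ports are visible, so hash the connection tuple."""
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    return L4_BACKENDS[zlib.crc32(key) % len(L4_BACKENDS)]


def pick_l7_pool(path: str) -> List[str]:
    """Layer 7: the HTTP request is visible, so routing can depend on the URL path."""
    for prefix, pool in L7_POOLS.items():
        if path.startswith(prefix):
            return pool
    return DEFAULT_POOL
```

In this sketch, pick_l7_pool("/api/users") selects the API pool, a decision the Layer 4 function could not make because it never sees the request path.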

Software engineers mostly deal with Layer 7, since the web applications we develop live at Layer 7. So don't worry if you don't understand the lower-level networking terminology (that is more the job of network engineers and perhaps SREs). In the following sections, we will explore Layer 7 load balancing with Nginx.

How Load Balancers Work

In the simplest terms, a load balancer sits in front of a number of web servers and distributes traffic to them according to predefined rules. The servers' addresses (IPs) and the load balancing algorithm are defined in the load balancer's config file.

[Figure: how load balancers work]

Types of Load Balancing Algorithms

The following load balancing methods are supported in Nginx:

(default) round-robin: requests are distributed to the application servers in circular order, so each server receives an equal share of the traffic. For example, with three servers, R1 (Request 1) is sent to Server 1, R2 to Server 2, R3 to Server 3, and R4 to Server 1 again (there is no next server, so we go back to the first one), and so on.

[Figure: load balancing algorithms]

least-connected: the next request is assigned to the server with the fewest active connections.
ip-hash: a hash function applied to the client's IP address determines which server is selected for the next request, so requests from the same client go to the same server.

[Figure: ip-hash load balancing algorithm]
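The three methods can be sketched in a few lines of Python. This is a simplified model rather than Nginx's implementation: the server names, the connection counts passed in, and the use of CRC32 as the hash are all assumptions made for illustration.

```python
import itertools
import zlib

SERVERS = ["server1", "server2", "server3"]  # hypothetical backend names

# Round-robin: walk through the servers in a fixed circular order.
_rotation = itertools.cycle(SERVERS)


def round_robin() -> str:
    """Return the next server in the rotation."""
    return next(_rotation)


def least_connected(active: dict) -> str:
    """Pick the server with the fewest active connections.

    `active` maps a server name to its current connection count.
    """
    return min(active, key=active.get)


def ip_hash(client_ip: str) -> str:
    """Map a client IP to a server; the same IP always maps to the same server."""
    return SERVERS[zlib.crc32(client_ip.encode()) % len(SERVERS)]
```

Calling round_robin() four times yields server1, server2, server3, and server1 again, matching R1 to R4 in the example above. The ip_hash mapping stays stable only while the server list itself does not change.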

Load Balancer Demo with Nginx

We will use a demo to illustrate how a load balancer works and why we need load balancing. We will use an Nginx load balancer to distribute traffic to three web servers. If you want to try this demo on your computer, you need to install Flask and Nginx.

Here is a simple Flask web server. All it does is return Hello, {port number}! as the response.

# demo.py
from flask import Flask
import sys

app = Flask('demo')
port = 8000

@app.route('/')
def hello_world():
    return f'Hello, {port}!\n'

if __name__ == '__main__':
    # Take the port from the command line, e.g. `python demo.py 8001`.
    port = int(sys.argv[1])
    app.run(host='127.0.0.1', port=port)

Now we can start three server instances, each in its own terminal since every command keeps running in the foreground.

python demo.py 8001
python demo.py 8002
python demo.py 8003

Next, we configure Nginx as a load balancer to distribute traffic to the three web servers. We create a new config file, /etc/nginx/conf.d/demo.conf:

upstream demo {
    server localhost:8001;
    server localhost:8002;
    server localhost:8003;
}

server {
    listen 8080;

    location / {
        proxy_pass http://demo;
    }
}

Then use sudo systemctl restart nginx to restart the Nginx service.

Now you should be able to access http://localhost:8080/. The default load balancing method is round-robin, so successive requests to the load balancer should return Hello, 8001!, Hello, 8002!, and Hello, 8003! in turn.
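One way to check the rotation without refreshing a browser is a small Python script. This is a sketch that assumes the three Flask servers and Nginx are running as configured above; poll and count_by_server are helper names made up for this example.

```python
from collections import Counter
from urllib.request import urlopen


def poll(url: str, n: int) -> list:
    """Send n GET requests to url and return the response bodies in order."""
    bodies = []
    for _ in range(n):
        with urlopen(url, timeout=5) as resp:
            bodies.append(resp.read().decode().strip())
    return bodies


def count_by_server(bodies: list) -> Counter:
    """Count how many responses came from each backend."""
    return Counter(bodies)
```

With the demo running, count_by_server(poll("http://localhost:8080/", 6)) should show a roughly even split across the three greetings. The exact order can vary if Nginx runs several worker processes, since each worker may keep its own rotation.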

Load Balancer and High Availability

To emulate servers going offline, we stop the web servers running on ports 8002 and 8003 and visit our load balancer at http://localhost:8080/. It should still be accessible and always return Hello, 8001!, the message from the first server. This is how a load balancer provides high availability: two web servers are down, and we are still serving requests from the remaining one without the user noticing any downtime. We can also scale a service by adding more web servers and putting them behind the load balancer.
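The failover behavior can be modeled with a short Python simulation. It is only a sketch of the idea (skip backends that do not respond and try the next one), not a description of Nginx's internals. The healthy set stands in for real connection attempts, and the class and exception names are hypothetical.

```python
PORTS = [8001, 8002, 8003]  # the three demo servers


class AllBackendsDown(Exception):
    """Raised when no backend can serve the request."""


class FailoverRoundRobin:
    def __init__(self, ports):
        self.ports = list(ports)
        self.next_index = 0

    def handle(self, healthy) -> str:
        """Try backends in round-robin order, skipping the ones that are down."""
        for _ in range(len(self.ports)):
            port = self.ports[self.next_index]
            self.next_index = (self.next_index + 1) % len(self.ports)
            if port in healthy:  # stands in for a successful connection
                return f"Hello, {port}!"
        raise AllBackendsDown("no backend responded")
```

With all three ports in healthy, successive handle calls rotate through the servers. With only 8001 left, every call returns Hello, 8001!, which mirrors what the demo shows after two servers are stopped.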

Here is a video demo on Debian 11: Nginx Load Balancing Demo

https://www.nginx.com/resources/glossary/load-balancing/