Build a Python Load Balancer: Step-by-Step Guide

The title has probably given away what we are going to do in this article. If you've come here, I think you're someone like me who always wonders how these things work.

So let's skip the small talk and get started.

While learning system design, I came across the concept of a load balancer. When I read more about it, it sounded very simple, so I thought, let’s build one. Trust me, that was my biggest mistake that day.

To give some context to those who don’t know what a load balancer is: "A load balancer is a server that spreads the workload across your other servers, so no single server gets overloaded."

Let's understand this with a simple example: a food truck. Imagine you own a popular food truck with only one person taking and preparing orders. This setup works well with a few customers. However, as your food truck gets more popular, the line grows and the wait time increases, because one person can't serve everyone at once. To solve this problem, you hire two more people and a manager who assigns each new customer to one of the three, so the workload is evenly distributed. That manager acts as a load balancer.

Load balancers can use different algorithms to decide which server gets the next request, such as:

Round robin - Requests are distributed sequentially to servers in a circular order.
Weighted round robin - Similar to round robin, but servers are assigned weights based on their capacity, so higher-capacity servers receive more requests.
Least connection - Directs traffic to the server with the fewest active connections.
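
To make the difference between these three concrete, here is a minimal sketch of each selection rule using only the standard library. The server names, weights, and connection counts are made up for illustration, and the weighted version is the naive "repeat each server by its weight" form, not what production load balancers necessarily do.

```python
import itertools

SERVERS = ["server-a", "server-b", "server-c"]  # hypothetical server names

# Round robin: cycle through the servers in a fixed circular order.
round_robin = itertools.cycle(SERVERS)


def weighted_order(weights):
    """Naive weighted round robin: each server appears `weight` times per cycle."""
    expanded = [server for server, weight in weights.items() for _ in range(weight)]
    return itertools.cycle(expanded)


def least_connections(active):
    """Least connection: pick the server with the fewest active connections."""
    return min(active, key=active.get)


rr_picks = [next(round_robin) for _ in range(4)]

weighted = weighted_order({"server-a": 2, "server-b": 1})  # made-up weights
weighted_picks = [next(weighted) for _ in range(6)]

lc_pick = least_connections({"server-a": 5, "server-b": 2, "server-c": 7})

print(rr_picks)
print(weighted_picks)
print(lc_pick)
```

Round robin needs no knowledge about the servers at all, which is why it is the easiest one to build first.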

There are more load-balancing algorithms, but to limit the scope of this article, we are only going to focus on round robin.

That's enough theory, I guess. Let's start implementing this because, in my opinion, this picture will become much clearer when you actually see it in action.

We are going to build our little load balancer in Python, but you can implement this in any of your favorite programming languages. The concept will always be the same regardless of the programming language.

First things first, let's start with the prerequisites:

We are building this in Python, so of course, you will need that. Install it if you do not have it already.

Now open up your terminal and run the command below.

pip install "fastapi[all]" httpx

FastAPI: Our web framework for building the load balancer and backend servers.
uvicorn: The server that will run our FastAPI applications.
httpx: A modern and asynchronous HTTP client we'll use to forward requests and perform health checks.

The load balancer's main task is to distribute incoming requests across the servers, so let's build our servers first.

In your favorite code editor, create a new file called backend1.py and put this code in it:

from fastapi import FastAPI
import uvicorn

app = FastAPI()

@app.get("/")
def read_root():
    return {"message": "Hello from backend1"}

@app.get("/health")
def health():
    # Used by the load balancer to check that this server is alive
    return {"status": "ok"}

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8001)

Now create another file named backend2.py and put this code in it:

from fastapi import FastAPI
import uvicorn

app = FastAPI()

@app.get("/")
def read_root():
    return {"message": "Hello from backend2"}

@app.get("/health")
def health():
    # Used by the load balancer to check that this server is alive
    return {"status": "ok"}

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8002)

What did we do?

We created two servers.
Each has a main / endpoint that will provide a unique message showing which server is responding.
Both also have a /health endpoint. Our load balancer will use this to check which server is running correctly.

Now let's build our load balancer:

This is the main event. The load balancer takes each incoming request and forwards it to one of the backend servers.

Create a new file named load_balancer.py:

Here is what the code below does:

We define a list called backends that holds the URLs of all our backend servers.
We define a variable called HEALTHY_BACKENDS to store the backend servers that are currently running.
We create a catch-all route /{path:path} that captures any incoming request to our load balancer.
Inside this route, we pick the next healthy server, create an httpx client, and forward the original request (including its method, headers, and body) to that server.
We then return the backend server's response to the original client.
The health_check function runs every 10 seconds and iterates through our servers to see which ones are currently alive.
lifespan manager: This is FastAPI's way of running code on startup and shutdown. We use it to launch health_check as a background task that keeps running for as long as the application does.

import asyncio
from contextlib import asynccontextmanager

import httpx
import uvicorn
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse, Response


# List of currently healthy backends (updated by the health check task)
HEALTHY_BACKENDS = []

backends = [
    "http://localhost:8001",
    "http://localhost:8002",
]

# Index for round-robin load balancing
server_index = 0

async def health_check():
    global HEALTHY_BACKENDS

    while True:
        current_healthy = []

        for backend in backends:
            try:
                async with httpx.AsyncClient(timeout=2) as client:
                    response = await client.get(f"{backend}/health")
                    if response.status_code == 200:
                        current_healthy.append(backend)
            except httpx.RequestError:
                # Backend is unreachable or timed out - skip it
                pass

        HEALTHY_BACKENDS = current_healthy
        print(f"Healthy backends: {HEALTHY_BACKENDS}")

        await asyncio.sleep(10)

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Start the health check task in the background
    task = asyncio.create_task(health_check())
    yield
    # Cancel the task when shutting down
    task.cancel()


app = FastAPI(lifespan=lifespan)

@app.api_route("/{path:path}", methods=["GET", "POST", "PUT", "DELETE"])
async def proxy(request: Request, path: str):

    global server_index

    if not HEALTHY_BACKENDS:
        return JSONResponse(content={"error": "No healthy backends"}, status_code=503)

    # Select the next healthy backend using the round-robin algorithm
    backend_url = HEALTHY_BACKENDS[server_index % len(HEALTHY_BACKENDS)]
    server_index += 1

    # Forward the client's headers, but let httpx set Host for the backend
    request_headers = {k: v for k, v in request.headers.items() if k.lower() != "host"}

    async with httpx.AsyncClient() as client:
        try:
            response = await client.request(
                method=request.method,
                url=f"{backend_url}/{path}",
                headers=request_headers,
                content=await request.body(),
            )
            # Drop headers that describe the backend connection or body encoding;
            # the body we return is already decoded and gets a fresh Content-Length
            excluded = {"content-encoding", "content-length", "transfer-encoding", "connection"}
            response_headers = {
                k: v for k, v in response.headers.items() if k.lower() not in excluded
            }
            return Response(
                content=response.content,
                status_code=response.status_code,
                headers=response_headers,
            )

        except httpx.RequestError:
            return JSONResponse(content={"error": f"Backend server {backend_url} is not healthy"}, status_code=503)

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
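
The round-robin step in the proxy route is a single expression, so it is easy to miss what it buys us. Here is a small standalone sketch of that step. The pick_backend helper is a hypothetical name used only for this illustration, and the loop simulates the health check removing backend1 partway through.

```python
def pick_backend(healthy, index):
    """Return the backend for request number `index`, or None if none are healthy."""
    if not healthy:
        return None
    # The modulo is taken against the *current* list length,
    # so the index stays valid even when the list shrinks or grows.
    return healthy[index % len(healthy)]


healthy = ["http://localhost:8001", "http://localhost:8002"]
picks = []

for index in range(6):
    if index == 3:
        # Simulate the health check dropping backend1 from the list
        healthy = ["http://localhost:8002"]
    picks.append(pick_backend(healthy, index))

for pick in picks:
    print(pick)
```

The first three requests alternate between the two servers, and every request after the simulated failure goes to port 8002. Because the list is read fresh on every request, no extra code is needed to handle a backend disappearing between health checks and requests.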

How to run and test:

Open three separate terminals.
In Terminal 1, run the command below to start our first backend server:

uvicorn backend1:app --port 8001

In Terminal 2, start the second backend server:
```
uvicorn backend2:app --port 8002
```
In Terminal 3, start our load balancer:
```
uvicorn load_balancer:app --port 8000
```
You should see a message in this terminal every 10 seconds, like: Healthy backends: ['http://localhost:8001', 'http://localhost:8002']

Testing the Load Balancer

Round Robin: Open your browser and go to http://localhost:8000. You should see {"message":"Hello from backend1"}. Refresh the page. You should now see {"message":"Hello from backend2"}. Refresh again, and it's back to backend1. It works! (If the alternation looks uneven, your browser may be sending an extra request such as /favicon.ico, which also counts as a turn.)
Health Check & Failover:
1. Go to Terminal 1 (running backend1) and press Ctrl+C to stop it.
2. Watch Terminal 3 (the load balancer). Within 10 seconds, the health check message will change to: Healthy backends: ['http://localhost:8002'].
3. Now, go back to your browser at http://localhost:8000 and refresh multiple times. Every single request now goes to backend2. The load balancer has detected the failure and automatically redirected all traffic to the remaining healthy server.
4. Restart backend1 in Terminal 1. Within 10 seconds, the load balancer will detect it's back online, and the Healthy backends list will include both servers again. The load will once again be balanced between them.

Congratulations! You've just built a smart, fault-tolerant load balancer in Python.
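
If refreshing the browser gets tedious, the same check can be scripted. The sketch below uses only the standard library; fetch_message and tally are hypothetical helper names, and the stand-in replies at the bottom exist only so the snippet runs without the servers being up.

```python
import itertools
import json
from collections import Counter
from urllib.request import urlopen


def fetch_message(url="http://localhost:8000/"):
    """Send one GET request and return the "message" field of the JSON reply."""
    with urlopen(url, timeout=5) as response:
        return json.load(response)["message"]


def tally(fetch, requests=10):
    """Call fetch() `requests` times and count how often each reply appears."""
    return Counter(fetch() for _ in range(requests))


# Stand-in for the real servers so the sketch runs on its own
fake_replies = itertools.cycle(["Hello from backend1", "Hello from backend2"])
counts = tally(lambda: next(fake_replies), requests=10)
print(counts)
```

With both backends and the load balancer running, call tally(fetch_message) instead. An even split suggests round robin is working, and after stopping backend1 all replies should come from backend2 once the next health check has run.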

What I've provided here is a basic implementation of a round-robin load balancer. You can enhance it if you wish: try adding more backends or implementing a different load-balancing algorithm.
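
As a starting point for one such variation, here is a rough sketch of how least connection could track active requests. It is not part of the load balancer above: the active dictionary, track, and pick_least_connections are hypothetical names, and asyncio.sleep stands in for forwarding a request to a backend.

```python
import asyncio
from contextlib import asynccontextmanager

# Number of in-flight requests per backend (hypothetical bookkeeping)
active = {"http://localhost:8001": 0, "http://localhost:8002": 0}


@asynccontextmanager
async def track(backend):
    """Count a request as active for as long as the block is running."""
    active[backend] += 1
    try:
        yield
    finally:
        active[backend] -= 1


def pick_least_connections(healthy):
    """Pick the healthy backend with the fewest in-flight requests."""
    return min(healthy, key=lambda backend: active[backend])


async def handle(duration, log):
    backend = pick_least_connections(list(active))
    log.append(backend)
    async with track(backend):
        await asyncio.sleep(duration)  # stands in for forwarding the request


async def demo():
    log = []
    slow = asyncio.create_task(handle(0.2, log))  # one long-running request
    await asyncio.sleep(0)  # let the slow request start first
    await handle(0.001, log)
    await handle(0.001, log)
    await slow
    return log


log = asyncio.run(demo())
print(log)
```

While the slow request keeps the first backend busy, both short requests go to the second one. Plain round robin would have sent the third request back to the busy server.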

If you enjoyed the post, follow for more!
