Scaling Part 1

Your app works great with 100 users. What happens at 10,000? 100,000? A million? At some point, your single server can’t keep up. Scaling is about making your system handle more load without falling over.

Vertical vs Horizontal Scaling

Vertical scaling (scale UP) gives your server more power:

More CPU, more RAM, bigger disk
Simple: no code changes needed
Has a ceiling: you can’t buy an infinitely large machine
Single point of failure

Horizontal scaling (scale OUT) adds more servers:

Run 5, 10, 50 copies of your application
Requires a load balancer to distribute traffic
No hard ceiling for the app tier: just add more machines
More complex: you need to handle shared state

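Most production systems use horizontal scaling because it's cheaper (many commodity machines instead of one top-end box), more resilient (one server dying doesn't take the whole app down), and has no practical upper limit.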

Load Balancers

When you have multiple servers, something needs to decide which server gets each request. That’s a load balancer.

Strategies:

Round robin: server 1, server 2, server 3, repeat
Least connections: send to whichever server has the fewest active requests
Weighted: more powerful servers get proportionally more traffic
IP hash: the same client always goes to the same server (sticky sessions)
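To make the four strategies concrete, here is a minimal Python sketch of just the selection logic. The `Server` record, the strategy class names, and the `pick()` method are hypothetical names for illustration; real load balancers implement these strategies for you, so you would configure them rather than write them.

```python
import hashlib
import itertools
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    weight: int = 1   # relative capacity (used by WeightedRoundRobin)
    active: int = 0   # in-flight requests (used by LeastConnections)


class RoundRobin:
    """Server 1, server 2, server 3, repeat."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self, client_ip=None):
        return next(self._cycle)


class LeastConnections:
    """Pick the server with the fewest in-flight requests.

    The caller is expected to increment `active` when a request starts
    and decrement it when the request finishes.
    """

    def __init__(self, servers):
        self.servers = servers

    def pick(self, client_ip=None):
        return min(self.servers, key=lambda s: s.active)


class WeightedRoundRobin:
    """Naive weighting: a server with weight 2 appears twice per cycle."""

    def __init__(self, servers):
        expanded = [s for s in servers for _ in range(s.weight)]
        self._cycle = itertools.cycle(expanded)

    def pick(self, client_ip=None):
        return next(self._cycle)


class IPHash:
    """Hash the client IP so the same client keeps landing on the same server."""

    def __init__(self, servers):
        self.servers = servers

    def pick(self, client_ip):
        digest = hashlib.sha256(client_ip.encode()).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.servers)
        return self.servers[index]


servers = [Server("app-1"), Server("app-2", weight=2), Server("app-3")]
rr = RoundRobin(servers)
print([rr.pick().name for _ in range(4)])  # ['app-1', 'app-2', 'app-3', 'app-1']
```

One design note on the IP hash version: taking the hash modulo the server count means that adding or removing a server reshuffles most clients to new servers. Consistent hashing is the usual technique for limiting that churn.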

The Stateless Requirement

Here's the catch with horizontal scaling: if user data (sessions, caches) lives in one server's memory, you can't send a request to just any server. A user who logged in on server 1 looks like a stranger to server 2.

The solution: make your servers stateless. All shared data goes to external services:

Sessions: Redis
Cache: Redis
Files: S3
Database: separate server
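A minimal Python sketch of what "stateless" means in practice. It assumes a hypothetical `InMemorySessionStore` that stands in for Redis so the example runs without a server. In a real deployment that store would live outside every app process. `AppServer`, `login`, and `whoami` are also made-up names for illustration. Because both app servers share one external store, a session created on one server is visible on the other.

```python
import secrets


class InMemorySessionStore:
    """Stand-in for Redis so this sketch runs anywhere.

    In production this would be a client for a shared store that every
    app server reaches over the network, not a dict inside one process.
    """

    def __init__(self):
        self._data = {}

    def get(self, session_id):
        return self._data.get(session_id)

    def set(self, session_id, value):
        self._data[session_id] = value


class AppServer:
    """Stateless app server: keeps no per-user data in its own memory."""

    def __init__(self, name, session_store):
        self.name = name
        self.sessions = session_store  # shared and external, not per-server

    def login(self, user_id):
        session_id = secrets.token_hex(16)
        self.sessions.set(session_id, {"user_id": user_id})
        return session_id

    def whoami(self, session_id):
        session = self.sessions.get(session_id)
        return session["user_id"] if session else None


store = InMemorySessionStore()
server_a = AppServer("app-1", store)
server_b = AppServer("app-2", store)

sid = server_a.login("alice")   # request 1 happens to land on app-1
print(server_b.whoami(sid))     # request 2 lands on app-2 and still sees: alice
```

If each `AppServer` created its own store instead, the second request would return `None`. That is exactly the failure the stateless rule prevents.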

Now any app server can handle any request. They’re interchangeable. This is why “stateless” is hammered into every backend engineering guide.

Database Scaling

Your app might scale horizontally, but the database is usually the bottleneck. Strategies:

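Read replicas: one primary database handles writes while multiple replicas handle reads. This works great for read-heavy apps (most apps are). Replicas usually lag slightly behind the primary, so a read immediately after a write may not see it yet.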
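A minimal sketch of read/write routing, assuming the primary and each replica expose an `execute(sql, params)` method. `ReplicatedDatabase`, `FakeConnection`, and the `use_primary` flag are hypothetical names. Real code would wrap driver connections or pools, and many frameworks offer this routing built in. The read/write check here is deliberately naive: anything starting with `SELECT` counts as a read.

```python
import itertools


class ReplicatedDatabase:
    """Send writes to the primary and spread reads across replicas (round robin)."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # itertools.cycle([]) would raise on next(), so treat "no replicas" explicitly.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def execute(self, sql, params=(), *, use_primary=False):
        # Naive classification: anything starting with SELECT is a read.
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and not use_primary and self._replicas is not None:
            return next(self._replicas).execute(sql, params)
        # Writes, and reads that must see the very latest data (replicas can lag),
        # go to the primary.
        return self.primary.execute(sql, params)


class FakeConnection:
    """Hypothetical stand-in for a real driver connection; records what it ran."""

    def __init__(self, name):
        self.name = name
        self.queries = []

    def execute(self, sql, params=()):
        self.queries.append(sql)
        return self.name


primary = FakeConnection("primary")
db = ReplicatedDatabase(primary, [FakeConnection("replica-1"), FakeConnection("replica-2")])

print(db.execute("INSERT INTO users (name) VALUES (?)", ("alice",)))  # primary
print(db.execute("SELECT * FROM users"))                               # replica-1
print(db.execute("SELECT * FROM users"))                               # replica-2
print(db.execute("SELECT * FROM users", use_primary=True))             # primary
```

The `use_primary` escape hatch is one common answer to replication lag. For example, a "read your own write" right after saving a profile goes to the primary, and everything else goes to the replicas.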

Connection pooling: reuse a fixed set of open connections instead of opening a new one per request.
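A minimal fixed-size pool built on Python's `queue.Queue`. It uses `sqlite3` only so the sketch runs anywhere. `ConnectionPool` is a hypothetical name, and in practice you would normally use the pool that ships with your database driver or ORM rather than writing one. The key idea: connections are opened once, borrowed per request, and returned instead of closed.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Fixed-size pool: connections are opened once and reused across requests."""

    def __init__(self, connect, size=5):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(connect())  # pay the connection cost up front

    @contextmanager
    def connection(self, timeout=5.0):
        # Blocks while all connections are in use; raises queue.Empty after `timeout`.
        conn = self._pool.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._pool.put(conn)  # return it to the pool instead of closing it


pool = ConnectionPool(
    lambda: sqlite3.connect(":memory:", check_same_thread=False), size=2
)

with pool.connection() as conn:
    print(conn.execute("SELECT 1").fetchone())  # (1,)
```

Capping the pool size also protects the database: 50 app servers each opening unbounded connections can exhaust the database's connection limit long before CPU becomes the problem.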

Caching: put Redis in front of the database for hot data.
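A minimal sketch of the cache-aside pattern: check the cache, fall back to the database on a miss, then populate the cache with a TTL. `TTLCache` is a hypothetical in-process stand-in for Redis so the example runs on its own. With the redis-py client, the equivalent calls would be roughly `r.get(key)` and `r.set(key, value, ex=ttl_seconds)`, with the value serialized first. `get_user` and `load_from_db` are also illustrative names.

```python
import time


class TTLCache:
    """Tiny in-process stand-in for Redis: key -> (value, expires_at)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._data[key]  # expired: behave like a miss
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, time.monotonic() + ttl_seconds)


def get_user(user_id, cache, load_from_db, ttl_seconds=60):
    """Cache-aside: try the cache first, fall back to the database on a miss."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached                  # cache hit: no database query
    user = load_from_db(user_id)       # cache miss: query the database
    cache.set(key, user, ttl_seconds)  # populate for the next caller
    return user


db_calls = []

def fake_db_lookup(user_id):
    db_calls.append(user_id)
    return {"id": user_id, "name": "Alice"}

cache = TTLCache()
get_user(42, cache, fake_db_lookup)
get_user(42, cache, fake_db_lookup)
print(len(db_calls))  # 1: the second call was served from the cache
```

The TTL is the trade-off knob. A longer TTL means fewer database hits but staler data, so hot, rarely-changing data (profiles, settings, product pages) is the natural fit.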

Indexing: make sure your frequent queries use indexes.
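One way to check whether a query uses an index is to ask the database for its query plan. This sketch uses SQLite's `EXPLAIN QUERY PLAN` because it ships with Python. The `orders` table and the index name are made up for illustration. Other databases have their own equivalent, such as `EXPLAIN` in PostgreSQL and MySQL. The exact wording of the plan text varies between SQLite versions, so the comments describe what to look for rather than quoting it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 1000, i * 1.5) for i in range(10_000)],
)


def plan(sql):
    # Each EXPLAIN QUERY PLAN row has 4 columns; the last one is the readable detail.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


query = "SELECT * FROM orders WHERE customer_id = 42"

before = plan(query)
print(before)  # look for a full-table "SCAN" of orders

conn.execute("CREATE INDEX idx_orders_customer_id ON orders (customer_id)")

after = plan(query)
print(after)   # look for "SEARCH ... USING INDEX idx_orders_customer_id"
```

A full scan reads all 10,000 rows to find the 10 that match. The index lookup jumps straight to them, and that gap grows with the table.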

Quick Performance Wins

Before you add servers, optimize what you have:

Add database indexes on slow queries
Use Redis to cache frequent reads
Enable HTTP response compression (gzip)
Use connection pooling
Fix N+1 queries (the silent killer)
Serve static assets from a CDN
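N+1 queries are called the silent killer because the code looks innocent: fetch a list, then loop over it and fetch something for each item. This sketch uses `sqlite3` with made-up `authors`/`posts` tables and a hypothetical `CountingDB` wrapper that counts queries. It compares the N+1 loop with a single JOIN that returns the same rows. ORMs often produce the N+1 shape through lazy loading, and their eager-loading options are the usual fix.

```python
import sqlite3


class CountingDB:
    """Wraps a sqlite3 connection and counts queries, to compare the approaches."""

    def __init__(self, conn):
        self.conn = conn
        self.queries = 0

    def run(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


def titles_with_authors_n_plus_1(db):
    # 1 query for the posts, then 1 more query per post: N+1 queries in total.
    posts = db.run("SELECT title, author_id FROM posts ORDER BY id")
    result = []
    for title, author_id in posts:
        (name,) = db.run("SELECT name FROM authors WHERE id = ?", (author_id,))[0]
        result.append((title, name))
    return result


def titles_with_authors_join(db):
    # One JOIN returns the same rows in a single query.
    return db.run(
        "SELECT p.title, a.name FROM posts p "
        "JOIN authors a ON a.id = p.author_id ORDER BY p.id"
    )


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors (name) VALUES ('Ada'), ('Linus'), ('Grace');
    INSERT INTO posts (author_id, title) VALUES (1, 'A'), (2, 'B'), (3, 'C'), (1, 'D');
""")
db = CountingDB(conn)

slow = titles_with_authors_n_plus_1(db)
print(db.queries)  # 5: 1 for the posts + 1 per post (4 posts)

db.queries = 0
fast = titles_with_authors_join(db)
print(db.queries)  # 1
```

With 4 posts the difference is 5 queries versus 1. With a page of 100 posts it becomes 101 versus 1, and each of those is a network round trip to the database.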

These changes alone can often let a single server handle up to 10x more traffic.

Wrapping Up

Scale up (bigger machine) is simple but limited
Scale out (more machines) is the production approach
Load balancers distribute traffic
Make your app stateless for horizontal scaling
Optimize before scaling: indexes, caching, connection pooling
Database is usually the bottleneck

Day 19 of 95 | Backend Engineering Series