Scaling Part 1
Your app works great with 100 users. What happens at 10,000? 100,000? A million? At some point, your single server can’t keep up. Scaling is about making your system handle more load without falling over.
Vertical vs Horizontal Scaling

Vertical scaling (scale UP) means giving your server more power:
- More CPU, more RAM, bigger disk
- Simple: no code changes needed
- Has a ceiling: you can’t buy an infinitely large machine
- Single point of failure
Horizontal scaling (scale OUT) means adding more servers:
- Run 5, 10, 50 copies of your application
- Requires a load balancer to distribute traffic
- No hard ceiling: add more machines as traffic grows
- More complex: need to handle shared state
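Most production systems use horizontal scaling: a fleet of commodity machines is usually cheaper than one huge server, losing one machine doesn’t take the whole app down, and capacity isn’t capped by the biggest box you can buy.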
Load Balancers
When you have multiple servers, something needs to decide which server gets each request. That’s a load balancer.

Strategies:
- Round robin: server 1, server 2, server 3, repeat
- Least connections: send each request to whichever server has the fewest active connections
- Weighted: more powerful servers get a larger share of the traffic
- IP hash: the same client IP always goes to the same server (a simple form of sticky sessions)
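To make the four strategies concrete, here is a minimal Python sketch of each one as a tiny "picker" class. The class names and server names (`app1`, `app2`, ...) are made up for illustration. A real load balancer (nginx, HAProxy, a cloud load balancer) implements these for you and adds health checks, timeouts, and retries that this sketch skips.

```python
import hashlib
import itertools


class RoundRobin:
    """Hand out servers in a fixed rotation: 1, 2, 3, 1, 2, 3, ..."""

    def __init__(self, servers):
        self._rotation = itertools.cycle(servers)

    def pick(self):
        return next(self._rotation)


class LeastConnections:
    """Pick the server with the fewest in-flight requests."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def pick(self):
        # min() returns the first server in insertion order when counts tie.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call this when a request finishes so the counts stay accurate.
        self.active[server] -= 1


class WeightedRoundRobin:
    """Naive weighted rotation: a server with weight 3 appears 3 times per cycle."""

    def __init__(self, weights):
        expanded = [server for server, w in weights.items() for _ in range(w)]
        self._rotation = itertools.cycle(expanded)

    def pick(self):
        return next(self._rotation)


class IPHash:
    """Hash the client IP so the same client keeps landing on the same server."""

    def __init__(self, servers):
        self.servers = list(servers)

    def pick(self, client_ip):
        digest = hashlib.sha256(client_ip.encode()).digest()
        return self.servers[int.from_bytes(digest[:8], "big") % len(self.servers)]


servers = ["app1", "app2", "app3"]

rr = RoundRobin(servers)
print([rr.pick() for _ in range(4)])  # ['app1', 'app2', 'app3', 'app1']

lc = LeastConnections(servers)
first = lc.pick()   # app1: everyone is idle, tie goes to the first server
second = lc.pick()  # app2: app1 is now busy
lc.release(first)
print(lc.pick())    # app1: back to 0 active, and ties go to the first server

wrr = WeightedRoundRobin({"big": 3, "small": 1})
print([wrr.pick() for _ in range(4)])  # ['big', 'big', 'big', 'small']

ip_lb = IPHash(servers)
print(ip_lb.pick("203.0.113.7") == ip_lb.pick("203.0.113.7"))  # True
```

Two design notes: the naive weighted rotation sends bursts to the heavy server, and real balancers usually interleave picks more evenly. Plain modulo hashing also remaps most clients whenever a server is added or removed; consistent hashing is the usual fix when that matters.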
The Stateless Requirement
Here’s the catch with horizontal scaling: if user data (sessions, caches) lives in one server’s memory, you can’t send a request to just any server, because the server that receives it may not have that user’s data.
The solution: make your servers stateless. All shared data goes to external services:
- Sessions: Redis
- Cache: Redis
- Files: S3
- Database: separate server
Now any app server can handle any request. They’re interchangeable. This is why “stateless” is hammered into every backend engineering guide.
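Here’s a minimal Python sketch of the difference. `SessionStore` is a hypothetical in-process stand-in for Redis so the example runs on its own; with real Redis you’d do the same thing over the network (for example `SET session:<id> ... EX 3600` and `GET session:<id>`). `AppServer`, `login`, and `whoami` are also made-up names for illustration.

```python
import secrets


class SessionStore:
    """Hypothetical stand-in for Redis: one key-value store shared by all servers.

    Real Redis stores strings/bytes, so you would serialize the session (e.g. JSON);
    this in-memory version keeps Python dicts to stay short.
    """

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class AppServer:
    """An app server that keeps no per-user state in its own memory."""

    def __init__(self, name, sessions):
        self.name = name
        self.sessions = sessions  # external store, not a local dict

    def login(self, user_id):
        session_id = secrets.token_hex(16)
        self.sessions.set(f"session:{session_id}", {"user_id": user_id})
        return session_id

    def whoami(self, session_id):
        session = self.sessions.get(f"session:{session_id}")
        return session["user_id"] if session else None


# Stateless setup: both servers talk to the same external store.
shared = SessionStore()
server_a = AppServer("a", shared)
server_b = AppServer("b", shared)
sid = server_a.login("alice")   # the login request hits server A
print(server_b.whoami(sid))     # alice: server B can serve the next request

# Anti-pattern for comparison: each server keeps sessions to itself.
lonely_a = AppServer("a", SessionStore())
lonely_b = AppServer("b", SessionStore())
sid2 = lonely_a.login("bob")
print(lonely_b.whoami(sid2))    # None: server B never saw that session
```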
Database Scaling
Your app might scale horizontally, but the database is usually the bottleneck. Strategies:
Read replicas: one primary database handles writes, and multiple replicas handle reads. This works great for read-heavy apps (most apps are).
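Something has to send each query to the right database, either your app or a proxy in front of the database. A minimal Python sketch of app-side routing, assuming connection objects with an `execute(sql, params)` method; `ReplicaRouter` and `FakeConnection` are hypothetical names. The read/write check is deliberately naive: it treats anything starting with `SELECT` as a read, so a statement like `SELECT ... FOR UPDATE` would wrongly go to a replica.

```python
import itertools


class FakeConnection:
    """Hypothetical stand-in for a real DB connection; records what it ran."""

    def __init__(self, name):
        self.name = name
        self.statements = []

    def execute(self, sql, params=()):
        self.statements.append(sql)
        return self.name  # a real driver would return rows or a cursor


class ReplicaRouter:
    """Writes go to the primary; reads rotate across the replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # cycle() over an empty list would never yield, so guard against it.
        self._replicas = itertools.cycle(replicas) if replicas else None

    @staticmethod
    def _is_read(sql):
        # Naive classification: anything starting with SELECT is a read.
        return sql.lstrip().upper().startswith("SELECT")

    def execute(self, sql, params=()):
        if self._replicas is not None and self._is_read(sql):
            conn = next(self._replicas)
        else:
            conn = self.primary  # writes, or reads when there are no replicas
        return conn.execute(sql, params)


primary = FakeConnection("primary")
router = ReplicaRouter(primary, [FakeConnection("replica-1"), FakeConnection("replica-2")])

print(router.execute("SELECT * FROM posts WHERE id = ?", (1,)))        # replica-1
print(router.execute("SELECT * FROM posts WHERE id = ?", (2,)))        # replica-2
print(router.execute("INSERT INTO posts (title) VALUES (?)", ("hi",)))  # primary
```

One caveat: replicas apply the primary’s changes with a small delay (replication lag), so a read sent to a replica right after a write may not see that write yet. A common workaround is to send a user’s reads to the primary for a short time after they write.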
Connection pooling: keep a small set of open connections and reuse them, instead of opening a new one for every request.
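The core idea fits in a few lines. This Python sketch uses in-memory `sqlite3` connections only so it runs anywhere; `ConnectionPool` is a hypothetical class. In a real app you’d use the pool that comes with your driver or ORM (SQLAlchemy’s engine keeps one, for example), which also detects and replaces broken connections. This sketch doesn’t.

```python
import contextlib
import queue
import sqlite3


class ConnectionPool:
    """Fixed-size pool: connections are opened once, then borrowed and returned."""

    def __init__(self, factory, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())
        self.opened = size  # total connections ever opened

    @contextlib.contextmanager
    def connection(self, timeout=5.0):
        # Blocks while every connection is busy; raises queue.Empty after `timeout`.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand it back instead of closing it


pool = ConnectionPool(
    lambda: sqlite3.connect(":memory:", check_same_thread=False), size=2
)

for _ in range(100):  # simulate 100 requests
    with pool.connection() as conn:
        conn.execute("SELECT 1").fetchone()

print(pool.opened)  # 2: a hundred requests, only two connections ever opened
```

The fixed size is a feature: it also caps how many connections your app can open against the database, which matters once you run many app servers.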
Caching: put Redis in front of the database for hot (frequently read) data.
Indexing: make sure your frequent queries use indexes.
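You can check whether a query uses an index by asking the database for its query plan. This sketch uses Python’s built-in `sqlite3` with a made-up `users` table so it runs without a server; in PostgreSQL or MySQL you’d run `EXPLAIN` (or PostgreSQL’s `EXPLAIN ANALYZE`) on the query instead. SQLite’s exact plan wording varies between versions, so the comments describe the shape of the plan rather than quoting it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(10_000)],
)

QUERY = "SELECT id, name FROM users WHERE email = ?"
PARAMS = ("user42@example.com",)


def plan(sql, params):
    """Return the human-readable detail lines from SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the last column holds the description


before = plan(QUERY, PARAMS)
print(before)  # a SCAN of users: every one of the 10,000 rows is checked

conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = plan(QUERY, PARAMS)
print(after)   # a SEARCH of users USING INDEX idx_users_email

print(conn.execute(QUERY, PARAMS).fetchone())  # (43, 'User 42')
```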
Quick Performance Wins
Before you add servers, optimize what you have:
- Add database indexes on slow queries
- Use Redis to cache frequent reads
- Enable HTTP response compression (gzip)
- Use connection pooling
- Eliminate N+1 queries (the silent killer)
- Serve static assets from a CDN
These changes alone can often let a single server handle around 10x more traffic.
Wrapping Up
- Scale up (bigger machine) is simple but limited
- Scale out (more machines) is the production approach
- Load balancers distribute traffic
- Make your app stateless for horizontal scaling
- Optimize before scaling: indexes, caching, connection pooling
- Database is usually the bottleneck
Day 19 of 95 | Backend Engineering Series