
Scaling Part 1

Your app works great with 100 users. What happens at 10,000? 100,000? A million? At some point, your single server can’t keep up. Scaling is about making your system handle more load without falling over.

Vertical vs Horizontal Scaling

[Figure: Scaling Types]

Vertical scaling (scale up): give your server more power.

  • More CPU, more RAM, bigger disk
  • Simple: usually no code changes needed
  • Has a ceiling: you can’t buy an infinitely large machine
  • Single point of failure

Horizontal scaling (scale out): add more servers.

  • Run 5, 10, 50 copies of your application
  • Requires a load balancer to distribute traffic
  • No hard ceiling: you can keep adding machines
  • More complex: need to handle shared state

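Most production systems scale horizontally: beyond a certain size, several commodity machines are usually cheaper than one very large one, losing a single server doesn’t take the whole system down, and there is no hard upper limit.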

Load Balancers

When you have multiple servers, something needs to decide which server gets each request. That’s a load balancer.

[Figure: Load Balancer]

Strategies:

  • Round robin: server 1, server 2, server 3, repeat
  • Least connections: send to whichever server has the fewest active requests
  • Weighted: more powerful servers get more traffic
  • IP hash: the same client always goes to the same server (sticky sessions)
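To make the four strategies concrete, here is a minimal sketch in Python. The class names and the `pick` method are made up for illustration and are not the API of any real load balancer. The weighted variant is a simple weighted round robin, which is only one of several ways to implement weighting.

```python
import hashlib
import itertools


class RoundRobin:
    """Cycle through servers in order: 1, 2, 3, 1, 2, 3, ..."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self, client_ip=None):
        return next(self._cycle)


class LeastConnections:
    """Send each request to the server with the fewest active requests."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def pick(self, client_ip=None):
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call this when the request finishes.
        self.active[server] -= 1


class Weighted:
    """Weighted round robin: a server with weight 3 gets 3 slots per cycle."""

    def __init__(self, weights):
        slots = [server for server, w in weights.items() for _ in range(w)]
        self._cycle = itertools.cycle(slots)

    def pick(self, client_ip=None):
        return next(self._cycle)


class IPHash:
    """Hash the client IP so the same client always lands on the same server."""

    def __init__(self, servers):
        self.servers = list(servers)

    def pick(self, client_ip):
        digest = hashlib.sha256(client_ip.encode()).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.servers)
        return self.servers[index]
```

Note that with this naive IP hash, adding or removing a server changes the modulus and remaps most clients to a different server.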

The Stateless Requirement

Here’s the catch with horizontal scaling: if user data (sessions, caches) lives in one server’s memory, you can’t send requests to any random server.

The solution: make your servers stateless. All shared data goes to external services:

  • Sessions: Redis
  • Cache: Redis
  • Files: S3
  • Database: separate server

Now any app server can handle any request. They’re interchangeable. This is why “stateless” is hammered into every backend engineering guide.
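A small sketch of what “stateless” means in practice. `InMemoryStore` is a hypothetical stand-in for an external store such as Redis so the example runs on its own; `AppServer`, `login`, and `whoami` are likewise illustrative names, not a real framework’s API. The point is that the app servers keep no session data themselves.

```python
import json
import uuid


class InMemoryStore:
    """Stand-in for an external store such as Redis (illustration only)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class AppServer:
    def __init__(self, name, store):
        self.name = name
        self.store = store  # shared and external; nothing session-related lives on self

    def login(self, username):
        session_id = uuid.uuid4().hex
        self.store.set(f"session:{session_id}", json.dumps({"user": username}))
        return session_id

    def whoami(self, session_id):
        raw = self.store.get(f"session:{session_id}")
        if raw is None:
            return None
        return json.loads(raw)["user"]


# Two app servers share one session store.
store = InMemoryStore()
server_a = AppServer("a", store)
server_b = AppServer("b", store)

session_id = server_a.login("alice")  # request 1 lands on server a
user = server_b.whoami(session_id)    # request 2 lands on server b and still works
```

If each server kept its own dictionary of sessions instead, the second request would fail unless the load balancer pinned the client to server a.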

Database Scaling

Your app might scale horizontally, but the database is usually the bottleneck. Strategies:

Read replicas: one primary database handles writes, multiple replicas handle reads. Works great for read-heavy apps (most apps are).
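A rough sketch of read/write routing, assuming the application decides per statement where to send it. `ReplicatedDatabase` and `connection_for` are hypothetical names, and the connections are plain strings here to keep the example self-contained. Treating every statement that starts with `SELECT` as a read is a deliberate simplification.

```python
import itertools


class ReplicatedDatabase:
    """Route reads to replicas (round robin) and everything else to the primary."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)

    def connection_for(self, sql):
        # Simplification: only plain SELECT statements count as reads.
        if sql.lstrip().upper().startswith("SELECT"):
            return next(self._replicas)
        return self.primary


db = ReplicatedDatabase("primary", ["replica-1", "replica-2"])
read_target = db.connection_for("SELECT * FROM users WHERE id = 1")
write_target = db.connection_for("UPDATE users SET name = 'x' WHERE id = 1")
```

Replication is often asynchronous, so a replica can briefly lag behind the primary. Reads that must see a write that just happened are commonly sent to the primary instead.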

Connection pooling: reuse connections instead of opening a new one per request.
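The idea can be sketched with the standard library alone. This is a toy pool over SQLite, and `ConnectionPool` is a made-up class for illustration. In a real application you would normally use the pool provided by your database driver or framework rather than writing one.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Open a fixed number of connections once, then lend them out."""

    def __init__(self, size, database=":memory:"):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            # check_same_thread=False lets a connection be handed to other threads.
            self._idle.put(sqlite3.connect(database, check_same_thread=False))

    @contextmanager
    def connection(self, timeout=5.0):
        # Blocks if every connection is in use; raises queue.Empty on timeout.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand it back instead of closing it


pool = ConnectionPool(size=1)

with pool.connection() as conn:
    first = conn
    result = conn.execute("SELECT 1").fetchone()

with pool.connection() as conn:
    second = conn  # the same connection object, reused
```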

Caching: Redis in front of the database for hot data.
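One common way to put a cache in front of the database is the cache-aside pattern: check the cache first, fall back to the database on a miss, then store the result with an expiry. The sketch below uses a small in-process `TTLCache` as a stand-in for Redis so it runs without a server. `TTLCache`, `get_user`, and `load_from_db` are illustrative names, not part of any library.

```python
import time


class TTLCache:
    """Tiny in-process stand-in for a key-value cache with expiry (illustration only)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._entries[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value):
        self._entries[key] = (self.clock() + self.ttl, value)


def get_user(user_id, cache, load_from_db):
    """Cache-aside read: try the cache, fall back to the database, then populate."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    user = load_from_db(user_id)  # the slow path
    cache.set(key, user)
    return user
```

The expiry bounds how stale a cached value can get. Writes that change a user should also delete or overwrite the matching cache key.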

Indexing: make sure your frequent queries use indexes.
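You can check whether a query uses an index by asking the database for its query plan. The sketch below uses SQLite’s `EXPLAIN QUERY PLAN` because it ships with Python. The `orders` table and the index name are invented for the example, and other databases have their own `EXPLAIN` variants with different output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL)"
)

QUERY = "SELECT * FROM orders WHERE user_id = ?"


def query_plan(conn, sql, params):
    """Return SQLite's human-readable plan for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column is the description


plan_before = query_plan(conn, QUERY, (42,))  # no index yet: a full table scan
conn.execute("CREATE INDEX idx_orders_user_id ON orders (user_id)")
plan_after = query_plan(conn, QUERY, (42,))   # now a search using the index

print(plan_before)
print(plan_after)
```

The exact wording of the plan varies between SQLite versions, but the first plan should mention a scan of `orders` and the second should name `idx_orders_user_id`.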

Quick Performance Wins

Before you add servers, optimize what you have:

  1. Add database indexes on slow queries
  2. Use Redis to cache frequent reads
  3. Enable HTTP response compression (gzip)
  4. Use connection pooling
  5. Optimize N+1 queries (the silent killer)
  6. Serve static assets from a CDN

Together, these can often let a single server handle around 10x more traffic.
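The N+1 problem from item 5 is easiest to see side by side. This sketch uses an in-memory SQLite database with made-up `authors` and `posts` tables. The first function runs one query for the list plus one query per row, and the second gets the same result with a single join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'Ada'), (2, 'Linus');
    INSERT INTO posts VALUES (1, 1, 'A'), (2, 2, 'B'), (3, 1, 'C');
""")


def titles_with_authors_n_plus_1(conn):
    """1 query for the posts, then 1 more query per post for its author."""
    posts = conn.execute(
        "SELECT title, author_id FROM posts ORDER BY id"
    ).fetchall()
    result = []
    for title, author_id in posts:
        (name,) = conn.execute(
            "SELECT name FROM authors WHERE id = ?", (author_id,)
        ).fetchone()
        result.append((title, name))
    return result


def titles_with_authors_joined(conn):
    """Same result in a single query."""
    return conn.execute(
        "SELECT p.title, a.name FROM posts p "
        "JOIN authors a ON a.id = p.author_id ORDER BY p.id"
    ).fetchall()


slow = titles_with_authors_n_plus_1(conn)
fast = titles_with_authors_joined(conn)
```

With 3 posts the slow version issues 4 queries. With 1,000 posts it issues 1,001, each paying a network round trip on a real database, which is why ORMs that lazy-load related rows make this easy to miss.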

Wrapping Up

  • Scaling up (bigger machine) is simple but limited
  • Scaling out (more machines) is what most production systems do
  • Load balancers distribute traffic
  • Make your app stateless for horizontal scaling
  • Optimize before scaling: indexes, caching, connection pooling
  • Database is usually the bottleneck

Day 19 of 95 | Backend Engineering Series
