Error Handling

Things break. Databases go down. External APIs time out. Users send garbage data. Memory runs out. Network connections drop. The question isn’t whether your system will encounter errors, it’s how gracefully it handles them when it does.

Types of Errors

Operational errors are expected failures that happen in normal operation:

  • Network timeouts
  • Database connection lost
  • File not found
  • Invalid user input
  • Third-party API returning 500

These are not bugs. They’re inevitable. Your code should handle them gracefully.

Programmer errors are actual bugs:

  • Undefined variable
  • Wrong type passed to function
  • Logic errors

These should crash the process (in development) so you notice and fix them. In production, they should be caught by a global error handler.
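
One way to make this distinction visible in code is a dedicated exception type for operational errors. The sketch below is a minimal illustration in Python; `OperationalError`, `parse_age`, and `is_operational` are hypothetical names chosen for this example, not part of any library.

```python
class OperationalError(Exception):
    """An expected failure such as bad input or a timeout (hypothetical type)."""

    def __init__(self, message: str, status: int = 500):
        super().__init__(message)
        self.status = status  # HTTP status the caller may want to return


def parse_age(raw: str) -> int:
    """Validate user input. Garbage input is an operational error, not a bug."""
    try:
        age = int(raw)
    except ValueError:
        raise OperationalError(
            f"age must be a whole number, got {raw!r}", status=400
        ) from None
    if age < 0:
        raise OperationalError("age must not be negative", status=400)
    return age


def is_operational(exc: BaseException) -> bool:
    """Anything that is not an OperationalError is treated as a programmer error."""
    return isinstance(exc, OperationalError)
```

With this split, a handler can answer operational errors with their own status code and treat everything else (a `TypeError`, a `NameError`) as a bug that needs fixing.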

The Error Handling Strategy

Layer 1: Local handling. Handle errors where they occur. Database query failed? Retry it. File not found? Return a 404.

Layer 2: Propagation. If you can’t handle it locally, propagate it up. Throw an exception or return an error. Let the calling code decide.

Layer 3: Global error handler. A catch-all at the top of your application. Any unhandled error ends up here. Log it, return a generic 500 to the user, and alert the team.
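
The three layers can be sketched without any web framework. This is an illustration under stated assumptions: `NotFound`, `USERS`, `load_user`, `get_user_route`, and `handle_request` are all hypothetical names, and the in-memory dict stands in for a real database.

```python
import logging

logger = logging.getLogger("app")


class NotFound(Exception):
    """Hypothetical operational error that maps to a 404."""


USERS = {1: "ada"}  # stand-in for a database


def load_user(user_id: int) -> str:
    # Layer 2: this function cannot decide what a missing user means,
    # so it raises and lets the caller decide.
    try:
        return USERS[user_id]
    except KeyError:
        raise NotFound(f"user {user_id} not found") from None


def get_user_route(user_id: int) -> tuple[int, dict]:
    # Layer 1: the route knows a missing user is a 404 and handles it here.
    try:
        return 200, {"name": load_user(user_id)}
    except NotFound as exc:
        return 404, {"error": str(exc)}


def handle_request(route, *args) -> tuple[int, dict]:
    # Layer 3: catch-all at the top. Log the details, return a generic 500.
    # A real system would also trigger an alert here.
    try:
        return route(*args)
    except Exception:
        logger.exception("unhandled error in %s", getattr(route, "__name__", route))
        return 500, {"error": "Internal server error"}
```

Calling `handle_request(get_user_route, 2)` yields a 404 from the local handler, while an unexpected bug inside the route surfaces as a generic 500 with the details kept in the log.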

Building Fault-Tolerant Systems

Retries with backoff. If a request fails, try again. But don’t spam: wait 1 second, then 2, then 4, then 8. Exponential backoff prevents you from overwhelming a struggling service.
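
The 1, 2, 4, 8 second schedule can be written as a small helper. This is a sketch, not a production retry library: `retry_with_backoff` and its parameters are hypothetical, and the choice to retry only on `ConnectionError` and `TimeoutError` is an assumption about which failures are transient.

```python
import time


def retry_with_backoff(
    func,
    max_attempts=5,
    base_delay=1.0,
    retry_on=(ConnectionError, TimeoutError),
    sleep=time.sleep,  # injectable so tests do not actually wait
):
    """Call func(); on a transient failure wait 1s, 2s, 4s, 8s and try again."""
    for attempt in range(max_attempts):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: propagate the last error
            sleep(base_delay * 2 ** attempt)
```

Production code often adds random jitter to each delay so that many clients do not retry in lockstep.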

Timeouts. Never wait forever. Set timeouts on every external call. If the payment API doesn’t respond in 5 seconds, fail fast and tell the user to try again.
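
For async code, the standard library's `asyncio.wait_for` is one way to enforce such a limit. In the sketch below, `PaymentTimeout` and `charge_with_timeout` are hypothetical names, and `charge` stands for whatever coroutine function actually calls the payment API.

```python
import asyncio


class PaymentTimeout(Exception):
    """Hypothetical error shown to the user when the payment API is too slow."""


async def charge_with_timeout(charge, timeout_s=5.0):
    """Await charge(), but give up after timeout_s seconds."""
    try:
        # wait_for cancels the pending call once the timeout expires.
        return await asyncio.wait_for(charge(), timeout=timeout_s)
    except asyncio.TimeoutError:
        raise PaymentTimeout(
            "The payment service did not respond in time. Please try again."
        ) from None
```

Synchronous HTTP clients usually expose a `timeout` argument on the call itself, which serves the same purpose.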

Circuit breakers. If a service fails 10 times in a row, stop calling it for 30 seconds. Give it time to recover. This prevents cascading failures.
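
A minimal circuit breaker with those numbers might look like the following. `CircuitBreaker` and `CircuitOpenError` are hypothetical names, and the half-open behavior (one trial call after the cool-down) is a common design choice assumed here rather than something the rule above prescribes. The sketch is not thread-safe.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker refuses to call the service."""


class CircuitBreaker:
    def __init__(self, failure_threshold=10, recovery_seconds=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_seconds = recovery_seconds
        self.clock = clock       # injectable so tests can control time
        self.failures = 0        # consecutive failures seen so far
        self.opened_at = None    # None means the circuit is closed

    def call(self, func):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.recovery_seconds:
                raise CircuitOpenError("service temporarily disabled")
            # Cool-down elapsed: let one trial call through (half-open).
            # A single failure now reopens the circuit immediately.
            self.opened_at = None
            self.failures = self.failure_threshold - 1
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success resets the count
        return result
```

While the circuit is open, callers fail immediately instead of piling more requests onto a service that is already struggling.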

Fallbacks. If the recommendation engine is down, show popular items instead. A degraded experience is better than no experience.
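
In code, a fallback is often just a narrow `except` that returns a safe default. The names below (`POPULAR_ITEMS`, `recommendations_for`, `fetch_personalized`) are hypothetical, and the static list stands in for whatever cached or precomputed data you have available.

```python
import logging

logger = logging.getLogger("recommendations")

POPULAR_ITEMS = ["item-1", "item-2", "item-3"]  # hypothetical static fallback


def recommendations_for(user_id, fetch_personalized):
    """Return personalized items, or popular items if the engine is down."""
    try:
        return {"items": fetch_personalized(user_id), "personalized": True}
    except (ConnectionError, TimeoutError):
        # Only operational failures trigger the fallback; bugs still surface.
        logger.warning("recommendation engine unavailable, serving popular items")
        return {"items": list(POPULAR_ITEMS), "personalized": False}
```

The `personalized` flag lets the caller or your metrics see how often the degraded path is being served.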

Graceful degradation. The system keeps working with reduced functionality rather than crashing entirely. Netflix still works even if the recommendations service is down; you just see a generic homepage.

Error Responses Done Right

For your API consumers:

{
  "error": {
    "code": "RESOURCE_NOT_FOUND",
    "message": "User with ID 42 not found",
    "timestamp": "2024-01-15T10:30:00Z",
    "requestId": "req_abc123"
  }
}

Include a request ID so you can correlate it with your logs. Never expose stack traces or internal details to users.
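
A small helper keeps every error response in this shape. This is a sketch: `error_response` is a hypothetical function, and generating a `req_`-prefixed ID from a UUID is an assumption for when the caller has no request ID at hand. In practice you would pass in the ID already attached to the incoming request.

```python
import uuid
from datetime import datetime, timezone


def error_response(code, message, request_id=None):
    """Build the structured error body; never include stack traces here."""
    return {
        "error": {
            "code": code,
            "message": message,
            "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
            "requestId": request_id or f"req_{uuid.uuid4().hex[:12]}",
        }
    }
```

For example, `error_response("RESOURCE_NOT_FOUND", "User with ID 42 not found", request_id="req_abc123")` produces a body with the same fields as the JSON above, with the current UTC time as the timestamp.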

Wrapping Up

  • Expect failures and design for them
  • Handle operational errors gracefully; crash on programmer errors
  • Use retries, timeouts, circuit breakers, and fallbacks
  • Let a global error handler catch everything that slips through
  • Return structured error responses to help API consumers
  • Always include request IDs for debugging

Day 14 of 95 | Backend Engineering Series
