Python Async API Error Handling: Retries, Timeouts, and Exponential Backoff

APIs fail. Servers time out. Networks drop connections. Rate limits kick in. If your async code treats every failure as permanent, it will abandon requests that would have succeeded on a second attempt. As Marc Brooker notes in the Amazon Builders' Library, retries can add load to a system that is already failing from overload; backoff, jitter, and a sound retry policy are complementary mechanisms, and none solves the whole problem alone. This article builds retry logic with exponential backoff and jitter from scratch, explains why each layer matters, then shows how the tenacity library handles it in production.

Which Errors Are Worth Retrying

Not all errors are created equal. A 404 means the resource does not exist -- retrying will not change that. A 401 means your credentials are wrong -- sending the same request again is pointless. But a 503 means the server is temporarily overloaded, and a 429 means you hit a rate limit. Both of these are transient: wait a moment and try again, and the request will likely succeed.

Error Type                      | Retry? | Reason
HTTP 400 Bad Request            | No     | Request is malformed; same request will always fail
HTTP 401 / 403                  | No     | Authentication or authorization failure; fix credentials first
HTTP 404 Not Found              | No     | Resource does not exist
HTTP 429 Too Many Requests      | Yes    | Rate limited; wait and retry (check Retry-After header)
HTTP 500 Internal Server Error  | Maybe  | Server bug or transient overload; retry once or twice
HTTP 502 / 503 / 504            | Yes    | Gateway or server temporarily unavailable
Connection timeout              | Yes    | Network congestion or server slow to respond
DNS resolution failure          | Yes    | DNS server may be temporarily unreachable
Connection refused              | Maybe  | Server may be restarting; retry with longer delay

The guiding principle: retry on transient failures (network issues, rate limits, server overloads) and fail immediately on permanent errors (bad requests, missing resources, authentication failures).
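The table condenses into a small predicate. A minimal sketch (should_retry_status is an illustrative name, not a library function; counting 500 as retryable is the judgment call flagged as "Maybe" in the table):

```python
# Transient statuses from the table: rate limits plus 5xx server/gateway errors.
# 500 is a "maybe" -- including it means it gets the same retry budget as the rest.
RETRYABLE_STATUS_CODES = {429, 500, 502, 503, 504}

def should_retry_status(status_code: int) -> bool:
    """Return True if an HTTP status code represents a transient failure."""
    return status_code in RETRYABLE_STATUS_CODES
```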

Understanding httpx Exception Types

When you use httpx for async API calls, errors fall into two categories: exceptions raised before a response is received, and HTTP status errors from the response itself. The full httpx exception hierarchy places all pre-response errors under RequestError, which branches into TransportError (covering TimeoutException, NetworkError, ProtocolError, and ProxyError), DecodingError, and TooManyRedirects. The post-response error, HTTPStatusError, sits in a separate branch and is only raised when you call response.raise_for_status().

import httpx

# Network-level errors (no response received)
# httpx.TimeoutException  -- server took too long
# httpx.ConnectError      -- could not establish connection
# httpx.RequestError      -- base class for all request errors

# HTTP-level errors (response received, but status is 4xx/5xx)
# httpx.HTTPStatusError   -- raised by response.raise_for_status()

async def fetch_with_awareness(client, url):
    try:
        response = await client.get(url)
        response.raise_for_status()
        return response.json()

    except httpx.TimeoutException:
        print("Request timed out -- worth retrying")
        raise

    except httpx.ConnectError:
        print("Could not connect -- worth retrying")
        raise

    except httpx.HTTPStatusError as e:
        status = e.response.status_code
        if status == 429 or status >= 500:
            print(f"Server returned {status} -- worth retrying")
            raise
        else:
            print(f"Server returned {status} -- do not retry")
            raise

This function separates retryable conditions from permanent failures. Network-level errors (TimeoutException, ConnectError) are always worth retrying. HTTP errors depend on the status code.

Building a Retry Function from Scratch

Here is a basic async retry function that catches retryable errors and tries again a fixed number of times with a delay between attempts.

import asyncio
import httpx

RETRYABLE_STATUS_CODES = {429, 500, 502, 503, 504}

async def fetch_with_retries(client, url, max_attempts=3, delay=1.0):
    last_exception = None

    for attempt in range(1, max_attempts + 1):
        try:
            response = await client.get(url)
            response.raise_for_status()
            return response.json()

        except httpx.HTTPStatusError as e:
            last_exception = e
            if e.response.status_code not in RETRYABLE_STATUS_CODES:
                raise  # Permanent error, do not retry

            if e.response.status_code == 429 and attempt < max_attempts:
                retry_after = float(e.response.headers.get("Retry-After", delay))
                await asyncio.sleep(retry_after)
                continue

        except (httpx.TimeoutException, httpx.ConnectError) as e:
            last_exception = e

        if attempt < max_attempts:
            print(f"Attempt {attempt} failed, retrying in {delay}s...")
            await asyncio.sleep(delay)

    raise last_exception

This works, but the fixed delay between retries is a problem. If the server is overloaded, sending retries at a constant interval does not give it time to recover. That is where exponential backoff comes in.

Why Exponential Backoff Matters

Exponential backoff increases the wait time between retries with each successive attempt. Instead of waiting 1 second every time, you wait 1 second, then 2 seconds, then 4 seconds, then 8 seconds. This gives the remote server progressively more breathing room to recover.

The formula is simple: delay = base * (2 ** attempt). With a base of 1 second, the delays are 1s, 2s, 4s, 8s, 16s. You typically cap the maximum delay to prevent absurdly long waits.

import asyncio
import httpx

RETRYABLE_STATUS_CODES = {429, 500, 502, 503, 504}
RETRYABLE_EXCEPTIONS = (httpx.TimeoutException, httpx.ConnectError)

async def fetch_with_backoff(
    client, url, max_attempts=4, base_delay=1.0, max_delay=30.0
):
    last_exception = None

    for attempt in range(max_attempts):
        try:
            response = await client.get(url)
            response.raise_for_status()
            return response.json()

        except httpx.HTTPStatusError as e:
            last_exception = e
            if e.response.status_code not in RETRYABLE_STATUS_CODES:
                raise

            if e.response.status_code == 429:
                retry_after = e.response.headers.get("Retry-After")
                if retry_after:
                    delay = float(retry_after)
                else:
                    delay = min(base_delay * (2 ** attempt), max_delay)
            else:
                delay = min(base_delay * (2 ** attempt), max_delay)

        except RETRYABLE_EXCEPTIONS as e:
            last_exception = e
            delay = min(base_delay * (2 ** attempt), max_delay)

        if attempt < max_attempts - 1:
            await asyncio.sleep(delay)

    raise last_exception
Note

Always respect the Retry-After header when present. It tells you exactly how long the server wants you to wait. Using your own backoff delay instead of the server's instruction can result in more 429 responses. Note that per RFC 9110 Section 10.2.3, the Retry-After value can be either an integer (delay in seconds) or an HTTP-date string. The code examples in this article handle the integer format; in production, parse both formats to avoid a crash when the server sends a date like Sat, 31 Oct 2026 12:30:00 GMT.
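A sketch of a parser that handles both forms using only the standard library (parse_retry_after is a hypothetical helper name, not part of httpx):

```python
import email.utils
import time

def parse_retry_after(value: str) -> float:
    """Parse a Retry-After header value per RFC 9110 Section 10.2.3.

    Accepts either delta-seconds ("120") or an HTTP-date
    ("Sat, 31 Oct 2026 12:30:00 GMT"). Returns a non-negative delay in seconds.
    """
    try:
        # Delta-seconds form
        return max(0.0, float(value))
    except ValueError:
        pass
    # HTTP-date form: seconds from now, clamped to zero for dates in the past
    target = email.utils.parsedate_to_datetime(value)
    return max(0.0, target.timestamp() - time.time())
```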

Adding Jitter to Prevent Thundering Herds

Exponential backoff has a subtle problem. If 100 clients all fail at the same time, they will all retry after 1 second, then all retry after 2 seconds, then all retry after 4 seconds -- in perfect synchronization. Each retry wave hits the server as a coordinated burst, which is exactly the load pattern that caused the failure in the first place.

Jitter solves this by adding a random component to the delay. Instead of waiting exactly 2 ** attempt seconds, each client waits a random amount between 0 and 2 ** attempt seconds. This spreads the retries across time and breaks the synchronization.

import random

def backoff_with_jitter(attempt, base_delay=1.0, max_delay=30.0):
    """Full jitter: random value between 0 and the exponential delay."""
    exponential = base_delay * (2 ** attempt)
    capped = min(exponential, max_delay)
    return random.uniform(0, capped)

This is the "full jitter" algorithm described by Marc Brooker, AWS Senior Principal Engineer, in the AWS Architecture Blog and the Amazon Builders' Library. It produces the widest spread of retry times among the three variants Brooker tested (full jitter, equal jitter, and decorrelated jitter). In tenacity, this approach maps to the wait_random_exponential strategy, which the library's own source code documents as corresponding directly to Brooker's Full Jitter algorithm.

"The return on implementation complexity of using jittered backoff is huge, and it should be considered a standard approach for remote clients." -- Marc Brooker, Exponential Backoff and Jitter, AWS Architecture Blog
Pro Tip

An alternative is "equal jitter": wait half the exponential delay plus a random amount up to half. This guarantees a minimum wait time while still providing randomization: half = capped / 2; return half + random.uniform(0, half).
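Spelled out as a function, following the same shape as backoff_with_jitter above (equal_jitter is an illustrative name):

```python
import random

def equal_jitter(attempt, base_delay=1.0, max_delay=30.0):
    """Equal jitter: half the capped exponential delay is guaranteed,
    the other half is randomized."""
    capped = min(base_delay * (2 ** attempt), max_delay)
    half = capped / 2
    return half + random.uniform(0, half)
```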

Using asyncio.timeout for Overall Deadlines

Individual retries have per-attempt timeouts (set via httpx.Timeout). But you also need an overall deadline -- a maximum total time you are willing to spend on all retry attempts combined. Python 3.11 introduced asyncio.timeout() for exactly this purpose.

import asyncio
import httpx

async def fetch_with_deadline(client, url, total_timeout=15.0):
    try:
        async with asyncio.timeout(total_timeout):
            return await fetch_with_backoff(client, url, max_attempts=4)
    except TimeoutError as e:
        raise TimeoutError(
            f"All retry attempts for {url} exceeded {total_timeout}s deadline"
        ) from e

The asyncio.timeout context manager wraps the entire retry loop. If the cumulative time across all attempts exceeds 15 seconds, it cancels the current task and raises TimeoutError regardless of which attempt is currently in progress. This prevents a pathological scenario where each individual attempt stays under the per-request timeout, but the total retry duration is unacceptably long.

Warning

asyncio.timeout() requires Python 3.11 or later. On earlier versions, use asyncio.wait_for(coroutine, timeout=seconds), which wraps a single awaitable rather than an arbitrary code block. The key difference: wait_for creates a new internal task and cancels it on timeout, while asyncio.timeout() cancels the current task directly and converts the resulting CancelledError into a TimeoutError. For wrapping a single coroutine, both work. For wrapping multiple await statements under one deadline, only asyncio.timeout() applies cleanly.
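A sketch of the pre-3.11 fallback (with_deadline_compat is a hypothetical name; the awaitable passed in would be the retry coroutine from earlier):

```python
import asyncio

async def with_deadline_compat(coro, total_timeout=15.0):
    """Pre-3.11 fallback: wrap a single awaitable with an overall deadline
    using asyncio.wait_for instead of asyncio.timeout()."""
    try:
        return await asyncio.wait_for(coro, timeout=total_timeout)
    except asyncio.TimeoutError as e:  # alias of builtins.TimeoutError on 3.11+
        raise TimeoutError(f"Deadline of {total_timeout}s exceeded") from e
```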

Using tenacity for Production Retries

Building retry logic from scratch is good for understanding the concepts. In production, the tenacity library gives you a battle-tested decorator with built-in support for exponential backoff, jitter, retry conditions, and async functions.

# pip install tenacity
import httpx
from tenacity import (
    retry,
    stop_after_attempt,
    wait_exponential_jitter,
    retry_if_exception_type,
    retry_if_exception,
)

RETRYABLE_STATUS_CODES = {429, 500, 502, 503, 504}
RETRYABLE_EXCEPTIONS = (httpx.TimeoutException, httpx.ConnectError)

def is_retryable_status_error(exc):
    """Return True only if the HTTPStatusError has a retryable status code."""
    return (
        isinstance(exc, httpx.HTTPStatusError)
        and exc.response.status_code in RETRYABLE_STATUS_CODES
    )

@retry(
    stop=stop_after_attempt(4),
    wait=wait_exponential_jitter(initial=1, max=30),
    retry=(
        retry_if_exception_type(RETRYABLE_EXCEPTIONS)
        | retry_if_exception(is_retryable_status_error)
    ),
    before_sleep=lambda retry_state: print(
        f"Retry #{retry_state.attempt_number} after error: "
        f"{retry_state.outcome.exception()}"
    ),
)
async def fetch_robust(client, url):
    response = await client.get(url)
    response.raise_for_status()
    return response.json()

# Usage
async def main():
    async with httpx.AsyncClient(timeout=10.0) as client:
        data = await fetch_robust(client, "https://api.example.com/data")
        print(data)

The @retry decorator handles everything: it stops after 4 attempts, applies exponential backoff with jitter starting at 1 second and capped at 30 seconds, retries on network-level exceptions and on HTTP status errors with retryable codes (429, 5xx), and logs each retry attempt via the before_sleep callback. Note the distinction between retry_if_exception_type for matching exception classes directly and retry_if_exception for custom predicate functions -- without the custom predicate, every HTTPStatusError (including 400, 401, 404) would be retried, which is the opposite of what you want. The async function works seamlessly because tenacity detects coroutines automatically.

Common Mistake

A frequent error in tenacity-based retry code is using retry_if_exception_type(httpx.HTTPStatusError) instead of a custom predicate with retry_if_exception(). The former retries on all HTTP status errors, including permanent 4xx client errors that will never succeed on retry. Always use a predicate function that checks the status code, as shown above.

Sources and Further Reading

The retry and backoff strategies in this article are grounded in published engineering practices from organizations that operate distributed systems at scale. These are the primary references used:

  • Marc Brooker, "Exponential Backoff and Jitter", AWS Architecture Blog (2015, updated May 2023) -- the original analysis of full jitter, equal jitter, and decorrelated jitter strategies with simulation data.
  • Marc Brooker, "Timeouts, retries, and backoff with jitter", Amazon Builders' Library -- production-level guidance on combining timeouts, retries, and jitter in distributed systems.
  • httpx Exception Documentation, python-httpx.org/exceptions -- the full exception hierarchy for the httpx HTTP client.
  • tenacity Documentation, tenacity.readthedocs.io -- the official docs for the tenacity retry library, including async support and wait strategies.
  • Python 3.11 asyncio.timeout, docs.python.org -- the official documentation for the asyncio.timeout() context manager introduced in Python 3.11.
  • RFC 9110 Section 10.2.3, Retry-After header specification -- the HTTP semantics standard defining the Retry-After response header.

Key Takeaways

  1. Only retry transient errors: Network timeouts, connection failures, HTTP 429, and HTTP 5xx errors are worth retrying. Client errors (4xx except 429) indicate a problem with the request itself and will fail every time regardless of how many retries you attempt.
  2. Use exponential backoff to give servers time to recover: Doubling the delay between each retry (1s, 2s, 4s, 8s) prevents your retries from contributing to the server overload that caused the failure. Cap the maximum delay to keep total wait times reasonable.
  3. Add jitter to break synchronization: When multiple clients retry at the same exponential intervals, they create coordinated bursts. Adding a random component between 0 and the calculated delay spreads retries across time and reduces collision.
  4. Set an overall deadline with asyncio.timeout: Individual per-request timeouts are not enough. Wrap your entire retry loop in asyncio.timeout() (Python 3.11+) to enforce a maximum total duration across all attempts.
  5. Use tenacity in production: The tenacity library provides a decorator-based approach to retries with built-in exponential backoff, jitter, conditional retry logic, and native async support. It handles edge cases that hand-rolled retry loops often miss.

Error handling is not the exciting part of async programming, but it is what separates code that works in development from code that works in production. A well-designed retry strategy with exponential backoff and jitter turns transient failures into invisible hiccups instead of application-breaking crashes.