There is a fundamental tension in HTTP client design. Your code wants to move as fast as possible. The server wants you to slow down. Somewhere between those two competing pressures, client-side rate limiting emerges not as a workaround, but as an architectural decision about how your application models its relationship with external services. The requests-ratelimiter library (v0.9.2 as of February 2026) gives you a composable, algorithm-aware way to embed that decision directly into the requests.Session lifecycle -- no manual sleep logic, no ad-hoc retry loops.
This article covers the full feature set of requests-ratelimiter: the drop-in LimiterSession, the more flexible LimiterAdapter for per-host limits, multiple simultaneous rate limits, persistent backends for multi-threaded environments, automatic 429 synchronization, integration with requests-cache so that cached responses do not count against your quota, pairing with urllib3 retry logic, debugging and monitoring the limiter, what the library does not handle (like Retry-After headers), and when to reach for async alternatives instead. It also examines why the library chose the leaky bucket algorithm over other rate limiting strategies, what you give up with that choice, and how to evaluate the broader solution landscape when the leaky bucket is not the right fit.
## Installation and Quick Start with LimiterSession
Install the library with pip. It pulls in pyrate-limiter (v4.x) as its underlying rate limiting engine, and it requires the requests library itself:
```bash
pip install requests-ratelimiter
```
LimiterSession is the simplest entry point. It is a subclass of requests.Session that automatically pauses between requests to stay within your configured rate. Swap out your existing session and every call through it is throttled:
```python
from requests_ratelimiter import LimiterSession
from time import time

# 5 requests per second, applied to all requests
session = LimiterSession(per_second=5)

start = time()
for i in range(15):
    response = session.get("https://httpbin.org/get")
    print(f"[t+{time() - start:.2f}s] Request {i + 1}: {response.status_code}")
```
The output shows the first 5 requests completing quickly, then a pause before the next batch. The library handles the timing internally -- your code just calls session.get() as usual, and the session sleeps as needed to stay within the limit. There is no need for manual time.sleep() calls or custom retry logic.
The shorthand parameters cover the common intervals: per_second, per_minute, per_hour, per_day, and per_month. You can use any combination of these simultaneously. The library also accepts a burst parameter (defaulting to 1) that controls how many consecutive requests are allowed before per-second rate-limiting kicks in.
Under the hood, requests-ratelimiter uses the leaky bucket algorithm from pyrate-limiter. This means requests are spaced out evenly rather than being allowed in bursts. That design choice has significant implications for how your client behaves under load -- we explore the tradeoffs in the "Why the Leaky Bucket" section below.
## Per-Host Rate Limits with LimiterAdapter
When your application talks to multiple APIs with different rate limits, a single global limit is too restrictive for fast APIs and too generous for slow ones. This is a common architectural problem: your code needs to model the rate constraints of each external service independently, not flatten them into a single shared throttle. LimiterAdapter is a transport adapter that you mount on specific URL prefixes, letting you apply different limits to different hosts:
```python
from requests import Session
from requests_ratelimiter import LimiterAdapter

session = Session()

# GitHub API: 5000 requests per hour
github_adapter = LimiterAdapter(per_hour=5000)
session.mount("https://api.github.com", github_adapter)

# Internal API: 100 requests per second
internal_adapter = LimiterAdapter(per_second=100)
session.mount("https://internal.example.com", internal_adapter)

# Third-party data API: 10 requests per minute
data_adapter = LimiterAdapter(per_minute=10)
session.mount("https://api.slowservice.com", data_adapter)

# Each request is throttled according to its target host
github_data = session.get("https://api.github.com/repos/python/cpython")
internal_data = session.get("https://internal.example.com/users")
external_data = session.get("https://api.slowservice.com/data")
```
The adapter matching uses the same longest-prefix rule as standard requests transport adapters. A request to https://api.github.com/repos/python/cpython matches the adapter mounted on https://api.github.com. You can mount multiple adapters on increasingly specific prefixes if you need endpoint-level control within the same host.
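The longest-prefix rule is easy to see in isolation. The sketch below reimplements the selection logic requests applies when choosing a mounted adapter -- prefixes are tried longest-first and the first match wins. The mounted values and URLs here are illustrative placeholders, not real adapters:

```python
# Minimal sketch of requests' adapter selection rule: mounted prefixes
# are checked longest-first, and the first prefix match wins.
def select_adapter(mounts, url):
    for prefix in sorted(mounts, key=len, reverse=True):
        if url.lower().startswith(prefix.lower()):
            return mounts[prefix]
    raise LookupError(f"no adapter mounted for {url}")

mounts = {
    "https://": "default-adapter",
    "https://api.github.com": "github-adapter",
    "https://api.github.com/search": "search-adapter",
}

# The most specific mounted prefix handles each request
print(select_adapter(mounts, "https://api.github.com/search/code"))  # search-adapter
print(select_adapter(mounts, "https://api.github.com/repos/x"))      # github-adapter
print(select_adapter(mounts, "https://example.com/"))                # default-adapter
```

The same ordering is why you can safely mount an endpoint-specific adapter alongside a host-wide one: the more specific prefix always shadows the broader one.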
Think about this for a moment: what you are doing with per-host adapters is creating a local model of the API's server-side rate policy. That model does not have to be perfect -- it just has to be conservative enough that your client rarely triggers a 429. The closer your client-side model mirrors the server's enforcement, the more throughput you get without errors.
LimiterSession also supports per-host tracking with the per_host=True parameter (which is the default as of v0.9.x). When enabled, requests to different hosts get independent rate limit counters even with a single session. This is simpler than mounting multiple adapters when all hosts share the same rate limit values.
## Multiple Rate Limits and Custom Intervals
Many APIs enforce multiple overlapping rate limits -- for example, 10 requests per second and 1000 per hour. This layered enforcement is the server's way of preventing both micro-bursts and sustained high-volume consumption. Your client-side model needs to mirror both layers to avoid hitting either threshold. With the shorthand parameters, you can specify several at once:
```python
# Enforce multiple simultaneous limits
session = LimiterSession(per_second=10, per_hour=1000)
```
For non-standard intervals, use pyrate-limiter's RequestRate and Limiter objects directly. This gives you full control over the timing:
```python
from pyrate_limiter import Duration, RequestRate, Limiter
from requests_ratelimiter import LimiterSession

# Custom: 50 requests per 30 seconds AND 500 per 10 minutes
rate_30s = RequestRate(50, Duration.SECOND * 30)
rate_10m = RequestRate(500, Duration.MINUTE * 10)
limiter = Limiter(rate_30s, rate_10m)

session = LimiterSession(limiter=limiter)
```
The Limiter object evaluates all rates on every request. A request is only allowed through if it satisfies every rate in the set. This means the tightest applicable limit always controls the pace -- a behavior that directly models how server-side rate enforcement works in practice.
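The all-rates-must-pass semantics can be modeled in a few lines. This is a hypothetical sketch of the check, not pyrate-limiter's actual implementation -- it simply verifies that a new request would stay under every (limit, window) pair:

```python
# Sketch of multi-rate evaluation: a request is admitted only if it
# satisfies every (limit, window_seconds) pair. Hypothetical helper,
# not pyrate-limiter's real code.
def admitted(timestamps, now, rates):
    return all(
        sum(1 for t in timestamps if now - t <= window) < limit
        for limit, window in rates
    )

rates = [(50, 30), (500, 600)]        # 50 per 30s AND 500 per 10 min
history = [0.0] * 50                  # 50 requests already sent at t=0

print(admitted(history, 10.0, rates))  # False: the 30s window is full
print(admitted(history, 40.0, rates))  # True: the 30s window has drained
```

The hourly-scale rate never comes into play here; the tightest window alone decides the outcome, exactly as described above.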
## Persistent Backends: SQLite and Redis
By default, LimiterSession tracks rate limits in memory using a simple list. This works for single-process scripts, but falls apart in multi-threaded web applications or multiprocess workers where each process has its own memory space. The failure mode is subtle and dangerous: each process believes it is the only consumer of the rate limit, so four worker processes each configured at 5 requests per second will collectively send 20 requests per second to the API.
For shared rate limit state, use SQLite or Redis as the backend:
```python
from pyrate_limiter import SQLiteBucket, RedisBucket
from requests_ratelimiter import LimiterSession

# SQLite backend -- persists across restarts, shared across threads
session_sqlite = LimiterSession(
    per_second=5,
    bucket_class=SQLiteBucket,
)

# Redis backend -- shared across processes and servers
session_redis = LimiterSession(
    per_second=5,
    bucket_class=RedisBucket,
    # your_redis_pool is a placeholder: supply your own redis connection pool
    bucket_kwargs={"redis_pool": your_redis_pool},
)
```
SQLite is the pragmatic choice for single-machine deployments with multiple threads or processes. It uses file-level locking to coordinate access and persists rate limit state across application restarts. Redis is the choice for distributed deployments where multiple servers need to share a single rate limit counter. For multiprocessing-specific scenarios, pyrate-limiter v4.x also provides a MultiprocessBucket with file-locking optimized for cross-process coordination without the overhead of a full SQLite database.
When using the in-memory backend (the default), each LimiterSession instance tracks its own rate limit independently. If you create two sessions with per_second=5, your application can send 10 requests per second total. For a shared global limit, use a single session instance or switch to a persistent backend. This is the number one source of rate limit violations in production code that uses this library.
## Automatic 429 Handling
One of the library's quieter features is automatic synchronization with server-side rate limits. If a server returns a 429 response, requests-ratelimiter adjusts its internal request log to catch up to the server's limit. This means subsequent requests will be slowed down automatically -- the library adapts without you writing any retry logic.
This is a meaningful architectural detail worth pausing on. The client-side rate limit you configure is an estimate of the server's enforcement. The 429 synchronization acts as a feedback loop: when the estimate is wrong, the library self-corrects by observing the server's behavior. It is closed-loop control applied to HTTP throughput.
You can customize which status codes trigger this behavior:
```python
# Default: adjust on 429 responses
session = LimiterSession(per_second=5, limit_statuses=[429])

# Some APIs return 500 instead of 429 for rate limits
session = LimiterSession(per_second=5, limit_statuses=[429, 500])

# Disable automatic adjustment entirely
session = LimiterSession(per_second=5, limit_statuses=[])
```
This feature is valuable when you are unsure of the exact server-side rate limit, or when the documented limit does not match the enforced limit -- which is more common than you might think. Many APIs enforce stricter internal limits than what their documentation advertises, or apply different limits based on your account tier, time of day, or endpoint.
## Combining with requests-cache
If you are using requests-cache to cache API responses, you can combine it with requests-ratelimiter using the LimiterMixin or the built-in CachedLimiterSession. The key benefit is that cache hits do not count against your rate limit -- a critical detail that separates naive caching from intelligent request management:
```python
from requests_ratelimiter import CachedLimiterSession
from pyrate_limiter import SQLiteBucket

# Combined caching + rate limiting with shared SQLite backend
session = CachedLimiterSession(
    per_second=5,
    cache_name="api_cache.db",
    bucket_class=SQLiteBucket,
    bucket_kwargs={
        "path": "api_cache.db",
        "isolation_level": "EXCLUSIVE",
        "check_same_thread": False,
    },
)

# First call: hits the API, counts against rate limit
response = session.get("https://api.example.com/data/123")

# Second call: served from cache, does NOT count against rate limit
response = session.get("https://api.example.com/data/123")
```
This is a powerful pattern for API clients that repeatedly request the same resources. The cache serves repeated requests instantly while the rate limiter only throttles genuine network calls. Using SQLite for both the cache and the rate limit bucket keeps everything in a single database file. The compound effect matters: if 40% of your requests are cache hits, you are effectively getting 40% more useful throughput from the same rate limit allocation.
You can also use the LimiterMixin to add rate limiting to any custom session class. This is useful when you are working with other requests-based libraries that provide their own session subclasses -- just inherit from both LimiterMixin and the library's session class. The mixin ordering matters: LimiterMixin should appear before the other session class in the MRO so that rate limiting wraps the outer call path.
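The ordering requirement is a plain consequence of Python's MRO, and you can demonstrate it without the real libraries. `ThrottleMixin` and `BaseSession` below are hypothetical stand-ins for LimiterMixin and a third-party session class:

```python
# Hypothetical stand-ins showing why mixin order matters in the MRO.
class BaseSession:
    def send(self, request):
        return f"sent {request}"

class ThrottleMixin:
    def send(self, request):
        # Rate limiting must run BEFORE delegating to the real send
        return "throttled -> " + super().send(request)

# Correct: mixin first, so ThrottleMixin.send wraps BaseSession.send
class GoodSession(ThrottleMixin, BaseSession):
    pass

# Wrong: BaseSession.send is found first and the mixin never runs
class BadSession(BaseSession, ThrottleMixin):
    pass

print(GoodSession().send("req"))  # throttled -> sent req
print(BadSession().send("req"))   # sent req  (no throttling!)
```

With the wrong ordering the code still runs, which makes this bug easy to miss: requests go out unthrottled with no error raised.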
## Combining with urllib3 Retry Logic
Rate limiting prevents you from sending too many requests. Retry logic handles what happens when a request fails for transient reasons -- network timeouts, 502 errors, or temporary server-side problems. These are separate concerns that operate on different failure domains, and using both together makes your HTTP client resilient in two different ways.
The requests library uses urllib3 under the hood, and urllib3.util.retry.Retry provides configurable retry behavior. You can layer it onto a LimiterSession by mounting a retry-enabled adapter alongside the rate limiter:
```python
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
from requests_ratelimiter import LimiterSession

# Configure retry strategy
retry_strategy = Retry(
    total=3,
    status_forcelist=[500, 502, 503, 504],
    backoff_factor=1,
    respect_retry_after_header=True,
)

# Create a rate-limited session
session = LimiterSession(per_second=5)

# Mount a retry-enabled adapter on top
retry_adapter = HTTPAdapter(max_retries=retry_strategy)
session.mount("https://", retry_adapter)
session.mount("http://", retry_adapter)

# Requests are now rate-limited AND automatically retried on failure
response = session.get("https://api.example.com/data")
```
There is an important ordering detail here. The rate limiter controls how fast requests leave your client at the session level, but urllib3's retries happen lower in the stack, inside the adapter's connection pool. A retried attempt therefore does not pass back through the rate limiter. That is why backoff_factor matters: it is the only mechanism spacing out retry attempts, so configure it generously enough that a burst of retries cannot silently exceed your configured rate.
Do not add 429 to the status_forcelist in your Retry configuration if you are also using requests-ratelimiter's built-in 429 synchronization. Both mechanisms would react to the same 429 response, potentially causing the retry adapter to fire a new request immediately while the rate limiter is still adjusting. Let the rate limiter own 429 handling and let the retry adapter handle server errors like 500 and 503.
## Debugging and Monitoring the Limiter
When a rate-limited script runs slower than expected -- or faster than you intended -- it helps to see what the limiter is doing. The library itself does not produce log output, but you can get visibility by enabling urllib3's debug logging and adding your own timing instrumentation.
Start with urllib3 logging to see when requests are sent at the network level:
```python
import logging
from requests_ratelimiter import LimiterSession

# Enable urllib3 debug logs to see request timing
logging.basicConfig(level=logging.DEBUG)
logging.getLogger("urllib3").setLevel(logging.DEBUG)

session = LimiterSession(per_second=2)
for i in range(6):
    response = session.get("https://httpbin.org/get")
    print(f"Request {i + 1}: {response.status_code}")
```
The timestamps on each log line will show you the pauses the limiter introduces between batches. For more structured monitoring, record the wall-clock time around each call so you can see exactly where the limiter inserted delays:
```python
from time import time
from requests_ratelimiter import LimiterSession

session = LimiterSession(per_second=5)
request_log = []

for i in range(20):
    t_before = time()
    response = session.get("https://httpbin.org/get")
    elapsed = time() - t_before
    request_log.append({"request": i + 1, "elapsed": elapsed, "status": response.status_code})

# Find requests where the limiter added noticeable delay
throttled = [r for r in request_log if r["elapsed"] > 0.5]
print(f"Throttled {len(throttled)} of {len(request_log)} requests")
```
If you are using a persistent backend like SQLite, you can also inspect the database directly to see the current state of the rate limit buckets. This is useful for confirming that multiple processes are sharing the same limit counter as expected.
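Because the bucket table names vary between pyrate-limiter versions, it is safer to discover them from sqlite_master than to hard-code them. A small helper for that inspection -- the path is whatever you passed in bucket_kwargs:

```python
import sqlite3
from contextlib import closing

def inspect_bucket_db(path):
    """List every table in a SQLite file with its row count.

    Table names vary by pyrate-limiter version, so we discover them
    from sqlite_master instead of hard-coding them.
    """
    with closing(sqlite3.connect(path)) as conn:
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        )]
        return {
            table: conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
            for table in tables
        }

# e.g. inspect_bucket_db("api_cache.db") shows each bucket table and how
# many entries it currently holds
```

Comparing the row counts across worker processes is a quick way to confirm they are all writing to the same bucket rather than each maintaining their own.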
## Retry-After Headers and What the Library Does Not Do
When a server returns a 429 response, it often includes a Retry-After header telling the client exactly how long to wait before sending the next request. This header can contain either a number of seconds or a specific date-time value. It is the server's direct instruction to the client about timing.
requests-ratelimiter does not read or respect the Retry-After header. Its 429 handling works differently: when it receives a 429 status code, it adjusts its internal request log to slow down subsequent requests based on its own configured rate. This means the library synchronizes its internal pace but does not defer to the server's suggested wait time. This is a deliberate design boundary, not an oversight -- the library treats rate limiting as a client-side scheduling problem and leaves server-side negotiation to other layers.
If respecting the Retry-After header is important for your use case (and it often is with APIs that send specific backoff instructions), you need to handle it yourself. One approach is to check for the header after each response and sleep accordingly:
```python
import time
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

from requests_ratelimiter import LimiterSession

session = LimiterSession(per_second=5, limit_statuses=[])

def parse_retry_after(value):
    """Retry-After is either a number of seconds or an HTTP date."""
    try:
        return max(0.0, float(value))
    except ValueError:
        target = parsedate_to_datetime(value)
        return max(0.0, (target - datetime.now(timezone.utc)).total_seconds())

def request_with_retry_after(session, url, max_retries=3):
    for attempt in range(max_retries):
        response = session.get(url)
        if response.status_code == 429:
            retry_after = response.headers.get("Retry-After")
            if retry_after:
                wait = parse_retry_after(retry_after)
                print(f"Server says wait {wait:.0f}s (attempt {attempt + 1})")
                time.sleep(wait)
                continue
        return response
    return response

response = request_with_retry_after(session, "https://api.example.com/data")
```
Note the limit_statuses=[] in the session constructor. This disables the library's built-in 429 synchronization so it does not conflict with your manual Retry-After handling. You can alternatively set respect_retry_after_header=True in a urllib3.Retry configuration as shown in the previous section -- that approach handles Retry-After at the adapter level instead.
Beyond Retry-After, there are a few other things the library does not handle that are worth knowing about:
| Capability | Status |
|---|---|
| `Retry-After` header parsing | Not supported. Handle manually or via `urllib3.Retry`. |
| `X-RateLimit-*` header reading | Not supported. The library does not inspect response headers to adapt its rate dynamically. |
| Weighted requests | pyrate-limiter supports item weights, but requests-ratelimiter does not expose a way to assign weight per HTTP request. |
| Burst tolerance | The leaky bucket algorithm spaces requests evenly. The `burst` parameter controls initial burst size, but does not change the fundamental algorithm. |
| Async HTTP clients | Only works with the synchronous requests library. See the async section for alternatives. |
| Dynamic rate adaptation | Cannot adjust its configured rate at runtime based on server feedback. The rate is fixed at session creation time. |
## Why the Leaky Bucket -- and When You Need Something Else
The algorithm choice in requests-ratelimiter is not arbitrary. The leaky bucket enforces a steady, evenly-paced output rate: if you configure 5 requests per second, the library spaces them roughly 200 milliseconds apart. This constant drip is excellent for API consumption because it produces the smoothest possible request pattern -- servers prefer predictable, steady traffic over bursty patterns that spike and idle.
But the leaky bucket has a real cost. It cannot accommodate legitimate bursts. If an API allows 100 requests per minute and you need to fetch 30 records immediately at startup, the leaky bucket forces you to wait 18 seconds (30 requests at 600ms intervals) even though the server would happily accept all 30 in a single second. This is where understanding the alternative algorithms becomes essential:
A token bucket algorithm maintains a pool of "tokens" that refill at a steady rate. Each request consumes a token. When tokens are available, requests pass immediately -- allowing bursts up to the bucket capacity. When the bucket is empty, requests wait for tokens to regenerate. This gives you burst tolerance while still enforcing a long-term average rate. For client-side API consumption where the server's rate limit is expressed as "N requests per time window" (which is how many APIs work), the token bucket is often a better behavioral match.
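To make the contrast concrete, here is a minimal token bucket in plain Python -- a teaching sketch, not pyrate-limiter's implementation. The clock is passed in explicitly so the behavior is deterministic:

```python
class TokenBucket:
    """Minimal token bucket: refills at `rate` tokens/sec up to `capacity`."""

    def __init__(self, rate, capacity):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity    # start full
        self.last = 0.0

    def allow(self, now):
        # Refill based on elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=5, capacity=30)

# A burst of 30 requests at t=0 all pass immediately...
burst = [bucket.allow(0.0) for _ in range(30)]
print(all(burst))         # True

# ...the 31st is rejected until tokens regenerate
print(bucket.allow(0.0))  # False
print(bucket.allow(0.3))  # True (0.3s * 5 tokens/s = 1.5 tokens refilled)
```

Note how the startup scenario from the previous paragraph plays out: all 30 records are fetched instantly, yet the long-term average still converges to 5 requests per second.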
A sliding window algorithm tracks actual request timestamps within a rolling time window. When a new request arrives, the algorithm counts how many requests occurred within the trailing window. If the count exceeds the limit, the request is delayed. This approach gives the most accurate rate enforcement and avoids the boundary-doubling problem of fixed windows, but it requires more memory per client because every timestamp must be stored.
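A sliding window log is just as compact to sketch (again illustrative, not a library API). Note that it stores one timestamp per accepted request, which is exactly where the memory cost comes from:

```python
from collections import deque

class SlidingWindowLog:
    """Allow at most `limit` requests in any trailing `window`-second span."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.log = deque()  # one timestamp per accepted request

    def allow(self, now):
        # Drop timestamps that have fallen out of the trailing window
        while self.log and now - self.log[0] >= self.window:
            self.log.popleft()
        if len(self.log) < self.limit:
            self.log.append(now)
            return True
        return False

limiter = SlidingWindowLog(limit=3, window=1.0)
print([limiter.allow(t) for t in (0.0, 0.1, 0.2, 0.3)])  # [True, True, True, False]
print(limiter.allow(1.05))  # True: the t=0.0 entry has aged out
```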
A fixed window counter is the simplest approach: count requests within a fixed time block (say, each calendar minute) and reject when the count exceeds the limit. It is trivial to implement but has a well-known weakness at window boundaries -- a client can send the maximum number of requests at the end of one window and the maximum again at the start of the next, effectively doubling the rate for a brief period.
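The boundary weakness is easy to demonstrate. With a limit of 5 per 1-second window, the sketch below accepts 5 requests at the end of one window and 5 more at the start of the next -- 10 requests inside a 0.1-second span:

```python
class FixedWindowCounter:
    """Count requests per fixed window; reject once the count hits `limit`."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.current_window = None
        self.count = 0

    def allow(self, now):
        window_id = int(now // self.window)
        if window_id != self.current_window:
            self.current_window = window_id  # new window: reset the counter
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False

limiter = FixedWindowCounter(limit=5, window=1.0)

# 5 requests at t=0.9 (end of window 0) and 5 at t=1.0 (start of window 1)
late = [limiter.allow(0.9) for _ in range(5)]
early = [limiter.allow(1.0) for _ in range(5)]
print(all(late + early))  # True: 10 requests accepted within 0.1s
```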
| Algorithm | Burst Behavior | Memory Cost | Best For |
|---|---|---|---|
| Leaky Bucket | No bursts; steady output | Low (queue + leak rate) | APIs that penalize any burst traffic |
| Token Bucket | Bursts up to bucket capacity | Low (token count + timestamp) | APIs with "N per window" limits |
| Sliding Window Log | Depends on implementation | High (stores all timestamps) | Precision-critical rate enforcement |
| Sliding Window Counter | Weighted approximation | Low (two counters) | Balance of accuracy and efficiency |
| Fixed Window Counter | Boundary doubling risk | Minimal (one counter) | Simple scripts with low precision needs |
If requests-ratelimiter's leaky bucket approach does not match your use case, you can use pyrate-limiter directly and configure a different bucket strategy, or reach for a different library entirely. The key insight is that the algorithm choice should be driven by the server's rate enforcement model, not your client's convenience. Choosing the wrong algorithm is like bringing the wrong key to a lock -- it might fit some of the time, but you will eventually get locked out.
## Async Code and the Boundaries of requests-ratelimiter
requests-ratelimiter wraps the synchronous requests library. If your application uses asyncio with httpx or aiohttp, this library is not the right tool. It will block the event loop during its sleep pauses, which defeats the purpose of async I/O.
The good news is that pyrate-limiter -- the same rate limiting engine under the hood -- provides native extras for async HTTP clients as of version 4.x. You can use these directly without needing requests-ratelimiter at all:
```python
# httpx (sync or async) with pyrate-limiter
import asyncio

import httpx
from pyrate_limiter import limiter_factory, Duration
from pyrate_limiter.extras.httpx_limiter import (
    RateLimiterTransport,
    AsyncRateLimiterTransport,
)

limiter = limiter_factory.create_inmemory_limiter(
    rate_per_duration=5, duration=Duration.SECOND
)

# Synchronous httpx
with httpx.Client(transport=RateLimiterTransport(limiter=limiter)) as client:
    response = client.get("https://httpbin.org/get")

# Asynchronous httpx -- `async with` must run inside a coroutine
async def main():
    async with httpx.AsyncClient(
        transport=AsyncRateLimiterTransport(limiter=limiter)
    ) as client:
        response = await client.get("https://httpbin.org/get")

asyncio.run(main())
```

```python
# aiohttp with pyrate-limiter
from pyrate_limiter import limiter_factory, Duration
from pyrate_limiter.extras.aiohttp_limiter import RateLimitedSession

limiter = limiter_factory.create_inmemory_limiter(
    rate_per_duration=5, duration=Duration.SECOND
)
session = RateLimitedSession(limiter)
# Use session.get(), session.post(), etc. as normal aiohttp calls
```
The async approach matters when you are making many requests concurrently. A synchronous rate-limited loop with requests sends one request at a time, waits for the response, then sleeps if needed before the next. An async rate-limited client can have multiple requests in-flight simultaneously while still respecting the rate limit -- the limiter gates how fast new requests are initiated, not how many can be waiting for responses at once. This distinction is critical for applications where network latency dominates: an API with 200ms response time and a 10 requests/second limit can only achieve 5 effective requests/second synchronously but the full 10 requests/second asynchronously.
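That throughput arithmetic generalizes into a quick back-of-envelope rule: a synchronous client pays the larger of the network latency and the limiter's spacing per request, while a concurrent client pays only the limiter's spacing. A small helper, assuming latency and the rate limit are the only constraints:

```python
def effective_rps(latency_s, limit_rps, concurrent):
    """Back-of-envelope requests/sec, ignoring bandwidth and CPU overhead."""
    if concurrent:
        # Async: requests overlap, so the limiter's pace is the only cap
        return limit_rps
    # Sync: each request costs max(response latency, limiter spacing)
    return 1.0 / max(latency_s, 1.0 / limit_rps)

# 200 ms latency against a 10 req/s limit:
print(effective_rps(0.2, 10, concurrent=False))   # 5.0  -- latency dominates
print(effective_rps(0.2, 10, concurrent=True))    # 10   -- full limit
# 50 ms latency: the limiter, not latency, is the bottleneck
print(effective_rps(0.05, 10, concurrent=False))  # 10.0
```

The crossover point is where latency equals the limiter's spacing; below it, going async buys you nothing for a single stream of requests.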
Install the async extras separately: pip install pyrate-limiter[httpx] for httpx support, or pip install pyrate-limiter[aiohttp] for aiohttp. These bring in the transport wrappers and async bucket implementations. The same SQLite and Redis backends are available for persistent rate limit state in async contexts.
## Evaluating Alternatives: Beyond the Obvious Choices
The Python rate limiting ecosystem is broader than a single library, and the right choice depends on where rate limiting sits in your architecture. Here is a framework for thinking about the alternatives:
### When You Need Retry Logic With Backoff, Not Rate Limiting
Many developers reach for rate limiting when what they need is intelligent retry behavior. If your primary problem is recovering from 429 and 5xx errors rather than preventing them, Tenacity is a more direct solution. It provides decorator-based retry logic with configurable exponential backoff, jitter, and stop conditions. The difference is architectural: rate limiting is proactive (preventing overload), while retry with backoff is reactive (recovering from it). For applications calling APIs with unpredictable rate limit enforcement, combining both approaches -- requests-ratelimiter for proactive pacing and Tenacity for reactive recovery -- covers both failure modes.
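The reactive half of that pairing is worth seeing in miniature. The helper below is a hand-rolled, stdlib-only illustration of the exponential-backoff-with-jitter loop; Tenacity packages the same pattern as decorators, so treat this as a sketch of the idea rather than Tenacity's API:

```python
import random
import time

def retry_with_backoff(fn, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call fn(); on failure, wait base_delay * 2**attempt plus jitter.

    Hand-rolled illustration of the pattern Tenacity provides as
    decorators; `sleep` is injectable so tests can avoid real waiting.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the final error
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            sleep(delay)

# Example: a flaky callable that fails twice, then succeeds
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"

print(retry_with_backoff(flaky, sleep=lambda d: None))  # ok
print(calls["n"])                                       # 3
```

The jitter term is not decoration: without it, many clients that failed at the same moment would all retry at the same moment too.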
### When You Need Per-Domain Control in Async Code
For httpx-based applications that need per-domain rate limiting (different limits for different API hosts), the httpx-limiter package (v0.5.0, January 2026) offers transport-level limiting with a repository pattern for domain-based rate configuration. It supports both aiolimiter and pyrate-limiter as backends, giving you a choice between lightweight single-rate limiting and multi-rate support. This is a more targeted solution than using pyrate-limiter's httpx extras directly when you need domain-level granularity.
### When You Need Distributed Rate Limiting With Precision
For distributed systems where multiple services or workers consume a shared API quota, consider the Upstash rate limiting SDK, which provides serverless Redis-backed rate limiting with fixed window, sliding window, and token bucket algorithms. The advantage over requests-ratelimiter with a Redis backend is that Upstash handles the Redis infrastructure and provides atomic operations optimized for rate limiting, including a block_until_ready method that blocks until capacity is available rather than rejecting immediately. It also supports tiered rate limiting -- different limits for different user classes -- which is useful when your application serves both free and paid API consumers.
### When You Need Algorithm Flexibility With Minimal Code
The ratelimiter package takes a different approach entirely: it provides a context manager and decorator interface rather than a session wrapper. This makes it agnostic to the HTTP library you use. You can wrap any callable -- not just HTTP requests -- with rate limiting, which is useful when you need to throttle database queries, file system operations, or any other resource-constrained operation alongside API calls. The tradeoff is that it does not integrate with requests' transport adapter model, so you lose per-host granularity and 429 synchronization.
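The library-agnostic idea -- throttling any callable, not just HTTP requests -- looks roughly like this as a stdlib sketch. This is a hand-rolled decorator for illustration, not the ratelimiter package's actual code; the injectable clock and sleep exist only to keep the example deterministic:

```python
import time
from collections import deque
from functools import wraps

def throttle(max_calls, period, clock=time.monotonic, sleep=time.sleep):
    """Decorator limiting the wrapped callable to max_calls per period seconds.

    Hand-rolled sketch of the library-agnostic approach -- it wraps any
    callable, not just HTTP requests.
    """
    calls = deque()  # timestamps of recent calls

    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            now = clock()
            while calls and now - calls[0] >= period:
                calls.popleft()
            if len(calls) >= max_calls:
                sleep(period - (now - calls[0]))  # wait for the oldest to expire
                now = clock()
                while calls and now - calls[0] >= period:
                    calls.popleft()
            calls.append(now)
            return fn(*args, **kwargs)
        return wrapper
    return decorator

# Works for any callable: database queries, file I/O, HTTP, anything
@throttle(max_calls=3, period=1.0)
def query_database(q):
    return f"rows for {q}"

print(query_database("SELECT 1"))  # rows for SELECT 1
```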
### When the Real Problem Is Architecture, Not Library Choice
If you find yourself building increasingly complex rate limiting configurations -- layered limits, per-endpoint adapters, persistent backends, retry logic, Retry-After parsing -- consider whether client-side rate limiting is the right layer for this complexity. An API gateway (like Kong, NGINX, or a cloud provider's API Gateway service) can enforce rate limits at the infrastructure level, decoupling the concern from your application code entirely. This is especially relevant when multiple applications or microservices consume the same external API: centralizing rate limit enforcement at the gateway ensures a single shared quota rather than each service independently estimating its share.
## Key Takeaways
- LimiterSession is a drop-in replacement for requests.Session: Two lines of code -- import and instantiate -- and every request through the session is automatically throttled. No manual sleep calls needed.
- LimiterAdapter gives per-host granularity: Mount different adapters on different URL prefixes to apply distinct rate limits to each API you consume. The longest prefix match determines which adapter handles each request.
- Multiple rate limits can be enforced simultaneously: Use the shorthand parameters (`per_second`, `per_minute`, `per_hour`) together, or build custom `Limiter` objects with pyrate-limiter's `RequestRate` for non-standard intervals.
- Use persistent backends for multi-threaded code: The default in-memory backend is per-session. For shared rate limits across threads or processes, use `SQLiteBucket`, `RedisBucket`, or `MultiprocessBucket`.
- Automatic 429 handling creates a feedback loop: The library adjusts its internal state when the server returns a 429, functioning as closed-loop control that self-corrects when your configured rate mismatches the server's enforcement.
- Combine with requests-cache so cached responses do not consume quota: `CachedLimiterSession` integrates both libraries and ensures only genuine network requests count against the rate limit.
- Layer urllib3 retry logic for transient failures: Rate limiting and retries solve different problems in different failure domains. Use both together, but keep 429 handling with the rate limiter and server errors with the retry adapter.
- The library does not parse Retry-After headers: If the server sends a specific wait time, you need to handle it yourself or delegate to `urllib3.Retry` with `respect_retry_after_header=True`.
- The algorithm choice matters: The leaky bucket enforces steady pacing but prevents bursts. If your API's rate limits are window-based and your workload is bursty, evaluate whether a token bucket or sliding window approach would give you better throughput within the same limits.
- For async code, use pyrate-limiter directly: `requests-ratelimiter` is synchronous only. The `pyrate-limiter` package provides native extras for `httpx` and `aiohttp` that integrate with async event loops without blocking.
- Evaluate the full solution landscape: Tenacity for retry-with-backoff, httpx-limiter for async per-domain control, Upstash for distributed precision, or an API gateway when the complexity outgrows client-side code. The right tool depends on where rate limiting sits in your architecture.
Client-side rate limiting is not about working around API restrictions -- it is about building a client that models the server's constraints and operates within them by design. A script that proactively throttles itself stays under the radar, avoids 429 errors, and runs more reliably than one that hammers the server and relies on retry loops to recover. requests-ratelimiter makes this effortless to add to any synchronous Python project that uses the requests library, and pyrate-limiter's async extras extend the same engine to modern async HTTP clients when your architecture demands it. But remember: the library is a tool, and the algorithm behind the tool shapes how your client behaves. Choose the algorithm that matches the server's enforcement model, not just the library with the simplest API.