Python API Rate Limiting with Redis: Building a Distributed Sliding Window Limiter

Fixed window rate limiters have a well-known flaw: a client can fire the maximum allowed number of requests at the end of one window and again at the start of the next, effectively doubling the intended limit. The sliding window algorithm eliminates this problem entirely, and Redis gives you the shared state you need to enforce it across a distributed fleet of servers.

This article covers two distinct sliding window strategies: the sliding window log, which stores a timestamp for every request in a Redis sorted set for perfect accuracy, and the sliding window counter, which uses a weighted average of two fixed-window counters for memory efficiency. You will implement both in Python, wrap the critical operations in Lua scripts for atomicity, and learn when to choose one over the other.

Why Fixed Windows Break at the Edges

Consider a rate limit of 100 requests per minute using a fixed window counter. The counter resets at the top of each minute. A client sends 100 requests at 11:00:59 -- all are allowed because the current window has capacity. One second later, at 11:01:00, a new window begins and the counter resets to zero. The client sends another 100 requests immediately. From the rate limiter's perspective, both windows are within their limits. But from a system load perspective, 200 requests arrived within two seconds.

This is the boundary-crossing problem, and it exists in every fixed window implementation. You can mitigate it by adding a secondary shorter-duration limit (such as 5 requests per second alongside 100 per minute), but that adds complexity and still does not solve the underlying issue. The sliding window approach solves it by design: it evaluates a continuously rolling time range rather than a fixed interval with a hard reset.
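The failure mode is easy to reproduce with a toy fixed-window counter. This is a hypothetical sketch written for illustration, not code from the limiters below:

```python
class FixedWindowCounter:
    """Naive fixed-window limiter, shown only to demonstrate
    the boundary-crossing flaw."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window_seconds = window_seconds
        self.counts = {}  # window index -> request count

    def is_allowed(self, now: float) -> bool:
        window = int(now // self.window_seconds)
        count = self.counts.get(window, 0)
        if count < self.limit:
            self.counts[window] = count + 1
            return True
        return False

limiter = FixedWindowCounter(limit=100, window_seconds=60)

# 100 requests at t=59s (end of one window) -- all allowed.
allowed_before = sum(limiter.is_allowed(59.0) for _ in range(100))

# 100 more at t=60s (start of the next window) -- also all allowed.
allowed_after = sum(limiter.is_allowed(60.0) for _ in range(100))

print(allowed_before + allowed_after)  # 200 requests accepted within ~1 second
```

Both bursts pass the check even though 200 requests landed in roughly one second, which is exactly the scenario described above.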

Note

Redis is a strong fit for distributed rate limiting because its in-memory architecture handles the high call volumes rate limiters generate, its atomic operations prevent race conditions, and its key expiration feature automatically cleans up stale data. Managed services like Amazon ElastiCache or Redis Cloud handle failover if you need high availability.

The Sliding Window Log: Precision with Redis Sorted Sets

The sliding window log records the exact timestamp of every request. When a new request arrives, you remove all timestamps older than the window, count what remains, and check against the limit. Redis sorted sets are purpose-built for this pattern: the ZADD command inserts a member with a score (the timestamp), ZREMRANGEBYSCORE prunes entries outside the window, and ZCARD counts what is left.

import time
import uuid
import redis
from typing import Tuple, Dict


class SlidingWindowLog:
    """
    Precise sliding window rate limiter using Redis sorted sets.

    Each request timestamp is stored as a member in a sorted set.
    This provides exact counting but uses more memory under high
    request volumes.
    """

    def __init__(
        self,
        redis_client: redis.Redis,
        limit: int,
        window_seconds: int,
        key_prefix: str = "ratelimit:swl",
    ):
        self.redis = redis_client
        self.limit = limit
        self.window_seconds = window_seconds
        self.key_prefix = key_prefix

    def is_allowed(self, client_id: str) -> Tuple[bool, Dict]:
        """
        Check if a request is allowed under the sliding window.

        Returns (allowed, info_dict) where info_dict contains
        limit, remaining, and retry_after values.
        """
        key = f"{self.key_prefix}:{client_id}"
        now = time.time()
        window_start = now - self.window_seconds

        # Use a pipeline to batch Redis commands
        pipe = self.redis.pipeline()
        pipe.zremrangebyscore(key, "-inf", window_start)
        pipe.zcard(key)
        results = pipe.execute()

        current_count = results[1]

        if current_count < self.limit:
            # Generate a unique member to avoid collisions
            member = f"{now}:{uuid.uuid4().hex[:8]}"
            pipe = self.redis.pipeline()
            pipe.zadd(key, {member: now})
            pipe.expire(key, self.window_seconds + 1)
            pipe.execute()

            return True, {
                "limit": self.limit,
                "remaining": self.limit - current_count - 1,
                "retry_after": 0,
            }

        # Calculate when the oldest entry will expire
        oldest = self.redis.zrange(key, 0, 0, withscores=True)
        retry_after = 0
        if oldest:
            retry_after = oldest[0][1] + self.window_seconds - now

        return False, {
            "limit": self.limit,
            "remaining": 0,
            "retry_after": round(max(0, retry_after), 2),
        }

The uuid.uuid4().hex[:8] suffix appended to each member ensures uniqueness. This matters because Redis sorted sets do not store duplicate members: if the member were the bare timestamp and two requests arrived at the exact same timestamp (which happens routinely in high-traffic systems), the second ZADD would overwrite the first and you would undercount requests.
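A sorted set maps member to score much like a Python dict maps key to value, so the collision is easy to simulate without Redis (illustrative sketch with a fixed example timestamp):

```python
import uuid

now = 1700000000.0  # two requests share this exact timestamp

# Without a unique suffix: both ZADDs use the identical member string,
# so the second write overwrites the first.
without_suffix = {}
without_suffix[str(now)] = now
without_suffix[str(now)] = now   # overwrites -- one request is lost
count_without = len(without_suffix)

# With a per-request suffix, both entries survive and are counted.
with_suffix = {}
for _ in range(2):
    member = f"{now}:{uuid.uuid4().hex[:8]}"
    with_suffix[member] = now
count_with = len(with_suffix)

print(count_without, count_with)  # 1 2
```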

Warning

The pipeline-based implementation above has a race condition. Between checking the count and adding the new entry, another request could slip in and push the count over the limit. For production use, wrap the entire sequence in a Lua script (covered in Section 4) to guarantee atomicity.

The memory cost of this approach scales linearly with request volume. If you allow 10,000 requests per hour per client, each sorted set can hold up to 10,000 entries. For a service with 50,000 active clients, that is potentially 500 million entries in Redis. This is why the sliding window counter exists as a lighter-weight alternative.

The Sliding Window Counter: Memory-Efficient Approximation

The sliding window counter is a hybrid that blends two fixed-window counters to approximate a sliding window. Instead of storing every timestamp, it keeps a request count for the current window and the previous window. When a request arrives, it calculates a weighted estimate using this formula:

Estimated Count = (Previous Window Count × Overlap Percentage) + Current Window Count

The overlap percentage is the fraction of the previous window that still falls within the sliding window. If you are 20 seconds into a 60-second window, the previous window's weight is (60 - 20) / 60 = 0.667. This means roughly two-thirds of the previous window's traffic still counts against the limit.
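The same arithmetic, worked through in code for a 100-requests-per-60-seconds limit with illustrative counts (the counter values are made up for the example):

```python
limit = 100
window_seconds = 60
elapsed = 20                     # seconds into the current window

# Fraction of the previous window still inside the sliding window
prev_weight = (window_seconds - elapsed) / window_seconds   # (60 - 20) / 60

previous_count = 90              # requests in the previous window
current_count = 30               # requests so far in the current window

estimated = previous_count * prev_weight + current_count
allowed = estimated < limit

print(round(prev_weight, 3), round(estimated, 1))  # 0.667 90.0
```

With 90 requests in the previous window weighted at two-thirds, the estimate is 60 + 30 = 90, so the request is allowed with about 10 requests of headroom.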

class SlidingWindowCounter:
    """
    Memory-efficient sliding window using weighted counters.

    Uses only two Redis keys per client (current + previous window)
    instead of storing individual request timestamps. Trades slight
    imprecision for dramatically lower memory usage.
    """

    def __init__(
        self,
        redis_client: redis.Redis,
        limit: int,
        window_seconds: int,
        key_prefix: str = "ratelimit:swc",
    ):
        self.redis = redis_client
        self.limit = limit
        self.window_seconds = window_seconds
        self.key_prefix = key_prefix

    def _get_window_keys(self, client_id: str, now: float):
        """Calculate current and previous window keys."""
        current_window = int(now // self.window_seconds)
        previous_window = current_window - 1
        elapsed = now - (current_window * self.window_seconds)
        weight = 1 - (elapsed / self.window_seconds)

        current_key = f"{self.key_prefix}:{client_id}:{current_window}"
        previous_key = f"{self.key_prefix}:{client_id}:{previous_window}"

        return current_key, previous_key, weight

    def is_allowed(self, client_id: str) -> Tuple[bool, Dict]:
        """Check if a request is allowed using weighted counters."""
        now = time.time()
        current_key, previous_key, prev_weight = self._get_window_keys(
            client_id, now
        )

        # Fetch both counters
        pipe = self.redis.pipeline()
        pipe.get(current_key)
        pipe.get(previous_key)
        results = pipe.execute()

        current_count = int(results[0] or 0)
        previous_count = int(results[1] or 0)

        # Calculate the weighted estimate
        estimated = (previous_count * prev_weight) + current_count

        if estimated < self.limit:
            # Increment the current window counter
            pipe = self.redis.pipeline()
            pipe.incr(current_key)
            # TTL covers current window + one extra window for overlap
            pipe.expire(current_key, self.window_seconds * 2)
            pipe.execute()

            return True, {
                "limit": self.limit,
                "remaining": int(self.limit - estimated - 1),
                "retry_after": 0,
            }

        return False, {
            "limit": self.limit,
            "remaining": 0,
            "retry_after": round(self.window_seconds - (
                now - int(now // self.window_seconds) * self.window_seconds
            ), 2),
        }

The memory difference is dramatic. Instead of storing up to 10,000 sorted set entries per client, you store exactly two small integer keys. The TTL on the current window key is set to twice the window duration so that it survives long enough to serve as the previous window's counter for the next period.
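The rollover can be sanity-checked with the key math alone. This is a standalone sketch of the logic in _get_window_keys, using a fixed example timestamp and no Redis connection:

```python
def window_keys(client_id: str, now: float, window_seconds: int,
                prefix: str = "ratelimit:swc"):
    """Standalone version of the window-key math in _get_window_keys."""
    current = int(now // window_seconds)
    current_key = f"{prefix}:{client_id}:{current}"
    previous_key = f"{prefix}:{client_id}:{current - 1}"
    return current_key, previous_key

now = 1_700_000_000.0
cur_key, _ = window_keys("alice", now, 60)

# One window later, the old "current" key is the new "previous" key --
# which is why its TTL must outlive its own window by one more period.
_, prev_key_later = window_keys("alice", now + 60, 60)
print(cur_key == prev_key_later)  # True
```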

Pro Tip

The sliding window counter is the approach Figma uses in production for their API rate limiting. It provides near-exact accuracy for the vast majority of traffic patterns while using a fraction of the memory that the log-based approach requires.

Making It Atomic with Lua Scripts

Both implementations above use Redis pipelines, which batch commands into a single round trip but do not make the check-then-write sequence atomic. Between reading the count and recording the new request, a concurrent request from the same client on a different server can read the same count, and both requests are allowed through. In a distributed system, this race condition is a real concern.

Redis Lua scripts solve this because they execute atomically on the server side. No other command can run between the steps of the script. Here is an atomic version of the sliding window log:

class AtomicSlidingWindowLog:
    """
    Sliding window log with atomic Lua script execution.

    The entire check-prune-count-add sequence runs as a single
    atomic operation inside Redis, eliminating race conditions.
    """

    LUA_SCRIPT = """
    local key = KEYS[1]
    local window_size = tonumber(ARGV[1])
    local limit = tonumber(ARGV[2])
    local now = tonumber(ARGV[3])
    local request_id = ARGV[4]
    local window_start = now - window_size

    -- Remove entries outside the sliding window
    redis.call('ZREMRANGEBYSCORE', key, '-inf', window_start)

    -- Count remaining entries
    local count = redis.call('ZCARD', key)

    if count < limit then
        -- Add the new request
        redis.call('ZADD', key, now, request_id)
        redis.call('EXPIRE', key, math.ceil(window_size) + 1)
        return {1, limit - count - 1, 0}
    else
        -- Calculate retry_after from oldest entry
        local oldest = redis.call('ZRANGE', key, 0, 0, 'WITHSCORES')
        local retry_after = 0
        if #oldest > 0 then
            retry_after = tonumber(oldest[2]) + window_size - now
        end
        return {0, 0, tostring(retry_after)}
    end
    """

    def __init__(
        self,
        redis_client: redis.Redis,
        limit: int,
        window_seconds: int,
        key_prefix: str = "ratelimit:aswl",
    ):
        self.redis = redis_client
        self.limit = limit
        self.window_seconds = window_seconds
        self.key_prefix = key_prefix
        self._script = self.redis.register_script(self.LUA_SCRIPT)

    def is_allowed(self, client_id: str) -> Tuple[bool, Dict]:
        """Atomically check and record a request."""
        key = f"{self.key_prefix}:{client_id}"
        now = time.time()
        request_id = f"{now}:{uuid.uuid4().hex[:8]}"

        result = self._script(
            keys=[key],
            args=[self.window_seconds, self.limit, now, request_id],
        )

        allowed = bool(result[0])
        remaining = int(result[1])
        retry_after = float(result[2]) if not allowed else 0

        return allowed, {
            "limit": self.limit,
            "remaining": remaining,
            "retry_after": round(max(0, retry_after), 2),
        }

The register_script() method computes the script's SHA-1 digest client-side and invokes it with EVALSHA, falling back to EVAL (which caches the script on the server) only if the server has not yet seen it. Subsequent calls avoid the overhead of transmitting the full script text with every request. For a rate limiter that runs on every single API call, this optimization matters.
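The hash EVALSHA uses is simply the SHA-1 hex digest of the script text, which you can compute locally; it matches what SCRIPT LOAD would return for the same source. A quick sketch with a trivial one-line script:

```python
import hashlib

lua_script = "return redis.call('ZCARD', KEYS[1])"

# EVALSHA identifies a script by the SHA-1 hex digest of its source text.
sha = hashlib.sha1(lua_script.encode()).hexdigest()
print(len(sha))  # 40-character hex digest
```

Note that any change to the script text, even whitespace, produces a different digest, so the server treats the edited script as brand new.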

Note

All keys used inside a Lua script should be passed via the KEYS array rather than constructed dynamically within the script. This is a Redis requirement for compatibility with Redis Cluster, which routes commands to specific nodes based on key hashing.

Production Hardening: Failure Modes and Monitoring

A rate limiter that depends on an external service introduces a new failure mode. If Redis goes down, every rate limit check will raise an exception. You need a deliberate strategy for handling this.

The two standard options are fail-open and fail-closed. Fail-open allows all requests when Redis is unavailable, preserving user experience at the cost of temporary loss of rate limiting. Fail-closed rejects all requests, prioritizing security but potentially taking your entire API offline. The right choice depends on what your API does. A public content API might fail-open, while a payment processing endpoint should fail-closed.

import logging

logger = logging.getLogger(__name__)


class ResilientSlidingWindow:
    """Rate limiter with configurable failure behavior."""

    def __init__(
        self,
        redis_client: redis.Redis,
        limit: int,
        window_seconds: int,
        fail_open: bool = True,
    ):
        self.limiter = AtomicSlidingWindowLog(
            redis_client, limit, window_seconds
        )
        self.fail_open = fail_open

    def is_allowed(self, client_id: str) -> Tuple[bool, Dict]:
        """Check rate limit with graceful Redis failure handling."""
        try:
            return self.limiter.is_allowed(client_id)
        except redis.RedisError as exc:
            logger.error(
                "Redis unavailable for rate limiting: %s", exc,
                extra={"client_id": client_id},
            )
            if self.fail_open:
                return True, {
                    "limit": self.limiter.limit,
                    "remaining": -1,
                    "retry_after": 0,
                }
            return False, {
                "limit": self.limiter.limit,
                "remaining": 0,
                "retry_after": 5,
            }

Monitoring is the other critical piece. You should track the rate of 429 responses per client, Redis latency on rate limit operations, and the ratio of allowed vs. rejected requests. A sudden spike in rejections for a single client might indicate a misconfigured integration. A spike across all clients might mean your limits are too aggressive or that you are experiencing a coordinated attack.
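A minimal in-process tally of those ratios might look like the following. This is a hypothetical sketch; a production system would export these counts to a metrics backend such as Prometheus or StatsD rather than keep them in memory:

```python
from collections import Counter

class RateLimitMetrics:
    """Track allowed/rejected decisions per client (hypothetical helper)."""

    def __init__(self):
        self.decisions = Counter()

    def record(self, client_id: str, allowed: bool) -> None:
        outcome = "allowed" if allowed else "rejected"
        self.decisions[(client_id, outcome)] += 1

    def rejection_rate(self, client_id: str) -> float:
        allowed = self.decisions[(client_id, "allowed")]
        rejected = self.decisions[(client_id, "rejected")]
        total = allowed + rejected
        return rejected / total if total else 0.0

metrics = RateLimitMetrics()
for _ in range(90):
    metrics.record("client-a", True)
for _ in range(10):
    metrics.record("client-a", False)

print(metrics.rejection_rate("client-a"))  # 0.1
```

Alerting on a rejection-rate threshold per client catches the misconfigured-integration case; alerting on the fleet-wide rate catches overly aggressive limits or coordinated abuse.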

Choosing Between the Two Approaches

| Factor | Sliding Window Log | Sliding Window Counter |
| --- | --- | --- |
| Precision | Exact -- every request is individually tracked | Approximate -- uses weighted average estimation |
| Memory per client | O(n) where n = requests per window | O(1) -- two counters regardless of volume |
| Redis data structure | Sorted set (ZADD, ZCARD, ZREMRANGEBYSCORE) | Simple strings (INCR, GET) |
| Best use case | Low-to-medium volume APIs, payment processing, authentication endpoints | High-volume APIs, public endpoints, services with many distinct clients |
| Boundary accuracy | Perfect -- no edge-case bursts possible | Excellent -- rare off-by-one during transitions |

For the majority of APIs, the sliding window counter is the better default. Its memory footprint stays constant regardless of traffic volume, it requires simpler Redis commands, and the weighted approximation is close enough to exact for all but the strictest compliance requirements. Reserve the log-based approach for scenarios where every single request must be individually tracked -- rate limiting on financial transactions, authentication attempts, or audit-sensitive operations where precision is a hard requirement.

[Figure: Sliding window counter -- requests from the previous window are weighted by the overlap fraction, then added to the current window count. Requests outside the sliding window are ignored entirely.]

Key Takeaways

  1. Sliding windows eliminate the boundary-crossing burst problem: Unlike fixed windows that reset abruptly, sliding windows evaluate a continuously moving time range. This prevents clients from exploiting the reset point to send double the intended volume.
  2. The log approach gives precision; the counter approach gives efficiency: Use Redis sorted sets and per-request timestamps when you need exact enforcement. Use weighted dual-window counters when you need to scale to high volumes with constant memory per client.
  3. Lua scripts are essential for atomicity in distributed environments: Without them, the gap between reading the count and writing the new entry creates a race condition that concurrent requests can exploit. Lua scripts execute as a single atomic operation on the Redis server.
  4. Plan for Redis failures before they happen: Decide on a fail-open or fail-closed strategy based on your API's security requirements, wrap all Redis operations in exception handlers, and log failures for monitoring.
  5. Always return rate limit headers: Include X-RateLimit-Limit, X-RateLimit-Remaining, and Retry-After in every response so clients can self-throttle before they hit the limit.
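Takeaway 5 maps directly onto the info dict the limiters above return. A sketch of turning that dict into response headers (the X-RateLimit-* names are a widespread convention rather than a formal standard):

```python
import math

def rate_limit_headers(allowed: bool, info: dict) -> dict:
    """Build conventional rate limit headers from a limiter's info dict."""
    headers = {
        "X-RateLimit-Limit": str(info["limit"]),
        "X-RateLimit-Remaining": str(max(0, info["remaining"])),
    }
    if not allowed:
        # Retry-After is expressed in whole seconds; round up so clients
        # never retry before the window has actually opened.
        headers["Retry-After"] = str(math.ceil(info["retry_after"]))
    return headers

denied = rate_limit_headers(
    False, {"limit": 100, "remaining": 0, "retry_after": 12.34}
)
print(denied["Retry-After"])  # 13
```

Attaching these headers in your web framework's middleware keeps the logic in one place, regardless of which limiter implementation sits behind it.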

Rate limiting is not just a performance concern -- it is a security control. A properly implemented sliding window limiter protects your API from abuse, prevents runaway client scripts from exhausting your resources, and ensures fair access across all consumers. Redis gives you the speed and shared state to enforce these limits consistently across every server in your fleet, whether you choose the precision of sorted set logs or the efficiency of weighted counters.
