How to Rate Limit a Flask REST API in Python Using flask-limiter and Redis

Every API endpoint is a door, and rate limiting decides who gets to knock how often. Flask does not include rate limiting out of the box, but flask-limiter adds it with a single decorator. You set a human-readable limit string like "100 per minute", point it at a Redis backend, and every route in your API is protected. From there you can fine-tune limits per endpoint, exempt health checks, charge expensive operations at a higher cost, build meta-limit guardrails against denial-of-service patterns, and implement tiered access based on user roles -- all without restarting the application.

This article walks through flask-limiter from installation to production-hardened configuration. You will set up global defaults, apply per-route overrides, connect Redis for multi-worker deployments, choose between the three available strategies, implement dynamic limits based on user authentication, weight expensive requests with cost functions, layer meta limits for DoS defense, configure in-memory fallback for Redis outages, handle reverse proxy IP extraction, and build proper 429 error responses with rate limit headers.

Installation and Basic Setup

Install flask-limiter along with the Redis client library:

pip install Flask-Limiter redis

The minimal setup requires creating a Limiter instance, passing it a key function that identifies clients, and attaching it to your Flask app:

from flask import Flask
from flask_limiter import Limiter
from flask_limiter.util import get_remote_address

app = Flask(__name__)

limiter = Limiter(
    get_remote_address,
    app=app,
    default_limits=["200 per day", "50 per hour"],
    storage_uri="memory://",
)


@app.route("/api/data")
def get_data():
    return {"items": ["a", "b", "c"]}


@app.route("/api/status")
def status():
    return {"status": "ok"}

The default_limits parameter applies to every route in the application. Both endpoints above are limited to 200 requests per day and 50 per hour per client IP. The limits are specified using a human-readable string syntax: [count] per [n] [second|minute|hour|day|month|year]. You can stack multiple limits with a list -- the tightest applicable limit controls the rate.
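The "tightest limit controls" rule can be made concrete with a small helper. This is a hypothetical function, not part of flask-limiter, and it handles only the simple "<count> per <period>" form, ignoring the optional multiplier:

```python
# Hypothetical helper (not part of flask-limiter): normalize a simple
# "<count> per <period>" limit string to requests per second so stacked
# limits can be compared for tightness.
PERIOD_SECONDS = {
    "second": 1, "minute": 60, "hour": 3600,
    "day": 86400, "month": 30 * 86400, "year": 365 * 86400,
}


def rate_per_second(limit: str) -> float:
    count, _, period = limit.split()   # e.g. "50 per hour"
    return int(count) / PERIOD_SECONDS[period]


limits = ["200 per day", "50 per hour"]
tightest = min(limits, key=rate_per_second)   # "200 per day"
```

Both limits are enforced simultaneously: "50 per hour" blocks a short burst first, while "200 per day" has the lower sustained rate and caps the long-term total.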

The storage_uri="memory://" setting stores rate limit counters in memory, which works for development and single-process deployments. For production with multiple Gunicorn workers or multiple servers, you need Redis. But understanding the in-memory mode is valuable: it gives you a zero-dependency starting point where the rate limiter's behavior is easy to reason about before you introduce the complexity of a shared backend.

Per-Route Limits and Exemptions

Different endpoints deserve different limits. A login endpoint should be tightly restricted to prevent brute-force attacks, while a read-only data endpoint can afford to be more generous. This is the same principle behind defense in depth: you do not apply one uniform policy to every surface. The @limiter.limit() decorator overrides the default for a specific route:

@app.route("/auth/login", methods=["POST"])
@limiter.limit("5 per minute")
def login():
    """Tight limit to prevent brute-force attempts."""
    return {"token": "..."}


@app.route("/api/search")
@limiter.limit("30 per minute")
def search():
    """Moderate limit for search queries."""
    return {"results": []}


@app.route("/api/feed")
@limiter.limit("100 per minute")
def feed():
    """Generous limit for lightweight reads."""
    return {"feed": []}


@app.route("/health")
@limiter.exempt
def health():
    """No rate limit -- called constantly by load balancers."""
    return {"healthy": True}

The @limiter.exempt decorator is critical for infrastructure endpoints. Health checks are called every few seconds by load balancers and monitoring systems. If they get rate limited, your orchestrator thinks the service is down and may restart it unnecessarily. The same logic applies to Prometheus metrics endpoints and internal readiness probes -- any endpoint whose consumer is infrastructure rather than a human or external client should be exempt.

Pro Tip

You can apply a limit to an entire Flask Blueprint by calling limiter.limit() and applying the returned decorator to the blueprint object itself, so every route registered on the blueprint shares the limit. Pass per_method=True to keep separate counters for GET and POST on the same route. This matters because a GET that reads data and a POST that writes data represent fundamentally different server costs.

Connecting Redis for Distributed Rate Limiting

When your Flask app runs behind Gunicorn with multiple workers, each worker process has its own memory space. An in-memory rate limiter in worker A has no knowledge of requests handled by worker B, effectively multiplying the client's allowed rate by the number of workers. Redis solves this by providing a shared counter store that all workers read from and write to atomically:

import os

app = Flask(__name__)

limiter = Limiter(
    get_remote_address,
    app=app,
    default_limits=["200 per day", "50 per hour"],
    storage_uri=os.environ.get("REDIS_URL", "redis://localhost:6379"),
    storage_options={"socket_connect_timeout": 30},
    strategy="fixed-window",
)

The storage_options dictionary passes keyword arguments directly to the Redis client. Setting socket_connect_timeout prevents your application from hanging indefinitely if Redis is unreachable. For Redis Cluster deployments, use the redis+cluster:// URI scheme with your cluster node addresses. flask-limiter also supports Memcached (memcached://) and MongoDB (mongodb://) as alternative backends, but Redis is the standard choice for its atomic operations and built-in key expiration.

The reason atomicity matters here is subtle. Rate limiting is a check-then-act operation: read the current count, decide if the request is allowed, then increment. Without atomic operations, two concurrent requests can both read the same count, both pass, and both increment -- allowing a burst that exceeds the limit. Redis provides this atomicity through Lua scripting internally, so flask-limiter never has a race condition on the counter.
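The race and its fix can be sketched in pure Python. Here a threading.Lock stands in for the atomicity that Redis's Lua scripting provides; this toy counter is an illustration, not flask-limiter's implementation:

```python
import threading


class FixedWindowCounter:
    """Toy fixed-window counter. The lock makes read-decide-increment a
    single atomic step, standing in for what Redis's Lua scripting does
    for flask-limiter's real counters."""

    def __init__(self, limit: int):
        self.limit = limit
        self.count = 0
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        with self._lock:   # no interleaving between check and act
            if self.count >= self.limit:
                return False
            self.count += 1
            return True


counter = FixedWindowCounter(limit=100)
results = []
threads = [
    threading.Thread(target=lambda: results.append(counter.try_acquire()))
    for _ in range(150)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Exactly 100 of the 150 concurrent attempts succeed; without the lock,
# interleaved check-then-act could let more than 100 through.
```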

Warning

If Redis becomes unavailable, flask-limiter will raise exceptions by default, effectively rejecting all requests. This is the "fail closed" behavior -- secure but harsh. For a safer production configuration, see the in-memory fallback section below.

Choosing a Strategy: Fixed, Moving, or Sliding Window

flask-limiter supports three rate limiting strategies through the strategy parameter. Each represents a different tradeoff between precision, memory cost, and burst tolerance:

fixed-window -- counts requests in discrete time intervals, resetting the counter at each boundary. Boundary bursts: up to 2x the rate at window edges. Memory usage: lowest, one counter per client.

moving-window -- tracks the exact timestamp of every request: a true sliding window. Boundary bursts: none, exact enforcement. Memory usage: highest, stores every timestamp.

sliding-window-counter -- approximates a sliding window using weighted fixed-window counters from the current and previous intervals. Boundary bursts: rare off-by-one. Memory usage: low, two counters per client.

The boundary burst problem with fixed-window is worth understanding concretely. If you set a limit of 100 requests per minute and a client sends 100 requests in the last second of minute 1, the window resets, and they send another 100 in the first second of minute 2 -- that is 200 requests in two seconds. The limit was never technically exceeded in either window, but the server experienced a 200-request burst. For APIs where burst tolerance is acceptable, fixed-window is the right choice. For authentication endpoints where you cannot tolerate any boundary exploit, moving-window gives exact enforcement at the cost of storing every request timestamp in Redis.
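The arithmetic can be checked with a tiny simulation of a fixed-window counter (a sketch for illustration, not flask-limiter's implementation):

```python
def window_id(timestamp: float, window_seconds: int = 60) -> int:
    """Map a timestamp to its fixed window (the minute number here)."""
    return int(timestamp // window_seconds)


LIMIT = 100
counts: dict[int, int] = {}
allowed = 0
# 100 requests at t=59.5s and 100 more at t=60.5s: one second apart,
# but falling into different one-minute windows.
for t in [59.5] * 100 + [60.5] * 100:
    w = window_id(t)
    counts[w] = counts.get(w, 0) + 1
    if counts[w] <= LIMIT:
        allowed += 1
# allowed == 200: the server absorbed a 200-request burst even though
# "100 per minute" was never exceeded inside any single window.
```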

The sliding-window-counter strategy offers a practical middle ground. It keeps only two counters per client (current window and previous window) and estimates the effective count using a weighted average based on how far into the current window the request falls. The result is nearly as accurate as moving-window with memory usage close to fixed-window.
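The weighted-average estimate is commonly formulated as follows; this is a sketch of the technique, and flask-limiter's internals may differ in detail:

```python
def sliding_window_estimate(prev_count: int, curr_count: int,
                            elapsed_fraction: float) -> float:
    """Estimate the count inside a sliding window from two fixed-window
    counters: the previous window contributes in proportion to how much
    of it still overlaps the sliding window."""
    return prev_count * (1.0 - elapsed_fraction) + curr_count


# Halfway into the current one-minute window, with 80 requests in the
# previous minute and 40 so far in this one:
estimate = sliding_window_estimate(80, 40, 0.5)   # 80 * 0.5 + 40 = 80.0
```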

Custom Key Functions and Dynamic Limits

The default get_remote_address key function identifies clients by IP. For authenticated APIs, rate limiting by user ID or API key is more accurate and prevents a shared office IP from being penalized for one user's behavior:

from flask import request, g


def get_api_key():
    """Identify clients by API key, falling back to IP."""
    api_key = request.headers.get("X-API-Key")
    if api_key:
        return f"apikey:{api_key}"
    return f"ip:{get_remote_address()}"


limiter = Limiter(
    get_api_key,
    app=app,
    default_limits=["100 per minute"],
    storage_uri="redis://localhost:6379",
)

The prefix pattern (apikey: vs ip:) in the key function is deliberate. It prevents a collision where an API key value happens to match an IP address string. The key function's return value becomes the Redis key suffix, and namespacing it by identity type ensures that a user authenticating with an API key and an unauthenticated user from the same IP maintain separate counters.

For tiered access where different user roles get different limits, you can pass a callable to @limiter.limit() that returns a limit string dynamically:

TIER_LIMITS = {
    "free": "20 per minute",
    "basic": "100 per minute",
    "premium": "500 per minute",
}


def get_user_limit():
    """Return rate limit string based on the user's subscription tier."""
    user = getattr(g, "current_user", None)
    if user and hasattr(user, "tier"):
        return TIER_LIMITS.get(user.tier, TIER_LIMITS["free"])
    return TIER_LIMITS["free"]


@app.route("/api/generate")
@limiter.limit(get_user_limit)
def generate():
    """Limit varies based on the authenticated user's plan."""
    return {"output": "..."}

The callable is evaluated on every request, so changes to a user's tier take effect immediately without restarting the application. This is a powerful pattern: your billing system updates the user's tier in your database, and the very next request picks up the new limit. No cache invalidation, no configuration reload, no deployment.

Cost Functions, Conditional Deduction, and Meta Limits

Not every request costs the same. A lightweight health check and a 50 MB file upload both count as one request under a flat rate limit, but they impose vastly different loads on your server. The cost parameter lets you weight requests so expensive operations consume more of the client's budget:

from flask import request


def upload_cost():
    """Charge 5 units for large uploads, 1 for everything else."""
    content_length = request.content_length or 0
    if content_length > 10 * 1024 * 1024:  # 10 MB
        return 5
    return 1


@app.route("/api/upload", methods=["POST"])
@limiter.limit("100 per minute", cost=upload_cost)
def upload():
    """Large uploads consume 5x the rate limit budget."""
    return {"status": "uploaded"}

With this configuration, a client has a budget of 100 units per minute. Small uploads cost 1 unit each, so a client making only small uploads can do 100 per minute. But a 15 MB upload costs 5 units, meaning a client uploading large files can only do 20 per minute. The limit is the same; the cost varies by what the request demands from your infrastructure.

Conditional deduction takes this further. Sometimes you only want to charge against the rate limit when a request succeeds, or when it fails -- not both. The deduct_when parameter accepts a callable that receives the Flask response object and returns a boolean:

@app.route("/api/process")
@limiter.limit(
    "10 per minute",
    deduct_when=lambda response: response.status_code == 200,
)
def process():
    """Only successful requests count against the limit."""
    return {"result": "..."}

This is particularly useful for endpoints where clients may need to retry due to validation errors. Penalizing a client for a 400 Bad Request that was caused by malformed input -- and then blocking them when they try to fix their request and resubmit -- creates a frustrating experience. With deduct_when, only the requests that consumed real server resources count.

Meta limits add a second layer of enforcement that operates above individual route limits. Think of them as a circuit breaker for abusive clients. A client who repeatedly hits different rate limits across your API -- maybe probing endpoints or running an automated scan -- will trigger the meta limit even if they never exceed any single route's threshold:

limiter = Limiter(
    get_remote_address,
    app=app,
    default_limits=["200 per day", "50 per hour"],
    meta_limits=["5 per minute"],
    storage_uri="redis://localhost:6379",
)

With this configuration, if a client breaches any rate limit five times within a minute, every subsequent request is rejected for the remainder of that minute -- across all endpoints. Meta limits catch the pattern of abuse rather than the individual violation.

Resilience: In-Memory Fallback and Failure Modes

Your rate limiter depends on Redis, and Redis can go down. When that happens, you face a choice: reject all traffic (fail closed) or allow all traffic (fail open). Neither is ideal. flask-limiter provides a third option: fall back to in-memory rate limiting with a separate set of limits while Redis is unavailable:

limiter = Limiter(
    get_remote_address,
    app=app,
    default_limits=["200 per day", "50 per hour"],
    storage_uri="redis://localhost:6379",
    in_memory_fallback=["100 per day", "25 per hour"],
    in_memory_fallback_enabled=True,
    swallow_errors=True,
)

The in_memory_fallback list specifies a tighter set of limits that apply when Redis is unreachable. Why tighter? Because in-memory limits are per-worker, not shared. Each Gunicorn worker enforces its own counters independently, so the effective rate is multiplied by the number of workers. Setting the fallback limits lower than the normal limits compensates for this multiplication.
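The multiplication is worth making explicit. With the configuration above and, say, four Gunicorn workers (an assumed worker count for illustration):

```python
workers = 4                 # assumed Gunicorn worker count
shared_per_hour = 50        # normal Redis-backed limit (shared)
fallback_per_hour = 25      # in-memory fallback limit (per worker)

# Each worker enforces its own counter, so a client whose requests are
# spread across workers can reach workers * fallback_per_hour.
worst_case = workers * fallback_per_hour        # 100 per hour

# To roughly preserve the normal shared rate during an outage, size the
# fallback near shared_per_hour / workers.
parity_fallback = shared_per_hour // workers    # 12 per hour per worker
```

Note that even the tighter fallback of 25 per hour yields a worst case of 100 per hour across four workers, double the normal limit; dividing by the worker count is the conservative sizing rule.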

The swallow_errors=True setting prevents storage exceptions from propagating to the client. Without it, a Redis connection failure raises an exception that returns a 500 error to every request. With it, the error is logged and the fallback kicks in. This approach gives you the best of both worlds: shared, accurate rate limiting when Redis is healthy, and degraded-but-functional protection when it is not.

Note

The in-memory fallback does not transfer state. When Redis goes down, every client's counter resets to zero in the local memory store. When Redis comes back, the counters revert to whatever Redis had -- which may also be stale. Design your fallback limits conservatively to account for this gap.

Behind a Reverse Proxy: Getting the Real Client IP

In production, your Flask app almost certainly sits behind a reverse proxy like Nginx, an AWS Application Load Balancer, or Cloudflare. The proxy terminates the client's connection and opens a new one to your application, so request.remote_addr returns the proxy's IP -- not the client's. Without addressing this, every client on the internet shares a single rate limit bucket.

The standard solution is to trust the X-Forwarded-For header, but this requires care. The header is a comma-separated list of IPs, and any client can spoof it by adding arbitrary values. You must configure Flask to trust only the proxies you control:

from werkzeug.middleware.proxy_fix import ProxyFix

app = Flask(__name__)
app.wsgi_app = ProxyFix(app.wsgi_app, x_for=1, x_proto=1, x_host=1)

The x_for=1 parameter tells Werkzeug to trust one level of X-Forwarded-For, meaning it takes the last IP added by a trusted proxy. If you have two proxies in the chain (for example, Cloudflare in front of Nginx), set x_for=2. Setting x_for too high lets clients inject a fake IP by stuffing the header; setting it too low makes everyone appear to come from the inner proxy's IP.
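The depth rule can be illustrated with a simplified version of what ProxyFix does (use ProxyFix itself in practice; this sketch skips its handling of the other forwarding headers):

```python
def client_ip_from_xff(xff_header: str, trusted_hops: int) -> str:
    """Take the address `trusted_hops` positions from the right of
    X-Forwarded-For: only the rightmost entries were appended by
    proxies you control, so everything further left is untrusted."""
    ips = [ip.strip() for ip in xff_header.split(",")]
    return ips[-trusted_hops]


# The client spoofed a leading value; Cloudflare appended the real
# client IP, then Nginx appended Cloudflare's edge IP. (All addresses
# here are illustrative documentation ranges.)
header = "1.2.3.4, 203.0.113.7, 198.51.100.9"
real_client = client_ip_from_xff(header, trusted_hops=2)  # "203.0.113.7"
```

Setting trusted_hops=3 here would return the spoofed "1.2.3.4", which is exactly the failure mode of an x_for value set too high.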

After applying ProxyFix, get_remote_address() will return the correct client IP, and your rate limits will bucket correctly per client. This is a foundational step that must be done before any IP-based rate limiting is meaningful in a proxied environment.

Warning

Never trust X-Forwarded-For blindly. If your application is directly exposed to the internet without a proxy, a client can set the header to any value and bypass your rate limiter entirely. Only use ProxyFix when you control the proxy layer and can guarantee the header's integrity.

Error Handling and Rate Limit Headers

When a client exceeds the rate limit, flask-limiter returns a 429 response by default. You can customize the response body and ensure proper headers are included:

from flask import jsonify
import logging

logger = logging.getLogger(__name__)


@app.errorhandler(429)
def ratelimit_handler(e):
    """Return a JSON response with rate limit details."""
    logger.warning(
        "Rate limit exceeded: %s from %s",
        e.description,
        get_remote_address(),
    )
    return jsonify({
        "error": "Rate limit exceeded",
        "message": str(e.description),
        "retry_after": e.retry_after,
    }), 429

You can also handle breaches at the decorator level using the on_breach callback, which receives a RequestLimit object and can return a custom Response. This is useful when different routes need different 429 behavior -- for example, an API endpoint returning JSON versus a web page returning HTML:

from flask import make_response


def api_breach_handler(request_limit):
    """Custom breach response for API routes."""
    response = make_response(
        jsonify({
            "error": "Rate limit exceeded",
            "limit": str(request_limit.limit),
        }),
        429,
    )
    return response


@app.route("/api/expensive")
@limiter.limit("10 per minute", on_breach=api_breach_handler)
def expensive():
    return {"result": "..."}

To include rate limit headers on every response (not just 429s), enable the built-in header injection:

app.config["RATELIMIT_HEADERS_ENABLED"] = True

This adds X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset headers to every response, giving clients visibility into their current usage. Well-designed clients use these headers to self-throttle before they hit the 429 wall. This transforms rate limiting from a punitive mechanism into a collaborative protocol -- the server communicates its capacity, and the client adjusts its behavior accordingly.
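On the client side, these headers support a simple pacing rule. This sketch assumes X-RateLimit-Reset carries a Unix timestamp; verify the header names and reset semantics against your server's configuration:

```python
def self_throttle_delay(headers: dict, now: float) -> float:
    """Spread the remaining budget over the time left in the window,
    so the client slows down instead of slamming into a 429."""
    remaining = int(headers.get("X-RateLimit-Remaining", "0"))
    reset_at = float(headers.get("X-RateLimit-Reset", now))
    seconds_left = max(reset_at - now, 0.0)
    if remaining <= 0:
        return seconds_left          # budget exhausted: wait out the window
    return seconds_left / remaining  # pace the remaining requests evenly


# 30 requests left, window resets in 60 seconds: one request every 2s.
delay = self_throttle_delay(
    {"X-RateLimit-Remaining": "30", "X-RateLimit-Reset": "1060"},
    now=1000.0,
)
```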

You can also customize the header names if your API follows a different convention:

from flask_limiter import HEADERS

limiter = Limiter(
    get_remote_address,
    app=app,
    default_limits=["100 per minute"],
    storage_uri="redis://localhost:6379",
    header_name_mapping={
        HEADERS.LIMIT: "X-My-Limit",
        HEADERS.RESET: "X-My-Reset",
        HEADERS.REMAINING: "X-My-Remaining",
    },
)

Note

Logging rate limit breaches is essential for monitoring. A sudden spike in 429s for a single client might indicate a misconfigured integration. A spike across all clients might mean your limits are too aggressive for legitimate traffic patterns. Track these events in your monitoring system and set alerts for abnormal patterns.

Key Takeaways

  1. flask-limiter adds rate limiting with a single decorator: Set default_limits for global protection and @limiter.limit() for per-route overrides. The human-readable string syntax like "100 per minute" makes limits self-documenting.
  2. Use Redis for any multi-worker deployment: In-memory storage gives each Gunicorn worker its own counters, effectively multiplying the allowed rate. Redis provides shared state across all workers and servers with atomic counter operations.
  3. Choose your strategy based on precision needs: fixed-window is simplest with the least overhead. moving-window gives exact enforcement at the cost of memory. sliding-window-counter balances the two with weighted counters.
  4. Exempt health check and infrastructure endpoints: Use @limiter.exempt on endpoints called by load balancers and monitoring systems. Rate limiting them creates false downtime alerts.
  5. Dynamic limits enable tiered access: Pass a callable to @limiter.limit() that returns different limit strings based on the user's subscription tier. Changes take effect immediately with no restart needed.
  6. Weight expensive operations with cost functions: Use the cost parameter to charge heavy requests at a higher rate. Combine with deduct_when to only count requests that consume real resources.
  7. Layer meta limits for DoS defense: Meta limits catch clients who repeatedly breach individual limits across your API, acting as a circuit breaker for abusive behavior patterns.
  8. Configure in-memory fallback for Redis outages: Set in_memory_fallback with conservative limits and swallow_errors=True so your API degrades gracefully instead of failing completely.
  9. Handle reverse proxy IP extraction correctly: Use ProxyFix with the correct x_for depth. Getting this wrong means every client shares one bucket or clients can spoof their identity.
  10. Enable rate limit headers on every response: Set RATELIMIT_HEADERS_ENABLED = True so clients can see their remaining quota and self-regulate before hitting the limit.

Rate limiting is a foundational security control for any public-facing API, but it is also a design decision that communicates how you expect your API to be consumed. flask-limiter makes it straightforward to add to a Flask application, from a simple global limit for development to a multi-strategy, Redis-backed, tier-aware, cost-weighted, fallback-resilient configuration for production. The key is to think about your limits as part of your API contract -- they define the relationship between your server's capacity and your clients' expectations, and they should be designed with the same care as your endpoint schemas.