How to Deploy a FastAPI Application With Docker and Uvicorn

Running uvicorn main:app --reload works perfectly for development. But that single-process dev server is not designed for production traffic. It has no process management, no graceful restarts, and no isolation from the host environment. Docker solves the isolation and portability problem. Uvicorn (with optional Gunicorn) solves the concurrency problem. Together, they give you a containerized FastAPI deployment that starts the same way on your laptop, in CI, and on a cloud server. This guide walks through writing a production-ready Dockerfile, configuring workers, handling environment variables, adding health checks, and running everything with Docker Compose.

Why Docker for FastAPI

Docker packages your application, its dependencies, and its runtime into a single container image. That image runs identically everywhere: on your development machine, in a CI/CD pipeline, and on a production server. There is no more "it works on my machine" because the machine is inside the container.

For FastAPI specifically, Docker handles the Python version, installed packages, system libraries, and server configuration in one reproducible unit. When you need to scale, you spin up more containers behind a load balancer rather than manually configuring additional servers.

A Minimal Production Dockerfile

The official FastAPI documentation now recommends building your own Dockerfile from scratch rather than using pre-built base images. The deprecated tiangolo/uvicorn-gunicorn-fastapi image is no longer needed because Uvicorn now supports managing multiple workers natively with the --workers flag.

FROM python:3.12-slim

WORKDIR /code

# Copy dependency file first for Docker layer caching
COPY ./requirements.txt /code/requirements.txt

# Install dependencies
RUN pip install --no-cache-dir --upgrade -r /code/requirements.txt

# Copy application code
COPY ./app /code/app

# Run with multiple workers
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "4"]

There is an important optimization in this Dockerfile: the requirements.txt is copied and installed before the application code. Docker builds images in layers, and each layer is cached. When you change your application code but not your dependencies, Docker reuses the cached dependency layer and only rebuilds the code copy step. This makes subsequent builds significantly faster.

Note

Always use the exec form of CMD (with brackets) instead of the shell form. In exec form, Uvicorn runs as the container's main process and receives signals like SIGTERM directly, which allows FastAPI's lifespan events to run during shutdown. The shell form wraps the command in /bin/sh -c, and the shell typically does not forward signals to Uvicorn. As a result, docker stop waits out its grace period (10 seconds by default) and then kills the container with SIGKILL instead of letting it shut down gracefully.
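To make the signal behavior concrete, here is a minimal standard-library sketch of what a process does when SIGTERM reaches it directly. It is not Uvicorn's implementation; the names shutdown_requested, handle_sigterm, and install_handlers are hypothetical and exist only for illustration.

```python
import os
import signal
import threading

# Hypothetical flag for illustration; a real server tracks far more state.
shutdown_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Record the shutdown request so the process can clean up before exiting."""
    shutdown_requested.set()


def install_handlers():
    """Register the handler for the signals a container usually receives."""
    # `docker stop` sends SIGTERM first; Ctrl+C in a terminal sends SIGINT.
    signal.signal(signal.SIGTERM, handle_sigterm)
    signal.signal(signal.SIGINT, handle_sigterm)


install_handlers()
print(f"PID {os.getpid()} will set shutdown_requested on SIGTERM")
```

A service built this way would poll shutdown_requested in its main loop and finish in-flight work before exiting. If the signal stops at /bin/sh and is never forwarded, a handler like this may never run, which is why the exec form matters.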

Multi-Stage Builds for Smaller Images

A multi-stage build installs dependencies in a temporary stage that includes build tools, then copies only the installed packages into a slim final image. This produces significantly smaller images.

# Stage 1: Build dependencies
FROM python:3.12-slim AS builder

WORKDIR /code

RUN apt-get update && apt-get install -y --no-install-recommends \
    gcc libpq-dev \
    && rm -rf /var/lib/apt/lists/*

RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

COPY requirements.txt .
RUN pip install --no-cache-dir --upgrade -r requirements.txt

# Stage 2: Production image
FROM python:3.12-slim

WORKDIR /code

# Copy virtual environment from builder
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

# Install only runtime libraries (not build tools)
RUN apt-get update && apt-get install -y --no-install-recommends \
    libpq5 \
    && rm -rf /var/lib/apt/lists/*

COPY ./app /code/app

EXPOSE 8000

CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "4"]

The build tools (gcc, libpq-dev) are only needed to compile packages like psycopg2. They do not exist in the final image, which keeps it small and reduces the attack surface. A multi-stage FastAPI image often comes in under 150MB, compared to over 1GB for an image built on the full python base image with the build tools left in.

Uvicorn Workers vs Gunicorn + Uvicorn

There are two approaches for running multiple worker processes, and the right choice depends on your deployment environment.

Option 1: Uvicorn with --workers (containerized deployments)

uvicorn app.main:app --host 0.0.0.0 --port 8000 --workers 4

Uvicorn can spawn and manage multiple worker processes natively. It restarts workers that die, and incoming connections are shared across them. For containerized deployments where the orchestrator (Kubernetes, ECS, Cloud Run) handles horizontal scaling, the FastAPI documentation recommends a single Uvicorn process per container: drop --workers (or set it to 1) and scale by adding replicas. Use a higher --workers count when one container has to use several CPU cores on its own, for example under Docker Compose on a single host.

Option 2: Gunicorn with UvicornWorker (single-server deployments)

gunicorn app.main:app -w 4 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000

Gunicorn provides more mature process management features: graceful restarts, max_requests for recycling workers to prevent memory leaks, and configurable timeouts. This is useful when running on a single server without an orchestrator.

Approach	Best For	Worker Formula
`uvicorn --workers`	Kubernetes, ECS, Cloud Run (orchestrator scales containers)	1 worker per container, scale with replicas
`gunicorn -k uvicorn.workers.UvicornWorker`	Single server, VM, or bare metal without orchestrator	(2 x CPU cores) + 1
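The worker formula in the table can be sketched as a small helper. This is an illustrative calculation only; recommended_workers and its parameters are hypothetical names, and the (2 x CPU cores) + 1 rule is a starting point rather than a guarantee.

```python
import os


def recommended_workers(cpu_cores=None, orchestrated=False):
    """Return a worker count based on the (2 x CPU cores) + 1 rule of thumb.

    orchestrated=True models the one-process-per-container pattern, where
    scaling happens by adding replicas instead of workers.
    """
    if orchestrated:
        return 1
    if cpu_cores is None:
        # Assumption: os.cpu_count() reports the host's cores and may ignore
        # a container CPU limit, so pass cpu_cores explicitly inside Docker.
        cpu_cores = os.cpu_count() or 1
    if cpu_cores < 1:
        raise ValueError("cpu_cores must be at least 1")
    return 2 * cpu_cores + 1


print(recommended_workers(cpu_cores=2))        # 5
print(recommended_workers(orchestrated=True))  # 1
```

The result can be passed to --workers (or Gunicorn's -w) at startup. Treat it as an upper bound to tune against real memory use and latency measurements.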

Common Mistake

Never use --reload in production. It is a development feature that watches for file changes and restarts the server. It adds overhead, is not designed for concurrent traffic, and can cause unexpected behavior in containerized environments.

Handling Environment Variables

Secrets and configuration values should never be hard-coded in your application or Dockerfile. Pass them as environment variables at runtime.

docker run -d \
  --name fastapi-app \
  -p 8000:8000 \
  -e DATABASE_URL="postgresql://user:pass@db:5432/mydb" \
  -e SECRET_KEY="your-secret-key" \
  -e LOG_LEVEL="info" \
  fastapi-app:latest

Inside your application, read these values using Pydantic Settings:

from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    DATABASE_URL: str
    SECRET_KEY: str
    LOG_LEVEL: str = "info"

    model_config = {"env_file": ".env"}

settings = Settings()

During development, values come from a .env file. In production, they come from environment variables set by your orchestrator, CI/CD pipeline, or Docker run command. The application code is identical in both environments.
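To show what happens conceptually when the container starts, here is a standard-library sketch of the same idea: required values must be present in the environment, optional ones fall back to a default. It is not how Pydantic Settings is implemented, and AppSettings and load_settings are hypothetical names used only for this illustration.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class AppSettings:
    database_url: str
    secret_key: str
    log_level: str = "info"


def load_settings(env=None):
    """Build AppSettings from a mapping of environment variables.

    Fails fast with a clear message when a required variable is missing,
    so a misconfigured container stops at startup instead of at first use.
    """
    env = os.environ if env is None else env
    required = ("DATABASE_URL", "SECRET_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    return AppSettings(
        database_url=env["DATABASE_URL"],
        secret_key=env["SECRET_KEY"],
        log_level=env.get("LOG_LEVEL", "info"),
    )
```

Calling load_settings() with no argument reads the real process environment, which is what docker run -e populates. Passing a plain dict instead makes the loader easy to exercise in tests.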

Adding a .dockerignore File

A .dockerignore file tells Docker which files to leave out of the build context, so they can never be copied into the image. This keeps builds fast and the image clean, small, and free of sensitive files.

__pycache__
*.pyc
.git
.gitignore
.env
.venv
venv
tests/
*.md
.dockerignore
Dockerfile

Without a .dockerignore, Docker sends everything in the build context to the builder, and a broad instruction such as COPY . . copies all of it into the image, including your .git directory, virtual environment, test files, and potentially your .env file with production secrets. Always create this file.

Health Check Endpoints

A health check endpoint lets Docker, Kubernetes, and load balancers verify that your service is running and ready to accept requests.

@app.get("/health")
def health_check():
    return {"status": "healthy"}

In your Dockerfile, add a HEALTHCHECK instruction so Docker monitors the container:

HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
  CMD curl -f http://localhost:8000/health || exit 1

If the health check fails three consecutive times, Docker marks the container as unhealthy. Docker Swarm uses this status to replace the container. Kubernetes ignores the Dockerfile HEALTHCHECK instruction, so there you point liveness and readiness probes at the same /health endpoint instead.

Pro Tip

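Slim base images such as python:3.12-slim do not ship with curl. Either install it in the Dockerfile or use a Python-based health check instead: CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health', timeout=3)". The command exits with a non-zero status when the request fails or returns an error response, which is what Docker needs to see.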
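If the one-liner becomes hard to read, the same check can live in a small script. The following is a sketch using only the standard library; is_healthy is a hypothetical helper name, and the default URL assumes the /health endpoint and port 8000 used in this guide.

```python
import urllib.error
import urllib.request


def is_healthy(url="http://localhost:8000/health", timeout=3.0):
    """Return True if the endpoint answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, DNS failures, and 4xx/5xx
        # responses (urlopen raises HTTPError, a URLError subclass, for those).
        return False
```

Saved as a file such as healthcheck.py (a hypothetical name) and ended with raise SystemExit(0 if is_healthy() else 1), the script can be invoked from HEALTHCHECK with CMD python followed by the script path. The explicit timeout keeps a hung server from stalling the check itself.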

Running With Docker Compose

Docker Compose lets you define your FastAPI application alongside services like PostgreSQL and Redis in a single file. This is useful for local development and staging environments.

# docker-compose.yml
services:
  web:
    build: .
    ports:
      - "8000:8000"
    environment:
      - DATABASE_URL=postgresql://user:pass@db:5432/mydb
      - SECRET_KEY=dev-secret-key
    depends_on:
      - db
    restart: unless-stopped

  db:
    image: postgres:16-alpine
    volumes:
      - postgres_data:/var/lib/postgresql/data
    environment:
      - POSTGRES_USER=user
      - POSTGRES_PASSWORD=pass
      - POSTGRES_DB=mydb
    expose:
      - "5432"

volumes:
  postgres_data:

Run it with:

docker compose up --build

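The depends_on directive starts the db container before the web container, but it does not wait for PostgreSQL to be ready to accept connections. For that, add a healthcheck to the db service and use depends_on with condition: service_healthy, or retry the connection in your application. The postgres_data named volume persists database data across container restarts. The restart: unless-stopped policy automatically restarts the web container if it crashes.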
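Because depends_on controls start order rather than readiness, the web container can come up before the database accepts connections. One way to handle that on the application side is to wait for the port before connecting. This is a minimal sketch; wait_for_port is a hypothetical helper, and the host name db and port 5432 in the usage note come from the Compose file above.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Poll until a TCP connection to host:port succeeds or the timeout expires.

    Returns True once a connection is accepted, False if the deadline passes.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Each attempt gets its own short connect timeout.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

At startup the application could call wait_for_port("db", 5432) and exit with an error if it returns False. An open port only shows that something is listening, so a database client with retry logic is still the more robust check.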

Frequently Asked Questions

Do I need Gunicorn to deploy FastAPI in Docker?

Not necessarily. Uvicorn now supports the --workers flag natively, which spawns multiple processes and restarts dead ones. For Kubernetes or other orchestrators that manage replication at the cluster level, a single Uvicorn process per container is the recommended approach. Gunicorn adds value on single-server deployments where you need features like max_requests for worker recycling and more granular timeout controls.

Why should I use a multi-stage Docker build for FastAPI?

A multi-stage build installs build tools (like gcc) in a temporary stage and copies only the compiled dependencies into the final image. This produces much smaller images, often under 150MB instead of over 1GB for an image built on the full python base image. Smaller images pull faster, consume less storage, and have a reduced attack surface because build tools are not present in the production container.

Should I use --reload in production?

No. The --reload flag is for local development. It watches the filesystem for changes and restarts the server, which adds overhead and can cause unexpected behavior in containerized deployments. Use --workers for concurrency in production.

How many Uvicorn workers should I run in a Docker container?

A common formula is (2 x CPU cores) + 1. For a container allocated 2 CPU cores, that means 5 workers. If you are using Kubernetes or another orchestrator, the recommended pattern is 1 worker per container and scaling horizontally by increasing the replica count. This gives you cleaner resource isolation and more predictable scaling behavior.

Key Takeaways

Build your own Dockerfile: The pre-built tiangolo/uvicorn-gunicorn-fastapi image is deprecated. Write a Dockerfile from scratch using python:3.12-slim and uvicorn --workers. It is just as simple and gives you full control.
Use layer caching: Copy requirements.txt and install dependencies before copying application code. This ensures Docker reuses the dependency layer when only your code changes.
Use multi-stage builds for production: Install build tools in a temporary stage and copy only the compiled packages into the final slim image. This keeps images small and secure.
Choose workers based on your infrastructure: In orchestrated environments (Kubernetes, ECS), run a single Uvicorn process per container and scale with replicas. Use uvicorn --workers or gunicorn -k uvicorn.workers.UvicornWorker for single-server deployments, with Gunicorn when you need more mature process management.
Never hard-code secrets: Pass configuration through environment variables. Use Pydantic Settings to read them with type validation. Keep .env files out of images with .dockerignore.

Deploying FastAPI with Docker and Uvicorn comes down to a handful of well-understood steps: write a Dockerfile that caches dependency layers, configure workers for your deployment environment, pass secrets through environment variables, add a health check, and use Docker Compose or an orchestrator to manage the infrastructure. The patterns in this guide work whether you are deploying to a single VPS or a Kubernetes cluster. Start simple, and add complexity only when your traffic demands it.