Running uvicorn main:app --reload works perfectly for development. But that single-process dev server is not designed for production traffic. It has no process management, no graceful restarts, and no isolation from the host environment. Docker solves the isolation and portability problem. Uvicorn (with optional Gunicorn) solves the concurrency problem. Together, they give you a containerized FastAPI deployment that starts the same way on your laptop, in CI, and on a cloud server. This guide walks through writing a production-ready Dockerfile, configuring workers, handling environment variables, adding health checks, and running everything with Docker Compose.
Why Docker for FastAPI
Docker packages your application, its dependencies, and its runtime into a single container image. That image runs identically everywhere: on your development machine, in a CI/CD pipeline, and on a production server. There is no more "it works on my machine" because the machine is inside the container.
For FastAPI specifically, Docker handles the Python version, installed packages, system libraries, and server configuration in one reproducible unit. When you need to scale, you spin up more containers behind a load balancer rather than manually configuring additional servers.
A Minimal Production Dockerfile
The official FastAPI documentation now recommends building your own Dockerfile from scratch rather than using pre-built base images. The deprecated tiangolo/uvicorn-gunicorn-fastapi image is no longer needed because Uvicorn now supports managing multiple workers natively with the --workers flag.
FROM python:3.12-slim
WORKDIR /code
# Copy dependency file first for Docker layer caching
COPY ./requirements.txt /code/requirements.txt
# Install dependencies
RUN pip install --no-cache-dir --upgrade -r /code/requirements.txt
# Copy application code
COPY ./app /code/app
# Run with multiple workers
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "4"]
There is an important optimization in this Dockerfile: the requirements.txt is copied and installed before the application code. Docker builds images in layers, and each layer is cached. When you change your application code but not your dependencies, Docker reuses the cached dependency layer and only rebuilds the code copy step. This makes subsequent builds significantly faster.
Always use the exec form of CMD (the JSON array with brackets) instead of the shell form. With the exec form, Uvicorn runs as PID 1 and receives signals like SIGTERM directly, which allows FastAPI's lifespan events to run during shutdown. The shell form wraps the command in /bin/sh -c, and the shell typically does not forward signals to its child process. In that case docker stop waits out its default 10-second grace period and then kills the container with SIGKILL instead of letting it shut down gracefully.
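What "receiving SIGTERM directly" buys you can be illustrated with a small standard-library sketch. This is not how Uvicorn is implemented; ShutdownFlag and install_sigterm_handler are hypothetical names used only for illustration. The point is that the process itself must receive the signal before any cleanup code can run.

```python
import signal


class ShutdownFlag:
    """Hypothetical helper that records the arrival of a termination signal."""

    def __init__(self) -> None:
        self.requested = False
        self.signal_number = None

    def handle(self, signum, frame) -> None:
        # A real server would stop accepting connections here and then
        # run its shutdown hooks (for FastAPI, the exit side of lifespan).
        self.requested = True
        self.signal_number = signum


def install_sigterm_handler() -> ShutdownFlag:
    """Register a SIGTERM handler. Must be called from the main thread."""
    flag = ShutdownFlag()
    signal.signal(signal.SIGTERM, flag.handle)
    return flag
```

If a shell sits between Docker and the Python process and does not pass the signal on, a handler like this never fires and the cleanup step is skipped when the container is killed.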
Multi-Stage Builds for Smaller Images
A multi-stage build installs dependencies in a temporary stage that includes build tools, then copies only the installed packages into a slim final image. This produces significantly smaller images.
# Stage 1: Build dependencies
FROM python:3.12-slim AS builder
WORKDIR /code
RUN apt-get update && apt-get install -y --no-install-recommends \
    gcc libpq-dev \
    && rm -rf /var/lib/apt/lists/*
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
COPY requirements.txt .
RUN pip install --no-cache-dir --upgrade -r requirements.txt
# Stage 2: Production image
FROM python:3.12-slim
WORKDIR /code
# Copy virtual environment from builder
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
# Install only runtime libraries (not build tools)
RUN apt-get update && apt-get install -y --no-install-recommends \
    libpq5 \
    && rm -rf /var/lib/apt/lists/*
COPY ./app /code/app
EXPOSE 8000
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "4"]
The build tools (gcc, libpq-dev) are only needed to compile packages like psycopg2. They do not exist in the final image, which keeps it small and reduces the attack surface. A typical multi-stage FastAPI image often comes in under 150MB, compared to roughly 1GB or more for a naive build on the full python:3.12 image with build tools left in. Exact sizes depend on your dependencies.
Uvicorn Workers vs Gunicorn + Uvicorn
There are two approaches for running multiple worker processes, and the right choice depends on your deployment environment.
Option 1: Uvicorn with --workers (recommended for Kubernetes)
uvicorn app.main:app --host 0.0.0.0 --port 8000 --workers 4
Uvicorn now supports spawning and managing multiple worker processes natively. It restarts workers that die, and incoming requests are distributed across them. For containerized deployments where the orchestrator (Kubernetes, ECS, Cloud Run) handles horizontal scaling, the FastAPI documentation recommends a single Uvicorn process per container: omit --workers or set it to 1, and scale by adding replicas. A higher value such as --workers 4 makes sense when one container has several CPU cores to itself.
Option 2: Gunicorn with UvicornWorker (single-server deployments)
gunicorn app.main:app -w 4 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000
Gunicorn provides more mature process management features: graceful restarts, max_requests for recycling workers to prevent memory leaks, and configurable timeouts. This is useful when running on a single server without an orchestrator.
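Gunicorn reads its settings from a Python configuration file, so the features listed above can be kept in one place instead of a long command line. The following is a minimal sketch. The file name gunicorn.conf.py and every numeric value are assumptions to tune for your own workload, not recommendations from the FastAPI or Gunicorn documentation.

```python
# gunicorn.conf.py (hypothetical file name and values, adjust to your app)
import os

# Address and port to listen on, matching the command shown above.
bind = "0.0.0.0:8000"

# Run Uvicorn's ASGI worker inside each Gunicorn process.
worker_class = "uvicorn.workers.UvicornWorker"

# Number of worker processes. WEB_CONCURRENCY is read explicitly here,
# falling back to the 4 workers used in this guide.
workers = int(os.environ.get("WEB_CONCURRENCY", "4"))

# Recycle a worker after this many requests to limit slow memory growth.
max_requests = 1000
# Random jitter so that workers do not all restart at the same moment.
max_requests_jitter = 100

# Seconds a worker may stay silent before it is killed and restarted.
timeout = 30
# Seconds a worker gets to finish in-flight requests on restart or shutdown.
graceful_timeout = 30
```

With this file in place, the start command would shrink to gunicorn -c gunicorn.conf.py app.main:app.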
| Approach | Best For | Worker Formula |
|---|---|---|
| uvicorn --workers | Kubernetes, ECS, Cloud Run (orchestrator scales containers) | 1 worker per container, scale with replicas |
| gunicorn -k uvicorn.workers.UvicornWorker | Single server, VM, or bare metal without orchestrator | (2 x CPU cores) + 1 |
Never use --reload in production. It is a development feature that watches for file changes and restarts the server. It adds overhead, is not designed for concurrent traffic, and can cause unexpected behavior in containerized environments.
Handling Environment Variables
Secrets and configuration values should never be hard-coded in your application or Dockerfile. Pass them as environment variables at runtime.
docker run -d \
  --name fastapi-app \
  -p 8000:8000 \
  -e DATABASE_URL="postgresql://user:pass@db:5432/mydb" \
  -e SECRET_KEY="your-secret-key" \
  -e LOG_LEVEL="info" \
  fastapi-app:latest
Inside your application, read these values using Pydantic Settings:
from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    DATABASE_URL: str
    SECRET_KEY: str
    LOG_LEVEL: str = "info"

    # Load values from .env when present; real environment variables take priority
    model_config = {"env_file": ".env"}


settings = Settings()
During development, values come from a .env file. In production, they come from environment variables set by your orchestrator, CI/CD pipeline, or Docker run command. The application code is identical in both environments.
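A useful property of this setup is that a missing required value stops the application at startup rather than on the first request that needs it. The sketch below shows that fail-fast idea using only the standard library. It is not Pydantic's implementation, and AppSettings and load_settings are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class AppSettings:
    database_url: str
    secret_key: str
    log_level: str = "info"


def load_settings(env: Mapping[str, str] = os.environ) -> AppSettings:
    """Build settings from environment variables, failing fast if any are missing."""
    required = ("DATABASE_URL", "SECRET_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return AppSettings(
        database_url=env["DATABASE_URL"],
        secret_key=env["SECRET_KEY"],
        log_level=env.get("LOG_LEVEL", "info"),
    )
```

Passing the environment in as a mapping keeps the function easy to test: a plain dict can stand in for os.environ.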
Adding a .dockerignore File
A .dockerignore file tells Docker which files to exclude when copying your project into the image. This keeps the image clean, small, and free of sensitive files.
__pycache__
*.pyc
.git
.gitignore
.env
.venv
venv
tests/
*.md
.dockerignore
Dockerfile
Without a .dockerignore, Docker sends everything in the project directory to the builder as the build context, and any broad instruction such as COPY . . copies it into the image. That can include your .git directory, virtual environment, test files, and potentially your .env file with production secrets. Always create this file.
Health Check Endpoints
A health check endpoint lets Docker, Kubernetes, and load balancers verify that your service is running and ready to accept requests.
# Assumes app = FastAPI() is defined earlier in app/main.py
@app.get("/health")
def health_check():
    return {"status": "healthy"}
In your Dockerfile, add a HEALTHCHECK instruction so Docker monitors the container:
HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
    CMD curl -f http://localhost:8000/health || exit 1
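If the health check fails three consecutive times, Docker marks the container as unhealthy. Docker Swarm acts on this status and replaces unhealthy tasks. Kubernetes ignores the Dockerfile HEALTHCHECK instruction, so configure liveness and readiness probes in the pod spec instead; they can target the same /health endpoint.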
The python:3.12-slim base image used above does not ship curl, so either install it in the Dockerfile or use a Python-based health check instead: CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')".
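The Python-based check can also live in a small module, which makes the timeout and the handling of error statuses explicit. This is a sketch under assumptions: the function names is_healthy and exit_code are hypothetical, and the default URL simply reuses the port and path from this guide.

```python
import http.client
import urllib.request

# Assumed default, matching the port and path used in this guide.
DEFAULT_URL = "http://localhost:8000/health"


def is_healthy(url: str = DEFAULT_URL, timeout: float = 3.0) -> bool:
    """Return True only if the endpoint answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, http.client.HTTPException):
        # OSError covers refused connections, timeouts, and HTTP error
        # statuses (urllib's HTTPError and URLError both derive from it).
        return False


def exit_code(url: str = DEFAULT_URL) -> int:
    """Map the probe result to the exit code Docker expects: 0 healthy, 1 unhealthy."""
    return 0 if is_healthy(url) else 1
```

Saved as, say, healthcheck.py in the working directory (a hypothetical file name), it could be called from the HEALTHCHECK instruction with python -c "import sys, healthcheck; sys.exit(healthcheck.exit_code())".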
Running With Docker Compose
Docker Compose lets you define your FastAPI application alongside services like PostgreSQL and Redis in a single file. This is useful for local development and staging environments.
# docker-compose.yml
services:
  web:
    build: .
    ports:
      - "8000:8000"
    environment:
      - DATABASE_URL=postgresql://user:pass@db:5432/mydb
      - SECRET_KEY=dev-secret-key
    depends_on:
      - db
    restart: unless-stopped
  db:
    image: postgres:16-alpine
    volumes:
      - postgres_data:/var/lib/postgresql/data
    environment:
      - POSTGRES_USER=user
      - POSTGRES_PASSWORD=pass
      - POSTGRES_DB=mydb
    expose:
      - "5432"
volumes:
  postgres_data:
Run it with:
docker compose up --build
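The depends_on directive makes Compose start the db container before the web container. It controls start order only: Compose does not wait for PostgreSQL to accept connections unless you add a healthcheck to the db service and use the long form of depends_on with condition: service_healthy. The volumes section persists database data across container restarts. The restart: unless-stopped policy automatically restarts the web container if it crashes.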
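Because the database may still be initializing when the web container starts, the application side can also wait for it. A minimal sketch of such a wait loop, using only the standard library, follows. The function name wait_for_port and its default timings are hypothetical.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening. It does not
            # prove that the database is ready to run queries.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

In the Compose file above, the web service could call wait_for_port("db", 5432) before opening its connection pool, since db is the service name that resolves on the Compose network.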
Frequently Asked Questions
Do I need Gunicorn to deploy FastAPI in Docker?
Not necessarily. Uvicorn now supports the --workers flag natively, which spawns multiple processes and restarts dead ones. For Kubernetes or other orchestrators that manage replication at the cluster level, a single Uvicorn process per container is the recommended approach. Gunicorn adds value on single-server deployments where you need features like max_requests for worker recycling and more granular timeout controls.
Why should I use a multi-stage Docker build for FastAPI?
A multi-stage build installs build tools (like gcc) in a temporary stage and copies only the compiled dependencies into the final image. This produces much smaller images, often under 150MB instead of roughly 1GB or more. Smaller images pull faster, consume less storage, and have a reduced attack surface because build tools are not present in the production container.
Should I use --reload in production?
No. The --reload flag is for local development. It watches the filesystem for changes and restarts the server, which adds overhead, is not valid together with --workers, and can cause unexpected behavior in containerized deployments. Use --workers for concurrency in production.
How many Uvicorn workers should I run in a Docker container?
A common formula is (2 x CPU cores) + 1. For a container allocated 2 CPU cores, that means 5 workers. If you are using Kubernetes or another orchestrator, the recommended pattern is 1 worker per container and scaling horizontally by increasing the replica count. This gives you cleaner resource isolation and more predictable scaling behavior.
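The arithmetic is easy to automate, with one caveat: inside a container, os.cpu_count() typically reports the host's cores rather than the container's CPU limit. The sketch below applies the formula and, as an assumption, reads the limit from the cgroup v2 file /sys/fs/cgroup/cpu.max when it exists. All function names are hypothetical, and hosts using cgroup v1 expose the limit elsewhere.

```python
import os
from typing import Optional


def parse_cpu_max(text: str) -> Optional[float]:
    """Parse the cgroup v2 cpu.max format '<quota> <period>' into a core count.

    Returns None when the quota is 'max', which means no CPU limit is set.
    """
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


def recommended_workers(cpu_cores: float) -> int:
    """Apply the (2 x CPU cores) + 1 rule of thumb, with a floor of one worker."""
    return max(1, int(2 * cpu_cores) + 1)


def detect_cpu_cores(cpu_max_path: str = "/sys/fs/cgroup/cpu.max") -> float:
    """Prefer the container's CPU limit; fall back to the host core count."""
    try:
        with open(cpu_max_path) as handle:
            limit = parse_cpu_max(handle.read())
    except (OSError, ValueError):
        limit = None
    return limit if limit is not None else float(os.cpu_count() or 1)
```

For a container limited to 2 cores, recommended_workers(2) gives the 5 workers mentioned above. Treat the result as a starting point and adjust it based on measured memory use and latency.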
Key Takeaways
- Build your own Dockerfile: The pre-built tiangolo/uvicorn-gunicorn-fastapi image is deprecated. Write a Dockerfile from scratch using python:3.12-slim and uvicorn --workers. It is just as simple and gives you full control.
- Use layer caching: Copy requirements.txt and install dependencies before copying application code. This ensures Docker reuses the dependency layer when only your code changes.
- Use multi-stage builds for production: Install build tools in a temporary stage and copy only the compiled packages into the final slim image. This keeps images small and secure.
- Choose workers based on your infrastructure: Use uvicorn --workers for orchestrated environments (Kubernetes, ECS). Use gunicorn -k uvicorn.workers.UvicornWorker for single-server deployments that need mature process management.
- Never hard-code secrets: Pass configuration through environment variables. Use Pydantic Settings to read them with type validation. Keep .env files out of images with .dockerignore.
Deploying FastAPI with Docker and Uvicorn comes down to a handful of well-understood steps: write a Dockerfile that caches dependency layers, configure workers for your deployment environment, pass secrets through environment variables, add a health check, and use Docker Compose or an orchestrator to manage the infrastructure. The patterns in this guide work whether you are deploying to a single VPS or a Kubernetes cluster. Start simple, and add complexity only when your traffic demands it.