GraphQL gives clients the power to request exactly the data they need, but that flexibility comes with a hidden cost. Without careful handling, a single nested query can trigger dozens or even hundreds of redundant database calls. DataLoader is the standard solution to this problem, and Python has several mature implementations ready for production use.
In a typical REST API, each endpoint returns a fixed structure, so the number of database queries is predictable. GraphQL works differently. Because each field in a schema is resolved independently, nested relationships can multiply the number of backend requests in ways that are not immediately obvious. This article walks through the problem, explains how DataLoader addresses it, and shows working Python implementations across the major GraphQL frameworks.
What Is the N+1 Problem in GraphQL?
The N+1 problem occurs when a query fetches a list of N items and then, for each item, makes an additional request to resolve a related field. Consider a schema with User and Post types where each post has an author. A query requesting all posts along with their authors would first run one query to fetch the posts (the "1"), then run a separate query for each post's author (the "N"). If there are 40 posts, that means 41 total database calls—even if many of those posts share the same author.
# A naive resolver that causes the N+1 problem
# This runs once per post, creating N separate queries
async def resolve_author(post, info):
    return await db.fetch_user(post.author_id)
This happens because GraphQL resolves each field independently. The resolver for the author field has no awareness of whether the same user was already fetched by another resolver running in the same query. There is no built-in coordination between sibling resolvers, and no automatic batching.
The N+1 problem is not unique to GraphQL. ORMs face it too: Django's lazy QuerySet relations and SQLAlchemy's default lazy relationship loading can both trigger the same query explosion. The difference is that GraphQL's resolver architecture makes N+1 queries particularly easy to introduce without realizing it.
How DataLoader Solves It
DataLoader is a utility that sits between your resolvers and your data source. It uses two techniques to reduce redundant queries: batching and caching.
Batching works by collecting all the individual .load(key) calls that happen during a single tick of the event loop, then passing all of those keys to a single batch function. Instead of 40 separate SELECT * FROM users WHERE id = ? queries, the batch function receives all 40 IDs at once and executes a single SELECT * FROM users WHERE id IN (?, ?, ...) query.
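To make the batched query concrete, here is a sketch using the stdlib sqlite3 module; the users(id, name) table and data are hypothetical:

```python
import sqlite3

def fetch_users_by_ids(conn, ids):
    """One parameterized IN query instead of len(ids) single-row queries."""
    if not ids:
        return []
    placeholders = ", ".join("?" for _ in ids)
    sql = f"SELECT id, name FROM users WHERE id IN ({placeholders})"
    return conn.execute(sql, list(ids)).fetchall()
```

Note that the database is free to return the matched rows in any order, which is why a batch function must still realign results with the input keys before returning them.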
Caching ensures that within a single request, if the same key is requested more than once, the DataLoader returns the previously loaded value without hitting the database again. If posts 3, 7, and 15 all share the same author, that author is fetched exactly once.
# A batch loading function receives all keys at once
async def batch_load_users(keys):
    # One query instead of N queries
    users = await db.fetch_users_by_ids(keys)
    # Build a lookup map
    user_map = {user.id: user for user in users}
    # Return results in the same order as the input keys
    return [user_map.get(key) for key in keys]
A batch function must follow two strict rules. First, the returned list must be the same length as the input list of keys. Second, each position in the returned list must correspond to the same position in the input keys. If a key has no matching value, return None (or an exception) in that position rather than omitting it.
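The batch-within-one-tick mechanics described above can be sketched in plain asyncio. This TinyLoader is a toy illustration only, not any library's API: it queues keys, schedules one dispatch at the end of the current event-loop tick, and reuses a cached future for duplicate keys.

```python
import asyncio

class TinyLoader:
    """Toy illustration of DataLoader-style batching and request-scoped caching."""

    def __init__(self, batch_fn):
        self.batch_fn = batch_fn
        self.cache = {}   # key -> Future; one entry per distinct key
        self.queue = []   # (key, future) pairs waiting for the next dispatch

    def load(self, key):
        if key in self.cache:
            return self.cache[key]  # duplicate key: reuse the pending/finished future
        loop = asyncio.get_running_loop()
        fut = loop.create_future()
        self.cache[key] = fut
        if not self.queue:
            # First load in this tick: schedule a single dispatch for all queued keys
            loop.call_soon(lambda: loop.create_task(self._dispatch()))
        self.queue.append((key, fut))
        return fut

    async def _dispatch(self):
        pending, self.queue = self.queue, []
        results = await self.batch_fn([key for key, _ in pending])
        for (_, fut), value in zip(pending, results):
            fut.set_result(value)
```

Three concurrent load calls for keys "a", "b", "a" produce a single batch call with ["a", "b"]; the duplicate "a" is served from the cache.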
DataLoader With Strawberry GraphQL
Strawberry is a modern Python GraphQL library that uses type annotations to define schemas. It ships with a built-in DataLoader implementation, so there is no need to install a separate package. Strawberry requires Python 3.10 or later, and its current stable release on PyPI (as of March 2026) is version 0.308.x.
Here is a complete example that defines a User type, a batch loading function, and a resolver that uses the DataLoader through the request context:
from typing import List, Union, Any, Optional

import strawberry
from strawberry.asgi import GraphQL
from strawberry.dataloader import DataLoader
from starlette.requests import Request
from starlette.websockets import WebSocket
from starlette.responses import Response


@strawberry.type
class User:
    id: strawberry.ID
    name: str


# Simulated database lookup
USERS_DB = {
    "1": User(id="1", name="Alice"),
    "2": User(id="2", name="Bob"),
    "3": User(id="3", name="Charlie"),
}


async def load_users(keys: List[str]) -> List[Optional[User]]:
    """Batch function: receives all keys, returns users in order."""
    print(f"Batch loading users: {keys}")
    return [USERS_DB.get(key) for key in keys]


class MyGraphQL(GraphQL):
    async def get_context(
        self,
        request: Union[Request, WebSocket],
        response: Optional[Response],
    ) -> Any:
        # A fresh DataLoader per request: the cache is scoped to this request only
        return {"user_loader": DataLoader(load_fn=load_users)}


@strawberry.type
class Query:
    @strawberry.field
    async def get_user(self, info: strawberry.Info, id: strawberry.ID) -> Optional[User]:
        return await info.context["user_loader"].load(id)


schema = strawberry.Schema(query=Query)
app = MyGraphQL(schema)
The key pattern here is creating a fresh DataLoader instance inside get_context. This means each incoming request gets its own loader with its own cache. If you create the DataLoader outside of the context (for example, at module level), its cache will persist across requests. That leads to stale data and potential security issues where one user's data leaks into another user's response.
Strawberry also supports load_many for fetching multiple keys at once: [user_a, user_b] = await loader.load_many(["1", "2"]). This is useful when a resolver already has a list of IDs and wants to batch them in a single call.
When the batch function encounters a key that does not exist, you can return an Exception object in that position instead of None. The DataLoader will then raise that exception when the caller awaits the result for that specific key, while other keys in the same batch resolve normally.
Custom Cache Implementations
Strawberry's DataLoader accepts a cache_map argument that lets you swap in a custom caching layer. The custom cache must implement the AbstractCache interface with get, set, delete, and clear methods. This is useful if you want to integrate with an external caching system like Redis or add TTL-based expiration:
from typing import Any, Union

from strawberry.dataloader import DataLoader, AbstractCache


class RedisCache(AbstractCache):
    def __init__(self, redis_client):
        self.redis = redis_client

    def get(self, key: Any) -> Union[Any, None]:
        return self.redis.get(f"user:{key}")

    def set(self, key: Any, value: Any) -> None:
        # Note: Redis stores bytes/strings, so a real implementation must
        # serialize values before storing and deserialize them on read.
        self.redis.set(f"user:{key}", value, ex=300)

    def delete(self, key: Any) -> None:
        self.redis.delete(f"user:{key}")

    def clear(self) -> None:
        # Clear all keys matching the pattern
        for key in self.redis.scan_iter("user:*"):
            self.redis.delete(key)


# Use the custom cache with the DataLoader
loader = DataLoader(load_fn=load_users, cache_map=RedisCache(redis_client))
DataLoader With Graphene and aiodataloader
Graphene is another popular Python GraphQL framework. Unlike Strawberry, Graphene does not ship with a built-in DataLoader. Instead, it relies on the aiodataloader package, which is a standalone asyncio-based port of the original JavaScript DataLoader library.
Install it with pip install aiodataloader, then create a loader by subclassing DataLoader and implementing the batch_load_fn method:
from aiodataloader import DataLoader


class UserLoader(DataLoader):
    async def batch_load_fn(self, keys):
        # Fetch all users in one query
        users = await fetch_users_by_ids(keys)
        user_map = {user.id: user for user in users}
        return [user_map.get(key) for key in keys]


# Create a loader instance per request
user_loader = UserLoader()

# Use it in resolvers
user = await user_loader.load(42)
The aiodataloader library follows the same contract as the original JavaScript implementation. It collects all .load() calls that happen within a single tick of the asyncio event loop, then dispatches them together as a single batch. The library also supports cache_key_fn for custom key hashing, cache_map for external caching, and methods like .prime() for pre-populating the cache and .clear() for invalidating entries.
Cross-Priming Between Loaders
When an entity can be looked up by multiple fields (for example, by ID or by username), you can prime one loader's cache from the results of another. This prevents duplicate fetches across different access patterns:
async def user_by_id_batch_fn(ids):
    users = await fetch_users_by_ids(ids)
    for user in users:
        # Prime the username loader's cache
        username_loader.prime(user.username, user)
    # Re-order to match the input ids (the database does not guarantee order)
    user_map = {user.id: user for user in users}
    return [user_map.get(id_) for id_ in ids]


user_by_id_loader = DataLoader(user_by_id_batch_fn)


async def username_batch_fn(names):
    users = await fetch_users_by_usernames(names)
    for user in users:
        # Prime the ID loader's cache
        user_by_id_loader.prime(user.id, user)
    # Re-order to match the input usernames
    user_map = {user.username: user for user in users}
    return [user_map.get(name) for name in names]


username_loader = DataLoader(username_batch_fn)
Synchronous DataLoaders for Django
One challenge with standard DataLoader implementations is that they require an async execution environment. Many Django projects run synchronously, and wrapping everything in async/await is not always practical. The graphql-sync-dataloaders package solves this by providing a SyncDataLoader class that works in synchronous resolvers.
from typing import List

from graphql_sync_dataloaders import SyncDataLoader

from myapp.models import User


def load_users(keys: List[int]) -> List[User]:
    qs = User.objects.filter(id__in=keys)
    user_map = {user.id: user for user in qs}
    return [user_map.get(key) for key in keys]


user_loader = SyncDataLoader(load_users)
This library works with both Graphene-Django and Strawberry. When using it with Strawberry, you pass DeferredExecutionContext as the execution_context_class argument to your schema:
import strawberry
from graphql_sync_dataloaders import DeferredExecutionContext

schema = strawberry.Schema(
    query=Query,
    execution_context_class=DeferredExecutionContext,
)
The SyncDataLoader.load method returns a SyncFuture object rather than an awaitable. This future collects all load requests within the same execution phase and dispatches them as a batch once the GraphQL executor moves to the next resolution level. You can chain results using .then(), similar to how JavaScript Promises work.
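The deferred-value idea behind .then() chaining can be illustrated with a toy future. This MiniSyncFuture is a sketch of the pattern only, not graphql-sync-dataloaders' actual SyncFuture class:

```python
class MiniSyncFuture:
    """Toy sketch of a deferred value with .then() chaining."""

    def __init__(self):
        self._done = False
        self._value = None
        self._callbacks = []

    def then(self, fn):
        """Return a new future that resolves with fn(value) once this one resolves."""
        nxt = MiniSyncFuture()

        def run(value):
            nxt.resolve(fn(value))

        if self._done:
            run(self._value)       # already resolved: run the callback immediately
        else:
            self._callbacks.append(run)  # still pending: defer until resolve()
        return nxt

    def resolve(self, value):
        self._done = True
        self._value = value
        for callback in self._callbacks:
            callback(value)
        self._callbacks = []
```

A resolver can attach work to the pending value with .then(); the chain only runs once the batch dispatch resolves the future, which is what lets the executor gather a whole resolution level before any database call happens.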
The graphql-sync-dataloaders package is especially useful for teams migrating a large Django codebase to GraphQL. It lets you adopt DataLoader patterns without converting your entire application to async.
Common Pitfalls and Best Practices
Create New Instances Per Request
The single most important rule with DataLoaders is to create a new instance for every incoming request. If a DataLoader instance is shared across requests, its cache will serve stale data and potentially leak information between users. The standard pattern is to attach loaders to the GraphQL context object, which is rebuilt for each request.
Maintain Key-Value Order
The batch function must return results in the exact same order as the input keys. A common mistake is to return query results directly from a database without re-ordering them. Databases do not guarantee that an IN query returns rows in the same order as the input list. Always build a lookup dictionary and map the keys back in order:
async def batch_load_fn(keys):
    results = await db.fetch_many(keys)
    # Build a map for O(1) lookup
    result_map = {r.id: r for r in results}
    # Return in key order, with None for missing keys
    return [result_map.get(key) for key in keys]
Handle Missing Keys Gracefully
If a key does not exist in the data source, the batch function should return None in that position—not skip it. Skipping a key would misalign the results and cause data to be assigned to the wrong requestor. Alternatively, you can return an Exception instance in that position if you want the corresponding .load() call to raise an error.
Avoid Overly Broad Batch Functions
Each DataLoader instance should handle a single type of data access pattern. Do not try to build one loader that fetches users, posts, and comments. Instead, create separate loaders like user_loader, post_loader, and comment_loader. This keeps batch functions simple and maintainable.
DataLoader Is Not a Persistent Cache
A DataLoader's built-in cache is scoped to a single request. It is not a replacement for application-level caching with tools like Redis or Memcached. Think of it as request-level deduplication rather than long-term storage.
Verify With Query Counts
Use Django's django.db.connection.queries or SQLAlchemy's query logging to verify that your DataLoader is working. Before adding a loader, count the queries for a typical nested query. After adding it, the count should drop dramatically. If it does not, check that your batch function is receiving all expected keys.
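Outside Django, the same before-and-after check works with any statement trace. Here is a self-contained sqlite3 sketch (the users table and data are made up) that counts statements for an N+1 access pattern versus a batched one:

```python
import sqlite3

def count_queries(conn, fn):
    """Run fn and return how many SQL statements the connection executed."""
    statements = []
    conn.set_trace_callback(statements.append)
    fn()
    conn.set_trace_callback(None)
    return len(statements)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])

# N+1 style: one SELECT per id
n_plus_one = count_queries(conn, lambda: [
    conn.execute("SELECT name FROM users WHERE id = ?", (i,)).fetchone()
    for i in (1, 2, 3)
])

# Batched: a single IN query
batched = count_queries(conn, lambda: conn.execute(
    "SELECT name FROM users WHERE id IN (?, ?, ?)", (1, 2, 3)
).fetchall())
```

Here n_plus_one is 3 and batched is 1; after wiring up a DataLoader, your production query counts should show the same kind of drop.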
Key Takeaways
- The N+1 problem is GraphQL's most common performance trap. Each field resolves independently, so nested queries can multiply database calls without any visible errors.
- DataLoader batches and caches within a single request. It collects all individual `.load()` calls from one event loop tick and passes them to a single batch function, reducing N queries to one.
- Strawberry includes a DataLoader out of the box. Import it from `strawberry.dataloader` and attach instances to your request context. It supports custom caches, error handling per key, and `load_many` for bulk operations.
- Graphene uses aiodataloader. Subclass `DataLoader`, implement `batch_load_fn`, and use cross-priming to avoid duplicate lookups across different access patterns.
- Synchronous Django projects have graphql-sync-dataloaders. It provides a `SyncDataLoader` class that works without async, making it straightforward to adopt in existing Django codebases.
- Always create new DataLoader instances per request. Shared instances leak cached data between users. The batch function must return results in key order with `None` for missing entries.
DataLoader is a small utility with a large impact on GraphQL performance. Whether you are building with Strawberry, Graphene, or a hybrid Django setup, integrating DataLoaders into your resolvers is one of the highest-value optimizations you can make. Start by identifying the resolvers that hit the database inside loops, wrap them in a batch function, and watch your query count drop.