Python's @dataclass decorator, introduced in Python 3.7 via PEP 557, eliminates the repetitive work of writing __init__, __repr__, and __eq__ for every class you create to hold data. You annotate your fields, apply the decorator, and Python generates the rest — with enough configuration options to cover frozen instances, memory-efficient slots, keyword-only construction, init-only variables (InitVar), and post-initialization validation.
One pattern shows up constantly across every Python codebase: classes that exist primarily to hold and pass around structured data. User records, API responses, configuration objects, domain events — all of them need an __init__ to accept arguments, a __repr__ so they print usefully during debugging, and an __eq__ so two instances with the same field values compare as equal. Before Python 3.7, writing all of that by hand for every data-holding class was the price of admission. These days, if you are working through python tutorials or building production systems, @dataclass is frequently the right answer — provided you understand what it generates and what it does not.
The Problem @dataclass Solves#
Consider a class that stores basic information about a software package. Without @dataclass, a correct and useful implementation looks like this:
class Package:
    def __init__(self, name: str, version: str, size_kb: float):
        self.name = name
        self.version = version
        self.size_kb = size_kb

    def __repr__(self):
        return (
            f"Package(name={self.name!r}, "
            f"version={self.version!r}, "
            f"size_kb={self.size_kb!r})"
        )

    def __eq__(self, other):
        if not isinstance(other, Package):
            return NotImplemented
        return (
            self.name == other.name
            and self.version == other.version
            and self.size_kb == other.size_kb
        )
Every field is written four times: once in __init__'s signature, once in the assignment, once in __repr__, and once in __eq__. Now add a fourth field. Or a fifth. Or refactor a field name. Every change cascades through three separate method bodies, and it is easy to miss one.
With @dataclass, the same class reduces to this:
from dataclasses import dataclass

@dataclass
class Package:
    name: str
    version: str
    size_kb: float
That is the entire class. Python reads the PEP 526 type annotations, identifies them as fields, and generates __init__, __repr__, and __eq__ automatically. The decorator returns the original class — no new class is created, no metaclass is involved, and nothing prevents you from inheriting from it, adding your own methods, or using it as you would any other Python class.
Type annotations in dataclasses are required for field discovery but are not enforced at runtime. Passing size_kb="big" will not raise a TypeError from the dataclass machinery. If you need runtime type coercion or validation, see the section on __post_init__ or the comparison with Pydantic later in this article.
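To make that concrete, here is a minimal sketch (reusing the Package class from above) showing that a wrong type passes through silently:

```python
from dataclasses import dataclass

@dataclass
class Package:
    name: str
    version: str
    size_kb: float

# The annotation says float, but nothing checks it at runtime
bad = Package("leftpad", "1.0.0", size_kb="big")
print(type(bad.size_kb))  # <class 'str'> — no TypeError from the dataclass
```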
How the Decorator Works Under the Hood#
When Python executes the @dataclass line, it inspects the class's __annotations__ dictionary — the ordered mapping that PEP 526 guarantees is populated in declaration order. Every annotated name that is not a ClassVar or InitVar is treated as a field. The decorator then constructs method source code dynamically using those field names and types, compiles it, and attaches the resulting functions to the class.
The generated __init__ accepts all fields as parameters, in the order they are declared. Fields with default values or field(default=...) become optional parameters and must appear after fields without defaults — the same constraint that applies to regular function signatures. Fields annotated as ClassVar[T] are excluded entirely; they remain class-level attributes and are never treated as instance fields.
That last point is worth making concrete, because ClassVar is easy to misread as just a type hint rather than a signal to the decorator. Here is what the distinction looks like in practice:
from dataclasses import dataclass
from typing import ClassVar

@dataclass
class Sensor:
    # ClassVar — shared across all instances, excluded from __init__ and __repr__
    unit_system: ClassVar[str] = "metric"
    # Normal instance fields — included in generated methods
    name: str
    reading: float
s1 = Sensor("temp_01", 21.5)
s2 = Sensor("pressure_01", 101.3)
print(s1) # Sensor(name='temp_01', reading=21.5) — unit_system absent
print(s1 == s2) # False — only instance fields are compared
Sensor.unit_system = "imperial"
print(s2.unit_system) # imperial — class-level change affects all instances
Annotating a field as ClassVar tells both @dataclass and static type checkers like mypy that the attribute belongs to the class, not the instance. The decorator skips it entirely — it will not appear in __init__, __repr__, or __eq__. If you annotate a class counter, a shared default configuration, or a registry dictionary and forget to use ClassVar, the decorator will pull it into the constructor signature, which is rarely what you want.
"Data Classes can be thought of as mutable namedtuples with defaults." — PEP 557, Eric V. Smith (Python Software Foundation, peps.python.org)
The generated __repr__ produces output in the form ClassName(field1=value1, field2=value2, ...), which makes instances immediately readable in a REPL or log output. The generated __eq__ compares two instances of the same type by comparing their fields as an ordered tuple — if either operand is a different type, it returns NotImplemented rather than False, which is correct Python equality protocol.
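Both behaviors are easy to verify in a few lines. The sketch below reuses a trimmed Package class; the cross-type comparison returns NotImplemented internally, so Python falls back to the other operand and the expression ends up False:

```python
from dataclasses import dataclass

@dataclass
class Package:
    name: str
    version: str

a = Package("requests", "2.31.0")
b = Package("requests", "2.31.0")
print(a == b)  # True — same type, fields compared as a tuple

# Against a different type, the generated __eq__ returns NotImplemented;
# the tuple's __eq__ also declines, so the comparison resolves to False
print(a == ("requests", "2.31.0"))  # False
```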
Understanding the fundamentals of Python classes and objects makes the dataclass decorator's behavior much easier to reason about, since everything it produces is equivalent to code you could write yourself.
Decorator Parameters: The Full Signature#
Used without parentheses, @dataclass applies all defaults. With parentheses, you can adjust which methods get generated and how the class behaves. As of Python 3.11, the complete signature is:
@dataclass(
    init=True,
    repr=True,
    eq=True,
    order=False,
    unsafe_hash=False,
    frozen=False,
    match_args=True,    # Python 3.10+
    kw_only=False,      # Python 3.10+
    slots=False,        # Python 3.10+
    weakref_slot=False  # Python 3.11+
)
class MyClass:
    ...
The most consequential parameters are these:
| Parameter | Default | What It Controls |
|---|---|---|
| frozen | False | Adds __setattr__ and __delattr__ that raise FrozenInstanceError, emulating immutability. Also sets __hash__ based on the fields used in comparisons. |
| order | False | Generates __lt__, __le__, __gt__, __ge__ by comparing fields as a tuple in declaration order. Requires eq=True. |
| eq | True | Generates __eq__. When frozen=False and unsafe_hash=False, this also sets __hash__ to None, making instances unhashable by default. |
| slots | False (3.10+) | Creates a new class with __slots__ set to the field names. Reduces per-instance memory and speeds up attribute access. |
| kw_only | False (3.10+) | Marks every field as keyword-only in the generated __init__, preventing positional argument ambiguity. |
| unsafe_hash | False | Forces generation of __hash__ even when eq=True and frozen=False. Use with care: mutating a hashed instance leads to undefined behavior in sets and dicts. |
The order=True parameter deserves a practical example, because its behavior — comparing fields as a tuple in declaration order — is exactly what makes it useful for sortable value objects and exactly what makes it dangerous if your field order does not match your intended sort semantics:
from dataclasses import dataclass

@dataclass(order=True)
class Version:
    major: int
    minor: int
    patch: int
versions = [Version(1, 10, 0), Version(2, 0, 0), Version(1, 9, 3)]
print(sorted(versions))
# [Version(major=1, minor=9, patch=3),
# Version(major=1, minor=10, patch=0),
# Version(major=2, minor=0, patch=0)]
print(Version(1, 9, 3) < Version(1, 10, 0)) # True
print(Version(2, 0, 0) > Version(1, 99, 99)) # True
Comparison works by evaluating (self.major, self.minor, self.patch) < (other.major, other.minor, other.patch) — a straightforward tuple comparison. If you later reorder the fields (say, to put patch first for display reasons), the sort order silently changes with them. For anything more nuanced than field-tuple ordering, define __lt__ manually and leave order=False.
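When the sort semantics do not follow declaration order, a hand-written __lt__ is the safer route. The sketch below uses a hypothetical Release class that orders by download count only; functools.total_ordering fills in the remaining comparison methods from the hand-written __lt__ and the generated __eq__:

```python
import functools
from dataclasses import dataclass

@functools.total_ordering
@dataclass
class Release:
    name: str
    downloads: int

    # Order by downloads only — field declaration order is irrelevant
    def __lt__(self, other):
        if not isinstance(other, Release):
            return NotImplemented
        return self.downloads < other.downloads

releases = [Release("beta", 120), Release("stable", 9800), Release("rc", 640)]
print(min(releases).name)  # beta
print(max(releases).name)  # stable
```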
By default, @dataclass sets eq=True, which generates __eq__. Python's data model rule is that if a class defines __eq__, its __hash__ is set to None unless you explicitly provide one — making instances unhashable. This means you cannot use a plain mutable dataclass as a dictionary key or put it in a set:
@dataclass
class Tag:
    name: str

t = Tag("python")
{t}  # TypeError: unhashable type: 'Tag'

# Fix 1: frozen=True generates a correct __hash__ automatically
@dataclass(frozen=True)
class Tag:
    name: str

# Fix 2: unsafe_hash=True forces __hash__ generation on a mutable class
# Use only if you can guarantee instances are not mutated while in a set or dict
@dataclass(unsafe_hash=True)
class Tag:
    name: str
The frozen=True route is the right answer for value objects. unsafe_hash=True exists for rare cases where you need hashability but cannot make the class immutable — the name "unsafe" is deliberate.
A common pattern for value objects — things like money amounts, coordinates, or identifiers — is @dataclass(frozen=True). Frozen instances are hashable, safe to use as dictionary keys, and clearly communicate that two instances with the same fields are semantically identical and should not be mutated after creation.
from dataclasses import dataclass

@dataclass(frozen=True)
class Money:
    amount: int  # in cents
    currency: str = "USD"
price = Money(1999, "USD")
price.amount = 2099 # raises FrozenInstanceError
# Usable as a dict key because __hash__ is generated
ledger = {price: "product_001"}
field() — Fine-Grained Control Over Each Attribute#
The field() function, imported alongside dataclass, lets you configure individual fields when the class-level annotation syntax is not enough. Its most important parameters are default, default_factory, repr, compare, hash, init, and metadata.
The single most common use of field() is providing mutable default values. Because Python evaluates default argument values once at function definition time, sharing a mutable default like [] across instances creates the classic shared-list bug. The default_factory parameter solves this correctly:
from dataclasses import dataclass, field

@dataclass
class ShoppingCart:
    owner: str
    items: list[str] = field(default_factory=list)
# Each instance gets its own fresh list — not a shared one
cart1 = ShoppingCart("alice")
cart2 = ShoppingCart("bob")
cart1.items.append("keyboard")
print(cart2.items) # [] — unaffected
Assigning a mutable default value directly — for example, items: list[str] = [] — raises a ValueError at class definition time. Python's dataclass machinery detects this and refuses to proceed. Use field(default_factory=list) instead.
Other useful field() options include:
- repr=False — excludes the field from the generated __repr__. Useful for sensitive data like passwords or tokens.
- compare=False — excludes the field from __eq__ and any ordering methods. Useful for metadata fields like created_at timestamps that should not affect logical equality.
- init=False — excludes the field from __init__ entirely. The field must be set in __post_init__ or given a default/default_factory.
- metadata — an arbitrary mapping you can attach to the field for inspection by third-party tools, serialization libraries, or your own introspection code via dataclasses.fields().
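A short sketch combining the first two options (the ApiUser class and its fields are hypothetical):

```python
import time
from dataclasses import dataclass, field

@dataclass
class ApiUser:
    username: str
    # Sensitive value: keep it out of the generated __repr__
    token: str = field(repr=False)
    # Bookkeeping value: exclude from both __eq__ and __repr__
    created_at: float = field(default_factory=time.time, compare=False, repr=False)

u1 = ApiUser("alice", "secret-token")
u2 = ApiUser("alice", "secret-token")
print(repr(u1))  # ApiUser(username='alice')
print(u1 == u2)  # True, regardless of the created_at timestamps
```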
__post_init__: Validation and Derived Fields#
The generated __init__ calls self.__post_init__() after all fields are set, if the method exists. This is where you add validation logic, normalize inputs, or compute derived fields that depend on the constructor arguments.
from dataclasses import dataclass, field

@dataclass
class BoundingBox:
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    area: float = field(init=False, repr=False)

    def __post_init__(self):
        if self.x_max <= self.x_min or self.y_max <= self.y_min:
            raise ValueError(
                f"Invalid bounding box: ({self.x_min},{self.y_min}) "
                f"to ({self.x_max},{self.y_max})"
            )
        # Derived field: computed once at construction time
        self.area = (self.x_max - self.x_min) * (self.y_max - self.y_min)
box = BoundingBox(0.0, 0.0, 100.0, 50.0)
print(box.area) # 5000.0
print(box) # BoundingBox(x_min=0.0, y_min=0.0, x_max=100.0, y_max=50.0)
Note that area is declared with field(init=False, repr=False): it is excluded from the constructor signature (it cannot be passed in) and from the repr (it is derivable from the other fields, so including it would be noise). Its value is then set in __post_init__.
For frozen=True dataclasses, __post_init__ must use object.__setattr__(self, 'field_name', value) to set fields, since the normal assignment syntax will raise FrozenInstanceError — even inside __post_init__ itself.
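A minimal sketch of that pattern, using a hypothetical Circle class with a derived area field:

```python
import math
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Circle:
    radius: float
    area: float = field(init=False, repr=False)

    def __post_init__(self):
        # self.area = ... would raise FrozenInstanceError even here,
        # so bypass the frozen __setattr__ explicitly
        object.__setattr__(self, "area", math.pi * self.radius ** 2)

c = Circle(2.0)
print(c.area)  # 12.566370614359172
```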
Keep __post_init__ fast and focused. It runs on every instantiation, so expensive operations — database lookups, network calls, file I/O — belong in factory classmethods, not here. For more on how @classmethod works with factory patterns, see the dedicated article in this series.
InitVar: Parameters That Only Exist at Initialization Time
There is a related feature that fills a gap __post_init__ alone cannot: InitVar[T]. When you annotate a field with InitVar[T], the generated __init__ includes that name as a parameter, passes it to __post_init__ as an argument, but does not store it as an instance attribute. This is the correct pattern for passing constructor arguments that are only needed during initialization — database connections, configuration objects, raw input that gets transformed — without leaking those values as persistent state on the instance.
import hashlib
from dataclasses import dataclass, field, InitVar

@dataclass
class HashedPassword:
    username: str
    # raw_password is passed to __init__ but never stored on the instance
    raw_password: InitVar[str]
    password_hash: str = field(init=False, repr=False)

    def __post_init__(self, raw_password: str):
        # raw_password arrives here as an argument, not self.raw_password
        self.password_hash = hashlib.sha256(raw_password.encode()).hexdigest()
user = HashedPassword("alice", "hunter2")
print(user.username) # alice
print(user.password_hash) # sha256 hex string
# user.raw_password → AttributeError — it was never stored
This pattern keeps sensitive constructor inputs from persisting on the object and appearing in __repr__ output or serialization. It is documented in the Python standard library reference for init-only variables (docs.python.org) and is substantially underused compared to __post_init__ alone.
dataclasses.replace() works by calling __init__ with the current field values plus your overrides. This means fields marked init=False are not carried over automatically — they are recomputed by __post_init__ on the new instance. If your init=False field is expensive to compute or has side effects, be aware that replace() triggers that computation again. The official documentation notes this behavior explicitly: "It is expected that init=False fields will be rarely and judiciously used" (Python docs, dataclasses.replace).
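The recomputation is easy to observe. In the sketch below (hypothetical Report class), a module-level list records every __post_init__ call, and replace() triggers a second one:

```python
from dataclasses import dataclass, field, replace

calls = []

@dataclass
class Report:
    title: str
    slug: str = field(init=False)

    def __post_init__(self):
        calls.append(self.title)  # track each recomputation
        self.slug = self.title.lower().replace(" ", "-")

r1 = Report("Quarterly Results")
r2 = replace(r1, title="Annual Results")  # runs __init__ again, so __post_init__ too
print(r2.slug)  # annual-results
print(calls)    # ['Quarterly Results', 'Annual Results']
```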
Python 3.10+ Features: slots, kw_only, and match_args#
Python 3.10 delivered the two most-requested dataclass features: slots support and keyword-only fields. These bring standard-library dataclasses closer to what the third-party attrs library had offered for years.
slots=True
Normal Python objects store instance attributes in a per-instance dictionary (__dict__). That dictionary has overhead — both in memory and in attribute lookup speed. When you set slots=True, the decorator generates a new class with __slots__ defined as the field names, eliminating __dict__ entirely. Published benchmarks — including those from Real Python's dataclass guide (realpython.com) and independent profiling on Towards Data Science — put the attribute-access speedup somewhere in the 20–55% range depending on the benchmark setup, hardware, and Python version. The memory savings are equally concrete: a three-field slot class measured by Real Python using Pympler consumed 248 bytes per instance versus 440 bytes for its non-slot equivalent — a reduction of roughly 44%. These gains compound across large numbers of instances, which is where slots=True pays the biggest dividend.
from dataclasses import dataclass

@dataclass(slots=True)
class Point:
    x: float
    y: float
p = Point(1.0, 2.0)
# p.__dict__ raises AttributeError — no instance dict
# p.z = 3.0 would also raise AttributeError
There is an important implementation detail here: because __slots__ must be defined at class creation time and @dataclass runs after the class body is parsed, the decorator cannot modify the original class in place. Instead it creates a new class with the computed __slots__ and returns that. The original class is discarded. This means the class returned by @dataclass(slots=True) is technically a different object than the one that appeared in the source, though from user code it is indistinguishable.
kw_only=True
With many fields, positional argument order becomes a maintenance liability. If you add a field in the middle of a dataclass, every call site that uses positional arguments breaks. Setting kw_only=True at the class level forces every field to be passed by keyword:
from dataclasses import dataclass

@dataclass(kw_only=True)
class ServerConfig:
    host: str
    port: int
    timeout_seconds: float = 30.0
    max_connections: int = 100
# Must use keyword arguments — positional args raise TypeError
cfg = ServerConfig(host="0.0.0.0", port=8080)
# ServerConfig(host='0.0.0.0', port=8080, timeout_seconds=30.0, max_connections=100)
You can also selectively mark individual fields as keyword-only using the KW_ONLY sentinel. Any field declared after a _: KW_ONLY annotation becomes keyword-only, while fields before it remain positional:
from dataclasses import dataclass, KW_ONLY

@dataclass
class LogEntry:
    message: str      # positional
    level: str        # positional
    _: KW_ONLY
    timestamp: float  # keyword-only
    source: str = ""  # keyword-only
entry = LogEntry("disk full", "ERROR", timestamp=1711929600.0)
The kw_only feature also elegantly resolves one of the most awkward inheritance problems in dataclasses: when a base class has fields with defaults and a subclass needs to add fields without defaults, the generated __init__ would be invalid under Python's signature rules (non-default parameters cannot follow default parameters). Making all fields keyword-only removes that constraint entirely.
match_args and Structural Pattern Matching
Also added in Python 3.10, match_args=True (the default) generates a __match_args__ tuple containing the field names in declaration order. This enables clean structural pattern matching against dataclass instances:
from dataclasses import dataclass

@dataclass
class Event:
    kind: str
    payload: dict

def handle(event: Event):
    match event:
        case Event(kind="login", payload={"user_id": uid}):
            print(f"User {uid} logged in")
        case Event(kind="logout"):
            print("User logged out")
        case _:
            print(f"Unhandled event: {event.kind}")
Inheritance with Dataclasses#
Dataclasses participate in normal Python inheritance. When the decorator processes a class, it first collects fields from all dataclass base classes in reverse MRO order, then appends the current class's own fields. The combined field list determines the generated method signatures.
The most important constraint: a field with a default value in a base class means all subsequent fields — including those in subclasses — must also have defaults. Without kw_only=True, this makes it impossible to add required fields in a subclass of a dataclass that already has optional fields. The keyword-only approach resolves this. For a broader look at when to choose composition over inheritance in Python, that article covers the tradeoffs that apply here as much as anywhere else in OOP design.
from dataclasses import dataclass

@dataclass(kw_only=True)
class Animal:
    name: str
    species: str = "unknown"

@dataclass(kw_only=True)
class Pet(Animal):
    owner: str  # required, no default — valid because kw_only
    vaccinated: bool = False
pet = Pet(name="Mochi", owner="dana", species="cat")
print(pet)
# Pet(name='Mochi', species='cat', owner='dana', vaccinated=False)
You can also use dataclasses.replace() to create a shallow copy of a dataclass instance with selected fields overridden, without mutation:
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Config:
    host: str
    port: int
    debug: bool = False
base = Config("localhost", 5432)
test = replace(base, host="testdb", debug=True)
# Config(host='testdb', port=5432, debug=True)
# base is unchanged
@dataclass vs Alternatives: Choosing the Right Tool#
Before reaching for @dataclass, it helps to understand where it fits relative to the other structured-data options in Python's ecosystem.
| Option | Runtime Validation | Standard Library | Mutable | Hashable by Default | Best For |
|---|---|---|---|---|---|
| @dataclass | No (annotations only) | Yes (3.7+) | Yes (default) | No (unless frozen) | Internal domain models, DTOs, value objects with frozen=True |
| typing.NamedTuple | No | Yes | No | Yes | Immutable records that must unpack like a tuple or work in positional APIs |
| TypedDict | No | Yes (3.8+) | Yes | N/A (it is a dict) | Dict-shaped data meant for JSON passthrough or typed **kwargs; no methods, no __init__ |
| Pydantic BaseModel | Yes — coerces and validates every field | No (third-party) | Configurable | No (unless frozen=True) | API request/response bodies, config files, CLI input — any boundary where data types cannot be trusted |
| Pydantic dataclasses | Yes — same validation engine as BaseModel | No (third-party) | Yes (default) | No (unless frozen) | Drop-in for stdlib @dataclass when you want runtime validation without migrating to BaseModel's full API surface |
| attrs | Via per-field validators (attrs.validators) | No (third-party) | Configurable | Configurable | Complex validation and conversion pipelines; predates and inspired the stdlib @dataclass |
| msgspec.Struct | Yes — fast, zero-copy JSON/MessagePack decoding with type enforcement | No (third-party) | Yes (default) | No (unless frozen=True) | High-throughput serialization paths where Pydantic's flexibility is more than needed and raw speed matters |
| types.SimpleNamespace | No | Yes | Yes | No | Throwaway attribute containers in tests or short scripts where a full class definition is overkill |
The surface-level framing — "use @dataclass for internal data, use Pydantic for external" — is accurate but undersells several real decision points.
The first is the distinction between TypedDict and @dataclass when interoperability with plain dicts matters. A TypedDict is not a class you instantiate — it is a structural type annotation over a regular Python dict. That means you cannot add methods to it, it produces no __repr__, and it carries no runtime overhead. If your data needs to round-trip through json.dumps(), merge with other dicts, or get passed as **kwargs, TypedDict stays out of the way. A dataclass in that same path requires you to call dataclasses.asdict() first. Choosing between them is not just about "which is cleaner" — it is about whether you want an object or a dict at runtime.
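A short sketch of that difference, using hypothetical UserRow and UserRecord types:

```python
import json
from dataclasses import dataclass, asdict
from typing import TypedDict

class UserRow(TypedDict):
    id: int
    email: str

@dataclass
class UserRecord:
    id: int
    email: str

# A TypedDict instance IS a plain dict at runtime: no conversion needed
row: UserRow = {"id": 1, "email": "a@example.com"}
print(json.dumps(row))

# A dataclass instance must be converted before JSON serialization
rec = UserRecord(1, "a@example.com")
print(json.dumps(asdict(rec)))

# Both print: {"id": 1, "email": "a@example.com"}
```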
The second underappreciated distinction is between typing.NamedTuple and @dataclass(frozen=True). Both give you immutable, hashable, annotated data containers, and on paper they look interchangeable. In practice, NamedTuple instances also unpack positionally like tuples — x, y = point works, isinstance(point, tuple) returns True, and indexing by position (point[0]) is valid. That makes NamedTuple the right choice when your code hands the object to something that expects a sequence, including csv.writer, sqlite3 row values, or any C-level API that iterates positionally. Frozen dataclasses are not tuples and do not unpack that way.
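The behavioral gap is visible in a few lines (PointNT and PointDC are hypothetical names):

```python
from dataclasses import dataclass
from typing import NamedTuple

class PointNT(NamedTuple):
    x: float
    y: float

@dataclass(frozen=True)
class PointDC:
    x: float
    y: float

nt = PointNT(1.0, 2.0)
x, y = nt                     # tuple unpacking works
print(isinstance(nt, tuple))  # True
print(nt[0])                  # 1.0 — positional indexing works

dc = PointDC(1.0, 2.0)
print(isinstance(dc, tuple))  # False
# x, y = dc                   # TypeError: a frozen dataclass is not iterable
```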
The third involves Pydantic's two modes: BaseModel and pydantic.dataclasses.dataclass. Pydantic ships a drop-in replacement for @dataclass that runs the same validation engine as BaseModel but keeps the standard dataclass API surface — meaning dataclasses.fields(), dataclasses.asdict(), and other stdlib introspection tools still work. This is useful when you are migrating an existing codebase that uses stdlib dataclasses and want to add validation without rewriting every class to inherit from BaseModel.
Finally, msgspec.Struct is worth knowing for high-throughput serialization work. It enforces types at runtime during JSON and MessagePack decoding, similar to Pydantic, but with significantly lower overhead per object — it achieves this by generating C-level struct layouts rather than Python attribute dictionaries. In scenarios where thousands of objects are decoded per second from a message queue or API stream, the difference is measurable. It is not a general-purpose replacement for @dataclass or Pydantic, but it occupies a real niche that neither covers well.
This distinction between internal and external data is also central to thinking about Python's type hints and dynamic typing more broadly.
The dataclasses module also provides several utility functions worth knowing:
- dataclasses.asdict(instance) — recursively converts a dataclass instance to a plain dictionary. Nested dataclasses are converted too.
- dataclasses.astuple(instance) — recursively converts to a tuple.
- dataclasses.fields(class_or_instance) — returns a tuple of Field objects, each carrying the field's name, type, default, metadata, and flags.
- dataclasses.is_dataclass(obj) — returns True if obj is a dataclass class or an instance of one.
- dataclasses.replace(instance, **changes) — returns a new instance with the specified fields replaced (respects frozen=True since it calls __init__ rather than mutating).
These are not just convenience aliases — they are the practical interface for working with dataclasses programmatically. asdict() is particularly useful for Python object serialization, because it handles nested dataclasses recursively without any extra work:
import json
from dataclasses import dataclass, field, asdict, astuple, fields

@dataclass
class Address:
    street: str
    city: str

@dataclass
class Employee:
    name: str
    address: Address
    skills: list = field(default_factory=list, metadata={"source": "hr_system"})
emp = Employee("Dana", Address("1 Main St", "Springfield"), ["Python", "SQL"])
# asdict() recurses into nested dataclasses
print(json.dumps(asdict(emp), indent=2))
# {
# "name": "Dana",
# "address": {"street": "1 Main St", "city": "Springfield"},
# "skills": ["Python", "SQL"]
# }
# astuple() also recurses — useful for CSV row generation
print(astuple(emp))
# ('Dana', ('1 Main St', 'Springfield'), ['Python', 'SQL'])
# fields() gives you programmatic access to field-level configuration
for f in fields(emp):
    print(f.name, f.metadata)
# name {}
# address {}
# skills {'source': 'hr_system'}
The metadata parameter on field() — shown here on skills — is an immutable mapping you can use to attach arbitrary annotations to a field without affecting behavior. Serialization libraries, schema generators, and documentation tools that build on top of dataclasses frequently use this to store things like alias names, validation rules, or UI labels. You read it back via dataclasses.fields(), which returns Field objects that expose the metadata mapping directly. If you find yourself inspecting field metadata programmatically, the class-level __dataclass_fields__ attribute — a dictionary mapping field names to their Field objects — is the same data accessible without the function call, and you will encounter it frequently when debugging or writing tools that consume dataclass definitions.
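A quick sketch of that introspection path, with a hypothetical Column class carrying metadata:

```python
from dataclasses import dataclass, field

@dataclass
class Column:
    name: str
    sql_type: str = field(metadata={"alias": "type"})

# __dataclass_fields__ maps field names to their Field objects —
# the same data fields() returns, without the function call
for fname, f in Column.__dataclass_fields__.items():
    print(fname, dict(f.metadata))
# name {}
# sql_type {'alias': 'type'}
```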
If you are working with Python attributes in depth, the fields() function is particularly useful: it gives you programmatic access to every field's configuration, which powers serialization libraries, schema generators, and documentation tools that build on top of dataclasses.
Frequently Asked Questions#
What does the @dataclass decorator actually do?
The @dataclass decorator, introduced in Python 3.7 via PEP 557, inspects a class's type-annotated fields and automatically generates __init__, __repr__, and __eq__ methods. With additional parameters it can also generate ordering methods (__lt__, __le__, etc.), enforce immutability (frozen=True), eliminate the per-instance __dict__ for memory efficiency (slots=True), and require keyword-only construction (kw_only=True).
Why can't I use a list or dict as a default value in a dataclass?
Assigning a mutable object like [] or {} directly as a field default raises a ValueError at class definition time. The @dataclass machinery detects this because sharing a single mutable default across all instances would cause the classic shared-state bug. The correct fix is field(default_factory=list) or field(default_factory=dict), which calls the factory function once per new instance to produce an independent copy.
What is the difference between frozen=True and slots=True?
frozen=True emulates immutability by generating __setattr__ and __delattr__ methods that raise FrozenInstanceError on any assignment after construction; it also generates a __hash__ method so the instance can be used as a dictionary key. slots=True is a memory and performance optimization that replaces the per-instance __dict__ with a fixed-size slot array, reducing memory consumption and speeding up attribute access — but does not affect mutability. The two can be combined: @dataclass(frozen=True, slots=True) gives you both a hashable, immutable-style value object and the memory efficiency of slots.
What is InitVar in Python dataclasses?
InitVar[T] is a special annotation that instructs @dataclass to include a parameter in the generated __init__ signature but not store it as an instance attribute. Instead, the value is passed as a positional argument to __post_init__, where you can use it for validation, transformation, or computing derived fields without the raw input remaining accessible on the constructed object afterward.
When should I use Pydantic instead of @dataclass?
Use @dataclass for internal data structures where callers are trusted to pass the correct types — domain models, DTOs, configuration objects shared within your own codebase. Use Pydantic (BaseModel) when data crosses a trust boundary: HTTP request bodies, JSON config files, command-line input, or any situation where types must be coerced and validated at runtime rather than assumed. PEP 557 itself notes that dataclasses are not intended as a replacement for libraries that provide runtime type validation.
Does @dataclass enforce type annotations at runtime?
No. The @dataclass decorator reads type annotations to discover field names and generate method signatures, but it does not enforce those types at runtime. Passing a string where an int is annotated will not raise any error from the dataclass machinery. For runtime type coercion and validation, use __post_init__ with explicit checks, or switch to Pydantic, which validates and coerces every field at construction time.
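A small sketch of manual enforcement via `__post_init__` (the `Temperature` class is a hypothetical example):

```python
from dataclasses import dataclass

@dataclass
class Temperature:
    celsius: float

    def __post_init__(self):
        # @dataclass performs no runtime type checking, so do it explicitly
        if not isinstance(self.celsius, (int, float)):
            raise TypeError("celsius must be a number")

ok = Temperature(21.5)
try:
    Temperature("hot")  # rejected only because of the __post_init__ check
except TypeError:
    rejected = True
```

Remove the `__post_init__` method and `Temperature("hot")` constructs without complaint, annotation notwithstanding.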
Why is my dataclass unhashable?
The @dataclass decorator sets eq=True by default, which generates __eq__. Python's data model rule is that defining __eq__ causes __hash__ to be set to None unless a __hash__ is explicitly provided. This makes plain mutable dataclass instances unhashable — you cannot use them as dictionary keys or put them in sets. The fix is to use frozen=True, which generates a correct __hash__ automatically, or unsafe_hash=True if you need hashability on a mutable class and can guarantee the instance will not be mutated while stored in a set or dict.
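The hash behavior can be observed directly; the two classes below are minimal illustrations:

```python
from dataclasses import dataclass

@dataclass
class MutablePoint:   # eq=True (the default) sets __hash__ to None
    x: int

@dataclass(frozen=True)
class FrozenPoint:    # frozen=True restores a field-based __hash__
    x: int

try:
    {MutablePoint(1): "key"}  # TypeError: unhashable type
except TypeError:
    unhashable = True

cache = {FrozenPoint(1): "cached"}  # works: equal fields imply equal hashes
```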
What is ClassVar in a Python dataclass?
ClassVar[T] is a typing annotation that tells @dataclass to treat an annotated name as a class-level attribute rather than an instance field. The decorator skips ClassVar fields entirely — they do not appear in the generated __init__, __repr__, or __eq__. Without ClassVar, any annotated name in the class body is treated as an instance field and pulled into the constructor signature. Use ClassVar for shared class-level state like counters, registries, or default configuration objects.
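A sketch using a shared counter (the `Widget` class is a made-up example):

```python
from dataclasses import dataclass, fields
from typing import ClassVar

@dataclass
class Widget:
    instances: ClassVar[int] = 0  # shared class-level counter, not a field
    name: str

    def __post_init__(self):
        Widget.instances += 1     # mutates class state, not the instance

w1 = Widget("gauge")
w2 = Widget("dial")
field_names = [f.name for f in fields(Widget)]  # ClassVar is excluded
```

Note that `instances` never appears in the generated `__init__`, so the class still constructs with `name` alone, and `fields()` reports only the true instance fields.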
Key Takeaways#
- @dataclass inspects annotations, not types: Every name in `__annotations__` that is not a `ClassVar` or `InitVar` becomes a field. The types are used for generated method signatures and tooling, not for runtime enforcement.
- Use `field(default_factory=...)` for mutable defaults: Assigning a mutable object directly as a field default raises a `ValueError`. The `default_factory` parameter ensures each instance gets an independent copy.
- Use `InitVar[T]` for constructor-only parameters: When a value is needed during initialization but should not be stored as an instance attribute — raw passwords, temporary database handles, untransformed input — annotate it as `InitVar[T]` rather than relying on naming conventions or manual deletion in `__post_init__`.
- `frozen=True` gives you hashable value objects: Frozen dataclasses emulate immutability by raising `FrozenInstanceError` on assignment, and automatically generate a correct `__hash__` based on all fields. Combine with `slots=True` for maximum efficiency.
- `slots=True` and `kw_only=True` are Python 3.10+ quality-of-life upgrades: Slots reduce memory overhead and speed up attribute access by replacing the per-instance `__dict__` with a fixed-size slot array; keyword-only fields eliminate positional-argument ordering problems, particularly in inheritance hierarchies.
- @dataclass is not a validation layer: For data crossing trust boundaries, pair dataclasses with `__post_init__` validation or switch to Pydantic, which coerces and validates every field at runtime. PEP 557 itself explicitly states that dataclasses are not intended as a replacement for libraries requiring runtime type validation (peps.python.org/pep-0557).
- Use `ClassVar` to exclude class-level attributes from generated methods: Annotating a field with `ClassVar[T]` signals to both `@dataclass` and static type checkers that the attribute belongs to the class, not the instance. Without it, the decorator will pull shared class attributes into the constructor signature.
- `eq=True` makes instances unhashable by default: When `@dataclass` generates `__eq__`, Python sets `__hash__` to `None` unless you also use `frozen=True` or `unsafe_hash=True`. Plain mutable dataclass instances cannot be used as dictionary keys or set members until you address this explicitly.
The @dataclass decorator is one of the clearest examples of a decorator doing meaningful work: a single annotation transforms a plain class definition into a fully equipped data container with generated initialization, representation, equality, optional ordering, and, when configured, immutability and memory efficiency, all without inheritance, metaclasses, or repetitive boilerplate. The design goal, as stated directly in PEP 557, was to support static type checkers while staying out of the business of runtime validation, which is why knowing the boundaries of @dataclass (and where InitVar, __post_init__, and Pydantic each pick up the slack) matters as much as knowing the decorator's parameters. Those boundaries are also a useful lens on what Python decorators can and cannot do, making @dataclass a natural companion to the broader study of how Python decorators work from first principles.