Python's Descriptor Protocol: The Hidden Mechanism That Powers Almost Everything

Every time you use @property, call a method on an object, or use @staticmethod, you are relying on Python's descriptor protocol. It is one of the most powerful and least understood features baked into the language, and it has been quietly running the show since Python 2.2.

If you want to truly understand how Python works under the hood -- not just what it does, but why it does it -- you need to understand descriptors. This article will take you from zero to a working understanding of the full descriptor protocol. We will look at the official method signatures, the critical distinction between data and non-data descriptors, how Python's attribute lookup chain actually works, and build real-world examples that go far beyond toy demonstrations. Everything is connected back to the relevant PEPs, the official documentation, and the words of the people who designed this system.

What Is a Descriptor?

The Python documentation, in the Descriptor HowTo Guide authored by Raymond Hettinger, defines the concept with precision:

"Descriptors are a powerful, general purpose protocol. They are the mechanism behind properties, methods, static methods, class methods, and super(). They are used throughout Python itself. Descriptors simplify the underlying C code and offer a flexible set of new tools for everyday Python programs." — Descriptor HowTo Guide, Raymond Hettinger (docs.python.org/3/howto/descriptor.html)

At its core, a descriptor is any object that defines at least one of __get__, __set__, or __delete__. A fourth, optional hook, __set_name__, rounds out the protocol:

__get__(self, obj, objtype=None)   # Called on attribute access
__set__(self, obj, value)          # Called on attribute assignment
__delete__(self, obj)              # Called on attribute deletion
__set_name__(self, owner, name)    # Called at class creation time (Python 3.6+)

That is the entire protocol. If an object defines any of those first three methods and is stored as a class variable, Python will intercept normal attribute access and call the descriptor's method instead. The descriptor does not work when stored on an instance -- it must live on the class itself.
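To see the class-versus-instance rule in action, put the same kind of object in both places. The sketch below uses an illustrative class named Ten, a name chosen here rather than a standard API, whose __get__ always returns 10. Only the copy stored on the class triggers the protocol.

```python
class Ten:
    """Minimal non-data descriptor: attribute access always yields 10."""
    def __get__(self, obj, objtype=None):
        return 10

class OnClass:
    x = Ten()              # stored on the class: the descriptor protocol applies

class OnInstance:
    def __init__(self):
        self.x = Ten()     # stored on the instance: just an ordinary object

print(OnClass().x)         # 10
print(OnInstance().x)      # the Ten object itself, not 10
```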

The Origin Story: PEP 252 and Python 2.2

Descriptors were not an afterthought. They were introduced as part of a fundamental redesign of Python's type system in Python 2.2, formalized in PEP 252 -- Making Types Look More Like Classes, authored by Guido van Rossum. Before Python 2.2, there was a hard distinction between built-in types (like dict and list) and user-defined classes. You could not subclass built-in types. PEP 252 and its companion PEP 253 unified them, and descriptors were the mechanism that made it possible.

As PEP 252 explains: "This PEP also introduces a new approach to specifying attributes, using attribute descriptors, or descriptors for short. Descriptors unify and generalize several different common mechanisms used for describing attributes: a descriptor can describe a method, a typed field in the object structure, or a generalized attribute represented by getter and setter functions."

The Python 2.2 "What's New" documentation summarized the design philosophy this way: "The one big idea underlying the new class model is that an API for describing the attributes of an object using descriptors has been formalized." This was not a minor enhancement. It was the architectural foundation that made modern Python's object model possible.

Data Descriptors vs. Non-Data Descriptors

Not all descriptors are created equal. The distinction between data descriptors and non-data descriptors is one of the most important concepts in Python's attribute resolution system, and getting it wrong leads to subtle, hard-to-debug issues.

The official Descriptor HowTo Guide states the rule clearly: "If an object defines __set__() or __delete__(), it is considered a data descriptor. Descriptors that only define __get__() are called non-data descriptors."
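Python looks for these methods on the descriptor's type, not on the object itself, so the rule fits in a small helper. The functions is_descriptor and is_data_descriptor below are hypothetical names used for illustration. The standard library's inspect.isdatadescriptor performs a related check. The sketch applies both helpers to a few built-ins and sorts them into the two categories.

```python
def is_descriptor(obj):
    """True if obj's *type* defines __get__, __set__, or __delete__."""
    tp = type(obj)
    return any(hasattr(tp, m) for m in ("__get__", "__set__", "__delete__"))

def is_data_descriptor(obj):
    """True if obj's *type* defines __set__ or __delete__."""
    tp = type(obj)
    return hasattr(tp, "__set__") or hasattr(tp, "__delete__")

def plain_function():
    pass

examples = {
    "function": plain_function,
    "property": property(lambda self: 1),
    "staticmethod": staticmethod(plain_function),
    "classmethod": classmethod(plain_function),
    "int": 42,
}

for label, candidate in examples.items():
    if is_data_descriptor(candidate):
        kind = "data descriptor"
    elif is_descriptor(candidate):
        kind = "non-data descriptor"
    else:
        kind = "not a descriptor"
    print(f"{label:>12}: {kind}")
```

The property comes out as a data descriptor. The plain function, the staticmethod, and the classmethod come out as non-data descriptors. The integer is not a descriptor at all.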

This distinction matters because it changes the precedence in Python's attribute lookup chain:

Data descriptors take precedence over instance dictionaries.
Instance __dict__ entries take precedence over non-data descriptors.
Non-data descriptors can be overridden by instance attributes.

This is why property objects (which are data descriptors, defining __get__, __set__, and __delete__) cannot be shadowed by setting an instance attribute, while regular methods (which are non-data descriptors, defining only __get__) can be overridden on a per-instance basis. Let's see this in action:

class NonDataDescriptor:
    """Only defines __get__, so it's a non-data descriptor."""
    def __get__(self, obj, objtype=None):
        return "from the descriptor"

class DataDescriptor:
    """Defines __get__ and __set__, so it's a data descriptor."""
    def __get__(self, obj, objtype=None):
        return "from the descriptor"
    
    def __set__(self, obj, value):
        print(f"DataDescriptor.__set__ called with {value!r}")

class MyClass:
    non_data = NonDataDescriptor()
    data = DataDescriptor()

obj = MyClass()

# Non-data descriptor: instance dict can override it
obj.__dict__["non_data"] = "from the instance"
print(obj.non_data)  # "from the instance" -- instance wins

# Data descriptor: instance dict CANNOT override it
obj.__dict__["data"] = "from the instance"
print(obj.data)      # "from the descriptor" -- descriptor wins

Note

This is not a quirk. It is a deliberate design decision that allows Python methods to be overridable on individual instances (useful for monkey-patching and testing), while properties and other managed attributes remain firmly in control of the class.

How Python's Attribute Lookup Actually Works

When you write obj.x, Python does not just look up x in a dictionary. The attribute lookup is orchestrated by the __getattribute__ method on the object's type, and it follows a specific chain. The Descriptor HowTo Guide documents the process: "The mechanism for descriptors is embedded in the __getattribute__() methods for object, type, and super()."

Here is the lookup order for obj.x where obj is an instance of MyClass:

Step 1: Python searches type(obj).__mro__ for a class attribute named x. If the first match is a data descriptor, its __get__ is called and that result is returned.
Step 2: Python checks obj.__dict__ for an entry named x. If found, that value is returned directly.
Step 3: If the class attribute found in Step 1 is a non-data descriptor, its __get__ is called. If it is a plain value, it is returned as-is.
Step 4: If nothing was found, AttributeError is raised (and __getattr__ is called, if the class defines it).
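The steps fit in a few lines of pure Python. The model below is a simplified sketch for illustration, not CPython's actual implementation. It ignores the __getattr__ fallback, __slots__, and metaclass details, and the names find_in_mro, lookup, and Sample are invented here.

```python
_MISSING = object()  # sentinel distinct from any real attribute value

def find_in_mro(cls, name):
    """Return the first class attribute called `name` along cls.__mro__."""
    for base in cls.__mro__:
        if name in vars(base):
            return vars(base)[name]
    return _MISSING

def lookup(obj, name):
    """Simplified model of object.__getattribute__."""
    objtype = type(obj)
    cls_attr = find_in_mro(objtype, name)
    attr_type = type(cls_attr)
    getter = getattr(attr_type, "__get__", None) if cls_attr is not _MISSING else None
    # Step 1: a data descriptor on the class wins outright.
    if getter is not None and (hasattr(attr_type, "__set__")
                               or hasattr(attr_type, "__delete__")):
        return getter(cls_attr, obj, objtype)
    # Step 2: the instance dictionary.
    instance_dict = getattr(obj, "__dict__", {})
    if name in instance_dict:
        return instance_dict[name]
    # Step 3: a non-data descriptor, then a plain class attribute.
    if getter is not None:
        return getter(cls_attr, obj, objtype)
    if cls_attr is not _MISSING:
        return cls_attr
    # Step 4: give up (the real machinery would now try __getattr__).
    raise AttributeError(name)

class Sample:
    data = property(lambda self: "from the property")   # data descriptor
    plain = "from the class"                             # plain class attribute
    def method(self):                                    # non-data descriptor
        return "from the method"

s = Sample()
# Write straight into the instance dict to attempt shadowing.
s.__dict__.update(data="shadow?", method=lambda: "from the instance")
print(lookup(s, "data"), lookup(s, "method")(), lookup(s, "plain"))
```

The model agrees with ordinary dot access. The property wins over the instance entry, the instance entry wins over the method, and the plain class value is returned unchanged.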

This hierarchy is the reason data descriptors cannot be shadowed by instance attributes -- they win at Step 1, before the instance dictionary is ever consulted. For attribute assignment (obj.x = value), the logic is different. If x resolves to a data descriptor on the class, then the descriptor's __set__ method is called. Otherwise, the value is stored directly in obj.__dict__.
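You can watch the assignment rule happen. In this illustrative sketch, Loud, Quiet, and Demo are made-up names. The data descriptor intercepts the assignment. The non-data descriptor is bypassed, and the new instance-dict entry then shadows it.

```python
class Loud:
    """Data descriptor: assignments are routed through __set__."""
    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.__dict__.get("_loud")

    def __set__(self, obj, value):
        print(f"Loud.__set__ intercepted {value!r}")
        obj.__dict__["_loud"] = value

class Quiet:
    """Non-data descriptor: it has no say in assignment."""
    def __get__(self, obj, objtype=None):
        return "from Quiet.__get__"

class Demo:
    loud = Loud()
    quiet = Quiet()

d = Demo()
d.loud = 1            # prints: Loud.__set__ intercepted 1
d.quiet = 2           # no descriptor involved: stored straight in d.__dict__
print(d.__dict__)     # {'_loud': 1, 'quiet': 2}
print(d.quiet)        # 2 (the instance dict now shadows Quiet)
```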

How `property` Works Under the Hood

The @property decorator is the most commonly used descriptor in Python, yet many developers do not realize it is a descriptor at all. The official documentation includes a pure-Python equivalent that reveals the internals. Here is a condensed version:

class Property:
    """Simplified pure-Python equivalent of the built-in property."""
    
    def __init__(self, fget=None, fset=None, fdel=None, doc=None):
        self.fget = fget
        self.fset = fset
        self.fdel = fdel
        if doc is None and fget is not None:
            doc = fget.__doc__
        self.__doc__ = doc
    
    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        if self.fget is None:
            raise AttributeError("unreadable attribute")
        return self.fget(obj)
    
    def __set__(self, obj, value):
        if self.fset is None:
            raise AttributeError("can't set attribute")
        self.fset(obj, value)
    
    def __delete__(self, obj):
        if self.fdel is None:
            raise AttributeError("can't delete attribute")
        self.fdel(obj)

Notice that Property defines __get__, __set__, and __delete__, making it a full data descriptor. When you access obj.name on an instance of a class that uses @property, Python's __getattribute__ finds the Property object in the class dictionary and sees that it is a data descriptor. It then calls that object's __get__ method, as descriptor.__get__(obj, type(obj)), instead of looking in the instance dictionary.

Pro Tip

The check if obj is None: return self is an important convention. It means that accessing the property through the class itself (e.g., MyClass.name) returns the descriptor object rather than calling the getter. This is how frameworks introspect property objects.
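The built-in property follows the same pattern as the condensed Property class. The example below uses a hypothetical Temperature class. It shows that class-level access returns the property object for introspection, and that a property without a setter still acts as a data descriptor.

```python
class Temperature:
    def __init__(self, celsius):
        self._celsius = celsius

    @property
    def celsius(self):
        """Temperature in degrees Celsius (read-only)."""
        return self._celsius

# Class-level access takes the `obj is None` branch: we get the property object.
prop = Temperature.celsius
print(type(prop).__name__)    # property
print(prop.fget.__name__)     # celsius
print(prop.__doc__)           # Temperature in degrees Celsius (read-only).

t = Temperature(21.5)
print(t.celsius)              # 21.5

# An instance-dict entry cannot shadow a data descriptor...
t.__dict__["celsius"] = 0
print(t.celsius)              # 21.5

# ...and plain assignment is routed to property.__set__, which refuses
# because no setter was supplied (the exact message varies by version).
try:
    t.celsius = 30
except AttributeError as exc:
    print("AttributeError:", exc)
```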

How Methods Become Bound: Functions Are Descriptors

Here is a fact that surprises many Python developers: every ordinary function object is a non-data descriptor. Functions implement __get__, and that is how Python turns a plain function into a bound method.

class Greeter:
    def hello(self):
        return f"Hello from {self!r}"

# Access the function directly from the class dict
print(type(Greeter.__dict__["hello"]))  # <class 'function'>

# Access it through an instance -- __get__ is called
obj = Greeter()
print(type(obj.hello))  # <class 'method'>

When you write obj.hello, Python finds hello in Greeter.__dict__, sees it is a descriptor (it has __get__), and calls hello.__get__(obj, Greeter). That call returns a bound method object that has self already baked in. The function's __get__ method is doing the binding.
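You can perform the binding step by hand to confirm that the __get__ call is all there is to it. This sketch uses a variant of the Greeter class whose method returns the class name, so the output is deterministic.

```python
import types

class Greeter:
    def hello(self):
        return f"Hello from {type(self).__name__}"

g = Greeter()
func = Greeter.__dict__["hello"]             # the raw function object

bound = func.__get__(g, Greeter)              # what `g.hello` does behind the scenes
print(isinstance(bound, types.MethodType))    # True
print(bound.__self__ is g)                    # True: the instance is baked in
print(bound.__func__ is func)                 # True: same underlying function
print(bound())                                # Hello from Greeter

# Class access passes obj=None, and a function then returns itself.
print(func.__get__(None, Greeter) is func)    # True
```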

This is the same mechanism that staticmethod and classmethod exploit. A staticmethod descriptor's __get__ simply returns the raw function without binding. A classmethod descriptor's __get__ returns a bound method where the first argument is the class instead of the instance.

from types import MethodType

class StaticMethodDescriptor:
    """Simplified equivalent of built-in staticmethod."""
    def __init__(self, func):
        self.func = func
    
    def __get__(self, obj, objtype=None):
        return self.func  # No binding at all

class ClassMethodDescriptor:
    """Simplified equivalent of built-in classmethod."""
    def __init__(self, func):
        self.func = func
    
    def __get__(self, obj, objtype=None):
        if objtype is None:
            objtype = type(obj)
        # Return a real bound method whose first argument is the class
        return MethodType(self.func, objtype)

All three -- regular methods, static methods, and class methods -- are different behaviors produced by different descriptors implementing __get__ in different ways. The descriptor protocol is the single unifying mechanism.
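The built-in staticmethod and classmethod behave the same way as the simplified versions above. The short check below uses an illustrative class named Kinds and inspects what each kind of attribute access hands back.

```python
class Kinds:
    def regular(self):
        return self

    @staticmethod
    def static():
        return "no implicit first argument"

    @classmethod
    def cls_method(cls):
        return cls

k = Kinds()
# Regular function: __get__ binds the instance.
print(k.regular() is k)                                    # True
# staticmethod: __get__ returns the plain underlying function.
print(Kinds.static is Kinds.__dict__["static"].__func__)   # True
# classmethod: __get__ binds the class, even when accessed via an instance.
print(k.cls_method.__self__ is Kinds)                      # True
print(Kinds.cls_method() is Kinds)                         # True
```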

`__set_name__`: The Python 3.6 Addition (PEP 487)

One of the longstanding frustrations with writing descriptors was that the descriptor object had no way of knowing the name of the attribute it was assigned to. You had to pass the name manually, which was clunky and error-prone.

PEP 487 -- Simpler customisation of class creation, authored by Martin Teichmann and implemented in Python 3.6, solved this by adding the __set_name__ method to the descriptor protocol. As the Python 3.6 "What's New" documentation states: "PEP 487 extends the descriptor protocol to include the new optional __set_name__() method. Whenever a new class is defined, the new method will be called on all descriptors included in the definition, providing them with a reference to the class being defined and the name given to the descriptor within the class namespace."

This means you can now write self-aware descriptors:

class Validated:
    """A descriptor that validates values before storing them."""
    
    def __set_name__(self, owner, name):
        self.public_name = name
        self.private_name = f"_{name}"
    
    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self.private_name, None)
    
    def __set__(self, obj, value):
        self.validate(value)
        setattr(obj, self.private_name, value)
    
    def validate(self, value):
        pass  # Subclasses override this


class PositiveNumber(Validated):
    def validate(self, value):
        if not isinstance(value, (int, float)):
            raise TypeError(f"{self.public_name!r} must be a number")
        if value <= 0:
            raise ValueError(f"{self.public_name!r} must be positive")


class Product:
    price = PositiveNumber()
    weight = PositiveNumber()
    
    def __init__(self, name, price, weight):
        self.name = name
        self.price = price
        self.weight = weight

When Python creates the Product class, it automatically calls __set_name__(Product, "price") on the first PositiveNumber instance and __set_name__(Product, "weight") on the second. Each descriptor now knows its own attribute name and can use it for storage and error messages without any manual configuration.
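One caveat: __set_name__ is called only while the class is being created. A descriptor attached to a class afterwards gets no automatic call, so the hook has to be invoked by hand. The sketch below illustrates this with hypothetical Named and Model classes.

```python
class Named:
    """Descriptor that reports the attribute name it was given (if any)."""
    def __init__(self):
        self.name = None

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        return self.name

class Model:
    early = Named()               # __set_name__ runs while Model is being created

late = Named()
Model.late = late                 # attached afterwards: no automatic __set_name__ call
print(Model.early)                # early
print(Model.late)                 # None

late.__set_name__(Model, "late")  # invoke the hook by hand
print(Model.late)                 # late
```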

laptop = Product("Laptop", 999.99, 2.5)
print(laptop.price)   # 999.99
print(laptop.weight)  # 2.5

try:
    laptop.price = -50
except ValueError as e:
    print(e)  # 'price' must be positive

try:
    laptop.weight = "heavy"
except TypeError as e:
    print(e)  # 'weight' must be a number

Before PEP 487, doing this automatically required a custom metaclass (the alternative was passing the name by hand). PEP 487 itself acknowledges metaclasses as a barrier: "Understanding Python's metaclasses requires a deep understanding of the type system and the class construction process. This is legitimately seen as challenging."

Building a Cached Property Descriptor

One of the most practical uses of descriptors is implementing cached (or lazy) properties, where an expensive computation runs once and its result is stored for future access. Python 3.8 added functools.cached_property to the standard library for exactly this purpose, but building one yourself is an excellent way to cement your understanding of the protocol.

class CachedProperty:
    """A non-data descriptor that caches the result in the instance dict."""
    
    def __init__(self, func):
        self.func = func
        self.attrname = None
        self.__doc__ = func.__doc__
    
    def __set_name__(self, owner, name):
        self.attrname = name
    
    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        if self.attrname is None:
            raise TypeError("CachedProperty requires __set_name__ to be called")
        
        # As a non-data descriptor, we lose to any instance-dict entry
        # named self.attrname. Reaching this line therefore means no
        # cached value exists yet, so compute it now.
        value = self.func(obj)
        
        # Store the computed value directly in the instance dict.
        # On subsequent accesses, the instance dict entry takes
        # precedence over this non-data descriptor, so __get__
        # is never called again.
        obj.__dict__[self.attrname] = value
        return value


class DataAnalyzer:
    def __init__(self, raw_data):
        self.raw_data = raw_data
    
    @CachedProperty
    def processed(self):
        """Expensive computation that should only run once."""
        print("Processing data...")
        return [x ** 2 for x in self.raw_data]

analyzer = DataAnalyzer([1, 2, 3, 4, 5])
print(analyzer.processed)  # "Processing data..." then [1, 4, 9, 16, 25]
print(analyzer.processed)  # [1, 4, 9, 16, 25] -- no "Processing" message

This works because CachedProperty is a non-data descriptor (it only defines __get__, not __set__). The first time analyzer.processed is accessed, there is no "processed" key in analyzer.__dict__, so the descriptor's __get__ runs. It computes the value and stores it directly in the instance dictionary. On every subsequent access, the instance dictionary entry is found at Step 2 of the lookup chain and the descriptor is never consulted again.
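The standard library's functools.cached_property (Python 3.8+) relies on the same non-data-descriptor trick. The sketch below adapts the DataAnalyzer example and adds a hypothetical runs counter. It shows that the body executes once. It also shows that deleting the cached entry from the instance dictionary makes the descriptor visible again, so the next access recomputes.

```python
from functools import cached_property

class DataAnalyzer:
    def __init__(self, raw_data):
        self.raw_data = raw_data
        self.runs = 0          # counts how often the expensive body executes

    @cached_property
    def processed(self):
        self.runs += 1
        return [x ** 2 for x in self.raw_data]

a = DataAnalyzer([1, 2, 3])
a.processed
a.processed
print(a.runs)                      # 1: the second access hit the instance dict
print("processed" in a.__dict__)   # True

# Deleting the instance-dict entry re-exposes the descriptor,
# so the next access recomputes.
del a.processed
a.processed
print(a.runs)                      # 2
```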

Pro Tip

This is a good example of the data vs. non-data distinction doing practical work. It enables caching with no descriptor overhead after the first access. The non-data descriptor effectively removes itself from future lookups by writing to the instance dict on first access.

The descriptor protocol has been refined through several PEPs over Python's history.

PEP 252 (Python 2.2) -- The original formalization of descriptors as part of the type/class unification, authored by Guido van Rossum. This PEP established the __get__, __set__, and __delete__ methods and defined the precedence rules for data vs. non-data descriptors.

PEP 487 (Python 3.6) -- Added __set_name__ to the descriptor protocol and __init_subclass__ to the class creation process, authored by Martin Teichmann. This eliminated the most common reason for writing custom metaclasses and made descriptors dramatically more practical by giving them automatic access to their own attribute names.

PEP 544 (Python 3.8) -- While not directly about descriptors, PEP 544 introduced structural subtyping (Protocols) to the type system. The PEP explicitly references the descriptor protocol as one of Python's foundational protocols, alongside the iterator protocol and the context manager protocol, noting that it uses the same duck-typing philosophy.

PEP 549 (not accepted) -- Proposed extending the descriptor protocol to work on instance attributes, not just class attributes. The primary motivation was enabling @property to work directly on modules (which are instances of types.ModuleType). Although the PEP was not accepted, it shows an ongoing interest in expanding where descriptors can operate.

The Classic Pitfall: Shared State Between Instances

There is one critical detail that trips up many developers writing their first descriptor: since a descriptor is instantiated once per class (not once per instance), storing state on the descriptor itself means all instances share that state.

class BrokenDescriptor:
    """DO NOT DO THIS -- state is shared across all instances."""
    def __get__(self, obj, objtype=None):
        return self._value
    
    def __set__(self, obj, value):
        self._value = value  # Stored on the descriptor, not the instance!

class Widget:
    size = BrokenDescriptor()

a = Widget()
b = Widget()
a.size = 10
print(b.size)  # 10 -- Oops! b sees a's value.

Warning

This is arguably the single most common mistake when writing custom descriptors. As the Real Python tutorial on descriptors notes: "Python descriptors are instantiated just once per class. That means that every single instance of a class containing a descriptor shares that descriptor instance." Always store per-instance state on the instance, not on the descriptor.

The fix is to store values in the instance's dictionary, using the instance (obj) parameter that __get__ and __set__ receive:

class CorrectDescriptor:
    def __set_name__(self, owner, name):
        self.name = f"_{name}"
    
    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self.name, None)
    
    def __set__(self, obj, value):
        setattr(obj, self.name, value)

class Widget:
    size = CorrectDescriptor()

a = Widget()
b = Widget()
a.size = 10
b.size = 20
print(a.size)  # 10
print(b.size)  # 20 -- Correct! Each instance has its own value.
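A data descriptor wins at Step 1 of the lookup chain, so it can even store its value in obj.__dict__ under its own public name without being shadowed. This variant uses an illustrative Positive class. It writes to obj.__dict__ directly rather than calling setattr, which would route straight back into __set__ and recurse forever.

```python
class Positive:
    """Data descriptor that stores its value in obj.__dict__ under its own name."""
    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        try:
            return obj.__dict__[self.name]
        except KeyError:
            raise AttributeError(self.name) from None

    def __set__(self, obj, value):
        if value <= 0:
            raise ValueError(f"{self.name!r} must be positive")
        obj.__dict__[self.name] = value   # direct write: no recursion into __set__

class Widget:
    size = Positive()

a, b = Widget(), Widget()
a.size = 10
b.size = 20
print(a.size, b.size)   # 10 20
print(a.__dict__)       # {'size': 10}
```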

When Should You Write a Descriptor?

Descriptors are powerful, but they are not always the right tool. Here are the situations where a custom descriptor genuinely earns its keep:

Reusable validation logic. If you find yourself writing the same @property getter/setter pattern on multiple attributes or across multiple classes, a descriptor lets you write that logic once and apply it everywhere. The PositiveNumber example above is far cleaner than duplicating @property definitions for every numeric field.
ORM-style field definitions. Frameworks like Django and SQLAlchemy use descriptors extensively. When you write name = CharField(max_length=100) in a Django model, the model machinery installs a descriptor on the model class that manages how that attribute is loaded and stored on each instance.
Lazy computation and caching. As shown with CachedProperty, descriptors provide an elegant way to defer expensive calculations until they are actually needed, and then cache the result transparently.
Audit logging and access control. If you need to track every read or write to a specific attribute, a descriptor can intercept those operations without the calling code knowing anything about it.
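Here is a sketch of the audit-logging case: a descriptor that records every read and write of one attribute. The plain audit_log list is a hypothetical stand-in for a real logging backend.

```python
class Audited:
    """Data descriptor that records every read and write of an attribute."""
    def __init__(self, log):
        self.log = log                     # shared list of (action, name, value)

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        value = obj.__dict__[self.name]    # per-instance value lives on the instance
        self.log.append(("read", self.name, value))
        return value

    def __set__(self, obj, value):
        self.log.append(("write", self.name, value))
        obj.__dict__[self.name] = value

audit_log = []

class Account:
    balance = Audited(audit_log)

    def __init__(self, balance):
        self.balance = balance             # recorded as a write

acct = Account(100)
acct.balance += 50                         # one read, then one write
print(audit_log)
# [('write', 'balance', 100), ('read', 'balance', 100), ('write', 'balance', 150)]
```

The log is deliberately shared by every instance because it is a class-wide audit trail. The attribute values themselves still live in each instance's __dict__, which avoids the shared-state pitfall described earlier.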

For simple cases where you just need a getter and setter on a single class, @property is the right choice -- and now you know it is just a convenient wrapper around the descriptor protocol.

Key Takeaways

Three core methods and one hook. A descriptor is any object stored as a class variable that defines __get__, __set__, or __delete__; it may also define the optional __set_name__ hook. You only need to implement the methods relevant to your use case.
Data vs. non-data is the critical distinction. Defining __set__ or __delete__ makes a descriptor a data descriptor, which takes precedence over instance dictionaries. Non-data descriptors (only __get__) can be overridden by instance attributes -- a difference that powers both property enforcement and the cached property pattern.
Descriptors are everywhere. Functions, property, staticmethod, classmethod, and super() are all built on this same protocol. Understanding descriptors is understanding Python's object model at its root.
Never store per-instance state on the descriptor itself. A descriptor is shared across all instances of a class. Always store instance-level data in the instance's own __dict__, using the obj parameter passed to __get__ and __set__.
__set_name__ changed everything. PEP 487's addition in Python 3.6 made descriptors self-aware, eliminating the need for metaclasses in many common patterns and making descriptor-based frameworks dramatically simpler to write.

Python's descriptor protocol is the invisible infrastructure that powers attribute access across the entire language. It was formalized in PEP 252 as part of the Python 2.2 type/class unification and extended with __set_name__ in PEP 487 for Python 3.6. It remains the foundation for new language features and third-party frameworks alike. As Raymond Hettinger wrote in the official Descriptor HowTo Guide: "The protocol is simple and offers exciting possibilities." The protocol is simple -- three core methods plus an optional naming hook, a clear precedence chain, and one distinction between data and non-data. But once you understand it, you have a window into how Python actually works, not just at the surface, but all the way down.