Shallow Copy vs Deep Copy in Python: What Actually Happens in Memory

The difference between shallow copy and deep copy is one of those topics that seems simple on the surface but causes real bugs in production code. Many explanations stop at "shallow copy copies references, deep copy copies everything." That is technically correct, but it does not help you understand why your nested dictionary just mutated in two places, or why your game board has eight identical rows, or why copy.deepcopy() is taking 400ms on your data pipeline. Worse, it does not help you think about when copying is the wrong solution entirely -- and what to do instead.

This article covers the full picture. We will start with why Python chose reference semantics in the first place, then build up to shallow copies, deep copies, the internal mechanics behind both in CPython's actual source code, and the real-world patterns where choosing wrong breaks your code. We will also cover a question that many copy tutorials skip entirely: when copying is the wrong approach and what alternatives exist. Every claim is backed by the official documentation, verifiable sources, and code you can run yourself.

Why Python Uses References (and Why It Matters)

Before we talk about copies, it is worth asking: why does Python use references at all? Why not just copy everything automatically, the way a spreadsheet duplicates a cell value when you paste it somewhere new?

The answer is a design trade-off between safety and efficiency. If every assignment duplicated an entire object, passing a 10-million-element list to a function would mean allocating and copying 10 million elements. That would make function calls expensive, memory usage unpredictable, and many programs unacceptably slow. Python's creators chose the opposite default: assignment is cheap because it only creates a new reference (essentially a pointer) to an existing object. The object itself stays put.

This design has a name in computer science: reference semantics. Languages like Java (for objects), JavaScript, and Ruby make the same choice. Languages like C and Go, by contrast, default to value semantics, where assignment copies the underlying data. Neither approach is inherently better -- they are different trade-offs. Python's choice prioritizes speed and memory efficiency for the common case (passing data around without modifying it), at the cost of requiring explicit action when you need true independence.

Understanding this trade-off is what separates someone who memorizes copy rules from someone who can predict behavior in unfamiliar situations. Every pattern in this article -- shallow copy, deep copy, list multiplication bugs, default argument traps -- is a direct consequence of reference semantics.

Assignment Is Not Copying

Before you can understand any kind of copy, you need to understand what assignment does in Python. The official Python documentation for the copy module (docs.python.org) states that assignment statements create bindings between names and objects rather than copying data.

This is foundational. When you write b = a, you have not created a second list, dictionary, or object. You have created a second name that points to the same object in memory:

a = [1, 2, 3]
b = a

print(a is b)  # True -- same object
print(id(a) == id(b))  # True -- same memory address

b.append(4)
print(a)  # [1, 2, 3, 4] -- "a" sees the change too

Ned Batchelder, a longtime Python community leader and the maintainer of coverage.py, devoted an entire PyCon 2015 talk, "Facts and Myths about Python Names and Values," to this concept. His central thesis was that names in Python are references to objects, not containers that hold values. As he explained in the article the talk was based on (nedbatchelder.com, 2013), assignment never creates copies of data.

This means every time you pass a list to a function, store it in a dictionary, or assign it to a new variable, you are creating another reference to the same object. No duplication occurs. This is efficient -- Python does not waste memory making copies you did not ask for -- but it means mutations through one reference are visible through every other reference to that object.
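
A short demonstration of that point -- the function below mutates its caller's list, because the parameter is just another name bound to the same object:

```python
def add_bonus(scores):
    # "scores" is a new name bound to the caller's list,
    # not a copy of it -- appending here mutates the original.
    scores.append(100)

grades = [90, 85]
add_bonus(grades)
print(grades)  # [90, 85, 100] -- the caller's list changed
```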

Note

This is exactly the problem that copying solves. When you need two truly independent objects, you have to ask Python explicitly for a copy. But -- and this is the question many tutorials skip -- do you actually need independence? Often you do not. We will return to this question in the section on structural sharing.

Shallow Copy: New Container, Same Contents

A shallow copy creates a new container object -- a new list, a new dictionary, a new set -- but populates it with references to the same objects that the original contains. The outer structure is independent; the inner contents are shared.

The official Python documentation (docs.python.org, copy module) defines this precisely: a shallow copy builds a new compound object and inserts references to the objects found in the original.

There are multiple ways to create a shallow copy:

import copy

original = [1, 2, [3, 4, 5]]

# All of these create shallow copies:
copy_1 = copy.copy(original)
copy_2 = original.copy()
copy_3 = list(original)
copy_4 = original[:]

# The outer list is a new object
print(original is copy_1)       # False

# But the nested list inside is the SAME object
print(original[2] is copy_1[2]) # True

The last line is where shallow copying either saves you or bites you. The nested list [3, 4, 5] was not duplicated. Both original[2] and copy_1[2] point to the same list object in memory. This means mutating the nested list through one reference changes what both names see:

copy_1[2].append(6)
print(original)  # [1, 2, [3, 4, 5, 6]]  -- original is affected
print(copy_1)    # [1, 2, [3, 4, 5, 6]]  -- copy is affected too

However, reassigning a top-level element in the copy does not affect the original, because reassignment changes which object a reference points to -- it does not mutate the underlying object:

copy_1[0] = 99
print(original)  # [1, 2, [3, 4, 5, 6]]  -- unchanged
print(copy_1)    # [99, 2, [3, 4, 5, 6]] -- only the copy changed

Watch Out

This asymmetry -- mutations to nested objects are shared, but reassignments of top-level elements are not -- is the source of nearly every shallow copy bug. Understanding the difference between rebinding a name and mutating an object is the key to understanding when shallow copy is sufficient and when it is not.

Deep Copy: Independent All the Way Down

A deep copy creates a new container and recursively creates new copies of every object nested inside it. The result is a completely independent clone. The official documentation (docs.python.org) states that a deep copy constructs a new compound object and recursively inserts copies of the objects found in the original.

import copy

original = [1, 2, [3, 4, 5]]
deep = copy.deepcopy(original)

# The outer list is a new object
print(original is deep)       # False

# The nested list is ALSO a new object
print(original[2] is deep[2]) # False

# Mutations are fully isolated
deep[2].append(6)
print(original)  # [1, 2, [3, 4, 5]]      -- untouched
print(deep)      # [1, 2, [3, 4, 5, 6]]   -- only the deep copy changed

The nested list [3, 4, 5] was duplicated during the deep copy. original[2] and deep[2] are different objects. Mutating one has zero effect on the other.

This independence extends to arbitrary depth. If you have a list of dictionaries containing lists of sets, deepcopy will recurse through the entire structure and produce new instances of every mutable object it encounters. But -- and this is a crucial subtlety -- it does not duplicate immutable atomic objects like integers, strings, and booleans. It returns the originals directly, since they cannot be mutated. This is an optimization we will explore in the CPython internals section below.
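
You can verify the atomic-object optimization directly -- after a deep copy, the container is new but the immutable elements inside are the very same objects:

```python
import copy

data = [42, "hello", (1, 2)]
deep = copy.deepcopy(data)

print(deep is data)        # False -- the outer list is new
print(deep[0] is data[0])  # True  -- the int was not duplicated
print(deep[1] is data[1])  # True  -- neither was the string
print(deep[2] is data[2])  # True  -- an all-immutable tuple is returned as-is
```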

A Side-by-Side Comparison with Dictionaries

The distinction becomes especially clear with nested dictionaries, which are common in configuration management, API response handling, and data pipelines:

import copy

config = {
    "database": {
        "host": "localhost",
        "port": 5432,
        "options": {"timeout": 30, "retries": 3}
    },
    "debug": False
}

shallow = copy.copy(config)
deep = copy.deepcopy(config)

# Modify a nested value in the original
config["database"]["options"]["timeout"] = 60

print(shallow["database"]["options"]["timeout"])  # 60  -- shared reference
print(deep["database"]["options"]["timeout"])      # 30  -- independent copy

The shallow copy created a new outer dictionary, but config["database"] and shallow["database"] point to the same inner dictionary. When you modify config["database"]["options"]["timeout"], the shallow copy sees the change. The deep copy is completely isolated.

This is a real and common source of bugs. If you shallow-copy a configuration dictionary and then modify the original, downstream code holding the shallow copy will see changes you did not intend to share. This pattern frequently appears in web frameworks where request handlers receive configuration objects, in test fixtures where each test needs its own independent state, and in data pipelines where transformations should not corrupt the source data.

The Memo Dictionary: How Deep Copy Handles Circular References

Deep copy faces a challenge that shallow copy does not: circular references. Consider an object that references itself:

a = []
a.append(a)  # a now contains itself
print(a)     # [[...]]  -- Python shows the recursion

A naive recursive copy algorithm would loop forever: copy a, find that it contains an element, try to copy that element, discover it is a, try to copy a again, ad infinitum.

Python's copy.deepcopy() solves this with a memo dictionary. The official documentation (docs.python.org) identifies two problems unique to deep copy: recursive objects can cause infinite loops, and deep copy might duplicate data meant to be shared between copies.

The memo dictionary maps the id() of each original object to its copy. Before copying any object, deepcopy checks: "Have I already copied this object?" If yes, it returns the existing copy instead of recursing:

import copy

a = [1, 2]
a.append(a)  # circular reference

b = copy.deepcopy(a)

# The circular structure is preserved correctly
print(b[2] is b)  # True -- b contains itself, not a
print(b[2] is a)   # False -- it's b's own self-reference, not a's

The deep copy correctly reproduced the circular structure: b contains a reference to itself, not to a. The memo dictionary prevented infinite recursion and ensured structural consistency.
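
To make the mechanism concrete, here is a toy deep copy that handles only lists -- an illustrative sketch, not the real algorithm, but it shows why registering the copy in the memo before recursing is what breaks the cycle:

```python
def mini_deepcopy(obj, memo=None):
    # Toy deep copy for lists only; everything else is treated as atomic.
    if memo is None:
        memo = {}
    if not isinstance(obj, list):
        return obj
    if id(obj) in memo:
        return memo[id(obj)]  # already copied: reuse instead of recursing
    new = []
    memo[id(obj)] = new  # register BEFORE recursing into the contents
    for item in obj:
        new.append(mini_deepcopy(item, memo))
    return new

a = [1, 2]
a.append(a)  # circular reference
b = mini_deepcopy(a)
print(b[2] is b)  # True -- the cycle is reproduced, no infinite recursion
```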

Doug Hellmann's "Python Module of the Week" (pymotw.com), a widely-referenced community resource, illustrates this with a directed graph example. When deep-copying interconnected graph nodes, the __deepcopy__ method checks the memo dictionary and reuses already-copied nodes instead of creating duplicates. As Hellmann demonstrates, when a node is encountered a second time, the recursion is detected and the existing value from the memo dictionary is reused.

The memo dictionary also serves a second, less-discussed purpose: it preserves structural sharing within the original object. If two branches of a data structure both reference the same inner object, the deep copy will produce two references to the same new copy of that inner object, not two separate copies:

import copy

shared_list = [1, 2, 3]
container = [shared_list, shared_list]  # both elements are the same object

print(container[0] is container[1])  # True

deep = copy.deepcopy(container)

# The structural relationship is preserved in the copy
print(deep[0] is deep[1])  # True -- still the same object (but a new one)
print(deep[0] is shared_list)  # False -- independent from the original

Without the memo dictionary, the deep copy would produce two separate copies of shared_list, breaking the structural relationship that existed in the original. This matters in graph-like data structures, object-relational mappings, and any scenario where identity relationships carry meaning.

Inside CPython: How deepcopy Actually Works

Understanding the internal dispatch mechanism of deepcopy gives you a significant advantage when diagnosing performance issues or deciding whether to implement __deepcopy__ on your own classes. The implementation lives in Lib/copy.py in the CPython repository and is written entirely in Python (not C), which means you can read every line.

The algorithm has evolved across Python versions. In Python 3.14+, the function begins with a fast-path check against an _atomic_types set, before even consulting the memo dictionary. In Python 3.13 and earlier, the memo dictionary is checked first, and then the dispatch table (which includes _deepcopy_atomic entries) handles type-specific routing. The exact order of operations differs between versions, but the same core steps are present in both. Taking the Python 3.14+ order as a reference:

  1. Atomic type short-circuit: If the object's type is atomic -- meaning it is immutable and cannot contain references to other objects -- return the original object immediately. The atomic types include int, float, bool, str, bytes, type, range, NoneType, property, functions, and several others. In Python 3.14+, this check uses a dedicated _atomic_types set and runs before anything else. In Python 3.13 and earlier, these types were registered in the _deepcopy_dispatch table, mapped to a _deepcopy_atomic function that simply returns the original object. The end result is the same: immutable atomic objects are never duplicated. Note that tuple and frozenset are not treated as atomic by deepcopy in any version, because they can contain references to other objects. Tuples have a dedicated copier function (_deepcopy_tuple) that recursively deep-copies their contents and returns the original tuple only if no inner element actually changed. This means deep-copying a tuple of immutable values is effectively free, while a tuple containing mutable objects will produce a new tuple with independently copied contents. Frozensets are handled through the reduce protocol, which follows a similar principle.
  2. Memo lookup: Check if the object's id() is already in the memo dictionary. If so, return the cached copy.
  3. Dispatch table lookup: Check _deepcopy_dispatch for a type-specific copier function. Lists use _deepcopy_list, dicts use _deepcopy_dict. Each of these creates a new container and recursively deep-copies its contents.
  4. Custom __deepcopy__ check: If the object defines a __deepcopy__ method, call it with the memo dictionary.
  5. Pickle protocol fallback: As a last resort, use the same __reduce_ex__ or __reduce__ protocol that pickle uses to serialize and reconstruct the object. This is how arbitrary custom classes get deep-copied without defining __deepcopy__.

The atomic type handling is why deep-copying a list of 10,000 integers is fast: none of the integers are actually duplicated. The deep copy creates one new list and fills it with references to the same integer objects. The pickle fallback (step 5) is why deepcopy can handle almost any object, but also why it can be unexpectedly slow for complex custom classes -- the pickle protocol involves introspection, method resolution, and potentially arbitrary Python code.
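
The tuple behavior described in step 1 is easy to confirm yourself:

```python
import copy

# All elements immutable: _deepcopy_tuple hands back the original tuple
frozen = (1, "a", 2.0)
print(copy.deepcopy(frozen) is frozen)  # True -- no new object needed

# A mutable element forces a genuinely new tuple with a copied list inside
mixed = (1, [2, 3])
copied = copy.deepcopy(mixed)
print(copied is mixed)        # False
print(copied[1] is mixed[1])  # False -- the inner list was duplicated
```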

Pro Tip

You can inspect the dispatch table yourself: import copy; print(copy._deepcopy_dispatch). This shows you exactly which types have optimized copier functions and which will fall through to the slower pickle protocol. If your custom class appears in deep copy hot paths, implementing __deepcopy__ can eliminate the overhead of pickle-based reconstruction entirely.

Types That Cannot Be Copied

Not everything in Python can be copied. The official documentation (docs.python.org) explicitly states that the copy module does not copy modules, method objects, stack traces, stack frames, files, sockets, windows, or similar types.

This makes practical sense: what would it mean to "copy" an open file handle or a network socket? The operating system resources they represent cannot be duplicated by Python. Functions and classes are handled by returning the original object unchanged, which matches how pickle treats them -- they are looked up by name rather than serialized.

import copy
import sys

# These return the original object (treated as atomic):
f = lambda x: x + 1
print(copy.deepcopy(f) is f)  # True

# These raise errors:
try:
    copy.deepcopy(sys.stdin)
except TypeError as e:
    print(e)  # cannot pickle '_io.TextIOWrapper' object

This limitation matters when your custom class holds non-copyable resources. A database connection wrapper, a logging handler, or a class that opens a file in __init__ will fail during deep copy unless you implement __deepcopy__ to handle these resources explicitly. The pattern for this is covered in the custom copy behavior section below.
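
A minimal illustration -- the LogWriter class below is hypothetical, but the failure mode is exactly what you hit when a deep-copied object holds an open file:

```python
import copy
import tempfile

class LogWriter:
    """Hypothetical wrapper holding an open file -- an OS resource."""
    def __init__(self, handle):
        self.handle = handle

writer = LogWriter(tempfile.TemporaryFile(mode="w"))

try:
    copy.deepcopy(writer)
except TypeError as e:
    # deepcopy falls back to the pickle protocol, which cannot
    # serialize the file object stored in writer.handle
    print("deepcopy failed:", e)
```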

When Each Copy Type Is Appropriate

Use assignment (no copy) when you intentionally want two names to refer to the same object. This is the appropriate default and the most efficient approach. If you are passing a list to a function that only reads it, there is no reason to copy.

Use shallow copy when your data structure is flat -- it contains only immutable objects like integers, strings, floats, or tuples:

# Safe to shallow copy: all elements are immutable
scores = [98, 87, 76, 92, 88]
backup = scores.copy()

scores[0] = 100
print(backup[0])  # 98 -- integers are immutable, so no shared mutation

When the elements are immutable, it does not matter that the copy shares references to them. You cannot mutate an integer or a string in place, so the shared references are harmless. This makes shallow copy both safe and fast for flat collections of immutable values.

Use deep copy when your data structure contains nested mutable objects and you need full independence:

import copy

# Must deep copy: nested mutable structures
user_sessions = {
    "alice": {"cart": ["item_a", "item_b"], "prefs": {"theme": "dark"}},
    "bob": {"cart": ["item_c"], "prefs": {"theme": "light"}},
}

snapshot = copy.deepcopy(user_sessions)

# Modifications to the snapshot are fully isolated
snapshot["alice"]["cart"].append("item_d")
print(user_sessions["alice"]["cart"])  # ["item_a", "item_b"] -- unchanged

Real-World Bug: The Shared Game Board

One of the best-known Python gotchas involves creating a 2D grid using list multiplication. It produces shared references that behave like shallow copies, and the result confuses nearly every beginner who encounters it:

# WRONG: all rows are the same object
board = [[0] * 3] * 3

board[0][0] = 1
print(board)
# [[1, 0, 0], [1, 0, 0], [1, 0, 0]]  -- all rows changed!

The expression [[0] * 3] * 3 creates one inner list [0, 0, 0] and then creates an outer list containing three references to that same inner list. Setting board[0][0] = 1 mutates the shared inner list, so the change appears in every row.

Ned Batchelder addressed this exact pattern in his blog post "Names and values: making a game board" (nedbatchelder.com, 2013). He explained that list multiplication, like assignment, does not copy data -- it replicates references to the same underlying object.

The fix is to ensure each row is a separate object, typically via a list comprehension:

# CORRECT: each row is an independent list
board = [[0] * 3 for _ in range(3)]

board[0][0] = 1
print(board)
# [[1, 0, 0], [0, 0, 0], [0, 0, 0]]  -- only row 0 changed

The list comprehension evaluates [0] * 3 fresh on each iteration, producing three distinct list objects -- each row is built anew rather than copied from a shared template.

The conceptual link to understand here is this: [x] * n and y = x follow the same rule. Both create new references to the same object. The list comprehension breaks the pattern because each iteration runs [0] * 3 as a fresh expression, producing a new list each time. It is the same difference as writing a = [0, 0, 0] followed by b = [0, 0, 0] versus writing a = [0, 0, 0] followed by b = a.

Real-World Bug: The Default Mutable Argument

Another classic shallow-copy-related trap is the mutable default argument:

def add_item(item, items=[]):
    items.append(item)
    return items

print(add_item("a"))  # ['a']
print(add_item("b"))  # ['a', 'b']  -- the list persists between calls!

The default list [] is created once when the function is defined, not on each call. Every invocation that uses the default shares the same list object. The fix uses None as the sentinel and creates a new list inside the function body:

def add_item(item, items=None):
    if items is None:
        items = []
    items.append(item)
    return items

Pro Tip

This is not technically a copy problem, but it stems from the same fundamental misunderstanding: assignment and parameter binding in Python create references, not copies. Every pattern in this article -- shallow copy, deep copy, list multiplication, default arguments -- is a consequence of that single rule. Once you internalize reference semantics, you stop encountering "surprising" Python behavior because you have the correct mental model.

Customizing Copy Behavior with __copy__ and __deepcopy__

The copy module allows classes to define custom behavior for both copy operations through the __copy__ and __deepcopy__ dunder methods. The official documentation (docs.python.org) specifies that classes can define __copy__() for shallow copy and __deepcopy__() (which receives a memo dictionary) for deep copy.

This is useful when your object holds a mix of state that should be shared and state that should be independent:

import copy

class DatabaseConnection:
    _pool = []  # shared across all copies

    def __init__(self, host, port):
        self.host = host
        self.port = port
        self.query_cache = {}     # per-instance, should be independent
        self._connected = False   # per-instance state

    def __copy__(self):
        """Shallow copy: new instance with empty cache, shared pool."""
        new = DatabaseConnection(self.host, self.port)
        new._connected = False  # new copy starts disconnected
        return new

    def __deepcopy__(self, memo):
        """Deep copy: fully independent, including cache contents."""
        new = DatabaseConnection(
            copy.deepcopy(self.host, memo),
            copy.deepcopy(self.port, memo),
        )
        memo[id(self)] = new
        new.query_cache = copy.deepcopy(self.query_cache, memo)
        new._connected = False
        return new

The __copy__ method creates a new connection with an empty cache -- the caller gets a fresh instance that shares the connection pool but has its own state. The __deepcopy__ method goes further, duplicating even the cache contents. The memo[id(self)] = new line registers the new object in the memo dictionary before recursing into the cache, which is critical: if the cache contains a circular reference back to this DatabaseConnection, the memo prevents infinite recursion. It also ensures that if another object being deep-copied references this same instance, the same copy is reused rather than creating a duplicate.

Doug Hellmann's "Python Module of the Week" (pymotw.com) demonstrates this pattern with a directed graph where __deepcopy__ must check the memo dictionary before recursing into connected nodes, ensuring the graph's topology is preserved without infinite loops.

There is also a third customization point, introduced in Python 3.13: the __replace__ method, which powers copy.replace(). This protocol creates a new object of the same type while swapping out specific fields, and it is supported by dataclasses, named tuples, and any class that implements __replace__. We cover this in the performance section below.

Performance: Benchmarks, Alternatives, and Trade-Offs

Shallow copy is fast. It creates one new container and fills it with existing references. For a list of 10,000 elements, it does roughly 10,000 pointer copies.

Deep copy is slower -- sometimes dramatically so. It must traverse the entire object graph, check each object's type against the dispatch table, look up the memo dictionary, and allocate new memory for every mutable object it encounters. The overhead comes from recursive traversal, cycle detection through the memo dictionary, type-specific dispatch, and (for custom objects) the pickle protocol fallback.

To quantify the difference: a Codeflash benchmark (codeflash.ai, 2025) measuring complex nested objects found that deepcopy was roughly 664 times slower than a shallow copy of the same structure. The exact ratio depends heavily on the depth and complexity of your data, but the benchmark illustrates the worst-case cost. The same benchmark found that serialization-based alternatives like pickle and orjson offer middle-ground performance.

For performance-sensitive code, consider these alternatives:

import json
import copy
import pickle

data = {"key": [1, 2, {"nested": True}]}

# Deep copy via copy module (handles all Python objects)
result_1 = copy.deepcopy(data)

# Deep copy via JSON round-trip (faster for JSON-serializable data)
result_2 = json.loads(json.dumps(data))

# Deep copy via pickle round-trip (faster, handles more types than JSON)
result_3 = pickle.loads(pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL))

Each alternative has distinct trade-offs:

JSON round-trip (json.loads(json.dumps(data))): Often faster than deepcopy for large, pure-data structures because it avoids the memo dictionary and type introspection overhead. However, it only works with JSON-serializable types -- no sets, no custom objects, no circular references, no bytes, no tuples (they become lists). Use it when your data is simple and you need speed.

Pickle round-trip (pickle.loads(pickle.dumps(data))): Handles a broader range of Python types than JSON, including sets, tuples, and many custom objects, and -- because pickling maintains its own memo table -- it preserves structural sharing within a single dump, just as deepcopy does. It can be faster than deepcopy for certain structures. However, it still cannot handle open files, sockets, or lambdas, and it inherits pickle's security concerns -- never unpickle data from untrusted sources.

Third-party serializers like orjson and msgpack: For JSON-compatible data, orjson (a Rust-backed JSON library) can outperform both json and deepcopy significantly. MessagePack offers similar benefits for binary serialization. These are worth considering in data pipeline contexts where copy operations are in the hot path.

Selective copying: Often the highest-performance solution is not to use a general-purpose copy function at all. Copy only what you need to modify. If you have a list of 1,000 items and need to change one, shallow-copy the list and deep-copy just the item you are modifying. This approach was demonstrated in a Pydantic-AI optimization where replacing a full deepcopy of a message history with a shallow copy plus a single targeted deep copy yielded a 180x speedup (Codeflash, codeflash.ai, 2025).
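
A sketch of that selective pattern (the data shape here is illustrative):

```python
import copy

records = [{"id": i, "tags": ["raw"]} for i in range(1000)]

# Shallow-copy the list, then deep-copy only the one record we will mutate
updated = list(records)
updated[0] = copy.deepcopy(records[0])
updated[0]["tags"].append("processed")

print(records[0]["tags"])        # ['raw'] -- the source is untouched
print(updated[1] is records[1])  # True -- the other 999 records are shared
```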

For dataclasses and named tuples, Python 3.13 introduced copy.replace(), which creates a new object of the same type while replacing specific fields. This is neither shallow nor deep copy -- it is a targeted copy that avoids duplicating the entire structure:

from dataclasses import dataclass
import copy

@dataclass
class Point:
    x: float
    y: float
    label: str = ""

p1 = Point(1.0, 2.0, "origin")
p2 = copy.replace(p1, x=5.0)

print(p2)  # Point(x=5.0, y=2.0, label='origin')

The copy.replace() function works by calling the __replace__ dunder method (also introduced in Python 3.13), and it generalizes the dataclasses.replace() function that previously only worked with dataclasses. Any class can support this protocol by implementing __replace__, making it available to named tuples, dataclasses, and custom types alike.

When Not to Copy: Structural Sharing and Immutability

There is a question that many copy tutorials never ask: should you be copying at all?

Copying is a solution to a problem caused by mutability. If you mutate shared objects, you need copies to prevent unintended side effects. But there is another approach: do not mutate shared objects in the first place.

Frozen data structures: Python's frozenset and tuple are immutable by design. If your data is stored in these types, it cannot be mutated, so copying is unnecessary. You can share references freely with no risk of unintended side effects:

# No copy needed: tuples are immutable
config = (("host", "localhost"), ("port", 5432))
backup = config  # safe -- tuples cannot be mutated

Structural sharing through immutable trees: Libraries like pyrsistent provide persistent (immutable) data structures that support efficient "updates" by sharing structure. When you "modify" a persistent vector, the library creates a new vector that shares all unchanged nodes with the original. This gives you the independence of deep copy with performance characteristics closer to shallow copy.
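
You do not need a library to see the idea. The sketch below "updates" one entry of an immutable tuple of pairs by building a new outer tuple that reuses every unchanged inner tuple by reference:

```python
old = (("host", "localhost"), ("port", 5432))

# Build a new outer tuple, swapping only the "port" pair and
# reusing the untouched pairs by reference
new = tuple(
    ("port", 6000) if pair[0] == "port" else pair
    for pair in old
)

print(new)               # (('host', 'localhost'), ('port', 6000))
print(new[0] is old[0])  # True -- the unchanged pair is shared, not copied
```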

The functional approach: Instead of copying a dictionary and modifying the copy, create a new dictionary that contains the changes:

# Instead of this:
import copy
new_config = copy.deepcopy(config)
new_config["timeout"] = 60

# Consider this (for flat dicts):
new_config = {**config, "timeout": 60}

# Or equivalently (Python 3.9+):
new_config = config | {"timeout": 60}

The unpacking approach ({**config, "timeout": 60}) creates a new dictionary with all original keys plus the override. It is a shallow copy with a modification in one step -- no import copy required. For deeply nested structures, this approach requires more thought, but the principle remains: build new data rather than mutating old data.
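
One way that "more thought" plays out for a nested dictionary: rebuild only the path you are changing, reusing everything else by reference (a sketch, assuming a two-level config like the earlier examples):

```python
config = {
    "database": {"host": "localhost", "port": 5432},
    "debug": False,
}

# Non-mutating nested update: new outer dict, new "database" dict,
# with only "port" overridden -- no copy module involved
new_config = {
    **config,
    "database": {**config["database"], "port": 6000},
}

print(config["database"]["port"])      # 5432 -- original untouched
print(new_config["database"]["port"])  # 6000
```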

This is not a theoretical argument. Many large Python codebases, particularly in data science and web frameworks, have moved toward immutable-by-default patterns precisely because they eliminate entire categories of copy-related bugs. When data cannot be mutated, the question of shallow vs. deep copy becomes irrelevant.

The Decision Framework

When you need to decide how to duplicate data in Python, work through these questions:

  1. Do I need a separate object at all, or can I share a reference? If sharing is fine, use plain assignment. This is the default and the most efficient approach.
  2. Can I avoid mutation entirely? If your data can be immutable (tuples, frozensets, or persistent data structures), you eliminate the need for copies. Build new data instead of modifying old data.
  3. Is my data structure flat (contains only immutable objects)? If yes, a shallow copy is safe and fast. Use list.copy(), dict.copy(), slice notation, or copy.copy().
  4. Does my data structure contain nested mutable objects? If yes, and you need full independence, use copy.deepcopy().
  5. Can I scope the independence? If you only need to modify one branch of a nested structure, shallow-copy the container and deep-copy only the branch you will modify. This is dramatically faster than deep-copying the entire structure.
  6. Do I need custom copy behavior? Implement __copy__ and/or __deepcopy__ on your class.
  7. Is performance critical and my data is JSON-serializable? Consider json.loads(json.dumps(data)) or pickle.loads(pickle.dumps(data)) as alternatives to deepcopy. Benchmark with your actual data to confirm the speedup.
  8. Do I need to replace specific fields on a dataclass or named tuple? Use copy.replace() (Python 3.13+) for targeted field replacement without duplicating the entire structure.

The entire system is consistent once you internalize the foundational rule: Python names are references to objects, and assignment creates new references, never new objects. Shallow copy creates a new container with the same references. Deep copy creates a new container with new references to new objects, all the way down. And sometimes, the best copy is the one you do not make.
