You wrote a function. You passed it a list. Somewhere between "this should work" and "what on earth just happened," your list mutated behind your back. You didn't touch it. You didn't ask for it. Yet there it is — changed, corrupted, haunted by data you never put there.
This is not a bug in Python. It is one of the most misunderstood behaviors in the language, and it has bitten everyone from first-semester students to senior engineers shipping production code. Stefano Borini, the author of the legendary Stack Overflow question on this topic (with thousands of upvotes and counting), recounted a colleague's reaction to the behavior: the colleague called it "a dramatic design flaw" of the language. Borini acknowledged that while the behavior has an underlying explanation, "it is indeed very puzzling and unexpected if you don't understand the internals."
This article covers the three core mechanisms that cause lists to change unexpectedly in Python: aliasing, mutable default arguments, and shallow copying. Each one is a direct consequence of how Python's object model works. Understanding them isn't optional — it's the difference between writing code that works and writing code that appears to work until it doesn't.
The Root Cause: Python's Object Model
Before getting into specifics, you need to understand one fundamental truth about Python. Variables are not boxes that hold values. They are name tags attached to objects in memory.
When you write a = [1, 2, 3], Python creates a list object in memory and then sticks the label a on it. If you then write b = a, Python does not create a second list. It sticks a second label on the same object. Both a and b now point to the exact same list.
This is a deliberate consequence of how Python was designed. Guido van Rossum stated that making all objects first class was a core design goal for the language — integers, strings, functions, lists all follow the same rules, with no special cases, no hidden copies, no magic behind the scenes. (Source: Guido van Rossum, various Python history writings.) Once you internalize that, the "unexpected" behavior stops being unexpected.
a = [1, 2, 3]
b = a
print(a is b) # True --- same object in memory
print(id(a) == id(b)) # True --- same memory address
b.append(4)
print(a) # [1, 2, 3, 4] --- "wait, I only changed b!"
This is not a quirk. This is the language working exactly as designed.
Problem 1: Aliasing — Two Names, One List
Aliasing is the simplest and most common cause of unexpected list mutations. It happens whenever two or more variables reference the same mutable object.
def remove_duplicates(items):
    seen = set()
    result = items  # This is NOT a copy. This is an alias.
    i = 0
    while i < len(result):
        if result[i] in seen:
            result.pop(i)
        else:
            seen.add(result[i])
            i += 1
    return result
original = [1, 2, 2, 3, 3, 4]
cleaned = remove_duplicates(original)
print(cleaned) # [1, 2, 3, 4]
print(original) # [1, 2, 3, 4] --- original is destroyed
The line result = items does not copy the list. It creates an alias. Every .pop() call on result also modifies original, because they are the same object.
This extends to function arguments as well. When you pass a list to a function, Python does not copy the list into the function's local scope. It passes a reference to the same object. The official Python documentation for the copy module states this directly: "Assignment statements in Python do not copy objects, they create bindings between a target and an object."
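The same sharing applies the moment a list crosses a function boundary. The function below (an illustrative sketch; the name mark_processed is invented for this example) never assigns to its parameter, yet its in-place mutation is fully visible to the caller:

```python
def mark_processed(records):
    # "records" is another name for the caller's list, not a copy of it.
    records.append("processed")

data = ["a", "b"]
mark_processed(data)
print(data)  # ['a', 'b', 'processed'] --- the caller's list changed
```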
The Fix: Make an Explicit Copy
def remove_duplicates(items):
    seen = set()
    result = list(items)  # New list object. Independent copy.
    i = 0
    while i < len(result):
        if result[i] in seen:
            result.pop(i)
        else:
            seen.add(result[i])
            i += 1
    return result
original = [1, 2, 2, 3, 3, 4]
cleaned = remove_duplicates(original)
print(cleaned) # [1, 2, 3, 4]
print(original) # [1, 2, 2, 3, 3, 4] --- untouched
You can create a shallow copy in multiple ways: list(items), items[:], items.copy(), or copy.copy(items). All four produce a new top-level list object.
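A quick check confirms that all four spellings behave identically: each copy compares equal to the original but is a distinct object.

```python
import copy

items = [1, 2, 3]
copies = [list(items), items[:], items.copy(), copy.copy(items)]

for c in copies:
    # Same values, different object, for every copying style.
    print(c == items, c is items)  # True False
```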
Problem 2: Mutable Default Arguments — The Classic Gotcha
This is the single most written-about Python gotcha in existence. The Hitchhiker's Guide to Python, by Kenneth Reitz and Tanya Schlusser, calls it "seemingly the most common surprise new Python programmers encounter."
Here is the trap:
def add_task(task, task_list=[]):
    task_list.append(task)
    return task_list
print(add_task("buy groceries")) # ['buy groceries']
print(add_task("walk the dog")) # ['buy groceries', 'walk the dog'] --- what?
print(add_task("write code")) # ['buy groceries', 'walk the dog', 'write code']
If you expected each call to start with a fresh empty list, you are not alone. But that is not what happens. The empty list [] is created once, at the moment the def statement is executed. Every subsequent call to add_task without an explicit task_list argument reuses that same list object.
You can prove this by inspecting the function's internals:
print(add_task.__defaults__)
# (['buy groceries', 'walk the dog', 'write code'],)
The __defaults__ attribute stores the default argument values as a tuple. That tuple is created when the function is defined, and the list inside it persists and mutates across every call.
Why Python Does This
This is not a design flaw. It is a deliberate consequence of Python's execution model. In Python, def is not a declaration — it is an executable statement. When the interpreter encounters def add_task(task, task_list=[]):, it evaluates the default expression [] right then, creates that list object, and attaches it to the function object.
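You can watch the definition-time evaluation happen by using a function call as the default. This is a diagnostic sketch, not a recommended pattern; make_default is an invented name:

```python
def make_default():
    print("default evaluated")
    return []

# "default evaluated" prints exactly once, right here, when def executes:
def add_task(task, task_list=make_default()):
    task_list.append(task)
    return task_list

# No further prints, no matter how many calls follow -- the same
# list object created at definition time is reused each call.
add_task("a")
add_task("b")
print(add_task.__defaults__)  # (['a', 'b'],)
```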
Florimond Manca, writing about this behavior, described the confusion it caused early in his Python career: lists would grow larger than expected across repeated calls, producing errors that were hard to trace. He recommends the None sentinel pattern as the standard fix — simple in principle but easy to forget when you don't yet understand why it's necessary. (Source: Florimond Manca, florimond.dev)
Even Guido van Rossum himself has acknowledged this behavior as a wart in the language. In a 2021 discussion on the python-ideas mailing list about PEP 671, which proposes late-bound function argument defaults, Van Rossum expressed enthusiasm for the effort to fix it and suggested the => syntax for the new feature, signaling that even the language's creator considers this behavior worthy of a language-level fix.
The Fix: The None Sentinel Pattern
def add_task(task, task_list=None):
    if task_list is None:
        task_list = []
    task_list.append(task)
    return task_list
print(add_task("buy groceries")) # ['buy groceries']
print(add_task("walk the dog")) # ['walk the dog']
print(add_task("write code")) # ['write code']
Each call now creates a fresh list when no argument is provided. This is the idiomatic Python solution and is recommended by the official Python documentation, the Hitchhiker's Guide, every major linter (including Ruff's rule B006 and Pylint's W0102), and virtually every Python style guide in existence.
The Class Constructor Variant
This same issue is especially insidious in class constructors. Trey Hunner, creator of Python Morsels, demonstrated this with a TodoList class in an April 2025 tutorial. When tasks=[] is used as a default in __init__, every instance of the class shares the same list. Adding a task to Monday's list also adds it to Tuesday's:
class TodoList:
    def __init__(self, tasks=[]):
        self.tasks = tasks

    def add_task(self, task):
        self.tasks.append(task)
monday = TodoList()
tuesday = TodoList()
monday.add_task("Write Python article")
print(monday.tasks) # ['Write Python article']
print(tuesday.tasks) # ['Write Python article'] --- tuesday is contaminated
The fix is the same: use None as the default, and create a new list inside the method.
class TodoList:
    def __init__(self, tasks=None):
        # Copying also protects against aliasing when a caller passes
        # in an existing list.
        self.tasks = list(tasks) if tasks is not None else []

    def add_task(self, task):
        self.tasks.append(task)
Problem 3: Shallow Copy vs. Deep Copy — The Nested List Trap
Even after you learn to make copies, Python has one more surprise waiting for you. A shallow copy only copies the top-level container. If your list contains other mutable objects (like nested lists or dictionaries), those inner objects are still shared between the original and the copy.
original = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
copied = original.copy() # Shallow copy
copied[0][0] = 999
print(copied[0]) # [999, 2, 3]
print(original[0]) # [999, 2, 3] --- the original changed too
The official Python documentation for the copy module makes the distinction precise: a shallow copy builds a new container but fills it with references to the original objects, while a deep copy builds a new container and recursively inserts fresh copies of every object it finds inside. (Source: docs.python.org/3/library/copy.html)
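The is operator makes the sharing visible: after a shallow copy, the outer lists are different objects, but the inner lists are the very same ones.

```python
original = [[1, 2, 3], [4, 5, 6]]
copied = original.copy()

print(copied is original)        # False --- new outer list
print(copied[0] is original[0])  # True  --- same inner list, still shared
```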
This matters in real-world code more often than you might think. Consider a grid-based game board, a matrix of sensor readings, a nested configuration structure, or a list of dictionaries representing database rows. In all of these cases, a shallow copy creates a false sense of independence.
import copy
original = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
independent = copy.deepcopy(original)
independent[0][0] = 999
print(independent[0]) # [999, 2, 3]
print(original[0]) # [1, 2, 3] --- original is safe
A Common Trap: Multiplying Lists
There is a related gotcha that catches many beginners when they try to create a 2D grid:
# WRONG: Creates 3 references to the SAME inner list
grid = [[0] * 3] * 3
grid[0][0] = 1
print(grid)
# [[1, 0, 0], [1, 0, 0], [1, 0, 0]] --- all rows changed!
The * 3 operator on the outer list does not create three independent inner lists. It creates three references to the same [0, 0, 0] object. The correct approach uses a list comprehension to ensure each row is a distinct object:
# CORRECT: Creates 3 independent inner lists
grid = [[0] * 3 for _ in range(3)]
grid[0][0] = 1
print(grid)
# [[1, 0, 0], [0, 0, 0], [0, 0, 0]] --- only row 0 changed
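You can diagnose the difference between the two constructions directly with is:

```python
shared = [[0] * 3] * 3                     # three names for one row
independent = [[0] * 3 for _ in range(3)]  # three distinct rows

print(shared[0] is shared[1])            # True  --- same object repeated
print(independent[0] is independent[1])  # False --- genuinely separate rows
```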
The Language Is Evolving: PEPs Addressing These Pain Points
The Python community has not ignored these issues. Several Python Enhancement Proposals (PEPs) have directly addressed the pain caused by mutable default arguments.
PEP 671: Syntax for Late-Bound Function Argument Defaults
PEP 671, authored by Chris Angelico and first proposed on October 24, 2021, introduces syntax for default argument values that are evaluated at call time rather than definition time. Instead of the None sentinel pattern, you would write:
# Proposed PEP 671 syntax (not yet accepted)
def add_task(task, task_list=>[]):
    task_list.append(task)
    return task_list
The => operator signals that the expression should be evaluated fresh on every call where the argument is omitted. The PEP draws a clear distinction between the two syntaxes: the standard = form evaluates at definition time, while the => form evaluates at call time. (Source: PEP 671, peps.python.org)
Chris Angelico, defending the proposal against concerns about complexity, compared the feature to list comprehensions on the python-ideas mailing list in November 2021, arguing that late-bound defaults would normally stay short and readable. Steven D'Aprano agreed, contending in the same thread that the feature would not lead to an explosion of overly complex default values. (Source: python-ideas mailing list archive)
As of early 2026, PEP 671 remains in Draft status. It has not been accepted or rejected. Discussion has continued in fits and starts since 2021, and a core developer sponsor is still needed for the proposal to advance. The discussion reflects a strong community desire to address this long-standing source of bugs, but no timeline for resolution has been set.
PEP 505: None-Aware Operators
PEP 505 proposes adding None-aware operators to Python, including the ??= null-coalescing assignment operator. If accepted, it would simplify the None sentinel pattern:
# Current idiom
def add_task(task, task_list=None):
    if task_list is None:
        task_list = []
    task_list.append(task)
    return task_list

# Proposed PEP 505 syntax
def add_task(task, task_list=None):
    task_list ??= []
    task_list.append(task)
    return task_list
PEP 505 also remains a draft. But the fact that both proposals exist underscores how deeply the community feels about the ergonomic burden of the None sentinel pattern.
When Mutable Defaults Are Actually Useful
It would be dishonest to present mutable defaults as purely a source of bugs. The behavior has legitimate uses, and understanding them deepens your grasp of Python's object model.
Memoization and Caching
A mutable default dictionary can serve as a persistent cache across function calls:
def fibonacci(n, _cache={0: 0, 1: 1}):
    if n not in _cache:
        _cache[n] = fibonacci(n - 1) + fibonacci(n - 2)
    return _cache[n]
print(fibonacci(50)) # 12586269025 --- computed efficiently
print(fibonacci.__defaults__) # Shows the populated cache
The cache dictionary persists because the default value is evaluated once and shared across calls. This is the exact behavior that causes bugs in other contexts, but here it is the desired behavior.
That said, modern Python offers cleaner alternatives. The @functools.lru_cache decorator provides the same functionality with explicit intent. In Python 3.9 and later, @functools.cache is the preferred shorthand (it is equivalent to lru_cache(maxsize=None) but more readable):
from functools import cache

@cache
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)
The Modern Solution: dataclasses and attrs
If you are building classes that hold collections of data, Python offers a cleaner answer to the mutable default problem than the None sentinel pattern: dataclasses, introduced in Python 3.7 via PEP 557.
Dataclasses raise an explicit ValueError if you try to use a mutable object as a default field value. They require you to use dataclasses.field(default_factory=...) instead, making the intent fully explicit:
from dataclasses import dataclass, field
# This raises a ValueError at class definition time:
# @dataclass
# class TodoList:
#     tasks: list = []  # ValueError: mutable default is not allowed

# The correct approach:
@dataclass
class TodoList:
    tasks: list = field(default_factory=list)
monday = TodoList()
tuesday = TodoList()
monday.tasks.append("Write Python article")
print(monday.tasks) # ['Write Python article']
print(tuesday.tasks) # [] --- independent, as expected
The default_factory parameter accepts any callable. Each time a new instance is created without an explicit value, that callable is invoked to produce a fresh object. This is the pattern the language itself recommends for class-level mutable defaults.
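Because default_factory accepts any zero-argument callable, it is not limited to empty containers. As a sketch (the UserProfile class and default_settings function are invented for illustration), a field can default to a pre-populated structure, built fresh per instance:

```python
from dataclasses import dataclass, field

def default_settings():
    # Invoked once per new instance, so each instance owns its dict.
    return {"theme": "dark", "notifications": True}

@dataclass
class UserProfile:
    name: str
    tags: list = field(default_factory=list)
    settings: dict = field(default_factory=default_settings)

alice = UserProfile("alice")
bob = UserProfile("bob")
alice.settings["theme"] = "light"
print(bob.settings["theme"])  # dark --- not shared with alice
```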
The attrs library, which predates dataclasses and heavily influenced their design, solves the same problem with attrs.Factory:
import attrs
@attrs.define
class TodoList:
    tasks: list = attrs.Factory(list)
monday = TodoList()
tuesday = TodoList()
monday.tasks.append("Write Python article")
print(monday.tasks) # ['Write Python article']
print(tuesday.tasks) # []
If you are writing any class that stores a list, dictionary, or set as an instance attribute, these tools should be your first choice. They eliminate the category of bug entirely rather than patching around it.
Thread Safety and Shared Mutable State
There is a dimension of this problem that rarely appears in beginner tutorials but causes serious production bugs: thread safety. When a mutable default argument is shared across all calls to a function, it is also shared across all threads that call that function concurrently.
import threading
def add_task(task, task_list=[]):
    task_list.append(task)
    return task_list
# Two threads calling add_task simultaneously share the same list.
# The result is a data race: undefined behavior.
t1 = threading.Thread(target=add_task, args=("task A",))
t2 = threading.Thread(target=add_task, args=("task B",))
t1.start()
t2.start()
t1.join()
t2.join()
# The shared default list now contains both tasks, but the ORDER
# and any intermediate state are non-deterministic.
Python's Global Interpreter Lock (GIL) prevents certain kinds of corruption at the bytecode level, but it does not make list mutations atomic. The list.append() method is thread-safe in CPython in practice because of how the GIL works, but this is an implementation detail, not a language guarantee. Under PyPy, Jython, or a GIL-free CPython build (available since Python 3.13 via the accepted PEP 703, and officially supported in Python 3.14), the behavior becomes genuinely unsafe.
The correct fix is the same as always: use None as the default. But understanding why it matters in concurrent code makes the rule feel less like a style preference and more like a safety requirement.
Identity vs. Equality: A Diagnostic You Are Probably Underusing
When debugging a mutation you cannot explain, the first question to ask is whether two variables are pointing to the same object or merely to objects with the same value. Python gives you two ways to test this, and they are not interchangeable.
== tests equality: do these two objects have the same value? is tests identity: are these the exact same object in memory?
a = [1, 2, 3]
b = [1, 2, 3]
c = a
print(a == b) # True --- same values
print(a is b) # False --- different objects
print(a == c) # True --- same values
print(a is c) # True --- same object (alias!)
# When you're checking for None, always use "is", never "==":
x = None
print(x is None) # True --- correct idiom
print(x == None) # True --- works here, but fragile: a class can override
# __eq__ so that its instances compare equal to None
The is None vs. == None distinction is particularly important in the sentinel pattern. If you write if task_list == None:, a caller could pass an object whose class implements __eq__ to return True when compared to None, silently bypassing your guard. is None tests object identity and cannot be overridden.
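A contrived class makes the danger concrete. AlwaysEqual is invented purely for illustration; its __eq__ claims equality with everything, which fools an == None guard but cannot fool is None:

```python
class AlwaysEqual:
    def __eq__(self, other):
        return True  # claims equality with everything, including None

obj = AlwaysEqual()
print(obj == None)  # True  --- the equality guard is bypassed
print(obj is None)  # False --- identity cannot be faked
```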
For deeper debugging, sys.getrefcount() reveals how many references exist to an object. This is rarely needed in day-to-day code, but it can confirm exactly how broadly a mutable default has been shared:
import sys
def add_task(task, task_list=[]):
    task_list.append(task)
    return task_list
add_task("buy groceries")
add_task("walk the dog")
# The default list is referenced by:
# 1. The function's __defaults__ tuple
# 2. The local variable task_list during the call
# 3. The return value from the last call
# getrefcount adds one more temporary reference for the call itself
print(sys.getrefcount(add_task.__defaults__[0])) # typically 3+
When deepcopy Is Too Slow
copy.deepcopy() is the correct tool for nested structures, but it is not free. For large objects, it can be significantly slower than a shallow copy because it must traverse and duplicate the entire object graph. It also cannot handle all types: objects that contain file handles, database connections, or thread locks cannot be deep-copied.
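The type limitation is easy to demonstrate: an object wrapping an OS resource, such as a lock, refuses to be deep-copied even when it is buried inside an otherwise plain structure. (The config dict here is an invented example.)

```python
import copy
import threading

config = {"name": "worker", "lock": threading.Lock()}
try:
    copy.deepcopy(config)
    copied_ok = True
except TypeError as exc:
    # CPython raises TypeError: locks cannot be copied or pickled.
    copied_ok = False
    print(f"deepcopy failed: {exc}")
```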
When performance matters, consider whether you actually need a full deep copy or whether a more targeted approach will work:
import copy
import time
# A large nested structure
big = [[i for i in range(1000)] for _ in range(1000)]
start = time.perf_counter()
shallow = big.copy()
print(f"Shallow: {time.perf_counter() - start:.6f}s")
start = time.perf_counter()
deep = copy.deepcopy(big)
print(f"Deep: {time.perf_counter() - start:.6f}s")
# Deep copy is typically 10-100x slower on large nested structures
Alternatives to deepcopy for specific structures:
- List of simple values: new = list(original) or new = original[:]
- List of dicts: new = [d.copy() for d in original] — one level deeper than shallow, cheaper than deep
- Nested lists of known depth: write a targeted copy loop rather than paying for full graph traversal
- Immutable data structures: tuples, frozensets, and strings share safely without copying at all; consider restructuring data to use immutables where mutation is not needed
- JSON-serializable data: import json; new = json.loads(json.dumps(original)) is a blunt but readable deep copy that works for plain data
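For example, when a structure is known to be exactly two levels deep, a targeted copy gives full independence without deepcopy's recursive graph traversal:

```python
original = [[1, 2], [3, 4]]

# Copy the outer list and each inner list, but nothing deeper.
two_level = [inner[:] for inner in original]

two_level[0][0] = 99
print(original[0])   # [1, 2]  --- untouched
print(two_level[0])  # [99, 2] --- only the copy changed
```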
The right answer depends on your data shape and performance requirements. deepcopy is the safe default; the alternatives are optimizations for when you have measured a real cost.
Defensive Programming: A Checklist
Here is a practical checklist you can apply to any Python code that works with lists (or any mutable objects):
When writing functions that accept lists as arguments, never mutate the input unless that is the function's explicit, documented purpose. If you need to work with a modified version, copy it first.
When using default arguments, never use a mutable object ([], {}, set()) as the default value. Use None and initialize inside the function body. Your linter will tell you too: Ruff's rule B006 and Pylint's W0102 both flag this pattern automatically.
When assigning lists, remember that b = a creates an alias, not a copy. If you need independence, use b = a.copy(), b = list(a), or b = a[:].
When copying nested structures, use copy.deepcopy() from the copy module. Shallow copies (list(), .copy(), [:]) only create independence at the top level. Be aware that deepcopy has a performance cost on large structures — profile before assuming it is free.
When creating 2D structures, use a list comprehension ([[0]*n for _ in range(m)]), never the multiplication operator on nested lists ([[0]*n]*m).
When building classes with mutable fields, use @dataclass with field(default_factory=list) instead of assigning a bare list in __init__. The dataclass machinery enforces this correctly and raises a ValueError at class definition time if you get it wrong.
When writing concurrent code, remember that mutable defaults are shared across threads. The None sentinel pattern is not just a style choice — it is the only way to guarantee each call gets its own object in a multithreaded context.
When debugging unexpected mutations, use id() to check whether two variables point to the same object, use is to test identity (not just equality), and prefer is None over == None in your guards.
a = [1, 2, 3]
b = a
c = a.copy()
print(a is b) # True --- same object
print(a is c) # False --- different objects
print(a == c) # True --- same values, different objects
Conclusion
Your list is changing unexpectedly because Python variables are references, not containers. Every time you assign a list to a new variable, pass it to a function, or use it as a default argument, you are creating another reference to the same object in memory. Mutations through any one of those references are visible through all of them.
This is not a flaw. It is a direct, logical consequence of Python treating everything as a first-class object. Once you understand the object model — that def is executable, that = binds names to objects, that .copy() is shallow — the behavior stops surprising you. It becomes a tool you can wield deliberately, whether you are caching expensive computations in a default dictionary or ensuring your function does not destroy its caller's data.
The solutions scale with the situation. For function defaults, the None sentinel pattern is idiomatic and free. For class attributes, dataclasses with field(default_factory=...) enforces correctness at the language level. For nested structures, copy.deepcopy() is the safe choice, with targeted alternatives when performance matters. For concurrent code, the same None pattern is not just style — it is the only way to guarantee isolation across threads.
The is operator and id() function are your diagnostic allies when something unexpected happens. Use them early. And when your linter flags a mutable default argument, treat that as a correctness warning, not a style suggestion. Python gives you the tools to write code that behaves exactly as you intend. You just need to know which tool to reach for.