Append vs Extend in Python: What Actually Happens Inside the List

Every Python developer learns early that append adds one item and extend adds many. That explanation is accurate but incomplete. It skips the question that matters for writing fast, correct code: what is actually happening inside the list object when you call each method? Python lists are not linked lists. They are dynamic arrays — contiguous blocks of pointers in memory, backed by a C structure that manages both the current number of elements and the total allocated capacity. Understanding how append and extend interact with this structure, how they trigger (or avoid) memory reallocation, and where they differ at the bytecode and C levels is the difference between code that accidentally creates nested lists and code that processes millions of records efficiently. This article covers both methods from the surface API down to the CPython implementation, benchmarks the real performance gap, explains the subtle relationship between extend, +=, and +, and documents the edge cases and gotchas that trip up even experienced developers.

The Surface-Level Difference

At the API level, the distinction is simple. append takes a single object and adds it to the end of the list. extend takes an iterable and adds each of its elements to the end of the list:

numbers = [1, 2, 3]

numbers.append(4)
print(numbers)  # [1, 2, 3, 4]

numbers.extend([5, 6, 7])
print(numbers)  # [1, 2, 3, 4, 5, 6, 7]

The critical behavioral difference becomes visible when you pass a list to both methods:

base = [1, 2, 3]
extra = [4, 5]

# append: adds the list OBJECT as a single element
a = base.copy()
a.append(extra)
print(a)        # [1, 2, 3, [4, 5]]
print(len(a))   # 4

# extend: adds each ELEMENT from the iterable
b = base.copy()
b.extend(extra)
print(b)        # [1, 2, 3, 4, 5]
print(len(b))   # 5

This is where most explanations stop. But understanding why these behave differently requires looking at what Python is doing under the hood.

Inside CPython: The PyListObject Structure

Python lists in CPython are defined in Objects/listobject.c. The core data structure is:

typedef struct {
    PyObject_VAR_HEAD
    PyObject **ob_item;    // array of pointers to list elements
    Py_ssize_t allocated;  // total slots allocated in memory
} PyListObject;

Two numbers govern every list operation: the size (the number of elements the list actually contains, what len() returns) and allocated (the total number of pointer slots currently reserved in memory). The difference between these two values is the list's spare capacity — empty slots that can absorb future additions without requiring a memory reallocation.

You can observe this yourself using sys.getsizeof:

import sys

a = []
print(sys.getsizeof(a))  # 56 bytes (empty list overhead, 64-bit system)

a.append(1)
print(sys.getsizeof(a))  # 88 bytes (allocated space for ~4 elements)

a.append(2)
print(sys.getsizeof(a))  # 88 bytes (no change -- spare capacity used)

The jump from 56 to 88 bytes on the first append, then no change on the second, reveals the over-allocation strategy at work.

How list_resize Works: The Growth Formula

Both append and extend ultimately call the internal list_resize function when the list needs to grow beyond its current allocated capacity.

The comment in the CPython source code explains the strategy: the function deliberately over-allocates proportional to the current list size, reserving room for future growth. This mild over-allocation ensures that a long sequence of append calls achieves linear-time amortized performance, even when the underlying system realloc is slow.

The formula (as of recent CPython versions) is:

new_allocated = ((size_t)newsize + (newsize >> 3) + 6) & ~(size_t)3;

Breaking this down: the new allocation is approximately the requested size plus 12.5% of that size (the >> 3 is a bit shift equivalent to dividing by 8), plus a small constant (6), rounded to the nearest multiple of 4. This produces a growth pattern of: 0, 4, 8, 16, 24, 32, 40, 52, 64, and so on.

Note

This ~12.5% growth factor is notably conservative compared to other languages. Java's ArrayList grows by 50%, and Ruby's Array doubles (100% growth). CPython's modest over-allocation trades slightly more frequent reallocations for lower memory waste — a design choice that reflects Python's emphasis on memory efficiency over raw speed.

What Happens When You Call append

When you call list.append(item), CPython executes the internal app1 function, which:

Checks whether allocated > len (is there spare capacity?)
If yes: stores the pointer to item at position len, increments len. Done. This is a single pointer write — O(1).
If no: calls list_resize(len + 1), which computes the new allocation size, calls realloc to expand (or relocate) the pointer array, then stores the item. The resize is O(n) because it may need to copy all existing pointers, but it happens infrequently enough that the amortized cost per append remains O(1).

The Python Wiki's TimeComplexity page documents this precisely: list append is amortized O(1). The amortized analysis works because the over-allocation means that for every expensive resize, there were many cheap pointer-write operations that benefited from the extra capacity.

What Happens When You Call extend

The extend method is implemented by the list_extend function in CPython, and its logic is fundamentally different from calling append in a loop:

It first tries to determine the length of the incoming iterable. If the iterable has a known length (a list, tuple, or any object with __len__), extend calls list_resize once to allocate all the space needed upfront.
It then iterates through the iterable, writing each element's pointer directly into the pre-allocated array slots.
If the iterable does not have a known length (a generator, a map object, etc.), extend falls back to appending elements one by one, but still does so in C rather than through Python's method dispatch.

This design is why extend is faster than a loop of append calls for adding multiple elements. There are two sources of the speedup: fewer memory reallocations (often just one, vs. potentially many for repeated appends), and the iteration happens entirely in C rather than through Python's bytecode interpreter.

The time complexity is O(k) where k is the number of elements being added, as documented on the Python Wiki TimeComplexity page. But the constant factor is significantly smaller than calling append k times from Python.

Benchmarking the Real Difference

Let's measure the actual performance gap:

import timeit

def using_append(data):
    result = []
    for item in data:
        result.append(item)
    return result

def using_extend(data):
    result = []
    result.extend(data)
    return result

data = list(range(100_000))

t_append = timeit.timeit(lambda: using_append(data), number=100)
t_extend = timeit.timeit(lambda: using_extend(data), number=100)

print(f"append loop: {t_append:.4f}s")
print(f"extend:      {t_extend:.4f}s")
print(f"extend is {t_append / t_extend:.1f}x faster")

Typical results on a modern machine:

append loop: 0.4856s
extend:      0.1184s
extend is 4.1x faster

The 3–5x speedup for extend comes from three factors working together: a single resize instead of multiple, C-level iteration instead of Python bytecode dispatch, and elimination of per-call method lookup overhead that the append loop incurs on every iteration.

Pro Tip

For adding a single element, append is always the right choice — there is no scenario where extend([x]) is faster or clearer than append(x).

The += Operator: extend in Disguise (Almost)

The += operator on lists calls __iadd__, which internally invokes the same list_extend function. In practice, l1 += l2 and l1.extend(l2) ultimately execute the same code (the list_extend function in listobject.c).

But there are two important differences:

First, += uses dedicated bytecodes while extend uses generalized method dispatch. This means += has slightly less overhead for simple local-variable cases, making it marginally faster:

# Bytecode comparison (simplified)
# l += [4, 5]  ->  LOAD_FAST, LOAD_CONST, LIST_EXTEND, STORE_FAST
# l.extend([4, 5])  ->  LOAD_FAST, LOAD_METHOD, LOAD_CONST, CALL_METHOD

Second, += performs a reassignment. After calling __iadd__, it stores the result back to the variable. For a plain list variable, this is invisible — the object is the same. But it matters in two edge cases:

# Edge case 1: tuple containing a list
t = ([1, 2],)
t[0].extend([3, 4])   # Works: [1, 2, 3, 4]
# t[0] += [3, 4]      # TypeError! Tuple doesn't support item assignment
                        # But the list IS modified before the error

# Edge case 2: identity preservation
a = [1, 2, 3]
b = a
a += [4, 5]
print(b)  # [1, 2, 3, 4, 5] -- b sees the change (same object)

c = [1, 2, 3]
d = c
c = c + [4, 5]
print(d)  # [1, 2, 3] -- d does NOT see the change (new object)

Common Pitfall

The tuple-with-list case is one of Python's most famous gotchas. The += operator successfully modifies the list inside the tuple but then fails when trying to reassign the tuple slot, leaving you with modified data and a TypeError.

The + Operator: A Different Animal Entirely

The plain + operator is fundamentally different from both extend and +=. It creates a new list containing the concatenation:

a = [1, 2, 3]
b = [4, 5, 6]

c = a + b
print(c)       # [1, 2, 3, 4, 5, 6]
print(a)       # [1, 2, 3] -- unchanged
print(id(a) == id(c))  # False -- different object

The + operator also only works with two lists. You cannot concatenate a list with a tuple or other iterable using +:

[1, 2] + (3, 4)    # TypeError: can only concatenate list to list
[1, 2].extend((3, 4))  # Works: [1, 2, 3, 4]

For performance, + is the slowest option because it always allocates a new list and copies all elements from both operands. The extend method and += modify in place and are significantly faster for large lists.

The Unpacking Operator: The Modern Alternative

Python 3.5 introduced generalized unpacking via PEP 448, and the [*a, *b] syntax has become a common way to merge lists. Like +, it creates a new list. Unlike +, it works with any iterable:

a = [1, 2, 3]
b = (4, 5, 6)       # a tuple, not a list
c = {7, 8, 9}       # a set

merged = [*a, *b, *c]
print(merged)  # [1, 2, 3, 4, 5, 6, 8, 9, 7] (set order varies)

# Contrast with +, which only works with lists:
# a + b  # TypeError: can only concatenate list to list

This flexibility makes unpacking useful when merging heterogeneous iterables. It also lets you splice elements naturally in the middle of a new list:

header = ["timestamp", "source"]
fields = ["severity", "message", "trace_id"]

# Insert a literal value between two unpacked lists
schema = [*header, "event_type", *fields]
print(schema)
# ['timestamp', 'source', 'event_type', 'severity', 'message', 'trace_id']

Under the hood, [*a, *b] compiles to BUILD_LIST and LIST_EXTEND bytecode instructions — it creates a new empty list, then extends it with each starred operand. This means the performance is comparable to creating an empty list and calling extend twice: the iteration happens in C, and CPython can pre-size if the iterables have a known length. For two lists, the performance sits between extend (which modifies in place) and + (which copies everything into a brand-new allocation).

Pro Tip

Use [*a, *b] when you need a new list from mixed iterable types, or when you want to interleave literal values between unpacked sequences. Use extend when you are building a list incrementally and want to avoid the overhead of creating a new object each time.

Five Gotchas That Catch Real Developers

Gotcha 1: Extending with a String

Strings are iterable, so extend iterates over their characters:

words = ["hello", "world"]
words.extend("python")
print(words)  # ['hello', 'world', 'p', 'y', 't', 'h', 'o', 'n']

# What you probably wanted:
words = ["hello", "world"]
words.append("python")
print(words)  # ['hello', 'world', 'python']

This is the single most common mistake with extend. If you want to add a string as one element, use append. If you want to add a list containing one string, use extend(["python"]).

Gotcha 2: Accidentally Creating Nested Lists

# Building a matrix row by row
matrix = []
for i in range(3):
    row = [i * 3 + j for j in range(3)]
    matrix.append(row)    # Correct: each row is a sub-list

# Flattening data
flat = []
for chunk in [[1, 2], [3, 4], [5, 6]]:
    flat.extend(chunk)    # Correct: [1, 2, 3, 4, 5, 6]
    # flat.append(chunk)  # Wrong: [[1, 2], [3, 4], [5, 6]]

Ask yourself: do you want the result to have one more item (append) or k more items (extend)?

Gotcha 3: extend with a Generator Loses the Length Optimization

When you pass a generator to extend, CPython cannot pre-compute the length, so it cannot resize the array in a single operation:

# This gets the fast path (list has __len__)
numbers.extend([x * 2 for x in range(1000)])

# This does NOT get the fast path (generator has no __len__)
numbers.extend(x * 2 for x in range(1000))

If performance matters and the data fits in memory, materialize the iterable as a list first. If memory matters more, the generator approach is fine — the C-level iteration is still faster than a Python append loop.

Gotcha 4: append Modifies, Not Copies

append stores a reference to the object, not a copy:

row = [0, 0, 0]
matrix = []
for i in range(3):
    row[0] = i
    matrix.append(row)  # Appends the SAME list object three times

print(matrix)  # [[2, 0, 0], [2, 0, 0], [2, 0, 0]] -- all point to row

# Fix: append a copy
matrix = []
for i in range(3):
    row[0] = i
    matrix.append(row.copy())

Gotcha 5: Neither Method Returns the List

Both append and extend return None. They modify the list in place. This means method chaining does not work:

# This is a bug -- result is None
result = [1, 2, 3].append(4)
print(result)  # None

# The list was modified, but you lost the reference to it

Bytecode Comparison: Why extend Is Faster in Loops

You can use the dis module to see exactly what Python compiles for each approach:

import dis

def loop_append():
    result = []
    for x in [1, 2, 3]:
        result.append(x)
    return result

def single_extend():
    result = []
    result.extend([1, 2, 3])
    return result

dis.dis(loop_append)
dis.dis(single_extend)

The loop_append bytecode shows LOAD_ATTR (to look up the append method) and CALL_FUNCTION inside the loop body — these execute on every iteration. The single_extend bytecode shows a single LOAD_ATTR and CALL_FUNCTION pair, with all iteration happening inside the C implementation of extend.

This per-iteration method lookup overhead is one reason why the common optimization pattern of caching append as a local variable exists:

def cached_append():
    result = []
    _append = result.append  # cache the method lookup
    for x in range(100_000):
        _append(x)
    return result

This eliminates the repeated LOAD_ATTR instruction and can provide a 10–20% speedup for tight loops. However, extend still wins when you have the data available as an iterable, because it eliminates Python-level iteration entirely.

List Comprehensions vs extend: When Building Replaces Merging

A question that comes up often alongside the append-vs-extend discussion: where do list comprehensions fit? The answer is that they solve a different problem, but the overlap is wide enough to cause confusion.

A list comprehension creates a new list from an expression evaluated over an iterable. Under the hood, CPython compiles a comprehension into its own code object that uses the LIST_APPEND bytecode instruction (not the list.append method call), which skips the method dispatch overhead entirely:

import timeit

data = list(range(100_000))

# Approach 1: extend (no transformation)
def using_extend():
    result = []
    result.extend(data)
    return result

# Approach 2: list comprehension (no transformation)
def using_comprehension():
    return [x for x in data]

# Approach 3: list comprehension (with transformation)
def using_comprehension_transform():
    return [x * 2 for x in data]

# Approach 4: extend with a generator (with transformation)
def using_extend_transform():
    result = []
    result.extend(x * 2 for x in data)
    return result

t1 = timeit.timeit(using_extend, number=100)
t2 = timeit.timeit(using_comprehension, number=100)
t3 = timeit.timeit(using_comprehension_transform, number=100)
t4 = timeit.timeit(using_extend_transform, number=100)

print(f"extend (no transform):           {t1:.4f}s")
print(f"comprehension (no transform):    {t2:.4f}s")
print(f"comprehension (with transform):  {t3:.4f}s")
print(f"extend+generator (with transform): {t4:.4f}s")

Typical results:

extend (no transform):             0.1184s
comprehension (no transform):      0.2650s
comprehension (with transform):    0.3820s
extend+generator (with transform): 0.4210s

For a straight copy with no transformation, extend wins because it copies pointers at the C level without evaluating any Python expression per element. But the moment you need to transform, filter, or compute values, a list comprehension is the right tool — it combines iteration and construction into a single optimized operation that outperforms an append loop and competes with extend fed by a generator.

The practical guideline: use extend when you already have the data and want to merge it into an existing list. Use a list comprehension when you are creating a new list from scratch and need to apply logic to each element during construction.

Where insert Fits In

The list.insert(index, item) method is a cousin of append that can place an element at any position, not just the end. It comes at a significant cost:

import timeit

data = list(range(100_000))

# append to end: O(1) amortized
t_append = timeit.timeit(lambda: data.append(1), number=100_000)

# insert at beginning: O(n) -- shifts all existing elements
t_insert = timeit.timeit(lambda: data.insert(0, 1), number=1_000)

print(f"append (100k ops):  {t_append:.4f}s")
print(f"insert(0) (1k ops): {t_insert:.4f}s")

Internally, insert calls list_resize if needed (same as append), then shifts every element from the insertion point to the end one position to the right using memmove. That shift is O(n), making insert(0, x) dramatically slower than append(x) on large lists. Inserting at position len(list) // 2 is half as expensive as inserting at position 0, but still linear.

There is no insert_many or extend_at method in Python. If you need to insert multiple elements at a specific position, the idiomatic approach is slice assignment:

original = [1, 2, 5, 6]
to_insert = [3, 4]

# Insert [3, 4] at position 2
original[2:2] = to_insert
print(original)  # [1, 2, 3, 4, 5, 6]

Slice assignment uses the same list_ass_slice function in CPython. It resizes the array once (if needed) and shifts the trailing elements once, making it far more efficient than calling insert in a loop for each element individually.

When a List Is the Wrong Tool: collections.deque

If your access pattern involves frequent additions or removals at both ends of the sequence, the built-in list is the wrong data structure. The collections.deque (double-ended queue) provides O(1) append, appendleft, pop, and popleft — no element shifting required:

from collections import deque
import timeit

n = 100_000

# List: insert at front is O(n) per operation
def list_prepend():
    lst = []
    for i in range(n):
        lst.insert(0, i)

# Deque: appendleft is O(1) per operation
def deque_prepend():
    dq = deque()
    for i in range(n):
        dq.appendleft(i)

t_list  = timeit.timeit(list_prepend, number=1)
t_deque = timeit.timeit(deque_prepend, number=1)

print(f"list insert(0):    {t_list:.4f}s")
print(f"deque appendleft:  {t_deque:.4f}s")

On a typical machine with 100,000 elements, the deque version finishes in a fraction of the time the list version takes. The difference is architectural: a list is a contiguous array of pointers that must shift elements for non-tail operations, while a deque is implemented as a doubly-linked list of fixed-size blocks that can grow at either end without moving existing data.

Deques also support extend and extendleft, mirroring the list API:

dq = deque([3, 4, 5])
dq.extendleft([2, 1, 0])  # Note: elements are added one by one from left
print(dq)  # deque([0, 1, 2, 3, 4, 5])

dq.extend([6, 7, 8])
print(dq)  # deque([0, 1, 2, 3, 4, 5, 6, 7, 8])

The trade-off is that deques do not support slicing and have O(n) performance for indexed access in the middle. If your workload is "append at the end, occasionally iterate, rarely index by position," a list is the right choice. If your workload involves a queue, a stack used from both ends, or a sliding window, a deque is almost certainly what you want.

Note

A deque initialized with maxlen automatically discards elements from the opposite end when full, making it the standard tool for fixed-size sliding windows and "most recent N items" patterns: history = deque(maxlen=100).

Thread Safety: What the GIL Does and Does Not Guarantee

Since this article examines CPython internals, it is worth addressing a question that surfaces in every concurrent Python codebase: are append and extend thread-safe?

In CPython, the Global Interpreter Lock (GIL) ensures that only one thread executes Python bytecode at a time. Individual bytecode instructions are atomic, and because list.append resolves to a single C-level call (app1), a concurrent append from two threads will not corrupt the list's internal state. The same applies to extend, pop, indexing, and other single-operation list methods.

But "atomic at the operation level" does not mean "safe at the workflow level." Consider this pattern:

import threading

shared = []

def check_and_append(value):
    if value not in shared:  # Step 1: read
        shared.append(value)  # Step 2: write
    # A thread switch can happen between Step 1 and Step 2,
    # so two threads can both pass the "not in" check
    # and both append the same value.

threads = [threading.Thread(target=check_and_append, args=(42,)) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(shared)  # Could be [42] or [42, 42] or [42, 42, 42]...

Each individual operation (__contains__ check, append) is atomic, but the compound "check-then-act" sequence is not. If your code reads from and writes to a shared list across multiple steps, you need a threading.Lock:

lock = threading.Lock()

def safe_check_and_append(value):
    with lock:
        if value not in shared:
            shared.append(value)

Common Pitfall

The GIL is a CPython implementation detail, not a language guarantee. Alternative Python runtimes (PyPy, GraalPy, Jython) may not provide the same atomicity for list operations. CPython's free-threaded build (PEP 703) is now officially supported as of Python 3.14, meaning GIL-based atomicity guarantees are already optional in production. If your code must be thread-safe across interpreters or in free-threaded builds, always protect shared mutable state with explicit locks.

When to Use Which: The Decision Framework

Use append when:

Adding a single item to a list (any type: int, string, dict, another list)
Building a list of containers where each element is itself a collection
The item to add is not iterable, or you want to preserve it as a single unit

Use extend when:

Adding multiple elements from another list, tuple, set, or any iterable
Flattening one level of nested data
Merging data from multiple sources into a single list
Performance matters and you have a batch of items to add

Use += when:

You want the same behavior as extend with slightly less typing
The variable is a simple local (not a tuple element or class attribute where the reassignment could cause issues)

Use + when:

You need a new list without modifying either original
You are working with immutable data patterns or functional-style code

Use a list comprehension when:

You are both transforming and collecting data in one step
The construction pattern is more complex than simple appending

Use [*a, *b] unpacking when:

You need a new list from mixed iterable types (lists, tuples, sets, generators)
You want to interleave literal values between unpacked sequences in a readable expression

Use collections.deque when:

Your workload adds or removes elements from both ends of the sequence
You are implementing a queue, a stack used from both ends, or a sliding window
You need appendleft or popleft without paying the O(n) shifting cost of a list

Performance Summary

Operation	Time Complexity	Modifies In Place	Accepts Any Iterable
`append(x)`	O(1) amortized	Yes	N/A (single item)
`extend(iterable)`	O(k)	Yes	Yes
`+=`	O(k)	Yes (with reassignment)	Yes
`+`	O(n + k)	No (new list)	No (lists only)
`append` in loop	O(k) but with higher constant	Yes	N/A
`[a, b]`	O(n + k)	No (new list)	Yes
`insert(i, x)`	O(n)	Yes	N/A (single item)
List comprehension	O(k)	No (new list)	N/A (builds from expression)
`deque.appendleft(x)`	O(1)	Yes	N/A (single item)

Where n is the current list length and k is the number of elements being added.

The difference between append and extend is not just about "one item vs. many." It is about whether Python can allocate memory once or must do it repeatedly, whether iteration happens in C or in the bytecode interpreter, and whether you are adding an object to a list or unpacking an iterable into a list. Understanding the full landscape — how +=, +, unpacking, insert, list comprehensions, and deque each fit into the picture, and where thread safety begins and ends — is what separates code that works from code that works efficiently across every loop, every data pipeline, and every function you write.

References

CPython source: Objects/listobject.c — list_resize, list_extend, app1 functions (github.com/python/cpython) • Python Wiki: TimeComplexity • Laurent Luce, "Python list implementation" • Notes on CPython List Internals — Analysis of the ~12.5% growth factor vs. Java (50%) and Ruby (100%) • Artem Golubin, "Optimization tricks in Python: lists and tuples" • PEP 448 — Additional Unpacking Generalizations • Python docs: collections.deque • PEP 703 — Making the Global Interpreter Lock Optional in CPython