Why Does Python Have a GIL? The Full Story Behind the Global Interpreter Lock

If you have spent any time working with multithreaded Python code, you have almost certainly run into the Global Interpreter Lock. The GIL is one of the most debated, most misunderstood, and most consequential design decisions in the history of programming languages. It is the reason a 256-core server running a multithreaded Python application will leave 255 of those cores sitting idle. It is also the reason Python's C extensions are remarkably simple to write, and why the language became as wildly popular as it did.

What the GIL Actually Is

The GIL is a mutex (mutual exclusion lock) built into CPython, the reference implementation of Python that nearly all developers use. It allows only one thread to execute Python bytecode at any given time, even on a machine with dozens of CPU cores.

Think of it like a single steering wheel in a car full of drivers. No matter how many people are in the vehicle, only one person can steer at a time. Every other driver has to wait their turn.

Here is what that looks like in practice:

import threading
import time

def cpu_intensive_task():
    """A deliberately expensive computation."""
    total = 0
    for i in range(20_000_000):
        total += i * i
    return total

# Run two threads on a multi-core machine
start = time.perf_counter()

t1 = threading.Thread(target=cpu_intensive_task)
t2 = threading.Thread(target=cpu_intensive_task)

t1.start()
t2.start()
t1.join()
t2.join()

elapsed = time.perf_counter() - start
print(f"Two threads: {elapsed:.2f}s")

# Now run them sequentially for comparison
start = time.perf_counter()
cpu_intensive_task()
cpu_intensive_task()
elapsed = time.perf_counter() - start
print(f"Sequential: {elapsed:.2f}s")

On standard CPython, the two-thread version will take roughly the same amount of time as the sequential version, sometimes even longer due to thread-switching overhead. The GIL ensures that even though both threads exist, only one is executing Python bytecode at any moment.

Why the GIL Was Created: The Real History

Python was born over Christmas break in 1989 when Guido van Rossum, working at the Centrum Wiskunde & Informatica (CWI) in the Netherlands, decided to write an interpreter for a new scripting language. The first version of Python was released in 1991. At that time, personal computers had a single CPU core by default — multicore hardware was exotic workstation territory.

The GIL was introduced in 1992 by van Rossum when threading support was first added to CPython. Victor Stinner, a CPython core developer, traced the exact origin in a 2018 blog post where he dug through git blame to find the original commit. It was authored by van Rossum on August 4, 1992, and its commit message mentions splitting pythonmain.c in two and adding "new optional built-in threadmodule.c, built upon Sjoerd's thread.{c,h}." The function init_save_thread() in that commit is the ancestor of today's GIL.

Van Rossum explained the reasoning on the Lex Fridman Podcast (Episode 341, 2022). The original thinking, as he described it: the approach was to provide something that looks like threads, and as long as you only have a single CPU — which most computers at the time did — it feels just like threads. Speed was not the goal. Correctness on single-core hardware was.

The logic was pragmatic. In 1992, multicore processors were exotic hardware found in expensive workstations and servers, not in the desktop machines many Python developers used. Designing a thread-safe interpreter without a global lock would have been dramatically more complex, and for single-core machines it would have provided zero benefit while imposing a constant performance tax.

The GIL exists primarily because of how CPython manages memory. CPython uses reference counting as its primary garbage collection mechanism. Every Python object has a reference count that tracks how many variables or data structures point to it. When that count drops to zero, the object is immediately deallocated.

Here is how reference counting works under the hood:

import sys

a = [1, 2, 3]          # Reference count: 1
b = a                   # Reference count: 2
print(sys.getrefcount(a))  # 3 (includes the temporary reference from getrefcount itself)

del b                   # Reference count drops back
del a                   # Reference count hits 0 -> object deallocated

Without the GIL, two threads could simultaneously increment and decrement an object's reference count, creating a race condition. One thread might read a reference count of 1, another thread might also read 1, both decrement it to 0, and the object gets freed twice, which corrupts memory and crashes the interpreter. Alternatively, a decrement could be lost entirely, causing a memory leak.

Making reference count operations atomic (thread-safe at the hardware level) is possible, but in 1992, and even today, it comes with a significant performance penalty. Atomic operations are more expensive than regular operations because they require CPU cache coordination across cores. Larry Hastings demonstrated during his "Gilectomy" work at PyCon 2016 that simply switching to atomic reference counting introduced a 30% performance hit on single-threaded code.

Writing on the Python-3000 mailing list in 2007, van Rossum urged developers to prefer process-based concurrency over shared-memory threading, citing the combined hazards of locking, deadlocks, livelocks, and nondeterminism that threading introduces.

Van Rossum made this argument in response to a discussion about removing the GIL for Python 3. He argued that multiple processes with inter-process communication were a better concurrency model than shared-memory threading for many use cases. The position was pragmatic, not dogmatic: van Rossum was not claiming threads were universally wrong, but that the cost of making CPython's internals fully thread-safe without a global lock would produce a language harder to write extensions for and slower for single-threaded workloads — a trade he did not consider worthwhile at the time.

What the GIL Protects (And What It Does Not)

The GIL protects more than just reference counts. It provides implicit thread safety for all of CPython's internal data structures: dictionaries, lists, the module import system, the bytecode compiler, and essentially every piece of the interpreter's internal state.

This implicit safety is one of the GIL's underappreciated benefits. Writing C extensions for CPython is far simpler than writing thread-safe C extensions for a language without a global lock. Extension authors have historically relied on the GIL to protect their global state, meaning they never needed to implement fine-grained locking themselves. This low barrier to entry helped CPython accumulate an enormous ecosystem of C extensions, which in turn fueled Python's adoption in scientific computing, data science, and machine learning.

However, the GIL does not protect your application-level data structures from logical race conditions:

import threading

counter = 0

def increment():
    global counter
    for _ in range(1_000_000):
        counter += 1  # This is NOT atomic!

t1 = threading.Thread(target=increment)
t2 = threading.Thread(target=increment)
t1.start()
t2.start()
t1.join()
t2.join()

print(f"Expected: 2000000, Got: {counter}")
# With two threads racing, the result can fall short of 2,000,000
# (how often the race manifests varies by CPython version)

Common Misconception

The GIL prevents the interpreter from crashing due to internal race conditions, but it does not prevent logical bugs in your own code. The operation counter += 1 compiles to multiple bytecode instructions (load, add, store), and the interpreter can release the GIL between any of them.
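You can see those separate steps for yourself with the standard library's dis module. This is a minimal sketch; exact opcode names vary across CPython versions, and the function name is illustrative:

```python
import dis

counter = 0

def increment_once():
    global counter
    counter += 1

# Disassembling shows that one += is several bytecode instructions:
# a load, an add, and a store, with possible thread switches between them.
dis.dis(increment_once)
```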

When the GIL Became Infamous

For the first decade or so of Python's existence, the GIL was a non-issue. Single-core machines were the norm, and Python's threading module worked perfectly well for I/O-bound concurrency: handling network connections, reading files, and managing user interfaces.

The problem became visible as multicore processors went from exotic to standard. By the mid-2000s, quad-core consumer CPUs were common, and developers noticed that Python simply could not use them from threaded code. A CPU-bound Python program running eight threads on an eight-core machine would not run eight times faster — it would run at roughly the same speed as a single thread, because only one thread could execute Python bytecode at a time.

The GIL became a genuine pain point for scientific computing. Researchers who had chosen Python for its clean syntax and excellent libraries found that parallelizing their calculations required workarounds: launching separate processes instead of threads, using C extensions like NumPy that released the GIL during computation, or moving to distributed computing frameworks. These solutions worked, but they were cumbersome and represented a significant ergonomic cost.
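The process-based workaround, still the standard answer on GIL-enabled builds, looks roughly like this (a minimal sketch; the workload function and pool size are illustrative):

```python
import multiprocessing
import time

def sum_of_squares(n):
    """CPU-bound work that threads could not parallelize under the GIL."""
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == "__main__":
    start = time.perf_counter()
    # Each worker process has its own interpreter and its own GIL,
    # so the two tasks genuinely run on two cores.
    with multiprocessing.Pool(processes=2) as pool:
        results = pool.map(sum_of_squares, [5_000_000, 5_000_000])
    print(f"Two processes: {time.perf_counter() - start:.2f}s")
```

The catch is that processes share no memory, so arguments and results must be pickled across process boundaries, which is exactly the ergonomic cost described above.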

Note on I/O-Bound Work

The GIL is released during I/O operations, so Python threads work well for network requests, file reads, database queries, and anything else that spends time waiting. The performance problem is specific to CPU-bound work — calculations that keep the CPU busy.
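A quick demonstration of why threads remain useful for I/O on any build. Here time.sleep stands in for a network or disk wait; like real blocking I/O, it releases the GIL:

```python
import threading
import time

def simulated_io():
    # time.sleep releases the GIL, just like a blocking socket or file read
    time.sleep(0.5)

start = time.perf_counter()
threads = [threading.Thread(target=simulated_io) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# Four 0.5-second waits overlap: total time is near 0.5s, not 2s
print(f"Four waits finished in {elapsed:.2f}s")
```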

Attempts to Remove the GIL

The GIL has been a target for removal since at least 1996. Greg Stein produced a patch that year against Python 1.4 that removed the GIL and replaced it with fine-grained locking throughout the interpreter. It worked, but it roughly halved single-threaded performance. As van Rossum later explained, that meant on two CPUs you could get only slightly more work done without the GIL than on a single CPU with it. The patch was never merged.

The most notable attempt in recent history was Larry Hastings' "Gilectomy" project, which he presented at PyCon 2016 and updated annually through 2019. Hastings removed the GIL and achieved genuine parallelism, but the single-threaded performance penalty — caused primarily by the overhead of atomic reference counting — remained stubbornly around 30–40% even after years of optimization work. The implication was clear: a solution that slows down the many Python programs that don't need parallelism in order to benefit the few that do is not acceptable.

The problem needed a different approach. That approach came from Sam Gross.

The PEPs That Changed Everything

PEP 703: Making the GIL Optional (Python 3.13+)

In July 2023, the Python Steering Council announced its intention to accept PEP 703, authored by Sam Gross, with formal acceptance confirmed in October 2023. Rather than removing the GIL outright, PEP 703 proposes making it optional through a compile-time flag, creating a separate "free-threaded" build of CPython. The GIL-enabled build remains the default, ensuring no performance regression for existing users. The free-threaded build is available for those who need it.

Gross's approach to reference counting was fundamentally different from Hastings'. Instead of atomic operations, Gross used "biased reference counting," a technique where reference count operations from the object's owning thread are cheap (non-atomic), while operations from other threads use a slower but thread-safe path. This dramatically reduced the overhead compared to naive atomic reference counting.
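A toy model can convey the idea, though this is purely illustrative: the real mechanism lives in C inside the interpreter, and the class and method names here are invented for the sketch:

```python
import threading

class BiasedRefCount:
    """Illustrative sketch of biased reference counting (not CPython's code)."""

    def __init__(self):
        self.owner = threading.get_ident()  # the thread that created the object
        self.local = 1     # owner's count: plain, unsynchronized updates
        self.shared = 0    # other threads' count: synchronized updates
        self._lock = threading.Lock()

    def incref(self):
        if threading.get_ident() == self.owner:
            self.local += 1            # fast path: no synchronization needed
        else:
            with self._lock:           # slow path: thread-safe but costlier
                self.shared += 1

    def total(self):
        with self._lock:
            return self.local + self.shared

rc = BiasedRefCount()
rc.incref()                             # owner thread takes the fast path

t = threading.Thread(target=rc.incref)  # another thread takes the slow path
t.start()
t.join()
print(rc.total())  # 3
```

Because most objects are only ever touched by the thread that created them, the cheap owner path dominates in practice, which is why this beats naive atomics.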

But biased reference counting was only one piece of the puzzle. Gross introduced two additional mechanisms that work together with it:

Deferred reference counting handles objects that are frequently accessed by many threads concurrently — top-level functions, code objects, modules, and methods. Rather than updating their reference counts atomically on every access (which would be prohibitively expensive), deferred reference counting batches those updates and processes them during garbage collection. This removes per-access atomic overhead on the hottest paths in any typical Python program.

Object immortalization takes the optimization further. Common objects that will always exist — None, True, False, small integers in the range -5 to 256, interned strings, PyTypeObjects — are designated as immortal. Their reference counts are set to a sentinel maximum value (UINT32_MAX), meaning Py_INCREF and Py_DECREF become effectively no-ops for them. This eliminates an enormous number of reference count operations entirely, since these objects appear constantly in virtually every Python program.
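You can observe immortalization from Python: on CPython 3.12 and later, sys.getrefcount reports a huge sentinel value for immortal objects rather than a real count (the exact number is an implementation detail and varies by version and platform; on older versions you see an ordinary, merely large, count). The int("256") trick below avoids constant folding so the small-int cache is actually exercised:

```python
import sys

# Immortal singletons report an enormous sentinel "count" on CPython 3.12+
print(sys.getrefcount(None))
print(sys.getrefcount(True))

# Small integers in -5..256 are cached and shared across the program
a = int("256")
b = int("256")
print(a is b)   # True: both names point at the same cached object
```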

The nogil implementation also replaced CPython's internal pymalloc allocator with mimalloc, a modern allocator designed from the ground up for multi-threaded use. mimalloc uses thread-local memory pools that reduce cross-thread contention and improve memory locality — both critical for performance when threads are genuinely running in parallel.

The combined effect of these four mechanisms — biased reference counting, deferred reference counting, object immortalization, and mimalloc — is what makes free-threaded Python viable where the Gilectomy was not. It is worth understanding that none of these are workarounds. They represent a fundamentally redesigned memory management model for CPython, one that happens to also benefit single-threaded performance in several ways even as it enables thread safety.

There is a cost, however. The free-threaded build currently consumes approximately 15–20% more memory than the standard GIL-enabled build, as measured by the pyperformance benchmark suite. PEP 779, as accepted, sets a hard ceiling of 20% memory overhead for Phase II, noting that higher memory use is the inherent cost of safe, efficient free-threading and is unlikely to approach the GIL build's consumption without significant performance sacrifices.

Python 3.13, released in October 2024, shipped the free-threaded build as an experimental feature (Phase I of PEP 703). The single-threaded performance penalty at that point was approximately 40%, and the free-threaded build was explicitly marked as experimental with no stability guarantees.

PEP 684: A Per-Interpreter GIL (Python 3.12)

Before PEP 703, Eric Snow spent years working on subinterpreter isolation in CPython, culminating in PEP 684, which gives each subinterpreter its own GIL. Since CPython has supported subinterpreters (separate Python interpreters running in the same process) via the C-API since Python 1.5 in 1997, the idea was to make them useful for parallelism by stopping them from sharing a single GIL.

The challenge was enormous. As Snow described in his PyCon 2023 talk, CPython had been written as if interpreter state were isolated, but in practice it was far from it: thousands of global variables holding runtime state had accumulated over the decades and were shared by every interpreter in the process. Moving that state into per-interpreter structures took years of incremental refactoring. PEP 684 was implemented for Python 3.12.

PEP 734: Multiple Interpreters in the Stdlib (Python 3.14)

PEP 554, originally drafted by Eric Snow in 2017, proposed exposing subinterpreters through a standard library module. After six years of iteration and growing complexity, PEP 554 was formally superseded by PEP 734, which Snow introduced in November 2023 as a leaner, more focused successor. PEP 734 was accepted and implemented for Python 3.14, adding the concurrent.interpreters module to the standard library. This makes PEP 684's per-interpreter GIL accessible from Python code rather than just the C-API, and also introduces a concurrent.futures.InterpreterPoolExecutor for higher-level parallel use.

Combined, PEP 684 and PEP 734 provide one complete path to parallelism: isolated interpreters, each with their own GIL, communicating through cross-interpreter queues that pass data by value. This is meaningfully different from free-threading: interpreters share no object state by default and communicate explicitly, trading some flexibility for a much simpler thread-safety story. For workloads that can be decomposed into isolated tasks, the multi-interpreter model may actually be easier to adopt correctly than free-threading.

PEP 779: Criteria for Supported Status for Free-Threaded Python

PEP 779, authored by Thomas Wouters, Matt Page, and Sam Gross, defines the criteria for moving free-threaded Python from experimental (Phase I) to officially supported (Phase II). The accepted text establishes performance guardrails: the free-threaded build must stay within a 15% single-threaded performance hit (the build was already around 10% when the PEP was written; 15% is the hard ceiling) and no more than a 20% increase in memory consumption. This PEP was accepted for Python 3.14.

PEP 659: Specializing Adaptive Interpreter

PEP 659, while not directly about the GIL, is critical to the free-threaded story. It introduced the specializing adaptive interpreter that dynamically optimizes frequently executed bytecode. In Python 3.13's free-threaded build, this interpreter was disabled for thread-safety reasons, contributing to the roughly 40% single-threaded performance penalty. In Python 3.14, the specializing interpreter was re-enabled in a thread-safe way, which was one of the biggest factors in bringing the penalty down to 5–10%.
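You can watch the specializing interpreter at work with dis, whose adaptive=True option (Python 3.11+) shows the quickened instructions actually in use; the exact specialized opcode names are version-dependent, and the function here is illustrative:

```python
import dis
import sys

def add(a, b):
    return a + b

# Warm up: the adaptive interpreter specializes bytecode after repeated
# execution with stable operand types.
for _ in range(1000):
    add(1, 2)

if sys.version_info >= (3, 11):
    # On recent versions this may show a specialized variant such as
    # BINARY_OP_ADD_INT in place of the generic BINARY_OP.
    dis.dis(add, adaptive=True)
else:
    dis.dis(add)
```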

Where Things Stand Today: Python 3.14

Python 3.14, released on October 7, 2025, represents a major milestone. The free-threaded build is now officially supported (Phase II of PEP 703).

Real benchmarks tell the story. On CPU-bound multithreaded workloads, the Python 3.14 free-threaded build demonstrates near-linear scaling across cores — performance that was impossible under the GIL. Python 3.13's free-threaded build showed meaningful but more limited gains, held back by the disabled specializing interpreter. On single-threaded performance, the free-threaded build runs at roughly 90–91% of the speed of the standard interpreter, consistent with the 5–10% penalty documented in PEP 779.

Here is what free-threaded Python looks like in practice with 3.14:

import threading
import time
import sys

def compute_primes(limit):
    """Sieve-like prime counting, deliberately CPU-bound."""
    primes = []
    for num in range(2, limit):
        is_prime = True
        for div in range(2, int(num ** 0.5) + 1):
            if num % div == 0:
                is_prime = False
                break
        if is_prime:
            primes.append(num)
    return len(primes)

LIMIT = 200_000

# Check GIL status
if hasattr(sys, '_is_gil_enabled'):
    print(f"GIL enabled: {sys._is_gil_enabled()}")

# Single-threaded baseline
start = time.perf_counter()
compute_primes(LIMIT)
single_time = time.perf_counter() - start
print(f"Single-threaded: {single_time:.2f}s")

# Multi-threaded with 4 threads
start = time.perf_counter()
threads = [threading.Thread(target=compute_primes, args=(LIMIT,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
multi_time = time.perf_counter() - start
print(f"4 threads: {multi_time:.2f}s")
print(f"Speedup: {single_time * 4 / multi_time:.1f}x vs sequential")

On a GIL-enabled interpreter, the four-thread version will take roughly four times as long as the single-threaded version — the threads exist but cannot run in parallel. On the free-threaded 3.14 build, all four threads execute simultaneously, and the total time approaches that of a single-threaded run, demonstrating near-linear scaling across cores.

What This Means for Your Code Right Now

The free-threaded build is not yet the default. You have to opt in. On macOS and Windows, the official Python 3.14 installers offer it as an optional component. On Linux, tools like uv make it straightforward:

# The 't' suffix denotes free-threaded
uv python install 3.14t
uv run --python 3.14t your_script.py

In a free-threaded build, you can also enable or disable the GIL at startup, without rebuilding, using an environment variable or interpreter flag:

# Disable the GIL at startup via environment variable
PYTHON_GIL=0 python3.14t your_script.py

# Or using the -X flag
python3.14t -X gil=0 your_script.py

# Re-enable the GIL in a free-threaded build (useful for testing)
PYTHON_GIL=1 python3.14t your_script.py

You can check whether the GIL is active at runtime:

import sys

if hasattr(sys, '_is_gil_enabled'):
    if sys._is_gil_enabled():
        print("GIL is active")
    else:
        print("Running free-threaded (no GIL)")
else:
    print("Python version does not support free-threading check")

What "Thread-Safe" Actually Means Now

The free-threaded build guarantees that CPython itself will not crash from concurrent access to Python objects. Pure Python code is thread-safe in the sense that the interpreter will not corrupt memory or produce undefined behavior from concurrent thread execution. However, this is a lower-level guarantee than what many developers intuitively expect.

Thread safety at the interpreter level does not mean your program logic is automatically correct under concurrent access. The counter example earlier in this article still applies in free-threaded Python: counter += 1 is still not atomic, and you will still get incorrect results without explicit synchronization. The difference is that previously the GIL was accidentally serializing many such operations for you in ways you might not have realized. With free-threading, those accidental guarantees are gone.

This has a practical consequence: code that worked correctly under the GIL but relied on its implicit serialization may behave incorrectly under free-threading even if it never crashes. The standard library's threading.Lock, threading.RLock, and the queue.Queue class remain the correct tools for protecting shared mutable state. The good news is that Python's built-in container types — dict, list, set — remain internally thread-safe at the operation level in the free-threaded build, meaning you won't get memory corruption from concurrent container access, but you can still get logical race conditions if your application logic assumes atomicity across multiple operations.
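Fixing the earlier counter example with an explicit lock makes it correct under the GIL and under free-threading alike (a minimal sketch; the iteration counts are illustrative):

```python
import threading

counter = 0
lock = threading.Lock()

def safe_increment():
    global counter
    for _ in range(100_000):
        with lock:          # explicitly serializes the read-modify-write
            counter += 1

threads = [threading.Thread(target=safe_increment) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"Expected: 400000, Got: {counter}")  # always 400000
```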

Choosing Between Free-Threading and Subinterpreters

Python 3.14 actually provides two distinct paths to multi-core parallelism, and they solve different problems. Understanding which one to reach for is not obvious from the documentation alone.

Free-threading (via the 3.14t build) is suited to workloads where threads need to share mutable objects directly. Scientific computing pipelines where worker threads update a shared result array, web servers where thread-local state is minimal, and AI inference pipelines where pre- and post-processing logic needs to run alongside model execution are all good candidates. The cost is that you must reason carefully about thread safety in your own code, and your C extension dependencies must support the free-threaded build.

Subinterpreters (via concurrent.interpreters in the standard GIL-enabled build) are suited to workloads that can be decomposed into isolated tasks that communicate through explicit data passing. Each interpreter has its own GIL and its own module state. Objects are passed between interpreters by value through queues, not by reference. This isolation makes thread safety reasoning straightforward — you simply cannot accidentally share an object between interpreters without going through a queue. The trade-off is that you cannot share Python objects directly; communication happens through serialization.
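A minimal sketch of the model. The concurrent.interpreters calls shown here follow PEP 734 and require Python 3.14+; the helper function is invented for the example, and a fallback keeps it runnable on older versions:

```python
import sys

def run_isolated(code):
    """Run code in a fresh subinterpreter on 3.14+, else in a plain namespace."""
    if sys.version_info >= (3, 14):
        from concurrent import interpreters
        interp = interpreters.create()   # isolated state, its own GIL
        try:
            interp.exec(code)            # nothing is shared with the caller
        finally:
            interp.close()
    else:
        exec(code, {})

run_isolated("print('hello from an isolated namespace')")
```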

For many workloads, the subinterpreter model will be easier to adopt correctly in the near term, because it requires no changes to C extension dependencies and imposes a programming model that naturally prevents the category of bugs that free-threading introduces.

Extension Compatibility Warning

If you import any C extension module that has not been explicitly marked as supporting free-threading, the GIL will automatically be re-enabled at runtime and a warning will be printed. As of early 2026, packages like NumPy and Pydantic work with the free-threaded build, but many others do not yet. The community-maintained tracker at py-free-threading.github.io monitors package compatibility.

Pro Tip

For I/O-bound workloads — web servers, API calls, file operations — the free-threaded build provides minimal benefit. The GIL was never the bottleneck there because it is released during I/O operations. The real gains are for genuinely CPU-bound work that can be parallelized across threads. If you are unsure whether your workload is actually CPU-bound, profile it first with cProfile or py-spy before switching builds.
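A quick way to check where time actually goes before switching builds is a minimal cProfile run (py-spy works similarly but attaches to a live process); the workload function here is a stand-in for your own code:

```python
import cProfile
import io
import pstats

def workload():
    # Stand-in for the code you suspect is CPU-bound
    return sum(i * i for i in range(500_000))

profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Print the five most expensive calls. Heavy time inside your own functions
# suggests CPU-bound work; time in socket or file calls suggests I/O.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(5)
print(buffer.getvalue())
```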

The Bigger Picture

The GIL was not a mistake. It was a rational engineering decision made for a single-core world that enabled Python to become one of the most widely used programming languages ever created. The simplicity it provided to CPython's internals and its C extension ecosystem was a genuine competitive advantage. Libraries like NumPy, SciPy, and eventually PyTorch were easier to write because extension authors could rely on the GIL as an implicit serialization guarantee. The low barrier to entry for C extension development is one reason Python became the dominant language for scientific computing.

What changed was the hardware. Multicore processors became universal, and Python's inability to use them from threaded code became an increasingly visible limitation, particularly for scientific computing and machine learning workloads where frameworks like PyTorch had to build elaborate workarounds. The GIL was never the bottleneck for PyTorch's tensor operations — those happen in C++ and CUDA — but the Python orchestration layer that manages training loops, data pipelines, and environment logic could not scale with the hardware beneath it.

There is also a deeper question the GIL's removal forces into the open: what does Python want to be? For three decades, the language occupied a comfortable position where threading was available but fundamentally limited, and developers were quietly nudged toward either multiprocessing or I/O-concurrency via asyncio. Free-threading changes that calculus. Threads can now actually parallelize CPU work, which means Python programs must be written with thread safety in mind in ways they previously could largely ignore.

This is not a regression. It is a maturation. Languages that offer genuine parallelism — Go, Rust, Java, C++ — require developers to reason about concurrent access. Python is joining that group. The difference is that Python is doing it incrementally and optionally, which gives the ecosystem time to adapt without the kind of hard break that the Python 2-to-3 transition caused. When the Steering Council accepted PEP 703, they were explicit that the rollout must be gradual and that the community must be able to roll back changes that prove too disruptive — a deliberate choice to stretch the transition over years rather than repeat the fracture of the Python 2-to-3 migration.

The path forward has three phases. Phase I (Python 3.13) made free-threading experimental and available. Phase II (Python 3.14) made it officially supported. Phase III — making the free-threaded build the default — has no timeline yet. The Steering Council has been explicit that the decision will depend on community readiness, ecosystem-wide adoption, and demonstrated benefit. A future PEP will govern that transition.

For developers today, the practical position is straightforward. If your workload is I/O-bound, nothing changes. If your workload is CPU-bound and you have been using multiprocessing as a workaround, you now have two new options worth evaluating: the free-threaded build for workloads that benefit from shared state, and concurrent.interpreters for workloads that can be decomposed into isolated tasks. Both are production-ready in Python 3.14. Neither requires you to abandon the GIL-enabled build if you are not ready.

The GIL is not dead. But after three decades, it is finally, genuinely, optional.

Sources referenced in this article include the official PEP documents (PEP 703, PEP 684, PEP 734, PEP 659, PEP 779) available at peps.python.org; PEP 554, which was superseded by PEP 734; the Steering Council's October 2023 acceptance notice for PEP 703 (discuss.python.org); Guido van Rossum's appearance on the Lex Fridman Podcast (Episode 341, 2022); van Rossum's 2007 post on the Python-3000 mailing list; Victor Stinner's 2018 blog post on tracing GIL history in CPython via git blame; Larry Hastings' PyCon 2016 talks on the Gilectomy; Eric Snow's PyCon 2023 talk on subinterpreters; LWN.net's coverage of the Python Language Summit (2016, 2017, 2023); the Python 3.14 What's New documentation (docs.python.org); the py-free-threading.github.io compatibility tracker; the concurrent.interpreters module documentation (docs.python.org/3/library/concurrent.interpreters.html); and Sam Gross's PEP 703 implementation notes and nogil design documents.
