PEP 659: How Python Learned to Specialize Itself at Runtime

Every time a Python function runs, CPython has to figure out what its instructions mean — and for most of Python's history, it had to figure that out from scratch on every single execution. PEP 659 changed that. Authored by Mark Shannon and first posted to the Python-Dev mailing list on May 12, 2021, it introduced the specializing adaptive interpreter: a mechanism that lets CPython observe what types are flowing through a piece of code, replace generic bytecode instructions with faster type-specific variants on the fly, and fall back gracefully when its assumptions break. It shipped as part of Python 3.11 in October 2022 and is now a permanent, foundational layer of the CPython execution model.

PEP 659 is classified as an Informational PEP, which means it documents a design decision rather than proposing a language-visible feature. That classification sparked a minor debate on the Python-Dev list when Shannon first introduced it. Some contributors felt the scale of the change warranted a Standards Track PEP. Shannon's position was pragmatic: the implementation could be rolled out incrementally without touching the language, the standard library, or any public API, so an informational document was the right vehicle. The PEP reached Final status after the implementation was confirmed complete and stable in CPython 3.11, with some groundwork laid as early as 3.10. The PEP itself now points to the Python 3.11 What's New page as the canonical home for its up-to-date technical documentation.

The Problem: A Generic Interpreter Running Typed Code

CPython compiles Python source code to bytecode — a compact, platform-independent sequence of instructions that a virtual machine then interprets one at a time. The bytecode for x + y is a single BINARY_ADD instruction (or its modern equivalent). That instruction has to handle every possible combination of types: two integers, two floats, a string and a string, a list and a list, or any user-defined type that implements __add__. Every time it runs, it checks the types of its operands, dispatches to the right implementation, and only then performs the actual operation.
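The generic instruction is easy to see with the dis module. This snippet only inspects bytecode; the exact opcode name depends on the Python version (BINARY_ADD through 3.10, BINARY_OP with a + argument from 3.11 on):

```python
import dis

def add(x, y):
    return x + y

# One generic instruction performs the addition for every type combination.
# Python 3.10 and earlier name it BINARY_ADD; 3.11+ show BINARY_OP (+).
dis.dis(add)
```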

For programs that are consistently typed — which describes the overwhelming majority of real Python code — this per-execution type-checking is unnecessary overhead. A loop that adds integers a million times pays the type-dispatch cost a million times, even though the answer is always the same: these are integers, use integer addition. The solution that JIT-compiled languages like V8 (for JavaScript) and LuaJIT had long used is specialization: once you know what types appear in a hot code path, generate a version of the operation optimized for exactly those types. PEP 659 brought a variant of that approach to CPython without requiring a JIT compiler.

From the Rationale

PEP 659's Rationale section observes that while specialization is commonly associated with JIT compilers, research demonstrates that an interpreter can achieve meaningful speedups through specialization alone — in some benchmarks outperforming a naive compiler. This justified pursuing the approach without waiting for a full JIT. (peps.python.org/pep-0659)

The academic foundation Shannon cited in PEP 659 is a body of research on inline caching and quickening. The closest prior art he acknowledged is the paper Inline Caching meets Quickening, which demonstrated that inline caches and in-place instruction replacement could be combined effectively. PEP 659 extended that model by adding the ability to de-optimize cheaply, which makes the approach robust even when type assumptions turn out to be wrong. Earlier attempts at specializing Python bytecode — including internal experiments at Meta (Cinder) and Pyston — had tackled similar problems, but Shannon argued in the mailing list announcement that PEP 659's approach offers better interpreter performance and is more compatible with CPython's collaborative, open-source development model.

PEP Status

PEP 659 carries Type: Informational and Status: Final. It was created April 13, 2021, first posted to the Python-Dev list on May 12, 2021, and finalized after Python 3.11's release. The author is Mark Shannon. The up-to-date implementation details now live in the Python 3.11 What's New documentation.

How the Adaptive Interpreter Works

The adaptive interpreter operates through three sequential stages: quickening, adaptive instruction execution, and specialization. Each stage has a defined role, and the system can move backward (de-optimize) as well as forward.

Stage 1 — Quickening

A code object's bytecode is immutable once compiled. The first time the code object executes, CPython creates a mutable working copy of it. During this copy process, called quickening, every instruction that could benefit from specialization is replaced with an "adaptive" variant. The generic LOAD_ATTR becomes LOAD_ATTR_ADAPTIVE. The generic LOAD_GLOBAL becomes LOAD_GLOBAL_ADAPTIVE. Instructions that have no meaningful specialization path, such as POP_TOP, are left unchanged. According to experimentation documented in the PEP, roughly 25% to 30% of instructions in typical bytecode can be usefully specialized.

The quickened bytecode uses the same 16-bit instruction format as the original — an 8-bit opcode followed by an 8-bit operand — which means the tracing and debugging infrastructure can still fall back to the original bytecode without any special handling. This design decision was deliberate: it kept the change compatible with debugging tools and the existing bytecode inspection API.
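The fixed two-byte layout is visible from Python itself: co_code is a flat bytes object, and walking it in two-byte units recovers an opcode/operand stream. This is a simplified decoding sketch that ignores EXTENDED_ARG; on 3.11+ some units will decode as CACHE entries, the inline data cache slots covered below:

```python
import dis

def f(x):
    return x + 1

code = f.__code__.co_code  # raw bytecode: a flat bytes object
assert len(code) % 2 == 0  # fixed 16-bit units: 8-bit opcode, 8-bit operand

# Simplified decoding that ignores EXTENDED_ARG; on 3.11+ some of these
# units decode as CACHE entries (inline data cache slots).
for i in range(0, len(code), 2):
    opcode, operand = code[i], code[i + 1]
    print(dis.opname[opcode], operand)
```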

Stage 2 — Adaptive Instructions and Inline Counters

Each adaptive instruction contains an inline counter stored in the first 16-bit entry of the instruction's inline data cache. Every time the adaptive instruction executes without yet specializing, it decrements this counter. When the counter reaches zero, the interpreter calls the corresponding specialization function — for example, _Py_Specialize_LoadAttr for LOAD_ATTR_ADAPTIVE. That function examines the types and values currently present, determines which specialized variant (if any) fits, and replaces the adaptive instruction in place with the faster variant.

Pro Tip

Because specialization happens at the granularity of individual bytecode instructions, de-optimization is trivial. There is no region of code that has been partially optimized and needs to be unwound. Each instruction is either specialized or it is not, and reverting it costs a single opcode write.

Stage 3 — Specialization and Saturating Counters

Once a specialized instruction is in place, it runs a fast path tailored to a specific type or context. Each specialized instruction also maintains its own saturating counter. When the instruction's fast-path assumptions are satisfied — the operand is the right type, the namespace hasn't changed, the object layout matches — the counter increments. If the assumptions are violated, the counter decrements and the generic operation is performed instead. If the counter reaches its minimum value, the instruction is de-optimized: its opcode is replaced with the adaptive variant, which will start the observation process again. This self-correcting loop is what makes the approach adaptive rather than just speculative.

```python
# This loop will quickly get the integer-specialized BINARY_OP variant
# because the types are consistent across all iterations.
total = 0
for i in range(1_000_000):
    total += i  # After a few iterations, CPython replaces the generic
                # BINARY_OP with BINARY_OP_ADD_INT internally.

# Mixing types forces de-optimization back to the generic path.
mixed = 0
for i in range(1_000_000):
    mixed += i if i % 2 == 0 else float(i)  # int sometimes, float sometimes
    # The saturating counter decrements on type mismatches;
    # if it hits the minimum, the instruction reverts to BINARY_OP_ADAPTIVE.
```
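The counter mechanics can also be modeled as a toy state machine in plain Python. Everything below (ToyInstruction, the warm-up and miss thresholds) is invented for illustration; CPython implements this in C, with different constants, inside the interpreter loop:

```python
# Toy model of the adaptive/specialized state machine. All names and
# thresholds here are invented for illustration; CPython does this in C
# inside the interpreter loop, one bytecode instruction at a time.

class ToyInstruction:
    WARMUP = 2        # adaptive executions before trying to specialize
    MAX_MISSES = 3    # saturating counter: misses tolerated before de-opt

    def __init__(self):
        self.state = "adaptive"
        self.warmup = self.WARMUP
        self.specialized_type = None
        self.counter = self.MAX_MISSES

    def execute(self, value):
        if self.state == "adaptive":
            self.warmup -= 1
            if self.warmup == 0:
                # Specialize for the type observed right now.
                self.state = "specialized"
                self.specialized_type = type(value)
                self.counter = self.MAX_MISSES
        elif type(value) is self.specialized_type:
            # Fast-path hit: the counter saturates at its maximum.
            self.counter = min(self.counter + 1, self.MAX_MISSES)
        else:
            self.counter -= 1
            if self.counter == 0:
                # De-optimize: revert to the adaptive form and re-observe.
                self.state = "adaptive"
                self.warmup = self.WARMUP

inst = ToyInstruction()
for i in range(10):
    inst.execute(i)           # consistent ints: specializes and stays
assert inst.specialized_type is int

for x in range(10):
    inst.execute(float(x))    # consistent misses: de-opts, then re-specializes
print(inst.state, inst.specialized_type.__name__)  # specialized float
```

The final print shows the adaptive loop closing: the instruction de-optimized away from its int specialization and re-specialized for float once the new type proved stable.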

Inline Data Caches

Most specialized instructions need more information than fits in an 8-bit operand. Rather than allocating separate storage, PEP 659 stores this ancillary data in 16-bit entries that sit immediately after the instruction in the bytecode array — a technique called an inline data cache. Unspecialized and adaptive instructions skip over these cache entries; specialized instructions read from them directly. This eliminates a pointer dereference that would otherwise be required to look up cached type or namespace information.
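The effect of inline caches can be observed indirectly: dis.get_instructions yields only the logical instructions, while co_code also contains the trailing cache units, so on 3.11+ the raw unit count exceeds the visible instruction count (on 3.10 and earlier the two match):

```python
import dis

def get_name(obj):
    return obj.name  # LOAD_ATTR gets trailing inline cache entries on 3.11+

visible = len(list(dis.get_instructions(get_name)))
raw_units = len(get_name.__code__.co_code) // 2  # two bytes per unit

# On 3.11+, raw_units exceeds visible because CACHE entries sit inline
# after instructions like LOAD_ATTR; on 3.10 the two counts are equal.
print(visible, raw_units)
```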

The PEP's memory analysis shows that this approach does not meaningfully increase memory usage compared to Python 3.10's opcache scheme. On a 64-bit machine, a 3.10 "cold" code object uses 2 bytes per instruction. A 3.10 "hot" code object (one that has crossed the ~2,000 execution threshold to gain a cache) uses approximately 7.4 bytes per instruction on average. Python 3.11 with the adaptive interpreter uses 6 bytes per instruction — and it starts optimizing far sooner, after just a handful of executions, with no large threshold to cross.

Instruction Families: LOAD_ATTR and LOAD_GLOBAL Up Close

PEP 659 uses two instruction families as its primary worked examples. Both illustrate how a single generic instruction becomes a family of targeted variants, each handling exactly one common case efficiently.

The LOAD_ATTR Family

LOAD_ATTR is one of the most frequently executed instructions in Python. It loads a named attribute from whatever object sits on top of the evaluation stack. That covers object instance attributes, class attributes, module attributes, properties, slots, and any custom __getattr__ implementation. The generic path has to check all of these possibilities every time. The specialized variants each handle exactly one:

  • LOAD_ATTR_INSTANCE_VALUE — the attribute is stored directly in the object's value array and is not shadowed by a descriptor. This is the common case for plain instance attributes and works in concert with CPython's lazy dictionary optimization.
  • LOAD_ATTR_MODULE — the attribute is being loaded from a module object. The cache stores the module's keys version and the index of the attribute, so the lookup becomes an array index read after a single version check.
  • LOAD_ATTR_SLOT — the attribute lives in a slot defined by __slots__. The cache stores the slot offset, turning the lookup into a direct pointer dereference.
```python
class Point:
    def __init__(self, x, y):
        self.x = x  # stored in the object's value array
        self.y = y

p = Point(3, 4)

# In a hot loop, LOAD_ATTR for p.x will be replaced by
# LOAD_ATTR_INSTANCE_VALUE after a few executions.
# The inline cache stores the offset of 'x' in the value array.
for _ in range(1_000_000):
    _ = p.x + p.y
```
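On Python 3.11+, dis.dis accepts adaptive=True, which shows the quickened working copy of the bytecode rather than the original. After warming a function like the loop above, attribute loads typically appear under specialized names (exact variant names differ between releases; the version guard below keeps the snippet runnable on older interpreters):

```python
import dis
import sys

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

def hot(p):
    total = 0
    for _ in range(2000):
        total += p.x + p.y
    return total

hot(Point(3, 4))  # warm the function so specialization has a chance to occur

if sys.version_info >= (3, 11):
    # adaptive=True shows the quickened working copy, where LOAD_ATTR may
    # appear under a specialized name such as LOAD_ATTR_INSTANCE_VALUE.
    dis.dis(hot, adaptive=True)
else:
    dis.dis(hot)  # older interpreters have no adaptive form to show
```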

The LOAD_GLOBAL Family

Before PEP 659, the C code for LOAD_GLOBAL in CPython 3.9 contained logic for checking whether to allocate a cache, looking up values in both the global namespace and the builtins namespace, and falling back when the cache wasn't valid — all mixed together in a single function. Shannon described this in the PEP as "complicated and bulky," and noted that it performed many redundant operations even in the supposedly optimized path.

The specialized family replaces that with two clean variants. LOAD_GLOBAL_MODULE handles the case where the name is found in the global namespace. It checks that the dictionary's key version has not changed (meaning no key has been added or removed), then reads the value from a stored index. Crucially, it does not need to check whether the values have changed — only the keys matter for determining whether the cached index is still valid. LOAD_GLOBAL_BUILTIN handles the case where the name falls through to the builtins namespace. It checks that no key has been added to the global namespace (which would shadow the builtin) and that the builtins namespace itself is unchanged.
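The guard logic can be sketched as a toy version-checked cache. CPython exposes no public API for a dict's internal keys version, so this model tracks its own version number; every name here is invented for illustration, while the real guard is a single integer comparison against a value stored in the inline cache:

```python
# Toy model of the LOAD_GLOBAL_MODULE guard: cache an index into the
# namespace plus a "keys version" that changes only when a key is added.
# All names are invented; CPython's real check compares a version number
# kept in the dict's C-level keys object.

class VersionedNamespace:
    def __init__(self):
        self.keys_version = 0
        self._d = {}

    def __setitem__(self, key, value):
        if key not in self._d:
            self.keys_version += 1  # new key: cached indices may be stale
        self._d[key] = value        # overwriting a value does NOT bump it

    def lookup(self, key):
        return list(self._d).index(key), self._d[key]

ns = VersionedNamespace()
ns["x"] = 1

# "Specialize": remember where x lives and under which keys version.
cached_index, _ = ns.lookup("x")
cached_version = ns.keys_version

ns["x"] = 99                 # value change only: the guard still passes
assert ns.keys_version == cached_version

ns["y"] = 2                  # key insertion: the guard must now fail
assert ns.keys_version != cached_version
```

This is why only key changes matter for cache validity: the cached index stays correct as long as the set of keys is unchanged, no matter how often the values are rebound.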

Memory, Performance, and What You Don't Have to Change

One of PEP 659's most important design commitments is stated plainly in its Compatibility section: there is no change to the language, the standard library, or any public API. The only ways a user can observe the adaptive interpreter's presence are through timing execution, inspecting bytecode with debugging tools, or measuring memory consumption. Code that ran on Python 3.10 runs identically on Python 3.11 — it simply runs faster.

Memory Cost in Practice

The PEP documents the per-instruction memory cost across versions on a 64-bit machine:

| Version / State | Bytes per Instruction | Specialization Rate |
| --- | --- | --- |
| Python 3.10, cold (before ~2,000 executions) | 2 | 0% |
| Python 3.10, hot (after threshold is crossed) | ~7.4 | ~15% |
| Python 3.11 (adaptive interpreter always active) | 6 | ~25% |

The breakeven point — where 3.10's memory use equals 3.11's — occurs when approximately 70% of code objects are "hot" under 3.10's definition. In most real applications, many functions are called infrequently, meaning 3.10's cache is never allocated for them. In those scenarios, 3.11 uses slightly more memory overall, but the PEP characterizes the difference as "not by much." For applications with consistently hot code — tight loops, computational pipelines, server handlers — 3.11 can use less memory than 3.10 while performing significantly better.
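The breakeven figure follows from the per-instruction numbers. Treating a 3.10 process as a mix of cold code at 2 bytes and hot code at about 7.4 bytes per instruction, the hot fraction p at which it matches 3.11's flat 6 bytes works out to roughly 74%, which the PEP rounds to about 70%:

```python
# Per-instruction memory figures from the PEP (64-bit machine).
cold_310 = 2.0   # Python 3.10 before the ~2,000-execution threshold
hot_310 = 7.4    # Python 3.10 after the threshold
flat_311 = 6.0   # Python 3.11, adaptive interpreter always active

# Solve p*hot_310 + (1 - p)*cold_310 == flat_311 for the hot fraction p.
p = (flat_311 - cold_310) / (hot_310 - cold_310)
print(f"breakeven hot fraction = {p:.1%}")  # 74.1%, ~70% in the PEP's rounding
```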


Measured Speedups

PEP 659 is careful about performance claims. The document states that speedups appear to be in the range of 10% to 60% depending on the workload, with "extensive experimentation" suggesting a ceiling of around 50%. The PEP also notes that the largest contributors to speedup are attribute lookup, global variable access, and function calls — which are precisely the operations that LOAD_ATTR, LOAD_GLOBAL, and the CALL family target. A smaller fraction of the gain comes from super-instructions (instructions that span multiple logical operations) and other optimizations that quickening enables.

When Python 3.11 shipped, the Python core team reported an average 25% speedup on the standard pyperformance benchmark suite relative to Python 3.10. Some workloads saw considerably more. The adaptive interpreter was the single largest contributor to that improvement, though it was accompanied by other changes including faster Python-to-Python function calls (a collaboration between Shannon and Pablo Galindo) and improved frame handling.

Performance Context

In September 2022, Michael Kennedy framed the release on Talk Python To Me Episode 381 as a turning point: existing Python code could see speedups exceeding 25% with no changes to the code itself — a claim that the Python 3.11 pyperformance benchmarks went on to substantiate.

Writing Specialization-Friendly Code

The adaptive interpreter specializes more aggressively when types are consistent. If a function adds integers in a loop, keep the operands as integers rather than mixing in floats partway through — even a single type mismatch decrements the saturating counter and may cause de-optimization. Tools like Specialist (created by Faster CPython team member Brandt Bucher) can visualize which instructions in your code are being specialized and which are falling back to the generic path.

How to Write Specialization-Friendly Python Code

The adaptive interpreter specializes on types it observes consistently. These steps help ensure CPython can stabilize fast-path instructions rather than cycling between specialization and de-optimization.

  1. Keep types consistent through hot loops. Every iteration that passes a different type through the same operation decrements the saturating counter. If that counter reaches its minimum, the specialized instruction reverts to its adaptive form and the observation cycle restarts. Integer-only loops, float-only loops, and string-only loops all specialize cleanly; mixed-type loops often do not.
  2. Avoid type-alternating expressions inside tight loops. A ternary like i if i % 2 == 0 else str(i) alternates between integer and string on every other iteration. The downstream operation sees a type mismatch every other call, which is enough to prevent stabilization entirely.
  3. Inspect bytecode with the dis module. After warming up a function, import dis; dis.dis(your_function) reveals whether adaptive or specialized instruction variants are active. In Python 3.11+, specialized opcodes appear in the output with names like LOAD_ATTR_INSTANCE_VALUE or BINARY_OP_ADD_INT, confirming that specialization has occurred.
  4. Use Specialist to visualize specialization coverage. The Specialist tool (by Brandt Bucher) renders a color-coded view of bytecode showing which instructions are specialized and which are falling back to generic paths — useful for identifying bottlenecks.
  5. Warm up functions before benchmarking. Specialization does not happen on the first call. Run the target function several times before taking timing measurements so the adaptive interpreter has had time to observe types and install specialized instructions.
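Step 5 above can be sketched with time.perf_counter: run warm-up calls first, then measure. The warm-up and iteration counts below are arbitrary choices, not values prescribed by the PEP:

```python
import time

def work(n):
    total = 0
    for i in range(n):
        total += i   # int-only loop: a clean specialization target
    return total

# Warm up first: give the adaptive interpreter a chance to observe types
# and install specialized instructions before anything is measured.
for _ in range(10):
    work(10_000)

start = time.perf_counter()
result = work(1_000_000)
elapsed = time.perf_counter() - start
print(f"result={result} elapsed={elapsed:.4f}s")
```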

From PEP 659 to the Tier-2 JIT: The Lineage

PEP 659 explicitly deferred, rather than rejected, the idea of just-in-time compilation. That deferral was not rhetorical. The specializing adaptive interpreter was designed from the beginning as the profiling and optimization substrate on which a JIT compiler could be built. By generating rich runtime information — which types flow through which instructions, what memory layouts objects have, which paths through a function execute most often — the adaptive interpreter gives a JIT compiler exactly the intelligence it needs to generate efficient machine code.

Python 3.13 (released October 7, 2024) introduced that JIT compiler experimentally under PEP 744, authored by Brandt Bucher. The architecture that connects the two is described directly in PEP 744: the specializing adaptive interpreter serves as the Tier 1 optimizer — the profiling layer that generates runtime intelligence a JIT compiler can then act on. When Tier 1 bytecode becomes hot enough, CPython translates it into a new internal representation called Tier 2 IR (micro-ops), which is better suited to machine code generation than the stack-based bytecode format. Several optimization passes run over the Tier 2 IR before it is either interpreted by a secondary interpreter (used mainly for debugging the pipeline) or compiled to native machine code by the copy-and-patch JIT. PEP 744 notes that the specializing adaptive interpreter "delivers significant performance improvements" while simultaneously collecting profiling data that the Tier 2 optimizer consumes. The full specification is at peps.python.org/pep-0744.

The JIT in Python 3.13 required building CPython with the --enable-experimental-jit flag and was disabled by default. Python 3.14 (released October 7, 2025) raised the profile of the JIT meaningfully: official Windows and macOS binaries now ship with it built in. It remains off by default, but can be enabled in those builds simply by setting the PYTHON_JIT=1 environment variable — no custom build required. Performance in 3.14 was mixed: on some benchmarks the JIT delivered up to 20% speedups; on others — particularly heavily recursive code — it ran measurably slower, and the team explicitly advised against enabling it for production use. Python 3.13 also introduced the Tier 2 interpreter as a separate, inspectable stage that can be enabled without the full JIT, using --enable-experimental-jit=interpreter. Both the Tier 1 and Tier 2 execution paths are generated from the same bytecode definition DSL, which means updates to instruction definitions propagate automatically to the JIT backend.

Figure: CPython execution pipeline. Python source compiles to generic bytecode, which PEP 659's Tier 1 adaptive interpreter quickens and specializes; hot code is translated to Tier 2 IR (micro-ops) and optionally compiled by the copy-and-patch JIT (PEP 744, Python 3.13+).

Python 3.12, released October 2023, continued refining the adaptive interpreter without yet adding the JIT. The 3.12 release cycle improved specialization coverage, added new specialized variants for additional instruction families, and tightened the inline caching implementation for production robustness. Benchmarks from the 3.12 cycle showed an average 4% gain over 3.11 on the standard pyperformance suite — a smaller increment than 3.11's headline number, but meaningful given that it arrived without the step-change that a new execution model provides. Python 3.13 added approximately 7% on top of that, and Python 3.14, released October 7, 2025, contributed roughly 8% more. By the time Python 3.14 arrived, the cumulative improvement from Python 3.10 stood at close to 50% across diverse workloads — a result that the Faster CPython project, backed initially by Microsoft and led by Guido van Rossum, had identified as the medium-term target. LWN's coverage of PyCon US 2025 noted that around 93% of the project's benchmark suite had improved since the effort began, and nearly half of those benchmarks were more than 50% faster than their Python 3.10 baselines.

In May 2025, Microsoft canceled its support for the Faster CPython project and laid off the majority of the team as part of a broader reduction affecting roughly 6,000 employees worldwide. The core Python developers let go included Mark Shannon (PEP 659's author and the team's technical lead), Eric Snow, and Irit Katriel. Mike Droettboom, a principal software engineering manager at Microsoft and a CPython core developer who remained employed, confirmed the situation publicly on Python Discourse, stating that Microsoft's backing for the project had ended and expressing concern for his colleagues who had been laid off. Droettboom also noted that the notifications went out while the team was en route to the Python Language Summit at PyCon US 2025 in Pittsburgh. Brandt Bucher, who built the copy-and-patch JIT and created the Specialist visualization tool, was not among those laid off and gave his planned PyCon talk on the JIT regardless, making clear his intention to continue JIT development. Community stewardship of the Faster CPython work was discussed publicly on Python Discourse within days of the announcement, with multiple core contributors committing to maintain the momentum.

The team attributed the mixed 3.14 results to limited JIT optimizer improvements in that cycle — primarily expanded bytecode coverage rather than new optimization passes — and framed the release as a necessary foundation-building phase. By early 2026, community-led development driving Python 3.15 had delivered the first release in which enabling the JIT produces consistent, meaningful speedups for CPU-bound workloads. The Python 3.15 "What's New" documentation describes three principal improvements: an overhauled JIT tracing frontend with a dual dispatch mechanism (increasing code coverage by approximately 50%), elimination of reference count branches in JIT code, and a basic form of register allocation that avoids stack reads and writes. These changes combine to deliver approximately 11–12% geometric-mean speedup on AArch64 macOS and 5–6% on x86-64 Linux. The two GitHub issues framing the remaining 3.15 JIT work were thread safety (critical given the progress of free-threaded Python under PEP 779) and stack unwinding support to allow native debuggers and profilers to traverse JIT frames.

Frequently Asked Questions

Does PEP 659 require any changes to existing Python code?

No. PEP 659 makes no changes to the Python language, the standard library, or any public API. Code that runs on Python 3.10 runs identically on 3.11 and later — the adaptive interpreter operates transparently at the bytecode level. The only observable effects are faster execution times, minor differences in memory usage for consistently hot code, and changed output from low-level bytecode inspection tools such as dis.

Which operations benefit most from specialization?

The operations that gain the most are the ones executed most frequently in typical Python programs: attribute loading (LOAD_ATTR), global variable access (LOAD_GLOBAL), and function calls (CALL). Integer and float arithmetic also specialize effectively. Code that mixes types frequently — passing both integers and floats through the same operation in the same loop — sees smaller gains because the saturating counter mechanism detects the mismatch and reverts toward the generic path.
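A quick way to feel this difference is to time a type-consistent loop against a type-alternating one. On 3.11+ the consistent loop is usually faster, though the gap varies by machine and version; the loop bodies below are illustrative only:

```python
import time

def consistent(n):
    total = 0
    for i in range(n):
        total += i                               # int + int every iteration
    return total

def mixed(n):
    total = 0
    for i in range(n):
        total += i if i % 2 == 0 else float(i)   # alternates int and float
    return total

def timed(fn, n=1_000_000):
    for _ in range(3):                           # warm up before measuring
        fn(n // 10)
    start = time.perf_counter()
    fn(n)
    return time.perf_counter() - start

print(f"consistent: {timed(consistent):.4f}s  mixed: {timed(mixed):.4f}s")
```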

What is a saturating counter in CPython's adaptive interpreter?

Each specialized bytecode instruction maintains a small counter that increments when the instruction's fast-path type assumptions hold and decrements when they are violated. If the counter reaches its minimum value, CPython replaces the specialized instruction with its adaptive variant, which restarts the observation and specialization cycle. This self-correcting mechanism is what makes the system "adaptive" rather than merely speculative.

What is the relationship between PEP 659 and Python's JIT compiler?

PEP 659 was designed from the start as the profiling substrate for a future JIT compiler. PEP 744 (authored by Brandt Bucher), which introduced an experimental copy-and-patch JIT in Python 3.13, describes PEP 659's adaptive interpreter as the Tier 1 optimizer. Specialized bytecode that becomes hot enough gets translated into Tier 2 micro-ops, which are then compiled to native machine code by the JIT. As of Python 3.14, the JIT is included in official Windows and macOS binaries — disabled by default, enabled via PYTHON_JIT=1. Python 3.15 is the first release in which enabling the JIT delivers consistent, meaningful speedups for CPU-bound workloads: roughly 11–12% on AArch64 macOS and 5–6% on x86-64 Linux, achieved through an overhauled tracing frontend, reference count branch elimination, and basic register allocation.

How does the adaptive interpreter handle code that uses many different types?

When a specialized instruction encounters a type it was not specialized for, it falls back to the generic operation and decrements its saturating counter. If the counter bottoms out, the instruction de-optimizes to its adaptive form. The adaptive form will then re-profile over the next several executions. If the type mix is genuinely inconsistent, specialization may never stabilize — which is expected and correct behavior. The generic path is still correct; it is simply not as fast as a stable specialized path would be.

What happened to the Faster CPython team after Microsoft ended its funding?

In May 2025, Microsoft canceled its Faster CPython sponsorship as part of a company-wide reduction of roughly 6,000 positions. Mark Shannon (PEP 659's author and the team's technical lead), Eric Snow, and Irit Katriel were among those laid off. Mike Droettboom, who confirmed the news publicly, noted that the notifications arrived while the team was en route to the Python Language Summit at PyCon US 2025 in Pittsburgh. Brandt Bucher, author of PEP 744 and creator of the Specialist visualization tool, was not laid off and continued JIT development. Community stewardship of the effort was organized on Python Discourse within days. Python 3.15 then delivered the first JIT results that are consistently beneficial for CPU-bound workloads — a meaningful outcome given the project had lost its primary institutional backer less than a year earlier.


Key Takeaways

PEP 659 is one of those rare internal changes with measurable consequences for every Python developer. Whether you are new to CPython internals or just beginning to explore performance and bytecode, understanding this mechanism helps explain why Python 3.11 and later behave differently under the hood.

  1. No user action required: PEP 659 makes no changes to the Python language, standard library, or public API. Any code running on Python 3.10 runs identically on 3.11 and later — the speedup is automatic.
  2. Specialization works at the instruction level: Rather than optimizing whole functions or regions of code, CPython specializes individual bytecode instructions. This keeps de-optimization trivial — reverting a single opcode — and avoids the complex unwinding that larger-region specialization would require.
  3. Type consistency unlocks the fast path: Code that repeatedly uses the same types in the same operations will specialize aggressively and stay specialized. Mixing types, even occasionally, can trigger de-optimization via the saturating counter mechanism.
  4. Memory cost is modest and front-loaded: The inline data cache adds bytes per instruction, but 3.11's total per-instruction cost is lower than 3.10's in hot-code scenarios, and specialization begins after just a few executions rather than after a ~2,000 execution threshold.
  5. PEP 659 is the foundation, not the ceiling: The adaptive interpreter was explicitly designed to feed data into a future JIT compiler. PEP 744's copy-and-patch JIT (experimental in Python 3.13, included in official Windows and macOS binaries in Python 3.14) is a direct continuation of the architecture PEP 659 laid out, using the same instruction definition DSL and the same runtime profiling pipeline. In Python 3.15, the community-led team — continuing after Microsoft ended its Faster CPython funding in May 2025 — delivered the first release where enabling the JIT produces consistent, meaningful speedups for CPU-bound workloads, with approximately 11–12% geometric-mean gains on AArch64 macOS and 5–6% on x86-64 Linux.

PEP 659 represents something unusual in CPython's history: a large-scale internal redesign that delivered concrete, measurable performance gains without asking anything of users or library authors. It changed the fundamental execution model of the interpreter, introduced inline caching, and created the profiling infrastructure that a JIT compiler would later exploit — all while maintaining complete backward compatibility. For anyone writing code that runs in tight loops, processes data at volume, or calls functions millions of times, every Python 3.11-or-later runtime already has this optimization running silently on their behalf. The pipeline that began with PEP 659 now extends through PEP 744's copy-and-patch JIT, which shipped in official Python 3.14 binaries for Windows and macOS and reached its first consistent CPU-bound performance targets in Python 3.15 — despite Microsoft ending its Faster CPython sponsorship in May 2025. Community stewardship of the effort is ongoing; Brandt Bucher and a growing group of contributors have continued JIT development into the Python 3.15 cycle and beyond. The source of truth for the specializing adaptive interpreter remains the PEP itself at peps.python.org/pep-0659, and the Python 3.15 What's New page at docs.python.org/3.15/whatsnew/3.15.html documents the latest JIT improvements, alongside the python/cpython repository.

Sources & Further Reading

All claims in this article are drawn from primary sources. The list below covers everything cited in the text so readers can verify any specific point directly.

  • PEP 659 — Specializing Adaptive Interpreter (Mark Shannon, 2021): peps.python.org/pep-0659 — canonical specification for quickening, adaptive instructions, inline caches, and the memory model.
  • PEP 744 — JIT Compilation (Brandt Bucher, 2023): peps.python.org/pep-0744 — defines the Tier 1 / Tier 2 architecture and the copy-and-patch JIT.
  • What's New in Python 3.11 — PEP 659: docs.python.org/3/whatsnew/3.11.html — canonical reference target for the PEP; source for the 25% benchmark figure.
  • What's New in Python 3.15: docs.python.org/3.15/whatsnew/3.15.html — documents the dual dispatch frontend, reference count elimination, register allocation, and measured JIT speedups.
  • Python-Dev mailing list — PEP 659 announcement (May 12, 2021): mail.python.org — Shannon's original announcement, including the informational vs. standards-track discussion.
  • CPython Steering Council Issue #180: github.com/python/steering-council — request to mark PEP 659 as Final after implementation was confirmed complete.
  • Community Stewardship of Faster CPython (Mike Droettboom, May 2025): discuss.python.org — Droettboom's public post confirming Microsoft's cancellation of Faster CPython and the layoffs.
  • Following up on the Python JIT (LWN.net, PyCon US 2025 coverage): lwn.net/Articles/1029307 — source for per-version speedup figures (3.12: 4%, 3.13: 7%, 3.14: ~8%) and benchmark suite statistics.
  • Python 3.15's JIT is now back on track (Python Insider / blog.python.org, March 2026): blog.python.org/2026/03/jit-on-track — community account of how the 3.15 JIT targets were reached after Microsoft funding ended.
  • Reflections on 2 years of CPython's JIT Compiler (Ken Jin, 2025): fidget-spinner.github.io — detailed technical account of why 3.13 and 3.14 JIT gains were limited and what changed for 3.15.
  • Talk Python To Me, Episode 381 — Specializing Adaptive Interpreter (Michael Kennedy & Brandt Bucher, September 2022): talkpython.fm/episodes/show/381 — performance framing quote and Specialist tool introduction.
  • Microsoft layoff coverage — The Register (May 16, 2025): theregister.com — corroborating source for layoff details and Droettboom's LinkedIn statement.