Meta's Cinder: The CPython Fork That Powers Instagram

When Instagram needed to squeeze more performance out of Python without abandoning the language entirely, Meta's engineers did something drastic: they rewrote the runtime itself. The result was Cinder — a performance-oriented fork of CPython that introduced a method-at-a-time JIT compiler, inline bytecode caching, eager coroutine evaluation, and a novel approach to object immortality. Years later, the lessons baked into Cinder are reshaping CPython itself.

Python is the language of choice at Meta. It powers Instagram's web server, drives PyTorch for AI and machine learning research, and runs a significant share of internal tooling across the company. But Python's standard runtime, CPython, was not designed with Instagram's scale in mind. Instagram runs one of the world's largest deployments of the Django web framework, re-deploying its codebase roughly every ten minutes. At that cadence, startup time, memory efficiency, and raw throughput all matter enormously. Stock CPython, for all its strengths, left performance on the table that Meta could not afford to ignore.

Why Meta Needed a Custom Python Runtime

The pressure point is simple: Instagram's pre-fork web server architecture spawns many worker processes from a single parent process. Under Linux's fork model, child processes initially share the parent's memory pages. The moment any shared object is written to — even just to update a reference count — the kernel triggers a copy-on-write, creating a private copy of that memory page in the child. CPython's reference counting mechanism modifies object headers constantly, even during read-only workloads. For Instagram, this meant that shared memory was being rapidly converted into private memory across thousands of worker processes, driving up RAM consumption and hardware costs at scale.
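The write-on-read behavior is easy to demonstrate from Python itself; a minimal sketch using the standard library's sys.getrefcount:

```python
import sys

# A module-level object, standing in for data pre-loaded before fork().
shared = "pre-forked module constant"

before = sys.getrefcount(shared)
alias = shared  # a "read-only" use of the object...
after = sys.getrefcount(shared)

# ...still wrote to its header: the refcount rose by one. Under fork
# plus copy-on-write, that single write privatizes the whole memory page.
print(after - before)
```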

Meta's engineers also observed that CPython's interpreter carried overhead that was justified for a general-purpose runtime but unnecessary in a controlled, profiled production environment. Every function call involved heap-allocating a full Python frame object. Every coroutine created an additional wrapper object even when the coroutine would resolve immediately without suspending. Generic bytecode opcodes performed dictionary lookups on every attribute access, even for attributes accessed repeatedly in tight loops. These are not bugs — they are the correct default behaviors for a dynamic language. But in a production environment where the hot code paths are well understood, they represent avoidable overhead.

Note

Cinder was originally built against CPython 3.8.5 and open sourced in April 2021. It was subsequently extended through CPython 3.10, 3.12, and the current meta/3.14 branch. The GitHub repository at facebookincubator/cinder is publicly available, though Meta is explicit that it is not polished for external use and does not commit to fixing external bug reports. Starting with Python 3.10, Meta began the parallel effort to extract Cinder's optimizations into the CinderX extension module to reduce fork-maintenance cost.

Cinder was Meta's answer to these constraints. Rather than replacing Python or wrapping it in another language, Meta's team forked CPython directly and began layering targeted optimizations on top of the existing interpreter. The goal was never to build a rival runtime for the broader community — as Meta stated on the project's GitHub page, "our goal in making this code available is a unified faster CPython." The research, in other words, was always meant to flow back into the language itself.

The Core Optimizations Inside Cinder

Cinder's performance story is built on several interlocking mechanisms, each addressing a different source of overhead in the standard CPython runtime. Understanding them individually is necessary to appreciate why Cinder can deliver the gains it does.

Shadowcode (Inline Bytecode Caching and Quickening). Standard CPython executes generic bytecode opcodes that perform dictionary lookups for attribute and method resolution on every call. Cinder's shadowcode system observes the execution of these generic opcodes in hot functions and dynamically replaces them with specialized variants tailored to the types observed at runtime. Attribute loads that previously walked a dictionary chain become direct indexed reads. This is conceptually similar to the adaptive specializing interpreter that CPython 3.11 later adopted — a direct lineage that Cinder helped prove out.

The Cinder JIT Compiler. Cinder includes a method-at-a-time JIT compiler implemented in C++, enabled via the -X jit flag or the PYTHONJIT=1 environment variable. The JIT compiles functions from bytecode through a control-flow graph, then a high-level intermediate representation (HIR), then a static single assignment (SSA) form, then a low-level IR, then register-allocated IR, and finally to native x86-64 machine code. This pipeline eliminates the bytecode dispatch overhead (the interpreter's switch/case loop), the stack push-and-pop model for passing values between opcodes, and the genericism of the standard interpreter.

"The JIT supports almost all Python opcodes, and can achieve 1.5–4x speed improvements on many Python performance benchmarks." — Meta / facebookincubator/cinder GitHub README

In production, Meta does not JIT-compile every function — that would make programs slower by spending time compiling rarely-called code. Instead, hot functions are identified through production profiling and listed in a JIT list file. The JIT compiles only those functions, keeping compilation overhead minimal while maximizing throughput on the critical path. The JIT also uses shadow frames — two stack-allocated words per function call that hold metadata sufficient to reconstruct a full Python frame object if needed (for tracebacks, debuggers, and so on) — instead of heap-allocating a frame object on every call.

The JIT also incorporates a function inliner, released in 2022. Function inlining allows the JIT to fold the body of a called function directly into the caller's compiled code, eliminating the call overhead entirely. Python's dynamism complicates inlining because global variable bindings are mutable — the function being called could in principle be replaced at runtime. Cinder's JIT handles this with guard instructions: before executing the inlined body, a fast pointer comparison verifies that the global binding is still the expected function. If it has changed, execution falls back to the interpreter. Because global rebinding is rare in production code, the fast path dominates.
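The guard idea can be sketched in plain Python. This is a conceptual illustration only, not the JIT's actual mechanism; callee, caller, and _expected_callee are hypothetical names:

```python
# Conceptual sketch of guarded inlining: check the global binding's
# identity with a fast pointer comparison, then take the inlined fast
# path; fall back to a generic call if the binding has been replaced.

def callee(x):
    return x + 1

_expected_callee = callee  # identity captured at "compile" time

def caller(x):
    if globals()["callee"] is _expected_callee:  # guard: binding unchanged
        return x + 1                             # "inlined" body, call elided
    return callee(x)                             # deopt: generic call
```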

Pro Tip

Meta uses the -X jit-list-file option to supply a text file of fully qualified function names (e.g., path.to.module:ClassName.method_name) for JIT compilation. Feeding this list from production profiling data rather than compiling everything is the key to avoiding JIT warm-up costs in a pre-fork server model.

Static Python. Static Python is an opt-in bytecode compiler that uses Python's existing type annotations to emit type-specialized opcodes. A static Python function begins with a CHECK_ARGS opcode that validates argument types against annotations at the boundary, raising TypeError on mismatch. Static-to-static calls skip this check because the types have already been validated. Attribute accesses on statically-typed instances become direct indexed loads from fixed memory offsets — the compiler resolves the offset at compile time, eliminating dictionary lookups entirely. Classes in Static Python modules are automatically given typed slots, and method calls are compiled to INVOKE_FUNCTION and INVOKE_METHOD opcodes that, when JIT-compiled, become direct calls to fixed memory addresses using the standard x86-64 calling convention — roughly the overhead of a C function call.

The performance ceiling for Static Python plus the Cinder JIT is striking. Meta reported achieving 18 times the throughput of stock CPython on a typed version of the Richards benchmark. On production workloads, Static Python was used to replace all Cython modules in Instagram's primary web server codebase with no performance regression — a significant operational win since it removed a compilation step and a separate language from the stack.

```python
# Static Python example — type annotations drive specialized opcodes
import __static__
from __static__ import int64

def fibonacci(n: int64) -> int64:
    if n <= 1:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

# With the Cinder JIT enabled, calls between static functions
# resolve to direct memory addresses — C-function-call overhead.
```

Strict Modules. Python's import system allows modules to have arbitrary side effects when imported, which complicates static analysis and prevents many optimizations. Strict Modules enforce that a module does not mutate global state or produce side effects during import. A module opts in by including import __strict__ as its first statement. The module loader then validates this claim and stores the module as an immutable StrictModule object. Types defined within strict modules can be frozen via cinder.freeze_type(), making their method tables and attribute layouts immutable and enabling the JIT to use stable, pre-resolved call targets. Strict Modules also open the door to safe hot-reloading — a meaningful operational advantage when Instagram redeploys every ten minutes.

Immortal Objects. This is perhaps Cinder's most architecturally significant innovation. Instagram's memory problem stemmed from CPython's reference counting modifying object headers on every increment and decrement. In a pre-fork server, the parent process pre-loads Python modules and objects into memory. Child processes inherit this memory as shared pages. Any reference count update — even to a read-only object — triggers a copy-on-write in the kernel, creating a private memory page in the child. At Instagram's scale, this converted enormous amounts of shared memory into private memory, increasing the memory footprint per worker process and limiting how many workers could fit on a given machine.

Cinder's solution was Immortal Instances: a mechanism to mark Python objects as exempt from reference counting entirely. An immortal object's reference count is never incremented or decremented. It lives for the full lifetime of the interpreter. Because the kernel never sees a write to the immortal object's memory page, copy-on-write is never triggered, and the page stays shared across all child processes.
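With PEP 683 in stock CPython 3.12 and later, the effect is directly observable: creating new references to an immortal object such as None never changes its reference count. A small demonstration (behavior differs on 3.11 and earlier, as noted in the comments):

```python
import sys

before = sys.getrefcount(None)
refs = [None] * 100_000  # one hundred thousand new references to None
after = sys.getrefcount(None)

# On 3.12+, None is immortal (PEP 683): its header was never written,
# so before == after and the page holding it stays shared after fork().
# On 3.11 and earlier, the count rises by exactly 100_000.
print(before == after)
```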

"Now, objects can bypass reference count checks and live throughout the entire execution of the runtime, unlocking exciting avenues for true parallelism." — Eddie Elizondo, Software Engineer, Instagram / Meta Engineering Blog, 2023

The implementation required adding explicit checks to the reference count increment and decrement routines — two of the hottest code paths in the entire runtime. This introduced a performance regression. Meta's engineers used careful register allocation to bring that regression down to approximately 2 percent across all systems, judging the memory efficiency gains well worth the tradeoff. In production at Instagram, Immortal Objects produced a roughly 5 percent performance win through reduced memory churn and copy-on-write pressure.

Eager Async Execution. Python's async/await model creates coroutine objects on every async function call and typically involves Task objects and event loop scheduling overhead even for coroutines that resolve immediately without suspending. In a fully async codebase, Cinder observed that many async functions could return a result directly on the first call, making the coroutine and Task machinery pure overhead. In its original fork-level implementation, Cinder's eager async execution allows an immediately-awaited async function to return its result directly, bypassing coroutine object creation entirely for the trivially-resolving case.

When Meta contributed a version of this feature to CPython 3.12 as eager asyncio tasks, the upstream implementation was deliberately scoped down: coroutine and Task objects are still created when a result is available immediately, but the event loop scheduling step can be skipped and the task resolved inline. The reason for the more conservative upstream implementation is that full coroutine elision would be a semantic-breaking change incompatible with Python 3.11's TaskGroup API. This optimization is particularly valuable in Instagram's Django server, where async code paths are common and many short-lived operations complete synchronously.
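The upstream form can be tried with stock asyncio on Python 3.12+, which exposes an eager task factory. A small sketch (the cache lookup is a made-up example of a trivially-resolving coroutine):

```python
import asyncio

async def get_cached(key):
    # Resolves immediately: there is no suspension point on the fast path.
    return {"user:1": "alice"}.get(key)

async def main():
    loop = asyncio.get_running_loop()
    if hasattr(asyncio, "eager_task_factory"):  # stock CPython 3.12+
        loop.set_task_factory(asyncio.eager_task_factory)
    task = asyncio.create_task(get_cached("user:1"))
    # With an eager factory, the coroutine already ran to completion
    # synchronously inside create_task, skipping event loop scheduling.
    return await task

result = asyncio.run(main())
print(result)
```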

Lazy Imports. A less obvious but highly impactful optimization is Cinder's approach to module imports. Standard Python resolves all import statements eagerly at module load time. In a large codebase like Instagram's, this means that loading any module can trigger a cascading chain of imports across the entire dependency graph before the first line of business logic executes. Cinder's Lazy Imports system defers all imports by default — instead of resolving an import immediately, it substitutes a lightweight proxy object that triggers the actual import only when the imported name is first accessed. Unlike hand-picking individual imports to defer, Cinder's approach is comprehensive and transparent to application code.
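Cinder's mechanism is not in stock Python, but the standard library offers a per-module version of the same idea via importlib.util.LazyLoader, following the recipe in the importlib documentation:

```python
import importlib.util
import sys

def lazy_import(name):
    # Register the module now, but defer executing its code until the
    # first attribute access (importlib.util.LazyLoader recipe).
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)
    return module

json = lazy_import("json")   # no module code has executed yet
print(json.dumps({"a": 1}))  # first access triggers the real import
```

Unlike Cinder's comprehensive default, this stdlib recipe must be applied import by import, which is exactly the selective-management burden Cinder removes.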

From Cinder to CinderX: The Extension Strategy

Maintaining a full fork of CPython is expensive. Every upstream CPython release requires rebasing Cinder's patches, resolving conflicts, and re-validating the entire test suite. As CPython's development velocity increased — particularly with the major performance work in 3.11 and 3.12 — this maintenance burden became untenable. Meta's team made a strategic decision: rather than maintaining an ever-diverging fork, they would extract Cinder's most distinctive and hard-to-upstream components into a standalone CPython extension module called CinderX.

"Some parts of Cinder (our JIT compiler and Static Python) wouldn't make sense as part of upstream CPython (because of limited platform support, C versus C++, semantic changes, and just the size of the code), so our goal is to package these as an independent extension module, CinderX." — Meta Engineering Blog, October 2023

CinderX packages the Cinder JIT compiler and Static Python as a Python extension that can be installed alongside standard CPython. For Python versions 3.10 through 3.12, CinderX still requires patches to Meta's CPython fork — the upstream runtime did not yet expose the hooks that CinderX needed to intercept execution. Meta contributed many of these hooks to CPython 3.12: an API to set the vectorcall entry point for a Python function, giving the JIT a takeover point; dictionary watchers, type watchers, function watchers, and code object watchers that allow the JIT to be notified when its compiled assumptions are invalidated; extensibility in the code generator that lets Static Python add its own opcodes; and a thread-safe API for writing to perf-map files so the Linux perf profiler can name JIT-compiled machine code sections. The CinderX codebase also includes a parallel garbage collector and a lighter-weight implementation of Python interpreter frames, though neither is yet compatible with stock CPython; both remain gated to Meta's fork for now.

Python 3.14, released on October 7, 2025, is the first version of stock CPython that CinderX supports without requiring a patched fork. This represents the culmination of years of work to bring the necessary extension hooks into the upstream runtime. CinderX is published to PyPI on a weekly basis and is MIT licensed — the most recent PyPI release at the time of writing is cinderx-2026.1.19.0, targeting CPython 3.14. It remains under active development and is used in production at Meta for the Instagram Django service.

```python
import cinderx
import cinderx.jit

# Enable automatic JIT compilation of hot functions
cinderx.init()
cinderx.jit.enable(auto_jit=True)

# Or compile specific functions explicitly:
def process_request(data):
    # Hot path in the Django request handler
    ...

cinderx.jit.force_compile(process_request)

# Lazy compile: compiles on next call
cinderx.jit.lazy_compile(process_request)
```

Note

CinderX currently supports Linux x86-64 fully. macOS builds are possible but most features are disabled at runtime. Windows is not yet supported. This platform limitation is one reason the JIT was packaged as an extension rather than proposed for inclusion in CPython itself.

Real-World Impact: Numbers from Production

The performance numbers Meta has published span several years and multiple optimization layers, making it worth examining each set of results in context.

| Optimization | Reported Gain | Context / Source |
| --- | --- | --- |
| Cinder JIT (general benchmarks) | 1.5–4x throughput | facebookincubator/cinder README |
| Static Python + Cinder JIT (Richards benchmark) | 18x vs. stock CPython | facebookincubator/cinder README |
| Immortal Objects (production, memory churn) | ~5% CPU win, significant private memory reduction | Meta Engineering Blog, Aug 2023 |
| Immortal Objects (reference count overhead) | ~2% regression (mitigated by register allocation) | Meta Engineering Blog, Aug 2023 |
| Lazy Imports + Cinder (ML workloads) | Up to 40% time-to-first-batch improvement | Meta Engineering Blog, Jan 2024 |
| Lazy Imports + Cinder (Jupyter kernel startup) | 20% reduction in startup time | Meta Engineering Blog, Jan 2024 |

The machine learning numbers deserve particular attention. Meta's January 2024 Engineering Blog post documented the deployment of Lazy Imports alongside the Cinder runtime for AI training workloads. Time to first batch (TTFB) — the time from launching a training job to processing its first batch of data — improved by up to 40 percent. Jupyter kernel startup times dropped by 20 percent. These are significant developer experience improvements in an environment where iteration speed directly affects research productivity.

"The adoption of Lazy Imports and Cinder represented a meaningful enhancement in Meta's AI key workloads. The TTFB wins, DevX improvements, and reduced kernel startup times are all tangible results of this initiative." — Meta Engineering Blog, January 2024

The Lazy Imports approach used in Cinder differs meaningfully from other lazy import libraries. Most such libraries require developers to identify specific imports to defer. Cinder's approach is comprehensive: all imports are deferred by default, with the actual resolution happening transparently the first time an imported name is accessed. This eliminates the cognitive overhead of selective import management and removes the need for TYPE_CHECKING guards for typing-only imports.

Cinder's Influence on Upstream CPython

Cinder's most enduring contribution may not be the runtime itself but the ideas it validated and eventually pushed into CPython. The influence runs through several Python Enhancement Proposals and release notes.

PEP 683 — Immortal Objects — landed in CPython 3.12, contributed directly by Meta. This was not a small patch; it required changes to the reference counting routines, object headers, and the garbage collector, along with careful performance tuning to keep the regression to the 2 percent figure that Meta had already demonstrated was achievable. The PEP establishes a foundation for future work on true object immutability and GIL-free multi-threading.

PEP 709 — Inlined Comprehensions — also landed in Python 3.12 with Meta's contribution. In prior versions, list, dict, and set comprehensions were compiled as nested functions, requiring allocation and destruction of a single-use function object for every comprehension execution. PEP 709 inlines comprehension bodies directly into the containing function's bytecode, eliminating that allocation. Meta reported up to a 2x improvement in comprehension-heavy code, and the implementation process also uncovered a pre-existing bytecode compiler bug that could silently produce wrong results in Python 3.11.
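The effect is visible in the bytecode. Before 3.12, a comprehension compiled to a nested code object created with MAKE_FUNCTION and called once; PEP 709 removes that. A version-dependent check:

```python
import dis
import io

def squares(n):
    return [i * i for i in range(n)]

buf = io.StringIO()
dis.dis(squares, file=buf)

# Python 3.11: the comprehension is a nested code object built with
# MAKE_FUNCTION and invoked once per call. Python 3.12+ (PEP 709):
# the comprehension body is inlined into squares, so MAKE_FUNCTION
# no longer appears in its disassembly.
print("MAKE_FUNCTION" in buf.getvalue())
```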

CPython 3.11 introduced a specializing adaptive interpreter — the idea of replacing generic opcodes with type-specialized variants at runtime — that is conceptually identical to Cinder's shadowcode system. While the CPython implementation was developed independently as part of the "Faster CPython" project, Cinder served as public evidence that the technique delivered real-world gains at production scale, not just on benchmarks.

[Diagram: Cinder fork (shadowcode, immortal objects, eager async, JIT + Static Python) → CPython 3.11 specializing interpreter → CPython 3.12 (PEP 683 immortal objects, PEP 709 inlined comprehensions) → CinderX extension with weekly PyPI releases → CPython 3.14+ extension hooks, with CinderX running on stock CPython, no fork needed]
How Cinder's research fed into CPython releases and ultimately enabled CinderX to run on stock CPython 3.14 without a forked runtime.

The CPython 3.12 contributions also included a vectorcall entry point API — a hook that allows the CinderX JIT to intercept execution of a Python function and hand control to compiled machine code — along with dictionary, type, function, and code object watchers that notify the JIT when its speculative assumptions have been invalidated. These hooks are available to any third-party JIT or optimizer, not just CinderX, making CPython 3.12 meaningfully more extensible for runtime research.

The Cinder repository continues to track CPython's main branch, with the meta/3.14 branch actively maintained. The ongoing work reflects Meta's need to keep running newer Python versions in production at Instagram while the CinderX extension matures toward full feature compatibility on stock CPython, including the parallel garbage collector and lighter-weight frame implementations that are not yet ready for the upstream runtime.

Key Takeaways

  1. Cinder is a production-proven CPython fork, not a research prototype. It has powered Instagram's Django web server at scale for years, validating optimizations under real workload pressure that benchmarks alone cannot replicate.
  2. The evolution from Cinder to CinderX reflects a deliberate open-source strategy. Rather than maintaining an ever-diverging fork, Meta extracted its most distinctive optimizations into a standalone Python extension, reducing maintenance cost while enabling the broader community to experiment with the JIT and Static Python on stock CPython.
  3. Immortal Objects and Immortal Instances solved a fork-model memory problem unique to Instagram's architecture. The solution — opting objects out of reference counting entirely — is now part of CPython 3.12 via PEP 683 and represents a foundational building block for future GIL-free parallelism in Python.
  4. Lazy Imports produced dramatic gains in ML workflows. The up-to-40-percent reduction in time to first batch and 20-percent reduction in Jupyter kernel startup time demonstrate that import-time overhead is a real and addressable bottleneck in large Python codebases, particularly in AI and data science contexts.
  5. CinderX runs on stock CPython 3.14 without a forked runtime. Python 3.14 is the first release where CinderX requires no patches to the CPython source, completing a multi-year effort to bring the necessary extension hooks into the upstream interpreter.

Cinder's story is ultimately about what happens when a company with Instagram's scale takes Python's performance seriously enough to fork the runtime and prove out ideas in production. The result is not just a faster Instagram — it is a faster Python. PEP 683, PEP 709, the specializing interpreter in 3.11, the vectorcall and watcher APIs in 3.12: these are Cinder's legacy in the language itself, available to every Python developer whether they ever touch CinderX or not. The runtime performance improvements that were once gated behind a private fork at Meta are progressively becoming the standard. That was always the stated goal, and it is measurably happening.

Frequently Asked Questions

What is Meta's Cinder and why was it built?

Cinder is a performance-oriented fork of CPython developed at Meta, open sourced in April 2021 from a CPython 3.8.5 base. It was built to address performance bottlenecks specific to Instagram's architecture: a pre-fork web server model that spawns many worker processes under Linux's copy-on-write memory model, combined with the overhead of CPython's generic bytecode interpreter and reference counting system at massive scale. The goal from the outset was to validate optimizations in production and push proven ideas back into upstream CPython — not to build a rival runtime for general use.

What is the difference between Cinder and CinderX?

Cinder is a full fork of the CPython runtime. CinderX is a standalone Python extension module that packages Cinder's JIT compiler and Static Python as a pip-installable library. The shift to CinderX began at Python 3.10 to reduce the maintenance cost of rebasing patches on every CPython release. Python 3.14, released October 7, 2025, is the first version of stock CPython that CinderX supports without requiring any patches to a forked runtime.

What performance gains has Meta documented?

Published figures include: 1.5–4x throughput improvement from the Cinder JIT on Python benchmarks; 18x throughput over stock CPython for a typed Static Python function on the Richards benchmark; approximately 5 percent CPU win from Immortal Objects in production at Instagram; up to 40 percent reduction in time-to-first-batch for ML training workloads using Lazy Imports and Cinder; and 20 percent reduction in Jupyter kernel startup times. All figures are drawn from Meta's Engineering Blog posts and the facebookincubator/cinder GitHub README.

Can CinderX be installed on stock CPython today?

Yes. Python 3.14 is the first stock CPython release fully supported by CinderX without requiring a patched fork. Install via pip install cinderx on Linux x86-64. macOS builds are possible but most features are disabled at runtime. Windows is not yet supported. Releases are published to PyPI weekly under the MIT license.

How did Cinder influence upstream CPython?

Several CPython features trace directly to Cinder research. PEP 683 (Immortal Objects) and PEP 709 (Inlined Comprehensions) both landed in CPython 3.12 as Meta contributions. The specializing adaptive interpreter in CPython 3.11 is conceptually identical to Cinder's shadowcode system. CPython 3.12 also received vectorcall entry point APIs, dictionary and type watchers, and a thread-safe perf-map API — hooks contributed by Meta to enable CinderX to run on stock CPython. Eager asyncio tasks also landed in CPython 3.12 as a more conservative version of Cinder's eager async execution feature.

Sources: facebookincubator/cinder (GitHub) · facebookincubator/cinderx (GitHub) · How the Cinder JIT's function inliner helps us optimize Instagram, Meta Engineering Blog (May 2022) · Introducing Immortal Objects for Python, Meta Engineering Blog (Aug 2023) · Meta contributes new features to Python 3.12, Meta Engineering Blog (Oct 2023) · Lazy is the new fast: How Lazy Imports and Cinder accelerate machine learning at Meta, Meta Engineering Blog (Jan 2024) · cinderx on PyPI · Talk Python to Me, Episode 347: Cinder — Specialized Python That Flies