Python is the dominant language for data science, genomics, and quantitative finance. It is also notoriously slow. Codon, a compiler born inside MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL), takes Python code as input and produces native machine binaries that run at the speed of C or C++. In January 2025, it went fully open source under the Apache 2.0 license and shipped a native NumPy reimplementation — removing the two largest barriers between high-performance compiled Python and anyone who needs it. Codon does not attempt to make every Python program faster. It is a precision instrument for compute-bound workloads, and in 2025 it became a significantly more accessible one.
Codon was first presented publicly in a paper at the 32nd ACM SIGPLAN International Conference on Compiler Construction, held February 25–26, 2023 in Montréal, Canada (DOI: 10.1145/3578360.3580275). Its lead author is Ariya Shajii (SM '18, PhD '21 MIT CSAIL, now CEO of Exaloop), alongside MIT professor and CSAIL principal investigator Saman Amarasinghe, Gabriel Ramirez (former CSAIL student, then at Jump Trading), Jessica Ray (MIT Lincoln Laboratory), Bonnie Berger (MIT professor of mathematics and CSAIL PI), Haris Smajlović (University of Victoria), and Ibrahim Numanagić (University of Victoria assistant professor and Canada Research Chair). The project evolved directly from Seq — a domain-specific language first published in Nature Biotechnology in 2021 and designed originally for high-performance genomic computation — before expanding into a general-purpose Python compiler. Understanding Codon requires understanding precisely what makes Python slow in the first place.
Why Python Is Slow and What That Costs
CPython, the reference implementation of Python, does not compile source code into machine instructions. It compiles source into bytecode and then executes that bytecode line by line inside a virtual machine. At every step, the interpreter must determine the type of each variable — is this an integer or a string, a list or a dictionary — because Python permits types to change at runtime. That type interrogation, carried out millions of times per second, generates substantial overhead. It is one of the core reasons a Python loop runs orders of magnitude slower than an equivalent loop in C.
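The bytecode layer is easy to inspect from CPython itself. The sketch below uses the standard `dis` module to print the instructions CPython executes for a one-line function; none of these instructions know their operand types in advance, which is exactly the per-operation dispatch overhead Codon compiles away. The function name is illustrative.

```python
import dis

def add(a, b):
    # CPython cannot assume a and b are ints: the binary-add
    # instruction dispatches on their runtime types every time it runs.
    return a + b

# Prints the function's bytecode: loads of a and b, a type-generic
# binary-add instruction, and a return (exact opcode names vary by
# Python version).
dis.dis(add)
```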
A second structural constraint is the Global Interpreter Lock, or GIL. The GIL is a mutex inside CPython that prevents more than one thread from executing Python bytecode simultaneously. It exists to protect CPython's internal memory management from race conditions, but the practical consequence is that pure Python programs cannot use multiple CPU cores in parallel. For a language that has become the default tool in data science and scientific computing — fields where datasets routinely reach into gigabytes or terabytes — this is a significant limitation.
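The GIL's effect is observable in a few lines of standard CPython, with no Codon involved: run the same CPU-bound function twice sequentially and then on two threads. Because only one thread can execute bytecode at a time, the threaded version typically shows little or no wall-clock improvement. The function name and iteration count below are illustrative.

```python
import threading
import time

def count(n):
    # CPU-bound loop; its bytecode execution is serialized by the GIL
    total = 0
    for i in range(n):
        total += i
    return total

N = 2_000_000

# Sequential: two runs back to back
t0 = time.perf_counter()
count(N)
count(N)
seq = time.perf_counter() - t0

# Threaded: two threads, but the GIL prevents true parallelism,
# so wall-clock time is typically no better than the sequential run
t0 = time.perf_counter()
threads = [threading.Thread(target=count, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
par = time.perf_counter() - t0

print(f"sequential: {seq:.2f}s  threaded: {par:.2f}s")
```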
"Python is the language of choice for domain experts that are not programming experts. If they write a program that gets popular, and many people start using it and run larger and larger datasets, then the lack of performance of Python becomes a critical barrier to success." — Saman Amarasinghe, MIT professor and CSAIL principal investigator. Source: MIT News, March 2023
Amarasinghe's framing clarifies the design motivation. The problem is not that Python is the wrong tool for the job. The problem is that researchers, analysts, and domain scientists — the people who rely on Python — should not have to abandon a language they know and rewrite programs in C++ the moment those programs scale up. Codon is designed to close that gap without requiring anyone to switch languages.
How Codon Works: AOT Compilation and the LLVM Pipeline
Codon takes a fundamentally different approach from CPython. Rather than interpreting bytecode at runtime, it performs ahead-of-time (AOT) compilation: it reads Python source, analyzes it statically, and translates it into native machine code before the program runs. The resulting binary executes directly on the CPU with no interpreter layer in between.
The enabling mechanism is static type checking. As Shajii explained to IEEE Spectrum, with CPython, bytecode runs inside a virtual machine, whereas with Codon the end result runs directly on the CPU — no intermediate virtual machine or interpreter. To make that possible, Codon performs type inference across the entire program at compile time, assigning a fixed type to every variable and function argument. Once types are known, the compiler eliminates the metadata that Python normally carries with every object — metadata needed to answer type questions at runtime. MIT professor Amarasinghe described the effect: "if you have a dynamic language, every time you have some data, you need to keep a lot of additional metadata around it. Codon does away with this metadata, so the code is faster and data is much smaller." Source: IEEE Spectrum, 2023.
After type checking, Codon translates the typed program into its own intermediate representation (IR), applies a suite of optimization passes, then hands the result to LLVM — the same compiler infrastructure used by Clang, Rust, and Swift — which produces the final native binary. LLVM's backend applies further low-level optimizations including auto-vectorization, which allows loops to use SIMD (single instruction, multiple data) CPU instructions that process multiple data elements in a single clock cycle.
Figure: Codon's five-stage compilation pipeline. Each stage eliminates a layer of Python's runtime overhead.
Codon also eliminates the GIL entirely. It supports native multithreading via OpenMP, which means a loop annotated with the @par decorator will distribute its iterations across as many CPU cores as the programmer specifies. The compiler automatically converts operations like total += 1 inside a parallel loop into atomic reductions to prevent race conditions, with no manual synchronization required. Beyond CPU parallelism, Codon supports writing and executing GPU kernels, making it applicable to workloads that traditionally required CUDA expertise or low-level GPU programming knowledge.
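In Codon source, the parallelism described above is a one-line annotation. The sketch below follows the `@par` syntax shown in Codon's documentation (including its `num_threads` parameter); it compiles under Codon, not under standard CPython, and the loop body is illustrative:

```python
total = 0

@par(num_threads=8)  # distribute iterations across 8 threads via OpenMP
for i in range(10_000_000):
    total += 1  # Codon converts this into an atomic reduction automatically

print(total)
```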
One design feature that receives less attention than raw speed is Codon's plug-in architecture. The compiler is extensible: developers can write domain-specific plugins that contribute new libraries, new IR optimization passes, and new compilation targets. The original Seq paper demonstrated this for genomics; the 2023 ACM paper extended the demonstration to secure multi-party computation, block-based data compression, and parallel programming. This means Codon is not just a Python speed layer — it is a platform for building high-performance domain-specific languages that share Python's syntax. For researchers building quantitative finance tools or genomics pipelines, this is architecturally distinct from anything offered by PyPy, Numba, or Cython. Source: ACM SIGPLAN CC '23 paper, DOI 10.1145/3578360.3580275.
A concrete real-world example: a community contributor adapted llama2.py — Andrej Karpathy's pure Python implementation of a Llama 2 inference engine — to compile with Codon. The result was a 74x speedup over the standard CPython version, without rewriting the core logic in C or Rust. Source: Exaloop.
Always include the -release flag when building with Codon. Without it, Codon links the standard CPython NumPy library rather than Codon's native NumPy implementation. A USENIX reviewer independently confirmed a 115x speedup on a NumPy loop benchmark with -release enabled, versus roughly 2x without it. The compiled binary is also substantially smaller. Source: USENIX, March 2025.
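For reference, the flag applies to both of Codon's main subcommands. The commands below follow the usage shown in Codon's README; the file name is illustrative, and exact flags should be checked against the installed version:

```shell
codon run -release fib.py         # compile and run with full optimizations
codon build -release -exe fib.py  # emit a standalone native executable
./fib
```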
Benchmark Results: What the Numbers Show
The headline performance claim for Codon — stated in its GitHub repository and FAQ — is 10 to 100 times faster than CPython on a single thread, with performance typically on par with C or C++. The canonical demonstration is a recursive Fibonacci computation. Running fib(40) with CPython completes in approximately 17.98 seconds. Running the identical source file with codon run -release fib.py produces the same result in 0.276 seconds. The source code requires no modifications. Source: Codon GitHub repository.
```python
from time import time

def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

t0 = time()
ans = fib(40)
t1 = time()
print(f'Computed fib(40) = {ans} in {t1 - t0} seconds.')

# CPython: ~17.98 seconds
# Codon:   ~0.276 seconds (codon run -release fib.py)
```
A March 2025 independent review published by USENIX, authored by security and systems consultant Rik Farrow, confirmed real-world speedups when using Codon with NumPy. Testing a nested-loop array initialization benchmark that Exaloop claimed produced a 300x speedup, Farrow initially ran without the -release flag and observed only a 2x improvement. After Exaloop CEO Ariya Shajii identified the missing flag, Farrow re-ran with -release and confirmed a 115x speedup, consistent with Exaloop's published benchmark range for that workload category. Source: USENIX, March 2025.
A broader academic study published in March 2025 benchmarked Codon against PyPy, Numba, Nuitka, Mypyc, Cython, Pyston-lite, and the experimental Python 3.13 build, comparing all of them against CPython across seven workloads on two hardware configurations. The findings showed that Codon, PyPy, and Numba each achieved over 90% improvement in both execution time and energy consumption relative to CPython. Nuitka optimized memory usage most consistently across hardware. The study noted that Codon's impact on last-level cache miss rates varied considerably across benchmarks, indicating that memory access patterns in the specific workload significantly influence the performance profile. Source: Codon academic paper, ResearchGate, 2025.
Codon in 2025: Native NumPy and the Apache License
January 2025 brought two substantial changes. First, Exaloop moved Codon from the Business Source License — which prohibited commercial use of recent versions — to the Apache License 2.0. The Apache 2.0 license permits commercial use, modification, and redistribution without restriction. Exaloop continues to offer enterprise support packages for organizations that require them, but the compiler itself is now free for any use. As USENIX reviewer Rik Farrow summarized, commercial use and derivations of Codon are now permitted without licensing. Source: USENIX, March 2025.
The second change was the release of Codon-NumPy: a full reimplementation of the NumPy library written in Codon itself. This is architecturally distinct from simply calling standard NumPy through Codon's Python interoperability layer. Standard NumPy is implemented as opaque C extensions — code that Codon's optimizer cannot inspect. When NumPy is called via interoperability, Codon treats it as a black box and cannot apply its optimization passes to NumPy's internal computations. Codon-NumPy changes this by making the full NumPy implementation visible to the compiler.
"NumPy support has been the biggest barrier for many folks looking to use Codon." — Exaloop blog, January 2025
The practical consequences include operator fusion. When a program contains an expression like ((x-1)**2 + (y-1)**2 < 1).sum(), standard NumPy evaluates each sub-expression independently, allocating a temporary array for each intermediate result. Codon's optimizer can analyze the expression structure, determine that all intermediate arrays are temporary, and collapse the entire computation into a single pass over the input data — eliminating the intermediate allocations. On a pi-approximation benchmark using 500 million random points, this technique produces roughly a 5x speedup over standard NumPy: standard execution takes approximately 2.25 seconds and Codon-NumPy completes in 0.43 seconds on Apple Silicon hardware. Source: Codon GitHub repository; Exaloop, January 2025.
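The benchmark is straightforward to reproduce under standard NumPy, and the same source runs unchanged under Codon-NumPy. The sketch below uses one million points rather than the published 500 million so that it finishes quickly; the seed and variable names are illustrative.

```python
import time
import numpy as np

# Monte Carlo pi estimate: fraction of random points falling inside
# the circle of radius 1 centered at (1, 1) within the square [0,2]^2.
rng = np.random.default_rng(0)
n = 1_000_000  # the published benchmark uses 500 million points
x = rng.random(n) * 2
y = rng.random(n) * 2

t0 = time.perf_counter()
# Under standard NumPy, each sub-expression below allocates a temporary
# array; Codon-NumPy can fuse the whole expression into a single pass.
inside = ((x - 1) ** 2 + (y - 1) ** 2 < 1).sum()
pi = 4 * inside / n
print(f"pi ~= {pi:.4f} in {time.perf_counter() - t0:.3f}s")
```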
For nested loops — the worst-case performance scenario in CPython — Codon-NumPy demonstrates the largest gains. A triple-nested array initialization across a 300x300x300 array takes 3.5 seconds under standard Python and approximately 0.01 seconds under Codon: a roughly 300x speedup, because the compiled native loops carry no interpreter overhead. Benchmarked against the full NPBench suite, Codon-NumPy achieves a geometric mean speedup of 2.4x over standard Python plus NumPy in single-threaded mode, with a maximum speedup exceeding 900x on the best-case workloads. Enabling Codon's multithreading and GPU features pushes performance further beyond those figures. Source: Exaloop, January 2025.
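The nested-loop benchmark is likewise easy to reproduce, and again the same source compiles unchanged under Codon. The sketch below uses a 100x100x100 array rather than 300x300x300 so the interpreted run stays short; the size and fill expression are illustrative.

```python
import time
import numpy as np

n = 100  # the published benchmark uses 300
a = np.empty((n, n, n))

t0 = time.perf_counter()
# Triple-nested Python loop: worst case for the interpreter,
# best case for Codon's compiled native loops
for i in range(n):
    for j in range(n):
        for k in range(n):
            a[i, j, k] = i + j + k
print(f"filled {a.size} elements in {time.perf_counter() - t0:.2f}s")
```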
Codon vs. Other Python Speed-Up Tools
Several tools exist for accelerating Python, and Codon occupies a distinct architectural position among them. The table below reflects their differences as documented in the Codon FAQ, the USENIX reviews, and the ACM SIGPLAN paper.
| Tool | Compilation Model | GIL Removed | CPython Compatibility |
|---|---|---|---|
| CPython | Bytecode interpreter | No | Reference implementation |
| PyPy | Just-in-time (JIT), tracing | No | High — most code runs unchanged |
| Numba | JIT (selective, via decorator) | Partial (nopython mode) | Moderate — decorated functions only |
| Cython | Transpiles to C extensions | Partial (with annotations) | Moderate — requires .pyx syntax |
| Codon | Ahead-of-time (AOT) to native binary | Yes | Partial — static subset of Python |
The distinction between Codon and PyPy is worth examining specifically. PyPy uses tracing JIT compilation: it observes a program as it runs, identifies the execution paths that are called frequently, and compiles those paths to machine code on the fly. PyPy works within the full dynamic Python runtime and is a genuine drop-in replacement for CPython in the vast majority of programs. Codon compiles the entire program statically before it runs. As the Codon FAQ states, Codon's compilation process is closer to C++ than to Julia, and substantially different from PyPy. The consequence is that Codon can produce faster binaries in compute-intensive scenarios, but cannot run programs that depend on Python's dynamic runtime behaviors. Source: Codon FAQ, Exaloop.
It is not necessary to compile an entire program with Codon to benefit from it. Codon provides a @jit decorator that marks individual functions for native compilation within an otherwise standard CPython program. This allows incremental adoption: identify the bottleneck function, annotate it, and get compiled performance for that specific operation without migrating the entire codebase. Source: Codon FAQ, Exaloop.
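Assuming the `codon` Python package is installed alongside the compiler, incremental adoption looks roughly like the sketch below, following the `@jit` usage in Codon's documentation. The function is an illustrative hot spot, and the snippet requires a working Codon installation to run:

```python
from codon import jit

@jit
def mean_of_squares(xs):
    # Compiled to native code by Codon; the rest of the program
    # continues to run under standard CPython.
    total = 0.0
    for x in xs:
        total += x * x
    return total / len(xs)

print(mean_of_squares([1.0, 2.0, 3.0]))
```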
Platform Support, Limitations, and When Not to Use Codon
Codon runs on Linux and macOS, including Apple Silicon. It does not currently run on Windows. As USENIX reviewer Rik Farrow confirmed in his March 2025 review, Windows support remains absent from the current release. This is a practical constraint worth knowing before committing to Codon in a mixed-OS development environment. Source: USENIX, March 2025.
Codon's documentation is explicit: it is not a drop-in replacement for CPython. The compiler enforces static typing, which means it cannot handle programs where variable types change at runtime, or programs that use Python's runtime reflection capabilities — features like getattr, setattr, or dynamic class modification that appear in metaprogramming, some web frameworks, and object-relational mappers.
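A few concrete patterns that fall outside Codon's static subset, all perfectly legal in CPython (names are illustrative):

```python
# A variable whose type changes at runtime: a static type checker
# cannot assign x a single fixed type.
x = 1
x = "now a string"

class Config:
    pass

c = Config()
setattr(c, "debug", True)           # attribute created dynamically
assert getattr(c, "debug") is True  # runtime reflection

Config.verbose = False              # class modified after definition
```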
Not every Python standard library module has been reimplemented in Codon. The compiler supports calling any Python package through its interoperability layer — you can import matplotlib, scikit-learn, or any other library using from python import — but doing so bypasses Codon's native compilation for that library and leaves performance on the table. The full performance benefit comes when both the program logic and its numeric libraries are compiled natively. For codebases where full migration is impractical, Codon offers two incremental paths: the @jit decorator compiles individual functions within an otherwise standard CPython program, and the pyext build mode compiles Codon code into Python extension modules that can be imported directly from CPython — similar to Cython, but without requiring the .pyx syntax. Source: Codon Changelog, Exaloop.
"Python has been battle-tested by numerous people, and Codon hasn't reached anything like that yet. It needs to run a lot more programs, get a lot more feedback, and harden up more." — Saman Amarasinghe, MIT professor and CSAIL principal investigator. Source: IEEE Spectrum, 2023
That candid assessment from Codon's own academic supervisor is worth taking seriously. The USENIX March 2025 review confirmed that for workloads dominated by simple data aggregation — filling a dictionary, summing values, processing log files — Codon provided no meaningful speedup over CPython. Farrow's own log-file summarization script, which fills an associative array and prints sorted totals, ran in essentially the same time under Codon as under CPython. Performance gains are concentrated in compute-bound workloads: numerical loops, large array operations, signal processing, and genomic sequence analysis. Programs that spend the majority of their time on I/O or dictionary operations are unlikely to benefit from Codon compilation. Source: USENIX, March 2025.
There is also a practical integer semantics difference to account for. CPython uses arbitrary-precision integers by default: Python integers can grow as large as available memory permits. Codon uses 64-bit integers, matching C and C++ behavior. For the vast majority of scientific and data-processing programs this makes no difference, but code that depends on integers larger than 2^63 - 1, such as cryptographic implementations or very large factorial computations, will behave differently under Codon and requires adjustment before compiling. Source: Codon FAQ, Exaloop.
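The difference is easy to see from the CPython side. The helper below is an illustrative function, not part of Codon: it reduces an arbitrary-precision integer to a signed 64-bit two's-complement view so the two behaviors can be compared side by side.

```python
def to_i64(x):
    # Interpret x as a signed 64-bit two's-complement integer
    x &= (1 << 64) - 1
    return x - (1 << 64) if x >= (1 << 63) else x

print(2 ** 63)          # CPython: 9223372036854775808, computed exactly
print(to_i64(2 ** 63))  # 64-bit view: -9223372036854775808 (wraps negative)
```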
Key Takeaways
- AOT compilation removes Python's fundamental overhead: By compiling to native machine code before runtime, Codon eliminates the bytecode interpreter, per-variable type interrogation, and all associated metadata that CPython carries. The resulting binary runs directly on the CPU with zero interpreter overhead.
- The GIL is gone: Codon's native multithreading support via OpenMP allows Python programs to use all available CPU cores in true parallel. The @par decorator distributes loop iterations across threads, and atomic reductions handle synchronization automatically.
- Codon-NumPy is the major 2025 addition: The fully compiled NumPy reimplementation, released in January 2025, enables operator fusion, LLVM-level auto-vectorization, and GPU offloading — capabilities unavailable when calling standard NumPy through Python interoperability mode.
- Codon is now fully open source under Apache 2.0: The January 2025 license change means commercial use is permitted without a licensing agreement. Organizations can adopt Codon in production pipelines freely, with enterprise support packages available separately from Exaloop.
- Linux and macOS only — no Windows: As of early 2025, Codon does not run on Windows. This is a real constraint for mixed-OS teams and should be verified before committing to Codon in any environment that includes Windows machines.
- Codon is not a universal CPython replacement: It targets compute-bound workloads in numerical computing, genomics, quantitative finance, and GPU programming. Programs that depend on dynamic typing, runtime reflection, or arbitrary-precision integers require modification or are not suited to Codon compilation at all.
Codon occupies a well-defined and genuinely useful position in the Python ecosystem. It is not trying to make every Python program faster. It is giving domain experts who write compute-intensive Python a verifiable path to C-speed execution without abandoning the language they know. The 2025 releases — particularly Codon-NumPy and the Apache license transition — have made that path considerably more accessible. For anyone running Python programs that loop over large arrays, process genomic sequences, or perform quantitative simulations, Codon is now a serious option that warrants testing against their specific workloads. The Windows limitation and the static typing constraint are real boundaries, but they are clearly documented ones. Within those boundaries, the performance results are real and independently verified.
Sources: Codon GitHub, Exaloop · Exaloop blog, January 2025 · MIT News, March 2023 · IEEE Spectrum, May 2023 · USENIX, March 2025 · USENIX, April 2023 · Codon FAQ, Exaloop · Codon Changelog, Exaloop · ACM SIGPLAN CC '23 paper, DOI 10.1145/3578360.3580275 · Seq paper, Nature Biotechnology, 2021