Python is the dominant language for data science, genomics, and quantitative finance. It is also notoriously slow. Codon, a compiler born inside MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL), takes Python code as input and produces native machine binaries that run at the speed of C or C++. In January 2025, it went fully open source under the Apache 2.0 license and shipped a native NumPy reimplementation — removing the two largest barriers between high-performance compiled Python and anyone who needs it. A subsequent release, v0.19, upgraded the LLVM backend to version 20 and introduced a new type-checking engine that handles Python patterns the previous checker could not compile. Codon does not attempt to make every Python program faster. It is a precision instrument for compute-bound workloads, and in 2025 it became a significantly more accessible one.
Codon was first presented publicly in a paper at the 32nd ACM SIGPLAN International Conference on Compiler Construction, held February 25–26, 2023 in Montréal, Canada (DOI: 10.1145/3578360.3580275). Its lead author is Ariya Shajii (SM '18, PhD '21 MIT CSAIL, now CEO of Exaloop), alongside MIT professor and CSAIL principal investigator Saman Amarasinghe, Gabriel Ramirez (former CSAIL student, then at Jump Trading), Jessica Ray (MIT Lincoln Laboratory), Bonnie Berger (MIT professor of mathematics and CSAIL PI), Haris Smajlović (University of Victoria), and Ibrahim Numanagić (University of Victoria assistant professor and Canada Research Chair). The project evolved directly from Seq — a domain-specific language first published in the proceedings of OOPSLA 2019 (DOI: 10.1145/3360551) and later described in a shorter communication in Nature Biotechnology in September 2021 (DOI: 10.1038/s41587-021-00985-6) — designed originally for high-performance genomic computation before expanding into a general-purpose Python compiler. Understanding Codon requires understanding precisely what makes Python slow in the first place.
Getting Started: Installation and Basic Usage
Codon provides a one-line installer for Linux and macOS. Open a terminal and run the following command, which downloads and executes the official install script from Exaloop:
/bin/bash -c "$(curl -fsSL https://exaloop.io/install.sh)"
After the installer completes, the codon command becomes available in your terminal. The compiler supports three primary modes, each suited to a different use case. Source: Codon GitHub repository.
# Compile and run without optimizations (development mode)
codon run script.py
# Compile and run with full optimizations — always use this for benchmarking
codon run -release script.py
# Compile to a standalone executable with optimizations
codon build -release -exe script.py
./script
# Compile to LLVM IR (outputs script.ll — useful for inspecting what the backend sees)
codon build -release -llvm script.py
The distinction between codon run and codon build -exe matters in practice. codon run compiles and immediately executes the program, discarding the binary afterward. codon build -exe produces a persistent standalone executable that can be distributed, run on other compatible machines without a Codon installation, and deployed in production pipelines. Compiled binaries do not expose the Python source — a practical advantage for commercial teams distributing tools without revealing implementation logic. The USENIX 2023 reviewer noted this as a secondary benefit worth considering alongside the performance gains. Source: USENIX, April 2023.
A common benchmarking mistake is worth illustrating. Suppose a developer benchmarks the Fibonacci program from Codon's documentation and sees only about a 2x speedup over CPython, nowhere near the expected ~65x. The culprit is a single missing flag in the command used to run it:
codon run fib.py
Without -release, Codon skips its full optimization passes and links against CPython-backed libraries instead of its native compiled ones. The fix is one word: codon run -release fib.py. The same mistake surfaced in the USENIX review of Codon's NumPy benchmarks, where a workload that showed roughly 2x improvement without the flag confirmed a 115x speedup once the reviewer added it.
Before targeting any program with Codon, profile it under CPython to confirm the bottleneck is compute-bound rather than I/O-bound. The standard library's cProfile module is sufficient for identifying which functions consume the majority of execution time. Run python3 -m cProfile -s cumulative script.py and examine the top entries by cumulative time. If the bottleneck functions involve numerical loops, array operations, or recursive computation, Codon will help. If they involve network requests, file reads, or dictionary construction, Codon will not produce a meaningful speedup — profiling first avoids that wasted effort. Source: Codon FAQ, Exaloop.
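The profiling step above can also be done programmatically, which is convenient inside notebooks or test harnesses. A minimal sketch using only the standard library, with a hypothetical `hot` function standing in for the real bottleneck:

```python
import cProfile
import io
import pstats

def hot(n):
    # Compute-bound work: the kind of function Codon accelerates
    return sum(i * i for i in range(n))

profiler = cProfile.Profile()
profiler.enable()
hot(200_000)
profiler.disable()

# Print the top 5 entries by cumulative time, mirroring
# python3 -m cProfile -s cumulative script.py
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
print(out.getvalue())
```

If `hot` dominates the cumulative-time column, it is a compute-bound candidate for Codon; if I/O calls dominate, compilation will not help.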
Why Python Is Slow and What That Costs
CPython, the reference implementation of Python, does not compile source code into machine instructions. It compiles source into bytecode and then executes that bytecode line by line inside a virtual machine. At every step, the interpreter must determine the type of each variable — is this an integer or a string, a list or a dictionary — because Python permits types to change at runtime. That type interrogation, carried out millions of times per second, generates substantial overhead. It is one of the core reasons a Python loop runs orders of magnitude slower than an equivalent loop in C.
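The bytecode layer is easy to see for yourself with the standard library's dis module. Even a two-line function compiles to several instructions that the interpreter dispatches one at a time, checking operand types on every call:

```python
import dis

def add(a, b):
    return a + b

# Show the bytecode CPython interprets for this function.
# The addition appears as a generic instruction (BINARY_OP on
# Python 3.11+, BINARY_ADD on older versions) whose operand types
# are resolved at runtime, on every single call.
dis.dis(add)
```

That runtime type resolution is exactly what Codon's ahead-of-time type inference eliminates.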
A second structural constraint is the Global Interpreter Lock, or GIL. The GIL is a mutex inside CPython that prevents more than one thread from executing Python bytecode simultaneously. It exists to protect CPython's internal memory management from race conditions, but the practical consequence is that pure Python programs cannot use multiple CPU cores in parallel. For a language that has become the default tool in data science and scientific computing — fields where datasets routinely reach into gigabytes or terabytes — this is a significant limitation.
"The lack of performance of Python becomes a critical barrier to success." — Saman Amarasinghe, MIT professor and CSAIL principal investigator, on what happens when domain scientists scale their programs. Source: MIT News, March 2023
Amarasinghe's framing clarifies the design motivation. The problem is not that Python is the wrong tool for the job. The problem is that researchers, analysts, and domain scientists — the people who rely on Python — should not have to abandon a language they know and rewrite programs in C++ the moment those programs scale up. Codon is designed to close that gap without requiring anyone to switch languages.
How Codon Works: AOT Compilation and the LLVM Pipeline
Codon takes a fundamentally different approach from CPython. Rather than interpreting bytecode at runtime, it performs ahead-of-time (AOT) compilation: it reads Python source, analyzes it statically, and translates it into native machine code before the program runs. The resulting binary executes directly on the CPU with no interpreter layer in between.
The enabling mechanism is static type checking. As Shajii explained to IEEE Spectrum, with CPython, bytecode runs inside a virtual machine, whereas with Codon the end result runs directly on the CPU — no intermediate virtual machine or interpreter. To make that possible, Codon performs type inference across the entire program at compile time, assigning a fixed type to every variable and function argument. Once types are known, the compiler eliminates the metadata that Python normally carries with every object — metadata needed to answer type questions at runtime. MIT professor Amarasinghe described the effect: "if you have a dynamic language, every time you have some data, you need to keep a lot of additional metadata around it. Codon does away with this metadata, so the code is faster and data is much smaller." Source: IEEE Spectrum, 2023.
After type checking, Codon translates the typed program into its own intermediate representation (IR), applies a suite of optimization passes, then hands the result to LLVM — the same compiler infrastructure used by Clang, Rust, and Swift — which produces the final native binary. LLVM's backend applies further low-level optimizations including auto-vectorization, which allows loops to use SIMD (single instruction, multiple data) CPU instructions that process multiple data elements in a single clock cycle.
Codon's compilation pipeline: Python source enters, static type checking produces a typed AST, the Codon IR stage applies optimization passes, LLVM translates IR to native instructions, and the output binary carries no runtime overhead.
Codon also eliminates the GIL entirely. It supports native multithreading via OpenMP, which means a loop annotated with the @par decorator will distribute its iterations across as many CPU cores as the programmer specifies. The compiler automatically converts operations like total += 1 inside a parallel loop into atomic reductions to prevent race conditions, with no manual synchronization required. Beyond CPU parallelism, Codon supports writing and executing GPU kernels, making it applicable to workloads that traditionally required CUDA expertise or low-level GPU programming knowledge.
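To make the atomic-reduction point concrete, here is a CPython sketch of the synchronization that @par automates. The lock-guarded update below is the step Codon converts into an atomic reduction; under CPython the GIL prevents a real parallel speedup, so this only illustrates the pattern, not the performance:

```python
from threading import Thread, Lock

total = 0
lock = Lock()

def count_multiples(lo, hi):
    # Each thread counts multiples of 3 in its own slice...
    global total
    local = sum(1 for i in range(lo, hi) if i % 3 == 0)
    # ...then merges into the shared total. This guarded update is
    # what Codon turns into an atomic reduction automatically.
    with lock:
        total += local

threads = [Thread(target=count_multiples, args=(i * 1000, (i + 1) * 1000))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(total)  # multiples of 3 in range(4000): 1334
```

In Codon, none of this machinery is written by hand: the loop is annotated with @par and the compiler inserts the equivalent synchronization.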
One design feature that receives less attention than raw speed is Codon's plug-in architecture. The compiler is extensible: developers can write domain-specific plugins that contribute new libraries, new IR optimization passes, and new compilation targets. The original Seq paper demonstrated this for genomics; the 2023 ACM paper extended the demonstration to secure multi-party computation, block-based data compression, and parallel programming. This means Codon is not just a Python speed layer — it is a platform for building high-performance domain-specific languages that share Python's syntax. For researchers building quantitative finance tools or genomics pipelines, this is architecturally distinct from anything offered by PyPy, Numba, or Cython. Source: ACM SIGPLAN CC '23 paper, DOI 10.1145/3578360.3580275.
A concrete real-world example: a community contributor adapted llama2.py — Andrej Karpathy's pure Python implementation of a Llama 2 inference engine — to compile with Codon. The result was a 74x speedup over the standard CPython version, without rewriting the core logic in C or Rust. Source: Exaloop.
Always include the -release flag when building with Codon. Without it, Codon links the standard CPython NumPy library rather than Codon's native NumPy implementation. A USENIX reviewer independently confirmed a 115x speedup on a NumPy loop benchmark with -release enabled, versus roughly 2x without it. The compiled binary is also substantially smaller. Source: USENIX, March 2025.
Benchmark Results: What the Numbers Show
The headline performance claim for Codon — stated in its GitHub repository and FAQ — is 10 to 100 times faster than CPython on a single thread, with performance typically on par with C or C++. The canonical demonstration is a recursive Fibonacci computation. Running fib(40) with CPython completes in approximately 17.98 seconds. Running the identical source file with codon run -release fib.py produces the same result in 0.276 seconds. The source code requires no modifications. Source: Codon GitHub repository.
from time import time

def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

t0 = time()
ans = fib(40)
t1 = time()
print(f'Computed fib(40) = {ans} in {t1 - t0} seconds.')

# CPython: ~17.98 seconds
# Codon:   ~0.276 seconds (codon run -release fib.py)
A March 2025 independent review published by USENIX, authored by security and systems consultant Rik Farrow, confirmed real-world speedups when using Codon with NumPy. Testing a nested-loop array initialization benchmark that Exaloop claimed produced a 300x speedup, Farrow initially ran without the -release flag and observed only a 2x improvement. After Exaloop CEO Ariya Shajii identified the missing flag, Farrow re-ran with -release and confirmed a 115x speedup, consistent with Exaloop's published benchmark range for that workload category. Source: USENIX, March 2025.
A broader comparison study benchmarked Codon against PyPy, Numba, Nuitka, Mypyc, Cython, Pyston-lite, and the experimental Python 3.13 JIT build, comparing all of them against CPython across seven workloads on two hardware configurations (an Intel NUC and a server). The findings showed that Codon, PyPy, and Numba each achieved over 90% improvement in both execution time and energy consumption relative to CPython. Nuitka optimized memory usage most consistently across hardware. The study noted that Codon's impact on last-level cache miss rates varied considerably across benchmarks, indicating that memory access patterns in the specific workload significantly influence the performance profile. Source: Stoico et al., arXiv:2505.02346, EASE 2025.
Codon in 2025: Native NumPy and the Apache License
January 2025 brought two substantial changes. First, Exaloop moved Codon from the Business Source License — which prohibited commercial use of recent versions — to the Apache License 2.0. The Apache 2.0 license permits commercial use, modification, and redistribution without restriction. Exaloop continues to offer enterprise support packages for organizations that require them, but the compiler itself is now free for any use. As USENIX reviewer Rik Farrow summarized, commercial use and derivations of Codon are now permitted without licensing. Source: USENIX, March 2025.
The second change was the release of Codon-NumPy: a full reimplementation of the NumPy library written in Codon itself. This is architecturally distinct from simply calling standard NumPy through Codon's Python interoperability layer. Standard NumPy is implemented as opaque C extensions — code that Codon's optimizer cannot inspect. When NumPy is called via interoperability, Codon treats it as a black box and cannot apply its optimization passes to NumPy's internal computations. Codon-NumPy changes this by making the full NumPy implementation visible to the compiler.
"NumPy support has been the biggest barrier for many folks." — Exaloop blog, January 2025
The practical consequences include operator fusion. When a program contains an expression like ((x-1)**2 + (y-1)**2 < 1).sum(), standard NumPy evaluates each sub-expression independently, allocating a temporary array for each intermediate result. Codon's optimizer can analyze the expression structure, determine that all intermediate arrays are temporary, and collapse the entire computation into a single pass over the input data — eliminating the intermediate allocations. On a pi-approximation benchmark using 500 million random points, this technique produces roughly a 5x speedup over standard NumPy: standard execution takes approximately 2.25 seconds and Codon-NumPy completes in 0.43 seconds on Apple Silicon hardware. Source: Codon GitHub repository; Exaloop, January 2025.
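A runnable version of that benchmark in standard NumPy syntax is below; the same source compiles unchanged under Codon-NumPy. The point count is reduced from the article's 500 million so the sketch finishes quickly anywhere:

```python
import numpy as np

# Monte Carlo pi estimate over a unit circle centered at (1, 1),
# the same expression shape as the article's fusion example.
N = 1_000_000
rng = np.random.default_rng(0)
x = rng.uniform(0, 2, N)
y = rng.uniform(0, 2, N)

# Under standard NumPy, each sub-expression ((x-1)**2, (y-1)**2, the
# sum, the comparison) allocates a temporary array. Codon-NumPy fuses
# the whole expression into a single pass over x and y.
inside = ((x - 1)**2 + (y - 1)**2 < 1).sum()
pi_estimate = 4 * inside / N
print(pi_estimate)
```

The circle of radius 1 covers pi/4 of the 2x2 square, so multiplying the hit fraction by 4 recovers pi.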
For nested loops — the worst-case performance scenario in CPython — Codon-NumPy demonstrates the largest gains. A triple-nested array initialization across a 300x300x300 array takes 3.5 seconds under standard Python and approximately 0.01 seconds under Codon: a roughly 300x speedup, because the compiled native loops carry no interpreter overhead. Benchmarked against the full NPBench suite, Codon-NumPy achieves a geometric mean speedup of 2.4x over standard Python plus NumPy in single-threaded mode, with a maximum speedup exceeding 900x on the best-case workloads. Enabling Codon's multithreading and GPU features pushes performance further beyond those figures. Source: Exaloop, January 2025.
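The worst-case pattern is small enough to write out in full. The sketch below shrinks the array from the article's 300x300x300 so it finishes quickly under CPython; compiled with Codon, the three loops run as native code with no interpreter overhead:

```python
import numpy as np

# Triple-nested element-by-element initialization: the pattern where
# CPython's per-iteration interpreter overhead dominates completely.
n = 60  # article benchmarks n = 300
a = np.empty((n, n, n), dtype=np.float64)
for i in range(n):
    for j in range(n):
        for k in range(n):
            a[i, j, k] = i + j + k
print(a[1, 2, 3])  # 6.0
```

Each of the n**3 iterations costs CPython a full bytecode dispatch plus dynamic indexing; the compiled version reduces each to a few machine instructions.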
Exaloop has also announced that a Codon-native Pandas implementation is in active development, following the same architectural approach as Codon-NumPy: a full reimplementation of the library in Codon itself rather than a wrapper around the existing C-backed library. The stated rationale is that Codon's compilation framework can optimize data frame queries in the same way it optimizes NumPy expressions — specifically operator fusion and memory allocation elimination across chained operations. No release date has been announced as of the date of this article, but the roadmap item is documented in the January 2025 Exaloop blog post. For Pandas-heavy workloads, the current path is to call Pandas through Codon's Python interoperability layer, which provides no compilation benefit for the Pandas operations themselves. Source: Exaloop, January 2025.
What Changed in v0.19
Following the January 2025 v0.18 release, Codon shipped v0.19 with two backend-level changes that meaningfully expand what programs Codon can compile and how fast the resulting binaries run.
The first change is an LLVM backend upgrade from version 17 to version 20. LLVM 20 improves backend code generation broadly, and those improvements translate directly into better native code for any program compiled with Codon, with no changes required at the Python source level. The OpenMP runtime was updated to match LLVM 20 at the same time. Source: Codon Changelog, Exaloop.
The second change is a completely revamped type-checking engine. The previous type checker could not handle certain Python patterns, requiring code modifications before those programs would compile; the v0.19 checker covers a wider range of Python constructs without changes. Practically, this means five things:
- Class fields are now inferred automatically (previously, Codon classes had to declare their fields explicitly, unlike standard Python classes).
- Functions and classes no longer require forward declarations, matching the semantics developers expect from Python.
- Function and class name resolution now matches CPython semantics.
- Functions can be passed around and stored more freely, including in lists of lambda functions, which was not previously possible.
- Error messages are more informative.
Beyond the type checker, v0.19 also added support for else on try statements, updated nonlocal variable semantics to match CPython, added broader support for Python's format strings, and improved compilation time. Source: Codon Changelog, Exaloop; Codon releases, GitHub.
Codon v0.18 added a -fast-math flag that enables LLVM's fast-math optimizations, trading strict IEEE 754 floating-point semantics for additional speed. This can produce measurable gains in floating-point-heavy numerical workloads where strict rounding and NaN propagation behavior are not required. It is appropriate for many scientific simulations and signal processing workloads, but should not be used in code that depends on exact floating-point semantics. Source: Codon Changelog, Exaloop.
Codon also provides a -disable-exceptions flag that removes runtime validation checks — such as bounds checks when indexing an array — that Codon performs by default. Disabling them can yield additional vectorization opportunities and performance gains for programs where you have verified no exceptions will be raised. If an exception is raised in a binary compiled with this flag, the program terminates with a SIGTRAP rather than a Python-style traceback. Use only after confirming your program is exception-free. Source: Exaloop, January 2025; Codon Changelog, Exaloop.
Codon vs. Other Python Speed-Up Tools
Several tools exist for accelerating Python, and Codon occupies a distinct architectural position among them. The table below reflects their differences as documented in the Codon FAQ, the USENIX reviews, and the ACM SIGPLAN paper.
| Tool | Compilation Model | GIL Removed | CPython Compatibility |
|---|---|---|---|
| CPython | Bytecode interpreter | No | Reference implementation |
| PyPy | Just-in-time (JIT), tracing | No | High — most code runs unchanged |
| Numba | JIT (selective, via decorator) | Partial (nopython mode) | Moderate — decorated functions only |
| Cython | Transpiles to C extensions | Partial (with annotations) | Moderate — requires .pyx syntax |
| Codon | Ahead-of-time (AOT) to native binary | Yes | Partial — static subset of Python |
| Mojo | AOT + JIT; superset of Python | Yes | High — relies on CPython for full Python ecosystem |
The distinction between Codon and PyPy is worth examining specifically. PyPy uses tracing JIT compilation: it observes a program as it runs, identifies the execution paths that are called frequently, and compiles those paths to machine code on the fly. PyPy works within the full dynamic Python runtime and is a genuine drop-in replacement for CPython in the vast majority of programs. Codon compiles the entire program statically before it runs. As the Codon FAQ states, Codon's compilation process is closer to C++ than to Julia, and substantially different from PyPy. The consequence is that Codon can produce faster binaries in compute-intensive scenarios, but cannot run programs that depend on Python's dynamic runtime behaviors. Source: Codon FAQ, Exaloop.
The Codon FAQ also explicitly addresses Mojo, a programming language developed by Modular that is frequently compared to Codon in discussions about high-performance Python. The architectural distinction is fundamental: Mojo aims to be a superset of Python, adding low-level programming constructs while relying on CPython to support the rest of the Python ecosystem. Codon does not attempt to be a superset. It targets Python performance improvements through ahead-of-time type checking and compilation, without introducing new syntax beyond what is needed to express parallelism. As the FAQ puts it, Codon tries to minimize new syntax and language features with respect to Python. The practical consequence is that Codon is more constrained than Mojo — it cannot run programs that require CPython's dynamic features — but it also requires no new language knowledge, only familiarity with Python's existing semantics. Source: Codon FAQ, Exaloop.
It is not necessary to compile an entire program with Codon to benefit from it. Codon provides a @codon.jit decorator that marks individual functions for native compilation within an otherwise standard CPython program. Annotate the bottleneck function, and Codon compiles and caches it on first call — no full codebase migration required. The pyext build mode extends this further by compiling Codon modules into importable Python extensions ahead of time, avoiding the first-call compilation overhead entirely. Source: Codon FAQ, Exaloop.
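A sketch of the decorator pattern is below. The `@jit` usage follows the Codon FAQ; the try/except fallback is my addition (hypothetical), so the same file still runs under plain CPython when the codon package is not installed:

```python
# Isolate the bottleneck in one function and mark it for native
# compilation. Under Codon's JIT, the function is compiled and cached
# on first call; everything else stays ordinary CPython.
try:
    from codon import jit
except ImportError:
    # Hypothetical no-op fallback so the script runs without Codon.
    def jit(fn=None, **kwargs):
        if fn is None:
            return lambda f: f
        return fn

@jit
def sum_of_squares(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

print(sum_of_squares(10))  # 285
```

Only the decorated function is compiled; callers elsewhere in the program invoke it exactly as before.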
Platform Support, Limitations, and When Not to Use Codon
Codon runs on Linux and macOS, including Apple Silicon. It does not currently run on Windows. As USENIX reviewer Rik Farrow confirmed in his March 2025 review, Windows support remains absent from the current release. This is a practical constraint worth knowing before committing to Codon in a mixed-OS development environment. Source: USENIX, March 2025.
Codon's documentation is explicit: it is not a drop-in replacement for CPython. The compiler enforces static typing, which means it cannot handle programs where variable types change at runtime, or programs that use Python's runtime reflection capabilities — features like getattr, setattr, or dynamic class modification that appear in metaprogramming, some web frameworks, and object-relational mappers.
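A few lines make the boundary concrete. Everything below is valid CPython, and each line exercises a dynamic feature that falls outside Codon's static model (illustrative examples, not an exhaustive list):

```python
# A variable whose type changes at runtime: fine in CPython,
# rejected by a static type checker like Codon's.
x = 1
x = "now a string"

class Config:
    pass

# Runtime reflection and dynamic class modification: the patterns
# used by metaprogramming, some web frameworks, and ORMs.
setattr(Config, "debug", True)
print(getattr(Config, "debug"))  # True
```

Programs built on these patterns are candidates for the @codon.jit incremental approach (compiling only their numeric hot spots) rather than whole-program compilation.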
Not every Python standard library module has been reimplemented in Codon. The compiler supports calling any Python package through its interoperability layer — you can import matplotlib, scikit-learn, or any other library using from python import — but doing so bypasses Codon's native compilation for that library and leaves performance on the table. The full performance benefit comes when both the program logic and its numeric libraries are compiled natively. For codebases where full migration is impractical, Codon offers two incremental paths: the @codon.jit decorator compiles individual functions within an otherwise standard CPython program, and the pyext build mode compiles Codon code into Python extension modules that can be imported directly from CPython — similar to Cython, but without requiring the .pyx syntax. Source: Codon Changelog, Exaloop.
"Codon hasn't reached anything like that yet. It needs more programs." — Saman Amarasinghe, MIT professor and CSAIL principal investigator, on readiness relative to CPython. Source: IEEE Spectrum, 2023
That candid assessment from Codon's own academic supervisor is worth taking seriously. The USENIX March 2025 review confirmed that for workloads dominated by simple data aggregation — filling a dictionary, summing values, processing log files — Codon provided no meaningful speedup over CPython. Farrow's own log-file summarization script, which fills an associative array and prints sorted totals, ran in essentially the same time under Codon as under CPython. Performance gains are concentrated in compute-bound workloads: numerical loops, large array operations, signal processing, and genomic sequence analysis. Programs that spend the majority of their time on I/O or dictionary operations are unlikely to benefit from Codon compilation. Source: USENIX, March 2025.
There is also a practical integer semantics difference to account for. CPython uses arbitrary-precision integers by default: a Python integer can grow as large as available memory permits. Codon uses 64-bit integers, matching C and C++ behavior, so code that depends on values larger than 2^63 - 1 will behave differently under Codon. For the vast majority of scientific and numerical programs this makes no difference, but code that relies on arbitrary-precision behavior, such as cryptographic implementations or very large factorial computations, requires adjustment before compiling with Codon. Source: Codon FAQ, Exaloop.
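The difference is easy to demonstrate from CPython itself. The helper below simulates signed 64-bit wraparound; it is a sketch of C-style semantics for illustration, not Codon's actual implementation:

```python
# Simulate signed 64-bit integer semantics (two's complement).
def as_int64(x):
    x &= (1 << 64) - 1                      # keep the low 64 bits
    return x - (1 << 64) if x >= (1 << 63) else x

print(2**63)            # CPython: 9223372036854775808, exact
print(as_int64(2**63))  # 64-bit semantics: -9223372036854775808
```

Any computation whose intermediate values cross that boundary needs restructuring (or an explicit wider integer type) before it will produce the same answers under Codon.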
GPU Programming in Codon
Codon supports GPU kernel execution through an extension of its existing @par parallel decorator. To execute a loop on the GPU rather than the CPU, pass gpu=True to the decorator:
# CPU multithreading — distributes loop iterations across cores
@par
for i in range(N):
    result[i] = compute(data[i])

# GPU offload — executes the loop as a GPU kernel
@par(gpu=True)
for i in range(N):
    result[i] = compute(data[i])
The GPU backend targets NVIDIA hardware. Writing GPU code in Codon requires no CUDA knowledge: the same Python loop syntax used for CPU parallelism maps directly to GPU execution. Codon-NumPy integrates with the GPU backend as well, so NumPy array operations can be offloaded to the GPU using the same decorator. The Exaloop blog demonstrated a Mandelbrot set computation where GPU offloading, combined with Codon-NumPy's compiled array operations, produces speedups in the thousands relative to standard CPython with NumPy — the kind of gain that previously required either hand-written CUDA C or a library like CuPy. Codon-NumPy also integrates with PyTorch: because PyTorch tensors can be converted to NumPy arrays without data copying, existing PyTorch workflows can incorporate Codon-compiled NumPy operations without restructuring the pipeline. Source: Exaloop, January 2025; Codon GitHub repository.
How Codon-NumPy Represents Arrays Internally
Understanding what makes Codon-NumPy fast requires understanding how it represents array data at the compiler level. Standard NumPy's ndarray is a complex C structure carrying a data pointer, a dtype object, shape and stride arrays, and reference-counting machinery. Every element access involves multiple pointer dereferences and type checks, and the memory layout is opaque to any Python-level optimizer.
Codon-NumPy takes a structurally different approach. Its ndarray type is implemented as a named tuple with three fields: a shape tuple of ndim integers, a strides tuple of ndim integers representing the byte offset to the next element along each axis, and a raw data pointer. Because Codon assigns a concrete element type at compile time — rather than storing a dtype object that is checked at runtime — the compiler knows the element type statically. That static knowledge allows LLVM to eliminate the shape and strides tuples as runtime overhead entirely in many cases, collapsing the array access code to the same sequence of pointer arithmetic that equivalent hand-written C code would produce. Source: Exaloop, January 2025.
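The shape/strides/data-pointer model can be checked against standard NumPy directly, since standard NumPy exposes the same quantities:

```python
import numpy as np

# For a C-contiguous float64 array of shape (3, 4, 5), the stride
# along each axis is the 8-byte element size times the product of the
# trailing dimensions.
a = np.zeros((3, 4, 5), dtype=np.float64)
print(a.shape)    # (3, 4, 5)
print(a.strides)  # (160, 40, 8)

# Element (i, j, k) lives at byte offset i*160 + j*40 + k*8 from the
# data pointer: the exact pointer arithmetic a compiler can emit
# directly once the element type is known statically.
```

Because Codon fixes the element type at compile time, that arithmetic often constant-folds, which is how array access collapses to the same instructions hand-written C would produce.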
Vectorization works differently as well. Standard NumPy ships with hand-tuned C extension implementations of each operation for each supported SIMD instruction set — SSE2, AVX2, AVX-512, NEON — maintained across multiple platform targets. Codon-NumPy does not maintain any of that hand-tuned SIMD code. Instead, it relies on LLVM's auto-vectorizer to analyze the compiled loop and emit optimal SIMD instructions for the host machine at compile time. For transcendental math functions such as cos() and exp(), Codon-NumPy uses Google's Highway library for efficient vectorized implementations — a detail that matters because LLVM's auto-vectorizer does not generate optimal code for complex transcendental functions on its own. The practical advantages of this design are code simplicity, automatic adaptation to new SIMD instruction sets as LLVM gains support for them without any library-level changes, and vectorization of user code as well as library internals — a loop written by the user gets auto-vectorized by the same mechanism that vectorizes NumPy's own operations. The practical risk is that LLVM's auto-vectorizer does not always match the performance of hand-tuned SIMD code for certain specific operation shapes — a tradeoff that is worth benchmarking on your specific workload. Source: Exaloop, January 2025.
When a NumPy expression like ((x - 1)**2 + (y - 1)**2 < 1).sum() runs through standard NumPy, each operator allocates an intermediate array and writes results to it before the next operator reads them. For 500 million elements, that means multiple full-array allocations of 4GB each, all of which must be written to and read from main memory. Codon's optimizer recognizes that none of those intermediate arrays are ever referenced again, fuses the entire expression into a single loop, and eliminates the intermediate allocations. The bottleneck shifts from memory bandwidth to arithmetic throughput — the exact scenario where SIMD instructions on modern CPUs provide the most gain. Source: Exaloop, January 2025.
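What fusion produces can be written out by hand. The explicit loop below is for illustration only: it computes the same count in one pass with no temporaries, which is conceptually what Codon's optimizer emits as native, vectorized code:

```python
import numpy as np

N = 100_000  # reduced from the article's 500 million points
rng = np.random.default_rng(1)
x = rng.uniform(0, 2, N)
y = rng.uniform(0, 2, N)

# Unfused: each operator materializes a temporary N-element array.
unfused = int(((x - 1)**2 + (y - 1)**2 < 1).sum())

# Fused equivalent: a single pass, no intermediate allocations.
# (Slow as a Python loop; Codon generates the native equivalent.)
fused = 0
for i in range(N):
    if (x[i] - 1)**2 + (y[i] - 1)**2 < 1:
        fused += 1

print(unfused == fused)  # True
```

The fused form touches each input element once, shifting the bottleneck from memory traffic to arithmetic, which is where SIMD pays off.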
Energy Consumption: What the 2025 Empirical Study Found
Raw execution time is not the only performance dimension reported in the academic literature. A 2025 study by Stoico et al., presented at the 29th International Conference on Evaluation and Assessment in Software Engineering (EASE 2025), benchmarked Codon against CPython across seven workloads and also measured energy consumption — a metric that matters in cloud compute environments where electrical cost is a real operational variable. Codon, PyPy, and Numba each achieved over 90% improvement in both execution time and energy consumption relative to CPython across the tested workloads. Nuitka demonstrated the most consistent memory usage improvements across hardware configurations, suggesting a different optimization profile than the execution-time leaders. On the n_body benchmark specifically, Codon ran approximately 89 times faster than CPython.
The study also documented variation in last-level cache (LLC) miss rates across benchmarks. For some workloads, Codon produced significantly lower LLC miss rates than CPython — consistent with the memory allocation elimination from operator fusion and the compact array representation. For other workloads, LLC miss rates were similar. This variation is architecturally meaningful: it indicates that Codon's performance advantage in a given program is heavily influenced by how much of the program's time is spent in memory-bound versus compute-bound operations. Programs that are already memory-bandwidth-limited will see smaller gains from Codon than programs that are compute-limited with dense arithmetic. Source: Stoico et al., arXiv:2505.02346, EASE 2025.
For practitioners evaluating Codon in a cloud cost context, the energy findings have a direct financial implication. A program that runs in one-tenth the time and consumes one-tenth the energy on the same hardware is not just faster — it is cheaper per compute unit. The ceiling is the program's actual compute profile: if a script spends 70% of its runtime waiting on network I/O or disk reads, Codon's benefits apply only to the remaining 30%. Profiling the bottleneck before targeting it with Codon is the same discipline required for any performance optimization work.
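That ceiling is Amdahl's law applied to I/O wait: with 70% of runtime spent on I/O, even an enormous kernel speedup barely moves the total. A quick sketch (the function name is ours) shows how sharply the overall gain is capped:

```python
def overall_speedup(io_fraction: float, compute_speedup: float) -> float:
    """Amdahl's law: only the compute fraction of runtime accelerates."""
    return 1.0 / (io_fraction + (1.0 - io_fraction) / compute_speedup)

# 70% I/O wait: a 100x compute speedup yields only ~1.42x overall.
# 10% I/O wait: the same 100x kernel yields ~9.2x overall.
```

This is why profiling first matters: the same compiler applied to the same kernel produces wildly different end-to-end results depending on what surrounds it.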
Key Takeaways
- AOT compilation removes Python's fundamental overhead: By compiling to native machine code before runtime, Codon eliminates the bytecode interpreter, per-variable type interrogation, and all associated metadata that CPython carries. The resulting binary runs directly on the CPU with zero interpreter overhead.
- The GIL is gone: Codon's native multithreading support via OpenMP allows Python programs to use all available CPU cores in true parallel. The @par decorator distributes loop iterations across threads, and atomic reductions handle synchronization automatically.
- Codon-NumPy is the major v0.18 addition: The fully compiled NumPy reimplementation, released in January 2025, enables operator fusion, LLVM-level auto-vectorization, and GPU offloading, capabilities unavailable when calling standard NumPy through Python interoperability mode.
- v0.19 upgraded LLVM and expanded Python coverage: The LLVM backend moved from version 17 to version 20, delivering broad performance improvements. A new type-checking engine handles Python patterns the previous checker could not compile: class fields are inferred automatically, forward declarations are no longer required, and functions can be stored and passed more freely. v0.19 also added else clauses on try statements, corrected nonlocal semantics, expanded format string support, and improved compilation time.
- Codon is now fully open source under Apache 2.0: The January 2025 license change means commercial use is permitted without a licensing agreement. Organizations can adopt Codon in production pipelines freely, with enterprise support packages available separately from Exaloop.
- Linux and macOS only — no Windows: As of 2025, Codon does not run on Windows. Pre-built binaries are available for Linux (x86_64) and macOS (x86_64 and arm64). This is a real constraint for mixed-OS teams and should be verified before committing to Codon in any environment that includes Windows machines.
- Codon-NumPy's ndarray is a named tuple, not a C struct: The three-field named tuple representation — shape, strides, and raw data pointer — gives LLVM full visibility into the array structure. Intermediate shape and stride values are frequently optimized out entirely, leaving pointer arithmetic equivalent to C. The auto-vectorizer, rather than hand-tuned SIMD code, handles the target-specific instruction generation.
- Energy consumption improvements match the speed gains: A 2025 study by Stoico et al. (EASE 2025, arXiv:2505.02346) found Codon reduced both execution time and energy consumption by over 90% relative to CPython on compute-bound workloads, with an 89x speedup on the n_body benchmark. In cloud environments, this translates directly to reduced compute cost. The qualification is workload composition: programs dominated by I/O wait or dictionary operations see little benefit from Codon regardless of how compute-intensive their non-I/O logic is.
- Codon is not a universal CPython replacement — and it is not Mojo: Codon targets compute-bound workloads in numerical computing, genomics, quantitative finance, and GPU programming. It does not add new language constructs to Python and does not rely on CPython for its runtime. Mojo takes the opposite approach: superset language, CPython fallback. Programs that depend on dynamic typing, runtime reflection, or arbitrary-precision integers require modification or are not suited to Codon compilation at all.
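The named-tuple array representation mentioned in the takeaways can be sketched in plain Python. This is a simplified illustration, not Codon's actual implementation: the field names are our choosing, strides here count elements rather than bytes, and the real structure holds a raw data pointer rather than a list.

```python
from typing import NamedTuple

class ArrayView(NamedTuple):
    # Sketch of the three-field representation: shape, strides, data.
    # The real Codon-NumPy ndarray stores byte strides and a raw pointer.
    shape: tuple[int, ...]
    strides: tuple[int, ...]
    data: list[float]

def element(a: ArrayView, *index: int) -> float:
    # C-style pointer arithmetic: flat offset = sum(index_k * stride_k).
    # In compiled code, LLVM can often constant-fold these strides away.
    offset = sum(i * s for i, s in zip(index, a.strides))
    return a.data[offset]

# A 2x3 row-major array: stepping one row skips 3 elements, one column skips 1.
m = ArrayView(shape=(2, 3), strides=(3, 1),
              data=[0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
```

Because the whole structure is a flat tuple of scalars and a pointer rather than an opaque heap object, the optimizer sees every field, which is what enables the stride elimination described above.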
Codon occupies a well-defined and genuinely useful position in the Python ecosystem. It is not trying to make every Python program faster. It is giving domain experts who write compute-intensive Python a verifiable path to C-speed execution without abandoning the language they know. The 2025 releases — v0.18 with Codon-NumPy and the Apache license, and v0.19 with the LLVM 20 backend and expanded type coverage — have made that path considerably more accessible. For anyone running Python programs that loop over large arrays, process genomic sequences, or perform quantitative simulations, Codon is now a serious option that warrants testing against their specific workloads. The Windows limitation and the static typing constraint are real boundaries, but they are clearly documented ones. Within those boundaries, the performance results are real and independently verified.
Sources: Codon GitHub, Exaloop · Codon releases (v0.18, v0.19), GitHub · Exaloop blog, January 30, 2025 · MIT News, March 2023 · IEEE Spectrum, May 2023 · USENIX, March 2025 · USENIX, April 2023 · Codon FAQ, Exaloop · Codon Changelog, Exaloop · Codon compiler paper, ACM SIGPLAN CC '23, DOI 10.1145/3578360.3580275 · Stoico et al., energy & performance study, arXiv:2505.02346, EASE 2025 · Seq, Nature Biotechnology, 2021 · Seq, OOPSLA 2019, DOI 10.1145/3360551