Python ships with a full toolkit of compression algorithms right out of the box. Whether you need to shrink log files, compress network payloads, or archive datasets, the standard library has you covered with four distinct compression modules, and Python 3.14 just added a fifth. This guide walks through every built-in option with working code so you can pick the right algorithm for your project.
Data compression is one of those capabilities that shows up in nearly every serious Python project. You might be compressing HTTP responses, packing data before writing it to disk, or reducing the size of a payload before sending it across a network. Python's standard library has long included modules for zlib, gzip, bz2, and lzma. Now, with Python 3.14, the Zstandard algorithm joins the family through PEP 784, along with a new unified compression package that organizes all of these modules under one roof.
How Compression Works in Python
All of Python's built-in compression modules follow a similar pattern. Each one provides at minimum two core functions: one to compress raw bytes and one to decompress them back. Many also offer file-like interfaces that let you read and write compressed files as though they were ordinary files.
Every algorithm in the standard library performs lossless compression, meaning the decompressed output is byte-for-byte identical to the original input. No data is lost. The algorithms differ in how aggressively they search for patterns, how much memory they use, and how fast they run.
All compression functions in Python operate on bytes objects, not strings. If you need to compress text, encode it first with .encode("utf-8") and decode the result after decompression with .decode("utf-8").
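A minimal round trip for text data looks like this:

```python
import zlib

text = "compression works on bytes, not str"
# str -> bytes -> compressed bytes
compressed = zlib.compress(text.encode("utf-8"))
# compressed bytes -> bytes -> str
restored = zlib.decompress(compressed).decode("utf-8")
assert restored == text
```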
The Classic Algorithms: zlib, gzip, and bz2
zlib — The DEFLATE Foundation
The zlib module wraps the DEFLATE algorithm, which is the same compression method used by ZIP files and PNG images. It provides the fastest compression in the standard library and is ideal when speed matters more than achieving the smallest possible output.
```python
import zlib

# Compress a bytes object
original = b"Python compression is powerful. " * 100
compressed = zlib.compress(original, level=6)

print(f"Original size: {len(original):,} bytes")
print(f"Compressed size: {len(compressed):,} bytes")
print(f"Ratio: {len(compressed) / len(original):.2%}")

# Decompress back to the original
restored = zlib.decompress(compressed)
assert restored == original
```
The level parameter ranges from 0 (no compression) through 1 (fastest, least compression) to 9 (slowest, best compression). When you omit it, zlib uses Z_DEFAULT_COMPRESSION, which currently maps to a balanced level 6. You can also pass zlib.Z_BEST_SPEED or zlib.Z_BEST_COMPRESSION for readability.
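You can see the tradeoff directly by compressing the same payload at both extremes; on repetitive data like this, the higher level produces output that is no larger, at the cost of more CPU time:

```python
import zlib

data = b"example payload " * 10_000

fast = zlib.compress(data, level=1)  # prioritize speed
best = zlib.compress(data, level=9)  # prioritize size

# Both levels are lossless; they differ only in effort spent
assert zlib.decompress(fast) == data
assert zlib.decompress(best) == data
print(f"level 1: {len(fast):,} bytes, level 9: {len(best):,} bytes")
```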
gzip — File-Oriented DEFLATE
The gzip module uses the same DEFLATE algorithm as zlib, but wraps the output in the gzip file format. This adds headers with metadata like timestamps and filenames. It is the standard format for compressed files on Unix-like systems and is widely used in HTTP content encoding.
```python
import gzip

data = b"Repetitive data compresses well. " * 500

# Compress and write to a .gz file
with gzip.open("output.gz", "wb") as f:
    f.write(data)

# Read and decompress from a .gz file
with gzip.open("output.gz", "rb") as f:
    restored = f.read()
assert restored == data

# In-memory compression without touching the filesystem
compressed = gzip.compress(data, compresslevel=9)
restored = gzip.decompress(compressed)
```
The file interface with gzip.open() is particularly useful because it behaves like a regular file object. You can pass it to any function that expects a file handle, and the compression or decompression happens transparently.
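For example, you can hand the object returned by gzip.open() straight to the csv module, which neither knows nor cares that the underlying file is compressed (the file name here is illustrative):

```python
import csv
import gzip

rows = [["id", "event"], ["1", "login"], ["2", "logout"]]

# Write CSV rows directly into a compressed file (text mode "wt")
with gzip.open("events.csv.gz", "wt", newline="") as f:
    csv.writer(f).writerows(rows)

# Read them back; csv.reader sees ordinary text
with gzip.open("events.csv.gz", "rt", newline="") as f:
    restored = list(csv.reader(f))

assert restored == rows
```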
bz2 — Better Ratios, Slower Speed
The bz2 module implements bzip2 compression, which is built on the Burrows-Wheeler transform combined with Huffman coding. It typically achieves better compression ratios than zlib or gzip, especially on text-heavy data, but at the cost of slower compression and decompression speeds. It also uses more memory during the compression process.
```python
import bz2

text = b"The bz2 algorithm excels at compressing text data. " * 500

# Simple one-shot compression
compressed = bz2.compress(text, compresslevel=9)
print(f"Original: {len(text):,} bytes")
print(f"Compressed: {len(compressed):,} bytes")

# File-based usage
with bz2.open("archive.bz2", "wb") as f:
    f.write(text)

# Incremental compression for streaming data
compressor = bz2.BZ2Compressor(compresslevel=9)
chunks = []
for i in range(0, len(text), 1024):
    chunk = compressor.compress(text[i:i + 1024])
    if chunk:
        chunks.append(chunk)
chunks.append(compressor.flush())
compressed_stream = b"".join(chunks)
```
The incremental BZ2Compressor and BZ2Decompressor classes are essential when working with streaming data or files too large to fit in memory. Feed data in chunks, and collect the compressed output as it becomes available.
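The decompression side mirrors this pattern with BZ2Decompressor; a minimal sketch that feeds a compressed stream back in fixed-size chunks:

```python
import bz2

payload = b"streaming example data " * 2000
# Compressed in one shot here for brevity; any bz2 stream works the same way
stream = bz2.compress(payload)

decompressor = bz2.BZ2Decompressor()
parts = []
for i in range(0, len(stream), 1024):
    # Each call returns whatever decompressed bytes are available so far
    parts.append(decompressor.decompress(stream[i:i + 1024]))
restored = b"".join(parts)

assert restored == payload
assert decompressor.eof  # the end of the bz2 stream was reached
```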
High-Ratio Compression with lzma
The lzma module provides the highest compression ratios in the standard library. It supports the .xz file format used by the xz utility as well as the legacy .lzma format. The tradeoff is that lzma is the slowest of the classic algorithms, and it uses significantly more memory during compression.
```python
import lzma

data = b"LZMA achieves the best compression ratios. " * 1000

# Basic compression with default settings
compressed = lzma.compress(data)
print(f"Original: {len(data):,} bytes")
print(f"Compressed: {len(compressed):,} bytes")

# File-based usage with the .xz format
with lzma.open("archive.xz", "wb") as f:
    f.write(data)
with lzma.open("archive.xz", "rb") as f:
    restored = f.read()

# Custom filter chains for fine-tuned control
custom_filters = [
    {"id": lzma.FILTER_DELTA, "dist": 5},
    {"id": lzma.FILTER_LZMA2, "preset": 7 | lzma.PRESET_EXTREME},
]
compressed_custom = lzma.compress(data, filters=custom_filters)
```
The preset parameter accepts values from 0 to 9. Higher values use more memory and CPU time but produce smaller output. You can combine any preset with the lzma.PRESET_EXTREME flag to squeeze out a few extra bytes at the cost of even more processing time. Custom filter chains let you stack preprocessing filters like FILTER_DELTA before the main LZMA2 compressor for specialized data types.
At high preset levels, lzma can consume several hundred megabytes of RAM. Be cautious when using presets 7–9 on memory-constrained systems or when compressing many files concurrently.
Zstandard: Python 3.14's New Powerhouse
Python 3.14 introduced native support for the Zstandard compression algorithm through PEP 784. Developed by Yann Collet at Meta, Zstandard (often shortened to zstd) is designed to deliver compression ratios comparable to zlib while running at significantly higher speeds. It has become one of the preferred compression algorithms in modern systems for its balance of speed and effectiveness.
The new compression.zstd module is now part of the standard library, removing the need for third-party packages like zstandard or zstd from PyPI.
```python
from compression import zstd

data = b"Zstandard offers an excellent speed-to-ratio tradeoff. " * 1000

# Basic compression and decompression
compressed = zstd.compress(data)
restored = zstd.decompress(compressed)
print(f"Original: {len(data):,} bytes")
print(f"Compressed: {len(compressed):,} bytes")
assert restored == data

# File-based usage
with zstd.open("archive.zst", "wb") as f:
    f.write(data)
with zstd.open("archive.zst", "rb") as f:
    restored = f.read()
```
One of Zstandard's standout features is its support for dictionary-based compression. If you are compressing many small, similar pieces of data—like JSON API responses, log entries, or database records—you can train a dictionary on sample data and then use it to dramatically improve compression of individual items.
```python
from compression import zstd

# Train a dictionary from sample data
samples = [
    b'{"user_id": 1001, "action": "login", "timestamp": "2026-03-08"}',
    b'{"user_id": 1002, "action": "purchase", "timestamp": "2026-03-08"}',
    b'{"user_id": 1003, "action": "logout", "timestamp": "2026-03-08"}',
    b'{"user_id": 1004, "action": "login", "timestamp": "2026-03-07"}',
    b'{"user_id": 1005, "action": "search", "timestamp": "2026-03-07"}',
] * 20  # zstd needs a reasonable number of samples

dictionary = zstd.train_dict(samples, dict_size=4096)

# Compress new data using the trained dictionary
new_entry = b'{"user_id": 2001, "action": "purchase", "timestamp": "2026-03-08"}'
compressed_without = zstd.compress(new_entry)
compressed_with = zstd.compress(new_entry, zstd_dict=dictionary)
print(f"Without dictionary: {len(compressed_without)} bytes")
print(f"With dictionary: {len(compressed_with)} bytes")
```
Streaming Compression with ZstdCompressor
Zstandard also supports incremental compression through the ZstdCompressor class. Like the incremental classes in bz2 and zlib, it maintains internal state between chunks, making it well-suited for compressing streaming data or implementing real-time compression pipelines.
```python
from compression.zstd import ZstdCompressor, ZstdDecompressor

# Streaming compression
compressor = ZstdCompressor()
chunks_to_compress = [
    b"First chunk of streaming data. ",
    b"Second chunk arrives later. ",
    b"Third and final chunk. ",
]
compressed_parts = []
for chunk in chunks_to_compress:
    result = compressor.compress(chunk)
    if result:
        compressed_parts.append(result)
# Flush remaining buffered data to finish the frame
compressed_parts.append(compressor.flush())
compressed_stream = b"".join(compressed_parts)

# Decompression is incremental as well
decompressor = ZstdDecompressor()
restored = decompressor.decompress(compressed_stream)
assert restored == b"".join(chunks_to_compress)
```
The compression.zstd module requires Python 3.14 or later. If you are on an earlier version of Python, you can install the third-party zstandard package from PyPI, which provides a similar (though not identical) API.
The New compression Package
Along with adding Zstandard support, Python 3.14 introduced the compression package as a unified namespace for all compression modules. This means you can now import any compression algorithm through the compression package, while the original module names continue to work for backward compatibility.
```python
# New unified imports (Python 3.14+)
from compression import zlib
from compression import gzip
from compression import bz2
from compression import lzma
from compression import zstd

# Legacy imports still work
import zlib
import gzip
import bz2
import lzma
```
The compression package does not change any existing APIs. It simply provides a consistent way to discover and import compression modules. This also sets the stage for future compression algorithms to be added under the same namespace without running into naming conflicts with third-party packages on PyPI.
Comparing All Five Algorithms
Choosing the right compression algorithm depends on your specific requirements. Here is a practical comparison across the key dimensions that matter when making that decision.
| Algorithm | Module | Speed | Ratio | Memory | Best For |
|---|---|---|---|---|---|
| DEFLATE | zlib | Fast | Good | Low | General purpose, network protocols |
| gzip | gzip | Fast | Good | Low | File archiving, HTTP compression |
| bzip2 | bz2 | Slow | Very Good | Moderate | Text-heavy data, archival storage |
| LZMA | lzma | Very Slow | Excellent | High | Maximum compression, software distribution |
| Zstandard | compression.zstd | Very Fast | Very Good | Moderate | Real-time systems, streaming, small data |
Here is a practical script that benchmarks all five algorithms against the same data so you can see the differences on your own machine.
```python
import bz2
import gzip
import lzma
import time
import zlib

# For Python 3.14+, uncomment the next line:
# from compression import zstd

def benchmark(name, compress_fn, decompress_fn, data):
    start = time.perf_counter()
    compressed = compress_fn(data)
    compress_time = time.perf_counter() - start

    start = time.perf_counter()
    restored = decompress_fn(compressed)
    decompress_time = time.perf_counter() - start

    assert restored == data
    ratio = len(compressed) / len(data)
    print(f"{name:12s} | {len(compressed):>10,} bytes | "
          f"ratio: {ratio:.4f} | "
          f"compress: {compress_time:.4f}s | "
          f"decompress: {decompress_time:.4f}s")

# Generate test data (mix of repetitive and random-ish content)
test_data = (b"Benchmark test with repeated content. " * 5000
             + bytes(range(256)) * 200)

print(f"Original size: {len(test_data):,} bytes")
print("-" * 78)
benchmark("zlib", zlib.compress, zlib.decompress, test_data)
benchmark("gzip", gzip.compress, gzip.decompress, test_data)
benchmark("bz2", bz2.compress, bz2.decompress, test_data)
benchmark("lzma", lzma.compress, lzma.decompress, test_data)
# Uncomment for Python 3.14+:
# benchmark("zstd", zstd.compress, zstd.decompress, test_data)
```
Key Takeaways
- Use zlib or gzip when speed is the priority. They share the same underlying DEFLATE algorithm and are the fastest options in the standard library. Choose gzip when you need the gzip file format; choose zlib when you need raw compressed bytes.
- Use bz2 for better ratios on text data. The Burrows-Wheeler algorithm trades speed for compression efficiency, making it a solid choice for archiving large text files or datasets where storage savings outweigh processing time.
- Use lzma when every byte counts. It delivers the best compression ratios but is the slowest and hungriest for memory. Ideal for software distribution packages and long-term archival where you compress once and decompress many times.
- Use Zstandard for the best all-around performance. New in Python 3.14, the compression.zstd module delivers compression ratios close to bz2 at speeds that rival or exceed zlib. Its dictionary support makes it particularly effective for compressing many small, similar data items.
- Use the new compression package on Python 3.14+. The unified compression namespace provides a clean organizational structure and avoids naming collisions with third-party packages. Legacy imports remain supported for backward compatibility.
Python's compression toolkit covers the full spectrum from fast-and-light to slow-and-thorough. With the addition of Zstandard in Python 3.14 and the new compression namespace, the standard library now offers a modern, high-performance option that fills the gap between raw speed and maximum compression. Match the algorithm to your workload—real-time applications benefit from zstd or zlib, while archival tasks can lean on lzma or bz2—and you will get the best results without reaching for any third-party packages.