Python fftconvolve: Fast FFT-Based Convolution with SciPy

Convolution is one of the foundational operations in signal processing, image manipulation, and data analysis. When working with large arrays, the standard direct convolution approach can become painfully slow. That is where scipy.signal.fftconvolve comes in, using the Fast Fourier Transform to dramatically speed up the computation.

Convolution combines two signals to produce a third signal that expresses how one modifies the shape of the other. In mathematical terms, it slides one function across another, multiplying and summing at each position. This operation appears everywhere from audio filtering and edge detection to probability distribution calculations and neural networks. However, performing convolution directly on large datasets is computationally expensive. The fftconvolve function in SciPy solves this problem by exploiting a mathematical property: convolution in the time domain is equivalent to multiplication in the frequency domain.

What Is fftconvolve and Why Use It

The scipy.signal.fftconvolve function convolves two N-dimensional arrays using the Fast Fourier Transform (FFT) method. Instead of computing the convolution directly by sliding one array across another (an operation with O(n*m) complexity for arrays of length n and m), it transforms both arrays into the frequency domain, multiplies them element-wise, and then transforms the result back. This approach has O(n log n) complexity, which makes it substantially faster for large inputs.

Direct convolution works by iterating through every possible overlap position between two arrays. For small arrays, this is perfectly fine. But once array sizes grow beyond a few hundred elements, the number of multiply-accumulate operations grows rapidly. The FFT-based method avoids this by taking advantage of the Convolution Theorem, which states that the Fourier transform of a convolution equals the product of the individual Fourier transforms.
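The Convolution Theorem is easy to verify with NumPy's FFT routines alone. The sketch below builds a linear convolution by hand, zero-padding both arrays to the full output length, multiplying their spectra, and inverting, then checks the result against np.convolve:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(256)
b = rng.standard_normal(64)

# Zero-pad both arrays to the full linear-convolution length n + m - 1,
# otherwise the FFT would compute a circular convolution
n = len(a) + len(b) - 1
manual = np.fft.irfft(np.fft.rfft(a, n) * np.fft.rfft(b, n), n)

# Matches the direct convolution to floating-point precision
direct = np.convolve(a, b, mode='full')
print(np.allclose(manual, direct))  # True
```

The zero-padding step is the key detail: without it, the FFT product wraps around and yields a circular convolution rather than the linear one.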

Note

As of SciPy v0.19, the general scipy.signal.convolve function automatically chooses between direct convolution and fftconvolve based on which method it estimates will be faster. You can still call fftconvolve directly when you know the FFT approach is the right choice for your data.

The general rule of thumb is that fftconvolve outperforms direct convolution when both input arrays have more than roughly 500 elements. For smaller arrays or when only a few output values are needed, the overhead of computing two FFTs and an inverse FFT can make the direct method faster.

Syntax, Parameters, and Output Modes

The function signature is straightforward:

scipy.signal.fftconvolve(in1, in2, mode='full', axes=None)

Here is what each parameter does:

in1 is the first input array. It can be any array-like object and supports any number of dimensions. in2 is the second input array. It must have the same number of dimensions as in1, though the sizes along each dimension can differ.

The mode parameter controls the size of the output array and accepts three string values:

  • 'full' (default) — Returns the complete discrete linear convolution. For 1D arrays of length n and m, the output length is n + m - 1. This includes all positions where the two arrays overlap, including partial overlaps at the edges where zero-padding fills in the gaps.
  • 'same' — Returns an output array with the same shape as in1. The result is centered relative to the full convolution output. This mode is particularly useful in image processing where you want the output to match the dimensions of the input.
  • 'valid' — Returns only the elements of the convolution where both arrays fully overlap, with no zero-padding involved. For this mode to work, one input must be at least as large as the other in every dimension.

The axes parameter specifies which axes to compute the convolution over. When set to None (the default), the convolution is computed over all axes. This is useful when you have multidimensional data but only want to convolve along specific dimensions.

Here is a quick example showing each mode in action:

import numpy as np
from scipy.signal import fftconvolve

signal = np.array([1, 2, 3, 4, 5])
kernel = np.array([1, 0, -1])

full_result = fftconvolve(signal, kernel, mode='full')
same_result = fftconvolve(signal, kernel, mode='same')
valid_result = fftconvolve(signal, kernel, mode='valid')

print("Full:  ", full_result)   # length = 5 + 3 - 1 = 7
print("Same:  ", same_result)   # length = 5 (matches signal)
print("Valid: ", valid_result)   # length = 5 - 3 + 1 = 3

The output will be:

Full:   [ 1.  2.  2.  2.  2. -4. -5.]
Same:   [ 2.  2.  2.  2. -4.]
Valid:  [2. 2. 2.]

Pro Tip

The fftconvolve function always outputs float arrays, even when both inputs are integers. If your downstream code expects integer arrays, you will need to cast the result with .astype(int) or np.round() followed by a cast.

Practical Examples

Autocorrelation of a Signal

One common use of fftconvolve is computing the autocorrelation of a signal, which measures how similar a signal is to a delayed version of itself. To do this, you convolve the signal with its time-reversed copy:

import numpy as np
from scipy.signal import fftconvolve

rng = np.random.default_rng(42)
sig = rng.standard_normal(1000)

# Autocorrelation via fftconvolve
autocorr = fftconvolve(sig, sig[::-1], mode='full')

# The peak will be at the center (zero lag)
center = len(autocorr) // 2
print(f"Peak at index {center}: {autocorr[center]:.4f}")
print(f"This equals the signal energy: {np.sum(sig**2):.4f}")

The autocorrelation peak at zero lag equals the total energy of the signal. This technique is widely used in communications, radar processing, and time series analysis to detect repeating patterns in noisy data.
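As a sketch of that pattern-detection idea (with an assumed 25-sample period and noise level), the lag of the first off-center autocorrelation peak recovers the period of a noisy oscillation:

```python
import numpy as np
from scipy.signal import fftconvolve

# A noisy oscillation with a known 25-sample period
rng = np.random.default_rng(0)
period = 25
t = np.arange(1000)
sig = np.sin(2 * np.pi * t / period) + 0.3 * rng.standard_normal(len(t))

autocorr = fftconvolve(sig, sig[::-1], mode='full')
positive_lags = autocorr[len(sig):]  # values at lags 1, 2, 3, ...

# Skip the slowly decaying zero-lag peak, then search a window
# that must contain the first full-period peak
search = positive_lags[period // 2 : 2 * period]
est_period = np.argmax(search) + period // 2 + 1
print(f"Estimated period: {est_period} samples")
```

With this seed and noise level the estimate lands at or very near 25; in practice you would tune the search window to the lag range you expect.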

1D Signal Smoothing

Convolution with a uniform kernel produces a moving average, which is one of the simplest ways to smooth noisy data:

import numpy as np
from scipy.signal import fftconvolve

# Create a noisy sine wave
rng = np.random.default_rng(42)
t = np.linspace(0, 4 * np.pi, 2000)
clean = np.sin(t)
noisy = clean + 0.5 * rng.standard_normal(len(t))

# Smooth with a moving average kernel
window_size = 50
kernel = np.ones(window_size) / window_size
smoothed = fftconvolve(noisy, kernel, mode='same')

print(f"Original noise std:  {np.std(noisy - clean):.4f}")
print(f"Smoothed noise std:  {np.std(smoothed - clean):.4f}")

The moving average filter reduces the noise standard deviation significantly. For 2,000 data points, fftconvolve handles this convolution noticeably faster than a direct implementation would.

2D Gaussian Blur on Image Data

The fftconvolve function works seamlessly with multidimensional arrays. Here is how to apply a Gaussian blur to a 2D image array:

import numpy as np
from scipy.signal import fftconvolve
from scipy.signal.windows import gaussian

# Create a 2D Gaussian kernel
kernel_1d = gaussian(70, std=8)
kernel_2d = np.outer(kernel_1d, kernel_1d)
kernel_2d /= kernel_2d.sum()  # Normalize

# Apply to a sample image array (replace with your own data)
rng = np.random.default_rng(42)
image = rng.random((512, 512))
blurred = fftconvolve(image, kernel_2d, mode='same')

Warning

When using fftconvolve for image blurring, the output will have dark borders caused by zero-padding beyond the image boundaries. If you need other boundary treatments (such as reflection or wrapping), consider using scipy.ndimage.convolve (via its mode parameter) or scipy.signal.convolve2d (via its boundary parameter), though these alternatives are slower.
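As one alternative, this sketch blurs with scipy.ndimage.convolve using mode='reflect', which mirrors edge pixels instead of padding with zeros and so avoids the darkened borders (a simple box kernel stands in for the Gaussian here):

```python
import numpy as np
from scipy.ndimage import convolve as nd_convolve

rng = np.random.default_rng(42)
image = rng.random((128, 128))
kernel = np.ones((5, 5)) / 25  # simple normalized box blur for illustration

# 'reflect' mirrors pixels at the boundary rather than padding with zeros,
# so border rows keep roughly the same brightness as the interior
blurred = nd_convolve(image, kernel, mode='reflect')
print(blurred.shape)  # (128, 128)
```

The trade-off is speed: ndimage's direct convolution scales with kernel size, so for large kernels the zero-padded FFT route remains much faster.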

Selective Axis Convolution

The axes parameter lets you convolve along specific dimensions only. This is useful when processing batched data where each row represents an independent signal:

import numpy as np
from scipy.signal import fftconvolve

# 5 independent signals, each with 1000 samples
rng = np.random.default_rng(42)
batch = rng.standard_normal((5, 1000))

# Smooth each signal independently along axis 1
kernel = np.ones((1, 50)) / 50
smoothed = fftconvolve(batch, kernel, mode='same', axes=1)

print(f"Input shape:  {batch.shape}")
print(f"Output shape: {smoothed.shape}")

Without the axes parameter, fftconvolve computes FFTs over every dimension. With this particular (1, 50) kernel the values would come out the same, since the kernel has size 1 along axis 0, but passing axes=1 makes the intent explicit and skips the unnecessary transform along the batch dimension. With a kernel larger than 1 along axis 0, omitting axes would blend the independent signals together, which is not what you want in batch processing.

fftconvolve vs convolve vs oaconvolve

SciPy provides three main convolution functions, each suited to different scenarios:

scipy.signal.convolve is the general-purpose function. Since version 0.19, it uses an internal heuristic (choose_conv_method) to automatically select between direct computation and FFT-based computation. When you are unsure which method is best for your data, this function is a reasonable default choice.

scipy.signal.fftconvolve always uses the FFT approach. Call this directly when you know your arrays are large enough to benefit from FFT-based computation, or when you want deterministic control over the algorithm being used.

scipy.signal.oaconvolve uses the overlap-add method. This approach breaks the convolution into smaller segments and is typically faster than fftconvolve when the two input arrays differ greatly in size. For example, convolving a million-sample audio signal with a 100-tap filter is an ideal use case for oaconvolve.

Here is a quick comparison to illustrate performance differences:

import numpy as np
from scipy.signal import convolve, fftconvolve, oaconvolve
import time

rng = np.random.default_rng(42)

# Large signal, small kernel (oaconvolve territory)
signal_large = rng.standard_normal(500_000)
kernel_small = rng.standard_normal(100)

for func in [convolve, fftconvolve, oaconvolve]:
    start = time.perf_counter()
    result = func(signal_large, kernel_small, mode='same')
    elapsed = time.perf_counter() - start
    print(f"{func.__name__:15s}: {elapsed:.4f}s")

print()

# Two similarly sized arrays (fftconvolve territory)
a = rng.standard_normal(50_000)
b = rng.standard_normal(50_000)

for func in [convolve, fftconvolve, oaconvolve]:
    start = time.perf_counter()
    result = func(a, b, mode='full')
    elapsed = time.perf_counter() - start
    print(f"{func.__name__:15s}: {elapsed:.4f}s")

The results will vary depending on your system, but as a general pattern: oaconvolve excels when the inputs are very different in size, while fftconvolve tends to win when both inputs are similarly large. The auto-selecting convolve function tries to pick the best option for you but introduces slight overhead from the decision-making process.

Common Pitfalls and Best Practices

NaN and Inf Values

One of the trickiest aspects of fftconvolve is its behavior with special floating-point values. If either input array contains any NaN or Inf values, the entire output will be NaN or Inf. This happens because the FFT spreads every input value across all frequency bins, so a single problematic value contaminates the entire result.

import numpy as np
from scipy.signal import fftconvolve, convolve

data = np.array([1.0, 2.0, np.nan, 4.0, 5.0])
kernel = np.array([1.0, 1.0, 1.0])

# FFT method: entire output is NaN
fft_result = fftconvolve(data, kernel, mode='same')
print("FFT result:", fft_result)  # [nan, nan, nan, nan, nan]

# Direct method: NaN stays localized
direct_result = convolve(data, kernel, mode='same', method='direct')
print("Direct result:", direct_result)  # [3., nan, nan, nan, 9.]

If your data contains missing values, either clean them before convolving (using interpolation or replacement) or use the direct convolution method where the NaN impact remains localized.
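One simple cleanup strategy, sketched here with np.interp, linearly interpolates across the NaN positions before handing the data to fftconvolve:

```python
import numpy as np
from scipy.signal import fftconvolve

data = np.array([1.0, 2.0, np.nan, 4.0, np.nan, 6.0, 7.0])
kernel = np.ones(3) / 3

# Linearly interpolate over the NaN positions using the valid neighbors
nan_mask = np.isnan(data)
idx = np.arange(len(data))
cleaned = data.copy()
cleaned[nan_mask] = np.interp(idx[nan_mask], idx[~nan_mask], data[~nan_mask])

result = fftconvolve(cleaned, kernel, mode='same')
print(np.isnan(result).any())  # False
```

Interpolation is only one option; depending on the data, replacing NaNs with zeros or a local median may be more appropriate.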

Float Output Type

Because the FFT operates in the complex-valued frequency domain, fftconvolve always returns floating-point arrays. Integer and object-type inputs are automatically cast to float before computation. This means you may see tiny floating-point artifacts (like 2.9999999997 instead of 3) in results that should theoretically be exact integers.
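The sketch below shows the dtype promotion in action, along with one way to recover exact integers by rounding before the cast:

```python
import numpy as np
from scipy.signal import fftconvolve

a = np.array([1, 2, 3])
b = np.array([0, 1, 0])  # identity-like kernel, so 'same' returns a

result = fftconvolve(a, b, mode='same')
print(result.dtype)  # float64, even though both inputs are integers

# Round away tiny FFT round-off artifacts before casting back to int
as_int = np.rint(result).astype(int)
print(as_int)  # [1 2 3]
```

Rounding first matters: a bare .astype(int) truncates toward zero, so a value like 2.9999999997 would become 2 instead of 3.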

Array API Backend Support

As of SciPy v1.17, fftconvolve has experimental support for the Python Array API Standard. This means you can pass in CuPy, PyTorch, JAX, or Dask arrays instead of NumPy arrays, enabling GPU acceleration or distributed computation without changing your code. To enable this feature, set the environment variable SCIPY_ARRAY_API=1 before importing SciPy.

# Enable Array API support
import os
os.environ["SCIPY_ARRAY_API"] = "1"

import cupy as cp
from scipy.signal import fftconvolve

# Convolution runs on GPU via CuPy
gpu_signal = cp.random.randn(1_000_000)
gpu_kernel = cp.random.randn(1_000)
gpu_result = fftconvolve(gpu_signal, gpu_kernel, mode='same')

Key Takeaways

  1. Speed through FFT: fftconvolve transforms the O(n*m) convolution operation into O(n log n) by working in the frequency domain, making it dramatically faster for arrays larger than roughly 500 elements.
  2. Three output modes: Use 'full' for the complete convolution result, 'same' to match the first input's shape, and 'valid' for only the fully overlapping portion.
  3. Watch for NaN contamination: A single NaN or Inf in either input will corrupt the entire output. Clean your data first, or fall back to direct convolution with method='direct'.
  4. Choose the right tool: Use fftconvolve for similarly sized arrays, oaconvolve when inputs differ greatly in size, and convolve with method='auto' when you want SciPy to decide.
  5. GPU and backend support: With SciPy's experimental Array API support, you can accelerate fftconvolve with CuPy, JAX, or PyTorch backends by setting SCIPY_ARRAY_API=1.

The fftconvolve function is one of those tools that sits at the intersection of elegance and practicality. The mathematical insight behind it, that convolution becomes simple multiplication in the frequency domain, translates directly into real-world performance gains. Whether you are smoothing sensor data, blurring images, computing cross-correlations, or building custom filters, fftconvolve gives you a fast, reliable path to getting the job done.
