SciPy gives you everything you need to work with real-world signals in Python. Whether you're stripping noise from sensor data, analyzing audio frequency content, or designing digital filters for a data pipeline, scipy.signal and scipy.fft handle the heavy lifting with a clean, consistent API.
Signal processing shows up across more domains than you might expect: biomedical monitoring (ECG, EEG), audio engineering, telecommunications, vibration analysis, and financial time-series work all rely on the same core techniques. Python makes this accessible through SciPy, which as of version 1.17 includes a mature scipy.signal module for filter design and analysis, and a scipy.fft module that outperforms the older numpy.fft on most workloads. This article walks through both, building from first principles to practical noise removal.
Setup and Imports
Before anything else, install the required libraries if you haven't already. SciPy pulls in NumPy automatically, and Matplotlib handles visualization.
pip install scipy numpy matplotlib
Throughout this article, we'll use the following standard imports. It's worth keeping these consistent so your signal processing code stays readable across projects.
import numpy as np
import matplotlib.pyplot as plt
from scipy import signal
from scipy.fft import fft, ifft, rfft, irfft, fftfreq, rfftfreq
Use scipy.fft rather than numpy.fft for new code. SciPy's implementation is faster, especially for multi-dimensional arrays, and is the actively maintained path going forward. The older scipy.fftpack module is considered legacy.
The Fast Fourier Transform
The Fourier transform decomposes a signal into its component frequencies. The discrete version of this, the DFT, is computed efficiently using the Fast Fourier Transform (FFT) algorithm. The result tells you which frequencies are present in your signal and at what amplitude. This is fundamental to almost everything else in signal processing.
Start by generating a synthetic signal composed of three sine waves mixed together: one at 5 Hz, one at 20 Hz, and one at 35 Hz. This gives us something to analyze.
# Build a composite signal from three sine waves
sample_rate = 1000 # samples per second
duration = 1.0 # seconds
t = np.linspace(0, duration, int(sample_rate * duration), endpoint=False)
sig = (
2.0 * np.sin(2 * np.pi * 5 * t) + # 5 Hz, amplitude 2
1.0 * np.sin(2 * np.pi * 20 * t) + # 20 Hz, amplitude 1
0.5 * np.sin(2 * np.pi * 35 * t) # 35 Hz, amplitude 0.5
)
Now compute the FFT and look at the frequency spectrum. The fftfreq function produces the corresponding frequency axis, which you need to interpret the FFT output correctly.
# Compute FFT
spectrum = fft(sig)
freq_bins = fftfreq(len(sig), d=1/sample_rate)
# Only plot the positive frequencies
positive = freq_bins > 0
magnitude = np.abs(spectrum)
plt.figure(figsize=(10, 4))
plt.plot(freq_bins[positive], magnitude[positive], color='#4b8bbe')
plt.xlabel('Frequency (Hz)')
plt.ylabel('Amplitude')
plt.title('Frequency Spectrum of Composite Signal')
plt.grid(alpha=0.2)
plt.tight_layout()
plt.show()
You'll see three distinct peaks at exactly 5, 20, and 35 Hz with amplitudes scaled to 1000, 500, and 250 respectively (the raw FFT output scales with signal length, so divide by len(sig) if you want normalized amplitudes). The key insight here: the FFT reveals what's in a signal that's impossible to see clearly in the time domain alone.
Using rfft for Real-Valued Signals
When your input is real-valued (no imaginary components), use rfft instead of fft. It can be up to twice as fast because it skips computing the redundant negative-frequency half of the spectrum. Pair it with rfftfreq for the matching frequency axis.
# rfft is faster for real-valued input
spectrum_r = rfft(sig)
freq_r = rfftfreq(len(sig), d=1/sample_rate)
plt.plot(freq_r, np.abs(spectrum_r))
plt.xlabel('Frequency (Hz)')
plt.ylabel('Amplitude')
plt.show()
Always match your transform to its inverse. If you used rfft, use irfft to go back to the time domain. If you used fft, use ifft. Mixing them produces wrong results without throwing an error.
Designing Digital Filters
Once you can see which frequencies are present, you often want to remove some of them. That's what filters do. SciPy provides two major filter families: FIR (Finite Impulse Response) and IIR (Infinite Impulse Response). Each has trade-offs.
IIR Filters: The Butterworth
IIR filters are computationally efficient and achieve steep roll-off with low filter orders. The Butterworth filter is the standard starting point. It has a maximally flat passband, meaning no ripple in the frequencies you want to keep.
Use signal.butter to design the filter coefficients, then signal.filtfilt to apply it. The filtfilt function applies the filter twice, once forward and once backward, which eliminates phase distortion. This matters a lot when timing relationships between events in your signal are important.
def butter_lowpass(cutoff_hz, sample_rate, order=4):
"""Design a Butterworth low-pass filter."""
nyquist = sample_rate / 2
normalized_cutoff = cutoff_hz / nyquist
b, a = signal.butter(order, normalized_cutoff, btype='low', analog=False)
return b, a
def apply_filter(data, cutoff_hz, sample_rate, order=4):
"""Apply zero-phase Butterworth low-pass filter."""
b, a = butter_lowpass(cutoff_hz, sample_rate, order)
filtered = signal.filtfilt(b, a, data)
return filtered
The cutoff frequency is normalized to the Nyquist frequency (half the sample rate) before being passed to signal.butter. This is a common stumbling block: SciPy's IIR filter functions expect a value between 0 and 1, where 1.0 represents the Nyquist frequency, not the sample rate.
For IIR filters, the cutoff is defined at half-power (-3 dB). For FIR filters designed with signal.firwin, the cutoff is at half-amplitude (-6 dB). These are different conventions, so verify which one applies when comparing filter responses.
Filter Types: Low-Pass, High-Pass, Band-Pass
Changing the btype argument to signal.butter lets you design all the standard filter shapes. A band-pass filter requires two cutoff frequencies passed as a list.
# Low-pass: keep frequencies below cutoff
b_lp, a_lp = signal.butter(4, 0.1, btype='low')
# High-pass: keep frequencies above cutoff
b_hp, a_hp = signal.butter(4, 0.3, btype='high')
# Band-pass: keep frequencies between low and high
b_bp, a_bp = signal.butter(4, [0.05, 0.3], btype='bandpass')
# Band-stop (notch): reject a band of frequencies
b_bs, a_bs = signal.butter(4, [0.09, 0.11], btype='bandstop')
Checking Filter Response
Before applying a filter to real data, visualize its frequency response with signal.freqz. This shows you exactly which frequencies pass and which are attenuated, letting you tune the order and cutoff before committing.
b, a = signal.butter(4, 0.1, btype='low')
w, h = signal.freqz(b, a, worN=8000)
plt.figure(figsize=(10, 4))
plt.plot(w / np.pi, 20 * np.log10(np.abs(h)), color='#4b8bbe')
plt.axvline(0.1, color='#FFD43B', linestyle='--', label='Cutoff')
plt.xlabel('Normalized Frequency (x pi rad/sample)')
plt.ylabel('Amplitude (dB)')
plt.title('Butterworth Low-Pass Filter Response')
plt.legend()
plt.grid(alpha=0.2)
plt.show()
Applying Filters to Noisy Signals
Now put it all together with a practical example: generating a clean signal, adding Gaussian noise, and using a Butterworth low-pass filter to recover the original.
import numpy as np
import matplotlib.pyplot as plt
from scipy import signal
# Signal parameters
sample_rate = 1000 # Hz
duration = 1.0 # seconds
t = np.linspace(0, duration, int(sample_rate * duration), endpoint=False)
# Clean signal: 5 Hz sine wave
clean = np.sin(2 * np.pi * 5 * t)
# Add Gaussian noise
rng = np.random.default_rng(seed=42)
noise = rng.normal(0, 0.5, size=clean.shape)
noisy = clean + noise
# Design and apply a 15 Hz low-pass Butterworth filter
nyquist = sample_rate / 2
cutoff = 15 # Hz
b, a = signal.butter(4, cutoff / nyquist, btype='low')
filtered = signal.filtfilt(b, a, noisy)
# Plot
fig, axes = plt.subplots(3, 1, figsize=(12, 8), sharex=True)
axes[0].plot(t, clean, color='#98c379', label='Clean Signal')
axes[0].set_title('Original Signal')
axes[0].legend()
axes[1].plot(t, noisy, color='#e06c75', alpha=0.7, label='Noisy Signal')
axes[1].set_title('After Adding Noise')
axes[1].legend()
axes[2].plot(t, filtered, color='#4b8bbe', label='Filtered Signal')
axes[2].set_title('After Low-Pass Filter (15 Hz cutoff)')
axes[2].set_xlabel('Time (s)')
axes[2].legend()
plt.tight_layout()
plt.show()
The filter recovers the underlying 5 Hz sine wave cleanly because the noise energy is spread across all frequencies, and the 15 Hz cutoff removes everything above it while the 5 Hz component sits well inside the passband.
Avoid filtering by directly zeroing out FFT bins. This creates abrupt cutoffs in the frequency domain, which produce ringing artifacts in the time domain. Use signal.butter and signal.filtfilt for real-world signals. FFT-domain zeroing is useful for understanding concepts but should not be used in production pipelines.
Comparing lfilter vs filtfilt
SciPy provides two main functions for applying IIR filters: signal.lfilter and signal.filtfilt. Knowing when to use each matters.
# lfilter: causal, introduces phase delay
filtered_causal = signal.lfilter(b, a, noisy)
# filtfilt: zero-phase, no delay, requires full signal
filtered_zerophase = signal.filtfilt(b, a, noisy)
lfilter is a causal filter — it only uses past samples, making it suitable for real-time processing where future samples aren't available. It introduces a phase delay proportional to the filter order. filtfilt processes the signal twice (forward and backward) to cancel phase distortion, but requires the entire signal in memory. Use filtfilt for offline analysis; use lfilter for streaming or real-time applications.
Spectrograms and Time-Frequency Analysis
The FFT gives you the overall frequency content of a signal, but it tells you nothing about when those frequencies occur. A spectrogram solves this by computing the FFT over short overlapping windows as it slides across the signal. The result is a 2D map of frequency content over time.
SciPy provides this through signal.ShortTimeFFT (the modern API introduced in SciPy 1.12) as well as the legacy signal.spectrogram function. Here's a complete example using the more accessible legacy function, which remains fully supported.
import numpy as np
import matplotlib.pyplot as plt
from scipy import signal
# Create a chirp signal: frequency sweeps from 1 Hz to 50 Hz over 10 seconds
sample_rate = 400 # Hz
t = np.linspace(0, 10, 10 * sample_rate, endpoint=False)
chirp = signal.chirp(t, f0=1, f1=50, t1=10, method='linear')
# Add some noise
rng = np.random.default_rng(seed=0)
chirp_noisy = chirp + 0.3 * rng.standard_normal(chirp.shape)
# Compute spectrogram
frequencies, times, Sxx = signal.spectrogram(
chirp_noisy,
fs=sample_rate,
window='hann',
nperseg=256,
noverlap=200
)
# Plot
plt.figure(figsize=(12, 5))
plt.pcolormesh(times, frequencies, 10 * np.log10(Sxx), shading='gouraud', cmap='inferno')
plt.colorbar(label='Power (dB)')
plt.ylabel('Frequency (Hz)')
plt.xlabel('Time (s)')
plt.title('Spectrogram of Chirp Signal')
plt.tight_layout()
plt.show()
The chirp sweeps linearly from 1 Hz to 50 Hz, and the spectrogram makes that sweep visible as a diagonal line rising from bottom-left to top-right. This kind of time-frequency view is essential when analyzing signals where content changes over time, such as speech, engine vibration signatures, or seismic recordings.
Tuning the Spectrogram Parameters
The nperseg and noverlap parameters control the trade-off between time resolution and frequency resolution. Shorter windows give better time resolution but smear frequency content; longer windows resolve frequencies more precisely but blur timing.
- nperseg is the number of samples per window segment. Common values are powers of 2 (256, 512, 1024) for FFT efficiency.
- noverlap is how many samples adjacent windows share. A value of 75% overlap (e.g., 192 out of 256) is a standard starting point.
- window applies a windowing function to each segment to reduce spectral leakage. The Hann window is the safe default for general use.
Spectral leakage happens because the FFT assumes your signal is periodic within the window. Windowing (multiplying by a Hann, Hamming, or Blackman window) tapers the signal edges, significantly reducing leakage. Always window your FFT segments unless you have a specific reason not to.
Welch's Method for Power Spectral Density
When you care about the average frequency content across a signal rather than how it changes over time, Welch's method gives a smoother, more reliable power spectral density estimate than a single FFT. It averages the FFT results from overlapping segments.
# Welch's method: averaged PSD estimate
f_welch, psd = signal.welch(
noisy,
fs=sample_rate,
window='hann',
nperseg=256,
noverlap=128
)
plt.figure(figsize=(10, 4))
plt.semilogy(f_welch, psd, color='#4b8bbe')
plt.xlabel('Frequency (Hz)')
plt.ylabel('Power Spectral Density (V^2/Hz)')
plt.title("Welch's PSD Estimate")
plt.grid(alpha=0.2)
plt.show()
Using a logarithmic y-axis (semilogy) is standard for PSD plots because signal power spans several orders of magnitude across frequencies. Welch's method is widely used in vibration analysis, audio engineering, and any domain where you need a stable estimate of a signal's frequency profile.
Key Takeaways
- Use scipy.fft, not numpy.fft: The SciPy implementation is faster for most workloads and is the actively maintained module. Use
rfftwhen your input is real-valued to get up to twice the speed. - Design filters with scipy.signal.butter: The Butterworth filter is a reliable general-purpose choice. Normalize your cutoff to the Nyquist frequency, not the sample rate, when calling
signal.butter. - Choose filtfilt for offline analysis, lfilter for real-time:
filtfilteliminates phase distortion but requires the full signal.lfilteris causal and processes sample by sample. - Visualize the filter response before applying it:
signal.freqzshows you exactly what your filter does before it touches your data. This catches design mistakes early. - Use spectrograms for time-varying signals: A single FFT hides when frequencies occur. Use
signal.spectrogramorsignal.ShortTimeFFTwhen timing matters, andsignal.welchfor stable average power estimates.
Signal processing with SciPy rewards learning the fundamentals. Once you understand what the FFT is actually telling you and what a filter is actually doing, the API becomes straightforward. The patterns covered here — transform, analyze, filter, reconstruct — scale from toy examples to production data pipelines with little modification.