How to Debug Python Code: A Complete, Practical Guide

Every developer eventually comes face-to-face with a piece of code that does not do what they expected. The variable holds the wrong value. The loop runs one time too many. The function returns None instead of a dictionary. Welcome to debugging — the craft that, by some estimates, consumes as much as half of a developer's working time. This guide covers not just the standard tools you will find in any reference manual, but the cognitive strategies behind effective debugging and the precise language-level mechanisms Python provides to make the hunt shorter and more systematic.

Brian Kernighan, co-author of The Elements of Programming Style (with P.J. Plauger, 2nd edition, 1978), captured the fundamental challenge: he observed that debugging is at least twice as difficult as writing code, and if you write code at the edge of your ability, you will lack the capacity to debug it. The lesson is quiet but persistent: write code that is readable, and you will make your future debugging sessions dramatically less painful. (Source: Brian W. Kernighan and P.J. Plauger, The Elements of Programming Style, 2nd ed., McGraw-Hill, 1978, Chapter 2.)

Python, by design, leans heavily into readability. Guido van Rossum, in a Dropbox blog interview titled "The Mind at Work" (2019), explained that code is written primarily to communicate with other coders. That philosophy permeates every layer of Python's debugging ecosystem, from its standard library tools to the precision of the error messages the interpreter itself produces. (Source: "The Mind at Work: Guido van Rossum on how Python makes thinking in code easier," Dropbox Blog, 2019.)

This article walks you through the real techniques Python developers use to find and fix bugs. Not a list of tips you will forget tomorrow — real code, real examples, real understanding. Where this guide differs from other debugging references: every technique is mapped to a specific Python version, every PEP is cited with its authors, and the cognitive reasoning behind each approach is explained so you internalize the why, not just the how.

Start With What Python Already Tells You: Reading Tracebacks

Before reaching for any debugging tool, learn to actually read the traceback Python gives you when something goes wrong. A traceback is not a wall of noise; it is a map, and it reads from bottom to top. This is a fundamental cognitive shift that separates experienced Python developers from beginners: the instinct is to look at the top, but the answer lives at the bottom.

Consider this code:

def calculate_average(numbers):
    total = sum(numbers)
    return total / len(numbers)

def process_report(data):
    scores = data.get("scores")
    avg = calculate_average(scores)
    return {"average": avg}

result = process_report({"name": "Q1 Report"})

Running this produces:

Traceback (most recent call last):
  File "report.py", line 10, in <module>
    result = process_report({"name": "Q1 Report"})
  File "report.py", line 7, in process_report
    avg = calculate_average(scores)
  File "report.py", line 2, in calculate_average
    total = sum(numbers)
TypeError: 'NoneType' object is not iterable

The last line is the actual error. The frames above it show you the call chain that led to the failure. Read it: process_report called calculate_average with scores, which was None because the dictionary had no "scores" key. The bug is not in the sum() call — it is in the missing data validation. This is a critical distinction: the line where the error appears and the line where the error originates are often different places. Effective debugging means tracing backward from the symptom to the cause.
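Tracing backward also tells you where the fix belongs. One option is a guard clause that makes the bad input fail fast, at the point where it enters, instead of surfacing as a TypeError two frames deeper. A minimal sketch of the repaired function:

```python
def calculate_average(numbers):
    total = sum(numbers)
    return total / len(numbers)

def process_report(data):
    scores = data.get("scores")
    # Fail fast, where the bad data enters, with a message that
    # names exactly what was missing.
    if scores is None:
        raise ValueError(f"report data has no 'scores' key: {data!r}")
    return {"average": calculate_average(scores)}
```

Now the same bad call fails on the guard line with a message naming the missing key, which is the information the original traceback made you reconstruct by hand.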

PEP 657: Fine-Grained Error Locations (Python 3.11+)

Python's traceback system received a major upgrade with PEP 657, authored by Pablo Galindo Salgado, Batuhan Taskaya, and Ammar Askar. Introduced in Python 3.11, this PEP added column-level offset information to tracebacks, so the interpreter can point to the exact part of a line that caused the error. (Source: PEP 657, Python Enhancement Proposals.)

Before PEP 657, a line like result = (a + b) @ (c + d) would produce a vague error pointing only to the line. After PEP 657, the traceback uses tilde and caret markers to pinpoint the failing subexpression — here, the addition on the left-hand side rather than the matrix multiplication:

Traceback (most recent call last):
  File "math_ops.py", line 1, in <module>
    result = (a + b) @ (c + d)
              ~~^~~
ValueError: operands could not be broadcast together with shapes (1,2) (2,3)

Why this matters beyond convenience: in production logs, column-level precision eliminates the ambiguity that previously required developers to add temporary variables just to isolate which subexpression on a complex line failed. This is especially valuable for chained attribute access like user.profile.settings.theme, where the caret now shows you exactly which attribute was None.

Note

You can disable PEP 657's column markers for memory-constrained environments by setting the PYTHONNODEBUGRANGES environment variable or passing the -X no_debug_ranges flag. Storing the extra column information makes the compiled bytecode (the .pyc files) somewhat larger, an overhead that is negligible in all but the most constrained deployments.

PEP 626: Precise Line Numbers (Python 3.10+)

PEP 626, authored by Mark Shannon and implemented in Python 3.10, ensured that line numbers reported during tracing and in tracebacks are always accurate. Before this change, certain Python constructs — multi-line statements, optimized-away code, implicit returns — could report incorrect line numbers, confusing both human readers and automated tools like profilers and coverage analyzers. PEP 626 guarantees that tracing events fire for every executed line and only for lines that are executed. (Source: PEP 626, Python Enhancement Proposals.)

A practical consequence: if you use coverage.py or similar code coverage tools, results on Python 3.10+ are meaningfully more accurate. Lines that the interpreter optimizes away no longer produce phantom "covered" or "uncovered" results, which means your coverage reports now reflect what actually ran.

Strategic Print Debugging

Kernighan also wrote, in The Practice of Programming (with Rob Pike, Addison-Wesley, 1999), that thoughtful use of print statements remains among the most effective debugging techniques. Print-based debugging has a reputation as the amateur approach, but professionals use it constantly because it works, it is fast, and it requires no setup. The key is to print judiciously: ask a specific question and place the print exactly where the answer lives.

def merge_configs(base, override):
    result = base.copy()
    for key, value in override.items():
        if isinstance(value, dict) and key in result:
            print(f"[DEBUG] Merging nested dict for key: {key}")
            print(f"[DEBUG]   base[{key}] = {result[key]}")
            print(f"[DEBUG]   override[{key}] = {value}")
            result[key] = merge_configs(result[key], value)
        else:
            result[key] = value
    return result

Notice the cognitive structure here: each print answers a specific question. "Which key is being merged?" "What was the base value?" "What is the override value?" This is hypothesis-driven printing. The undisciplined approach — scattering print("here") and print("here 2") throughout the code — produces noise without signal.

Pro Tip

Use a consistent prefix like [DEBUG] so you can grep for all debug prints and remove them cleanly: grep -rn "\[DEBUG\]" . Use print(f"x: {x!r}") — the !r format spec calls repr(), which shows strings with their quotes and None explicitly. This is the difference between seeing a blank line and seeing '' or None — a distinction that has saved countless debugging hours.

For anything beyond quick exploration, reach for the logging module. It lets you set severity levels (DEBUG, INFO, WARNING, etc.), direct output to files, and leave debug-level statements in production code without them appearing in normal output:

import logging
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)

def process_payment(amount, currency):
    logger.debug("Processing payment: amount=%s, currency=%s", amount, currency)
    if amount <= 0:
        logger.warning("Invalid payment amount: %s", amount)
        raise ValueError("Payment amount must be positive")
    # ... processing logic
    logger.info("Payment processed successfully")

A key difference between print and logging that many developers overlook: logging uses lazy string formatting. The %s placeholders in the example above are only interpolated if the message actually gets emitted. If the logging level is set to WARNING, the DEBUG message's string is never constructed at all — a meaningful performance difference in hot loops.
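You can verify the lazy-formatting behavior directly with a throwaway class that counts how often its (notionally expensive) repr is built. The names Expensive and lazy_demo here are illustrative, not any standard API:

```python
import io
import logging

class Expensive:
    """Counts how often its (notionally costly) repr is built."""
    def __init__(self):
        self.repr_calls = 0
    def __repr__(self):
        self.repr_calls += 1
        return "<Expensive snapshot>"

# Dedicated logger with its own handler, so the demo is deterministic
log = logging.getLogger("lazy_demo")
log.setLevel(logging.WARNING)
log.addHandler(logging.StreamHandler(io.StringIO()))
log.propagate = False

obj = Expensive()
log.debug("state: %s", obj)             # below the level: %s never interpolated
calls_after_debug = obj.repr_calls      # still 0
log.warning("state: %s", obj)           # emitted: repr built exactly once
calls_after_warning = obj.repr_calls    # now 1
print(calls_after_debug, calls_after_warning)  # 0 1
```

Had the debug call been written as log.debug(f"state: {obj}"), the repr would have been built regardless of level — which is why the %s style is preferred in hot paths.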

PDB: Python's Built-In Interactive Debugger

The pdb module has been part of Python's standard library since the early days. It is an interactive, command-line debugger that lets you pause execution, inspect variables, step through code line by line, evaluate expressions, and set conditional breakpoints.

PEP 553: The breakpoint() Built-in (Python 3.7+)

Before Python 3.7, entering the debugger required typing import pdb; pdb.set_trace() — a 27-character incantation that PEP 553's author, Barry Warsaw, noted he often mistypes, for example omitting the semicolon or typing a dot instead of an underscore. PEP 553 introduced the breakpoint() built-in function: a single, universal entry point into the debugger that can also be reconfigured to use any debugger you prefer. (Source: PEP 553, authored by Barry Warsaw, Python Enhancement Proposals.)

def find_outliers(data, threshold=2.0):
    mean = sum(data) / len(data)
    variance = sum((x - mean) ** 2 for x in data) / len(data)
    std_dev = variance ** 0.5

    breakpoint()  # Execution pauses here

    outliers = [x for x in data if abs(x - mean) > threshold * std_dev]
    return outliers

results = find_outliers([10, 12, 11, 300, 13, 9, 11, 14])

When the interpreter hits breakpoint(), you drop into the PDB prompt. The essential PDB commands are:

  • p expression — print the value of any expression
  • pp expression — pretty-print, which formats complex data structures more readably
  • n (next) — execute the current line and move to the next one
  • s (step) — step into a function call on the current line
  • c (continue) — resume normal execution until the next breakpoint
  • l (list) — show source code around the current position
  • ll (longlist) — show the full source of the current function
  • w (where) — print the full stack trace
  • u / d (up/down) — navigate up or down the call stack to inspect caller frames
  • q (quit) — exit the debugger and abort the program

You can also set conditional breakpoints directly from the PDB prompt. For example, b 7, len(data) > 100 tells PDB to only pause at line 7 if len(data) > 100 — invaluable when debugging a function that gets called thousands of times but only fails on specific inputs.

Pro Tip

In Python 3.14, PDB gained a powerful improvement: hardcoded breakpoints (via breakpoint() or set_trace()) now reuse the same Pdb instance, meaning custom settings like display expressions and commands persist across multiple breakpoints in the same session. Previously, each breakpoint() call created a fresh instance and lost your configuration.

The PYTHONBREAKPOINT Environment Variable

One of PEP 553's contributions that deserves more attention is the PYTHONBREAKPOINT environment variable. It lets you swap the debugger without touching your code:

# Use the default PDB debugger
python my_script.py

# Use IPython's richer debugger
PYTHONBREAKPOINT=IPython.core.debugger.set_trace python my_script.py

# Disable ALL breakpoints (for production/CI)
PYTHONBREAKPOINT=0 python my_script.py

The cognitive insight here is that PYTHONBREAKPOINT separates the decision to place a breakpoint (which is a code concern) from the decision about which debugger to use (which is an environment concern). This means you can commit breakpoint() calls in test harnesses without coupling your team to a specific debugger.

Pro Tip

Set PYTHONBREAKPOINT=0 in your CI/CD pipeline and deployment environment to completely disable all breakpoint() calls without modifying a single line of source code. If a stray breakpoint accidentally makes it into production, the program will not pause.
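Because the default hook consults the environment variable on each call, both mechanisms can be demonstrated in a few lines: disabling breakpoints via PYTHONBREAKPOINT=0, then swapping in a custom hook through sys.breakpointhook. The recording hook below is purely illustrative:

```python
import os
import sys

# The default sys.breakpointhook re-reads PYTHONBREAKPOINT on every
# call, so toggling it at runtime works too.
os.environ["PYTHONBREAKPOINT"] = "0"
breakpoint()   # no-op: the program keeps running, no debugger starts

# Any callable can serve as the hook; this one just records the call site.
hits = []
def recording_hook(*args, **kwargs):
    hits.append(sys._getframe(1).f_lineno)

sys.breakpointhook = recording_hook
breakpoint()   # dispatches to recording_hook instead of PDB
print(len(hits))  # 1
```

Replacing the hook like this is how tools such as pytest temporarily take control of breakpoint() during test runs.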

PEP 768: Remote Debugging in Python 3.14

Python 3.14, released on October 7, 2025, introduced one of the most significant debugging improvements in years: the ability to attach a debugger to a running Python process without stopping or restarting it. PEP 768, authored by Pablo Galindo Salgado, Matt Wozniski, and Ivona Stojanovic, added a zero-overhead debugging interface to CPython. (Source: PEP 768, Python Enhancement Proposals; What's New in Python 3.14, Python Documentation.)

# Attach pdb to a running Python process by PID
python -m pdb -p 12345

This connects an interactive PDB session to the live process, letting you inspect variables, set breakpoints, and examine state — all without a restart. Under the hood, this uses the new sys.remote_exec() function, which safely injects Python code into a running process at designated safe checkpoints in the interpreter loop. Note that attaching to another process typically requires elevated privileges on the relevant platform: CAP_SYS_PTRACE on Linux, administrative rights on Windows, and root or specific entitlements on macOS.

A key distinction from tools like GDB is safety: before PEP 768, third-party tools that attached to running Python processes did so by forcibly injecting code that could execute at any point, including during garbage collection or memory allocation, risking crashes or interpreter corruption. PEP 768 coordinates through designated safe evaluation points in CPython's loop, making the operation stable for production environments. Scripts injected remotely run in the target process's main thread and have full access to its state, but execution is deferred until the interpreter reaches a safe checkpoint — which means a process blocked in a long time.sleep() or network I/O call may not respond immediately.

You can also use sys.remote_exec() programmatically for more advanced scenarios, such as dynamically changing a log level, dumping internal state, or toggling feature flags in a running service without restarting it:

import sys
from tempfile import NamedTemporaryFile

# Write a debugging script to a temp file
with NamedTemporaryFile(mode='w', suffix='.py', delete=False) as f:
    f.write('import logging; logging.getLogger("myapp").setLevel(logging.DEBUG)')
    script_path = f.name

# Execute it in a remote process (requires appropriate privileges)
sys.remote_exec(12345, script_path)

Note

Remote debugging can be disabled by setting PYTHON_DISABLE_REMOTE_DEBUG=1, using the -X disable-remote-debug flag, or compiling Python with --without-remote-debug. For production deployments where you want remote debugging capabilities but with controlled access, configure your container or process manager to restrict which users hold CAP_SYS_PTRACE.

Beyond PDB: Assertions and Defensive Debugging

Not all debugging happens in an interactive session. Some of the most effective debugging is preventive — writing code that catches bugs close to their origin rather than letting them propagate and surface as confusing errors three function calls later.

Assert Statements and the __debug__ Constant

Python's assert statement lets you declare an invariant — something that must be true at a given point — and raise an AssertionError immediately if it is not:

def normalize_scores(scores, max_score):
    assert max_score > 0, f"max_score must be positive, got {max_score}"
    assert all(0 <= s <= max_score for s in scores), \
        f"All scores must be between 0 and {max_score}, got {scores}"

    return [s / max_score for s in scores]

What many developers do not know is that assert statements are controlled by Python's __debug__ built-in constant, which is True under normal execution and False when the interpreter runs with the -O (optimize) flag. This means assertions compile to zero code in optimized mode — they are literally removed from the bytecode, not just skipped at runtime. You can leverage this for expensive validation checks that should run during development but have zero production overhead:

if __debug__:
    # This entire block, including the function call,
    # is removed from bytecode when Python runs with -O
    validate_graph_consistency(graph)

Warning

Never use assert for input validation in production code — use if / raise ValueError for that. Because assertions are stripped out by -O, any security or data-integrity checks that rely on assert will silently vanish in optimized deployments. Reserve assertions for internal consistency checks during development and testing.
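A concrete way to draw the line, using a hypothetical apply_discount function: validate inputs with raise so the check survives -O, and reserve assert for invariants that should hold whenever the function's own logic is correct:

```python
def apply_discount(price, percent):
    # Input validation: must survive -O, so use raise, never assert
    if not (0 <= percent <= 100):
        raise ValueError(f"percent must be between 0 and 100, got {percent}")
    discounted = price * (1 - percent / 100)
    # Internal invariant: a development-time sanity check, stripped by -O
    assert 0 <= discounted <= price, "discount arithmetic went wrong"
    return discounted

print(apply_discount(200.0, 25))  # 150.0
```

Run the same function with python -O and the assert line vanishes from the bytecode, while the ValueError check keeps protecting the data.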

Type Checking as a Debugging Strategy

Python's type hints, combined with tools like mypy, can catch entire categories of bugs before your code ever runs:

from typing import Optional

def get_user_email(user_id: int) -> Optional[str]:
    """Returns the user's email, or None if not found."""
    ...

def send_notification(email: str, message: str) -> None:
    """Sends a notification. Email must not be None."""
    ...

# mypy will flag this: Argument 1 has incompatible type "Optional[str]";
# expected "str"
email = get_user_email(42)
send_notification(email, "Hello!")  # Bug caught before runtime

Running mypy across your codebase is like running a static debugger — it identifies potential None dereferences, type mismatches, and incorrect function signatures without executing a single line. For teams that adopt strict type checking, entire classes of runtime errors simply disappear. The --strict flag enables all optional checks and is worth adopting on new projects from day one.

Debugging Techniques for Common Bug Patterns

The Off-By-One Error

Off-by-one errors are arguably the most common bug in all of programming. In Python, they often appear in slicing and range operations:

# Bug: the range stops at len(items) - batch_size, so the final
# (possibly partial) batch is never processed
def process_batch(items, batch_size):
    results = []
    for i in range(0, len(items) - batch_size, batch_size):
        batch = items[i:i + batch_size]
        results.append(process(batch))  # process() stands in for real work
    return results

The fix requires understanding that range(0, len(items), batch_size) gives you the right starting indices, and that the slice items[i:i + batch_size] naturally handles the last partial batch because Python slicing does not raise an error when the end index exceeds the sequence length. This behavior is by design — "abc"[0:100] returns "abc", not an error — and it is one of the many ways Python's semantics help you avoid fencepost errors when you understand them.
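Here is the corrected function, sketched with the batch itself appended in place of the original process() call so the example is self-contained:

```python
def process_batch(items, batch_size):
    results = []
    # Iterate over every valid start index; the final slice may be
    # shorter than batch_size, which Python slicing happily allows.
    for i in range(0, len(items), batch_size):
        results.append(items[i:i + batch_size])
    return results

print(process_batch([1, 2, 3, 4, 5, 6, 7], 3))  # [[1, 2, 3], [4, 5, 6], [7]]
```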

The Mutable Default Argument

This is Python's most widely documented gotcha, and it bites even experienced developers:

def add_item(item, items=[]):
    items.append(item)
    return items

print(add_item("a"))  # ['a']
print(add_item("b"))  # ['a', 'b'] --- Wait, what?

The default list is created once when the function is defined, not each time it is called. Every call that uses the default shares the same list object. This happens because default argument values are evaluated at function definition time and stored as an attribute of the function object — you can actually inspect the current state of the default via add_item.__defaults__. The fix:

def add_item(item, items=None):
    if items is None:
        items = []
    items.append(item)
    return items
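You can watch the shared default accumulate by inspecting __defaults__ on the buggy version directly:

```python
# The buggy version again, renamed so the contrast with the fix is clear
def add_item_buggy(item, items=[]):
    items.append(item)
    return items

add_item_buggy("a")
add_item_buggy("b")
# The default value lives on the function object itself and has
# been mutated by both calls:
print(add_item_buggy.__defaults__)  # (['a', 'b'],)
```

On the corrected version, __defaults__ is simply (None,) forever, because None is immutable and a fresh list is created inside the call.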

Silent Failures From Bare except

Catching all exceptions without discrimination is one of the fastest ways to create bugs that are nearly impossible to find:

# This will silently swallow keyboard interrupts, system exits,
# and every other exception --- a debugging nightmare
try:
    result = fragile_operation()
except:
    result = default_value

Always catch specific exception types. If you genuinely need a catch-all, use except Exception (which excludes SystemExit, KeyboardInterrupt, and GeneratorExit) and log the error:

try:
    result = fragile_operation()
except Exception as e:
    logger.exception("fragile_operation failed: %s", e)
    result = default_value

The logger.exception() method automatically includes the full traceback in the log output, giving you the information you need when something goes wrong in production. This is strictly superior to logger.error() in exception handlers because it preserves the traceback without requiring you to manually format it.

Variable Shadowing and Scope Confusion

A less commonly discussed pattern that causes persistent debugging confusion is variable shadowing — when a name in an inner scope hides a name from an outer scope:

data = [1, 2, 3, 4, 5]

def compute_stats(data):
    # 'data' here shadows the module-level 'data'
    total = sum(data)
    mean = total / len(data)

    # Bug: list comprehension variable 'x' is fine,
    # but what about 'data' inside a nested function?
    def filtered():
        return [x for x in data if x > mean]  # Which 'data'?

    return mean, filtered()

Python's LEGB rule (Local, Enclosing, Global, Built-in) governs name resolution, but when debugging, knowing the rule and correctly predicting which binding a name resolves to are different skills. The locals() and globals() built-ins are your friends here: at any breakpoint, p locals() shows you exactly what names are in scope and what they point to.
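A small demonstration of the rule in action — the inner function has no local x, so resolution falls through Local to the Enclosing scope and stops there, never reaching the Global binding:

```python
x = "global"

def outer():
    x = "enclosing"
    def inner():
        # No local binding for x: LEGB resolves it in the
        # Enclosing scope of outer(), not at module level.
        return x
    return inner()

print(outer())  # enclosing
print(x)        # global -- the module-level binding is untouched
```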

Debugging Async and Concurrent Code

Asynchronous Python code has its own failure modes that synchronous debugging tools don't fully address. When a coroutine fails, the traceback you see only shows the chain of awaits leading to that failure — not the full history of how the task got scheduled. This makes bugs harder to localize than in synchronous code, because the call that created the task and the call that failed in the task can appear completely disconnected in the traceback.

Asyncio's Built-In Debug Mode

Python's asyncio module has a built-in debug mode that enables more aggressive checking and logging. You activate it by setting the PYTHONASYNCIODEBUG environment variable or programmatically:

import asyncio

async def main():
    await asyncio.sleep(0)

# Enable debug mode
asyncio.run(main(), debug=True)

In debug mode, asyncio logs a warning whenever a coroutine takes longer than 100ms to execute a single step (the default slow callback threshold, configurable via loop.slow_callback_duration), logs all exceptions in tasks that are garbage-collected without being awaited, and raises errors for common mistakes like calling coroutines without await. This last check catches one of async's most silent failures: a forgotten await returns a coroutine object instead of executing it, producing no error, just silently doing nothing.

Tracking Down Forgotten Awaits

A forgotten await is among the hardest async bugs to spot visually because the code looks almost correct:

import asyncio

async def fetch_data():
    await asyncio.sleep(1)  # simulates I/O
    return {"status": "ok"}

async def process():
    # Bug: missing await — result is a coroutine object, not a dict
    result = fetch_data()
    print(result.get("status"))  # AttributeError: 'coroutine' object has no attribute 'get'

asyncio.run(process())

With asyncio debug mode active, Python also emits a RuntimeWarning: coroutine 'fetch_data' was never awaited when the coroutine object is garbage collected. Without debug mode, the warning may never appear if the coroutine gets collected silently. Running your test suite with PYTHONASYNCIODEBUG=1 in CI is a practical safeguard for this entire class of bug.

Python 3.14's Asyncio Introspection: ps, pstree, and Call Graph Utilities

Python 3.14 introduced powerful asyncio introspection capabilities that go far beyond what was previously available. The new command-line tools python -m asyncio ps PID and python -m asyncio pstree PID let you inspect a live async process from the outside, without modifying its code. (Source: What's New in Python 3.14, Python Documentation.)

The ps subcommand produces a flat table listing all current asyncio tasks, their names, their coroutine stacks, and which tasks are awaiting them. The pstree subcommand renders the same information as a visual hierarchical tree showing how coroutines relate to each other. These commands use PEP 768's remote execution under the hood, meaning they work on running processes without stopping them.

# View all asyncio tasks in a running process
python -m asyncio ps 12345

# View the async call tree (hierarchical)
python -m asyncio pstree 12345

For programmatic introspection within your own code, Python 3.14 also added asyncio.capture_call_graph() and asyncio.print_call_graph(), which trace the full async call graph of a running coroutine or task. These functions show you the actual await chain — not just where you are, but who is waiting for you, and who you are waiting for:

import asyncio

async def inner():
    # Print the full async call graph from here
    asyncio.print_call_graph()
    await asyncio.sleep(1)

async def outer():
    async with asyncio.TaskGroup() as tg:
        tg.create_task(inner())

asyncio.run(outer())

This is a game-changer for debugging deadlocks and stuck tasks in production async services, because you can now see the complete picture of task dependencies without inserting logging.

Race Conditions and Non-Deterministic Bugs

Race conditions in Python async code typically arise from shared mutable state accessed across await points. Unlike threads, async code running in a single thread is cooperative — but any await is a potential yield point where another coroutine can run and mutate shared state. The debugging strategy is to make your await points explicit and to add assertions on the state of shared data structures immediately before and after each yield point while diagnosing the bug. If the assertion fires, you have found the mutation window. The fix is usually a lock (asyncio.Lock) or restructuring to eliminate the shared state.
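A minimal reproduction makes the mutation window visible. The unsafe version below loses updates because every task reads the counter before any task writes it back; guarding the read-modify-write with asyncio.Lock restores correctness. The counter example is illustrative, not taken from a real service:

```python
import asyncio

counter = 0

async def unsafe_increment():
    global counter
    current = counter        # read shared state
    await asyncio.sleep(0)   # yield point: other tasks run here
    counter = current + 1    # write back a now-stale value

async def safe_increment(lock):
    global counter
    async with lock:         # the read-modify-write is now atomic
        current = counter
        await asyncio.sleep(0)
        counter = current + 1

async def demo():
    global counter
    counter = 0
    await asyncio.gather(*(unsafe_increment() for _ in range(50)))
    lost = counter           # far below 50: most updates were lost
    counter = 0
    lock = asyncio.Lock()
    await asyncio.gather(*(safe_increment(lock) for _ in range(50)))
    return lost, counter     # counter is exactly 50 this time

lost, correct = asyncio.run(demo())
print(lost, correct)
```

Note that the race exists even though everything runs in a single thread — the await inside the critical section is all it takes.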

Faulthandler: Debugging Crashes and Hangs

Python's faulthandler module, available since Python 3.3, is an underused tool that provides two capabilities no other standard library module offers: dumping Python tracebacks on segmentation faults (and other fatal signals), and dumping tracebacks on demand for hung or unresponsive processes.

Enable it at startup to get meaningful traceback output even when your program crashes at the C level:

# Enable via environment variable (recommended for all development):
# PYTHONFAULTHANDLER=1 python my_script.py

# Or programmatically:
import faulthandler
faulthandler.enable()

For debugging hung processes, faulthandler can dump a traceback for all threads after a timeout, and repeat the dump at a specified interval:

import faulthandler
import sys

# Dump all thread tracebacks every 30 seconds to stderr
faulthandler.dump_traceback_later(30, repeat=True, file=sys.stderr)

On Unix systems, you can also register a signal handler that dumps tracebacks on demand by sending a signal to the process:

import faulthandler
import signal

# Dump tracebacks when the process receives SIGUSR1
faulthandler.register(signal.SIGUSR1)

# Then from another terminal: kill -USR1 <pid>

This is invaluable for production processes that occasionally become unresponsive: send the signal, get a full picture of what every thread is doing at that moment, and diagnose the deadlock or infinite loop without restarting.

Using traceback and sys for Programmatic Debugging

Sometimes you need to capture and inspect error information programmatically rather than interactively. Python's traceback module and sys.exc_info() give you the tools:

import traceback
import sys

def risky_operation():
    return 1 / 0

try:
    risky_operation()
except ZeroDivisionError:
    tb_str = traceback.format_exc()
    print(f"Captured traceback:\n{tb_str}")

    exc_type, exc_value, exc_tb = sys.exc_info()
    print(f"Exception type: {exc_type.__name__}")
    print(f"Exception value: {exc_value}")
    print(f"Exception occurred at line: {exc_tb.tb_lineno}")

PEP 3134, authored by Ka-Ping Yee, added exception chaining to Python 3, which ensures that when one exception triggers another, both are preserved in the traceback. This is invaluable for debugging complex error handling: (Source: PEP 3134, Python Enhancement Proposals.)

try:
    config = load_config("settings.json")
except FileNotFoundError as e:
    raise RuntimeError("Cannot start without configuration") from e

The resulting traceback shows both exceptions with a clear "The above exception was the direct cause of the following exception" message, so you can trace the full chain of failures. Without the from e clause, Python still chains exceptions implicitly, but the message reads "During handling of the above exception, another exception occurred" — a subtle but important distinction for debugging, because it tells you whether the chaining was deliberate (explicit from) or accidental (the handler itself raised an error).
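You can also inspect the chain programmatically: raise ... from e records the original exception on __cause__ and sets __suppress_context__, which is how tools (and your own handlers) can tell deliberate chaining from accidental. A self-contained sketch with a stand-in load_config:

```python
def load_config(path):
    # Stand-in for a loader that fails on a missing file
    raise FileNotFoundError(path)

try:
    try:
        config = load_config("settings.json")
    except FileNotFoundError as e:
        raise RuntimeError("Cannot start without configuration") from e
except RuntimeError as err:
    chained = err

# 'from e' stored the original exception on __cause__ ...
print(type(chained.__cause__).__name__)  # FileNotFoundError
# ... and marked the implicit context as suppressed
print(chained.__suppress_context__)      # True
```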

Third-Party Debugging Tools Worth Knowing

While pdb is always available, several third-party tools extend it significantly. ipdb drops you into an IPython session instead of the bare PDB prompt, giving you tab completion, syntax highlighting, and better introspection. With the PYTHONBREAKPOINT variable from PEP 553, switching is effortless: PYTHONBREAKPOINT=ipdb.set_trace python my_script.py.

pudb provides a full-screen, terminal-based visual debugger reminiscent of Turbo Pascal's debugging interface, showing your source code, variables, stack, and breakpoints all in one view. py-spy is a sampling profiler that can attach to running Python processes without modifying them, generating flame graphs that show you where your program spends its time — invaluable for diagnosing performance bugs. It works without requiring any changes to the target process and without the safety guarantees of PEP 768, making it suitable for quick profiling but not for modifying live state.

debugpy is the debug adapter used by VS Code's Python extension and implements the Debug Adapter Protocol (DAP), allowing any DAP-compatible editor to attach to a Python process for interactive debugging. For environments where a terminal debugger is inconvenient, debugpy lets you set breakpoints, inspect variables, and step through code from within your editor's GUI. Memray, developed by Bloomberg, is a memory profiler that tracks every allocation and deallocation in a Python process and can generate flame graphs, tree views, and live dashboards of memory usage — a more targeted alternative to tracemalloc when you need allocation-level precision. (Source: bloomberg/memray, GitHub.)

icecream (ic()) is worth special mention: it is a drop-in replacement for debug print() that automatically prints the expression and its value, the filename, line number, and parent function. It removes the tedium of manually formatting debug output and is trivially disabled for production via ic.disable():

from icecream import ic

def compute(x, y):
    result = x * y + 1
    ic(result)  # compute(6, 7) prints: ic| result: 43
    return result

Debugging in Production and Containers

The rules change fundamentally when the bug lives in production. You typically cannot attach an interactive debugger, cannot easily reproduce the environment, and must not degrade performance for real users while investigating. The failures that are hardest to diagnose in production stem from insufficient observability built in during development — not from lacking the right tool at the moment of crisis.

Structured Logging as a First-Class Debugging Interface

The single highest-leverage thing you can do before a bug appears in production is to structure your logs. Plain-text log messages are difficult to filter, aggregate, or query at scale. Structured logging emits JSON (or another parseable format) that log aggregation platforms like Datadog, Splunk, or the ELK stack can index and query:

import logging
import json

class StructuredFormatter(logging.Formatter):
    def format(self, record):
        log_data = {
            "level": record.levelname,
            "message": record.getMessage(),
            "module": record.module,
            "line": record.lineno,
        }
        if record.exc_info:
            log_data["exception"] = self.formatException(record.exc_info)
        return json.dumps(log_data)

handler = logging.StreamHandler()
handler.setFormatter(StructuredFormatter())
logger = logging.getLogger("myapp")
logger.addHandler(handler)
logger.setLevel(logging.DEBUG)

With structured logs in place, hunting for a specific bug becomes a query rather than a grep. You can filter by module, join on request_id, and see every log line from a single failing request in chronological order. This is the production equivalent of a step-through debugger session. For real-world systems, add a correlation ID (such as a request ID or trace ID) as a field in every log message — it transforms your logs from a firehose into a thread you can follow.
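A minimal sketch of the correlation-ID pattern, assuming a contextvar-based setup; the names request_id_var and CorrelationFilter are illustrative, not a standard API:

```python
import contextvars
import logging

# Holds the current request's ID for whichever task or thread is running.
request_id_var = contextvars.ContextVar("request_id", default="-")

class CorrelationFilter(logging.Filter):
    """Annotate every record with the current request ID; never drop records."""
    def filter(self, record):
        record.request_id = request_id_var.get()
        return True

logger = logging.getLogger("myapp")
logger.addFilter(CorrelationFilter())

# At the start of each request handler, bind the incoming request's ID:
request_id_var.set("req-42")
# Every record logged from this context now carries record.request_id,
# which a structured formatter can emit as a queryable JSON field.
```

With the request ID emitted as a field, filtering your log platform by that one value yields every line from a single failing request, in order.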

Debugging Inside Docker Containers

Containers add an isolation layer that breaks the naive assumption that you can simply attach pdb to a process. A few patterns that work well in containerized environments:

Remote PDB over a socket. The remote-pdb package wraps PDB to listen on a TCP port rather than the terminal. Set it up in your container, expose the port in your docker-compose.yml, and connect from your host with netcat or telnet. Because PEP 553 makes PYTHONBREAKPOINT configurable, you can switch to remote PDB without modifying source code: PYTHONBREAKPOINT=remote_pdb.set_trace.
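The dispatch mechanism is easy to verify with a stand-in hook; here builtins.print plays the role that remote_pdb.set_trace would play inside a container (a sketch of the PEP 553 behavior, not of remote-pdb itself):

```python
import contextlib
import io
import os

# The default sys.breakpointhook re-reads PYTHONBREAKPOINT on every call,
# imports the named callable, and invokes it with breakpoint()'s arguments.
os.environ["PYTHONBREAKPOINT"] = "builtins.print"
buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    breakpoint("reached checkpoint")  # dispatched to print(), not pdb
print(buf.getvalue().strip())  # -> reached checkpoint

# Setting the variable to "0" turns every breakpoint() into a no-op,
# which is how you ship breakpoint() calls safely disabled.
os.environ["PYTHONBREAKPOINT"] = "0"
breakpoint()
```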

PEP 768 remote exec across container boundaries. In Python 3.14, sys.remote_exec() requires the caller to share the same process namespace as the target. Inside a Docker container, you can docker exec -it <container> python3.14 -m pdb -p <pid> to attach directly, as long as the container was started with --cap-add=SYS_PTRACE or the equivalent security context in Kubernetes (securityContext.capabilities.add: ["SYS_PTRACE"] in the pod spec).

Core dumps as a last resort. If a Python process is crashing with a segmentation fault (usually a C extension issue, not pure Python), you can configure the container to write core dumps and analyze them with gdb and the libpython GDB extension, which exposes Python frames within a C-level core dump. Combine this with faulthandler.enable() to get a Python-level traceback printed to stderr before the core dump is written.
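The faulthandler half of that pattern is two lines of standard library code (the module has shipped with Python since 3.3):

```python
import faulthandler

# Installs C-level handlers for SIGSEGV, SIGFPE, SIGABRT, SIGBUS and SIGILL.
# On a fatal signal, each thread's Python traceback is written to stderr
# immediately, before the OS produces the C-level core dump.
faulthandler.enable()
assert faulthandler.is_enabled()

# The same effect without touching code: run the process with
# PYTHONFAULTHANDLER=1 or `python -X faulthandler`.
```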

Debugging Memory Leaks

Memory leaks in Python are often not "leaks" in the C sense — the garbage collector handles reference cycles — but they manifest as unbounded growth in long-running processes because objects are being accumulated in some container (a cache, a list, a dictionary) that is never pruned. The standard approach is to profile memory at two points in time and diff the result:

import tracemalloc

tracemalloc.start()

# ... code that may be accumulating memory ...

snapshot1 = tracemalloc.take_snapshot()

# ... run the suspicious operation N more times ...

snapshot2 = tracemalloc.take_snapshot()
top_stats = snapshot2.compare_to(snapshot1, 'lineno')

for stat in top_stats[:10]:
    print(stat)

tracemalloc, introduced in Python 3.4, is part of the standard library and adds no overhead until started. In production, you can conditionally enable it via an environment variable or a signal handler: send SIGUSR1 to the process to start tracing, SIGUSR2 to dump a snapshot. Combined with PEP 768's sys.remote_exec(), you can trigger a tracemalloc snapshot in a live process without modifying or restarting it. The objgraph third-party library offers a complementary view: it can draw reference graphs showing what is holding a set of objects alive, which is particularly useful for finding the unexpected strong reference (a module-level cache, a closure, a global list) that keeps "leaked" objects reachable. Note that since PEP 442 (Python 3.4), reference cycles involving objects with custom __del__ methods are collectable, so genuinely uncollectable garbage is rare in modern Python.
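A sketch of the signal-driven pattern just described, for Unix systems; the snapshot path and frame depth are illustrative choices:

```python
import os
import signal
import tracemalloc

def start_tracing(signum, frame):
    # SIGUSR1: begin tracing; 25 frames per allocation gives useful diffs.
    if not tracemalloc.is_tracing():
        tracemalloc.start(25)

def dump_snapshot(signum, frame):
    # SIGUSR2: write a snapshot to disk for offline comparison.
    if tracemalloc.is_tracing():
        tracemalloc.take_snapshot().dump(f"/tmp/snapshot-{os.getpid()}.trace")

signal.signal(signal.SIGUSR1, start_tracing)
signal.signal(signal.SIGUSR2, dump_snapshot)
```

From a shell, kill -USR1 <pid> starts tracing and kill -USR2 <pid> writes a file you can reload later with tracemalloc.Snapshot.load() and diff with compare_to(), all without restarting the process.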

A Debugging Mindset

Effective debuggers form hypotheses, test them systematically, and narrow the problem space with each test. They resist the temptation to change code randomly and hope for the best. Guido van Rossum has characterized debugging as a process of discovering where reality diverges from the developer's mental model — a gap that can be a single misplaced character or a fundamental algorithmic misunderstanding.

There is a cognitive pattern behind chronic debugging difficulty that is worth naming: assumption debt. Every time you wrote a line of code with an assumption you didn't verify or assert — that a value would always be a non-empty list, that a dictionary key would always exist, that a function would always return a string — you accumulated assumption debt. Bugs don't find the code that's wrong; they find the gap between what you assumed and what actually happened. The debugging mindset is, in part, the discipline of surfacing and paying down that debt systematically.

Here is a practical debugging workflow that works for the majority of problems:

  1. Reproduce the bug reliably. If you cannot trigger it on demand, you cannot verify that your fix works. If reproducibility is difficult, narrow the conditions: what inputs, what timing, what state?
  2. Read the full error message and traceback. The answer is often right there. PEP 657's column markers (Python 3.11+) often point directly at the failing subexpression.
  3. Identify your assumptions. Before forming a hypothesis, list everything you assumed about the values and state at the failure point. One of those assumptions is wrong.
  4. Form a hypothesis. "I think the bug is that user_data is None when it reaches line 42."
  5. Test the hypothesis. Add a breakpoint(), a print(), or an assert at the relevant location.
  6. Use binary search for large codebases. If you have no hypothesis, bisect the code path: add an assertion at the midpoint between where you know the state is correct and where you know it is wrong. This halves your search space with each test.
  7. Fix the root cause, not the symptom. If user_data is None, find why it is None.
  8. Write a test that fails before the fix and passes after. This prevents the bug from returning.
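Steps 4 through 6 in miniature: a hypothetical three-stage pipeline (parse, normalize, and dedupe are invented names) where the state is known good after parse() and wrong after dedupe(), so the assertion goes at the midpoint:

```python
def parse(raw):
    # Known good: verified that every record gets an "id" here.
    return [{"id": i, "value": v} for i, v in enumerate(raw)]

def normalize(records):
    return [{**r, "value": r["value"].strip()} for r in records]

def dedupe(records):
    seen, out = set(), []
    for r in records:
        if r["value"] not in seen:
            seen.add(r["value"])
            out.append(r)
    return out

records = parse(["a ", "b", " a"])
records = normalize(records)
# Midpoint assertion: halves the search space between the known-good state
# after parse() and the symptom observed after dedupe().
assert all("id" in r for r in records), "invariant broken before dedupe"
records = dedupe(records)
assert all("id" in r for r in records), "invariant broken inside dedupe"
```

Whichever assertion fires first tells you which half of the path to bisect next.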
Kernighan and Pike describe a technique of explaining code to someone — even a non-programmer — as a way to surface bugs. One computer center reportedly placed a teddy bear near its help desk, requiring students to explain their problem to the bear before talking to staff. — Adapted from Brian Kernighan & Rob Pike, The Practice of Programming (Addison-Wesley, 1999)

This technique, widely known as "rubber duck debugging," works because the act of formulating a precise explanation of the code's behavior forces you to close the gap between your mental model and what the code actually does. No tools required. A plausible cognitive explanation: verbalizing pushes you into sequential, explicit processing rather than the parallel, pattern-matching mode your brain defaults to when skimming code silently.

When Not to Reach for the Debugger

There is a version of debugging that gets omitted in guides: knowing when the interactive debugger is not the right tool. A step-through session is expensive — it requires reproducibility, a specific environment, and sustained attention. For bugs that are intermittent, environment-specific, or that only appear under load, the better investment is improving your observability infrastructure: add structured logging at decision points, add assertions at invariant boundaries, and make your application tell you what it is doing continuously, rather than waiting until it fails to find out.

Often the better tool is a test. Writing the test first forces you to state the expected behavior precisely, and that precision frequently reveals the bug before you ever open the debugger. If you already have a failing test, you do not need to step through the entire program; you need to find the first assertion that contradicts your assumption. That is a search problem, and hypothesis-driven elimination is faster than interactively stepping through 500 lines.
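What writing the test first looks like concretely; the function and its specification are invented for illustration:

```python
def normalize_username(name):
    # The behavior under test: trim surrounding whitespace, lowercase the rest.
    return name.strip().lower()

def test_strips_and_lowercases():
    assert normalize_username("  Ada ") == "ada"

def test_preserves_inner_spaces():
    assert normalize_username("Ada Lovelace") == "ada lovelace"

# Run the specs directly; with pytest these would be collected automatically.
test_strips_and_lowercases()
test_preserves_inner_spaces()
```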

Conclusion

Python's debugging ecosystem has evolved substantially through deliberate language-level improvements. PEP 553 (Barry Warsaw) gave us the clean breakpoint() built-in. PEP 626 (Mark Shannon) ensured line numbers are always accurate. PEP 657 (Pablo Galindo Salgado, Batuhan Taskaya, Ammar Askar) added column-level precision to tracebacks. PEP 3134 (Ka-Ping Yee) made exception chaining a first-class part of the language. PEP 768 (Pablo Galindo Salgado, Matt Wozniski, Ivona Stojanovic), shipped in Python 3.14 (October 2025), introduced zero-overhead remote debugging for live processes. Python 3.14 also added asyncio ps and pstree for external async task inspection and capture_call_graph() for programmatic async introspection. Each of these addressed a real pain point that Python developers encounter daily.

The arc of these improvements points in a consistent direction: the interpreter is getting better at telling you what went wrong, where, and why — without you having to stop the world to find out. But no tool replaces the discipline of thinking clearly about your code. Write readable functions. Use meaningful variable names. Add assertions that document your assumptions. Handle errors explicitly. Structure your logs before you need them. Use faulthandler so crashes produce useful output. And when a bug does appear — because they always do — approach it as a search problem with a knowable answer, not a mystery to endure. The bugs are in there. Python gives you excellent tools for the hunt, and now, for the first time, it can tell you what is happening while the program is still running.
