Code review is one of the highest-leverage activities in software development. In Python projects, a well-structured review process catches bugs before they reach production, enforces consistency across the codebase, and helps every developer on the team write better code over time. This guide walks through how to build a Python code review process that actually works -- covering what to look for manually, which tools to automate, and how to structure the whole thing so it scales with your team.
Whether you are reviewing pull requests on a team or auditing your own side project before pushing to production, code review is where quality gets baked in. Python's flexibility -- dynamic typing, duck typing, multiple ways to solve the same problem -- makes reviews especially important. Without them, a Python codebase can drift toward inconsistency fast. The good news is that the Python ecosystem has matured significantly, and the current generation of tools makes it possible to automate large portions of the review process while reserving human attention for the things that actually require judgment.
Why Python Code Review Matters
A code review is not just a gate that PRs have to pass through. When done well, it serves several purposes at once. First, it catches defects. A second set of eyes will spot logic errors, off-by-one mistakes, and unhandled edge cases that the original author missed. Second, it enforces consistency. When everyone on the team follows the same conventions for naming, structure, and error handling, the codebase becomes significantly easier to navigate and maintain. Third, it transfers knowledge. Reviewing code written by a colleague is one of the fastest ways to learn new patterns, libraries, and techniques.
Without a shared set of review criteria, though, the process tends to devolve into subjective arguments about style preferences. The reviewer blocks the PR because they prefer single quotes over double quotes. The author pushes back. Neither side is wrong, but the team just burned 30 minutes on something a formatter could have handled in milliseconds. A structured review process separates the things that tools should enforce from the things that require human judgment, and that distinction is what makes the whole system sustainable.
If your team does not yet have a shared style guide, PEP 8 is the standard starting point. It defines conventions for indentation (4 spaces), line length (79 characters, though many teams extend this to 88 or 120), naming conventions, and import ordering. Adopting PEP 8 as a baseline removes an entire class of subjective debates from your review process.
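As an illustrative sketch (with hypothetical names), here are a few of the core PEP 8 conventions in practice:

```python
# A few PEP 8 conventions in one sketch (illustrative names)

MAX_RETRIES = 3  # constants: UPPER_SNAKE_CASE


class ConnectionPool:  # classes: CapWords
    def acquire_connection(self):  # methods and functions: snake_case
        ...


def retry_count(attempts: int) -> int:  # 4-space indentation throughout
    return min(attempts, MAX_RETRIES)
```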
What to Look for in a Manual Review
Automated tools handle formatting, unused imports, and basic error patterns extremely well. But there is a whole category of issues that only a human reviewer can catch. These are the things to focus your attention on during a manual review.
Correctness and Logic
Does the code actually do what it claims to do? Read through the logic path and check for edge cases. What happens when the input list is empty? What happens when the API returns a 500? What if the user passes None where a string is expected? These are the questions that a linter cannot answer for you.
# Before review: handles the happy path only
def get_average(numbers):
    return sum(numbers) / len(numbers)

# After review: handles the edge case
def get_average(numbers: list[float]) -> float | None:
    if not numbers:
        return None
    return sum(numbers) / len(numbers)
Pythonic Idioms
Python has a strong culture of idiomatic code. Reviewers should look for opportunities to replace verbose patterns with cleaner Pythonic alternatives. This includes using list comprehensions instead of manual for loops that build lists, using with statements for resource management, leveraging enumerate() instead of manual index tracking, and using dict.get() with a default instead of checking membership first.
# Non-Pythonic
result = []
for item in data:
    if item.is_valid():
        result.append(item.name)

# Pythonic
result = [item.name for item in data if item.is_valid()]
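The other idioms mentioned above follow the same pattern. A short sketch with hypothetical data:

```python
# Hypothetical data for illustration
colors = ["red", "green", "blue"]
config = {"timeout": 30}

# enumerate() instead of manual index tracking
for i, color in enumerate(colors):
    print(f"{i}: {color}")

# dict.get() with a default instead of a membership check first
timeout = config.get("timeout", 60)

# with statement for resource management: the file is closed
# automatically, even if an exception is raised inside the block
with open("notes.txt", "w") as f:
    f.write("reviewed\n")
```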
Error Handling
Watch for bare except clauses that swallow all exceptions silently. Every try/except block should catch specific exception types and handle them meaningfully. Logging the error, re-raising it, or returning a clear error state are all valid approaches. Silently passing on a broad Exception is almost never the right call.
# Dangerous: hides bugs
try:
    process_data(payload)
except:
    pass

# Better: catch specific exceptions, handle them explicitly
try:
    process_data(payload)
except ValidationError as e:
    logger.warning("Invalid payload: %s", e)
    raise
except ConnectionError as e:
    logger.error("Service unreachable: %s", e)
    return None
Naming and Readability
Variable and function names should describe their purpose clearly enough that a reader does not need to trace through the implementation to understand what something does. Single-letter variables are fine in tight loops or mathematical code, but d as a name for a database connection or x for a user record is a readability problem. Functions should do one thing and their name should reflect what that one thing is.
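A hypothetical before/after rename shows the difference this makes for a reviewer:

```python
# Hard to review: the names force the reader to trace the implementation
def proc(d, x):
    return [u for u in d if u["id"] != x]


# Easier to review: the names state the intent (hypothetical example)
def remove_user(users: list[dict], user_id: int) -> list[dict]:
    return [user for user in users if user["id"] != user_id]
```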
Duplication
Check whether the PR introduces logic that already exists elsewhere in the codebase. Duplicated code is a maintenance burden because changes need to be replicated across every copy. If you spot duplication during review, suggest extracting the shared logic into a utility function or module.
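As a minimal sketch of that suggestion, with hypothetical functions:

```python
# Before: two functions duplicate the same normalization steps
def clean_email(email: str) -> str:
    return email.strip().lower()


def clean_username(username: str) -> str:
    return username.strip().lower()


# After: the shared steps live in one utility that both can call
def normalize(text: str) -> str:
    """Strip surrounding whitespace and lowercase."""
    return text.strip().lower()
```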
Keep your review comments constructive and specific. Instead of "this is wrong," try "this will raise a ZeroDivisionError when the list is empty -- consider adding a guard clause." Specific feedback is easier to act on and creates a more productive review culture.
Automate with Linting, Formatting, and Type Checking
The single biggest improvement you can make to your review process is automating the things that do not require human judgment. Formatting, import ordering, unused variable detection, and basic error patterns should all be handled by tools before a reviewer ever looks at the code.
Ruff: The Modern All-in-One Tool
Ruff has rapidly become the standard linter and formatter for Python projects. Written in Rust, it replaces what used to require a combination of Flake8, Black, isort, pyupgrade, and several other tools. It runs in milliseconds even on large codebases, which makes it practical to use as a pre-commit hook without slowing down development.
Ruff supports over 800 lint rules and maintains compatibility with Flake8 rule sets. It can also auto-fix many issues, including removing unused imports, upgrading old syntax patterns, and reordering imports. Its formatter produces output that is nearly identical to Black, making migration straightforward for teams already using Black.
# Install Ruff
pip install ruff
# Run the linter on your project
ruff check .
# Auto-fix what it can
ruff check --fix .
# Format all files (replaces Black)
ruff format .
A typical pyproject.toml configuration for Ruff looks like this:
[tool.ruff]
line-length = 88
target-version = "py312"

[tool.ruff.lint]
select = [
    "E",   # pycodestyle errors
    "W",   # pycodestyle warnings
    "F",   # pyflakes
    "I",   # isort
    "B",   # flake8-bugbear
    "C4",  # flake8-comprehensions
    "UP",  # pyupgrade
    "SIM", # flake8-simplify
]

[tool.ruff.format]
quote-style = "double"
indent-style = "space"
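As a hypothetical sketch of what those rule sets catch, the following code is runnable but triggers several of them:

```python
import os  # F401 (pyflakes): imported but unused


def describe(items):
    # C404 (flake8-comprehensions): list comprehension passed to dict();
    # a dict comprehension would be cleaner
    counts = dict([(item, len(item)) for item in items])
    # SIM108 (flake8-simplify): if/else assignment that could be a ternary
    if counts:
        label = "non-empty"
    else:
        label = "empty"
    return counts, label
```

Running `ruff check --fix` on code like this removes the unused import automatically and flags the rest for the author to clean up.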
Type Checking with mypy
Python's dynamic typing is powerful but it creates a class of bugs that only appear at runtime. Type hints combined with a static type checker like mypy catch these errors before the code ever runs. During review, look for functions that accept or return ambiguous types without annotations. Adding type hints is not just about catching bugs -- it also serves as documentation, making function signatures self-describing.
# Without type hints: what does this return?
def fetch_user(user_id):
    ...

# With type hints: intent is clear
def fetch_user(user_id: int) -> User | None:
    ...
# Install and run mypy
pip install mypy
mypy your_project/
Astral, the company behind Ruff, has released ty -- a Rust-based type checker designed as a faster alternative to mypy and Pyright. It is still in early stages, but worth watching if you are already using Ruff and want to keep your entire toolchain in the Astral ecosystem.
Security-Focused Review with Bandit
Code review should include a security lens, and Bandit is the go-to static analysis tool for finding common security issues in Python code. Maintained by the Python Code Quality Authority (PyCQA), Bandit parses Python files into Abstract Syntax Trees and runs security-focused checks against the AST nodes. It ships with dozens of built-in checks covering injection vulnerabilities, weak cryptographic practices, hardcoded credentials, insecure use of temporary files, and more.
# Install Bandit
pip install bandit
# Scan your project
bandit -r your_project/
# Output as JSON for CI integration
bandit -r your_project/ -f json -o bandit_results.json
Here are examples of the kinds of issues Bandit flags:
import subprocess
# Bandit flags this: shell=True with user input is a
# command injection risk (B602)
subprocess.call(user_input, shell=True)
# Safer alternative
subprocess.call(["ls", "-la"], shell=False)
# Bandit flags this: use of insecure MD5 hash (B303)
import hashlib
hashlib.md5(password.encode())
# Safer alternative
hashlib.sha256(password.encode())
Bandit catches common patterns through AST analysis, but it does not perform data flow or taint analysis. Complex vulnerabilities that span multiple function calls or modules may require additional tools like Semgrep or a commercial SAST platform. Use Bandit as your first line of defense, not your only one.
Building a Review Checklist Your Team Can Use
A review checklist turns ad-hoc feedback into a repeatable process. The key is organizing items by priority so reviewers know which issues are blockers and which are suggestions. Here is a practical checklist organized into three tiers.
Critical (Must Fix Before Merge)
- Security vulnerabilities: No hardcoded secrets, no shell injection, no use of deprecated cryptographic functions. Bandit should pass cleanly at high severity.
- Correctness: The code does what the PR description says it does. Edge cases are handled. Error paths do not fail silently.
- Test coverage: New functionality has corresponding unit tests. Test coverage should not decrease with the PR. The unittest or pytest suite passes.
- No broken imports or runtime errors: The linter passes. Type checking passes with no new errors.
High Priority (Should Fix)
- PEP 8 compliance: Code follows the project's style guide. Ruff should pass with no errors.
- Type annotations: Public functions have type hints. Return types are specified. Complex data structures use proper typing.
- Documentation: Public APIs have docstrings. Complex logic has inline comments explaining the "why," not the "what."
- No code duplication: Shared logic is extracted into reusable functions or modules.
Medium Priority (Recommended Improvements)
- Pythonic idioms: List comprehensions, context managers, and generator expressions are used where appropriate.
- Performance: No obvious N+1 queries, unnecessary loops over large datasets, or blocking calls in async code.
- Naming clarity: Variables and functions have descriptive names. Abbreviations are avoided unless they are well-established conventions.
- Commit hygiene: Commits are focused and descriptive. No large unfocused commits that mix unrelated changes.
Putting It All Together with Pre-Commit Hooks
The pre-commit framework lets you run your entire automated toolchain before code is even committed. This means formatting, linting, type checking, and security scanning all happen automatically, and only the issues that require human judgment make it to the PR review stage.
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.9.10
    hooks:
      - id: ruff
        args: [--fix]
      - id: ruff-format
  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.15.0
    hooks:
      - id: mypy
        additional_dependencies: [types-requests]
  - repo: https://github.com/PyCQA/bandit
    rev: 1.9.3
    hooks:
      - id: bandit
        args: ["-c", "pyproject.toml"]
        additional_dependencies: ["bandit[toml]"]
# Install pre-commit and set up hooks
pip install pre-commit
pre-commit install
# Now every git commit will automatically run
# Ruff (lint + format), mypy, and Bandit
With this setup, the automated tools handle formatting disputes, catch unused imports, flag type errors, and identify security anti-patterns -- all before the code reaches a human reviewer. The reviewer can then focus entirely on logic, architecture, edge cases, and whether the approach makes sense for the project.
Add these same checks to your CI/CD pipeline as well. Pre-commit hooks are great for catching issues locally, but they can be bypassed with --no-verify. Running the same checks in CI ensures that nothing slips through even if a developer skips the local hooks.
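A CI job mirroring the pre-commit hooks can be a short workflow file. This is a hypothetical GitHub Actions sketch; the filename, Python version, and project path are assumptions to adjust for your repository:

```yaml
# .github/workflows/quality.yml -- hypothetical sketch
name: quality
on: [push, pull_request]
jobs:
  checks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install ruff mypy bandit
      - run: ruff check .
      - run: ruff format --check .
      - run: mypy your_project/
      - run: bandit -r your_project/
```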
Key Takeaways
- Separate human work from machine work: Automate formatting, linting, type checking, and security scanning so reviewers can focus on logic, architecture, and edge cases.
- Use Ruff as your all-in-one linter and formatter: It replaces Flake8, Black, isort, and several other tools in a single Rust-based binary that runs in milliseconds.
- Add type checking with mypy: Type hints catch an entire class of runtime bugs at analysis time and make function signatures self-documenting.
- Scan for security issues with Bandit: Its built-in checks cover injection flaws, weak cryptography, hardcoded secrets, and other common Python security anti-patterns.
- Build a prioritized review checklist: Organize review criteria into critical blockers, high-priority items, and recommended improvements so the team has shared, objective pass/fail criteria.
- Enforce everything with pre-commit hooks and CI: Automate the toolchain so that every commit and every pull request is checked consistently, without relying on reviewers to catch mechanical issues.
A solid Python code review process is not about slowing development down. It is about catching problems early, keeping the codebase consistent, and making sure every developer on the team is learning from each other's work. Start with the tools, build the checklist, and iterate from there. The return on investment compounds with every PR.