If you have spent any time writing Python, you have almost certainly run into a situation where you wrote the same chunk of code twice. Maybe you calculated a total in one part of your script, then needed the exact same logic somewhere else, so you copied it. Then you found a bug, fixed it in one place, forgot to fix it in the other, and spent twenty minutes wondering why your output was wrong.
That is precisely the problem functions solve. They are one of the most foundational concepts in all of programming, and Python makes them genuinely easy to work with. This article covers what functions are, how to define them, why they matter far beyond just "avoiding copy-paste," and how they show up in real-world Python code across industries.
What Is a Function, Really?
A function is a named, reusable block of code that performs a specific task. You define it once, and then you can call it as many times as you need, from anywhere in your program, without rewriting the logic.
Python uses the def keyword to introduce a function definition. The basic syntax looks like this:
def function_name(parameters):
    """Optional docstring explaining what the function does."""
    # code goes here
    return value  # optional
The def keyword signals to the Python interpreter that what follows is a function definition, not just a sequence of statements. The function name follows the same naming rules as variables. Per PEP 8 — Python's official style guide, authored by Guido van Rossum, Barry Warsaw, and Alyssa Coghlan — function names should be lowercase with words separated by underscores (snake_case), and descriptive enough that someone reading your code can understand what the function does without reading every line inside it.
"Function names should be lowercase, with words separated by underscores as necessary to improve readability." — PEP 8, Function and Variable Names
The Fibonacci Example: Breaking It Down
The classic introductory example in Python's own documentation uses a Fibonacci series generator, and it is worth examining carefully because it demonstrates several important ideas at once.
def fib(n):    # write Fibonacci series less than n
    """Print a Fibonacci series less than n."""
    a, b = 0, 1
    while a < n:
        print(a, end=' ')
        a, b = b, a+b
    print()
When you call fib(2000), Python prints:
0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597
Let's walk through exactly what is happening here.
The function signature is def fib(n):. The name is fib, and it accepts one parameter, n, which represents the upper boundary. Any number in the Fibonacci sequence that reaches or exceeds n will not be printed.
The docstring is the triple-quoted string immediately inside the function: """Print a Fibonacci series less than n.""". This is not just a comment — Python stores it as a formal attribute of the function object. You can retrieve it by calling help(fib) or accessing fib.__doc__. Docstring conventions are governed by PEP 257, which states that every public function, method, and module should have one. Every function you write for a real project should have one.
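As a small sketch of what "stored as an attribute" means in practice, the snippet below redefines fib and reads its docstring back two ways. The use of inspect.getdoc is an addition for illustration, not something the tutorial example itself does.

```python
import inspect


def fib(n):
    """Print a Fibonacci series less than n."""
    a, b = 0, 1
    while a < n:
        print(a, end=' ')
        a, b = b, a + b
    print()


# The docstring lives on the function object as the __doc__ attribute.
print(fib.__doc__)          # Print a Fibonacci series less than n.

# inspect.getdoc() returns the same text with indentation normalized,
# which matters more for multi-line docstrings.
print(inspect.getdoc(fib))  # Print a Fibonacci series less than n.
```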
The initialization a, b = 0, 1 uses Python's tuple unpacking to assign both variables in a single line. a starts at 0 (the first Fibonacci number) and b starts at 1 (the second).
The while loop runs as long as a is less than n. On each pass, it prints the current value of a using end=' ' to keep all the numbers on one line separated by spaces rather than printing each on its own line.
The update step a, b = b, a+b is elegant and worth pausing on. On the right side, Python evaluates both b and a+b before any assignment happens. So the new value of a becomes the old b, and the new value of b becomes the old sum. This is Python's simultaneous assignment at work, and it eliminates the need for a temporary variable.
The final print() with no arguments simply outputs a newline at the end, so the terminal cursor moves to a new line after all the numbers.
Simultaneous assignment like a, b = b, a+b is one of Python's most useful features. It evaluates the entire right side before making any assignments, so you never need a temporary variable to swap or update values in tandem.
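To see why the evaluation order matters, it can help to compare one update step written three ways. The variable names other than a and b are made up for this sketch.

```python
# Tuple assignment: the right-hand side (b, a + b) is built first,
# then unpacked into a and b.
a, b = 0, 1
a, b = b, a + b
tuple_step = (a, b)          # (1, 1)

# The same update written with an explicit temporary variable.
a, b = 0, 1
old_a = a
a = b
b = old_a + b
temp_step = (a, b)           # (1, 1)

# Naive sequential update: a is overwritten before b is computed,
# so b picks up the new a instead of the old one.
a, b = 0, 1
a = b
b = a + b
naive_step = (a, b)          # (1, 2), which skips a term of the series

# The same mechanism swaps two values without a temporary.
x, y = "left", "right"
x, y = y, x
print(x, y)                  # right left
```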
Why def Matters: The Mechanics Behind the Keyword
When Python encounters a def statement, it does not execute the code inside the function. Instead, it creates a function object and assigns it to the name you provided. This is worth understanding because it means functions in Python are first-class objects — they can be stored in variables, passed as arguments to other functions, and returned from functions.
"Functions in Python are first-class objects. Programming language researchers define a 'first-class object' as a program entity that can be created at runtime, assigned to a variable or element in a data structure, passed as an argument to a function, and returned as the result of a function." — Luciano Ramalho, Fluent Python, 2nd Edition, Chapter 7
fib(2000) # calls the function
print(fib) # prints something like: <function fib at 0x...>
my_func = fib # assigns the function object to a new name
my_func(100) # calls the same function through the new name
This behavior is what makes Python so powerful for patterns like callbacks, decorators, and higher-order functions. Even if you are just getting started, knowing that def creates an object — not just a chunk of code — gives you a more accurate mental model of how Python works.
It is also worth noting that the Python tutorial itself confirms this directly: when you type fib at the interpreter without parentheses, Python shows you the function object, not an error. Assigning it to another name does not copy the function — both names point to the same object in memory.
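Here is a minimal sketch of what first-class treatment allows. The functions shout, whisper, and deliver are hypothetical names invented for this illustration.

```python
def shout(text):
    """Return the text in upper case with an exclamation mark."""
    return text.upper() + "!"


def whisper(text):
    """Return the text in lower case with an ellipsis."""
    return text.lower() + "..."


def deliver(formatter, text):
    """Call whichever function object was passed in as formatter."""
    return formatter(text)


# Assigning a function to another name does not copy it.
alias = shout
print(alias is shout)                      # True

# Functions can be stored in data structures and passed as arguments.
styles = {"loud": shout, "quiet": whisper}
results = [deliver(styles[name], "Hello") for name in ("loud", "quiet")]
print(results)                             # ['HELLO!', 'hello...']
```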
Parameters vs. Arguments: A Distinction Worth Making
The terms "parameter" and "argument" are often used interchangeably in casual conversation, but they mean different things.
A parameter is the variable listed in the function definition. In def fib(n):, the parameter is n.
An argument is the actual value you pass when calling the function. In fib(2000), the argument is 2000.
Python supports several kinds of parameters:
Positional parameters are the most common. They are matched by position: the first argument goes to the first parameter, the second to the second, and so on.
def greet(name, greeting):
    print(f"{greeting}, {name}!")

greet("Alice", "Hello")  # Hello, Alice!
Default parameters have a fallback value that is used when no argument is supplied.
def greet(name, greeting="Hello"):
    print(f"{greeting}, {name}!")

greet("Alice")         # Hello, Alice!
greet("Alice", "Hey")  # Hey, Alice!
Keyword arguments let you specify which parameter you are targeting by name, regardless of order.
greet(greeting="Good morning", name="Alice")
Understanding these distinctions becomes critical once your functions grow beyond a single parameter.
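The sketch below shows how these rules play out when a call goes wrong. It assumes a variant of greet that returns its string instead of printing it, purely so the results can be inspected.

```python
def greet(name, greeting="Hello"):
    """Return a greeting string (returned rather than printed, for inspection)."""
    return f"{greeting}, {name}!"


# Positional and keyword arguments can be mixed; positional ones come first.
mixed = greet("Alice", greeting="Hey")
print(mixed)                      # Hey, Alice!

# Leaving out a parameter that has no default raises TypeError.
try:
    greet()
except TypeError as exc:
    missing_error = str(exc)

# Supplying the same parameter both positionally and by keyword also fails.
try:
    greet("Alice", name="Bob")
except TypeError as exc:
    duplicate_error = str(exc)

print(missing_error)
print(duplicate_error)
```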
*args and **kwargs: Flexible Signatures
Python provides two special parameter forms for writing functions that accept a variable number of inputs. These show up constantly in real-world code, and not knowing them means you will misread a significant portion of Python libraries.
*args collects any number of positional arguments into a tuple. The asterisk is the actual syntax; args is just a naming convention.
def total(*args):
    """Return the sum of any number of arguments."""
    return sum(args)

print(total(1, 2, 3))         # 6
print(total(10, 20, 30, 40))  # 100
print(total())                # 0
**kwargs collects any number of keyword arguments into a dictionary. The double asterisk is the syntax; kwargs is the convention.
def describe_host(**kwargs):
    """Print key-value pairs about a network host."""
    for key, value in kwargs.items():
        print(f"  {key}: {value}")

describe_host(ip="192.168.1.10", os="Linux", open_ports="22,80,443")
When combining parameter types, the required order is: positional parameters, then *args, then keyword-only parameters, then **kwargs. Getting this order wrong raises a SyntaxError.
def log_event(severity, *messages, source="unknown", **metadata):
    """Log messages with optional source and arbitrary metadata."""
    print(f"[{severity.upper()}] from {source}")
    for msg in messages:
        print(f"  - {msg}")
    for k, v in metadata.items():
        print(f"  {k} = {v}")

log_event("warning", "Disk at 90%", "CPU spike detected",
          source="server-01", datacenter="us-east")
You will see *args and **kwargs in the signatures of many standard library functions and popular frameworks. Recognizing them immediately tells you the function is designed to be flexible. When you see def __init__(self, *args, **kwargs) in library code, it often means the class forwards those arguments to a parent class via super().__init__(*args, **kwargs).
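A minimal sketch of that forwarding pattern might look like the following. Connection and LoggedConnection are hypothetical classes invented for this example, not part of any library.

```python
class Connection:
    """Hypothetical base class with an ordinary, explicit signature."""

    def __init__(self, host, port=22, timeout=10):
        self.host = host
        self.port = port
        self.timeout = timeout


class LoggedConnection(Connection):
    """Subclass that adds one keyword-only option and forwards the rest."""

    def __init__(self, *args, label="unnamed", **kwargs):
        # Everything this class does not consume is passed to the parent.
        super().__init__(*args, **kwargs)
        self.label = label


conn = LoggedConnection("192.168.1.10", 2222, timeout=30, label="bastion")
print(conn.host, conn.port, conn.timeout, conn.label)
# 192.168.1.10 2222 30 bastion
```

The subclass never has to repeat the parent's parameter list, so a change to Connection's signature does not force a matching change here.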
The Return Statement and Its Absence
The fib function above does not use a return statement. It prints directly. That works fine for a demonstration, but in production code you almost always want your functions to return values rather than print them. Here is why.
A function that prints its output is only useful for displaying something. A function that returns its output can be used in calculations, passed to other functions, stored in a variable, or written to a file.
Here is a version of fib that returns a list instead of printing:
def fib_list(n):
    """Return a list containing the Fibonacci series less than n."""
    result = []
    a, b = 0, 1
    while a < n:
        result.append(a)
        a, b = b, a+b
    return result

series = fib_list(2000)
print(series)       # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597]
print(sum(series))  # can now do math on the result
print(len(series))  # can check how many numbers were generated
The return statement exits the function immediately and sends the specified value back to the caller. If a function has no return statement, or has one with no value, Python automatically returns None.
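Both behaviors are easy to check directly. The two functions below are hypothetical examples written for this sketch.

```python
def print_total(values):
    """Print the sum; there is no return statement."""
    print(sum(values))


def first_negative(values):
    """Return the first negative number, or None if there is none."""
    for v in values:
        if v < 0:
            return v      # exits the function immediately
    # Falling off the end of the function returns None implicitly.


result = print_total([1, 2, 3])    # prints 6
print(result)                      # None

print(first_negative([3, -1, -7])) # -1
print(first_negative([3, 4]))      # None
```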
Scope: What Happens Inside Stays Inside
Variables defined inside a function are local to that function. They do not exist outside of it, and they do not interfere with variables of the same name elsewhere in your program.
def calculate_area(radius):
    pi = 3.14159265
    area = pi * radius ** 2
    return area

print(calculate_area(5))  # 78.53981625
print(pi)                 # NameError: name 'pi' is not defined
The variable pi inside calculate_area is completely isolated. This is called local scope, and it is one of the most important safety features of functions. It means you can write a function without worrying that it will accidentally overwrite some other variable in your program.
Python's scope rules follow the LEGB model — a term widely used by Python educators and documented in depth by sources like Real Python. LEGB stands for the four scope levels Python searches, in order:
- Local: names defined inside the current function
- Enclosing: names in the local scope of any enclosing (outer) functions, relevant only for nested functions
- Global: names defined at the top level of the current module
- Built-in: names built into Python itself, such as len, print, and range
"The LEGB rule defines the order in which Python looks for names. When you reference a given name, Python looks for that name sequentially in the local, enclosing, global, and built-in scope levels if they all exist." — Real Python, "Python Scope and the LEGB Rule"
The key practical takeaway for function authors is that local variables inside functions are safe from interference with the rest of your program. If Python cannot find a name in local scope, it works outward through the LEGB hierarchy — and if the name is not found anywhere, it raises a NameError.
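The lookup order can be traced with a small experiment. All names in this sketch (label, outer, inner, and so on) are made up for illustration.

```python
label = "global"                 # Global: defined at module level


def outer():
    label = "enclosing"          # Enclosing: local to outer, visible to nested functions

    def inner():
        label = "local"          # Local: found first, so the search stops here
        return label

    def inner_without_local():
        return label             # no local name, so the enclosing one is used

    return inner(), inner_without_local()


def uses_global():
    return label                 # no local or enclosing name, so the global is used


def uses_builtin():
    return len("abc")            # len is found only in the built-in scope


def missing():
    return undefined_name        # found in no scope: raises NameError


print(outer())          # ('local', 'enclosing')
print(uses_global())    # global
print(uses_builtin())   # 3
```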
The Mutable Default Argument Trap
This is one of the most common beginner mistakes in Python, and it is not covered in many introductory tutorials. Default parameter values are evaluated once at function definition time, not each time the function is called. When the default value is a mutable object like a list or dictionary, this leads to behavior that surprises nearly every programmer the first time they see it.
# DANGER: This function is broken in a non-obvious way
def add_item(item, collection=[]):
    """Add item to a collection and return it."""
    collection.append(item)
    return collection

print(add_item("apple"))   # ['apple'] (looks fine)
print(add_item("banana"))  # ['apple', 'banana'] (wait, what?)
print(add_item("cherry"))  # ['apple', 'banana', 'cherry'] (the list persists)
The list [] was created once, when Python executed the def statement and built the function object. Every call that does not supply a collection argument modifies that same list object. The fix is a standard Python idiom: use None as the default, then create the mutable object inside the function body.
# CORRECT: Use None as the sentinel
def add_item(item, collection=None):
    """Add item to a collection and return it."""
    if collection is None:
        collection = []
    collection.append(item)
    return collection

print(add_item("apple"))   # ['apple']
print(add_item("banana"))  # ['banana'] (fresh list each time)
The official Python tutorial carries an explicit warning about this behavior in its discussion of default argument values, and linters such as Pylint flag mutable defaults (dangerous-default-value). The None-as-sentinel pattern is the standard remedy. If you see a function with a list or dict as a default, treat it as a code smell until proven otherwise.
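If you want to see where the shared list actually lives, the function object exposes its stored defaults through the __defaults__ attribute. The sketch below inspects it before and after two calls; add_item_fixed is a name chosen here only to keep both versions side by side.

```python
def add_item(item, collection=[]):
    collection.append(item)
    return collection


# The default list already exists on the function object before any call.
defaults_before = list(add_item.__defaults__[0])
print(defaults_before)                     # []

first = add_item("apple")
second = add_item("banana")

# Both calls returned the very same object: the stored default.
shared = first is second is add_item.__defaults__[0]
print(shared)                              # True
print(add_item.__defaults__[0])            # ['apple', 'banana']


def add_item_fixed(item, collection=None):
    if collection is None:
        collection = []                    # a new list on every call
    collection.append(item)
    return collection


independent = add_item_fixed("apple") is not add_item_fixed("banana")
print(independent)                         # True
```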
Real-World Applications of Python Functions
Theory is only useful if it connects to real use cases. Here are several concrete examples of how functions are used in actual Python projects across different domains.
Data Cleaning in Analytics Pipelines
Anyone working with data knows that raw data is rarely clean. A common pattern is to define a function that standardizes or sanitizes a value, then apply it across an entire dataset.
def clean_phone_number(raw):
    """Strip non-numeric characters and return a 10-digit string.

    Returns None if the input does not contain a valid 10-digit number.
    """
    digits = ''.join(filter(str.isdigit, raw))
    if len(digits) == 11 and digits[0] == '1':
        digits = digits[1:]  # remove country code
    return digits if len(digits) == 10 else None

# Applied to a list
phone_numbers = ["(555) 867-5309", "1-800-555-0199", "invalid"]
cleaned = [clean_phone_number(p) for p in phone_numbers]
print(cleaned)  # ['5558675309', '8005550199', None]
This function is defined once and applied everywhere. If the cleaning logic needs to change, you update one function and the change propagates throughout your entire codebase.
Security and Cybersecurity Tools
In cybersecurity work, functions are used extensively to encapsulate validation logic, hash comparisons, and input sanitization. Here is a simplified demonstration of the pattern — wrapping hashing logic inside named functions so any developer on the team calls verify_password() rather than reimplementing the logic from scratch:
import hashlib

def hash_password(password, salt=""):
    """Return a SHA-256 hash of the password combined with a salt.

    NOTE: For demonstration only. See security warning below.
    """
    combined = password + salt
    return hashlib.sha256(combined.encode()).hexdigest()

def verify_password(input_password, stored_hash, salt=""):
    """Check whether an input password matches a stored hash."""
    return hash_password(input_password, salt) == stored_hash
The example above demonstrates the function-wrapping concept cleanly, but plain SHA-256 string concatenation is not safe for real password storage. The official Python hashlib documentation states explicitly that general-purpose hash functions like SHA-256 are not suitable for hashing passwords because they are too fast — an attacker with a GPU can compute hundreds of millions of SHA-256 hashes per second and crack common passwords quickly even with a static salt.
For production password storage, use a purpose-built key derivation function: hashlib.pbkdf2_hmac or hashlib.scrypt (both in Python's standard library), or a third-party library that implements bcrypt or Argon2. The OWASP Password Storage Cheat Sheet recommends Argon2id as the first choice as of 2024. A secure pattern looks like this:
import hashlib
import hmac
import os

def hash_password_secure(password: str) -> tuple[bytes, bytes]:
    """Hash a password using PBKDF2-HMAC-SHA256 with a random salt.

    Returns (salt, key); both must be stored together.
    """
    salt = os.urandom(32)  # cryptographically random, 32 bytes
    key = hashlib.pbkdf2_hmac(
        'sha256',
        password.encode('utf-8'),
        salt,
        600_000,  # OWASP 2024 recommendation for SHA-256
    )
    return salt, key

def verify_password_secure(password: str, salt: bytes, stored_key: bytes) -> bool:
    """Verify a password against its stored salt and key."""
    key = hashlib.pbkdf2_hmac('sha256', password.encode('utf-8'), salt, 600_000)
    # Constant-time comparison avoids leaking information through timing.
    return hmac.compare_digest(key, stored_key)
Sources: Python hashlib docs, OWASP Password Storage Cheat Sheet.
Wrapping this logic in named functions — whether the simplified version or the production-grade version — means any developer on the team calls a clearly named function rather than reimplementing hashing logic from scratch, reducing both errors and audit complexity.
Automation Scripts
Functions are at the heart of automation. Consider a script that processes files in a directory:
import os

def get_file_extension(filename):
    """Return the lowercase file extension without the dot."""
    _, ext = os.path.splitext(filename)
    return ext.lower().lstrip('.')

def categorize_file(filename):
    """Return a category string based on file extension."""
    ext = get_file_extension(filename)
    categories = {
        'pdf': 'document',
        'docx': 'document',
        'txt': 'document',
        'jpg': 'image',
        'png': 'image',
        'gif': 'image',
        'mp4': 'video',
        'mov': 'video',
    }
    return categories.get(ext, 'other')
Notice that categorize_file calls get_file_extension. Functions can and should call other functions. This is called composition, and it is how large programs are built from small, testable, readable pieces.
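Composition can go one level further. The sketch below restates the two functions in compact form (with a shortened category table) so that it runs on its own, then adds a hypothetical third function, count_by_category, that builds on them.

```python
import os
from collections import Counter

# Shortened category table, for illustration only.
CATEGORIES = {'pdf': 'document', 'txt': 'document',
              'jpg': 'image', 'png': 'image',
              'mp4': 'video', 'mov': 'video'}


def get_file_extension(filename):
    """Return the lowercase file extension without the dot."""
    _, ext = os.path.splitext(filename)
    return ext.lower().lstrip('.')


def categorize_file(filename):
    """Return a category string based on file extension."""
    return CATEGORIES.get(get_file_extension(filename), 'other')


def count_by_category(filenames):
    """Tally filenames per category by calling categorize_file on each."""
    return Counter(categorize_file(name) for name in filenames)


counts = count_by_category(
    ["report.PDF", "notes.txt", "photo.jpg", "clip.mov", "archive.tar.gz"]
)
print(counts["document"], counts["image"], counts["video"], counts["other"])
# 2 1 1 1
```

Each layer can be tested by itself: the extension logic without any categories, the categories without any file lists.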
API and Web Development
In web frameworks like Flask or FastAPI, route handlers are functions. Every endpoint in your web application is defined using a function.
from flask import Flask, jsonify

app = Flask(__name__)

def calculate_bmi(weight_kg, height_m):
    """Calculate Body Mass Index given weight in kg and height in meters."""
    if height_m <= 0:
        raise ValueError("Height must be greater than zero.")
    return round(weight_kg / (height_m ** 2), 2)

@app.route('/bmi/<float:weight>/<float:height>')
def bmi_endpoint(weight, height):
    try:
        bmi = calculate_bmi(weight, height)
        return jsonify({"bmi": bmi})
    except ValueError as e:
        return jsonify({"error": str(e)}), 400
The business logic (the actual BMI math) lives in calculate_bmi, separate from the web layer. This separation makes both parts easier to test and maintain.
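One payoff of that separation is that the math can be checked without starting a web server or importing Flask at all. The following is a rough sketch using plain assert statements; the two test function names are invented here, and a real project would more likely use a test runner such as pytest.

```python
def calculate_bmi(weight_kg, height_m):
    """Calculate Body Mass Index given weight in kg and height in meters."""
    if height_m <= 0:
        raise ValueError("Height must be greater than zero.")
    return round(weight_kg / (height_m ** 2), 2)


def test_typical_values():
    # 70 / 1.75**2 = 22.857..., rounded to two places
    assert calculate_bmi(70, 1.75) == 22.86
    assert calculate_bmi(80, 2.0) == 20.0


def test_zero_height_is_rejected():
    try:
        calculate_bmi(70, 0)
    except ValueError:
        return
    raise AssertionError("expected ValueError for zero height")


test_typical_values()
test_zero_height_is_rejected()
print("all checks passed")
```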
Machine Learning Preprocessing
In data science and machine learning, functions handle feature engineering — transforming raw data into a form the model can use.
def normalize(values):
    """Normalize a list of numbers to the range [0, 1]."""
    min_val = min(values)
    max_val = max(values)
    if max_val == min_val:
        return [0.0] * len(values)
    return [(v - min_val) / (max_val - min_val) for v in values]

raw_scores = [45, 78, 92, 33, 67, 100, 55]
normalized = normalize(raw_scores)
print([round(v, 3) for v in normalized])
# [0.179, 0.672, 0.881, 0.0, 0.507, 1.0, 0.328]
Best Practices for Writing Functions
Writing a function is easy. Writing a good function takes a bit more intentionality. The practices below are drawn from PEP 8, PEP 257, and widely-cited sources in the Python community, including Robert C. Martin's "Clean Code" principles as they apply to Python.
One function, one job. A function should do one thing and do it well. If you find yourself using the word "and" to describe what your function does ("it validates the input and saves it to the database and sends an email"), it should probably be three functions. This is the Single Responsibility Principle applied at the function level.
Name it clearly. process_data() tells you almost nothing. remove_duplicate_entries() tells you exactly what to expect. PEP 8 itself asks only for lowercase names with underscores; a widely followed convention on top of that is a verb-noun combination that describes the action and the subject. Names like calculate_tax(), validate_email(), and fetch_user_records() are self-documenting.
Keep it short. There is no universal line-count rule, but a common heuristic is that a function should fit on a single screen; once it grows past roughly 20 to 30 lines, it is worth asking whether it can be decomposed. Short functions are easier to test, easier to read, and easier to reuse.
Write the docstring first. PEP 257 states that all public modules, functions, classes, and methods should have docstrings. Writing the docstring before the function body forces you to think clearly about the function's purpose, parameters, and return value before you start coding — a habit that catches design problems early.
Never use mutable objects as default arguments. As covered in the section above, this is one of Python's most reliable ways to introduce subtle bugs. Always use None as a sentinel and create the mutable object inside the function body.
Avoid side effects when possible. A function that modifies a global variable or directly changes a mutable object passed to it is harder to reason about. Pure functions — those that take inputs and return outputs without touching anything outside their local scope — are the easiest to test and debug. When side effects are necessary, document them clearly in the docstring.
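The difference between the two styles shows up as soon as the caller looks at its own data afterward. Both functions below are hypothetical examples written for this comparison.

```python
def add_tax_in_place(prices, rate):
    """Side effect: modifies the caller's list. Returns None."""
    for i, price in enumerate(prices):
        prices[i] = round(price * (1 + rate), 2)


def with_tax(prices, rate):
    """Pure: builds and returns a new list, leaving the input untouched."""
    return [round(price * (1 + rate), 2) for price in prices]


original = [10.0, 20.0]
taxed = with_tax(original, 0.25)
print(original)    # [10.0, 20.0]
print(taxed)       # [12.5, 25.0]

mutated = [10.0, 20.0]
add_tax_in_place(mutated, 0.25)
print(mutated)     # [12.5, 25.0]
```

Calling with_tax twice on the same input gives the same answer both times; calling add_tax_in_place twice applies the tax twice.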
"It's good practice to include docstrings in code that you write, so try to make a habit of it." — Python Official Tutorial, Section 4.9: Defining Functions
A quick test: if you can describe what your function does in one short sentence without using the word "and," it probably has the right scope. If you need "and," split it.
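As an illustration of that split, here is a rough sketch of a "validate and save and notify" task broken into pieces. Every name is hypothetical, the email check is deliberately simplistic, and the database is just an in-memory list standing in for real storage.

```python
def validate_email(address):
    """Return True if there is exactly one '@' with text on both sides."""
    local, _, domain = address.partition("@")
    return bool(local) and bool(domain) and "@" not in domain


def save_user(address, database):
    """Append the address to the (in-memory, hypothetical) database; return its index."""
    database.append(address)
    return len(database) - 1


def build_welcome_message(address):
    """Return the text of a welcome message for the new user."""
    return f"Welcome, {address}!"


def register_user(address, database):
    """Coordinate the three single-purpose steps above."""
    if not validate_email(address):
        raise ValueError(f"Invalid email address: {address!r}")
    user_id = save_user(address, database)
    return user_id, build_welcome_message(address)


users = []
print(register_user("alice@example.com", users))
# (0, 'Welcome, alice@example.com!')
```

Each helper can be described in one sentence without "and"; register_user exists only to sequence them.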
Functions Are the Building Blocks of Everything
Every significant Python program, whether it is a web application, a machine learning model, a network scanner, or an automation script, is built from functions. They are the primary unit of organization in Python code. Understanding how to define them, name them, scope them, compose them, and avoid their gotchas is not just a beginner skill — it is the foundation that everything else builds on.
The Fibonacci example in the Python docs is deceptively simple. It fits in seven lines. But the concepts it demonstrates (def, parameters, local variables, loops, simultaneous assignment, and the relationship between a function and its caller) are the same concepts you will use every day no matter how advanced your code becomes.
The mutable default trap, the *args/**kwargs signature forms, and the LEGB scope model are not advanced topics — they are things you will encounter within your first few real projects. Learning them now, with the correct mental model, means you will recognize the patterns immediately when you see them rather than spending hours debugging surprising behavior.
Start writing functions early, write them often, and make a habit of asking yourself: "If I needed to do this again somewhere else in my code, would I have to copy and paste?" If the answer is yes, it is time to wrap it in a function.
Sources and Further Reading
- Python Official Tutorial, Section 4.9: Defining Functions (the canonical reference for def, docstrings, and the Fibonacci example)
- PEP 8: Style Guide for Python Code (naming conventions for functions, variables, and modules)
- PEP 257: Docstring Conventions (formal specification for Python docstrings)
- Real Python: Python Scope and the LEGB Rule (comprehensive coverage of scope resolution)
- Python hashlib documentation (official guidance on hashing and the explicit warning against SHA-256 for password storage)
- OWASP Password Storage Cheat Sheet (production recommendations for password hashing algorithms)
- Luciano Ramalho, Fluent Python, 2nd Edition, Chapter 7: Functions as First-Class Objects (O'Reilly Media, 2022)