Every new Python developer eventually reaches the same turning point. They spend hours searching for a package to handle some common task — parsing dates, compressing files, sending an email — only to discover that Python already ships with exactly what they need, no installation required.
This is not a coincidence. Python was designed with a "batteries included" philosophy, meaning the standard library covers an enormous range of everyday programming needs right out of the box. You do not need to pip install your way through basic system operations, file management, pattern matching, or internet access. The tools are already there.
This guide walks through the most useful modules in Python's standard library, with real application context so you can see how these tools actually get used in production code — not just in contrived examples.
The os and shutil Modules: Working with Your Operating System
The os module is one of the first you will reach for when your Python scripts need to interact with the underlying operating system. Whether you are building a file processing pipeline, automating a deployment script, or writing a backup tool, os gives you a portable interface to the OS that works across Windows, macOS, and Linux.
import os
# Find out where you are
print(os.getcwd())
# Navigate directories
os.chdir('/var/log')
# Run a shell command
os.system('ls -lah')
A practical use case: imagine you are writing a script that processes log files dropped into a specific directory each night. You need to check whether the directory exists, create it if not, list its contents, and move processed files to an archive folder. The os module handles all of this.
import os
import shutil
log_dir = '/var/app/logs/incoming'
archive_dir = '/var/app/logs/archive'
# Create directories if they do not exist
os.makedirs(archive_dir, exist_ok=True)
# List files
for filename in os.listdir(log_dir):
    full_path = os.path.join(log_dir, filename)
    if os.path.isfile(full_path):
        # Process... then archive
        shutil.move(full_path, os.path.join(archive_dir, filename))
Notice the use of shutil alongside os. While os handles low-level system interaction, shutil (short for shell utilities) provides higher-level file operations. Copying a file, moving it, or deleting an entire directory tree are all cleaner through shutil than through raw os calls.
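To make that concrete, here is a short sketch of the higher-level calls: shutil.copytree duplicates an entire directory tree in one call, and shutil.rmtree deletes one. The temporary paths below are invented for illustration.

```python
import os
import shutil
import tempfile

# Build a throwaway directory tree to work with (paths are illustrative).
src = tempfile.mkdtemp()
with open(os.path.join(src, 'app.log'), 'w') as f:
    f.write('example\n')

# copytree copies the whole tree in one call; the raw os equivalents
# would require explicit recursion over every subdirectory.
dest = src + '_backup'
shutil.copytree(src, dest)
print(os.listdir(dest))  # ['app.log']

# rmtree removes a directory and everything inside it.
shutil.rmtree(src)
shutil.rmtree(dest)
```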
Always use import os rather than from os import *. The os module contains a function called open() that behaves very differently from Python's built-in open(). Wildcard imports can shadow built-ins in ways that produce confusing bugs.
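The difference is easy to demonstrate: the built-in open() returns a file object with methods like .read(), while os.open() returns a bare integer file descriptor for low-level I/O.

```python
import os
import tempfile

# Create a temporary file to inspect (mkstemp returns an fd and a path).
fd_tmp, path = tempfile.mkstemp()
os.close(fd_tmp)

# Built-in open(): a file object with high-level methods.
f = open(path)
print(hasattr(f, 'read'))  # True
f.close()

# os.open(): a raw integer file descriptor.
fd = os.open(path, os.O_RDONLY)
print(isinstance(fd, int))  # True
os.close(fd)
os.remove(path)
```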
The glob Module: Finding Files with Wildcards
When os.listdir() gives you too much (everything in the directory) and you want to filter by pattern, glob is the answer.
import glob
# Find all Python files in the current directory
python_files = glob.glob('*.py')
# Find all log files recursively
all_logs = glob.glob('/var/log/**/*.log', recursive=True)
A realistic scenario: you are building a data ingestion script for a reporting pipeline. Every day, your system receives CSV files named with a date stamp like sales_2025_03_04.csv. You want to grab only the files from a specific month.
import glob
march_files = glob.glob('/data/sales/sales_2025_03_*.csv')
for filepath in sorted(march_files):
    print(f"Processing: {filepath}")
    # hand off to your processing function
This is far cleaner than manually filtering os.listdir() results with string comparisons. The glob module supports Unix-style wildcards including * (match anything), ? (match one character), and ** (recursive directory matching when recursive=True).
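The ? wildcard is worth a quick demonstration, since it behaves differently from *. In this sketch (the filenames are invented), ? matches exactly one character, so a two-digit suffix is excluded where * would have matched everything.

```python
import glob
import os
import tempfile

# Create some sample files in a scratch directory.
tmp = tempfile.mkdtemp()
for name in ['run1.log', 'run2.log', 'run10.log']:
    open(os.path.join(tmp, name), 'w').close()

# ? matches a single character, so run10.log is left out;
# run*.log would have matched all three.
matches = sorted(glob.glob(os.path.join(tmp, 'run?.log')))
print([os.path.basename(p) for p in matches])  # ['run1.log', 'run2.log']
```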
The sys and argparse Modules: Building Real Command-Line Tools
Scripts that other people use need proper command-line interfaces. The sys module gives you the raw argv list of arguments passed to your script, while argparse gives you a complete, production-quality argument parser with help text, type validation, and optional vs. required arguments.
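For a quick one-off, reading sys.argv directly can be enough. A minimal sketch (the script name greet.py and its argument handling are hypothetical; a real script would call main(sys.argv) under an if __name__ == '__main__' guard):

```python
import sys

def main(argv):
    # argv[0] is the script path; real arguments start at argv[1].
    if len(argv) < 2:
        print('usage: greet.py NAME', file=sys.stderr)
        return 1  # non-zero exit code: failure
    print(f'Hello, {argv[1]}!')
    return 0      # zero: success

# Simulated invocations:
print(main(['greet.py', 'Alice']))  # greets Alice, then prints 0
print(main(['greet.py']))           # usage message on stderr, then prints 1
```

This works, but you get no help text, no type checking, and no validation for free, which is exactly the gap argparse fills.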
Here is a script that uses argparse to build a simple log analysis tool:
import argparse
import sys
parser = argparse.ArgumentParser(
    prog='logcheck',
    description='Scan log files for error patterns'
)
parser.add_argument('logfile', help='Path to the log file')
parser.add_argument('--level', choices=['ERROR', 'WARN', 'INFO'], default='ERROR')
parser.add_argument('--limit', type=int, default=50, help='Max lines to return')
args = parser.parse_args()
try:
    with open(args.logfile) as f:
        matches = [line for line in f if args.level in line]
    for line in matches[:args.limit]:
        print(line.rstrip())
except FileNotFoundError:
    sys.stderr.write(f"Error: File not found: {args.logfile}\n")
    sys.exit(1)
Notice the use of sys.stderr for error output. When a script's stdout is being piped into another command, error messages written to stdout get mixed into the data stream. Writing errors to stderr keeps them separate, visible in the terminal, and out of your data pipeline. sys.exit(1) signals to the calling shell that the script failed — exit code 0 means success; any non-zero value means failure. This matters when your script is called from a cron job, CI pipeline, or shell script that checks exit codes.
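You can observe exit codes from inside Python, too. This sketch uses the standard-library subprocess module to launch a child process that deliberately fails, then reads its return code the same way a shell or cron job would:

```python
import subprocess
import sys

# Run a child process that exits with code 1; sys.executable is the
# current Python interpreter, so this is portable across platforms.
result = subprocess.run(
    [sys.executable, '-c', 'import sys; sys.exit(1)'],
    capture_output=True,
)
print(result.returncode)  # 1 -> the child reported failure
```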
The re Module: Pattern Matching That Actually Scales
String methods like .replace() and .split() cover a lot of ground, but there are patterns they simply cannot handle. Email validation, extracting all URLs from a document, parsing structured log lines with variable formatting — these require regular expressions.
Python's re module provides full regular expression support:
import re
log_line = '2025-03-04 14:22:31 ERROR [auth] Failed login attempt from 192.168.1.45'
# Extract the IP address
ip_match = re.search(r'\b\d{1,3}(\.\d{1,3}){3}\b', log_line)
if ip_match:
    print(ip_match.group())  # 192.168.1.45
# Extract all email addresses from a document
text = "Contact support@example.com or billing@example.org for help."
emails = re.findall(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}', text)
print(emails) # ['support@example.com', 'billing@example.org']
A practical use for re.sub(): sanitizing user input before storing it. Say you want to strip HTML tags from a field:
import re
raw_input = "<p>Hello <b>world</b>, welcome to <em>Python</em>!</p>"
clean = re.sub(r'<[^>]+>', '', raw_input)
print(clean) # Hello world, welcome to Python!
When string methods are sufficient, use them — they are faster and easier to read. But when the pattern is complex or variable, regular expressions earn their place.
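When the same pattern runs many times, say inside a loop over millions of log lines, compiling it once with re.compile avoids re-parsing the pattern on every call. The log lines below are invented for illustration.

```python
import re

# Compile once, reuse many times.
ip_pattern = re.compile(r'\b\d{1,3}(?:\.\d{1,3}){3}\b')

lines = [
    '2025-03-04 14:22:31 ERROR Failed login from 192.168.1.45',
    '2025-03-04 14:22:35 INFO Health check OK',
    '2025-03-04 14:23:02 ERROR Failed login from 10.0.0.7',
]

# Collect the IP from each line that contains one.
ips = [m.group() for m in map(ip_pattern.search, lines) if m]
print(ips)  # ['192.168.1.45', '10.0.0.7']
```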
The math, random, and statistics Modules: Numbers Without Third-Party Dependencies
For a lot of numeric work, you do not need NumPy or SciPy. The math, random, and statistics modules together handle a substantial range of everyday calculations.
The math module wraps the underlying C library for floating-point operations:
import math
# Calculate the hypotenuse of a right triangle
a, b = 3, 4
hypotenuse = math.sqrt(a**2 + b**2) # 5.0
# More cleanly, use math.hypot directly
hypotenuse = math.hypot(a, b) # 5.0
# Logarithms, trig, constants
print(math.log(1024, 2)) # 10.0
print(math.cos(math.pi)) # -1.0
print(math.floor(3.7)) # 3
The random module is useful for simulations, shuffling, sampling, and testing. A common application is generating test data:
import random
user_ids = list(range(1, 10001))
test_sample = random.sample(user_ids, 100) # 100 unique IDs, no repeats
# Randomly select a fallback server from a pool
servers = ['server-a', 'server-b', 'server-c']
selected = random.choice(servers)
The statistics module covers descriptive statistics without any third-party dependencies:
import statistics
response_times = [120, 135, 98, 210, 145, 102, 189, 134, 156, 201]
print(f"Mean: {statistics.mean(response_times):.1f} ms")
print(f"Median: {statistics.median(response_times)} ms")
print(f"Stdev: {statistics.stdev(response_times):.1f} ms")
This is useful for quick performance analysis, monitoring scripts, or any situation where you need basic analytics without pulling in pandas.
The urllib and smtplib Modules: Reaching Out Over the Network
Web requests in production Python usually go through the requests library because of its cleaner interface. But urllib.request is available without any installation, which matters when you are writing scripts for environments where you cannot add dependencies, or when you are automating tasks on a fresh server.
from urllib.request import urlopen
import json
# Fetch data from a public API
with urlopen('https://api.github.com/repos/python/cpython') as response:
    data = json.loads(response.read().decode())
print(f"CPython has {data['stargazers_count']:,} stars on GitHub")
print(f"Open issues: {data['open_issues_count']:,}")
For sending email from Python without a third-party library, smtplib works well for notification systems, alert scripts, and automated reporting:
import smtplib
from email.mime.text import MIMEText
def send_alert(to_address, subject, body, smtp_host='localhost'):
    msg = MIMEText(body)
    msg['Subject'] = subject
    msg['From'] = 'alerts@yourdomain.com'
    msg['To'] = to_address
    with smtplib.SMTP(smtp_host) as server:
        server.sendmail(msg['From'], [to_address], msg.as_string())
send_alert(
    'devops@yourcompany.com',
    'Disk Usage Warning',
    'Disk usage on /var has exceeded 85%.'
)
The smtplib module is low-level. For TLS connections or authentication against services like Gmail's SMTP server, you would add server.starttls() and server.login() calls. The email package handles building complex messages with attachments, HTML content, and proper MIME encoding.
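A sketch of what that looks like in practice follows. The host, port, and credentials are placeholders, and nothing is sent just by defining the function; treat this as the shape of a TLS-authenticated sender, not a drop-in implementation.

```python
import smtplib
from email.mime.text import MIMEText

def send_alert_tls(to_address, subject, body,
                   smtp_host='smtp.example.com',   # hypothetical server
                   smtp_port=587,
                   username='alerts@example.com',  # hypothetical account
                   password='app-specific-password'):
    msg = MIMEText(body)
    msg['Subject'] = subject
    msg['From'] = username
    msg['To'] = to_address
    with smtplib.SMTP(smtp_host, smtp_port) as server:
        server.starttls()                 # upgrade to an encrypted channel
        server.login(username, password)  # authenticate over TLS
        server.sendmail(username, [to_address], msg.as_string())
```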
The datetime Module: Dates Without the Headache
Date handling is one of those areas where many developers reach for third-party libraries (like arrow or pendulum) before discovering how capable the built-in datetime module actually is.
from datetime import date, datetime, timedelta
# Get today's date
today = date.today()
# Calculate a deadline 30 days from now
deadline = today + timedelta(days=30)
print(deadline.strftime("%B %d, %Y")) # e.g., April 03, 2025
# Parse a date string
event_date = datetime.strptime('2025-06-15 09:30:00', '%Y-%m-%d %H:%M:%S')
print(f"Event is on a {event_date.strftime('%A')}") # e.g., Sunday
A practical use case: calculating how many business-relevant days remain until a deadline, or generating date ranges for database queries.
from datetime import date, timedelta
def date_range(start, end):
    """Generate all dates between start and end (inclusive)."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)
start = date(2025, 3, 1)
end = date(2025, 3, 31)
march_dates = list(date_range(start, end))
print(f"March has {len(march_dates)} days")
The datetime module also supports timezone-aware objects through datetime.timezone (and, since Python 3.9, the zoneinfo module for named IANA time zones), which is critical for any application where users or systems span multiple time zones.
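A brief sketch of timezone-aware objects: an aware datetime carries its UTC offset with it, and fixed-offset zones can be built directly from a timedelta.

```python
from datetime import datetime, timedelta, timezone

# An aware "now" in UTC; the tzinfo travels with the object.
now_utc = datetime.now(timezone.utc)
print(now_utc.tzinfo)  # UTC

# A fixed-offset zone built from a timedelta (UTC+5:30 here).
ist = timezone(timedelta(hours=5, minutes=30))
meeting = datetime(2025, 6, 15, 9, 30, tzinfo=ist)

# Converting between zones is a single call.
print(meeting.astimezone(timezone.utc))  # 2025-06-15 04:00:00+00:00
```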
The zlib and zipfile Modules: Handling Compressed Data
Compression is not just for saving disk space — it matters for transfer speed, log rotation, and archiving. Python handles multiple compression formats natively.
import zipfile
import os
# Create a zip archive of all log files
with zipfile.ZipFile('logs_march.zip', 'w', zipfile.ZIP_DEFLATED) as zf:
    for filename in os.listdir('/var/log/app'):
        if filename.endswith('.log'):
            zf.write(os.path.join('/var/log/app', filename), filename)
# Extract a specific file from an archive
with zipfile.ZipFile('backup.zip', 'r') as zf:
    zf.extract('config.json', path='/tmp/restore')
For lower-level compression (useful when you need to compress data in memory, not just files), zlib is the right tool:
import zlib
original = b'This is some repetitive repetitive repetitive data.'
compressed = zlib.compress(original)
ratio = len(compressed) / len(original)
print(f"Compressed to {ratio:.0%} of original size")
# Round-trip verification
assert zlib.decompress(compressed) == original
The timeit Module: Knowing Which Code Is Actually Faster
When you have two ways to write something and want to know which one performs better, do not guess. Use timeit.
from timeit import timeit
# Compare two ways to build a string
join_time = timeit(
    stmt="result = ', '.join(str(i) for i in range(100))",
    number=10000
)
concat_time = timeit(
    stmt="""
result = ''
for i in range(100):
    result += str(i) + ', '
""",
    number=10000
)
print(f"join: {join_time:.4f}s")
print(f"concat: {concat_time:.4f}s")
In practice, str.join() is substantially faster than concatenation in a loop for string building. timeit removes the guesswork. For profiling entire programs rather than small snippets, the cProfile and pstats modules (cProfile is the fast C implementation; a pure-Python profile module also exists) give you a detailed view of where time is actually being spent — useful when optimizing a slow script and you need to identify the real bottleneck rather than speculating.
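A minimal profiling sketch: wrap the code of interest in a cProfile.Profile, then format the results with pstats. The slow_sum function is a deliberately wasteful stand-in for your real workload.

```python
import cProfile
import io
import pstats

def slow_sum(n):
    # Deliberately wasteful: builds a full list just to sum it.
    return sum([i * i for i in range(n)])

profiler = cProfile.Profile()
profiler.enable()
slow_sum(100_000)
profiler.disable()

# Direct the report into a string so it can be inspected or logged.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats('cumulative').print_stats(5)
print('slow_sum' in stream.getvalue())  # True
```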
The doctest and unittest Modules: Testing as You Go
Two different philosophies for testing live in the standard library, and both are worth knowing.
doctest is the lighter-weight option. It finds Python interactive session examples in your docstrings and verifies that they still produce the right output:
def celsius_to_fahrenheit(c):
    """Convert Celsius to Fahrenheit.

    >>> celsius_to_fahrenheit(0)
    32.0
    >>> celsius_to_fahrenheit(100)
    212.0
    >>> celsius_to_fahrenheit(-40)
    -40.0
    """
    return (c * 9 / 5) + 32

if __name__ == '__main__':
    import doctest
    doctest.testmod()
When the module is run directly, doctest checks every example. If the output matches, it passes silently. If something breaks, you get a clear diff showing what was expected versus what actually happened. This approach keeps documentation and tests in sync because they are the same thing.
unittest is the heavier framework, modeled after Java's JUnit. It is better suited for larger test suites where you want setup/teardown, test discovery, and the ability to run subsets of tests:
import unittest
def parse_price(price_string):
    """Parse a price string like '$12.99' into a float."""
    return float(price_string.replace('$', '').replace(',', ''))

class TestParsePrice(unittest.TestCase):
    def test_basic_price(self):
        self.assertEqual(parse_price('$12.99'), 12.99)

    def test_large_price(self):
        self.assertEqual(parse_price('$1,299.00'), 1299.00)

    def test_invalid_input(self):
        with self.assertRaises(ValueError):
            parse_price('not a price')

if __name__ == '__main__':
    unittest.main()
The sqlite3, json, and csv Modules: Data In, Data Out
Python's "batteries included" approach shines in data handling. Three modules cover the most common data interchange needs.
sqlite3 gives you a full relational database with no server setup required:
import sqlite3
conn = sqlite3.connect('inventory.db')
cursor = conn.cursor()
cursor.execute('''
    CREATE TABLE IF NOT EXISTS products (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        quantity INTEGER,
        price REAL
    )
''')
cursor.execute('INSERT INTO products (name, quantity, price) VALUES (?, ?, ?)',
               ('Widget A', 500, 9.99))
conn.commit()
for row in cursor.execute('SELECT * FROM products WHERE quantity > 0'):
    print(row)
conn.close()
json handles the ubiquitous data format used by virtually every API:
import json
# Parse JSON from an API response
api_response = '{"status": "ok", "count": 42, "items": ["a", "b", "c"]}'
data = json.loads(api_response)
print(data['count']) # 42
# Write structured data to a file
config = {'debug': False, 'max_retries': 3, 'timeout': 30}
with open('config.json', 'w') as f:
    json.dump(config, f, indent=2)
csv handles spreadsheet-style data, which still powers a significant portion of business data exchange:
import csv
with open('employees.csv', newline='') as f:
    reader = csv.DictReader(f)
    for row in reader:
        print(f"{row['name']} - Department: {row['department']}")
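Writing CSV is just as straightforward with csv.DictWriter. A minimal sketch; the output filename employees_out.csv and the rows are invented for illustration.

```python
import csv

rows = [
    {'name': 'Ada', 'department': 'Engineering'},
    {'name': 'Grace', 'department': 'Research'},
]

with open('employees_out.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=['name', 'department'])
    writer.writeheader()    # emits the header row: name,department
    writer.writerows(rows)

# Read it back to confirm the round trip.
with open('employees_out.csv', newline='') as f:
    print([row['name'] for row in csv.DictReader(f)])  # ['Ada', 'Grace']
```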
Why This Matters
The Python standard library represents decades of accumulated problem-solving. When you reach for it first, you avoid dependency sprawl, reduce the attack surface of your application, and write code that runs on any Python installation without a setup step.
The modules covered here are not the full extent of what ships with Python — far from it. The standard library also includes threading and multiprocessing tools, networking primitives, HTML parsing, email handling, cryptographic hashing, logging infrastructure, and much more. But these eleven areas represent the modules you will reach for most often, the ones where having a solid mental model pays dividends daily.
The next time you find yourself about to pip install something for a common task, check the standard library first. There is a good chance the tool you need is already there.