You type "black holes" into Wikipedia, click a link to "general relativity," then to "Albert Einstein," then to "Zurich" — and suddenly it is 40 minutes later. This tutorial teaches you to recreate that experience in Python, building a program that automatically follows Wikipedia articles one link at a time using the public Wikipedia REST API.
By the end of this tutorial you will have a working Python script that starts on any Wikipedia topic you choose, prints the article title and a short summary, picks the first linked article, and repeats that process for as many hops as you set. Along the way you will learn how functions work, how loops repeat code, how Python reads data from the internet, and what JSON actually is — all in the context of a project you can run right now.
What You Will Build
The finished program is roughly 40 lines of Python. When you run it you will see output that looks like this:
Hop 1: Black hole
A black hole is a region of spacetime where gravity is so strong
that nothing — not even light — can escape...
Hop 2: General relativity
General relativity is Einstein's geometric theory of gravitation...
Hop 3: Albert Einstein
Albert Einstein was a German-born theoretical physicist who developed
the theory of relativity...
Hop 4: Zürich
Zürich is the largest city in Switzerland and the capital of the
canton of Zürich...
Each hop is one call to the Wikipedia API. The program reads the response, prints the summary, then extracts the title of the first linked article to use as the next starting point. That is the entire loop — simple on the surface, but it touches several real programming concepts all at once.
You need Python 3.7 or newer installed. Open a terminal and run python --version to check. You also need an internet connection, since the program fetches live data from Wikipedia on every run.
Three building blocks power the project:

The requests library
- What it is: A third-party Python library that makes sending HTTP requests — the same kind your browser sends — straightforward.
- How you install it: Run pip install requests in your terminal. This only needs to be done once.

The Wikipedia REST API
- What it is: A free web service from the Wikimedia Foundation. You send it an article title and it returns a JSON object with the summary, linked pages, images, and more.
- Key endpoint: https://en.wikipedia.org/api/rest_v1/page/summary/{title} — replace {title} with any article name.

JSON
- What it is: A text format for structured data that uses key-value pairs inside curly braces. A Python dictionary and a JSON object look almost identical.
- How Python handles it: Calling response.json() on a requests Response object converts the JSON text into a real Python dictionary automatically.
What Python Needs: The requests Library and JSON
Python can do a lot out of the box, but fetching data from the internet is not in its standard library. For that you install requests, a library that has become the standard choice for HTTP in Python because it keeps the code readable. Open a terminal and run the install command:
pip install requests
Once installed, you bring it into your script with an import statement at the top. You also import time — a standard library module — so you can add a small pause between requests and avoid hitting Wikipedia's servers too rapidly.
import requests
import time
An import statement tells Python to load a library so you can use its functions. Everything inside requests is now available to your script through the requests name.
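To see what that means in practice, here is a quick illustration using the standard-library math module as a stand-in for requests (so it runs without a network connection): after the import, everything in the module is reached through the module's name, just as requests.get() is reached through the requests name.

```python
import math  # standard-library stand-in for requests; no network needed

# Names inside the module are accessed through the "math" prefix.
print(math.sqrt(16))  # prints 4.0
print(math.pi)        # prints 3.141592653589793
```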
Reading the Wikipedia API response
The Wikipedia REST API returns data in JSON format. Here is a simplified version of what the API sends back when you ask for the "Black hole" article:
{
  "title": "Black hole",
  "extract": "A black hole is a region of spacetime...",
  "content_urls": {
    "desktop": {
      "page": "https://en.wikipedia.org/wiki/Black_hole"
    }
  }
}
After calling response.json(), Python turns that text into a dictionary. You access the title with data["title"] and the summary with data["extract"], exactly the same way you would access any Python dictionary key.
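You can reproduce this conversion without any network call by using the standard-library json module on a snippet of the sample text above; json.loads() does for a plain string what response.json() does for an HTTP response.

```python
import json

# A trimmed-down version of the sample API response, as raw text.
raw = '{"title": "Black hole", "extract": "A black hole is a region of spacetime..."}'

data = json.loads(raw)       # JSON text -> Python dictionary
print(data["title"])         # prints: Black hole
print(data["extract"][:15])  # prints: A black hole is
```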
Paste https://en.wikipedia.org/api/rest_v1/page/summary/Python_(programming_language) into your browser's address bar. You will see the raw JSON response that your Python script will receive when it fetches the same URL.
The correct statement to bring in the library is import requests: the keyword import followed by the library name makes that library available in your script. from is used for a different pattern (from x import y), and include and load are not valid Python keywords for this purpose — they come from other languages.
Writing Your First Fetch Function
Rather than repeating the same request code over and over, you wrap it in a function. A function is a named block of code that you can call by name whenever you need it. Here is the fetch function for this project:
def fetch_article(title):
    url = f"https://en.wikipedia.org/api/rest_v1/page/summary/{title}"
    headers = {"User-Agent": "RabbitHoleExplorer/1.0 (python-tutorial)"}
    response = requests.get(url, headers=headers)
    if response.status_code != 200:
        return None
    return response.json()
Let's walk through each line. def fetch_article(title): defines the function. The word in the parentheses, title, is a parameter — it is a placeholder for whatever article name you pass in when you call the function. The f-string on the second line builds the URL dynamically by inserting that title into the Wikipedia API address.
The headers dictionary sets a User-Agent string, which is a short description of your program. The Wikimedia Foundation asks that all API users include one so it can understand traffic patterns; the request may still work without it, but including one is good practice. requests.get() sends the HTTP request and stores the server's reply in the variable response.
The if block checks the HTTP status code. A status code of 200 means "everything is fine." Any other number means something went wrong — the article was not found, the server is busy, or the network failed. Returning None in that case gives the calling code a clear signal to stop or skip. If the status is 200, the function returns response.json(), which converts the raw text response into a Python dictionary.
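Here is a minimal sketch of that signal in action. fake_fetch is a hypothetical stand-in for fetch_article() that reads from a local dictionary instead of the network, so you can run it offline and see how the calling code reacts to None.

```python
def fake_fetch(title):
    # Hypothetical stand-in for fetch_article(): no network involved.
    articles = {"Black_hole": {"title": "Black hole"}}
    return articles.get(title)  # returns None when the title is unknown

for topic in ["Black_hole", "No_such_page"]:
    data = fake_fetch(topic)
    if data is None:
        print(f"Could not fetch '{topic}'. Stopping.")
        break
    print(data["title"])
```

Running this prints "Black hole", then the stopping message for the unknown title — exactly the behavior the real loop relies on.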
Wikipedia article titles with spaces need underscores in the URL: Black_hole not Black hole. The program handles this automatically when it extracts titles from the API response, but if you change the starting topic at the top of the script, use underscores.
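A one-line str.replace() call handles that conversion. The standard-library urllib.parse.quote() covers characters beyond spaces, such as accented letters — an optional extra shown here for illustration, not something the tutorial's program requires.

```python
from urllib.parse import quote

title = "General relativity"
print(title.replace(" ", "_"))  # prints: General_relativity

# quote() percent-encodes characters beyond plain spaces,
# e.g. the umlaut in "Zürich".
print(quote("Zürich"))          # prints: Z%C3%BCrich
```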
Watch out for one frequent bug in this function: writing response.JSON() instead of response.json(). Python method names are case-sensitive, and JSON() does not exist on a Response object — only the lowercase json() does. It is an easy mistake to make when coming from languages that are less strict about casing.
How to Build a Wikipedia Rabbit Hole Explorer in Python
The steps below take you from a blank file to a complete, running program. Each step introduces one new concept before adding code so nothing appears without context.
1. Install the requests library. Open a terminal and run pip install requests. This downloads the library to your Python environment. You only need to do this once. To confirm it installed correctly, run python -c "import requests; print(requests.__version__)" — you should see a version number printed.

2. Write the fetch_article() function. Create a new file named rabbit_hole.py. Add import requests and import time at the top, then write the fetch_article(title) function shown in the section above. This function is the entire data-retrieval engine of the project.

3. Write the rabbit hole loop. Below the function, set a starting topic and a hop count, then write a for loop that calls fetch_article(), prints the title and summary, and extracts the next linked article title from the linked_pages list in the response. Add time.sleep(1) at the bottom of each iteration to pause one second between requests.

4. Add error handling. Wrap the fetch_article() call in the loop with an if data is None: break check. Also guard against a missing linked_pages key using data.get("linked_pages", []) so the program does not crash on articles with no links.

5. Run the complete program. Save the file, open your terminal in the same folder, and run python rabbit_hole.py. Watch the chain of articles print one by one. Try changing the START_TOPIC variable to any subject you find interesting and re-run the script to explore a different path.
Here is the complete program, combining all the pieces above:
import requests
import time
# ── Configuration ──────────────────────────────────────────
START_TOPIC = "Black_hole" # use underscores for spaces
HOP_COUNT = 5 # how many articles to follow
SUMMARY_LEN = 200 # characters to print per summary
# ───────────────────────────────────────────────────────────
def fetch_article(title):
    """Fetch a Wikipedia article summary by title.
    Returns a dict on success, or None on failure."""
    url = f"https://en.wikipedia.org/api/rest_v1/page/summary/{title}"
    headers = {"User-Agent": "RabbitHoleExplorer/1.0 (python-tutorial)"}
    try:
        response = requests.get(url, headers=headers, timeout=10)
    except requests.exceptions.RequestException:
        return None
    if response.status_code != 200:
        return None
    return response.json()

def run_rabbit_hole(start, hops):
    current = start
    for hop in range(1, hops + 1):
        data = fetch_article(current)
        if data is None:
            print(f"Could not fetch '{current}'. Stopping.")
            break
        title = data.get("title", current)
        extract = data.get("extract", "No summary available.")
        links = data.get("linked_pages", [])
        print(f"\nHop {hop}: {title}")
        print(extract[:SUMMARY_LEN] + ("..." if len(extract) > SUMMARY_LEN else ""))
        if not links:
            print("No linked pages found. Stopping.")
            break
        # The next article is the first item in linked_pages
        current = links[0].get("title", "").replace(" ", "_")
        if not current:
            print("Could not determine next article. Stopping.")
            break
        time.sleep(1)  # be polite to the Wikipedia API

if __name__ == "__main__":
    run_rabbit_hole(START_TOPIC, HOP_COUNT)
"Simple is better than complex." — The Zen of Python, Tim Peters (PEP 20)
Python Learning Summary Points
- A function defined with def lets you name a block of code and reuse it without rewriting it. Parameters like title make functions flexible — the same function can fetch any Wikipedia article by receiving a different title each time it is called.
- The requests.get(url) call sends an HTTP GET request — the same kind your browser sends when you visit a page. The response object has a status_code attribute and a json() method that converts the response body into a Python dictionary.
- A for loop combined with range(1, hops + 1) runs the loop body exactly hops times, with the variable hop incrementing on each pass. You use break to exit the loop early if an error occurs.
- Dictionary .get(key, default) is safer than direct key access because it returns the default value instead of raising a KeyError when the key is missing — critical for API data where any field might be absent.
- The if __name__ == "__main__": block at the bottom ensures the program only runs when you execute the file directly, not when another script imports it. It is a best practice even for short scripts.
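The .get() and range() behaviors described above are easy to verify in a few lines:

```python
data = {"title": "Black hole"}  # note: no "extract" key present

# .get() returns the default instead of raising KeyError.
print(data.get("title", "unknown"))                  # prints: Black hole
print(data.get("extract", "No summary available."))  # prints: No summary available.

# range(1, 4) yields 1, 2, 3 — the body runs exactly three times.
for hop in range(1, 4):
    print(f"Hop {hop}")
```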
From here, there are many directions to take the project. You could write the visited titles to a text file to save the trail, add a limit on how many characters to print so long articles do not flood the terminal, build a loop that asks the user to confirm each hop, or swap Wikipedia for a different public API entirely. The core skills — functions, HTTP requests, JSON parsing, and loops — transfer directly to any of those extensions.
Frequently Asked Questions
What is a Wikipedia rabbit hole explorer?
A Wikipedia rabbit hole explorer is a Python program that fetches a Wikipedia article summary, then automatically follows a linked article from that page — repeating the process as many times as you choose. It demonstrates how to use the requests library, parse JSON responses, write reusable functions, and use loops to chain actions together.

Do I need prior Python experience?
No prior experience is required. The tutorial is written for absolute beginners. It introduces variables, functions, loops, and HTTP requests step by step, with explanations of every line of code before the full program is assembled.

Which libraries does the project use?
The project uses two libraries: requests, which is a third-party library for making HTTP calls, and time, which is part of the Python standard library and provides the sleep() function used to pause between API calls. You install requests with pip install requests.

What is the Wikipedia REST API?
The Wikipedia REST API is a free, publicly available web service provided by the Wikimedia Foundation. It returns article summaries, links, images, and other data in JSON format. No API key is required for basic usage, though Wikimedia requests that you set a descriptive User-Agent header in your requests.

What is JSON?
JSON (JavaScript Object Notation) is a lightweight text format for structured data. It organizes information as key-value pairs inside curly braces, similar to a Python dictionary. APIs commonly use JSON because it is easy to read, easy for programs to parse, and language-independent.

What does requests.get() do?
requests.get(url) sends an HTTP GET request to the URL you provide and returns a Response object. You can check response.status_code to confirm the request succeeded (200 means OK), and call response.json() to automatically parse the JSON body into a Python dictionary.

What is a function, and why use one here?
A function is a named, reusable block of code defined with the def keyword. In this project, wrapping the Wikipedia fetch logic in a function lets you call it once per loop iteration without rewriting the same code each time. It also makes the program easier to read and modify.

How does the program follow links automatically?
The program uses a for loop combined with range() to repeat the fetch-and-follow cycle a set number of times. On each iteration, it fetches the current article, prints the title and summary, picks the first linked article from the response, then uses that article title as the starting point for the next iteration.

What does response.json() do?
response.json() parses the HTTP response body as JSON and returns a Python dictionary. You can then access specific values using dictionary key syntax — for example, data['title'] to get the article title or data['extract'] to get the summary text.

How do I run the finished program?
Save the code to a file named rabbit_hole.py, then open a terminal, navigate to the folder containing the file, and run python rabbit_hole.py. Make sure you have Python 3 installed and have run pip install requests beforehand.