You type "black holes" into Wikipedia, click a link to "general relativity," then to "Albert Einstein," then to "Zurich" — and suddenly it is 40 minutes later. This tutorial teaches you to recreate that experience in Python, building a program that automatically follows Wikipedia articles one link at a time using the public Wikipedia REST API.
By the end of this tutorial you will have a working Python script that starts on any Wikipedia topic you choose, prints the article title and a short summary, picks the first linked article, and repeats that process for as many hops as you set. Along the way you will learn how functions work, how loops repeat code, how Python reads data from the internet, and what JSON actually is — all in the context of a project you can run right now.
What You Will Build
The finished program is roughly 40 lines of Python. When you run it you will see output that looks like this:
Hop 1: Black hole
A black hole is a region of spacetime where gravity is so strong
that nothing — not even light — can escape...
Hop 2: General relativity
General relativity is Einstein's geometric theory of gravitation...
Hop 3: Albert Einstein
Albert Einstein was a German-born theoretical physicist who developed
the theory of relativity...
Hop 4: Zürich
Zürich is the largest city in Switzerland and the capital of the
canton of Zürich...
Each hop is one call to the Wikipedia API. The program reads the response, prints the summary, then extracts the title of the first linked article to use as the next starting point. That is the entire loop — simple on the surface, but it touches several real programming concepts all at once.
You need Python 3.7 or newer installed. Open a terminal and run python --version to check. You also need an internet connection, since the program fetches live data from Wikipedia on every run.
Three building blocks power the project:

The requests library
- What it is: A third-party Python library that makes sending HTTP requests — the same kind your browser sends — straightforward.
- How you install it: Run pip install requests in your terminal. This only needs to be done once.

The Wikipedia REST API
- What it is: A free web service from the Wikimedia Foundation. You send it an article title and it returns a JSON object with the summary, linked pages, images, and more.
- Key endpoint: https://en.wikipedia.org/api/rest_v1/page/summary/{title} — replace {title} with any article name.

JSON
- What it is: A text format for structured data that uses key-value pairs inside curly braces. A Python dictionary and a JSON object look almost identical.
- How Python handles it: Calling response.json() on a requests Response object converts the JSON text into a real Python dictionary automatically.
What Python Needs: The requests Library and JSON
Python can do a lot out of the box, but fetching data from the internet is not in its standard library. For that you install requests, a library that has become the standard choice for HTTP in Python because it keeps the code readable. Open a terminal and run the install command:
pip install requests
Once installed, you bring it into your script with an import statement at the top. You also import time — a standard library module — so you can add a small pause between requests and avoid hitting Wikipedia's servers too rapidly.
import requests
import time
An import statement tells Python to load a library so you can use its functions. Everything inside requests is now available to your script through the requests name.
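To see what that means in practice, here is a quick illustration using the standard-library math module as a stand-in for requests (so it runs without a network connection): after the import, everything in the module is reached through the module's name, just as requests.get() is reached through the requests name.

```python
import math  # standard-library stand-in for requests; no network needed

# Names inside the module are accessed through the "math" prefix.
print(math.sqrt(16))  # prints 4.0
print(math.pi)        # prints 3.141592653589793
```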
Reading the Wikipedia API response
The Wikipedia REST API returns data in JSON format. Here is a simplified version of what the API sends back when you ask for the "Black hole" article:
{
  "title": "Black hole",
  "extract": "A black hole is a region of spacetime...",
  "content_urls": {
    "desktop": {
      "page": "https://en.wikipedia.org/wiki/Black_hole"
    }
  }
}
After calling response.json(), Python turns that text into a dictionary. You access the title with data["title"] and the summary with data["extract"], exactly the same way you would access any Python dictionary key.
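You can reproduce this conversion without any network call by using the standard-library json module on a snippet of the sample text above; json.loads() does for a plain string what response.json() does for an HTTP response.

```python
import json

# A trimmed-down version of the sample API response, as raw text.
raw = '{"title": "Black hole", "extract": "A black hole is a region of spacetime..."}'

data = json.loads(raw)       # JSON text -> Python dictionary
print(data["title"])         # prints: Black hole
print(data["extract"][:15])  # prints: A black hole is
```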
Paste https://en.wikipedia.org/api/rest_v1/page/summary/Python_(programming_language) into your browser's address bar. You will see the raw JSON response that your Python script will receive when it fetches the same URL.
The correct statement to bring in the library is import requests: the keyword import followed by the library name makes that library available in your script. from is used for a different pattern (from x import y), and include and load are not valid Python keywords for this purpose — they come from other languages.
Writing Your First Fetch Function
Rather than repeating the same request code over and over, you wrap it in a function. A function is a named block of code that you can call by name whenever you need it. Here is the fetch function for this project:
def fetch_article(title):
    url = f"https://en.wikipedia.org/api/rest_v1/page/summary/{title}"
    headers = {"User-Agent": "RabbitHoleExplorer/1.0 (python-tutorial)"}
    response = requests.get(url, headers=headers)
    if response.status_code != 200:
        return None
    return response.json()
Let's walk through each line. def fetch_article(title): defines the function. The word in the parentheses, title, is a parameter — it is a placeholder for whatever article name you pass in when you call the function. The f-string on the second line builds the URL dynamically by inserting that title into the Wikipedia API address.
The headers dictionary sets a User-Agent string, which is a short description of your program. The Wikimedia Foundation asks that all API users include one so it can understand traffic patterns; the request may still work without it, but including one is good practice. requests.get() sends the HTTP request and stores the server's reply in the variable response.
The if block checks the HTTP status code. A status code of 200 means "everything is fine." Any other number means something went wrong — the article was not found, the server is busy, or the network failed. Returning None in that case gives the calling code a clear signal to stop or skip. If the status is 200, the function returns response.json(), which converts the raw text response into a Python dictionary.
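Here is a minimal sketch of that signal in action. fake_fetch is a hypothetical stand-in for fetch_article() that reads from a local dictionary instead of the network, so you can run it offline and see how the calling code reacts to None.

```python
def fake_fetch(title):
    # Hypothetical stand-in for fetch_article(): no network involved.
    articles = {"Black_hole": {"title": "Black hole"}}
    return articles.get(title)  # returns None when the title is unknown

for topic in ["Black_hole", "No_such_page"]:
    data = fake_fetch(topic)
    if data is None:
        print(f"Could not fetch '{topic}'. Stopping.")
        break
    print(data["title"])
```

Running this prints "Black hole", then the stopping message for the unknown title — exactly the behavior the real loop relies on.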
Wikipedia article titles with spaces need underscores in the URL: Black_hole not Black hole. The program handles this automatically when it extracts titles from the API response, but if you change the starting topic at the top of the script, use underscores.
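A one-line str.replace() call handles that conversion. The standard-library urllib.parse.quote() covers characters beyond spaces, such as accented letters — an optional extra shown here for illustration, not something the tutorial's program requires.

```python
from urllib.parse import quote

title = "General relativity"
print(title.replace(" ", "_"))  # prints: General_relativity

# quote() percent-encodes characters beyond plain spaces,
# e.g. the umlaut in "Zürich".
print(quote("Zürich"))          # prints: Z%C3%BCrich
```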
Watch out for one frequent bug in this function: writing response.JSON() instead of response.json(). Python method names are case-sensitive, and JSON() does not exist on a Response object — only the lowercase json() does. It is an easy mistake to make when coming from languages that are less strict about casing.
How to Build a Wikipedia Rabbit Hole Explorer in Python
The steps below take you from a blank file to a complete, running program. Each step introduces one new concept before adding code so nothing appears without context.
1. Install the requests library. Open a terminal and run pip install requests. This downloads the library to your Python environment. You only need to do this once. To confirm it installed correctly, run python -c "import requests; print(requests.__version__)" — you should see a version number printed.

2. Write the fetch_article() function. Create a new file named rabbit_hole.py. Add import requests and import time at the top, then write the fetch_article(title) function shown in the section above. This function is the entire data-retrieval engine of the project.

3. Write the rabbit hole loop. Below the function, set a starting topic and a hop count, then write a for loop that calls fetch_article(), prints the title and summary, and extracts the next linked article title from the linked_pages list in the response. Add time.sleep(1) at the bottom of each iteration to pause one second between requests.

4. Add error handling. Wrap the fetch_article() call in the loop with an if data is None: break check. Also guard against a missing linked_pages key using data.get("linked_pages", []) so the program does not crash on articles with no links.

5. Run the complete program. Save the file, open your terminal in the same folder, and run python rabbit_hole.py. Watch the chain of articles print one by one. Try changing the START_TOPIC variable to any subject you find interesting and re-run the script to explore a different path.
Here is the complete program, combining all the pieces above:
import requests
import time
# ── Configuration ──────────────────────────────────────────
START_TOPIC = "Black_hole" # use underscores for spaces
HOP_COUNT = 5 # how many articles to follow
SUMMARY_LEN = 200 # characters to print per summary
# ───────────────────────────────────────────────────────────
def fetch_article(title):
    """Fetch a Wikipedia article summary by title.
    Returns a dict on success, or None on failure."""
    url = f"https://en.wikipedia.org/api/rest_v1/page/summary/{title}"
    headers = {"User-Agent": "RabbitHoleExplorer/1.0 (python-tutorial)"}
    try:
        response = requests.get(url, headers=headers, timeout=10)
    except requests.exceptions.RequestException:
        return None
    if response.status_code != 200:
        return None
    return response.json()

def run_rabbit_hole(start, hops):
    current = start
    for hop in range(1, hops + 1):
        data = fetch_article(current)
        if data is None:
            print(f"Could not fetch '{current}'. Stopping.")
            break
        title = data.get("title", current)
        extract = data.get("extract", "No summary available.")
        links = data.get("linked_pages", [])
        print(f"\nHop {hop}: {title}")
        print(extract[:SUMMARY_LEN] + ("..." if len(extract) > SUMMARY_LEN else ""))
        if not links:
            print("No linked pages found. Stopping.")
            break
        # The next article is the first item in linked_pages
        current = links[0].get("title", "").replace(" ", "_")
        if not current:
            print("Could not determine next article. Stopping.")
            break
        time.sleep(1)  # be polite to the Wikipedia API

if __name__ == "__main__":
    run_rabbit_hole(START_TOPIC, HOP_COUNT)
"Simple is better than complex." — The Zen of Python, Tim Peters (PEP 20)
Python Learning Summary Points
- A function defined with def lets you name a block of code and reuse it without rewriting it. Parameters like title make functions flexible — the same function can fetch any Wikipedia article by receiving a different title each time it is called.
- The requests.get(url) call sends an HTTP GET request — the same kind your browser sends when you visit a page. The response object has a status_code attribute and a json() method that converts the response body into a Python dictionary.
- A for loop combined with range(1, hops + 1) runs the loop body exactly hops times, with the variable hop incrementing on each pass. You use break to exit the loop early if an error occurs.
- Dictionary .get(key, default) is safer than direct key access because it returns the default value instead of raising a KeyError when the key is missing — critical for API data where any field might be absent.
- The if __name__ == "__main__": block at the bottom ensures the program only runs when you execute the file directly, not when another script imports it. It is a best practice even for short scripts.
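The .get() and range() behaviors described above are easy to verify in a few lines:

```python
data = {"title": "Black hole"}  # note: no "extract" key present

# .get() returns the default instead of raising KeyError.
print(data.get("title", "unknown"))                  # prints: Black hole
print(data.get("extract", "No summary available."))  # prints: No summary available.

# range(1, 4) yields 1, 2, 3 — the body runs exactly three times.
for hop in range(1, 4):
    print(f"Hop {hop}")
```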
From here, there are many directions to take the project. You could write the visited titles to a text file to save the trail, add a limit on how many characters to print so long articles do not flood the terminal, build a loop that asks the user to confirm each hop, or swap Wikipedia for a different public API entirely. The core skills — functions, HTTP requests, JSON parsing, and loops — transfer directly to any of those extensions.
Frequently Asked Questions
What is a Wikipedia rabbit hole explorer?
A Wikipedia rabbit hole explorer is a Python program that fetches a Wikipedia article summary, then automatically follows a linked article from that page — repeating the process as many times as you choose. It demonstrates how to use the requests library, parse JSON responses, write reusable functions, and use loops to chain actions together.

Do I need prior Python experience?
No prior experience is required. The tutorial is written for absolute beginners. It introduces variables, functions, loops, and HTTP requests step by step, with explanations of every line of code before the full program is assembled.

Which libraries does the project use?
The project uses two libraries: requests, which is a third-party library for making HTTP calls, and time, which is part of the Python standard library and provides the sleep() function used to pause between API calls. You install requests with pip install requests.

What is the Wikipedia REST API?
The Wikipedia REST API is a free, publicly available web service provided by the Wikimedia Foundation. It returns article summaries, links, images, and other data in JSON format. No API key is required for basic usage, though Wikimedia requests that you set a descriptive User-Agent header in your requests.

What is JSON?
JSON (JavaScript Object Notation) is a lightweight text format for structured data. It organizes information as key-value pairs inside curly braces, similar to a Python dictionary. APIs commonly use JSON because it is easy to read, easy for programs to parse, and language-independent.

What does requests.get() do?
requests.get(url) sends an HTTP GET request to the URL you provide and returns a Response object. You can check response.status_code to confirm the request succeeded (200 means OK), and call response.json() to automatically parse the JSON body into a Python dictionary.

What is a function, and why use one here?
A function is a named, reusable block of code defined with the def keyword. In this project, wrapping the Wikipedia fetch logic in a function lets you call it once per loop iteration without rewriting the same code each time. It also makes the program easier to read and modify.

How does the program follow links automatically?
The program uses a for loop combined with range() to repeat the fetch-and-follow cycle a set number of times. On each iteration, it fetches the current article, prints the title and summary, picks the first linked article from the response, then uses that article title as the starting point for the next iteration.

What does response.json() do?
response.json() parses the HTTP response body as JSON and returns a Python dictionary. You can then access specific values using dictionary key syntax — for example, data['title'] to get the article title or data['extract'] to get the summary text.

How do I run the finished program?
Save the code to a file named rabbit_hole.py, then open a terminal, navigate to the folder containing the file, and run python rabbit_hole.py. Make sure you have Python 3 installed and have run pip install requests beforehand.