Python System Administration with Ansible

Ansible is one of the go-to automation tools for system administrators, and Python is the engine that makes it all work. From writing playbooks that configure entire server fleets to building custom modules that solve problems unique to your environment, the combination of Python and Ansible gives sysadmins a powerful, agentless framework for managing infrastructure at scale. This article walks through how the two technologies intersect and how you can use Python to extend Ansible far beyond its built-in capabilities.

If you manage servers, deploy applications, or handle configuration across distributed infrastructure, you have likely encountered Ansible. It is agentless, meaning it does not require any software installed on the target machines beyond a Python interpreter and SSH access. That simplicity is a big part of its appeal. But what makes it especially relevant for Python developers is that Ansible's entire execution engine, its modules, its plugins, and its extension points are all built on Python. As of 2026, the latest ansible-core release is version 2.20.3, which requires Python 3.12 or higher on the control node. The broader Ansible community package has reached version 13.4.0, bundling ansible-core with a curated set of community collections. Understanding how Python powers Ansible is the first step toward using it effectively for system administration.

How Ansible Uses Python Under the Hood

When you run an Ansible playbook, the control node (your workstation or CI server where Ansible is installed) reads the YAML playbook, resolves variable references, and determines which tasks to execute on which hosts. For each task, Ansible packages the corresponding module as a self-contained Python script, transfers it to the target node over SSH, and executes it remotely. The module runs, performs its work (installing a package, copying a file, restarting a service), and returns a JSON result back to the control node. Once the result is received, the temporary module script is removed from the target.

This architecture means two things. First, the target node needs Python installed and accessible in its system PATH. Starting with Ansible 10, the control node requires Python 3.10 or newer, while managed nodes must run a supported Python 3 interpreter (their minimum version is lower than the control node's), ensuring access to modern language features and security improvements. Second, because every module is fundamentally a Python script, you can inspect, modify, or write your own modules using the same language you already know.

Note

Ansible communicates with remote hosts over SSH, using the standard OpenSSH client by default or the Paramiko library (a Python implementation of the SSH2 protocol) as an alternative. No agent daemon is needed on target machines. This agentless design reduces maintenance overhead and avoids opening extra ports on your servers.

The Ansible ecosystem includes over 4,000 modules spread across hundreds of collections. These modules cover file management, user administration, package installation, cloud provisioning (AWS, Azure, GCP), network device configuration, container orchestration, and much more. Each one follows the same pattern: accept parameters as a JSON object, perform work on the target, and return a JSON result indicating whether anything changed.
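That contract is small enough to sketch in a few lines of plain Python. The snippet below is not a real module (it skips the AnsibleModule plumbing shown later in this article), but it illustrates the shape of the exchange: a dict of parameters in, some work, and a JSON result with a changed flag out.

```python
import json


def run_module(params):
    """Toy stand-in for an Ansible module body: params in, result dict out."""
    # A real module would inspect or change the system here; this one just reports.
    return {
        "changed": False,
        "msg": f"checked package {params['name']}",
        "rc": 0,
    }


# What the control node ultimately reads back over SSH is this JSON on stdout.
print(json.dumps(run_module({"name": "nginx"})))
```

Every module you will meet, built-in or custom, is a variation on this loop.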

Writing Playbooks for Common Sysadmin Tasks

A playbook is a YAML file that defines an ordered list of tasks to execute against a set of hosts. For system administrators, playbooks replace the ad hoc shell scripts and manual SSH sessions that often pile up into unmaintainable chaos. Because playbooks are declarative, you describe the desired state of your systems rather than writing step-by-step commands. Ansible figures out what needs to change and only makes the changes that are necessary, which is the principle of idempotency.
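Idempotency is easy to see in miniature. The helper below is a plain-Python illustration, not anything Ansible ships, but it applies the same check-before-change logic that modules use: compare current state to desired state and act only on a difference.

```python
import os
import tempfile


def ensure_file_content(path, desired):
    """Write `desired` to `path` only if its content differs; return whether it changed."""
    try:
        with open(path) as f:
            current = f.read()
    except FileNotFoundError:
        current = None
    if current == desired:
        return False  # already converged; running again is a no-op
    with open(path, "w") as f:
        f.write(desired)
    return True


# Running twice demonstrates convergence: the second call changes nothing.
path = os.path.join(tempfile.mkdtemp(), "motd")
first = ensure_file_content(path, "Welcome to prod\n")
second = ensure_file_content(path, "Welcome to prod\n")
```

Run a well-written playbook twice and the second run should report zero changes, exactly like the second call here.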

Here is a practical playbook that handles a common sysadmin workflow: hardening a freshly provisioned Linux server by updating packages, creating a non-root admin user, configuring SSH, and enabling a firewall.

---
- name: Harden a new Linux server
  hosts: webservers
  become: yes

  vars:
    admin_user: deployer
    ssh_port: 2222
    allowed_services:
      - ssh
      - http
      - https

  tasks:
    - name: Update all packages to the latest version
      ansible.builtin.apt:
        upgrade: dist
        update_cache: yes
        cache_valid_time: 3600

    - name: Create the admin user with sudo privileges
      ansible.builtin.user:
        name: "{{ admin_user }}"
        groups: sudo
        shell: /bin/bash
        create_home: yes
        state: present

    - name: Deploy the authorized SSH key for the admin user
      ansible.posix.authorized_key:
        user: "{{ admin_user }}"
        key: "{{ lookup('file', '~/.ssh/id_ed25519.pub') }}"
        state: present

    - name: Disable root SSH login
      ansible.builtin.lineinfile:
        path: /etc/ssh/sshd_config
        regexp: '^PermitRootLogin'
        line: 'PermitRootLogin no'
      notify: Restart SSH

    - name: Change the SSH listening port
      ansible.builtin.lineinfile:
        path: /etc/ssh/sshd_config
        regexp: '^#?Port '
        line: "Port {{ ssh_port }}"
      notify: Restart SSH

    - name: Enable and start UFW firewall
      community.general.ufw:
        state: enabled
        policy: deny

    - name: Allow required services through the firewall
      community.general.ufw:
        rule: allow
        name: "{{ item }}"
      loop: "{{ allowed_services }}"

    - name: Allow the custom SSH port so the new config cannot lock you out
      community.general.ufw:
        rule: allow
        port: "{{ ssh_port }}"
        proto: tcp

  handlers:
    - name: Restart SSH
      ansible.builtin.service:
        name: ssh  # the unit is "ssh" on Debian/Ubuntu; RHEL-family systems use "sshd"
        state: restarted

Notice how each task uses a fully qualified collection name (FQCN) like ansible.builtin.apt instead of just apt. This has been the documented best practice since the module library was split into collections around Ansible 2.10. It makes playbooks unambiguous about which module is being called, which matters when you have custom modules or multiple collections installed that might share names.

Pro Tip

Use handlers to defer service restarts until the end of a play, rather than restarting after every configuration change. In the example above, if both the root login and port changes are made, SSH only restarts once. This prevents unnecessary downtime and avoids race conditions during configuration updates.

The become: yes directive tells Ansible to escalate privileges using sudo on the remote host. Variables defined in the vars block make the playbook reusable. You can override them from the command line, from an external variables file, or from Ansible Vault for sensitive data like passwords and API keys.

Organizing Playbooks with Roles

As your playbook collection grows, dumping everything into a single YAML file becomes difficult to maintain. Ansible roles provide a standardized directory structure for breaking playbooks into reusable components. A role contains its own tasks, handlers, variables, templates, and files in a predictable layout.

roles/
  webserver/
    tasks/
      main.yml
    handlers/
      main.yml
    templates/
      nginx.conf.j2
    defaults/
      main.yml
    vars/
      main.yml

Each subdirectory has a specific purpose. The tasks directory holds the main task list, handlers defines service-restart triggers, templates stores Jinja2 templates that Ansible renders with variable substitution, defaults provides default variable values that users can override, and vars contains variables with higher precedence that are internal to the role. This structure makes it straightforward to share roles across teams or publish them to Ansible Galaxy for the broader community.
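Ansible renders the files in templates/ with Jinja2, substituting variables into {{ ... }} placeholders (real Jinja2 also supports loops, conditionals, and filters). To make the substitution step concrete, here is a toy stand-in built from the standard library only; it handles simple placeholders and nothing more:

```python
import re


def render(template, variables):
    """Replace {{ name }} placeholders with values, Jinja2-style (very simplified)."""
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda m: str(variables[m.group(1)]),
        template,
    )


# Rendering a fragment of nginx.conf.j2 with role variables:
nginx_conf = "server_name {{ server_name }};\nlisten {{ port }};"
rendered = render(nginx_conf, {"server_name": "example.com", "port": 8080})
```

In a real role you never call the renderer yourself; the ansible.builtin.template module does this for you when it copies the rendered file to the target.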

Building Custom Python Modules for Ansible

The built-in modules cover a huge range of tasks, but every infrastructure has its quirks. You may need to interact with an internal API, parse a proprietary log format, or enforce a company-specific compliance check that no existing module handles. That is where custom modules come in. Since Ansible modules are Python scripts, you can write one using familiar tools and patterns.

A custom module follows a specific contract. It imports AnsibleModule from ansible.module_utils.basic, declares an argument specification that defines the parameters it accepts, performs its work, and then calls either module.exit_json() on success or module.fail_json() on failure. The return data must be JSON-serializable.

Here is a custom module that checks disk usage on a target host and fails the play if any filesystem exceeds a configurable threshold. This is the kind of health check that system administrators often need before triggering deployments or maintenance windows.

#!/usr/bin/python

from ansible.module_utils.basic import AnsibleModule
import shutil
import os


DOCUMENTATION = r'''
---
module: disk_usage_check
short_description: Check disk usage against a threshold
description:
    - Checks disk usage on specified mount points.
    - Fails if usage exceeds the defined threshold percentage.
options:
    mount_points:
        description: List of mount points to check.
        required: false
        type: list
        elements: str
        default: ["/"]
    threshold:
        description: Maximum allowed usage percentage (0-100).
        required: false
        type: int
        default: 85
author:
    - PythonCodeCrack
'''


def get_disk_usage(path):
    """Return disk usage statistics for a given path."""
    try:
        total, used, free = shutil.disk_usage(path)
        percent_used = (used / total) * 100
        return {
            "path": path,
            "total_gb": round(total / (1024 ** 3), 2),
            "used_gb": round(used / (1024 ** 3), 2),
            "free_gb": round(free / (1024 ** 3), 2),
            "percent_used": round(percent_used, 1),
        }
    except OSError as e:
        return {"path": path, "error": str(e)}


def main():
    module = AnsibleModule(
        argument_spec=dict(
            mount_points=dict(
                type='list',
                elements='str',
                default=['/']
            ),
            threshold=dict(
                type='int',
                default=85
            ),
        ),
        supports_check_mode=True,
    )

    mount_points = module.params['mount_points']
    threshold = module.params['threshold']

    results = []
    violations = []

    for mount in mount_points:
        if not os.path.exists(mount):
            module.fail_json(
                msg=f"Mount point does not exist: {mount}"
            )

        usage = get_disk_usage(mount)

        if "error" in usage:
            module.fail_json(
                msg=f"Error checking {mount}: {usage['error']}"
            )

        results.append(usage)

        if usage["percent_used"] > threshold:
            violations.append(
                f"{mount} is at {usage['percent_used']}%"
            )

    if violations:
        module.fail_json(
            msg=f"Disk usage exceeds {threshold}%: "
                + ", ".join(violations),
            disk_usage=results,
            changed=False,
        )

    module.exit_json(
        changed=False,
        msg="All mount points within acceptable usage.",
        disk_usage=results,
    )


if __name__ == '__main__':
    main()

To use this module, place it in a library/ directory adjacent to your playbook. Ansible automatically discovers modules in that location with no additional configuration needed.

---
- name: Pre-deployment health check
  hosts: all
  tasks:
    - name: Verify disk usage is within limits
      disk_usage_check:
        mount_points:
          - /
          - /var
          - /home
        threshold: 90
      register: disk_result

    - name: Display disk usage report
      ansible.builtin.debug:
        var: disk_result.disk_usage

Note

Custom modules should be self-contained. They cannot rely on external libraries that are not present on the target node unless you handle installation as a preceding task in your playbook. The standard library modules like shutil and os used above are always available.
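When a module does need a third-party package, the conventional pattern is to attempt the import at the top and report a clean failure at runtime rather than crashing with a raw traceback. Here is a minimal sketch of that pattern with the AnsibleModule plumbing replaced by a plain result dict; some_vendor_sdk is a hypothetical dependency, not a real package:

```python
# Attempt the optional import up front and record the outcome, instead of
# letting a missing dependency kill the module with an unreadable traceback.
try:
    import some_vendor_sdk  # hypothetical third-party dependency
    HAS_SDK = True
except ImportError:
    HAS_SDK = False


def check_requirements():
    """Return an Ansible-style result dict instead of raising on a missing import."""
    if not HAS_SDK:
        return {
            "failed": True,
            "msg": "some_vendor_sdk is required on the target; "
                   "install it in a preceding task",
        }
    return {"failed": False}
```

In a real module you would pass that message to module.fail_json() so the play reports a clear, actionable error for the affected host.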

The supports_check_mode=True parameter is worth highlighting. It tells Ansible that this module can be invoked in dry-run mode (--check flag) without making changes. For a read-only module like this disk usage checker, check mode works without extra logic. For modules that modify state, you would add conditional checks to skip the actual modification when module.check_mode is True.
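The shape of that conditional can be sketched in plain Python, without the Ansible imports, so the logic stands on its own. The function below (a hypothetical example, not Ansible API) computes what would change first, then skips the mutation when check mode is set, which is exactly the branch a state-modifying module wraps around module.check_mode:

```python
def ensure_banner(current_content, desired_banner, check_mode):
    """Return (changed, new_content); in check mode, report but never modify."""
    if current_content == desired_banner:
        return False, current_content   # already in the desired state
    if check_mode:
        return True, current_content    # dry run: report the pending change only
    return True, desired_banner         # normal run: actually apply the change
```

The key property is that the changed flag is identical whether or not check mode is on; only the side effect is suppressed, so --check gives an honest preview of a real run.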

Distributing Modules with Collections

If you build modules that your team or organization reuses regularly, packaging them as an Ansible Collection is the recommended distribution method. Collections bundle modules, roles, plugins, and documentation into a single versioned artifact that can be published on Ansible Galaxy or a private Galaxy server. This is the same mechanism that all community and vendor-provided content uses, so your custom modules integrate cleanly with the broader ecosystem.

Dynamic Inventory with Python Scripts

Static inventory files work fine when you have a handful of servers that rarely change. In cloud environments where instances spin up and down constantly, a hardcoded list of hostnames becomes stale within minutes. Dynamic inventory scripts solve this by querying an external source (a cloud provider API, a CMDB, a service registry) at runtime to produce a current list of hosts.

A dynamic inventory script is a Python program that, when called with the --list argument, prints a JSON object describing your hosts and groups. Ansible calls this script automatically before each playbook run.

#!/usr/bin/env python3

"""
Dynamic inventory script that queries an internal
service registry API for active hosts.
"""

import argparse
import json
import urllib.request
import ssl


REGISTRY_URL = "https://registry.internal.example.com/api/hosts"


def fetch_hosts():
    """Query the service registry and return host data."""
    ctx = ssl.create_default_context()
    req = urllib.request.Request(
        REGISTRY_URL,
        headers={"Accept": "application/json"},
    )

    with urllib.request.urlopen(req, context=ctx) as resp:
        data = json.loads(resp.read().decode("utf-8"))

    return data


def build_inventory(hosts):
    """Transform raw host data into Ansible inventory format."""
    inventory = {
        "_meta": {
            "hostvars": {}
        }
    }

    for host in hosts:
        hostname = host["fqdn"]
        role = host.get("role", "ungrouped")
        region = host.get("region", "unknown")

        # Create role-based groups
        if role not in inventory:
            inventory[role] = {"hosts": [], "vars": {}}
        inventory[role]["hosts"].append(hostname)

        # Create region-based groups
        region_group = f"region_{region}"
        if region_group not in inventory:
            inventory[region_group] = {"hosts": [], "vars": {}}
        inventory[region_group]["hosts"].append(hostname)

        # Store per-host variables
        inventory["_meta"]["hostvars"][hostname] = {
            "ansible_host": host.get("ip_address", hostname),
            "ansible_port": host.get("ssh_port", 22),
            "environment": host.get("environment", "production"),
            "os_version": host.get("os_version", "unknown"),
        }

    return inventory


def main():
    parser = argparse.ArgumentParser(
        description="Dynamic Ansible inventory script"
    )
    parser.add_argument(
        "--list",
        action="store_true",
        help="List all hosts and groups",
    )
    parser.add_argument(
        "--host",
        help="Get variables for a specific host",
    )
    args = parser.parse_args()

    if args.list:
        hosts = fetch_hosts()
        inventory = build_inventory(hosts)
        print(json.dumps(inventory, indent=2))
    elif args.host:
        hosts = fetch_hosts()
        inventory = build_inventory(hosts)
        hostvars = inventory["_meta"]["hostvars"]
        host_data = hostvars.get(args.host, {})
        print(json.dumps(host_data, indent=2))
    else:
        parser.print_help()


if __name__ == "__main__":
    main()

To use a dynamic inventory script, make it executable and reference it with the -i flag when running your playbook:

chmod +x inventory_registry.py
ansible-playbook -i inventory_registry.py site.yml

Ansible also supports inventory plugins written in Python, which provide a more structured and configurable alternative to standalone scripts. Plugins like amazon.aws.aws_ec2, azure.azcollection.azure_rm, and google.cloud.gcp_compute use this plugin architecture to pull real-time inventory from cloud providers. For custom infrastructure, you can write your own inventory plugin following the same pattern.

Pro Tip

Cache your dynamic inventory results when working with large environments. Ansible supports inventory caching through the cache plugin system, which prevents hammering your API on every playbook run. Set cache_plugin to jsonfile in your ansible.cfg and configure a timeout that matches how frequently your infrastructure changes.
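Assuming a jsonfile cache stored under /tmp (the path and timeout here are illustrative, while the option names come from Ansible's inventory cache settings), the relevant ansible.cfg fragment might look like this:

```ini
[inventory]
# Enable caching for inventory plugins that support it
cache = yes
cache_plugin = ansible.builtin.jsonfile
# Seconds before cached results are considered stale
cache_timeout = 3600
# Directory where the jsonfile cache plugin stores its data
cache_connection = /tmp/ansible_inventory_cache
```

Pick a cache_timeout that matches your churn rate: minutes for autoscaling groups, hours for mostly static fleets.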

Ansible Lightspeed and the AI-Assisted Future

Red Hat has been investing in bringing generative AI capabilities directly into the Ansible workflow through Ansible Lightspeed, powered by IBM watsonx Code Assistant. The tool integrates into Visual Studio Code through the Ansible VS Code extension, allowing you to describe a task in plain English and receive a generated playbook snippet in return. Rather than looking up module names and parameter syntax from documentation, you type a natural language description of what you want to accomplish, and Lightspeed generates syntactically correct Ansible code trained on curated Ansible content from Galaxy, GitHub, and Red Hat subject matter experts.

Lightspeed includes two distinct components. The coding assistant works inside VS Code and can generate single tasks, multiple tasks, or entire playbooks from a prompt. The intelligent assistant is embedded directly in the Ansible Automation Platform UI and helps administrators with installation, configuration, troubleshooting, and daily platform operations. As of early 2026, Red Hat has also been working on positioning Ansible as an execution layer for agentic AI systems, where AI agents make decisions and Ansible handles the actual infrastructure changes through ephemeral MCP (Model Context Protocol) server instances that provide role-based access control and short-lived execution contexts.

For Python developers, this is worth paying attention to because the underlying model is specifically trained for Ansible content rather than being a general-purpose code generator. The post-processing pipeline runs Ansible Lint and Ansible Risk Insights against generated suggestions, so the output tends to follow established best practices like using fully qualified collection names and consistent formatting. That said, generated code should always be reviewed before running it against production infrastructure.

Important

Ansible Lightspeed requires an active Ansible Automation Platform subscription from Red Hat and a separate subscription to IBM watsonx Code Assistant. A free trial is available for evaluation. Always validate AI-generated playbooks in a staging environment before applying them to production systems.

Key Takeaways

  1. Python is the foundation of Ansible: Every Ansible module is a Python script that runs on the target node, returns JSON, and cleans up after itself. Understanding this execution model helps you troubleshoot failures and write better automation.
  2. Playbooks replace brittle shell scripts: YAML-based playbooks provide idempotent, declarative configuration management. Use roles and fully qualified collection names to keep large playbook repositories organized and unambiguous.
  3. Custom modules extend Ansible to fit your environment: When the 4,000-plus built-in modules do not cover your use case, writing a custom Python module is straightforward. Import AnsibleModule, define your argument spec, perform your logic, and return JSON.
  4. Dynamic inventory keeps automation current: Python inventory scripts and plugins query live data sources to produce host lists at runtime, eliminating the maintenance burden of static inventory files in dynamic cloud environments.
  5. AI-assisted automation is maturing: Ansible Lightspeed represents a significant shift in how playbooks are authored, but it supplements rather than replaces the need to understand what your automation is doing and why.

The combination of Python and Ansible gives system administrators a workflow that scales from managing a handful of servers to orchestrating thousands of nodes across hybrid cloud environments. Whether you are writing your first playbook, building a custom module for an internal API, or exploring AI-assisted content creation, the underlying principle stays the same: define the desired state, let the tool figure out how to get there, and keep Python in your back pocket for everything the built-in modules cannot handle.
