Game Theory Applications with Python

Game theory provides a mathematical framework for analyzing strategic decisions where the outcome for each participant depends on the choices made by everyone involved. Python offers powerful libraries that make it possible to model these interactions, compute Nash equilibria, simulate tournaments, and explore evolutionary dynamics -- all without writing solvers from scratch. And as AI systems increasingly act as autonomous agents in multi-player environments, the questions game theory asks have never been more urgently practical.

Whether you are modeling competitive pricing strategies, analyzing cybersecurity attack-defense scenarios, or studying evolutionary biology, game theory gives you a rigorous way to reason about conflict and cooperation. Python's ecosystem for game theory has matured significantly, with Nashpy (currently version 0.0.43, requiring Python 3.10 or higher, authored by Vince Knight and James Campbell) for two-player equilibrium computation, Axelrod (version 4.14.0) for iterated Prisoner's Dilemma research, and Gambit for multi-player game solving. This article walks through practical implementations of each, building from foundational concepts to advanced simulations -- and ends with a look at the frontier where game theory meets large language model agents.

Game Theory Fundamentals for Python Developers

At its core, game theory studies situations where multiple decision-makers (called players) choose from available strategies, and the resulting payoff for each player depends on the combination of strategies chosen by all players. These interactions are typically represented using a payoff matrix, where rows correspond to one player's strategies and columns correspond to the other player's strategies.

A critical question game theory immediately forces you to ask is: what counts as "rational" here? Classical game theory assumes players are fully rational, know the game structure completely, and aim to maximize their own payoff. These assumptions are analytically tractable -- but they are also contested. Behavioral economics has documented systematic deviations from this ideal in human subjects. That tension matters when you choose how to frame the payoffs in your own models.

Consider the classic Prisoner's Dilemma. Two suspects are arrested and interrogated separately. Each can either cooperate with the other (stay silent) or defect (betray the other). The payoff matrix encodes the consequences of every possible outcome:

# Prisoner's Dilemma payoff matrix
# Rows = Player 1 strategies, Columns = Player 2 strategies
# Format: (Player 1 payoff, Player 2 payoff)

#                   Player 2
#               Cooperate   Defect
# Player 1
# Cooperate      (3, 3)     (0, 5)
# Defect         (5, 0)     (1, 1)

import numpy as np

# Payoff matrix for Player 1
player1_payoffs = np.array([
    [3, 0],   # Cooperate: (vs Cooperate, vs Defect)
    [5, 1]    # Defect:    (vs Cooperate, vs Defect)
])

# Payoff matrix for Player 2
player2_payoffs = np.array([
    [3, 5],   # vs Cooperate: (Cooperate, Defect)
    [0, 1]    # vs Defect:    (Cooperate, Defect)
])

The key concepts you need before writing any game theory code are dominant strategies (a strategy that yields a better payoff regardless of what the opponent does), Nash equilibrium (a set of strategies where no player benefits by changing their strategy unilaterally), and mixed strategies (probability distributions over available strategies rather than a single deterministic choice).
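These definitions map directly onto code. As a quick sanity check, a short NumPy sketch (the helper `strictly_dominant_row` is ours, not from any library) confirms that Defect is strictly dominant in the payoff matrix defined above:

```python
import numpy as np

# Prisoner's Dilemma payoffs for the row player, as above
player1_payoffs = np.array([[3, 0],   # Cooperate
                            [5, 1]])  # Defect

def strictly_dominant_row(payoffs):
    """Return the index of a strictly dominant row strategy, or None.

    Row i strictly dominates row j if it pays strictly more
    against every column the opponent might choose.
    """
    n_rows = payoffs.shape[0]
    for i in range(n_rows):
        if all(np.all(payoffs[i] > payoffs[j])
               for j in range(n_rows) if j != i):
            return i
    return None

dominant = strictly_dominant_row(player1_payoffs)
print(f"Dominant row strategy index: {dominant}")  # 1 = Defect
```

Running the same check on a coordination game like the Battle of the Sexes returns None, which is exactly why those games need the equilibrium machinery introduced below.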

Before you reach for a solver, there is a question the textbooks often skip: are the payoffs you assigned actually accurate? The numbers in a payoff matrix encode assumptions about value -- financial cost, reputational damage, time lost, opportunity forgone. Getting those numbers wrong produces equilibria that are mathematically precise and practically meaningless. Calibrating payoffs from real data, domain expertise, or historical outcomes is as important as choosing the right algorithm to solve the game.

Note

In the Prisoner's Dilemma, Defect is a dominant strategy for both players -- each player does better by defecting regardless of what the other chooses. The Nash equilibrium is (Defect, Defect) with payoffs (1, 1), even though mutual cooperation (3, 3) would leave both players better off. This tension between individual rationality and collective benefit is what makes the Prisoner's Dilemma so widely studied across economics, biology, political science, and computer science.

Computing Nash Equilibria with Nashpy

Nashpy is a lightweight Python library focused on two-player game analysis. It supports several algorithms for finding Nash equilibria, including support enumeration, vertex enumeration, and the Lemke-Howson algorithm. Install it with pip install nashpy. The library also supports fictitious play, a learning dynamic where players iteratively best-respond to the historical frequency of their opponent's strategies -- useful for studying whether players can discover equilibria through experience rather than prior knowledge.

Modeling Rock Paper Scissors

Rock Paper Scissors is a zero-sum game, meaning one player's gain is exactly the other player's loss. The payoff matrix assigns +1 for a win, -1 for a loss, and 0 for a tie:

import nashpy as nash
import numpy as np

# Rock Paper Scissors payoff matrix for the row player
# Rows: Rock, Paper, Scissors
# Columns: Rock, Paper, Scissors
A = np.array([
    [ 0, -1,  1],   # Rock:     ties Rock, loses to Paper, beats Scissors
    [ 1,  0, -1],   # Paper:    beats Rock, ties Paper, loses to Scissors
    [-1,  1,  0]    # Scissors: loses to Rock, beats Paper, ties Scissors
])

# Create the game (zero-sum: column player payoffs = -A)
rps = nash.Game(A)

# Find Nash equilibria using support enumeration
equilibria = list(rps.support_enumeration())
for eq in equilibria:
    print(f"Player 1 strategy: {eq[0]}")
    print(f"Player 2 strategy: {eq[1]}")
    print()

# Explore fictitious play: do players converge to the equilibrium
# through repeated experience without knowing it in advance?
iterations = 200
np.random.seed(42)
play_counts = list(rps.fictitious_play(iterations=iterations))
final_row, final_col = play_counts[-1]
total = sum(final_row)
print("Fictitious play final row frequencies (should approach 1/3 each):")
for strategy, count in zip(["Rock", "Paper", "Scissors"], final_row):
    print(f"  {strategy}: {count / total:.3f}")

The support enumeration output reveals a single mixed-strategy Nash equilibrium where both players randomize uniformly -- each strategy is played with probability 1/3. This matches the intuition that any predictable pattern in Rock Paper Scissors can be exploited by an opponent. The fictitious play simulation shows how players can converge toward this equilibrium through repeated experience, even without prior knowledge of the equilibrium itself.

Non-Zero-Sum Games: The Battle of the Sexes

Many real-world interactions are not zero-sum. In the Battle of the Sexes game, two players want to coordinate on the same activity but have different preferences. Player 1 prefers Option A, Player 2 prefers Option B, but both prefer coordination over miscoordination:

import nashpy as nash
import numpy as np

# Battle of the Sexes
# Player 1 prefers Option A, Player 2 prefers Option B
A = np.array([
    [3, 0],   # Player 1 payoffs
    [0, 2]
])

B = np.array([
    [2, 0],   # Player 2 payoffs
    [0, 3]
])

game = nash.Game(A, B)

# Find all Nash equilibria
equilibria = list(game.support_enumeration())
for i, eq in enumerate(equilibria):
    row_strategy, col_strategy = eq
    payoffs = game[row_strategy, col_strategy]
    print(f"Equilibrium {i + 1}:")
    print(f"  Player 1 strategy: {row_strategy}")
    print(f"  Player 2 strategy: {col_strategy}")
    print(f"  Expected payoffs: Player 1 = {payoffs[0]:.2f}, "
          f"Player 2 = {payoffs[1]:.2f}")
    print()

This game has three Nash equilibria: two pure-strategy equilibria (both choose A, or both choose B) and one mixed-strategy equilibrium where each player randomizes. The mixed equilibrium yields lower expected payoffs than either pure equilibrium, illustrating why coordination mechanisms like communication or conventions matter in practice. In organizations, this is the formal argument for why standards, contracts, and pre-agreed protocols create value even when participants have partially conflicting preferences.
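The lower payoff of the mixed equilibrium is easy to verify by hand. A short NumPy sketch applies the standard 2x2 indifference conditions to the matrices above, no solver needed:

```python
import numpy as np

A = np.array([[3, 0], [0, 2]])  # Player 1 payoffs
B = np.array([[2, 0], [0, 3]])  # Player 2 payoffs

# Indifference conditions for the fully mixed equilibrium:
# p = prob Player 1 chooses Option A, q = prob Player 2 chooses Option A
p = (B[1, 1] - B[1, 0]) / (B[0, 0] - B[1, 0] - B[0, 1] + B[1, 1])
q = (A[1, 1] - A[0, 1]) / (A[0, 0] - A[0, 1] - A[1, 0] + A[1, 1])

# Expected payoffs at the mixed equilibrium
p_vec = np.array([p, 1 - p])
q_vec = np.array([q, 1 - q])
u1 = p_vec @ A @ q_vec
u2 = p_vec @ B @ q_vec

print(f"p = {p:.2f}, q = {q:.2f}")             # p = 0.60, q = 0.40
print(f"Mixed payoffs: ({u1:.2f}, {u2:.2f})")  # (1.20, 1.20)
```

Both players expect 1.2 at the mixed equilibrium, strictly worse than the 2 each player gets at their less-preferred pure equilibrium.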

Pro Tip

The support_enumeration() method works well for small games but can be slow for large strategy spaces. For games with many strategies, consider vertex_enumeration() or lemke_howson_enumeration(). If you need to handle more than two players, look into the Gambit library, which supports n-player games and provides both a Python API and command-line tools. Nashpy's GitHub repository also links to a companion game theory textbook by Vince Knight that explains the mathematical underpinning of each algorithm in detail.

Building a Payoff Matrix Solver from Scratch

Understanding how Nash equilibrium computation works under the hood makes you a better practitioner. For a two-player, two-strategy game, you can find mixed-strategy equilibria by solving a system of equations. The idea is that at a mixed equilibrium, each player must be indifferent between their strategies -- otherwise they would switch entirely to the better one. This indifference condition is the engine behind the math.

import numpy as np

def find_mixed_equilibrium_2x2(A, B):
    """
    Find the mixed-strategy Nash equilibrium for a 2x2 game.

    Parameters:
        A: 2x2 numpy array of Player 1 payoffs
        B: 2x2 numpy array of Player 2 payoffs

    Returns:
        (p, q) where p is Player 1's probability of playing
        row 0, and q is Player 2's probability of playing column 0.
        Returns None if no fully mixed equilibrium exists.
    """
    # Player 2 must make Player 1 indifferent between rows:
    # q*A[0,0] + (1-q)*A[0,1] = q*A[1,0] + (1-q)*A[1,1]
    denom_q = (A[0, 0] - A[0, 1] - A[1, 0] + A[1, 1])
    if denom_q == 0:
        return None
    q = (A[1, 1] - A[0, 1]) / denom_q

    # Player 1 must make Player 2 indifferent between columns
    denom_p = (B[0, 0] - B[1, 0] - B[0, 1] + B[1, 1])
    if denom_p == 0:
        return None
    p = (B[1, 1] - B[1, 0]) / denom_p

    if 0 <= p <= 1 and 0 <= q <= 1:
        return (p, q)
    return None


# Test with the Prisoner's Dilemma
A = np.array([[3, 0], [5, 1]])
B = np.array([[3, 5], [0, 1]])

result = find_mixed_equilibrium_2x2(A, B)
if result:
    p, q = result
    print(f"Mixed equilibrium: P1 cooperates with prob {p:.3f}, "
          f"P2 cooperates with prob {q:.3f}")
else:
    print("No interior mixed equilibrium found.")
    print("This game has a dominant strategy equilibrium.")

# Test with Matching Pennies (unique mixed equilibrium)
A_mp = np.array([[1, -1], [-1, 1]])
B_mp = np.array([[-1, 1], [1, -1]])

result_mp = find_mixed_equilibrium_2x2(A_mp, B_mp)
if result_mp:
    p, q = result_mp
    print(f"\nMatching Pennies equilibrium:")
    print(f"  P1 plays Heads with prob {p:.3f}")
    print(f"  P2 plays Heads with prob {q:.3f}")

The Prisoner's Dilemma returns no mixed equilibrium because it has a dominant-strategy equilibrium -- both players always defect. Matching Pennies has a unique mixed equilibrium at (0.5, 0.5), where both players randomize equally between Heads and Tails.

Notice what happens if one player deviates slightly from their equilibrium mix. In Matching Pennies, if Player 1 plays Heads with probability 0.6 instead of 0.5, Player 2 now has a strictly better response: always play Tails. The equilibrium is not a basin that pulls deviators back; it is a knife's edge. This fragility matters whenever you use equilibrium analysis to inform real decisions, because real players rarely land exactly on a mixed equilibrium and stay there.
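A two-line computation makes the fragility concrete, using the column player's Matching Pennies payoff matrix from above:

```python
import numpy as np

B_mp = np.array([[-1, 1], [1, -1]])  # column player payoffs

# Player 1 deviates: plays Heads with probability 0.6 instead of 0.5
p1 = np.array([0.6, 0.4])

# Player 2's expected payoff for each pure response
col_payoffs = p1 @ B_mp
print(f"P2 payoff if always Heads: {col_payoffs[0]:+.2f}")  # -0.20
print(f"P2 payoff if always Tails: {col_payoffs[1]:+.2f}")  # +0.20
```

At exactly 0.5 both responses pay zero and Player 2 is indifferent; any deviation breaks the tie and hands the opponent a strictly profitable pure response.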

Extending to Larger Games with Linear Programming

For games larger than 2x2, mixed-strategy equilibria in zero-sum games can be computed using linear programming. The scipy.optimize.linprog function handles this efficiently and is the same mathematical structure behind randomized patrolling schedules and inspection protocols:

from scipy.optimize import linprog
import numpy as np

def solve_zero_sum_game(payoff_matrix):
    """
    Solve a zero-sum game using linear programming.
    Returns the optimal mixed strategy for the row player
    and the value of the game.
    """
    m, n = payoff_matrix.shape

    # Shift matrix to handle negative values
    shift = abs(payoff_matrix.min()) + 1
    M = payoff_matrix + shift

    c = np.zeros(m + 1)
    c[-1] = -1  # Minimize -v

    A_ub = np.zeros((n, m + 1))
    for j in range(n):
        A_ub[j, :m] = -M[:, j]
        A_ub[j, -1] = 1
    b_ub = np.zeros(n)

    A_eq = np.zeros((1, m + 1))
    A_eq[0, :m] = 1
    b_eq = np.array([1.0])

    bounds = [(0, None)] * m + [(None, None)]

    result = linprog(c, A_ub=A_ub, b_ub=b_ub,
                     A_eq=A_eq, b_eq=b_eq, bounds=bounds)

    strategy = result.x[:m]
    value = result.x[-1] - shift
    return strategy, value


# Solve Rock Paper Scissors
rps_matrix = np.array([
    [ 0, -1,  1],
    [ 1,  0, -1],
    [-1,  1,  0]
])

strategy, value = solve_zero_sum_game(rps_matrix)
print(f"Optimal strategy: {np.round(strategy, 4)}")
print(f"Game value: {value:.4f}")
# Expected: strategy ≈ [0.333, 0.333, 0.333], value ≈ 0.0

The solver confirms the intuitive result: play each option with equal probability (1/3, 1/3, 1/3), and the game value is 0 (fair, no advantage to either player). The LP formulation generalizes to any zero-sum game of any size -- it is the same structure used in some real-world randomized security patrol schedules.

Simulating the Iterated Prisoner's Dilemma with Axelrod

The Axelrod library brings Robert Axelrod's famous iterated Prisoner's Dilemma tournaments to Python. The original 1980 tournament invited researchers to submit strategies that would play the Prisoner's Dilemma repeatedly against every other strategy. Only 14 strategies entered. The winner was Tit For Tat. Axelrod identified four properties of that winning strategy: it was never first to defect (nice), it retaliated immediately (retaliatory), it forgave after retaliation (forgiving), and it was easy for an opponent to read (clear). The modern library ships with over 200 strategies. Install it with pip install axelrod.

Running a Tournament

import axelrod as axl
import numpy as np

# Select a variety of well-known strategies
players = [
    axl.TitForTat(),
    axl.Cooperator(),
    axl.Defector(),
    axl.Random(),
    axl.TitFor2Tats(),
    axl.Alternator(),
    axl.WinStayLoseShift(),
    axl.Grudger(),
]

# Create and run the tournament with a fixed seed for reproducibility
tournament = axl.Tournament(players, turns=200, repetitions=5, seed=42)
results = tournament.play()

# Display rankings with normalised average scores
print("Tournament Rankings:")
print("-" * 50)
for i, name in enumerate(results.ranked_names):
    avg_score = np.mean(
        results.normalised_scores[results.ranking[i]]
    )
    print(f"  {i + 1}. {name:35s}  avg: {avg_score:.3f}")

Tit For Tat and similar reciprocal strategies tend to finish near the top. Pure cooperators get exploited by defectors, while pure defectors miss out on the mutual cooperation bonus that reciprocal strategies earn when playing each other. Win Stay Lose Shift (also known as Pavlov) is particularly interesting: it cooperates after mutual cooperation or mutual defection, and defects after unilateral outcomes. It recovers from accidental defections more gracefully than Tit For Tat, which answers a single noisy defection with a costly echo of alternating retaliation.
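The contrast is easy to see without any library. This dependency-free sketch (plain Python with the standard payoffs R=3, S=0, T=5, P=1; the helper names are ours, not Axelrod's) plays each strategy against itself and forces a single noisy defection:

```python
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def tit_for_tat(my_hist, opp_hist):
    return opp_hist[-1] if opp_hist else "C"

def win_stay_lose_shift(my_hist, opp_hist):
    if not my_hist:
        return "C"
    # Stay after a good payoff (R=3 or T=5), shift after a bad one (S=0 or P=1)
    last_payoff = PAYOFF[(my_hist[-1], opp_hist[-1])]
    if last_payoff >= 3:
        return my_hist[-1]
    return "D" if my_hist[-1] == "C" else "C"

def play(strategy, turns=12, flip_round=4):
    """Self-play with one forced defection by player 1 at flip_round."""
    h1, h2 = [], []
    for t in range(turns):
        a1 = strategy(h1, h2)
        a2 = strategy(h2, h1)
        if t == flip_round:
            a1 = "D"  # the single noisy move
        h1.append(a1)
        h2.append(a2)
    return h1, h2

for name, strat in [("TitForTat", tit_for_tat),
                    ("WinStayLoseShift", win_stay_lose_shift)]:
    h1, h2 = play(strat)
    print(f"{name:18s}: {''.join(h1)} / {''.join(h2)}")
```

Tit For Tat answers the error with an alternating retaliation pattern that never heals, while Win Stay Lose Shift passes through one round of mutual defection and is back to full cooperation two rounds after the error.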

Research Note

A 2024 study published in PLOS Computational Biology (Glynatsi, Knight, and Harper, DOI: 10.1371/journal.pcbi.1012644) reexamined Axelrod's original results using over 195 strategies in thousands of tournaments under diverse conditions. Strategies considered dominant in Axelrod's controlled scenarios frequently failed against a wider variety of opponents. The key finding: winning strategies are not only nice and reciprocal, but also clever, slightly envious, and highly adaptable to their surrounding strategic environment. Adaptability proved more decisive than any single behavioral principle. The analyses relied entirely on the Axelrod-Python library, demonstrating the library's role as the de facto standard for reproducible iterated Prisoner's Dilemma research.

Creating a Custom Strategy

Defining your own strategy in Axelrod is straightforward. Here is a strategy called "Cautious Reciprocator" that starts by cooperating, tracks the opponent's cooperation rate, and defects if that rate drops below a threshold:

import axelrod as axl

class CautiousReciprocator(axl.Player):
    """
    Cooperates as long as the opponent's cooperation rate
    stays above a configurable threshold.
    """
    name = "Cautious Reciprocator"
    classifier = {
        "memory_depth": float("inf"),
        "stochastic": False,
        "long_run_time": False,
        "inspects_source": False,
        "manipulates_source": False,
        "manipulates_state": False,
    }

    def __init__(self, threshold=0.5):
        super().__init__()
        self.threshold = threshold

    def strategy(self, opponent):
        if not self.history:
            return axl.Action.C

        coop_rate = opponent.history.cooperations / len(opponent.history)
        return axl.Action.C if coop_rate >= self.threshold else axl.Action.D


# Test the custom strategy
player1 = CautiousReciprocator(threshold=0.6)
player2 = axl.Random(p=0.7)  # Cooperates 70% of the time

match = axl.Match([player1, player2], turns=20, seed=7)
interactions = match.play()

print("Match history:")
for round_num, (a1, a2) in enumerate(interactions, 1):
    print(f"  Round {round_num:2d}: "
          f"CautiousReciprocator={a1}, Random={a2}")

print(f"\nFinal scores: {match.final_score()}")

A natural follow-up question: what threshold value optimizes performance across a mixed population? You can explore this systematically by running the custom strategy against the full library of opponents at different threshold values. Setting threshold=0.5 makes the strategy more forgiving; setting it above 0.7 makes it stricter. Neither is universally better -- the optimal threshold depends on the composition of the population you expect to encounter. This is the kind of question evolutionary simulation can answer that single-match analysis cannot.
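One way to start that exploration without a full Axelrod tournament is a dependency-free sketch that scores the same decision rule against random cooperators at several thresholds (the helper names and the opponent population are our illustrative choices):

```python
import random

PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def cautious_move(opp_hist, threshold):
    """Same rule as CautiousReciprocator: cooperate while the
    opponent's cooperation rate stays at or above threshold."""
    if not opp_hist:
        return "C"
    rate = opp_hist.count("C") / len(opp_hist)
    return "C" if rate >= threshold else "D"

def score_vs_random(threshold, p_coop, turns=200, rng=None):
    """Average per-turn score against an opponent who cooperates
    with probability p_coop."""
    rng = rng or random.Random(0)
    my_score, opp_hist = 0, []
    for _ in range(turns):
        mine = cautious_move(opp_hist, threshold)
        theirs = "C" if rng.random() < p_coop else "D"
        my_score += PAYOFF[(mine, theirs)][0]
        opp_hist.append(theirs)
    return my_score / turns

# Sweep thresholds against a small mixed population of random cooperators
population = [0.2, 0.5, 0.8]
for threshold in [0.3, 0.5, 0.7, 0.9]:
    avg = sum(score_vs_random(threshold, p, rng=random.Random(42))
              for p in population) / len(population)
    print(f"threshold={threshold:.1f}: avg score {avg:.2f}")
```

The ranking depends entirely on the population mix, which is exactly the article's point: swap in a different set of p_coop values and the best threshold changes.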

Evolutionary Game Theory and Replicator Dynamics

Evolutionary game theory extends classical game theory by modeling how strategy populations change over time. Instead of assuming rational players, it models populations of agents using different strategies. Strategies that earn higher payoffs spread faster; lower-performing strategies decline. The replicator equation formalizes this dynamic -- and notably, it does not assume anyone is consciously optimizing. It only requires that more successful strategies spread faster, however that spreading occurs: through imitation, learning, or literal reproduction.

import numpy as np
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt

def replicator_dynamics(payoff_matrix, x0, timesteps=1000, dt=0.01):
    """
    Simulate replicator dynamics for a symmetric game.

    Parameters:
        payoff_matrix: n x n numpy array of payoffs
        x0: initial population shares (must sum to 1)
        timesteps: number of simulation steps
        dt: time step size

    Returns:
        history: array of shape (timesteps, n)
    """
    n = len(x0)
    x = np.array(x0, dtype=float)
    history = np.zeros((timesteps, n))

    for t in range(timesteps):
        history[t] = x
        fitness = payoff_matrix @ x
        avg_fitness = x @ fitness
        # Replicator equation: dx_i/dt = x_i * (f_i - f_avg)
        dx = x * (fitness - avg_fitness) * dt
        x = np.maximum(x + dx, 0)
        x = x / x.sum()

    return history


# Hawk-Dove game
# V = value of resource, C = cost of fighting
V, C = 4, 6

hawk_dove_matrix = np.array([
    [(V - C) / 2, V],     # Hawk vs [Hawk, Dove]
    [0, V / 2]            # Dove vs [Hawk, Dove]
])

# Start with 80% Hawks, 20% Doves
x0 = [0.8, 0.2]
history = replicator_dynamics(hawk_dove_matrix, x0,
                              timesteps=2000, dt=0.01)

# ESS for Hawk-Dove: proportion of Hawks = V/C
print(f"Expected ESS Hawk proportion (V/C): {V/C:.4f}")
print(f"Final simulated Hawk proportion:    {history[-1, 0]:.4f}")
print(f"Final simulated Dove proportion:    {history[-1, 1]:.4f}")

# Plot
time = np.arange(len(history)) * 0.01
plt.figure(figsize=(10, 6))
plt.plot(time, history[:, 0], label="Hawk", linewidth=2)
plt.plot(time, history[:, 1], label="Dove", linewidth=2)
plt.axhline(y=V/C, color="gray", linestyle="--", alpha=0.5,
            label=f"ESS = {V/C:.2f}")
plt.xlabel("Time")
plt.ylabel("Population Share")
plt.title("Replicator Dynamics: Hawk-Dove Game")
plt.legend()
plt.grid(alpha=0.3)
plt.tight_layout()
plt.savefig("hawk_dove_dynamics.png", dpi=150)
print("Plot saved as hawk_dove_dynamics.png")

In the Hawk-Dove game with V=4 and C=6, the population converges to an evolutionarily stable strategy (ESS) where 2/3 of the population plays Hawk and 1/3 plays Dove. This matches the mixed Nash equilibrium of the one-shot game, demonstrating the connection between classical and evolutionary game theory.

It is worth varying V and C systematically. When V exceeds C, the ESS is pure Hawk -- aggression takes over entirely. When C dwarfs V, the ESS is a stable mixed population. This formalizes why escalation dynamics differ between domains: a dispute over a small, replaceable resource behaves differently from one over a scarce, high-value asset. Try running the simulation with V=8, C=6 and V=4, C=10 to see the qualitative shift.
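A compact sweep makes the shift visible. This sketch inlines the same replicator update as the function above (final_hawk_share is our helper, not a library function):

```python
import numpy as np

def final_hawk_share(V, C, x0=0.5, steps=20000, dt=0.01):
    """Run Hawk-Dove replicator dynamics, return the final Hawk share."""
    M = np.array([[(V - C) / 2, V], [0, V / 2]], dtype=float)
    x = np.array([x0, 1 - x0])
    for _ in range(steps):
        f = M @ x                       # fitness of each strategy
        x = x + x * (f - x @ f) * dt    # replicator update
        x = np.clip(x, 0, None)
        x /= x.sum()
    return x[0]

for V, C in [(4, 6), (8, 6), (4, 10)]:
    share = final_hawk_share(V, C)
    predicted = min(V / C, 1.0)  # ESS: V/C Hawks, capped at pure Hawk
    print(f"V={V}, C={C}: simulated {share:.3f}, predicted {predicted:.3f}")
```

With V=8 > C=6 the population converges to pure Hawk; with V=4, C=10 it settles at a 0.4 Hawk share, matching V/C.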

Moran Process with Axelrod

The Axelrod library supports the Moran process, a stochastic model of evolution in finite populations. At each step, one player is chosen for reproduction (proportional to fitness) and replaces a randomly chosen player:

import axelrod as axl

# Population with three strategy types
players = [
    axl.TitForTat(), axl.TitForTat(),
    axl.Cooperator(), axl.Cooperator(),
    axl.Defector(), axl.Defector(),
]

# Run the Moran process with a fixed seed for reproducibility
mp = axl.MoranProcess(players, turns=100, seed=13)
populations = mp.play()

print("Moran Process Evolution:")
for i, pop in enumerate(populations):
    print(f"  Generation {i}: {dict(pop)}")

print(f"\nWinning strategy: {mp.winning_strategy_name}")

Because the Moran process is stochastic, results vary between runs. Over many repetitions, Tit For Tat tends to dominate -- it earns high mutual-cooperation payoffs against itself and cooperators while defending against defectors. But this is not guaranteed: in small populations, random drift can occasionally let Defectors win even when initially outnumbered. The seed parameter makes your results reproducible and is essential for scientific reporting.
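The mechanics of the process itself fit in a few lines. This dependency-free sketch of the Moran update (our helper, not the Axelrod implementation) reproduces the classic neutral-drift result that a mutant type starting with i copies in a population of N fixates with probability i/N:

```python
import random

def moran_fixation(n_mutants, pop_size, fitness_mutant=1.0,
                   fitness_resident=1.0, trials=2000, seed=7):
    """Estimate the fixation probability of a mutant type under the
    Moran process: each step, one individual reproduces (chosen
    proportionally to fitness) and its offspring replaces a
    uniformly random individual."""
    rng = random.Random(seed)
    fixations = 0
    for _ in range(trials):
        k = n_mutants  # current number of mutants
        while 0 < k < pop_size:
            total = k * fitness_mutant + (pop_size - k) * fitness_resident
            birth_is_mutant = rng.random() < k * fitness_mutant / total
            death_is_mutant = rng.random() < k / pop_size
            k += int(birth_is_mutant) - int(death_is_mutant)
        fixations += (k == pop_size)
    return fixations / trials

# Neutral drift: theory predicts fixation probability i / N = 0.3
est = moran_fixation(n_mutants=3, pop_size=10)
print(f"Estimated fixation probability: {est:.3f} (theory: 0.300)")
```

Raising fitness_mutant above 1.0 tilts the drift in the mutant's favor, which is exactly the mechanism by which high-scoring strategies like Tit For Tat take over the Axelrod populations above.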

Real-World Applications

Game theory is not just an academic exercise. Here are areas where Python game theory implementations provide practical value -- along with deeper considerations than standard treatments typically cover.

Cybersecurity: Nash Games and Stackelberg Games

Security teams can model the interaction between an attacker choosing which vulnerability to exploit and a defender allocating limited resources. The payoff matrix reflects the cost of a successful breach versus the cost of defense:

import nashpy as nash
import numpy as np

# Cybersecurity attack-defense game
# Attacker chooses: Web App, Network, Social Engineering
# Defender allocates budget to: Web App, Network, Training

# Defender payoffs (negative = cost of breach, 0 = defended)
defender_payoffs = np.array([
    [ 0, -8, -5],
    [-7,  0, -6],
    [-9, -4,  0]
])

# Attacker payoffs (positive = successful breach)
attacker_payoffs = np.array([
    [ 0,  8,  5],
    [ 7,  0,  6],
    [ 9,  4,  0]
])

game = nash.Game(attacker_payoffs, defender_payoffs)
equilibria = list(game.support_enumeration())

print("Attack-Defense Nash Equilibria:")
strategies = ["Web App", "Network", "Social Eng"]
for eq in equilibria:
    print(f"\n  Attacker mixed strategy:")
    for s, p in zip(strategies, eq[0]):
        if p > 0.001:
            print(f"    {s}: {p:.1%}")
    print(f"  Defender mixed strategy:")
    for s, p in zip(strategies, eq[1]):
        if p > 0.001:
            print(f"    {s}: {p:.1%}")

The equilibrium tells both players their optimal randomization. For the defender, this translates directly into a resource allocation strategy -- rather than concentrating all budget in one area, the equilibrium provides a principled way to spread defenses according to threat severity.

But this simultaneous-move framing has an important limitation: it assumes attacker and defender act without knowledge of each other's current choice. In many real security scenarios, the defender commits first -- announcing a patching schedule, publishing a security policy, or deploying observable defenses. The attacker then responds to what is visible. This sequential structure is better captured by a Stackelberg game, where the defender is the leader and the attacker is a follower who best-responds to the defender's committed strategy.

Research published in the Journal of Cybersecurity (Huang et al., 2024, DOI: 10.1093/cybsec/tyae009) examined Stackelberg interdependent security games and found that sequential structure changes strategic incentives significantly -- in some network configurations, defenders overinvest relative to socially optimal levels because the presence of attackers can turn coordination among defenders into competition.

from scipy.optimize import minimize
import numpy as np

def stackelberg_security(attacker_payoffs, defender_payoffs, n_strategies):
    """
    Solve a Stackelberg security game where the defender
    commits first and the attacker best-responds.
    Uses backward induction via scipy minimize. Note: the attacker's
    argmax makes the objective piecewise, so SLSQP may return a
    local optimum; multiple random restarts help in practice.
    """
    def defender_loss(defender_mix):
        # Attacker best-responds to the defender's committed mix
        attacker_expected = attacker_payoffs @ defender_mix
        best_attack_idx = np.argmax(attacker_expected)
        # Minimize the defender's worst-case loss
        return -defender_payoffs[best_attack_idx] @ defender_mix

    constraints = [{"type": "eq", "fun": lambda x: np.sum(x) - 1}]
    bounds = [(0, 1)] * n_strategies
    x0 = np.ones(n_strategies) / n_strategies

    result = minimize(defender_loss, x0, method="SLSQP",
                      bounds=bounds, constraints=constraints)
    return result.x, -result.fun


att = np.array([[0, 8, 5], [7, 0, 6], [9, 4, 0]], dtype=float)
def_ = np.array([[0, -8, -5], [-7, 0, -6], [-9, -4, 0]], dtype=float)

stackelberg_mix, value = stackelberg_security(att, def_, n_strategies=3)
strategies = ["Web App", "Network", "Social Eng"]

print("Stackelberg defender strategy (commits first):")
for s, p in zip(strategies, stackelberg_mix):
    print(f"  {s}: {p:.1%}")
print(f"Worst-case defender expected value: {value:.2f}")

Compare the Stackelberg result with the Nash equilibrium above. The Stackelberg defender typically concentrates more heavily on high-threat attack vectors, since committing to a strategy allows the defender to anticipate the rational attacker's best response. This framing is more realistic for defenders who publish security policies or make observable infrastructure choices before attackers decide where to strike.

Pricing Strategy

Two competing firms choosing between high and low pricing can be modeled as a game. The payoffs represent market share and profit under each scenario:

import nashpy as nash
import numpy as np

# Pricing game: High Price, Medium Price, Low Price

firm1 = np.array([
    [12,  8,  2],
    [14, 10,  5],
    [ 9,  7,  4]
])

firm2 = np.array([
    [12, 14,  9],
    [ 8, 10,  7],
    [ 2,  5,  4]
])

game = nash.Game(firm1, firm2)
equilibria = list(game.support_enumeration())

prices = ["High", "Medium", "Low"]
for i, eq in enumerate(equilibria):
    payoffs = game[eq[0], eq[1]]
    print(f"Equilibrium {i + 1}:")
    print(f"  Firm 1: {dict(zip(prices, np.round(eq[0], 3)))}")
    print(f"  Firm 2: {dict(zip(prices, np.round(eq[1], 3)))}")
    print(f"  Expected profits: Firm 1 = ${payoffs[0]:.1f}M, "
          f"Firm 2 = ${payoffs[1]:.1f}M")
    print()

One question this model leaves unasked: what happens when the pricing game repeats indefinitely? In a one-shot game, both firms have incentive to undercut. But in a repeated game with no fixed endpoint, the folk theorem tells us that cooperation on high prices can be sustained as an equilibrium -- provided firms care enough about future payoffs and retaliation is credible. The same payoff matrix can produce radically different outcomes depending on whether you model the interaction as one-shot or repeated. Real antitrust investigations often turn on exactly this distinction.
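The folk-theorem logic can be made concrete with the matrices above. In the one-shot game, Medium strictly dominates the other prices for both firms, so (Medium, Medium) with payoffs (10, 10) is the unique equilibrium and the natural punishment point for a grim-trigger agreement on High prices. A firm contemplating deviation weighs the one-shot gain against the discounted loss of future cooperation (a sketch using the payoffs above; the game is symmetric in the relevant entries, so one firm's calculation covers both):

```python
# Relevant payoffs for either firm under a grim-trigger agreement:
cooperate = 12   # payoff at (High, High) every period
deviate = 14     # one-shot payoff from undercutting to Medium vs High
punish = 10      # payoff at the one-shot equilibrium (Medium, Medium)

# Grim trigger sustains High prices iff, with discount factor d:
#   cooperate / (1 - d) >= deviate + d * punish / (1 - d)
# Rearranging gives the critical discount factor:
critical_delta = (deviate - cooperate) / (deviate - punish)
print(f"Cooperation on High prices is sustainable for delta >= "
      f"{critical_delta:.2f}")
# Prints: Cooperation on High prices is sustainable for delta >= 0.50
```

So if each firm values next period's profit at half of this period's or more, high prices are an equilibrium of the repeated game, even though they are unsustainable in the one-shot model.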

Auction Design

Game theory underpins auction design. A sealed-bid auction simulation shows how bidders should shade their bids below their true valuations in a first-price auction:

import numpy as np

def simulate_first_price_auction(n_bidders, n_simulations=10000, seed=42):
    """
    First-price sealed-bid auction.
    Bidder valuations drawn uniformly from [0, 1].
    Optimal bid: (n-1)/n * valuation (Bayes-Nash equilibrium strategy).
    """
    rng = np.random.default_rng(seed)
    results = {"naive": [], "optimal": []}

    for _ in range(n_simulations):
        valuations = rng.uniform(0, 1, n_bidders)

        # Naive: bid true value (zero profit for winner)
        naive_bids = valuations.copy()
        naive_winner = np.argmax(naive_bids)
        results["naive"].append(
            valuations[naive_winner] - naive_bids[naive_winner]
        )

        # Optimal: shade bids by factor (n-1)/n
        shade_factor = (n_bidders - 1) / n_bidders
        optimal_bids = valuations * shade_factor
        optimal_winner = np.argmax(optimal_bids)
        results["optimal"].append(
            valuations[optimal_winner] - optimal_bids[optimal_winner]
        )

    print(f"First-Price Auction ({n_bidders} bidders, {n_simulations} sims)")
    print(f"  Naive bidding - Avg winner profit:   "
          f"${np.mean(results['naive']):.4f}")
    print(f"  Optimal bidding - Avg winner profit: "
          f"${np.mean(results['optimal']):.4f}")

simulate_first_price_auction(n_bidders=3)
print()
simulate_first_price_auction(n_bidders=10)

With naive bidding, the winner always earns zero profit. The game-theoretic optimal strategy -- shading your bid by (n-1)/n -- produces positive expected profit. As the number of bidders increases, the shade factor approaches 1 and profits shrink, reflecting increased competition.

Note that the optimal shading formula assumes private values drawn uniformly and complete bidder rationality. Real auctions involve correlated values, budget constraints, risk aversion, and the winner's curse (overpaying because winning implies your estimate was the highest, not the most accurate). Revenue equivalence -- the theorem that all standard auction formats produce the same expected revenue under ideal conditions -- is a beautiful theoretical result, but its assumptions fail often enough that auction designers routinely run heterogeneous, boundedly rational simulations before committing to a format.
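The winner's curse itself is simple to demonstrate in a common-value setting. In this sketch (the value range and noise level are illustrative), the item is worth the same to every bidder, each bidder sees an unbiased noisy signal, and everyone naively bids their signal:

```python
import numpy as np

def winners_curse_demo(n_bidders=8, n_sims=20000, seed=11):
    """Common-value auction: the item is worth the same to everyone,
    but each bidder sees only an unbiased noisy signal and naively
    bids it. The winner is the bidder with the largest positive
    signal error, so the winner's average profit is negative."""
    rng = np.random.default_rng(seed)
    value = rng.uniform(5, 10, n_sims)                     # true common value
    signals = value[:, None] + rng.normal(0, 1, (n_sims, n_bidders))
    winning_bids = signals.max(axis=1)                     # naive: bid = signal
    profit = value - winning_bids
    print(f"{n_bidders} bidders: avg winner profit {profit.mean():+.3f}")
    return profit.mean()

winners_curse_demo(n_bidders=2)
winners_curse_demo(n_bidders=8)
```

The effect worsens as bidders are added: more draws mean a larger maximum signal error, so naive winners overpay by more. Rational common-value bidders shade below their signals for this reason, on top of the surplus-capturing motive in the private-value case.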

Caution

Game theory models are simplifications of reality. Real markets involve repeated interactions, incomplete information, regulatory constraints, and behavioral biases that pure game theory does not capture. Use these models as analytical starting points and complement them with domain expertise and empirical data. The goal of computation is not to replace judgment -- it is to make the assumptions behind a judgment explicit and testable.

When Nash Equilibrium Is Not Enough

Nash equilibrium is a powerful concept, but it has well-documented failure modes that practitioners regularly encounter.

Multiple equilibria. Games frequently have more than one Nash equilibrium, as the Battle of the Sexes illustrates. When that happens, the theory provides no mechanism for selecting among them. In practice, selection happens through communication, social norms, or focal points -- outcomes that stand out for cultural or contextual reasons. Python models that output multiple equilibria are not broken; they are surfacing a genuine ambiguity that requires non-game-theoretic judgment to resolve.
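
This ambiguity is easy to surface in code. A minimal sketch, using one common payoff convention for the Battle of the Sexes (the specific numbers are illustrative, not from a library): enumerate pure equilibria with a best-response check, then recover the mixed equilibrium from the 2x2 indifference condition.

```python
import numpy as np

# Battle of the Sexes, one common payoff convention (rows = player 1).
A = np.array([[3, 0], [0, 2]])  # row player's payoffs
B = np.array([[2, 0], [0, 3]])  # column player's payoffs

# Pure equilibria: cells (i, j) where i is a best response to j and vice versa.
pure = [
    (i, j)
    for i in range(2)
    for j in range(2)
    if A[i, j] == A[:, j].max() and B[i, j] == B[i, :].max()
]
print("Pure Nash equilibria:", pure)  # [(0, 0), (1, 1)]

# Mixed equilibrium via the indifference condition: the column player's
# probability q on column 0 makes the row player indifferent, and vice versa.
q = (A[1, 1] - A[0, 1]) / (A[0, 0] - A[0, 1] - A[1, 0] + A[1, 1])
p = (B[1, 1] - B[1, 0]) / (B[0, 0] - B[0, 1] - B[1, 0] + B[1, 1])
print(f"Mixed equilibrium: row plays action 0 with prob {p:.2f}, "
      f"column with prob {q:.2f}")  # p = 0.60, q = 0.40
```

Three equilibria, no tiebreaker: the code faithfully reports the ambiguity, and nothing in the math resolves it.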

Computational complexity. Computing Nash equilibria is PPAD-complete for general games. PPAD-complete problems are widely believed to admit no polynomial-time algorithm, although -- unlike NP-complete decision problems -- a solution is guaranteed to exist. For small games, the algorithms in Nashpy and Gambit are fast. For larger games -- multi-player, large strategy spaces, or games embedded in reinforcement learning loops -- exact computation becomes intractable and approximation methods are required.

Equilibrium as prediction versus prescription. Nash equilibrium tells you where a game could settle if players are rational and have correct beliefs. It does not tell you how players get there, or how long it takes. Evolutionary dynamics and fictitious play explore convergence, but many games have non-convergent or chaotic learning dynamics. If you are using game theory to advise a decision-maker, the difference between "this is an equilibrium" and "your opponent will actually play this" is critical.
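
Fictitious play can be sketched in a few lines: each round, every player best-responds to the opponent's empirical action frequencies so far. On Matching Pennies -- a zero-sum game where empirical frequencies are known to converge -- realized play cycles endlessly while the frequencies drift toward the mixed equilibrium of (1/2, 1/2). A minimal sketch (the pseudo-count initialization is a common convenience, not canonical):

```python
import numpy as np

# Matching Pennies: row wins (+1) on a match, column wins on a mismatch.
A = np.array([[1, -1], [-1, 1]])   # row player's payoffs; column's are -A

counts_row = np.array([1.0, 1.0])  # row player's action history (pseudo-counts)
counts_col = np.array([1.0, 1.0])  # column player's action history

for _ in range(50_000):
    # Each player best-responds to the opponent's empirical mix so far.
    row_action = np.argmax(A @ (counts_col / counts_col.sum()))
    col_action = np.argmax(-(counts_row / counts_row.sum()) @ A)
    counts_row[row_action] += 1
    counts_col[col_action] += 1

freq_row = counts_row / counts_row.sum()
print(f"Row empirical frequencies: {freq_row.round(3)}")  # drifts toward [0.5, 0.5]
```

The frequencies converge; the realized action sequence never does. That gap between "the average settles" and "play settles" is the prediction-versus-prescription distinction in miniature.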

Incomplete information and Bayesian games. The examples in this article assume players know the payoff structure. Many real interactions involve private information -- a bidder's true valuation, a firm's actual cost structure, an attacker's real capability. Bayesian Nash equilibria handle this, but they require specifying players' beliefs about each other's private information, which is often unavailable. Assuming complete information when information is actually incomplete can make a model worse than no model at all.
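
Even a minimal one-sided-information example shows how beliefs enter the analysis. In this hypothetical entry game (all payoff numbers invented for illustration), the entrant's best response depends entirely on its belief p about the incumbent's privately known type:

```python
# Hypothetical entry game with one-sided private information.
# The incumbent is "tough" (fights entry) with probability p, which the
# entrant holds only as a belief. Entrant payoffs (illustrative numbers):
#   stay out:                   0
#   enter vs tough incumbent:  -2  (price war)
#   enter vs weak incumbent:   +3  (accommodated)

def entrant_best_response(p_tough: float) -> str:
    """Best response given the entrant's belief about the incumbent's type."""
    expected_enter = p_tough * (-2) + (1 - p_tough) * 3
    return "enter" if expected_enter > 0 else "stay out"

# Indifference point: -2p + 3(1 - p) = 0, i.e. p = 3/5.
for p in (0.2, 0.6, 0.9):
    print(f"belief p(tough) = {p}: {entrant_best_response(p)}")
```

The entire analysis hinges on a number -- p -- that no payoff matrix supplies. If that belief is wrong, the "equilibrium" recommendation is wrong with it, which is the point of the paragraph above.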

When to use Stackelberg instead. Whenever one player commits to a strategy before the other acts -- and that commitment is observable -- the Stackelberg formulation is more appropriate than Nash. Common real-world examples: a regulator publishing compliance requirements before firms decide whether to comply; a defender deploying observable security controls before an attacker selects a vector; a dominant firm announcing prices before smaller rivals respond.
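
The commitment advantage can be made concrete with a standard linear-duopoly sketch (the demand and cost parameters below are hypothetical): the Stackelberg leader optimizes against the follower's best-response curve, rather than against a fixed simultaneous-move opponent as in Cournot competition.

```python
import numpy as np

# Linear duopoly: price = a - b*(q1 + q2), constant marginal cost c.
a, b, c = 10.0, 1.0, 2.0   # hypothetical parameter values

def follower_best_response(q1: float) -> float:
    # Maximizing (a - b*(q1 + q2) - c) * q2 over q2 gives the reaction curve.
    return max((a - c - b * q1) / (2 * b), 0.0)

def leader_profit(q1: float) -> float:
    q2 = follower_best_response(q1)
    return (a - b * (q1 + q2) - c) * q1

# Stackelberg: the leader grid-searches against the follower's reaction curve.
grid = np.linspace(0, (a - c) / b, 10_001)
q1_star = grid[np.argmax([leader_profit(q) for q in grid])]

# Cournot (simultaneous play) benchmark: q_i = (a - c) / (3*b).
q_cournot = (a - c) / (3 * b)

print(f"Stackelberg leader quantity: {q1_star:.3f}")    # (a-c)/(2b) = 4.000
print(f"Cournot quantity:            {q_cournot:.3f}")
print(f"Leader profit: {leader_profit(q1_star):.3f} vs "
      f"Cournot profit: {(a - c) ** 2 / (9 * b):.3f}")
```

The leader produces more and earns more than in the simultaneous-move game -- the value of observable commitment, which a Nash analysis of the simultaneous game would never surface.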

Game Theory and AI Agents

One of the most active research frontiers in 2025 and 2026 is the intersection of game theory with large language model (LLM) agents. As LLMs increasingly act as autonomous decision-makers in multi-agent systems, the strategic questions that game theory studies have become directly relevant to AI safety, coordination, and system design.

Research presented at the 2025 International Conference on Computational Linguistics introduced Alympics, a simulation framework in which LLM agents play game-theoretic scenarios against each other, including multi-round resource auctions. The framework provides a controlled, scalable, and reproducible platform for studying strategic behavior that is difficult to observe in purely human experiments. (Source: Mao et al., COLING 2025, "ALYMPICS: LLM Agents Meet Game Theory")

A complementary paper from December 2025 extended the FAIRGAME framework to evaluate LLM behavior in repeated social dilemmas, including a payoff-scaled Prisoner's Dilemma and a multi-agent Public Goods Game. The results revealed consistent behavioral signatures across models and languages: sensitivity to the magnitude of payoffs, cross-linguistic divergence in cooperation rates, and a tendency toward defection as games approached their endpoints -- a pattern called "end-game defection" in the human experimental literature as well. (Source: Huynh et al., arXiv:2512.07462, 2025)

"Understanding LLM strategic behaviour has profound implications for safety, coordination, and AI-driven infrastructure design." -- Huynh et al., arXiv:2512.07462 (December 2025)

What does this mean for Python developers building multi-agent systems? If you are building a pipeline where multiple LLM agents negotiate, allocate resources, or compete for attention and compute, the equilibrium analysis and evolutionary simulations in this article give you tools to reason about what stable behaviors might emerge -- and which ones are fragile.

It also raises a question the field is actively working on: do LLM agents converge to Nash equilibria when placed in strategic environments, or do they converge to something else, shaped more by their training distribution than by rational calculation? Early evidence points to "something else" -- which means Nash equilibrium-based predictions about multi-agent LLM systems should be treated with extra skepticism until validated empirically. A published survey of this intersection (IJCAI-25, "Game Theory Meets Large Language Models") identified systematic deviations from classical rationality and called for richer theoretical models specifically designed for LLM-based strategic actors.

This is genuinely uncharted territory. The Python tools in this article are already being used to probe it. If you are building agents that interact with each other or with humans in strategic settings, game-theoretic simulation is not optional scaffolding -- it is how you verify that the system does what you think it does.

Key Takeaways

  1. Nashpy (v0.0.43, Python 3.10+) handles two-player equilibrium computation cleanly. It implements support enumeration, vertex enumeration, Lemke-Howson, and fictitious play. For multi-player games, Gambit offers broader coverage.
  2. The Axelrod library (v4.14.0) is the de facto standard for iterated Prisoner's Dilemma research. With over 200 strategies, full tournament infrastructure, and Moran process support, it enables reproducible research. A 2024 PLOS Computational Biology study confirmed that adaptability to context is more decisive than any single behavioral principle.
  3. Building solvers from scratch deepens understanding. Implementing the indifference condition for 2x2 games and LP-based solvers for zero-sum games connects mathematical theory directly to inspectable code.
  4. Evolutionary game theory bridges strategy and population dynamics. The replicator equation converges to evolutionarily stable strategies. Systematically varying V and C in Hawk-Dove reveals qualitative phase transitions that pure equilibrium analysis obscures.
  5. Nash equilibrium has important limits. Multiple equilibria, PPAD-completeness, prediction-versus-prescription confusion, and incomplete information are common failure modes. Know when Stackelberg, Bayesian, or evolutionary framing is more appropriate.
  6. Game theory increasingly applies to AI agent design. Attack-defense games (Nash and Stackelberg), pricing strategy, auction design, and multi-agent LLM systems are all domains where game-theoretic Python code produces actionable insight -- and the LLM agent frontier is expanding rapidly.

Game theory gives you a structured way to reason about strategic interactions, and Python makes it accessible enough to move from theory to working simulations in an afternoon. Start with a simple 2x2 game in Nashpy, run a Prisoner's Dilemma tournament in Axelrod, then extend to your own domain-specific models. The libraries are actively maintained and the mathematical foundations are well-established. The part that requires the most care -- and that tutorials almost universally skip -- is framing the right question as a game, assigning payoffs that reflect reality, and knowing when the equilibrium answer is actually the one you should act on.
