Genetic algorithms solve problems the same way nature does -- by evolving populations of candidate solutions over successive generations until the fittest survive. In Python, you can build one from scratch in under a hundred lines of code, or leverage powerful libraries like PyGAD and DEAP to tackle everything from function optimization to training neural networks.
When you face an optimization problem with a massive search space, no clean gradient to follow, and multiple possible solutions, traditional algorithms can struggle. Genetic algorithms (GAs) take a fundamentally different approach. Instead of calculating a precise path to an answer, they simulate the mechanics of biological evolution -- creating populations of candidate solutions, testing their fitness, breeding the strongest, and introducing random mutations to explore new territory. This article walks through the theory, builds a complete implementation in pure Python, and then shows how production-grade libraries handle the same work.
What Is a Genetic Algorithm?
A genetic algorithm is a search heuristic inspired by Charles Darwin's theory of natural selection. It belongs to a broader family of evolutionary algorithms that also includes genetic programming, evolution strategies, and differential evolution. The central idea is straightforward: if you can represent a solution as a data structure (called a chromosome), measure how good it is (using a fitness function), and define operations that combine and alter solutions (crossover and mutation), then you can evolve increasingly better solutions over generations.
The process follows a predictable cycle. First, a population of random candidate solutions is generated. Each individual in the population is evaluated by the fitness function, which assigns a numerical score reflecting solution quality. Higher-fitness individuals are selected as parents, and those parents are recombined through crossover to produce offspring that carry traits from both. Random mutations are applied to maintain diversity and prevent the population from converging too early on a suboptimal solution. The offspring replace part or all of the old population, and the cycle repeats.
Genetic algorithms are metaheuristics -- they offer no guarantee of finding the globally optimal solution. Instead, they find good-enough solutions efficiently for problems where exhaustive search is impractical. They are especially useful for nonlinear, multimodal, and combinatorial optimization tasks.
GAs have been applied to a wide range of problems: scheduling, routing, feature selection in machine learning, circuit design, game AI, portfolio optimization, and hyperparameter tuning. Their strength lies in requiring no domain-specific mathematical structure like gradients or convexity. You only need a way to encode a solution and a way to evaluate it.
Core Components Explained
Before writing any code, it helps to understand the five building blocks that every genetic algorithm shares. Each component has multiple implementation strategies, and choosing the right combination for your problem is where the real engineering happens.
Chromosomes and Encoding
A chromosome is a data structure that represents a candidate solution to the problem. The encoding scheme determines how solutions are translated into a form the algorithm can manipulate. Common encodings include binary strings (where each gene is a 0 or 1), integer arrays, floating-point vectors, and permutation sequences. The choice depends on your problem. Binary encoding works well for combinatorial problems like knapsack or feature selection. Real-valued encoding suits continuous optimization. Permutation encoding is natural for ordering problems like the traveling salesman.
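To make the three encodings concrete, here is a toy illustration; the eight-gene length and the value ranges are arbitrary choices for the example, not part of any particular problem:

```python
import random

random.seed(42)  # fixed seed so the example is reproducible

# Binary encoding: one bit per gene (e.g. item included/excluded)
binary_chromosome = [random.randint(0, 1) for _ in range(8)]

# Real-valued encoding: one float per continuous decision variable
real_chromosome = [random.uniform(-1.0, 2.0) for _ in range(8)]

# Permutation encoding: an ordering of indices (e.g. city visit order in TSP)
permutation_chromosome = random.sample(range(8), 8)

print(binary_chromosome)
print(real_chromosome)
print(permutation_chromosome)
```

Note that permutation chromosomes carry an extra invariant -- every index appears exactly once -- which is why ordering problems need specialized crossover and mutation operators that preserve it.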
Fitness Function
The fitness function is the heart of a genetic algorithm. It takes a chromosome as input and returns a numerical value indicating how well that solution solves the problem. Everything else in the algorithm -- selection, crossover, mutation -- exists only to produce chromosomes that score higher on this function. A well-designed fitness function guides the search toward good solutions. A poorly designed one can cause the algorithm to stall or converge on meaningless results.
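As a small sketch of what a fitness function looks like in practice, consider a 0/1 knapsack instance with binary encoding; the item values, weights, and capacity below are invented for illustration:

```python
# Hypothetical knapsack instance: item values, weights, and a capacity limit
VALUES = [60, 100, 120, 30]
WEIGHTS = [10, 20, 30, 5]
CAPACITY = 50

def knapsack_fitness(chromosome):
    """Total value of the selected items; 0 if the weight limit is violated."""
    total_value = sum(v for v, gene in zip(VALUES, chromosome) if gene)
    total_weight = sum(w for w, gene in zip(WEIGHTS, chromosome) if gene)
    return total_value if total_weight <= CAPACITY else 0

print(knapsack_fitness([0, 1, 1, 0]))  # items 2 and 3: weight 50, value 220
print(knapsack_fitness([1, 1, 1, 1]))  # weight 65 exceeds capacity -> 0
```

Zeroing out infeasible solutions is the bluntest option; the penalty-function approach discussed later in this article is usually gentler on the search.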
Selection
Selection determines which individuals become parents for the next generation. The goal is to give fitter individuals a higher probability of reproducing while still allowing weaker individuals a chance, preserving diversity. Common selection methods include:
- Roulette Wheel Selection: Each individual's probability of being selected is proportional to its fitness. Think of a roulette wheel where each slice corresponds to an individual and the slice size matches its fitness score.
- Tournament Selection: A small random subset of the population is chosen, and the fittest individual from that subset becomes a parent. The tournament size controls selection pressure -- larger tournaments favor stronger individuals more aggressively.
- Rank-Based Selection: Individuals are sorted by fitness and selection probability is based on rank rather than raw fitness value. This prevents a single dominant individual from monopolizing reproduction.
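A minimal sketch of roulette wheel selection, assuming non-negative fitness values (the three-individual population and its fitness numbers are made up for the demo):

```python
import random

def roulette_wheel_selection(population, fitnesses):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitnesses)
    pick = random.uniform(0, total)
    running = 0.0
    for individual, fit in zip(population, fitnesses):
        running += fit
        if running >= pick:
            return individual
    return population[-1]  # guard against floating-point round-off

pop = ["a", "b", "c"]
fits = [1.0, 2.0, 7.0]  # "c" holds 70% of the wheel
picks = [roulette_wheel_selection(pop, fits) for _ in range(10_000)]
print(picks.count("c") / len(picks))  # should land near 0.7
```

One caveat: this scheme breaks down with negative fitness values, which is why rank-based and tournament selection are often safer defaults.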
Crossover (Recombination)
Crossover combines genetic material from two parents to produce one or more children. It is the primary recombination mechanism, letting the algorithm merge building blocks from different good solutions. The main crossover types are:
- Single-Point Crossover: A random point is chosen along the chromosome. Genes before the point come from one parent and genes after come from the other. This produces two children that carry complementary combinations of the parents' genes.
- Two-Point Crossover: Two random points are selected, and the segment between them is swapped between parents. This allows more mixing of genetic material than single-point crossover.
- Uniform Crossover: Each gene is independently chosen from one parent or the other with equal probability. This provides the highest degree of recombination but can disrupt beneficial gene combinations.
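Sketches of single-point and uniform crossover for list-based chromosomes; the all-zeros and all-ones parents are chosen purely so the inheritance pattern is visible in the output:

```python
import random

def single_point_crossover(p1, p2):
    """Split both parents at one random point and swap the tails."""
    point = random.randint(1, len(p1) - 1)
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def uniform_crossover(p1, p2):
    """Take each gene from either parent with equal probability."""
    c1, c2 = [], []
    for g1, g2 in zip(p1, p2):
        if random.random() < 0.5:
            c1.append(g1)
            c2.append(g2)
        else:
            c1.append(g2)
            c2.append(g1)
    return c1, c2

a, b = [0] * 6, [1] * 6
child1, child2 = single_point_crossover(a, b)
print(child1, child2)  # e.g. [0, 0, 1, 1, 1, 1] and [1, 1, 0, 0, 0, 0]
```

With these parents, each gene position ends up holding a 0 in one child and a 1 in the other, which makes the complementary inheritance easy to verify.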
Mutation
Mutation introduces small random changes to individual chromosomes after crossover. Its purpose is to maintain genetic diversity and prevent premature convergence to local optima. Without mutation, the population can lose alleles (gene values) permanently, eliminating potentially valuable genetic material. Common mutation strategies include bit-flip mutation for binary encoding, random resetting for integer encoding, and Gaussian perturbation for real-valued encoding. The mutation rate is typically kept low -- between 1% and 10% per gene -- so that mutation explores without destroying the solutions that crossover has built.
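A sketch of bit-flip mutation for a binary chromosome; the 50% rate used in the demo is deliberately exaggerated so flips are likely to appear in a single run, whereas real runs stay in the 1-10% range mentioned above:

```python
import random

def bit_flip_mutation(chromosome, mutation_rate=0.05):
    """Independently flip each bit with probability mutation_rate."""
    return [1 - gene if random.random() < mutation_rate else gene
            for gene in chromosome]

random.seed(0)
original = [0, 1, 0, 1, 1, 0, 0, 1]
mutant = bit_flip_mutation(original, mutation_rate=0.5)
print(mutant)
```

Each gene gets its own coin flip, so the expected number of flipped bits is simply the chromosome length times the mutation rate.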
The balance between crossover and mutation is critical. Strong crossover with weak mutation exploits known good solutions. Strong mutation with weak crossover explores more broadly. Start with a crossover rate around 80-90% and a mutation rate around 1-5%, then adjust based on whether your population is converging too quickly or not converging at all.
Building a Genetic Algorithm from Scratch
The best way to understand genetic algorithms is to build one. The following implementation solves a classic optimization problem: finding the input values that maximize a mathematical function. The entire algorithm uses only Python's standard library and NumPy.
The target function for this example is f(x) = x * sin(10 * pi * x) + 1, evaluated over the interval [-1, 2]. This function has many peaks and valleys, making it a good test case because gradient-based methods would easily get stuck in local optima.
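Before running the GA, it helps to know roughly where the true optimum lies. In one dimension a brute-force grid scan is cheap (though this approach would not scale to higher dimensions), and it puts the maximum at about 2.85 near x ≈ 1.85:

```python
import numpy as np

def f(x):
    return x * np.sin(10 * np.pi * x) + 1

# Dense grid over the search interval [-1, 2]
xs = np.linspace(-1, 2, 1_000_000)
ys = f(xs)
best = np.argmax(ys)
print(f"max ~ {ys[best]:.4f} at x ~ {xs[best]:.4f}")
```

This gives a ground-truth target to compare the GA's answer against in the runs below.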
import numpy as np
import random

# Define the fitness function
def fitness(x):
    return x * np.sin(10 * np.pi * x) + 1

# Initialize a random population within the search bounds
def initialize_population(pop_size, x_min, x_max):
    return [random.uniform(x_min, x_max) for _ in range(pop_size)]

# Tournament selection: pick the best from a random subset
def tournament_selection(population, fitnesses, tournament_size=3):
    selected = []
    for _ in range(len(population)):
        competitors = random.sample(
            list(zip(population, fitnesses)), tournament_size
        )
        winner = max(competitors, key=lambda c: c[1])
        selected.append(winner[0])
    return selected

# Blend crossover for real-valued chromosomes
def crossover(parent1, parent2, crossover_rate=0.85):
    if random.random() < crossover_rate:
        alpha = random.uniform(-0.5, 1.5)
        child1 = parent1 + alpha * (parent2 - parent1)
        child2 = parent2 + alpha * (parent1 - parent2)
        return child1, child2
    return parent1, parent2

# Gaussian mutation adds a small random perturbation, then
# clamps the result so it stays within the search bounds
def mutate(individual, mutation_rate, x_min, x_max, sigma=0.1):
    if random.random() < mutation_rate:
        individual += random.gauss(0, sigma)
    return max(x_min, min(x_max, individual))
The functions above handle each stage of the algorithm independently. tournament_selection picks parents by running small competitions. crossover uses blend crossover (BLX-alpha), which creates children that can fall anywhere along -- and slightly beyond -- the line between two parents. mutate adds Gaussian noise and clamps the result to stay within bounds.
Now, the main loop ties everything together:
# Main genetic algorithm loop
def genetic_algorithm(
    pop_size=60,
    generations=100,
    x_min=-1.0,
    x_max=2.0,
    crossover_rate=0.85,
    mutation_rate=0.05,
    elitism_count=2,
):
    population = initialize_population(pop_size, x_min, x_max)
    best_solution = None
    best_fitness = float("-inf")
    for gen in range(generations):
        # Evaluate fitness for every individual
        fitnesses = [fitness(ind) for ind in population]
        # Track the overall best
        gen_best_idx = np.argmax(fitnesses)
        if fitnesses[gen_best_idx] > best_fitness:
            best_fitness = fitnesses[gen_best_idx]
            best_solution = population[gen_best_idx]
        # Elitism: carry top individuals directly to next generation
        sorted_pop = sorted(
            zip(population, fitnesses), key=lambda x: x[1], reverse=True
        )
        elites = [ind for ind, fit in sorted_pop[:elitism_count]]
        # Selection
        parents = tournament_selection(population, fitnesses)
        # Crossover and mutation to build the next generation
        next_generation = list(elites)
        while len(next_generation) < pop_size:
            p1, p2 = random.sample(parents, 2)
            c1, c2 = crossover(p1, p2, crossover_rate)
            c1 = mutate(c1, mutation_rate, x_min, x_max)
            c2 = mutate(c2, mutation_rate, x_min, x_max)
            next_generation.extend([c1, c2])
        population = next_generation[:pop_size]
        if gen % 20 == 0:
            print(
                f"Generation {gen}: best fitness = {best_fitness:.4f}, "
                f"x = {best_solution:.4f}"
            )
    return best_solution, best_fitness

# Run it
solution, score = genetic_algorithm()
print(f"\nFinal result: x = {solution:.6f}, f(x) = {score:.6f}")
Running this code typically produces a result very close to the true maximum of the function within 100 generations. The elitism mechanism guarantees that the best solutions found so far are never lost, even if crossover or mutation produce weaker offspring in a given generation.
This implementation handles a single real-valued variable for clarity. For multi-variable optimization, each individual becomes a list or array of values, and crossover and mutation operate on each dimension independently.
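As a minimal sketch of that generalization, the blend crossover and Gaussian mutation above can be rewritten to work per dimension on NumPy arrays; the function names, two-variable bounds, and parent values here are illustrative, not part of the implementation above:

```python
import random
import numpy as np

def blend_crossover(parent1, parent2, alpha_range=(-0.5, 1.5)):
    """Blend each dimension independently (BLX-alpha applied per gene)."""
    alphas = np.array([random.uniform(*alpha_range) for _ in parent1])
    child1 = parent1 + alphas * (parent2 - parent1)
    child2 = parent2 + alphas * (parent1 - parent2)
    return child1, child2

def gaussian_mutation(individual, mutation_rate, bounds, sigma=0.1):
    """Give each dimension its own mutation coin flip, then clamp to bounds."""
    mutant = individual.copy()
    for i in range(len(mutant)):
        if random.random() < mutation_rate:
            mutant[i] += random.gauss(0, sigma)
        mutant[i] = max(bounds[i][0], min(bounds[i][1], mutant[i]))
    return mutant

p1, p2 = np.array([0.5, -0.3]), np.array([1.2, 0.8])
c1, c2 = blend_crossover(p1, p2)
m1 = gaussian_mutation(c1, 0.05, bounds=[(-1, 2), (-1, 2)])
print(m1)
```

The rest of the main loop is unchanged: only the operators need to know that a chromosome is now a vector rather than a single float.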
Using PyGAD for Real-World Optimization
Building from scratch is educational, but production work calls for a battle-tested library. PyGAD is one of the more popular Python libraries for genetic algorithms, currently at version 3.5. It provides built-in support for multiple crossover types (single-point, two-point, uniform, scattered), multiple mutation types (random, swap, inversion, scramble, adaptive), and several parent selection strategies (steady-state, roulette wheel, stochastic universal, rank, random, tournament). It also integrates with Keras and PyTorch for training neural networks using evolutionary optimization.
Here is a practical example: finding the weights that, when multiplied with a set of input values and summed, produce a target output.
import pygad
import numpy as np

# Problem: find weights w such that
# sum(w * inputs) is as close as possible to the target
inputs = [4, -2, 3.5, 5, -11, -4.7]
target_output = 44

def fitness_func(ga_instance, solution, solution_idx):
    output = np.sum(solution * inputs)
    # Fitness is inversely proportional to the error
    fitness = 1.0 / (np.abs(output - target_output) + 1e-6)
    return fitness

# Configure the genetic algorithm
ga_instance = pygad.GA(
    num_generations=200,
    num_parents_mating=5,
    fitness_func=fitness_func,
    sol_per_pop=20,
    num_genes=len(inputs),
    init_range_low=-3,
    init_range_high=3,
    parent_selection_type="tournament",
    K_tournament=3,
    crossover_type="two_points",
    mutation_type="random",
    mutation_percent_genes=15,
    keep_elitism=2,
)

# Run the optimization
ga_instance.run()

# Retrieve the best solution
solution, solution_fitness, _ = ga_instance.best_solution()
print(f"Best weights: {solution}")
print(f"Predicted output: {np.sum(solution * inputs):.4f}")
print(f"Target output: {target_output}")
print(f"Fitness: {solution_fitness:.4f}")
PyGAD handles all the internal mechanics -- population initialization, parent selection, crossover, mutation, and generational replacement. You only need to define the fitness function and set the hyperparameters. The library also supports callbacks at various stages of the algorithm, letting you log progress, implement custom stopping criteria, or dynamically adjust parameters during the run.
Install PyGAD with pip install pygad. The library has no heavy dependencies beyond NumPy, making it lightweight and easy to integrate into existing projects. For multi-objective optimization, PyGAD includes an NSGA-II implementation starting in recent versions.
Advanced Techniques and Tuning
A basic genetic algorithm will solve simple problems, but real-world optimization often requires more sophisticated strategies. Here are the techniques that separate a working GA from an effective one.
Elitism
Elitism ensures that the best individuals from the current generation are copied directly into the next generation without modification. This prevents the algorithm from accidentally losing its best-known solution through crossover or mutation. A typical elitism count is 1 to 5 individuals, depending on population size. Without elitism, it is possible for the best fitness score to decrease from one generation to the next, which slows convergence.
Adaptive Mutation
Fixed mutation rates can be problematic. A high mutation rate in early generations is useful for exploration, but the same rate in later generations can destabilize converging solutions. Adaptive mutation adjusts the mutation probability based on an individual's fitness. High-fitness individuals receive lower mutation rates to preserve their quality, while low-fitness individuals receive higher rates to push them toward better regions of the search space. PyGAD supports this through the mutation_type="adaptive" parameter, where you supply a tuple of two mutation rates: one for high-fitness solutions and one for low-fitness solutions.
# Adaptive mutation in PyGAD
ga_instance = pygad.GA(
    num_generations=300,
    num_parents_mating=10,
    fitness_func=fitness_func,
    sol_per_pop=50,
    num_genes=6,
    mutation_type="adaptive",
    mutation_percent_genes=[15, 5],  # [low-fitness rate, high-fitness rate]
    parent_selection_type="rank",
    crossover_type="uniform",
    keep_elitism=3,
)
Constraint Handling with Penalty Functions
Many real-world problems have constraints that candidate solutions must satisfy. Since genetic algorithms are unconstrained by nature, the standard approach is to incorporate constraints into the fitness function using penalty terms. If a solution violates a constraint, its fitness is reduced by a penalty proportional to the degree of violation. The penalty must be large enough to steer the population away from infeasible regions but not so large that the algorithm cannot explore near constraint boundaries where optimal solutions often lie.
# Example: maximize f(x,y) = x + y
# subject to: x^2 + y^2 <= 25 (solutions must be inside a circle)
def constrained_fitness(ga_instance, solution, solution_idx):
    x, y = solution
    objective = x + y
    # Penalty for violating the constraint
    constraint_violation = x**2 + y**2 - 25
    if constraint_violation > 0:
        penalty = 100 * constraint_violation
        return objective - penalty
    return objective
Multi-Objective Optimization
When you need to optimize multiple competing objectives simultaneously -- for example, maximizing performance while minimizing cost -- single-objective approaches fall short. Multi-objective genetic algorithms like NSGA-II (Non-dominated Sorting Genetic Algorithm II) maintain a set of Pareto-optimal solutions, where no single solution is better than another in all objectives. The pymoo library provides a comprehensive implementation of NSGA-II, NSGA-III, and other multi-objective algorithms in Python, along with visualization tools for analyzing Pareto fronts.
Python Libraries for Genetic Algorithms
The Python ecosystem offers several mature libraries for evolutionary computing, each with different strengths.
PyGAD (version 3.5) is a general-purpose library that balances simplicity with flexibility. It supports single-objective and multi-objective optimization, integrates with Keras and PyTorch for training neural networks, and provides built-in visualization through its visualize module. Its three-step workflow -- define a fitness function, create a pygad.GA instance, and call run() -- makes it accessible for quick prototyping.
DEAP (Distributed Evolutionary Algorithms in Python) is a more research-oriented framework that provides maximum flexibility at the cost of a steeper learning curve. It supports genetic algorithms, genetic programming, evolution strategies, and particle swarm optimization. DEAP uses a toolbox pattern where you register individual functions for each operator, giving you complete control over the algorithm's behavior. It is well-suited for academic research and custom algorithm design.
pymoo (version 0.6) specializes in multi-objective optimization. It implements NSGA-II, NSGA-III, MOEA/D, and many other state-of-the-art algorithms. The library includes built-in support for constraint handling, various variable types (binary, discrete, permutation, mixed), and extensive visualization tools including scatter plots, parallel coordinate plots, and heatmaps for analyzing multi-dimensional Pareto fronts.
geneticalgorithm2 is a straightforward library for classic GA optimization. It provides an easy interface for defining variable types and bounds, supports both real and discrete variables, and includes customization through middleware callbacks. It is a good choice when you need a working GA quickly and do not require neural network integration or multi-objective capabilities.
Key Takeaways
- Genetic algorithms mimic natural selection to search large, complex solution spaces where gradient-based methods are impractical or impossible. They require only a fitness function and a chromosome encoding -- no derivatives, no convexity assumptions.
- The five core operators work together: encoding defines how solutions are represented, fitness evaluation measures quality, selection chooses parents, crossover combines them, and mutation introduces diversity. Tuning the balance between these operators is key to performance.
- Elitism and adaptive mutation are essential for production-quality implementations. Elitism prevents losing the best solutions, while adaptive mutation balances exploration and exploitation across generations.
- Python has mature libraries for every level of complexity. Use PyGAD for straightforward optimization with optional ML integration, DEAP for research-grade customization, and pymoo when you need multi-objective optimization with Pareto analysis.
- Constraint handling through penalty functions extends genetic algorithms to real-world problems with feasibility requirements. The design of the penalty function directly affects convergence behavior and solution quality.
Genetic algorithms remain one of the most versatile tools in the optimization toolkit. They will not replace specialized algorithms where those algorithms apply, but they fill a critical gap for problems that resist analytical solutions. Whether you are optimizing hyperparameters for a machine learning model, solving a complex scheduling problem, or exploring a design space with conflicting objectives, a well-tuned genetic algorithm in Python gives you a powerful and flexible starting point.