Decision Intelligence with Python: Moving Beyond Prediction to Causation and Action

Prediction is not decision-making. You can build the most accurate machine learning model on the planet, and it will still tell you nothing about what to do. A model that predicts customer churn at 94% accuracy doesn't tell you whether offering a discount will actually reduce that churn, or whether the discount will cannibalize revenue from customers who were never going to leave.

This is the gap that decision intelligence exists to fill. It's an emerging discipline that combines data science, causal inference, behavioral science, and decision theory into a unified framework for turning information into better actions. And Python — with its ecosystem of causal inference libraries, probabilistic programming tools, and decision analysis frameworks — has become the primary language for implementing it.

This article goes beyond the buzzword. We'll walk through the foundations of decision intelligence, the Python libraries that power it, and real code you can run to start building decision systems that understand why things happen, not just what correlates with what.

What Decision Intelligence Actually Is

The term "decision intelligence" was formalized as a discipline at Google by Cassie Kozyrkov, who served as the company's first Chief Decision Scientist from 2018 until her departure in late 2023. In a 2023 interview on the DataCamp podcast, Kozyrkov defined decision intelligence as the discipline of turning information into better action in any setting at any scale. In a July 2018 feature in Fast Company titled "Why Google defined a new discipline to help humans make decisions," the article described how Kozyrkov had already trained 17,000 Googlers to make better decisions by augmenting data science with psychology, neuroscience, economics, and managerial science. Over her nearly decade-long tenure at Google (2014–2023), she ultimately trained over 20,000 employees.

The Key Distinction

Traditional data science begins with data and asks "what patterns can I find?" Decision intelligence begins with a decision and asks "what information do I need to make this choice well?"

As Kozyrkov explained on Episode 128 of the Google Cloud Platform Podcast in May 2018, decision intelligence engineering is about augmenting data science with the behavioral and managerial sciences. She emphasized that in order for data to drive the decision, the decision context has to be framed upfront. The decision-maker has to understand what it takes to choose one action over another, and that process — involving incentive design, cost-benefit analysis, and risk assessment — is not typically taught in data science programs.

This distinction matters in practice. An organization might hire a team of data scientists, invest heavily in infrastructure, and still fail to make better decisions because nobody framed what "better" means. The model answers a question, but nobody asked the right question in the first place.

The Three Pillars of Decision Intelligence in Python

Decision intelligence in Python rests on three technical pillars, each supported by a mature library ecosystem:

  1. Decision analysis — structuring choices under uncertainty using expected value, utility theory, and sensitivity analysis
  2. Causal inference — understanding cause-and-effect relationships so interventions have predictable outcomes
  3. Probabilistic modeling — encoding domain knowledge and uncertainty in graphical models

Pillar 1: Decision Analysis — Structuring Choices Under Uncertainty

Before reaching for machine learning, the first question in decision intelligence is whether you've structured the decision correctly. Many poor outcomes stem not from bad models but from bad framing.

Expected value calculations are the simplest and arguably the most important tool in the decision intelligence toolkit. Here's a concrete example: a product manager must decide whether to invest $50,000 in a new feature.

import numpy as np

# Define the decision: invest in Feature X or not
investment_cost = 50_000

# Scenario probabilities (estimated from market research + domain expertise)
scenarios = {
    "high_adoption":   {"probability": 0.25, "revenue": 200_000},
    "medium_adoption": {"probability": 0.45, "revenue": 80_000},
    "low_adoption":    {"probability": 0.20, "revenue": 20_000},
    "failure":         {"probability": 0.10, "revenue": 0},
}

# Expected value of investing
ev_invest = sum(
    s["probability"] * (s["revenue"] - investment_cost)
    for s in scenarios.values()
)

# Expected value of not investing (status quo)
ev_no_invest = 0

print(f"Expected value of investing: ${ev_invest:,.0f}")
print(f"Expected value of not investing: ${ev_no_invest:,.0f}")
print(f"Decision: {'Invest' if ev_invest > ev_no_invest else 'Do not invest'}")

Output:

Expected value of investing: $40,000
Expected value of not investing: $0
Decision: Invest

But raw expected value hides something critical: risk. A decision-maker who can't afford to lose $50,000 might rationally choose differently from one who can. This is where sensitivity analysis comes in — systematically varying your assumptions to see how fragile the decision is:

import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt

# Sensitivity analysis: how does the decision change
# as high-adoption probability varies?
high_probs = np.linspace(0.05, 0.50, 50)
expected_values = []

for hp in high_probs:
    # Redistribute remaining probability proportionally
    remaining = 1.0 - hp
    original_remaining = 0.75  # sum of non-high probabilities
    scale = remaining / original_remaining

    ev = (
        hp * (200_000 - investment_cost)
        + 0.45 * scale * (80_000 - investment_cost)
        + 0.20 * scale * (20_000 - investment_cost)
        + 0.10 * scale * (0 - investment_cost)
    )
    expected_values.append(ev)

plt.figure(figsize=(10, 6))
plt.plot(high_probs, expected_values, linewidth=2)
plt.axhline(y=0, color='r', linestyle='--', label='Break-even')
plt.xlabel('Probability of High Adoption', fontsize=12)
plt.ylabel('Expected Value ($)', fontsize=12)
plt.title('Sensitivity Analysis: Investment Decision', fontsize=14)
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig('sensitivity_analysis.png', dpi=150)
print("Sensitivity analysis chart saved.")

This plot shows how expected value responds to the high-adoption assumption. With these particular scenario numbers, the curve stays above the break-even line across the entire plotted range — the decision is robust to this estimate. Had the curve crossed zero near your best estimate, the decision would be fragile, and you should invest in better information before investing in the feature.
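Sensitivity analysis can target any assumption, not just probabilities. Because expected value here is simply expected revenue minus cost, the break-even cost can be solved directly — a minimal sketch reusing the scenario table from above:

```python
# Break-even on cost: the decision flips where the investment exceeds
# expected revenue. Same scenario table as the expected-value example.
scenarios = {
    "high_adoption":   {"probability": 0.25, "revenue": 200_000},
    "medium_adoption": {"probability": 0.45, "revenue": 80_000},
    "low_adoption":    {"probability": 0.20, "revenue": 20_000},
    "failure":         {"probability": 0.10, "revenue": 0},
}

expected_revenue = sum(
    s["probability"] * s["revenue"] for s in scenarios.values()
)

print(f"Expected revenue:      ${expected_revenue:,.0f}")
print(f"Break-even cost:       ${expected_revenue:,.0f}")
print(f"Margin at $50k cost:   ${expected_revenue - 50_000:,.0f}")
```

At a $50,000 investment the margin is comfortable; the decision would only flip if the feature cost more than the $90,000 expected revenue.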

Pillar 2: Causal Inference — The Engine of Decision Intelligence

This is where decision intelligence diverges sharply from standard data science. Judea Pearl, the Turing Award-winning computer scientist who authored Causality: Models, Reasoning and Inference (Cambridge University Press, 2000; 2nd ed. 2009) and the popular follow-up The Book of Why (2018, co-authored with Dana Mackenzie), developed what he calls the "Ladder of Causation" — three levels of increasingly powerful reasoning:

  1. Association (seeing): What is the data telling me? "Customers who see the ad buy more."
  2. Intervention (doing): What happens if I act? "If I show the ad, will they buy more?"
  3. Counterfactual (imagining): What would have happened? "Would this customer have bought anyway, without the ad?"

The Correlation Trap

Standard machine learning operates entirely at level 1 — it finds correlations. Decision intelligence demands level 2 and level 3 reasoning. In "The Seven Tools of Causal Inference, with Reflections on Machine Learning" (Communications of the ACM, Vol. 62, No. 3, March 2019), Pearl argued that causal reasoning was essential for AI to move beyond curve-fitting to genuine understanding.

DoWhy: Causal Inference in Four Steps

DoWhy, originally released by Microsoft Research and now maintained as part of the PyWhy open-source ecosystem, has become the standard tool for causal inference in Python. Its documentation describes DoWhy as a library that aims to spark causal thinking and analysis by providing a unified interface for causal inference methods that automatically tests many assumptions, making inference accessible to non-experts.

DoWhy enforces a disciplined four-step process: model, identify, estimate, refute. Here's a complete working example. Suppose we want to know whether a customer loyalty program actually increases spending:

from dowhy import CausalModel
import pandas as pd
import numpy as np

# Simulate data where we KNOW the true causal effect
np.random.seed(42)
n = 5000

# Confounders: customer income and purchase history affect
# both enrollment and spending
income = np.random.normal(50000, 15000, n)
purchase_history = np.random.poisson(10, n)

# Treatment: loyalty program enrollment
# (influenced by income and purchase history)
propensity = 1 / (1 + np.exp(-(
    -2 + 0.00003 * income + 0.05 * purchase_history
)))
enrolled = np.random.binomial(1, propensity)

# Outcome: monthly spending
# True causal effect of enrollment = $25
true_effect = 25
spending = (
    50
    + 0.001 * income
    + 2 * purchase_history
    + true_effect * enrolled
    + np.random.normal(0, 20, n)
)

data = pd.DataFrame({
    "income": income,
    "purchase_history": purchase_history,
    "enrolled": enrolled,
    "spending": spending,
})

# Step 1: MODEL -- encode causal assumptions as a graph
model = CausalModel(
    data=data,
    treatment="enrolled",
    outcome="spending",
    common_causes=["income", "purchase_history"],
)

# Step 2: IDENTIFY -- determine if the causal effect is estimable
identified_estimand = model.identify_effect()
print("Identified estimand:")
print(identified_estimand)

# Step 3: ESTIMATE -- compute the causal effect
estimate = model.estimate_effect(
    identified_estimand,
    method_name="backdoor.linear_regression",
)
print(f"\nEstimated causal effect: ${estimate.value:.2f}")
print(f"True causal effect: ${true_effect}")

# Step 4: REFUTE -- stress-test the estimate
refutation = model.refute_estimate(
    identified_estimand,
    estimate,
    method_name="random_common_cause",
)
print(f"\nRefutation test (random common cause):")
print(refutation)

Why Refutation Matters

Step 4 is what separates causal inference from regression with extra steps. DoWhy doesn't just give you an estimate and walk away — it introduces fake confounders, runs placebo tests, and checks whether your result survives challenges to your assumptions. Without causal reasoning, a naive analysis would overestimate the loyalty program's effect because higher-income customers (who already spend more) are more likely to enroll.
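To make that bias concrete, here is a plain-NumPy sketch (not DoWhy itself) that reruns the same data-generating process and compares the naive, level-1 difference in means against an ordinary least-squares backdoor adjustment:

```python
import numpy as np

# Same data-generating process as the DoWhy example (seed 42, true effect $25)
np.random.seed(42)
n = 5000
income = np.random.normal(50000, 15000, n)
purchase_history = np.random.poisson(10, n)
propensity = 1 / (1 + np.exp(-(-2 + 0.00003 * income + 0.05 * purchase_history)))
enrolled = np.random.binomial(1, propensity)
spending = (50 + 0.001 * income + 2 * purchase_history
            + 25 * enrolled + np.random.normal(0, 20, n))

# Level-1 reasoning: compare raw group means (confounded by income and history)
naive = spending[enrolled == 1].mean() - spending[enrolled == 0].mean()

# Backdoor adjustment: regress spending on enrollment plus the confounders
X = np.column_stack([np.ones(n), enrolled, income, purchase_history])
coef, *_ = np.linalg.lstsq(X, spending, rcond=None)

print(f"Naive difference in means: ${naive:.2f}")    # biased upward
print(f"Adjusted estimate:         ${coef[1]:.2f}")  # close to the true $25
```

The naive comparison overstates the effect by several dollars because enrollment is concentrated among customers who would have spent more anyway; conditioning on the confounders recovers the true value.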

CausalNex: Bayesian Networks for "What-If" Analysis

QuantumBlack (part of McKinsey & Company) released CausalNex as an open-source library specifically designed for causal reasoning with Bayesian Networks. Their GitHub documentation describes it as aiming to become one of the leading libraries for causal reasoning and "what-if" analysis using Bayesian Networks.

CausalNex is particularly useful when you need to learn causal structure directly from data, augment it with domain knowledge, and then simulate interventions:

from causalnex.structure.notears import from_pandas
import pandas as pd
import numpy as np

# Generate synthetic business data
np.random.seed(42)
n = 2000

marketing_spend = np.random.uniform(1000, 10000, n)
product_quality = np.random.uniform(1, 10, n)
customer_satisfaction = (
    0.3 * (marketing_spend / 1000)
    + 0.5 * product_quality
    + np.random.normal(0, 1, n)
)
revenue = (
    5 * customer_satisfaction
    + 0.02 * marketing_spend
    + np.random.normal(0, 10, n)
)

data = pd.DataFrame({
    "marketing_spend": marketing_spend,
    "product_quality": product_quality,
    "customer_satisfaction": customer_satisfaction,
    "revenue": revenue,
})

# Learn causal structure from data using NOTEARS algorithm
structure = from_pandas(data, max_iter=100)

# Examine discovered edges
print("Discovered causal edges:")
for edge in structure.edges:
    print(f"  {edge[0]} -> {edge[1]}")

The NOTEARS algorithm (Non-combinatorial Optimization via Trace Exponential and Augmented lagRangian for Structure learning) discovers causal graph structure by reformulating the combinatorial problem of searching over directed acyclic graphs into a continuous optimization problem. What you get back is a learned graph that you can then inspect, modify with domain knowledge, and use for intervention simulation.
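The heart of that reformulation is a smooth acyclicity measure: for a weighted adjacency matrix W over d nodes, h(W) = tr(e^{W∘W}) − d (with ∘ the elementwise product) is zero exactly when the graph is acyclic and positive otherwise. A small illustration with made-up edge weights — this is the published NOTEARS constraint, not CausalNex internals:

```python
import numpy as np
from scipy.linalg import expm

def notears_acyclicity(W: np.ndarray) -> float:
    """h(W) = tr(exp(W * W)) - d; zero iff the weighted graph is a DAG."""
    d = W.shape[0]
    return float(np.trace(expm(W * W)) - d)

# Acyclic graph: a -> b -> c (strictly upper-triangular, hence nilpotent)
dag = np.array([[0.0, 1.5, 0.0],
                [0.0, 0.0, 2.0],
                [0.0, 0.0, 0.0]])

# Cyclic graph: a -> b and b -> a
cyclic = np.array([[0.0, 1.0, 0.0],
                   [1.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0]])

print(f"h(DAG):    {notears_acyclicity(dag):.6f}")     # ~0
print(f"h(cyclic): {notears_acyclicity(cyclic):.6f}")  # clearly positive
```

Because h is differentiable, NOTEARS can penalize it inside a continuous optimizer instead of searching the super-exponential space of DAGs.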

A Word on Assumptions

Causal inference from observational data is only as valid as its assumptions. As Pearl emphasized, you cannot extract causal conclusions from data alone — the causal model (the graph) encodes assumptions that are not testable from the data itself. DoWhy's refutation step stress-tests sensitivity to those assumptions, but it cannot prove they are correct. Structure learning algorithms like NOTEARS discover statistical relationships that are consistent with causal structure, not guaranteed to be causal. Domain expertise remains essential for validating any learned or specified graph.
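One refutation idea from the DoWhy example translates directly into plain NumPy: swap the real treatment for a shuffled placebo and check that the estimated effect collapses toward zero. This sketch uses an OLS backdoor adjustment as a stand-in for DoWhy's placebo-treatment refuter; the data-generating numbers mirror the loyalty-program example and are illustrative:

```python
import numpy as np

# Placebo-treatment refutation: a sound estimator applied to a randomly
# permuted treatment should find approximately no effect.
rng = np.random.default_rng(0)
n = 5000
income = rng.normal(50000, 15000, n)
history = rng.poisson(10, n)
enrolled = rng.binomial(
    1, 1 / (1 + np.exp(-(-2 + 0.00003 * income + 0.05 * history)))
)
spending = 50 + 0.001 * income + 2 * history + 25 * enrolled + rng.normal(0, 20, n)

def backdoor_ols(treatment: np.ndarray) -> float:
    """Treatment coefficient from a regression adjusting for the confounders."""
    X = np.column_stack([np.ones(n), treatment, income, history])
    return float(np.linalg.lstsq(X, spending, rcond=None)[0][1])

placebo = rng.permutation(enrolled)
real_effect = backdoor_ols(enrolled)
placebo_effect = backdoor_ols(placebo)

print(f"Effect of real treatment:    ${real_effect:.2f}")     # near the true $25
print(f"Effect of placebo treatment: ${placebo_effect:.2f}")  # near $0
```

If the placebo "effect" were far from zero, that would signal a broken estimator or an unmodeled source of bias — exactly the kind of failure refutation is designed to surface.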

Pillar 3: Probabilistic Modeling with Bayesian Networks

Bayesian Networks encode the joint probability distribution over a set of variables as a directed acyclic graph. They're the natural data structure for decision intelligence because they represent how variables influence each other, not just how they correlate.
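Concretely, a chain A → B → C asserts the factorization P(a, b, c) = P(a)·P(b|a)·P(c|b), which in turn implies that C is independent of A given B. A quick numeric check with made-up probability tables:

```python
import numpy as np

# Chain A -> B -> C over binary variables (illustrative numbers)
P_a = np.array([0.6, 0.4])                  # P(A)
P_b_given_a = np.array([[0.9, 0.1],         # P(B | A=0)
                        [0.3, 0.7]])        # P(B | A=1)
P_c_given_b = np.array([[0.8, 0.2],         # P(C | B=0)
                        [0.25, 0.75]])      # P(C | B=1)

# Build the joint distribution via the network's factorization
joint = np.einsum("a,ab,bc->abc", P_a, P_b_given_a, P_c_given_b)
print(f"Joint sums to 1: {joint.sum():.6f}")

# The DAG implies P(C | A, B) = P(C | B): the same table for every value of A
cond = joint / joint.sum(axis=2, keepdims=True)
print("P(C | A, B) identical across A:", np.allclose(cond[0], cond[1]))
```

That conditional-independence structure is what lets inference algorithms avoid enumerating the full joint distribution, which grows exponentially in the number of variables.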

The pgmpy library, first presented at the 14th Python in Science Conference (SciPy 2015) by Ankur Ankan and Abinash Panda, has matured into a comprehensive toolkit. Its 2024 paper in the Journal of Machine Learning Research (Vol. 25, No. 265), co-authored by Ankan and Johannes Textor, describes it as providing a collection of algorithms and tools to work with Bayesian Networks and related models, including structure learning, parameter estimation, approximate and exact inference, causal inference, and simulations.

Here's a practical example: building a decision support system for IT incident response:

from pgmpy.models import DiscreteBayesianNetwork
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

# Build a network for diagnosing service outages
model = DiscreteBayesianNetwork([
    ("Server_Load", "Response_Time"),
    ("DB_Health", "Response_Time"),
    ("Response_Time", "User_Complaints"),
    ("Network_Issue", "Response_Time"),
    ("Response_Time", "Revenue_Impact"),
])

# Define conditional probability distributions
cpd_server = TabularCPD(
    variable="Server_Load",
    variable_card=2,  # 0=Normal, 1=High
    values=[[0.7], [0.3]],
)

cpd_db = TabularCPD(
    variable="DB_Health",
    variable_card=2,  # 0=Healthy, 1=Degraded
    values=[[0.85], [0.15]],
)

cpd_network = TabularCPD(
    variable="Network_Issue",
    variable_card=2,  # 0=None, 1=Present
    values=[[0.9], [0.1]],
)

# Response time depends on all three upstream causes
cpd_response = TabularCPD(
    variable="Response_Time",
    variable_card=2,  # 0=Fast, 1=Slow
    values=[
        # Server: Normal              | High
        # DB:   Healthy  Degraded     | Healthy   Degraded
        # Net:  None Pres None Pres   | None Pres None  Pres
        [0.95, 0.40, 0.50, 0.15,  0.30, 0.10, 0.10, 0.02],
        [0.05, 0.60, 0.50, 0.85,  0.70, 0.90, 0.90, 0.98],
    ],
    evidence=["Server_Load", "DB_Health", "Network_Issue"],
    evidence_card=[2, 2, 2],
)

cpd_complaints = TabularCPD(
    variable="User_Complaints",
    variable_card=2,  # 0=Low, 1=High
    values=[
        [0.95, 0.20],
        [0.05, 0.80],
    ],
    evidence=["Response_Time"],
    evidence_card=[2],
)

cpd_revenue = TabularCPD(
    variable="Revenue_Impact",
    variable_card=2,  # 0=Minimal, 1=Significant
    values=[
        [0.98, 0.35],
        [0.02, 0.65],
    ],
    evidence=["Response_Time"],
    evidence_card=[2],
)

model.add_cpds(
    cpd_server, cpd_db, cpd_network,
    cpd_response, cpd_complaints, cpd_revenue
)
assert model.check_model()

# Now use inference to make decisions
inference = VariableElimination(model)

# Scenario: We're seeing high user complaints.
# What's the most likely root cause?
for cause in ["Server_Load", "DB_Health", "Network_Issue"]:
    result = inference.query(
        variables=[cause],
        evidence={"User_Complaints": 1},  # 1 = High
    )
    print(f"\nP({cause} | High Complaints):")
    print(result)

This is the kind of reasoning that pure ML can't do. Given an observation (high complaints), we reason backward through the causal structure to identify probable root causes, then reason forward from potential interventions to predict their downstream effects. The network encodes domain expertise from your engineering team, not just patterns in historical data.

Putting It Together: A Decision Intelligence Workflow

Here's how these pieces combine into a complete decision intelligence workflow using Monte Carlo simulation:

import numpy as np

def decision_intelligence_workflow(
    options: dict,
    num_simulations: int = 10_000,
) -> dict:
    """
    Monte Carlo simulation for comparing decision options
    under uncertainty.

    Each option has:
      - cost: fixed cost
      - revenue_dist: (mean, std) for revenue outcome
      - success_probability: chance of positive outcome
    """
    results = {}

    for name, params in options.items():
        cost = params["cost"]
        rev_mean, rev_std = params["revenue_dist"]
        p_success = params["success_probability"]

        # Simulate outcomes
        successes = np.random.binomial(1, p_success, num_simulations)
        revenues = np.where(
            successes,
            np.random.normal(rev_mean, rev_std, num_simulations),
            np.random.normal(rev_mean * 0.1, rev_std * 0.5, num_simulations),
        )
        net_outcomes = revenues - cost

        results[name] = {
            "expected_value": np.mean(net_outcomes),
            "std_dev": np.std(net_outcomes),
            "p_loss": np.mean(net_outcomes < 0),
            "var_5pct": np.percentile(net_outcomes, 5),
            "best_case_95pct": np.percentile(net_outcomes, 95),
        }

    return results


# Compare three strategic options
options = {
    "Expand to new market": {
        "cost": 500_000,
        "revenue_dist": (800_000, 200_000),
        "success_probability": 0.6,
    },
    "Improve existing product": {
        "cost": 150_000,
        "revenue_dist": (300_000, 80_000),
        "success_probability": 0.8,
    },
    "Strategic partnership": {
        "cost": 75_000,
        "revenue_dist": (200_000, 100_000),
        "success_probability": 0.7,
    },
}

np.random.seed(42)
results = decision_intelligence_workflow(options)

for option, metrics in results.items():
    print(f"\n{'=' * 50}")
    print(f"Option: {option}")
    print(f"  Expected Net Value:  ${metrics['expected_value']:>12,.0f}")
    print(f"  Risk (Std Dev):      ${metrics['std_dev']:>12,.0f}")
    print(f"  Probability of Loss: {metrics['p_loss']:>11.1%}")
    print(f"  Worst Case (5th %):  ${metrics['var_5pct']:>12,.0f}")
    print(f"  Best Case (95th %):  ${metrics['best_case_95pct']:>12,.0f}")

This simulation doesn't just tell you which option has the highest expected return. It tells you the probability of losing money, the worst-case scenario, and the variance of each outcome. A risk-averse decision-maker might choose a lower-expected-value option with a much lower probability of catastrophic loss. That's a rational decision that no accuracy metric can capture.
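That preference can be made explicit with expected utility. A minimal sketch using an exponential (CARA) utility — the risk tolerance R is an assumed preference parameter, not something the data provides — converts each simulated outcome distribution into a certainty equivalent: the guaranteed dollar amount the decision-maker would accept in its place:

```python
import numpy as np

rng = np.random.default_rng(42)
num_sims = 100_000
R = 250_000  # risk tolerance in dollars (an assumed preference parameter)

# Same three options as the Monte Carlo workflow above
options = {
    "Expand to new market":     {"cost": 500_000, "revenue_dist": (800_000, 200_000),
                                 "success_probability": 0.6},
    "Improve existing product": {"cost": 150_000, "revenue_dist": (300_000, 80_000),
                                 "success_probability": 0.8},
    "Strategic partnership":    {"cost": 75_000,  "revenue_dist": (200_000, 100_000),
                                 "success_probability": 0.7},
}

ces = {}
for name, p in options.items():
    mean, std = p["revenue_dist"]
    success = rng.binomial(1, p["success_probability"], num_sims)
    revenue = np.where(success,
                       rng.normal(mean, std, num_sims),
                       rng.normal(mean * 0.1, std * 0.5, num_sims))
    net = revenue - p["cost"]

    # Exponential utility u(x) = -exp(-x/R); the certainty equivalent inverts it
    utility = -np.exp(-net / R)
    ces[name] = -R * np.log(-utility.mean())
    print(f"{name:26s} EV ${net.mean():>9,.0f}   CE ${ces[name]:>9,.0f}")
```

With this risk tolerance, the market expansion's positive expected value turns into a sharply negative certainty equivalent — a formal statement of why a risk-averse decision-maker would rationally pass on the highest-EV option.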

The Python Decision Intelligence Ecosystem

Here's a summary of the key libraries powering decision intelligence work in Python today:

Causal Inference: DoWhy (PyWhy, originally Microsoft Research) provides the four-step causal inference pipeline. EconML (Microsoft ALICE) specializes in heterogeneous treatment effects using ML methods. CausalML (Uber) focuses on uplift modeling and individual treatment effects.

Causal Discovery: gCastle (Huawei Noah's Ark Lab) implements algorithms for learning causal structure from data. CausalNex (QuantumBlack/McKinsey) combines Bayesian Networks with structure learning and intervention simulation. causal-learn (CMU's Center for Causal Discovery) provides a unified Python implementation of causal discovery algorithms.

Probabilistic Modeling: pgmpy provides comprehensive Bayesian Network tooling for structure learning, parameter estimation, and inference. PyMC offers general-purpose Bayesian modeling and probabilistic programming.

Decision Analysis and Simulation: NumPy and SciPy handle Monte Carlo simulation and statistical analysis. Matplotlib and Plotly support visualization of decision landscapes and sensitivity analyses.

Why This Matters Now

The gap between what machine learning can predict and what organizations need to decide is widening. As models grow more powerful, the temptation to skip the hard work of causal reasoning and decision framing grows with them. A language model can summarize your data beautifully and still lead you to the wrong action if the underlying causal assumptions are wrong.

"Decision intelligence is the discipline of turning information into better actions at any scale." — Cassie Kozyrkov, former Chief Decision Scientist at Google and founder of Decision Intelligence

In Kozyrkov's view, the failure of many organizations to use data effectively stems from a lack of decision skills rather than data skills. Python's decision intelligence ecosystem gives practitioners the tools to close that gap. Since Kozyrkov's departure from Google in late 2023, the field she helped formalize has only grown — with the PyWhy ecosystem, pgmpy, and CausalNex all seeing active development. But the tools are only half the story. The other half is the discipline of asking the right question before writing the first line of code: What decision are we trying to make? What would change our minds? What are the causal mechanisms we believe are at play? And what would we need to see to know we were wrong?

  1. Start with the decision, not the data: Frame what "better" means before building any model. Decision intelligence begins with the choice you need to make, not the patterns you can find.
  2. Move beyond correlation to causation: Use DoWhy, CausalNex, and pgmpy to model cause-and-effect relationships. The Ladder of Causation — association, intervention, counterfactual — defines the levels of reasoning your system needs.
  3. Quantify risk, not just expected value: Monte Carlo simulation reveals the probability of loss, worst-case scenarios, and variance. A rational decision-maker needs the full distribution of outcomes, not a single number.
  4. Always refute your estimates: DoWhy's four-step process — model, identify, estimate, refute — builds integrity into causal analysis. If your estimate doesn't survive stress-testing, you shouldn't act on it.

That is what separates a data scientist from a decision scientist. And Python gives you the code for both.
