Decision Analysis and Simulation in Python: NumPy, SciPy, Matplotlib, and Plotly

Every significant decision carries uncertainty. How much revenue will the new product line generate? What is the probability that this infrastructure project runs 40% over budget? Will the supply chain survive a six-month disruption in a critical region? Executives who answer these questions with single-point estimates — "we expect $4.2 million in year one" — are not making decisions. They are making wishes. The entire field of decision analysis exists because the future is a distribution, not a number.

Monte Carlo simulation is the engine that turns uncertainty into actionable intelligence. The core idea, developed at Los Alamos by Stanislaw Ulam and John von Neumann in the late 1940s, is deceptively simple: if you cannot solve a complex problem analytically, run it thousands of times with randomly sampled inputs and let the distribution of outputs tell you what you need to know. Pair that with sensitivity analysis — the systematic study of which inputs matter and which ones you can safely ignore — and you have a framework that scales from personal financial planning to multi-billion-dollar capital allocation.

Python has become the dominant language for this work, and four libraries form the backbone: NumPy for fast array-based random sampling and numerical computation, SciPy for probability distributions and statistical testing, Matplotlib for publication-quality static visualization, and Plotly for interactive exploration of decision landscapes. This article shows how they work together in practice — not with toy pi-estimation exercises, but with the kind of simulation and analysis that actually informs decisions.

The Foundations: NumPy and SciPy for Simulation

NumPy: The Random Number Engine

NumPy is, as Harris et al. described it in their landmark 2020 Nature paper, the foundation upon which the scientific Python ecosystem is constructed. For Monte Carlo work specifically, its random number generation infrastructure is what makes Python competitive with specialized simulation software.

The critical thing to understand about NumPy's random module is that it underwent a major architectural overhaul, formalized in NumPy Enhancement Proposal (NEP) 19 and shipped in NumPy 1.17.0. The legacy RandomState class used the Mersenne Twister (MT19937) algorithm and exposed a global singleton — convenient but problematic for reproducibility and parallelism. The modern replacement is the Generator class, backed by the PCG64 (Permuted Congruential Generator) bit generator by default. PCG64 has a period of 2^128, better statistical properties than MT19937, and, critically, supports SeedSequence for safe parallel stream generation.

The correct modern idiom is:

import numpy as np

# Modern API: explicit, reproducible, parallelizable
rng = np.random.default_rng(seed=42)

# Draw from distributions directly -- vectorized, fast
normal_samples = rng.normal(loc=100, scale=15, size=10_000)
uniform_samples = rng.uniform(low=0.8, high=1.2, size=10_000)
triangular_samples = rng.triangular(left=50, mode=75, right=150, size=10_000)

Pro Tip

The old np.random.normal(...) still works but relies on a global RandomState instance. For anything where reproducibility matters — which in decision analysis is everything — use default_rng.
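When a large simulation is split across processes, the safe way to seed each worker is to spawn child seeds from a single SeedSequence rather than reusing one seed. A minimal sketch of the pattern (worker count and draw sizes are illustrative):

```python
import numpy as np

# Spawn statistically independent streams for parallel workers -- the
# pattern SeedSequence was designed for.
ss = np.random.SeedSequence(42)
child_seeds = ss.spawn(4)                       # one seed per worker/process
streams = [np.random.default_rng(s) for s in child_seeds]

# Each worker draws from its own stream: no overlap, fully reproducible
draws = [stream.normal(loc=100, scale=15, size=5) for stream in streams]
```

Re-running with the same parent seed reproduces every stream exactly, while the streams remain independent of one another.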

SciPy: The Distribution and Statistics Layer

Where NumPy provides the random sampling engine, SciPy provides the statistical modeling layer. The SciPy 1.0 paper (Virtanen et al., 2020, Nature Methods) documented a library that had, since its initial release in 2001, become the de facto standard for scientific algorithms in Python — with over 600 unique code contributors, thousands of dependent packages, and usage by high-profile projects including LIGO gravitational wave analysis.

For decision analysis, scipy.stats is the critical subpackage. It implements over 100 continuous and discrete probability distributions, each with a consistent interface: .rvs() for random variates, .pdf() / .pmf() for densities, .cdf() for cumulative probabilities, .ppf() for percentiles, and .fit() for maximum likelihood estimation from data. This consistency matters enormously when you are building models where the choice of distribution is itself uncertain and needs to be swapped out easily.
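The payoff of that uniform interface is that comparing candidate distributions against the same data is a short loop. A sketch with synthetic cost data (the dataset and the candidate distributions here are illustrative, not part of the model below):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# Synthetic unit-cost data standing in for historical records
data = rng.lognormal(mean=np.log(28), sigma=0.15, size=2_000)

# The same four methods work for any scipy.stats distribution, so
# swapping the modeling assumption is a one-line change:
for dist in (stats.lognorm, stats.gamma, stats.norm):
    params = dist.fit(data)        # maximum likelihood estimation
    frozen = dist(*params)         # freeze the fitted parameters
    p95 = frozen.ppf(0.95)         # 95th-percentile cost
    tail = 1 - frozen.cdf(35)      # P(cost > 35)
    print(f"{dist.name:>8s}: P95 = {p95:.2f}, P(cost > 35) = {tail:.3f}")
```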

A key recent addition is scipy.stats.sobol_indices, introduced in SciPy 1.11 (2023). This function computes Sobol' sensitivity indices — variance-based measures that decompose a model's output variance into contributions from each input parameter and their interactions. Pamphile T. Roy of Quansight, who contributed the implementation, presented it at SciPy 2023 with the framing that these indices answer a central question in uncertainty analysis: what parameter of a system is the most important? Before this addition, practitioners had to rely on the external SALib library; now the core computation lives in SciPy itself.

A Complete Monte Carlo Decision Model

Let us build something real. Consider a company evaluating whether to invest in a new automated manufacturing line. The decision depends on uncertain variables: unit demand, unit price, variable cost per unit, fixed costs, and the initial capital expenditure. Rather than plugging in single estimates and getting a single NPV, we will simulate the full distribution of outcomes.

import numpy as np

rng = np.random.default_rng(seed=42)
n_simulations = 50_000

# --- Define uncertain inputs with appropriate distributions ---

# Annual unit demand: triangular distribution (pessimistic, likely, optimistic)
# Management believes 40,000 is the floor, 75,000 is likely, 120,000 is the ceiling.
demand = rng.triangular(left=40_000, mode=75_000, right=120_000, size=n_simulations)

# Unit selling price: normal distribution around $48, std dev $5
# Historical pricing data supports this assumption.
price = rng.normal(loc=48.0, scale=5.0, size=n_simulations)

# Variable cost per unit: log-normal to enforce positivity
# Median around $28, with right skew for supply chain disruptions.
variable_cost = rng.lognormal(mean=np.log(28), sigma=0.15, size=n_simulations)

# Annual fixed costs: uniform between $800K and $1.1M
fixed_costs = rng.uniform(low=800_000, high=1_100_000, size=n_simulations)

# Capital expenditure: PERT-like via beta distribution
# Minimum $1.2M, most likely $1.8M, maximum $3M
# PERT shape parameters: alpha = 1 + 4*(mode-min)/(max-min), beta = 1 + 4*(max-mode)/(max-min)
a_pert, b_pert = 2.33, 3.67  # alpha = 1+4*(0.6/1.8)=2.33, beta = 1+4*(1.2/1.8)=3.67
capex = 1_200_000 + rng.beta(a_pert, b_pert, size=n_simulations) * 1_800_000

# --- Calculate annual profit and 5-year NPV ---

annual_revenue = demand * price
annual_variable_costs = demand * variable_cost
annual_profit = annual_revenue - annual_variable_costs - fixed_costs

# NPV over 5 years at a 10% discount rate
discount_rate = 0.10
years = np.arange(1, 6)
# Each simulation gets the same annual profit repeated for 5 years
# (a simplification; you could model year-over-year growth)
discount_factors = 1 / (1 + discount_rate) ** years  # shape: (5,)
npv = annual_profit[:, np.newaxis] * discount_factors[np.newaxis, :]
npv = npv.sum(axis=1) - capex  # subtract initial investment

# --- Summarize the decision ---

mean_npv = np.mean(npv)
median_npv = np.median(npv)
prob_positive = np.mean(npv > 0)
percentile_5 = np.percentile(npv, 5)    # 5th percentile = downside risk (5% VaR)
percentile_95 = np.percentile(npv, 95)

print(f"Mean NPV:          ${mean_npv:,.0f}")
print(f"Median NPV:        ${median_npv:,.0f}")
print(f"P(NPV > 0):        {prob_positive:.1%}")
print(f"5th Percentile:    ${percentile_5:,.0f}")
print(f"95th Percentile:   ${percentile_95:,.0f}")
print(f"90% CI Width:      ${percentile_95 - percentile_5:,.0f}")

Notice what this gives you that a single-point estimate never could. You do not just know the expected NPV — you know the probability of losing money, the magnitude of the downside tail, and the width of the confidence interval. Douglas Hubbard, in How to Measure Anything (2007), argued that all risk in a project can be expressed by the ranges of uncertainty on costs and benefits and the probabilities of events that might affect them, and that Monte Carlo simulation is the simplest tool for measuring such risks accurately. This code is the realization of that principle.

Note

The np.newaxis broadcasting trick deserves attention. Rather than looping over 50,000 simulations and 5 years, NumPy's vectorized operations handle the entire computation in a single array operation. This is the difference between a simulation that runs in milliseconds and one that takes minutes. The @ matrix multiplication operator (PEP 465) would be the natural choice for more complex discounting structures with varying annual cash flows — npv = cash_flows @ discount_factors reads almost exactly like the textbook formula.
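As a concrete sketch of that point (the growth structure and parameter values here are hypothetical, not part of the model above): with year-varying cash flows, the entire discounting step collapses to a single @ product.

```python
import numpy as np

rng = np.random.default_rng(42)
n_sims, n_years = 50_000, 5

# Hypothetical structure: each simulation's base profit grows 3% per year
base_profit = rng.normal(loc=550_000, scale=120_000, size=(n_sims, 1))
growth = 1.03 ** np.arange(n_years)                       # shape (5,)
cash_flows = base_profit * growth                         # shape (n_sims, 5)

discount_factors = 1 / 1.10 ** np.arange(1, n_years + 1)  # shape (5,)
pv = cash_flows @ discount_factors                        # shape (n_sims,)
```

The @ line reads exactly like the textbook sum of discounted cash flows, one present value per simulation.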

Sensitivity Analysis: Finding What Matters

Running 50,000 simulations is only the beginning. The real value comes from understanding which inputs drive the output. Sensitivity analysis answers a question that is arguably more important than the expected outcome itself: where should we focus our measurement effort, and which uncertainties can we safely ignore?

One-at-a-Time Tornado Analysis

The simplest approach is deterministic one-at-a-time (OAT) sensitivity analysis, visualized as a tornado diagram. For each input variable, hold all others at their baseline and sweep the variable of interest across its range. The resulting chart ranks variables by their impact on the output.

import numpy as np
import matplotlib.pyplot as plt

# Baseline values (modes/means of our distributions)
baseline = {
    'demand': 75_000,
    'price': 48.0,
    'variable_cost': 28.0,
    'fixed_costs': 950_000,
    'capex': 1_800_000,
}

# Low and high bounds (roughly 10th and 90th percentiles)
bounds = {
    'demand':        (48_000, 108_000),
    'price':         (39.6, 56.4),
    'variable_cost': (23.5, 33.3),
    'fixed_costs':   (820_000, 1_080_000),
    'capex':         (1_300_000, 2_600_000),
}

def compute_npv(demand, price, variable_cost, fixed_costs, capex,
                discount_rate=0.10, years=5):
    """Compute deterministic NPV for a single set of inputs."""
    annual_profit = demand * (price - variable_cost) - fixed_costs
    pv_factor = sum(1 / (1 + discount_rate) ** y for y in range(1, years + 1))
    return annual_profit * pv_factor - capex

# Compute baseline NPV
base_npv = compute_npv(**baseline)

# Sweep each variable
results = {}
for var_name, (low, high) in bounds.items():
    params_low = {**baseline, var_name: low}
    params_high = {**baseline, var_name: high}
    npv_low = compute_npv(**params_low)
    npv_high = compute_npv(**params_high)
    results[var_name] = (npv_low, npv_high)

# Sort by total swing (impact magnitude)
sorted_vars = sorted(results.keys(),
                     key=lambda v: abs(results[v][1] - results[v][0]))

# --- Build the tornado diagram ---
fig, ax = plt.subplots(figsize=(10, 6))
y_positions = range(len(sorted_vars))

for i, var in enumerate(sorted_vars):
    low_val, high_val = results[var]
    # Bar extends from the lower NPV to the higher NPV
    left = min(low_val, high_val)
    width = abs(high_val - low_val)
    color = '#2196F3' if high_val > low_val else '#FF5722'
    ax.barh(i, width, left=left, height=0.6, color=color, alpha=0.8,
            edgecolor='white', linewidth=0.5)

ax.axvline(x=base_npv, color='#333333', linewidth=1.5, linestyle='--',
           label=f'Baseline NPV: ${base_npv:,.0f}')
ax.set_yticks(y_positions)
ax.set_yticklabels([v.replace('_', ' ').title() for v in sorted_vars])
ax.set_xlabel('Net Present Value ($)')
ax.set_title('Sensitivity Analysis: Tornado Diagram')
ax.legend(loc='lower right')
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
plt.tight_layout()
plt.savefig('tornado_diagram.png', dpi=150, bbox_inches='tight')
plt.show()

Tornado diagrams are useful for quick communication, but they have a fundamental limitation: they cannot capture interactions between variables. If high demand and high variable cost tend to occur together (because both are driven by raw material scarcity), OAT analysis will miss that correlation entirely.
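If such a dependence matters, one way to build it into the sampling stage is a Gaussian copula: draw correlated standard normals, convert them to uniforms, and push each margin through its own inverse CDF. A sketch using the demand and variable-cost distributions from the model (the 0.6 correlation is a hypothetical assumption):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 50_000

# Hypothetical correlation between demand and variable cost
corr = np.array([[1.0, 0.6],
                 [0.6, 1.0]])
z = rng.multivariate_normal(mean=[0.0, 0.0], cov=corr, size=n)
u = stats.norm.cdf(z)                    # correlated uniforms in (0, 1)

# Push each uniform margin through its target inverse CDF
demand = stats.triang(c=0.4375, loc=40_000, scale=80_000).ppf(u[:, 0])
variable_cost = stats.lognorm(s=0.15, scale=28.0).ppf(u[:, 1])

print(np.corrcoef(demand, variable_cost)[0, 1])  # roughly 0.6
```

The copula preserves each variable's marginal distribution exactly while inducing approximately the requested correlation between them.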

Global Sensitivity with Sobol' Indices

Sobol' indices solve this problem. They decompose the total output variance into first-order effects (the contribution of each variable in isolation) and total-order effects (the contribution including all interactions). The difference between first-order and total-order tells you how much interaction exists.

import numpy as np
import scipy.stats as stats
from scipy.stats import sobol_indices
import matplotlib.pyplot as plt

def manufacturing_npv(params):
    """
    Vectorized NPV model for Sobol' analysis.
    params shape: (d, n) where d=5 parameters, n=sample points.
    Returns shape: (n,) -- one NPV per sample.
    """
    demand, price, variable_cost, fixed_costs, capex = params
    annual_profit = demand * (price - variable_cost) - fixed_costs
    discount_rate = 0.10
    pv_factor = sum(1 / (1 + discount_rate) ** y for y in range(1, 6))
    return annual_profit * pv_factor - capex

# Define distributions for each parameter
dists = [
    stats.triang(c=0.4375, loc=40_000, scale=80_000),  # demand
    stats.norm(loc=48.0, scale=5.0),                     # price
    stats.lognorm(s=0.15, scale=28.0),                   # variable_cost
    stats.uniform(loc=800_000, scale=300_000),            # fixed_costs
    stats.beta(a=2.33, b=3.67, loc=1.2e6, scale=1.8e6), # capex
]

rng = np.random.default_rng(seed=42)

# Compute Sobol' indices
result = sobol_indices(
    func=manufacturing_npv,
    n=2048,                  # must be a power of 2 (Sobol' sampling)
    dists=dists,
    random_state=rng,        # keyword-only `random_state` as of SciPy 1.11
)

first_order = result.first_order
total_order = result.total_order
var_names = ['Demand', 'Price', 'Variable Cost', 'Fixed Costs', 'CapEx']

# --- Visualize Sobol' indices ---
fig, ax = plt.subplots(figsize=(10, 6))
x = np.arange(len(var_names))
width = 0.35

bars_first = ax.bar(x - width/2, first_order, width, label='First Order (S1)',
                     color='#1976D2', alpha=0.85)
bars_total = ax.bar(x + width/2, total_order, width, label='Total Order (ST)',
                     color='#FF7043', alpha=0.85)

ax.set_ylabel('Sobol\' Index')
ax.set_title('Global Sensitivity Analysis: Sobol\' Indices')
ax.set_xticks(x)
ax.set_xticklabels(var_names)
ax.legend()
ax.set_ylim(0, 1)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
plt.tight_layout()
plt.savefig('sobol_indices.png', dpi=150, bbox_inches='tight')
plt.show()

# Print results
print("Parameter          S1        ST        Interaction")
print("-" * 55)
for name, s1, st in zip(var_names, first_order, total_order):
    interaction = st - s1
    print(f"{name:<20s} {s1:.4f}    {st:.4f}    {interaction:.4f}")

The gap between S1 and ST for a given variable quantifies its interaction effects. If demand has S1 = 0.35 and ST = 0.42, it means demand contributes 35% of output variance on its own and an additional 7% through interactions with other variables. A variable with a high ST but low S1 is one you would completely miss with a tornado diagram — its importance comes entirely from how it combines with other inputs.

Visualization: From Static to Interactive

Matplotlib: The Publication Standard

Matplotlib, created by John D. Hunter and described in his 2007 paper in Computing in Science & Engineering as a 2D graphics package for application development, interactive scripting, and publication-quality image generation, remains the default for static, print-ready figures. For decision analysis, its strength is producing the kind of tightly controlled, camera-ready output that goes into board presentations and regulatory filings.

The tornado and Sobol' charts above use Matplotlib for exactly this reason. When the audience needs a PDF they can print and annotate, or a figure embedded in a LaTeX report, Matplotlib is the right tool. Its object-oriented API (fig, ax = plt.subplots()) gives you pixel-level control over every element.

For Monte Carlo output distributions, a well-formatted Matplotlib histogram with annotated percentiles communicates the essential story:

import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import numpy as np

fig, ax = plt.subplots(figsize=(10, 6))

# npv is the array of 50,000 simulated NPV values from earlier
ax.hist(npv, bins=120, density=True, color='#42A5F5', alpha=0.7,
        edgecolor='white', linewidth=0.3)

# Annotate key percentiles
for pct, color, style in [(5, '#D32F2F', '--'), (50, '#333333', '-'),
                            (95, '#388E3C', '--')]:
    value = np.percentile(npv, pct)
    ax.axvline(value, color=color, linewidth=1.8, linestyle=style)
    ax.text(value, ax.get_ylim()[1] * 0.92,
            f'  P{pct}: ${value/1e6:.2f}M',
            color=color, fontsize=9, fontweight='bold')

# Shade the loss region (reuse the main histogram's bin edges and
# normalization so the overlay sits on the same density scale)
loss_values = npv[npv < 0]
if len(loss_values) > 0:
    edges = np.histogram_bin_edges(npv, bins=120)
    bin_width = edges[1] - edges[0]
    ax.hist(loss_values, bins=edges, color='#EF5350', alpha=0.5,
            edgecolor='none',
            weights=np.full(len(loss_values), 1 / (len(npv) * bin_width)))
    ax.text(np.mean(loss_values), ax.get_ylim()[1] * 0.5,
            f'P(Loss) = {np.mean(npv < 0):.1%}',
            color='#C62828', fontsize=11, fontweight='bold',
            ha='center', bbox=dict(boxstyle='round,pad=0.3',
                                   facecolor='white', alpha=0.8))

ax.xaxis.set_major_formatter(mticker.FuncFormatter(
    lambda x, _: f'${x/1e6:.1f}M'))
ax.set_xlabel('Net Present Value')
ax.set_ylabel('Probability Density')
ax.set_title('Monte Carlo NPV Distribution (n = 50,000 simulations)')
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
plt.tight_layout()
plt.savefig('npv_distribution.png', dpi=150, bbox_inches='tight')
plt.show()

Plotly: Interactive Decision Exploration

When the audience is not reading a printed report but exploring a dashboard — adjusting assumptions, hovering over data points, zooming into tails — Plotly is the tool that transforms static analysis into collaborative decision-making. Founded in 2013 by Alex Johnson, Jack Parmer, Chris Parmer, and Matthew Sundquist, Plotly grew from a PyCon 2013 startup-row project into a platform whose Python library now sees over 40 million downloads per month. Chris Parmer described the vision succinctly: highly visual, interactive, and easy-to-use web applications act as bridges connecting data science investments with meaningful action.

Plotly's strength for decision analysis is its interactivity. A sensitivity heatmap that you can hover over to see exact values. A distribution chart where you can zoom into the left tail to understand downside risk. A scatter plot of input-output relationships where you can select subsets and see how the distribution shifts. None of this is possible with a static image.

import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
import numpy as np
import pandas as pd

# Build a DataFrame of simulation results for interactive exploration
sim_df = pd.DataFrame({
    'Demand': demand,
    'Price': price,
    'Variable Cost': variable_cost,
    'Fixed Costs': fixed_costs,
    'CapEx': capex,
    'NPV': npv,
    'Profitable': npv > 0
})

# --- Interactive NPV Distribution ---
fig = go.Figure()

fig.add_trace(go.Histogram(
    x=sim_df['NPV'],
    nbinsx=150,
    marker_color='#42A5F5',
    opacity=0.7,
    name='All Scenarios'
))

# Add loss scenarios in red
fig.add_trace(go.Histogram(
    x=sim_df.loc[~sim_df['Profitable'], 'NPV'],
    nbinsx=50,
    marker_color='#EF5350',
    opacity=0.6,
    name=f'Loss Scenarios ({(~sim_df["Profitable"]).mean():.1%})'
))

# Add percentile annotations
for pct in [5, 50, 95]:
    val = np.percentile(npv, pct)
    fig.add_vline(x=val, line_dash="dash",
                  annotation_text=f"P{pct}: ${val/1e6:.2f}M",
                  annotation_position="top")

fig.update_layout(
    title='Interactive NPV Distribution -- Hover for Details',
    xaxis_title='Net Present Value ($)',
    yaxis_title='Frequency',
    barmode='overlay',
    template='plotly_white',
    hovermode='x unified'
)
fig.write_html('npv_interactive.html')
fig.show()

# --- Decision Landscape: Demand vs Price colored by NPV ---
# Sample 5,000 points for readability
sample_idx = rng.choice(n_simulations, size=5_000, replace=False)
sample_df = sim_df.iloc[sample_idx]

fig2 = px.scatter(
    sample_df,
    x='Demand',
    y='Price',
    color='NPV',
    color_continuous_scale='RdYlGn',
    color_continuous_midpoint=0,
    opacity=0.6,
    title='Decision Landscape: Demand vs Price vs NPV',
    labels={'NPV': 'NPV ($)'},
    hover_data=['Variable Cost', 'Fixed Costs', 'CapEx']
)

fig2.update_layout(template='plotly_white')
fig2.write_html('decision_landscape.html')
fig2.show()

# --- Cumulative Probability (S-Curve) ---
sorted_npv = np.sort(npv)
cumulative_prob = np.arange(1, len(sorted_npv) + 1) / len(sorted_npv)

fig3 = go.Figure()
fig3.add_trace(go.Scatter(
    x=sorted_npv,
    y=cumulative_prob,
    mode='lines',
    line=dict(color='#1976D2', width=2),
    name='CDF',
    hovertemplate='NPV: $%{x:,.0f}<br>P(NPV < value): %{y:.1%}'
))

# Highlight the breakeven point
breakeven_prob = np.mean(npv < 0)
fig3.add_hline(y=breakeven_prob, line_dash="dot", line_color="#D32F2F",
               annotation_text=f"P(Loss) = {breakeven_prob:.1%}")
fig3.add_vline(x=0, line_dash="dot", line_color="#D32F2F")

fig3.update_layout(
    title='Cumulative Probability of NPV (S-Curve)',
    xaxis_title='Net Present Value ($)',
    yaxis_title='Cumulative Probability',
    template='plotly_white'
)
fig3.write_html('scurve.html')
fig3.show()

The decision landscape scatter plot is particularly powerful in meetings. The color gradient immediately shows where the green zone (profitable scenarios) transitions to red (losses). Stakeholders can hover over individual scenarios and see all five input values that produced that outcome. This kind of transparency is what turns Monte Carlo simulation from a black-box number generator into a decision-support tool that people actually trust.

When to Use Which Visualization Library

The choice between Matplotlib and Plotly is not about one being better than the other. It is about matching the tool to the audience and the medium.

Matplotlib excels when the output is a static document: a board memo, a regulatory filing, an academic paper, a slide deck exported to PDF. It gives you complete control over typography, spacing, colors, and layout. Its savefig() function produces vector PDFs that look sharp at any resolution. When John Hunter created it, the design goal was publication-quality output, and that remains its core strength.

Plotly excels when the audience is exploring, not just reading. Dashboards built with Plotly's Dash framework allow stakeholders to adjust input assumptions and see the NPV distribution update in real time. The hover-for-details capability means a single chart can serve as both a summary view and a drill-down tool. For decision analysis workshops where the question is "what if we negotiate a 5% lower CapEx?" — Plotly-based apps let you answer that question live.

In practice, a mature decision analysis workflow uses both: Matplotlib for the final report, Plotly for the interactive exploration that happens before and after the report is written.

Related PEPs: Python's Foundation for Numerical Work

Several Python Enhancement Proposals directly enable the kind of work shown in this article.

PEP 465 (Nathaniel J. Smith, 2014) introduced the @ infix operator for matrix multiplication, implemented in Python 3.5. Before this, chained matrix operations like the normal equations for linear regression required deeply nested np.dot() calls that were nearly unreadable: np.dot(np.dot(np.linalg.inv(np.dot(X.T, X)), X.T), y). With PEP 465, the same expression becomes np.linalg.inv(X.T @ X) @ X.T @ y — almost identical to the textbook formula. For decision analysis, the @ operator makes discounted cash flow computations with varying annual cash flows and discount vectors clean and auditable.

PEP 484 (van Rossum, Lehtosalo, and Langa, 2014) formalized type hints in Python 3.5, building on the function annotation syntax from PEP 3107 (Winter and Lownds, 2006, Python 3.0). Type hints do not change runtime behavior, but for simulation models — which can grow to hundreds of parameters across dozens of functions — they are essential for maintainability. Annotating def compute_npv(demand: np.ndarray, price: np.ndarray, ...) -> np.ndarray lets a type checker catch entire classes of shape-mismatch bugs before the simulation runs.

PEP 450 (D'Aprano, 2013) added the statistics module to the standard library in Python 3.4. While scipy.stats is far more powerful, statistics.NormalDist (added in Python 3.8) provides a lightweight, dependency-free way to work with normal distributions — useful for quick parametric estimates and back-of-envelope calculations in contexts where pulling in NumPy and SciPy is overkill.
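For instance, a quick normal approximation of a project's NPV needs nothing beyond the standard library (the figures below are hypothetical):

```python
from statistics import NormalDist

# Hypothetical normal approximation of a project's NPV
npv_est = NormalDist(mu=2_000_000, sigma=1_500_000)

p_loss = npv_est.cdf(0)          # probability of a negative NPV
p5 = npv_est.inv_cdf(0.05)       # 5th percentile (downside)
p95 = npv_est.inv_cdf(0.95)      # 95th percentile (upside)

print(f"P(loss) = {p_loss:.1%}")
print(f"90% interval: ${p5:,.0f} to ${p95:,.0f}")
```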

PEP 3120 (von Löwis, 2007) made UTF-8 the default source encoding in Python 3.0. This matters more than you might think for quantitative work: variable names in internationally collaborative codebases, Greek letters for statistical parameters in documentation strings, and proper rendering of mathematical symbols in inline comments all depend on reliable Unicode handling.

The Expected Value of Perfect Information

One of the more powerful outputs of a Monte Carlo simulation is the Expected Value of Perfect Information (EVPI). It answers the question: if you could eliminate all uncertainty about a specific variable before making your decision, how much would that be worth?

import numpy as np

def compute_evpi(npv_array, input_array, n_bins=50):
    """
    Estimate the EVPI for a single input variable.
    
    Partitions the input into bins, computes the optimal decision
    (invest or don't invest) within each bin, and compares to the
    overall optimal decision.
    """
    # Overall decision: invest if E[NPV] > 0
    overall_expected = np.mean(npv_array)
    overall_value = max(overall_expected, 0)  # can always choose not to invest
    
    # With perfect information about this variable:
    bin_edges = np.percentile(input_array, np.linspace(0, 100, n_bins + 1))
    bin_indices = np.digitize(input_array, bin_edges[1:-1], right=False)
    
    conditional_value = 0.0
    for b in range(n_bins):
        mask = bin_indices == b
        if mask.sum() == 0:
            continue
        bin_expected_npv = np.mean(npv_array[mask])
        bin_optimal = max(bin_expected_npv, 0)
        bin_weight = mask.sum() / len(npv_array)
        conditional_value += bin_optimal * bin_weight
    
    return conditional_value - overall_value

# Compute EVPI for each variable
inputs = {
    'Demand': demand,
    'Price': price,
    'Variable Cost': variable_cost,
    'Fixed Costs': fixed_costs,
    'CapEx': capex,
}

print("Expected Value of Perfect Information")
print("=" * 45)
for name, values in inputs.items():
    evpi = compute_evpi(npv, values)
    print(f"{name:<20s}  ${evpi:>12,.0f}")

EVPI tells you where to spend your measurement budget. If eliminating uncertainty about demand is worth $400,000 but eliminating uncertainty about fixed costs is worth only $12,000, you know where to commission the market research study and where to accept the current estimates as good enough. This is the principle Hubbard emphasizes: measure what matters, and use value-of-information calculations to determine what matters.

Practical Recommendations

  1. Start with distributions, not point estimates. If someone gives you a single number for an uncertain input, push back. Ask for a range: "What is the lowest plausible value, the highest, and where do you think it most likely falls?" A triangular distribution from those three numbers is dramatically more informative than a single guess.
  2. Use the modern NumPy random API. The default_rng() constructor with explicit seeds makes your simulation reproducible across runs, machines, and Python versions. The legacy global-state API does not provide this guarantee.
  3. Run enough simulations. Monte Carlo convergence goes as O(1/√n). Doubling precision requires quadrupling the sample size. For a rough decision-support analysis, 10,000 iterations are usually sufficient. For tail-risk analysis where you care about the 1st or 99th percentile, run at least 50,000.
  4. Validate with known-answer tests. Before trusting a complex simulation, test it with inputs that have analytical solutions. If you set all inputs to constants (zero variance), your NPV should equal the deterministic calculation exactly. If you use only normal distributions with known parameters, you can verify moments analytically.
  5. Pair Sobol' indices with tornado diagrams. Tornado diagrams are easier to explain to non-technical stakeholders. Sobol' indices capture interactions. Use both: the tornado for the executive summary, the Sobol' analysis for the technical appendix.
  6. Save your simulation results. A 50,000-iteration simulation over 5 input variables produces a modest DataFrame. Save it to Parquet or HDF5. When stakeholders come back with follow-up questions — "what does the distribution look like if we only consider scenarios where demand exceeds 80,000?" — you can answer instantly by filtering instead of re-running.
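Recommendation 4 in code form: a minimal sketch of the zero-variance check against the deterministic NPV function from the tornado section (baseline values as used there).

```python
import numpy as np

# Zero-variance check: constant inputs must reproduce the deterministic NPV
demand, price, variable_cost = 75_000.0, 48.0, 28.0
fixed_costs, capex = 950_000.0, 1_800_000.0
discount_rate, years = 0.10, 5

annual_profit = demand * (price - variable_cost) - fixed_costs
pv_factor = sum(1 / (1 + discount_rate) ** y for y in range(1, years + 1))
deterministic_npv = annual_profit * pv_factor - capex

# "Simulate" with degenerate (zero-variance) inputs
n = 10_000
sim_npv = np.full(n, annual_profit) * pv_factor - capex

assert np.allclose(sim_npv, deterministic_npv)
print(f"Deterministic NPV: ${deterministic_npv:,.0f}")
```

If this assertion fails, the discrepancy is in the model's plumbing, not in the randomness, which is exactly what a known-answer test is meant to isolate.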

References

  1. Harris, C. R., Millman, K. J., van der Walt, S. J., et al. (2020). "Array programming with NumPy." Nature, 585, 357–362. DOI: 10.1038/s41586-020-2649-2.
  2. Virtanen, P., Gommers, R., Oliphant, T. E., et al. (2020). "SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python." Nature Methods, 17(3), 261–272. DOI: 10.1038/s41592-019-0686-2.
  3. Hunter, J. D. (2007). "Matplotlib: A 2D Graphics Environment." Computing in Science & Engineering, 9(3), 90–95. DOI: 10.1109/MCSE.2007.55.
  4. Hubbard, D. W. (2007). How to Measure Anything: Finding the Value of Intangibles in Business. Wiley.
  5. Sobol', I. M. (2001). "Global sensitivity indices for nonlinear mathematical models and their Monte Carlo estimates." Mathematics and Computers in Simulation, 55(1–3), 271–280.
  6. Saltelli, A., Ratto, M., Andres, T., et al. (2008). Global Sensitivity Analysis: The Primer. Wiley.
  7. Roy, P. T. (2023). "Sensitivity Analysis in Python: scipy.stats.sobol_indices." Poster presentation, SciPy 2023. Zenodo. DOI: 10.5281/zenodo.8136764.
  8. Python Enhancement Proposals: PEP 465, PEP 484, PEP 3107, PEP 450, PEP 3120.
  9. Plotly Technologies Inc. (2023). "Plotly Turns 10!" Medium.
  10. NumPy Enhancement Proposal 19 (NEP 19). "Random Number Generator Policy." numpy.org.