Python statsmodels: Statistical Modeling From the Ground Up

When scikit-learn gives you predictions but not answers, statsmodels fills the gap. It is Python's dedicated library for rigorous statistical modeling -- the kind where p-values, confidence intervals, and diagnostic tests actually matter. But knowing how to call .fit() is only the first step. Knowing when to distrust your results, why a diagnostic test matters, and what to do when an assumption breaks -- that is what separates a functioning model from a defensible one.

Python has no shortage of machine learning libraries, but statistical modeling is a different discipline. When a researcher needs to publish results, an economist needs to interpret coefficients, or a data scientist needs to test whether a relationship is real, the tooling has to go deeper than accuracy scores. That is exactly the problem statsmodels solves. As the statsmodels documentation states, the library provides estimation and inference for statistical models (source: statsmodels.org). The library has been a cornerstone of scientific Python since its initial release in 2009, and version 0.14.6 -- released December 5, 2025 via PyPI -- keeps it fully compatible with Python 3.14 (released October 7, 2025, now at patch 3.14.3), the latest NumPy 2.x, pandas 3.0, and SciPy 1.16+.

What statsmodels actually does

statsmodels is a Python module built to estimate statistical models, run hypothesis tests, and explore data with the kind of rigor that scientific and academic work demands. Where scikit-learn optimizes for predictive accuracy and ease of use, statsmodels optimizes for interpretability and inference. You get detailed summary tables, standard errors, t-statistics, and diagnostic tests baked directly into the output -- not as an afterthought.

The library sits alongside NumPy and SciPy in the scientific Python stack, complementing rather than replacing them. It draws on SciPy's numerical routines while adding the statistical scaffolding that those libraries do not provide on their own. Crucially, results are validated against R and Stata, which means published findings built with statsmodels can be trusted to match those reference environments. The statsmodels GitHub repository confirms this cross-validation approach (source: github.com/statsmodels/statsmodels).

The core capabilities cover a wide range: linear models (OLS, GLS, WLS), generalized linear models for non-normal outcomes, robust regression with M-estimators, time series analysis (ARIMA, SARIMAX, VAR, VECM), mixed effects models, treatment effects estimation, hurdle count models, and a comprehensive suite of statistical tests. It also supports R-style formula syntax through the Patsy library, which means you can write y ~ x1 + x2 instead of constructing design matrices by hand.

Note

statsmodels answers why questions. scikit-learn answers what questions. The two libraries are complementary. Use statsmodels when you need to explain relationships, test hypotheses, or produce results that must survive peer review. Use scikit-learn when your primary goal is to maximize predictive accuracy on unseen data.

When to use statsmodels vs. scikit-learn

The decision between statsmodels and scikit-learn is not about which library is "better" -- it is about what question you are trying to answer. This distinction runs deeper than feature sets. It reflects two fundamentally different philosophies of how data should inform decisions.

Traditional statistical modeling, the kind statsmodels implements, is concerned with inference. You have a theory about how the world works -- perhaps that advertising spending drives sales -- and you want to test whether the data supports that theory while quantifying your uncertainty. The entire workflow revolves around parameter estimation, hypothesis testing, and the assumptions that make those tests valid. The coefficients themselves are the product. As Galit Shmueli articulated in her influential 2010 paper in Statistical Science, the goals of explanation and prediction are distinct and require different modeling strategies (source: Shmueli, G. "To Explain or to Predict?" Statistical Science, 2010).

Machine learning, the kind scikit-learn implements, is concerned with prediction. You want to know what will happen next, and you evaluate success by how well the model generalizes to data it has never seen. The coefficients are a means to an end -- if a black-box model predicts better, you use the black-box.

Here is the practical decision framework: if someone will ask you "why did the model produce this coefficient, and how confident are you?" then you need statsmodels. If someone will ask you "how accurate is this prediction on new data?" then you need scikit-learn. If both questions matter -- and they often do -- you use both, and that is not a compromise but a complete analytical workflow.

Pro Tip

A powerful hybrid approach: use scikit-learn for feature selection or detecting nonlinearities, then switch to statsmodels for interpretable inference on the selected variables. This gives you the predictive power of machine learning with the explanatory rigor of statistics.

Installation and imports

Install statsmodels with pip. It pulls in NumPy, SciPy, and pandas automatically. As of version 0.14.6, the minimum supported Python version is 3.9, and it runs on Python 3.14 without issues.

pip install statsmodels

Two import patterns cover the vast majority of use cases. The main API gives you access to datasets, model classes, and utilities. The formula API lets you write model specifications in R-style syntax.

import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

Pro Tip

Use smf (formula API) when working with pandas DataFrames and named columns. Use sm (main API) when working with raw NumPy arrays. Both produce identical results -- the formula API just handles the design matrix construction for you, including automatic intercept inclusion and categorical variable encoding.

OLS regression

Ordinary least squares is the foundation of linear regression and the starting point for understanding how statsmodels thinks about modeling. Every other model in the library -- logistic regression, time series, GLMs -- builds on concepts you first encounter here. statsmodels provides two paths to fit an OLS model: the array-based approach using sm.OLS, and the formula-based approach using smf.ols. Both produce the same rich summary output.

Array-based OLS

When working directly with arrays, you must add a constant column manually. statsmodels does not include an intercept by default -- this is intentional, and a common source of confusion for newcomers. The design choice keeps the API explicit: you always know exactly what is in your design matrix.

import numpy as np
import statsmodels.api as sm

# Simulate some data
np.random.seed(42)
n = 100
X = np.random.rand(n, 2)           # two predictors
X = sm.add_constant(X)             # prepend intercept column
beta_true = [3.0, 1.5, -0.8]
y = X @ beta_true + np.random.randn(n) * 0.5

# Fit the model
model = sm.OLS(y, X)
results = model.fit()

print(results.summary())

The .summary() output is one of the reasons researchers choose statsmodels over alternatives. It gives you the estimated coefficients, standard errors, t-statistics, p-values, 95% confidence intervals, R-squared, adjusted R-squared, the F-statistic for overall model significance, the Durbin-Watson statistic for autocorrelation, and omnibus and Jarque-Bera test statistics for residual normality. All of this comes from a single call, and every number is cross-validated against R and Stata output.

Formula-based OLS

When your data lives in a pandas DataFrame, the formula API is cleaner and less error-prone. Categorical variables are handled automatically, the intercept is included by default, and you can apply transformations directly in the formula string.

import pandas as pd
import statsmodels.formula.api as smf

# Build a DataFrame
df = pd.DataFrame({
    'sales':       [150, 200, 250, 300, 220, 180, 270, 310, 190, 240],
    'advertising': [10, 20, 25, 30, 22, 15, 28, 35, 18, 23],
    'price':       [50, 45, 40, 38, 42, 47, 39, 36, 46, 41]
})

# Fit using R-style formula
results = smf.ols('sales ~ advertising + price', data=df).fit()

print(results.summary())

# Access specific attributes
print(f"\nR-squared:   {results.rsquared:.4f}")
print(f"Adj R-sq:    {results.rsquared_adj:.4f}")
print(f"F-statistic: {results.fvalue:.4f}")
print(f"F p-value:   {results.f_pvalue:.4f}")
print(f"\nCoefficients:\n{results.params}")
print(f"\n95% Confidence Intervals:\n{results.conf_int()}")

The formula API also supports interactions (x1:x2), polynomial terms (np.power(x, 2)), log transformations (np.log(x)), and explicit categorical encoding (C(region)). This means you can express complex model specifications in a single readable string:

# Interaction terms, log transforms, and categorical variables
# (assumes the DataFrame also contains a 'region' column)
results_full = smf.ols(
    'sales ~ advertising * C(region) + np.log(price)',
    data=df
).fit()

Making predictions with uncertainty

After fitting, use .predict() to generate fitted values or forecast new observations. But point predictions alone are rarely sufficient -- you typically need to communicate uncertainty. The .get_prediction() method returns both confidence intervals (uncertainty about the mean response) and prediction intervals (uncertainty about individual observations):

# Predict on new data
new_data = pd.DataFrame({
    'advertising': [25, 30],
    'price':       [42, 38]
})

predictions = results.predict(new_data)
print(predictions)

# Get prediction intervals (more useful in practice)
pred = results.get_prediction(new_data)
pred_summary = pred.summary_frame(alpha=0.05)
print(pred_summary[['mean', 'obs_ci_lower', 'obs_ci_upper']])

Warning

Always use sm.add_constant(X) when using the array-based API. Forgetting this step means your model has no intercept, which forces the regression line through the origin and produces biased coefficient estimates in almost every real-world scenario. The formula API handles this automatically.

The assumption-diagnostic-remedy chain

Here is where many statsmodels tutorials fall short: they show you how to fit a model and run diagnostic tests, but they do not connect the dots into a coherent reasoning chain. OLS produces reliable inference only when certain assumptions hold. When they break, your p-values, confidence intervals, and hypothesis tests become unreliable -- sometimes wildly so. Understanding this chain of reasoning is what separates someone who uses statsmodels from someone who understands statistical modeling.

The four critical OLS assumptions are: linearity (the relationship between predictors and response is linear), independence (residuals are not correlated with each other), homoscedasticity (residuals have constant variance), and normality (residuals are approximately normally distributed). Each assumption has a corresponding diagnostic test, and each broken assumption has a specific remedy. The goal is not perfection -- it is awareness of what can go wrong and what to do about it.

Normality of residuals

The normality assumption affects the validity of t-tests and F-tests on your coefficients. If residuals are severely non-normal (heavy tails, strong skew), your p-values may be misleading.

from statsmodels.stats.stattools import jarque_bera

residuals = results.resid

# Jarque-Bera test for normality of residuals
jb_stat, jb_p, skew, kurtosis = jarque_bera(residuals)
print(f"Jarque-Bera statistic: {jb_stat:.4f}")
print(f"p-value:               {jb_p:.4f}")
print(f"Skewness:              {skew:.4f}")
print(f"Kurtosis:              {kurtosis:.4f}")

# Interpretation:
# p < 0.05 --> reject normality
# Remedy: transform the response variable (log, Box-Cox),
#          use robust standard errors, or increase sample size

The reasoning: A small p-value means the residuals deviate significantly from a normal distribution. With large samples (roughly n > 30), the Central Limit Theorem makes coefficient estimates approximately normal regardless, so mild non-normality is often tolerable. With small samples, consider transforming your dependent variable or using bootstrap confidence intervals.

Heteroscedasticity

Heteroscedasticity -- non-constant variance in residuals -- does not bias your coefficient estimates, but it makes your standard errors wrong. This is a critical distinction: the coefficients themselves are still unbiased, but every test you run on them (t-tests, F-tests, confidence intervals) becomes unreliable.

from statsmodels.stats.diagnostic import het_breuschpagan, het_white

# Breusch-Pagan test
bp_stat, bp_p, bp_f, bp_fp = het_breuschpagan(results.resid, results.model.exog)
print(f"Breusch-Pagan LM stat: {bp_stat:.4f}")
print(f"p-value:               {bp_p:.4f}")

# White's test (more general, detects nonlinear heteroscedasticity)
white_stat, white_p, white_f, white_fp = het_white(results.resid, results.model.exog)
print(f"White test stat:       {white_stat:.4f}")
print(f"p-value:               {white_p:.4f}")

# Interpretation:
# p < 0.05 --> heteroscedasticity detected
# Remedy: use robust standard errors (HC0-HC3),
#          use WLS, or transform the response variable

The reasoning: Breusch-Pagan tests whether the squared residuals are linearly related to the predictors. White's test is more general -- it also captures nonlinear relationships. If either test rejects, your standard errors need correction. The next section shows exactly how to do that.

Autocorrelation

For time series or panel data, residual autocorrelation is a common problem. When residuals are correlated over time, OLS underestimates standard errors, making coefficients appear more significant than they actually are.

from statsmodels.stats.diagnostic import acorr_ljungbox

lb_results = acorr_ljungbox(results.resid, lags=[5, 10, 20], return_df=True)
print(lb_results)

# Interpretation:
# p < 0.05 at any lag --> autocorrelation detected
# Remedy: use Newey-West (HAC) standard errors,
#          add lagged variables, or switch to a time series model

Note

The Durbin-Watson statistic in the OLS summary is a quick check for first-order autocorrelation. Values near 2.0 suggest no autocorrelation, values near 0 suggest positive autocorrelation, and values near 4 suggest negative autocorrelation. But Ljung-Box is more thorough because it tests multiple lag orders simultaneously.
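The statistic is also available as a standalone function if you want it outside the summary table; a quick sketch on simulated white noise, where the expected value is near 2.0:

```python
import numpy as np
from statsmodels.stats.stattools import durbin_watson

# Independent residuals should yield a statistic near 2.0
rng = np.random.default_rng(5)
resid = rng.normal(size=500)
print(f"Durbin-Watson: {durbin_watson(resid):.3f}")
```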

Robust standard errors

When heteroscedasticity or autocorrelation is present, the solution is often not to re-specify your entire model -- it is to correct the standard errors. Robust standard errors leave the coefficient estimates unchanged (they are still unbiased under OLS) but produce valid inference by adjusting how uncertainty is calculated.

statsmodels supports the full family of heteroscedasticity-consistent (HC) standard errors and heteroscedasticity-and-autocorrelation-consistent (HAC) standard errors, directly in the .fit() call:

# HC3 robust standard errors (recommended for small samples)
results_ols = sm.OLS(y, X).fit()
results_robust = sm.OLS(y, X).fit(cov_type='HC3')
print(results_robust.summary())

# Compare standard errors for the same model and data
print(f"\nNon-robust SE: {results_ols.bse}")
print(f"HC3 robust SE: {results_robust.bse}")

# Newey-West HAC standard errors (for autocorrelated residuals)
results_hac = sm.OLS(y, X).fit(
    cov_type='HAC',
    cov_kwds={'maxlags': 5}
)
print(results_hac.summary())

The HC family includes HC0 (White's original estimator), HC1 (with a degrees-of-freedom correction), HC2 (uses leverage values), and HC3 (conservative, best for small samples). Research by Long and Ervin published in The American Statistician found that HC3 performs well even with limited observations and is a safer default than HC0 or HC1 (source: Long, J.S. and Ervin, L.H., The American Statistician, 2000).

Pro Tip

Many econometricians advocate for always using robust standard errors as a default, regardless of whether formal tests detect heteroscedasticity. The downside is minimal (slightly wider confidence intervals), but the protection against misspecification is substantial. In statsmodels, this is a single argument: cov_type='HC3'.

Detecting multicollinearity with VIF

Multicollinearity occurs when two or more predictors are highly correlated, making it difficult to separate their individual effects. Unlike heteroscedasticity, multicollinearity does not violate OLS assumptions in a technical sense -- OLS estimates remain unbiased. But it inflates the variance of coefficient estimates, which can make individually significant predictors appear insignificant and produce wildly unstable estimates that change dramatically with small data perturbations.

The Variance Inflation Factor (VIF) quantifies how much each coefficient's variance is inflated by collinearity. A VIF of 1 means no collinearity. VIF values above 5 warrant investigation, and values above 10 indicate serious multicollinearity, according to guidance from Penn State's STAT 462 course materials (source: online.stat.psu.edu).

from statsmodels.stats.outliers_influence import variance_inflation_factor

# Calculate VIF for each predictor
# X must include the constant column
vif_data = pd.DataFrame({
    'feature': results.model.exog_names,
    'VIF': [
        variance_inflation_factor(results.model.exog, i)
        for i in range(results.model.exog.shape[1])
    ]
})
print(vif_data)

# Interpretation:
# VIF = 1    --> no collinearity
# VIF 1-5   --> moderate (usually acceptable)
# VIF 5-10  --> high (investigate further)
# VIF > 10  --> severe (take action)

Remedies for high VIF: Remove one of the correlated predictors (a scientific decision about which one matters more to your research question). Combine correlated predictors into a single index using PCA or domain knowledge. Ridge regression (available via sm.OLS().fit_regularized()) can also stabilize estimates by adding a penalty term, though this introduces bias. The choice depends on whether your goal is explanation (drop a predictor) or prediction (regularize).

Logistic regression

Logistic regression handles binary outcomes -- yes/no, pass/fail, fraud/not-fraud. In statsmodels, it lives in the discrete choice models module (sm.Logit or smf.logit), though the same model can also be fit as a GLM with a binomial family. Unlike scikit-learn's implementation, statsmodels does not apply regularization by default. This is intentional: the goal is unbiased coefficient estimation for inference, not predictive optimization.

import numpy as np
import statsmodels.api as sm

# Simulate binary outcome data
np.random.seed(0)
n = 200
X = np.column_stack([
    np.random.randn(n),     # continuous predictor
    np.random.randn(n)      # another predictor
])
X = sm.add_constant(X)

# True log-odds: intercept=-1, coefs=[0.8, -0.5]
log_odds = -1 + 0.8 * X[:, 1] + (-0.5) * X[:, 2]
prob = 1 / (1 + np.exp(-log_odds))
y = np.random.binomial(1, prob)

# Fit logistic regression
logit_model = sm.Logit(y, X)
logit_results = logit_model.fit()

print(logit_results.summary())

# Marginal effects (change in probability per unit change in predictor)
marginal_effects = logit_results.get_margeff()
print(marginal_effects.summary())

The coefficients from a logistic regression are in log-odds units, which are not directly interpretable for many audiences. The .get_margeff() method converts them to marginal effects -- the change in the probability of the outcome for a one-unit change in each predictor, evaluated at the mean. This is frequently what researchers actually want to report, because it answers the concrete question: "by how much does the probability change?"

Pro Tip

Use smf.logit('outcome ~ predictor1 + predictor2', data=df).fit() for cleaner syntax when working with DataFrames. For multinomial outcomes (more than two categories), switch to sm.MNLogit. For ordinal outcomes (ordered categories like satisfaction ratings), use OrderedModel from statsmodels.miscmodels.ordinal_model.

Time series with ARIMA

statsmodels is one of the few Python libraries that provides production-grade time series models. ARIMA (AutoRegressive Integrated Moving Average) handles stationary and differenced time series. SARIMAX extends that to seasonal patterns and external regressors. Both are available through the sm.tsa namespace.

Stationarity testing (always do this first)

Before fitting any ARIMA model, you need to determine whether your series is stationary. A stationary series has constant mean and variance over time. If it is not stationary, the ARIMA model's parameters and forecasts will be unreliable. The Augmented Dickey-Fuller (ADF) test checks this formally.

import numpy as np
import statsmodels.api as sm

# Generate a synthetic series with a trend (non-stationary)
np.random.seed(7)
n = 200
y_nonstationary = np.cumsum(np.random.randn(n)) + np.arange(n) * 0.1

# ADF test
adf_result = sm.tsa.adfuller(y_nonstationary)
print(f"ADF Statistic: {adf_result[0]:.4f}")
print(f"p-value:       {adf_result[1]:.4f}")
print(f"Lags used:     {adf_result[2]}")

# p < 0.05 --> stationary (reject unit root)
# p >= 0.05 --> non-stationary (cannot reject unit root)
# If non-stationary, set d=1 (or higher) in ARIMA to difference

Fitting an ARIMA model

# Generate a synthetic AR(2) process
np.random.seed(7)
n = 200
ar_params = [0.75, -0.25]           # AR(2) coefficients
ma_params = [0.0]                   # no MA component
ar = np.r_[1, -np.array(ar_params)] # statsmodels sign convention
ma = np.r_[1,  np.array(ma_params)]

y = sm.tsa.arma_generate_sample(ar=ar, ma=ma, nsample=n, scale=1.0)

# Fit ARIMA(2, 0, 0) -- order=(p, d, q)
arima_model = sm.tsa.ARIMA(y, order=(2, 0, 0))
arima_results = arima_model.fit()

print(arima_results.summary())

# Forecast 10 steps ahead
forecast = arima_results.get_forecast(steps=10)
forecast_mean = forecast.predicted_mean
forecast_ci   = forecast.conf_int()

print(f"\nForecast:\n{forecast_mean}")
print(f"\n95% Confidence Intervals:\n{forecast_ci}")

SARIMAX for seasonal data

Real-world time series often have seasonal patterns -- monthly retail sales, quarterly earnings, weekly web traffic. SARIMAX handles this with a seasonal order parameter that captures repeating patterns at fixed intervals.

# SARIMAX: order=(p,d,q) seasonal_order=(P,D,Q,s)
# s=12 for monthly data with annual seasonality
sarimax_model = sm.tsa.SARIMAX(
    y,
    order=(1, 1, 1),
    seasonal_order=(1, 1, 0, 12),
    enforce_stationarity=False,
    enforce_invertibility=False
)
sarimax_results = sarimax_model.fit(disp=False)
print(sarimax_results.summary())

Note

Choosing the right (p, d, q) order can be challenging. Use sm.tsa.adfuller() to determine d (the differencing order). Then examine the ACF and PACF plots to guide p and q selection: the PACF cuts off at the AR order, and the ACF cuts off at the MA order. For automated order selection, use pm.auto_arima() from the pmdarima library, which wraps statsmodels internally.

Hypothesis testing

statsmodels includes an extensive collection of statistical tests accessible through statsmodels.stats. These tests are not optional extras -- they are the mechanism by which you determine whether your model's claims are statistically supported. Below are several you will reach for regularly.

Testing model coefficients

After fitting any model, you can test whether individual coefficients are zero, or whether a group of coefficients jointly equal some set of values, using .t_test() and .f_test().

# From an earlier OLS results object
# Test whether the 'advertising' coefficient equals zero
t_test = results.t_test('advertising = 0')
print(t_test)

# Joint F-test: advertising and price are both zero
f_test = results.f_test(['advertising = 0', 'price = 0'])
print(f_test)

# Test a specific hypothesis: does advertising have
# more than twice the effect of price?
t_test_custom = results.t_test('advertising - 2 * price = 0')
print(t_test_custom)

The F-test is particularly important for evaluating groups of variables. A common use case is testing whether an entire category of dummy variables (say, region indicators) is jointly significant. If the F-test fails to reject, you can safely drop the entire group -- even if individual dummies appear marginally significant on their own.
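A sketch of that dummy-group test on simulated data with a hypothetical three-level region variable. The dummy names in the F-test follow Patsy's C() encoding, with 'north' as the baseline level:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data with a hypothetical three-level 'region' factor
rng = np.random.default_rng(17)
n = 120
region = rng.choice(['north', 'south', 'west'], size=n)
x = rng.normal(size=n)
effect = np.select([region == 'north', region == 'south'], [0.0, 1.5], 2.5)
y = 1.0 + 0.8 * x + effect + rng.normal(scale=0.5, size=n)
df = pd.DataFrame({'y': y, 'x': x, 'region': region})

res = smf.ols('y ~ x + C(region)', data=df).fit()

# Joint F-test: are the region dummies jointly significant?
joint = res.f_test(['C(region)[T.south] = 0', 'C(region)[T.west] = 0'])
print(joint)
```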

Comparing nested models with ANOVA

When one model is a special case of another (the restricted model is "nested" within the full model), an F-test -- run via anova_lm -- provides a principled way to determine whether the additional parameters in the full model are worth including. For non-linear models such as logit, the analogous comparison is a likelihood ratio test on the two models' log-likelihoods.

from statsmodels.stats.api import anova_lm

# Full model: sales ~ advertising + price
model_full = smf.ols('sales ~ advertising + price', data=df).fit()

# Restricted model: sales ~ advertising
model_restricted = smf.ols('sales ~ advertising', data=df).fit()

# ANOVA comparison (F-test for nested models)
anova_table = anova_lm(model_restricted, model_full)
print(anova_table)

Model comparison workflow

Choosing between models is one of the central tasks in statistical modeling, and the right approach depends on whether the models are nested (one is a special case of the other) or non-nested (they have different predictor sets or functional forms).

AIC and BIC for non-nested comparison

When choosing between two fitted models, information criteria give you a principled way to trade off model fit against complexity. Lower is better for both AIC and BIC, but they differ in philosophy.

# After fitting two models
print(f"Model A -- AIC: {results.aic:.2f}  BIC: {results.bic:.2f}")

# For ARIMA models, same attributes apply
print(f"ARIMA   -- AIC: {arima_results.aic:.2f}  BIC: {arima_results.bic:.2f}")

AIC (Akaike Information Criterion) rewards predictive accuracy and tends to select slightly more complex models. BIC (Bayesian Information Criterion) applies a heavier penalty for model complexity and tends to select simpler models. As Burnham and Anderson argued in their foundational text on the subject, AIC is grounded in minimizing information loss (source: Burnham, K.P. and Anderson, D.R., "Model Selection and Multimodel Inference," Springer, 2002). BIC, by contrast, is better suited for inference when you want to identify the simplest adequate generating process.
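Both criteria are simple functions of the log-likelihood: for an OLS fit with a constant, statsmodels computes AIC as -2*llf + 2*(df_model + 1) and BIC as -2*llf + log(n)*(df_model + 1). A quick illustration on simulated data, where the 'noise' column is deliberately irrelevant:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data; 'noise' is a deliberately irrelevant predictor
rng = np.random.default_rng(19)
n = 200
df = pd.DataFrame({'x': rng.normal(size=n), 'noise': rng.normal(size=n)})
df['y'] = 1.0 + 2.0 * df['x'] + rng.normal(scale=0.5, size=n)

res_small = smf.ols('y ~ x', data=df).fit()
res_big = smf.ols('y ~ x + noise', data=df).fit()

print(f"y ~ x          AIC: {res_small.aic:.2f}  BIC: {res_small.bic:.2f}")
print(f"y ~ x + noise  AIC: {res_big.aic:.2f}  BIC: {res_big.bic:.2f}")
```

Because log(200) is much larger than 2, BIC punishes the extra noise column far more heavily than AIC does.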

Adjusted R-squared for quick comparison

For linear models specifically, adjusted R-squared penalizes for the number of predictors and provides a quick comparison metric. Unlike raw R-squared, it can decrease when an irrelevant predictor is added, making it a better guard against overfitting in an explanatory context.

# Compare models by adjusted R-squared
print(f"Full model adj R-sq:       {model_full.rsquared_adj:.4f}")
print(f"Restricted model adj R-sq: {model_restricted.rsquared_adj:.4f}")

Pro Tip

Do not rely on a single metric. A strong model selection workflow combines the F-test for nested comparisons, AIC/BIC for non-nested comparisons, diagnostic tests to verify assumptions, and adjusted R-squared as a quick sanity check. When the metrics disagree, the scientific question should drive the final decision.

Key takeaways

  1. statsmodels is built for inference, not prediction: Use it when you need p-values, confidence intervals, and validated statistical tests -- not when you need to maximize predictive accuracy on held-out data. If you need both inference and prediction, use both statsmodels and scikit-learn.
  2. Always add a constant for array-based models: sm.add_constant(X) is required when you use sm.OLS, sm.Logit, or any other array-based estimator. The formula API handles this automatically.
  3. Think in chains -- assumption, diagnostic, remedy: Every OLS assumption has a diagnostic test and a corresponding fix. Normality: Jarque-Bera then transform or bootstrap. Heteroscedasticity: Breusch-Pagan or White, then use HC3 robust standard errors. Autocorrelation: Ljung-Box, then use HAC or switch to a time series model. Multicollinearity: VIF, then drop predictors or regularize.
  4. Robust standard errors are cheap insurance: Using cov_type='HC3' costs almost nothing in terms of estimation but protects against misspecified variance structures. Consider making it a default in your workflow.
  5. Time series support is first-class: ARIMA, SARIMAX, VAR, and VECM are all available and production-ready. Always test for stationarity with sm.tsa.adfuller() before choosing your model specification.
  6. Model comparison requires the right tool: Use F-tests or likelihood ratio tests for nested models. Use AIC/BIC for non-nested models. Use adjusted R-squared as a quick sanity check, not as a primary selection criterion.
  7. The formula API reduces errors: Writing smf.ols('y ~ x1 + x2', data=df) is cleaner, less error-prone, and handles categorical variables and interactions automatically through Patsy.

statsmodels occupies a specific and important role in the Python data science toolkit. It is the library you reach for when you need to do real statistics -- when the output must be defensible, reproducible, and interpretable. Version 0.14.6 keeps it fully compatible with the modern Python 3.14 and NumPy 2 ecosystem, and the library is released under the BSD 3-Clause license, so there is no barrier to using it in new projects. When your work requires explaining why a relationship exists rather than simply predicting what comes next, statsmodels is where that work happens.
