Using Python in Power BI: The Complete, No-Nonsense Guide

Power BI handles dashboards and drag-and-drop visualizations with ease. Python handles everything else. When you bring them together, you get a business intelligence environment that can ingest data from virtually any source, run statistical models and machine learning algorithms, and render visualizations that Power BI's native tooling simply cannot produce on its own.

This is not a surface-level overview. We are going to walk through exactly how the integration works under the hood — including the CSV-based data exchange mechanism that most documentation glosses over — where Python fits into real workflows, what Python Enhancement Proposals (PEPs) make the whole thing possible at a language level, and what the current state of affairs looks like in 2026, including significant deprecation changes, the PBIR format shift, and how Copilot is reshaping where Python fits in the broader analytics stack.

Why Python Belongs in Your Power BI Workflow

Power BI speaks DAX and M (Power Query). These are powerful within their domain, but they have hard ceilings. DAX does not do regression analysis. M does not train a clustering model. Neither of them can reach into an obscure legacy API, parse a nested JSON response, and reshape it into a clean tabular format in ten lines of code.

Python can.

In a 2020 Dropbox Blog profile, Python creator Guido van Rossum offered his theory on why the language took over data science so completely. He argued that once Python reached a critical mass of adoption in scientific computing, network effects locked it in: it became easier for every new practitioner to join the Python ecosystem than to bet on an alternative. That critical mass was built on the back of one architectural decision — Python was designed from the ground up to support deep integration with independently developed third-party libraries. Libraries like NumPy, pandas, and scikit-learn were not built by the Python core team. They were built by the scientific computing community, in Python, because Python made that possible. That same extensibility is precisely what makes the Python-Power BI integration work.

When you write a Python script inside Power BI, you are not using some watered-down subset of the language. You are running actual Python on your local machine (or in a Fabric notebook in the cloud), with full access to the ecosystem. The bridge between the two systems is the pandas DataFrame — and that bridge rests on foundational Python specifications that are worth understanding.

The PEP Foundations That Make It Work

If you want to understand why Python integrates so cleanly with Power BI, you need to look at a few key PEPs (Python Enhancement Proposals) that established the standards underpinning the integration.

PEP 249: The Python Database API Specification v2.0

PEP 249 is arguably the most directly relevant specification. It defines a common interface for Python packages that connect to relational databases. Every time you use pyodbc, sqlite3, psycopg2, or any other database connector in a Python script inside Power BI, you are relying on a module that implements this specification.

PEP 249 standardizes how connection objects are created, how cursors work, how queries are executed, and how results are fetched. It defines the constructor connect(), which returns a Connection object, and lays out the full exception hierarchy — from InterfaceError for problems with the API layer to ProgrammingError for issues like a malformed SQL statement.
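That interface is easy to see with the standard library's sqlite3 module, which is itself PEP 249-compliant. A minimal sketch against an in-memory database, showing the connect() constructor, cursor-based execution, the fetch methods, and the exception hierarchy:

```python
import sqlite3

# connect() is the PEP 249 constructor; it returns a Connection object
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()

cursor.execute("CREATE TABLE orders (category TEXT, revenue REAL)")
cursor.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("Hardware", 1200.0), ("Software", 850.0), ("Hardware", 400.0)],
)
conn.commit()

cursor.execute(
    "SELECT category, SUM(revenue) FROM orders "
    "GROUP BY category ORDER BY category"
)
rows = cursor.fetchall()  # PEP 249 fetch methods: fetchone/fetchmany/fetchall
print(rows)  # [('Hardware', 1600.0), ('Software', 850.0)]

# The PEP 249 exception hierarchy in action: supplying the wrong number
# of bind parameters raises ProgrammingError
try:
    cursor.execute("INSERT INTO orders VALUES (?, ?)", ("Hardware",))
except sqlite3.ProgrammingError as exc:
    caught = type(exc).__name__

conn.close()
```

Swap sqlite3 for pyodbc or psycopg2 and the shape of the code barely changes; that uniformity is the entire point of the specification.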

Why does this matter for Power BI? Because one of the primary use cases for Python in Power BI is as a data source connector. Power BI natively supports dozens of connectors, but it does not support everything. If your organization stores data in an obscure database system, a legacy flat-file format, or a proprietary API, you can write a Python script that uses a PEP 249-compliant module to connect, query, and return the data as a pandas DataFrame. Power BI consumes that DataFrame as if it came from any native connector. Here is a practical example:

import pandas as pd
import pyodbc

conn_string = (
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=your_server.database.windows.net;"
    "DATABASE=SalesDB;"
    "UID=analyst;"
    "PWD=your_password;"
    "Encrypt=yes;"
    "TrustServerCertificate=no;"
)

conn = pyodbc.connect(conn_string)

query = """
    SELECT 
        OrderDate,
        ProductCategory,
        SUM(Revenue) AS TotalRevenue,
        COUNT(DISTINCT CustomerID) AS UniqueCustomers
    FROM Sales.Orders
    WHERE OrderDate >= '2025-01-01'
    GROUP BY OrderDate, ProductCategory
"""

# Note: pandas 2.x warns when given a raw DBAPI connection (other than
# sqlite3); the call still works, or wrap the connection in SQLAlchemy.
df = pd.read_sql(query, conn)
conn.close()

When you paste this into Power BI's "Get Data > Python script" dialog and click OK, Power BI executes it, finds the df variable (a pandas DataFrame), and presents it for loading into the data model. The PEP 249-compliant pyodbc module handles the database handshake, and pandas handles the translation into a tabular format that Power BI understands.

PEP 484: Type Hints

PEP 484, authored by Guido van Rossum, Jukka Lehtosalo, and Lukasz Langa in 2014, introduced a standard syntax for type annotations in Python. While this does not directly affect Power BI's runtime behavior, it has had a massive indirect impact on code quality in data science workflows.

When you write Python scripts for Power BI — especially complex transformation or machine learning scripts — type hints make your code dramatically more maintainable. Consider the difference between these two function signatures:

# Without type hints
def prepare_forecast(data, periods, freq):
    ...

# With type hints
def prepare_forecast(
    data: pd.DataFrame,
    periods: int,
    freq: str = "M"
) -> pd.DataFrame:
    ...

The second version communicates exactly what goes in, what comes out, and what the default behavior is. When your Power BI report depends on a Python transformation script, and someone else on your team needs to modify it six months from now, type hints are the difference between clarity and confusion.

PEP 484 built on PEP 3107, which introduced the syntax for function annotations back in 2006 but deliberately left the semantics undefined. PEP 484 gave those annotations a concrete meaning. Later, PEP 526 extended the concept to variable annotations, and PEP 563 proposed postponed evaluation of annotations to resolve forward reference issues. Together, these PEPs created the type hinting ecosystem that tools like mypy, Pylance, and modern IDEs rely on — the same tools you should be using when developing Python scripts destined for Power BI.

PEP 20: The Zen of Python

This one is philosophical, but it matters. PEP 20 — Tim Peters' collection of guiding principles — includes the aphorism: "There should be one — and preferably only one — obvious way to do it." When you are writing Python for Power BI, this principle is your guardrail against over-engineering. Power BI already has DAX. It already has M. You should reach for Python when there is no clean way to accomplish the task with native tools — not as a replacement for them.

Three Modes of Python Integration in Power BI Desktop

Power BI Desktop supports three distinct modes of Python integration, each serving a different stage of the analytics workflow.

Mode 1: Python as a Data Source

Navigate to Home > Get Data > Other > Python script. This opens a blank editor where you write (or paste) a Python script. The script must produce one or more pandas DataFrames. Power BI detects all DataFrame variables in the script's namespace and presents them for selection.

This mode is ideal for connecting to data sources that Power BI does not natively support, generating synthetic data for testing, or pulling from APIs that require custom authentication logic.

Note

Power BI will only recognize pandas.DataFrame objects. If your script produces a NumPy array, a list of dictionaries, or any other structure, you must convert it to a DataFrame before Power BI can use it. The Microsoft documentation is explicit about this requirement.
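A quick illustration of that conversion step, with the variable and column names chosen for this example:

```python
import numpy as np
import pandas as pd

# A NumPy array on its own is invisible to the data source dialog
arr = np.array([[1.0, 2.5], [3.0, 4.5]])
df_from_array = pd.DataFrame(arr, columns=["Cost", "Revenue"])

# Same for a list of dictionaries, a typical JSON API payload shape
records = [
    {"region": "East", "sales": 1200},
    {"region": "West", "sales": 950},
]
df_from_records = pd.DataFrame(records)

# Only df_from_array and df_from_records appear in the Navigator;
# arr and records are ignored.
```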

import pandas as pd
import requests

# Pull data from a REST API
response = requests.get(
    "https://api.example.com/v2/quarterly-metrics",
    headers={"Authorization": "Bearer YOUR_TOKEN"}
)
data = response.json()

# Convert nested JSON to a flat DataFrame
df_metrics = pd.json_normalize(
    data["results"],
    record_path=["metrics"],
    meta=["region", "quarter"]
)

Mode 2: Python as a Transformation Step

Once data is loaded into Power Query Editor (from any source — Excel, SQL Server, a CSV file), you can apply a Python transformation step. Go to Transform > Run Python script. Power BI injects the current query's data into your script as a pandas DataFrame stored in a variable called dataset. You manipulate it and return one or more DataFrames.

This mode is where Python really flexes its muscles. Need to impute missing values using a machine learning model? Need to apply a complex text parsing routine with regular expressions? Need to perform fuzzy matching between two datasets? All of these are trivial in Python and either impossible or painfully awkward in M.

import pandas as pd
from sklearn.impute import KNNImputer

# 'dataset' is injected by Power BI
df = dataset.copy()

# Impute missing numeric values using K-Nearest Neighbors
numeric_cols = df.select_dtypes(include=["float64", "int64"]).columns
imputer = KNNImputer(n_neighbors=5)
df[numeric_cols] = imputer.fit_transform(df[numeric_cols])

Pro Tip

The data exchange between Power BI and Python happens through a temporary CSV file. Power BI writes the data to disk, Python reads it, processes it, writes the result back, and Power BI reads it again. For large datasets — think millions of rows — this roundtrip can become a bottleneck. Filter your data upstream before it reaches the Python script.

Mode 3: Python Visuals

Select the Python visual icon (labeled "Py") from the Visualizations pane. This inserts a placeholder visual on the canvas and opens the Python script editor at the bottom. You drag fields from your data model into the Values well, and Power BI auto-generates binding code that creates a dataset DataFrame from those fields. You then write a script using matplotlib, seaborn, plotly, or any other plotting library to generate a static image.

import matplotlib.pyplot as plt
import seaborn as sns

# 'dataset' is auto-generated by Power BI from selected fields
fig, ax = plt.subplots(figsize=(10, 6))

sns.violinplot(
    data=dataset,
    x="Department",
    y="Salary",
    hue="Gender",
    split=True,
    ax=ax,
    palette="muted"
)

ax.set_title("Salary Distribution by Department and Gender")
ax.set_ylabel("Annual Salary (USD)")
plt.tight_layout()
plt.show()

Python visuals respond to filters and slicers in the report — when you change a filter, Power BI re-executes the script with the updated data. However, the visuals are rendered as static images and do not support cross-filtering. You cannot click on a bar in a Python visual and have other visuals on the page filter accordingly.

Licensing, Limits, and Constraints You Need to Know

Python in Power BI is not without guardrails. Here are the hard limits documented by Microsoft:

  • Python data source and transformation scripts in Power Query time out after 30 minutes. Python visuals time out after 5 minutes in Power BI Desktop but after only 1 minute in the Power BI Service, a distinction the official documentation states clearly yet one that still catches teams off guard when they develop in Desktop and then publish.
  • Python visuals in the Power BI Service are subject to a 30MB payload limit covering both compressed input data and the script itself. In Power BI Desktop, the input data limit for Python visuals is 250MB.
  • The input data for Python visuals is capped at 150,000 rows; anything beyond that is silently truncated.
  • In Python visual input datasets, string values longer than 32,766 characters in a single cell are truncated. This is also the general truncation limit when Power Query loads text into the data model.
  • Python visuals in the Service require a Fabric, Pro, Premium Per User (PPU), or Premium capacity license and are only supported in regions where Fabric is available.
  • Refresh of reports containing Python scripts in the Power BI Service is only supported through a personal gateway — the enterprise/standard gateway is not supported.
  • Only packages published on PyPI with free and open-source licenses are supported in the Power BI Service. Networking is blocked, meaning packages that make client-server queries over the web will not function in the cloud.
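Because the row and cell truncation happens silently, it can be worth checking against these limits while developing a script in an external editor. A small defensive sketch; the check_visual_limits helper is illustrative, not part of any Power BI API:

```python
import pandas as pd

ROW_CAP = 150_000   # Python visual input row limit
CELL_CAP = 32_766   # per-cell string truncation limit

def check_visual_limits(df: pd.DataFrame) -> list[str]:
    """Flag data that Power BI would otherwise truncate silently."""
    problems = []
    if len(df) > ROW_CAP:
        problems.append(f"{len(df) - ROW_CAP} rows will be silently dropped")
    for col in df.columns:
        if df[col].dtype == object or pd.api.types.is_string_dtype(df[col]):
            too_long = df[col].astype(str).str.len() > CELL_CAP
            if too_long.any():
                problems.append(
                    f"column '{col}': {int(too_long.sum())} cells exceed {CELL_CAP} chars"
                )
    return problems

# Example: one oversized text cell
sample = pd.DataFrame({"notes": ["ok", "x" * 40_000]})
print(check_visual_limits(sample))
```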

The Deprecation You Cannot Ignore: Embedded Python Visuals

Deprecation Warning

Starting May 2026, Power BI will end support for embedding reports containing Python (and R) visuals in the "Embed for your customers" scenario (also called "app owns data") and Publish to web scenarios. After the deprecation date, embedded reports containing Python visuals will still load, but the Python charts will render as blank.

Microsoft announced this change in its November 2025 Power BI Feature Summary. Reports without Python visuals are unaffected, and the deprecation does not touch "Embed for your organization" (user owns data) scenarios or secure embedding to SharePoint, websites, or portals.

But if your organization uses the "app owns data" embedding pattern and any of those reports contain Python visuals, you have until May 2026 to migrate those visuals to native Power BI alternatives or DAX-based analytics. Microsoft's stated recommendation is to move sophisticated or technical visualizations into Fabric Notebooks going forward.

The Future: Microsoft Fabric and Semantic Link

The direction Microsoft is heading is clear: Python's role is expanding, but it is shifting from embedded-in-the-report scripting toward a more robust, cloud-native notebook experience within Microsoft Fabric.

In November 2024, Microsoft announced the public preview of Python Notebooks in Fabric. These are pure Python environments (no Spark required) running on lightweight 2-vCore/16GB compute nodes. They support Python 3.10 and 3.11, native IPython features, and direct integration with Fabric Lakehouses and Data Warehouses.

The key innovation is Semantic Link — a feature that connects Power BI semantic models directly to Fabric notebooks. The SemPy Python library, built on top of Semantic Link, introduces a FabricDataFrame object that behaves like a pandas DataFrame but carries metadata from Power BI semantic models — relationships, hierarchies, and column descriptions. You can retrieve and evaluate DAX measures from your Power BI models directly in Python, without reimplementing the business logic.

At FabCon 2025, Microsoft CVP Arun Ulag addressed concerns about Power BI's future within the Fabric ecosystem, stating that Power BI remains critically important to Microsoft and that investment in the product has only grown since Fabric launched. This is relevant for Python users because it means the integration story is not going away — it is maturing. Rather than running Python scripts inside Power BI Desktop with CSV-based data exchange, the architecture is moving toward running Python in Fabric Notebooks with direct API access to semantic models, lakehouse data, and machine learning infrastructure, then surfacing the results in Power BI reports.

A Real-World Workflow: Predictive Churn Scoring

To tie this together, here is a realistic end-to-end workflow that demonstrates where Python adds value that Power BI alone cannot.

The goal: A telecom company wants to predict which customers are likely to churn in the next 90 days and surface those predictions in an executive Power BI dashboard.

Step 1 — Data Ingestion (Python as Data Source): A Python script in Power BI connects to the company's PostgreSQL customer database using psycopg2 (a PEP 249-compliant module), pulls customer demographics, usage patterns, and support ticket history, and returns a consolidated DataFrame.

Step 2 — Feature Engineering (Python Transformation): A second Python script in Power Query applies feature engineering that is impractical in M: calculating rolling 30-day averages of usage metrics, encoding categorical variables, and normalizing numeric features using scikit-learn's StandardScaler.
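A sketch of what step 2 might look like. The dataset stand-in and the column names (usage_date, daily_minutes, plan_type) are illustrative; inside Power BI, the dataset variable is injected for you:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Stand-in for the 'dataset' DataFrame Power BI injects;
# the column names here are illustrative
dataset = pd.DataFrame({
    "CustomerID": [1, 1, 2],
    "usage_date": ["2025-01-01", "2025-01-10", "2025-01-05"],
    "daily_minutes": [10.0, 30.0, 20.0],
    "plan_type": ["basic", "basic", "premium"],
})

df = dataset.copy()
df["usage_date"] = pd.to_datetime(df["usage_date"])
df = df.sort_values(["CustomerID", "usage_date"]).reset_index(drop=True)

# Rolling 30-day average of daily usage, computed per customer
df["minutes_30d_avg"] = df.groupby("CustomerID", group_keys=False).apply(
    lambda g: g.rolling("30D", on="usage_date")["daily_minutes"].mean()
)

# One-hot encode the categorical column, then z-score the numeric features
df = pd.get_dummies(df, columns=["plan_type"])
numeric = ["daily_minutes", "minutes_30d_avg"]
df[numeric] = StandardScaler().fit_transform(df[numeric])
```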

Step 3 — Model Scoring (Python Transformation): A pre-trained Random Forest model (saved as a pickle file) is loaded and used to score each customer with a churn probability. The output DataFrame includes the original customer data plus a new churn_risk_score column.
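Step 3 reduces to a few lines once a model artifact exists. In this self-contained sketch a small Random Forest is trained inline to stand in for the unpickled model, and the feature names are illustrative:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Inline stand-in for the unpickled model; feature names are illustrative
features = ["tenure_months", "support_tickets"]
train_X = pd.DataFrame({
    "tenure_months": [2, 40, 5, 36],
    "support_tickets": [8, 0, 6, 1],
})
train_y = [1, 0, 1, 0]  # 1 = churned
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(train_X, train_y)

# Stand-in for the 'dataset' DataFrame Power BI injects
dataset = pd.DataFrame({
    "CustomerID": [101, 102],
    "tenure_months": [3, 38],
    "support_tickets": [7, 0],
})

# predict_proba()[:, 1] is the probability of the positive (churn) class
df = dataset.copy()
df["churn_risk_score"] = model.predict_proba(df[features])[:, 1]
```

In the real workflow, pickle.load replaces the inline training and Power BI loads df, with its new churn_risk_score column, into the data model.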

Step 4 — Visualization (Native Power BI + Python Visual): The dashboard uses native Power BI visuals for KPI cards, slicers, and bar charts. A Python visual provides a SHAP (SHapley Additive exPlanations) waterfall plot showing the top features driving churn risk for a selected customer segment — a visualization type that Power BI does not offer natively.

import matplotlib.pyplot as plt
import shap
import pickle
import pandas as pd

# Load the pre-trained model. This local path works in Power BI
# Desktop only; the Service cannot read arbitrary local files.
with open(r"C:\Models\churn_rf_model.pkl", "rb") as f:
    model = pickle.load(f)

# 'dataset' provided by Power BI
X = dataset.drop(columns=["CustomerID", "churn_risk_score"])

explainer = shap.TreeExplainer(model)
# For binary classifiers, older SHAP releases return a list of two
# arrays (one per class), so [1] below selects the churn class;
# newer releases may return a single stacked array instead.
shap_values = explainer.shap_values(X)

fig, ax = plt.subplots(figsize=(10, 8))
shap.summary_plot(
    shap_values[1],
    X,
    plot_type="bar",
    show=False
)
plt.title("Top Features Driving Churn Risk")
plt.tight_layout()
plt.show()

What Actually Happens When Power BI Runs Your Python Script

There is a detail about the Python integration that almost no documentation spells out clearly, and it has real implications for both performance and security: Power BI communicates with Python through temporary CSV files on disk. When you run a Python transformation step, Power BI serializes the current query's data to a temporary CSV in a system temp directory, starts a Python subprocess, passes the file path as an environment variable, and waits for Python to write its output back to another CSV. Power BI then reads that output and continues the query pipeline.
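You can approximate this mechanism, and measure its cost, outside Power BI. A rough simulation of the roundtrip; the file handoff is simplified here, and in the real integration Power BI generates the temp paths itself:

```python
import os
import tempfile
import time

import pandas as pd

# Stand-in for the query data Power BI would serialize
df = pd.DataFrame({"id": range(200_000), "value": [1.5] * 200_000})

tmp_dir = tempfile.mkdtemp()
in_path = os.path.join(tmp_dir, "input.csv")
out_path = os.path.join(tmp_dir, "output.csv")

start = time.perf_counter()
df.to_csv(in_path, index=False)          # 1. host writes the data to disk
roundtrip = pd.read_csv(in_path)         # 2. the Python subprocess reads it
roundtrip["value"] *= 2                  # 3. the actual computation
roundtrip.to_csv(out_path, index=False)  # 4. results written back to disk
result = pd.read_csv(out_path)           # 5. host reads the output
elapsed = time.perf_counter() - start

print(f"roundtrip for {len(df):,} rows: {elapsed:.2f}s")
```

Run it with progressively larger row counts and you will see the serialization steps, not step 3, dominate the elapsed time.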

This means several things that most tutorials skip over. First, your data leaves Power BI's memory and lands on disk in plaintext, even temporarily. If you are working with sensitive PII or regulated data, that temporary file path is accessible on the local machine for the duration of script execution. This matters for compliance teams evaluating whether Python scripts can be used in a production Power BI Desktop workflow on machines with shared access or endpoint monitoring. The temp files are deleted after execution, but this should be documented in your data handling policies.

Second, this serialization roundtrip is why large DataFrames cause disproportionate performance degradation. CSV serialization is slow, and for a 500MB dataset, the write-to-disk and read-from-disk operations can dwarf the actual computation time inside the Python script. This is not a Python performance problem — it is an I/O problem baked into the integration architecture. The practical ceiling for workable performance in this mode is roughly 100,000 to 150,000 rows with a moderate column count, depending on disk speed.

Third, this architecture explains why the Power BI Service imposes stricter limits than Desktop. In the Service, the temp files are written to an isolated compute container, which has lower I/O throughput and a tighter execution sandbox than a local developer machine. The 1-minute visual timeout in the Service is not arbitrary — it reflects the resource constraints of that containerized environment.

Pro Tip

If your organization has data loss prevention (DLP) policies that monitor temp directory writes, coordinate with your security team before deploying Python-in-Power BI workflows. Document the temp file behavior in your data flow diagrams. The files are ephemeral, but the write events are real.

Setting Up a Proper Python Environment for Power BI

Power BI Desktop uses whatever Python executable it finds at the path you configure in File > Options and settings > Options > Python scripting. If you point it at your system Python, you will eventually break something: a global package upgrade for one project will silently change the behavior of a Power BI report that depends on an older version of the same package.

The correct approach is to create a dedicated virtual environment for Power BI and point Desktop at that environment's Python executable. This is not optional if you take report reproducibility seriously. Here is how to set one up properly:

# Create a dedicated environment for Power BI scripts
python -m venv C:\Envs\powerbi-env

# Activate it
C:\Envs\powerbi-env\Scripts\activate

# Install your required packages pinned to specific versions
pip install pandas==2.2.2 matplotlib==3.9.0 seaborn==0.13.2 scikit-learn==1.5.1 shap==0.46.0

# Freeze the exact environment for documentation
pip freeze > requirements-powerbi.txt

Then in Power BI Desktop, set the Python home directory to C:\Envs\powerbi-env. Power BI will use the python.exe inside that environment's Scripts folder. From this point, any changes you make to your system Python or other projects have zero impact on your Power BI reports.

Keep the requirements-powerbi.txt file in version control alongside your .pbix file. When another team member needs to run or modify the report, they can recreate the exact environment with pip install -r requirements-powerbi.txt.

Compatibility Warning

The Power BI Service does not use your local virtual environment. When you publish a report, Python visuals run in Microsoft's cloud environment using their supported package list. Your local virtual environment governs Desktop behavior only. Always validate published Python visuals against the official supported packages list before deploying to production.

The PBIR Format Change and What It Means for Python Reports

Starting on January 25, 2026, Microsoft began rolling out the Power BI Enhanced Report Format (PBIR) as the default for all new reports created in the Power BI Service, with full deployment completing by the end of February. Power BI Desktop followed with the March 2026 release. Existing reports are automatically converted to PBIR when edited and saved. PBIR remains technically in Preview during this default rollout, with General Availability planned for Q3 2026, at which point PBIR will become the only supported report format. This is relevant to Python users for reasons that most coverage of this change ignores.

PBIR stores report definitions as a folder-based structure rather than a single .pbix binary. Each visual's definition, including the Python script content for Python visuals, is stored as a readable JSON file inside the report folder. This has a meaningful implication: Python scripts inside your Power BI reports are now diffable, reviewable, and version-controllable in Git at the individual script level. Teams that previously had to diff binary .pbix files to audit Python script changes can now do so with standard text tooling.
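As a sketch of what that enables, the script below scans a report folder for JSON string values that look like embedded Python. It deliberately avoids assuming the exact PBIR schema (field names vary by visual type) and simply walks every JSON file; the marker strings are illustrative:

```python
import json
import os
from typing import Iterator

def _walk_strings(node) -> Iterator[str]:
    # Recursively yield every string value in a nested JSON structure
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from _walk_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from _walk_strings(item)

def find_python_scripts(report_dir: str) -> list[tuple[str, str]]:
    """Return (file path, script text) pairs for values that look like Python."""
    markers = ("import pandas", "import matplotlib")
    hits = []
    for root, _dirs, files in os.walk(report_dir):
        for name in files:
            if not name.endswith(".json"):
                continue
            path = os.path.join(root, name)
            with open(path, encoding="utf-8") as f:
                data = json.load(f)
            for value in _walk_strings(data):
                if any(marker in value for marker in markers):
                    hits.append((path, value))
    return hits
```

Pointed at a PBIR folder in CI, a check like this turns embedded-script review into an ordinary pipeline step.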

This also means that script injection — where a malicious actor modifies a Python script inside a .pbix file to exfiltrate data or execute arbitrary code on the analyst's machine — becomes more visible in a proper CI/CD pipeline with PBIR. It is still a theoretical risk worth including in your threat model, but the PBIR format makes code review of embedded Python scripts a real, practical option rather than an afterthought.

Python, Copilot, and the Emerging Hybrid Workflow

Power BI's integration of Copilot — which has been expanding steadily through 2025 and into 2026, including the announcement at FabCon 2025 that Copilot would be available across all Fabric SKUs — creates a new dynamic for Python users. Copilot can now generate DAX and M code, answer natural language questions about your data, and summarize report insights. But it cannot write or debug the Python scripts embedded in your reports. This creates a clear functional boundary: Copilot handles the conversational BI layer, Python handles the programmatic analytics layer, and Semantic Link in Fabric handles the bridge between them.

The emerging pattern for advanced analytics teams is a three-tier workflow. In the Fabric notebook layer, Python handles model training, complex transformations, and data preparation, surfacing results to a lakehouse. In the Power BI layer, DAX and native visuals handle interactive exploration, with Copilot accelerating measure creation and insight summarization. Python visuals in Power BI fill the narrow remaining gap: specialized chart types (SHAP plots, custom statistical distributions, bespoke geospatial renders) that neither native Power BI visuals nor Copilot-generated content can produce.

Understanding this three-tier model is important for deciding when to reach for Python visuals versus when to push work into a Fabric notebook. The rule of thumb: if the Python output is a visualization for consumption in an existing Power BI report, use a Python visual. If the Python work is analytical — training, scoring, complex feature engineering, or anything that benefits from a full notebook execution environment — it belongs in Fabric.

Best Practices for Python in Power BI

Write and test your scripts in an external editor first. Power BI's built-in Python editor has no syntax highlighting, no autocompletion, and no debugger. Use VS Code, PyCharm, or a Jupyter notebook to develop and test your scripts before pasting them into Power BI.

Keep your scripts focused. Do not try to replicate your entire data pipeline in a single Power BI Python script. Use Python for the specific tasks where it adds value — connecting to unsupported sources, applying statistical transformations, or rendering advanced visuals — and let Power BI handle everything else.

Use type hints in your functions (per PEP 484). When your Power BI reports depend on Python scripts that other team members may need to maintain, type-annotated functions are dramatically easier to understand and debug.

Mind the data volume. The CSV-based data exchange mechanism means that performance degrades with large datasets. Filter your data upstream before it reaches the Python script, and avoid loading more rows than the script actually needs.

Pin your Python version and package versions. Power BI uses the local Python installation, and different Python or package versions can produce different results. Document which Python version and package versions your reports depend on, and consider using virtual environments.

Be aware of the Service constraints. A script that works in Power BI Desktop may not work when the report is published to the Power BI Service. Networking is blocked, only supported packages are available, and the execution timeout is stricter.

Where Things Stand

Python's integration with Power BI has been available since 2018, and in the years since, it has gone from a niche feature to a core part of how advanced analytics teams operate. The combination gives you the analytical depth of Python's scientific computing ecosystem with the polished, interactive, enterprise-grade visualization and distribution layer of Power BI.

The landscape is shifting in several directions simultaneously. Microsoft Fabric's Python Notebooks and Semantic Link are moving the heavier Python workloads out of the report and into a proper cloud-native compute environment. The deprecation of Python visuals in embedded scenarios signals that Microsoft sees the future of Python-in-BI as a first-class notebook experience rather than inline scripting. Copilot is handling more of the conversational analytics layer that once required custom tooling. And the PBIR format change is quietly making Python scripts inside Power BI reports reviewable, diffable, and auditable for the first time.

What has not changed is the core contract. Power BI still consumes pandas DataFrames. Data connections still go through PEP 249-compliant modules. Scripts still benefit from PEP 484 type hints for maintainability. Python still solves the problems that DAX and M cannot. And the temporary CSV mechanism that powers the integration is still on disk, still subject to the same I/O constraints, and still worth understanding at that level of detail if you are serious about production deployments.

The question to ask about every Python script in your Power BI workflow is no longer just "does this work?" — it is "does this belong here, or does it belong in a Fabric notebook?" Getting that decision right is where the real leverage is in 2026.

That is real understanding — not a copy-paste tutorial. Now go build something with it.