NumPy's ndarray is the foundation of nearly every serious Python data workflow. Once you understand how multi-dimensional arrays are structured and how NumPy thinks about them, everything from image processing to machine learning model inputs starts to make a lot more sense.
Python lists are flexible and intuitive, but they were not built for numerical computing. When you store numbers in a Python list, each element is a full Python object with its own memory overhead. NumPy sidesteps this entirely by storing data in a contiguous block of typed memory — a design borrowed from C — which makes arithmetic on large collections of numbers orders of magnitude faster. That underlying data structure is the ndarray, and everything in this article builds on it.
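As a rough illustration of that storage difference, the sketch below compares the footprint of one Python int object with the per-element size of an ndarray buffer. The variable names are illustrative, and the size of a Python int depends on the interpreter build, so no specific value is shown for it.

```python
import sys
import numpy as np

n = 1_000
py_list = list(range(n))
arr = np.arange(n, dtype=np.int64)

# Each list slot points at a separate, full Python int object.
print(sys.getsizeof(py_list[-1]))  # bytes for one int object (build-dependent)

# The ndarray stores raw 8-byte integers back to back in one buffer.
print(arr.itemsize)  # 8
print(arr.nbytes)    # 8000
```

The list also stores a pointer per element on top of the objects themselves, so the real gap is larger than this single comparison suggests.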
The ndarray: How NumPy Stores Data
An ndarray (n-dimensional array) is a grid of values, all of the same data type, indexed by a tuple of non-negative integers. The number of dimensions is exposed as ndim; older NumPy material calls it the rank, which is unrelated to the linear-algebra rank of a matrix. The size along each dimension is described by the array's shape, which is always expressed as a tuple.
A one-dimensional array has a shape like (5,). A two-dimensional array — what you would normally call a matrix — has a shape like (3, 4), meaning 3 rows and 4 columns. A three-dimensional array might have a shape like (2, 3, 4), which you can think of as 2 layers, each containing a 3x4 grid.
NumPy arrays are homogeneous — every element must be the same data type. This constraint is exactly what makes them fast. When NumPy knows every value in an array is a 64-bit float, it can pass the entire block of memory directly to optimized C and Fortran routines.
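One visible consequence of homogeneity is that NumPy picks a single dtype for the whole array at creation time. A minimal sketch, with illustrative variable names:

```python
import numpy as np

# Mixing ints and a float upcasts every element to one common dtype.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64

# You can also request a dtype explicitly instead of letting NumPy infer it.
explicit = np.array([1, 2, 3], dtype=np.float32)
print(explicit.dtype)     # float32
print(explicit.itemsize)  # 4 (bytes per element)
```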
Three attributes you will reach for constantly are ndim (the number of dimensions), shape (a tuple describing the size of each dimension), and dtype (the data type of the elements). Getting comfortable reading these three values is the first step toward reasoning about any unfamiliar array.
import numpy as np
# A simple 2D array
a = np.array([[1, 2, 3],
[4, 5, 6]])
print(a.ndim) # 2
print(a.shape) # (2, 3)
print(a.dtype) # int64 on most 64-bit platforms (int32 on Windows before NumPy 2.0)
Creating Arrays and Matrices
NumPy gives you several ways to create arrays without entering every value by hand. The right approach depends on what you are trying to set up.
From Python sequences
The most direct route is passing a Python list or list of lists to np.array(). NumPy infers the shape and dtype automatically. Nesting one more level of lists adds another dimension.
import numpy as np
# 1D array
v = np.array([10, 20, 30, 40])
# 2D array (matrix): 3 rows, 4 columns
m = np.array([[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 10, 11, 12]])
# 3D array: 2 layers of 2x3
t = np.array([[[1, 2, 3],
[4, 5, 6]],
[[7, 8, 9],
[10, 11, 12]]])
print(t.shape) # (2, 2, 3)
Initializer functions
When you need an array of a specific shape but do not yet have the values, NumPy's initializer functions are the right tool. Pass a shape tuple to any of these and you get a ready-to-fill array back.
import numpy as np
# All zeros
zeros = np.zeros((3, 4))
# All ones
ones = np.ones((2, 5))
# Uninitialized: contents are arbitrary until overwritten (can be faster than zeros)
empty = np.empty((4, 4))
# Identity matrix (square, 1s on the diagonal)
identity = np.eye(4)
# Filled with a constant
fives = np.full((3, 3), 5)
print(identity)
# [[1. 0. 0. 0.]
# [0. 1. 0. 0.]
# [0. 0. 1. 0.]
# [0. 0. 0. 1.]]
Range and linspace
np.arange() works like Python's built-in range() but returns an array. np.linspace() generates a specified number of evenly spaced values between a start and stop — useful any time you need a smooth numeric range.
import numpy as np
# Values from 0 to 9
r = np.arange(10) # array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
# Values from 0 to 1 with step 0.25
r2 = np.arange(0, 1, 0.25) # array([0. , 0.25, 0.5 , 0.75])
# 5 evenly spaced points between 0 and 1 (inclusive)
ls = np.linspace(0, 1, 5) # array([0. , 0.25, 0.5 , 0.75, 1. ])
Random arrays
The np.random submodule generates arrays filled with random values. The recommended approach since NumPy 1.17 is to use a Generator object rather than the legacy top-level functions, as it gives you better statistical properties and reproducibility control.
import numpy as np
rng = np.random.default_rng(seed=42)
# Uniform floats between 0 and 1
uniform = rng.random((3, 3))
# Integers from 0 to 9 (the upper bound, 10, is exclusive)
ints = rng.integers(0, 10, size=(2, 4))
# Standard normal distribution
normal = rng.standard_normal((3, 3))
Always set a seed when your code needs to be reproducible — in tests, tutorials, or any time someone else needs to run your script and get the same output. Pass an integer to np.random.default_rng(seed=42) and your random arrays will be identical every run.
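A quick way to convince yourself of this is to build two generators from the same seed and compare their output. This is a small sketch; the seed values and variable names are arbitrary.

```python
import numpy as np

rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)
rng_c = np.random.default_rng(seed=7)

first = rng_a.random((2, 3))
second = rng_b.random((2, 3))
other = rng_c.random((2, 3))

# Same seed, same sequence of draws: identical arrays.
print(np.array_equal(first, second))  # True

# A different seed gives a different stream.
print(np.array_equal(first, other))   # False
```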
Indexing, Slicing, and Reshaping
Accessing and rearranging array data is something you will do constantly. NumPy extends Python's familiar slice syntax into multiple dimensions, and adds a few powerful tools that have no equivalent in plain Python.
Basic indexing
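For a 2D array, you specify the row index first, then the column. NumPy accepts both comma-separated indices (m[0, 1]) and chained bracket notation (m[0][1]), but the comma style is preferred: it is cleaner, and it is slightly faster because chained brackets first build an intermediate array for the row and then index into it.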
import numpy as np
m = np.array([[10, 20, 30],
[40, 50, 60],
[70, 80, 90]])
print(m[0, 1]) # 20 (row 0, column 1)
print(m[2, 2]) # 90 (row 2, column 2)
print(m[-1, -1]) # 90 (last row, last column)
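The two notations agree for plain integer indices, but they are not interchangeable once slices are involved. The sketch below shows both cases on the same array:

```python
import numpy as np

m = np.array([[10, 20, 30],
              [40, 50, 60],
              [70, 80, 90]])

# Chained brackets: m[0] is evaluated first, then indexed again.
row = m[0]
print(row.shape)  # (3,)
print(m[0][1])    # 20
print(m[0, 1])    # 20

# With a slice the two forms diverge.
print(m[:, 1])    # [20 50 80]  column 1
print(m[:][1])    # [40 50 60]  row 1, because m[:] is just the whole array
```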
Slicing
Slices work per dimension, separated by commas. The colon : alone means "all elements along this dimension." You can combine slices with specific indices freely.
import numpy as np
m = np.array([[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 10, 11, 12]])
# First two rows, all columns
print(m[:2, :])
# [[ 1 2 3 4]
# [ 5 6 7 8]]
# All rows, last two columns
print(m[:, 2:])
# [[ 3 4]
# [ 7 8]
# [11 12]]
# Middle block: rows 0-1, columns 1-2
print(m[:2, 1:3])
# [[2 3]
# [6 7]]
Basic NumPy slices return views, not copies, so modifying a slice modifies the original array. If you need an independent copy, call .copy() explicitly: sub = m[:2, :].copy().
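The view behavior is easy to check directly. In this sketch, np.shares_memory reports whether two arrays overlap in memory; the array contents are arbitrary.

```python
import numpy as np

m = np.arange(12).reshape(3, 4)

view = m[:2, :]                # shares memory with m
independent = m[:2, :].copy()  # owns its own data

view[0, 0] = 99         # writes through to m
independent[0, 1] = -1  # does not touch m

print(m[0, 0])  # 99
print(m[0, 1])  # 1
print(np.shares_memory(m, view))         # True
print(np.shares_memory(m, independent))  # False
```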
Boolean indexing
You can filter an array using a boolean condition. NumPy evaluates the condition element-wise and returns only the elements where the condition is True. This is one of the most practical tools in the entire library.
import numpy as np
a = np.array([3, 15, 7, 42, 1, 99, 8])
# Elements greater than 10
print(a[a > 10]) # [15 42 99]
# Apply the same mask to replace values
a[a > 10] = 0
print(a) # [ 3 0 7 0 1 0 8]
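Conditions can be combined with & (and), | (or), and ~ (not), as long as each comparison is wrapped in parentheses. The sketch below also shows that, unlike a basic slice, selecting with a mask produces a copy:

```python
import numpy as np

a = np.array([3, 15, 7, 42, 1, 99, 8])

# Parentheses are required: & binds more tightly than the comparisons.
mask = (a > 5) & (a < 50)
print(mask)     # [False  True  True  True False False  True]
print(a[mask])  # [15  7 42  8]

# Reading through a mask gives a copy, so this does not change a.
selected = a[mask]
selected[0] = 0
print(a[1])     # 15
```

Assigning through a mask, as in a[a > 10] = 0, still writes to the original array; only the array you get back from a read is a copy.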
Reshaping
reshape() changes the shape of an array without changing its data. The total number of elements must stay the same. Use -1 as a placeholder for any dimension you want NumPy to calculate automatically.
import numpy as np
a = np.arange(12)
print(a.shape) # (12,)
# Reshape to 3 rows, 4 columns
b = a.reshape(3, 4)
print(b.shape) # (3, 4)
# Let NumPy calculate the number of rows
c = a.reshape(-1, 3)
print(c.shape) # (4, 3)
# Flatten back to 1D
d = b.flatten()
print(d.shape) # (12,)
reshape() returns a view when possible; flatten() always returns a copy. If memory is tight and you just need a 1D version for iteration, use ravel() instead — it returns a view when it can.
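The view-versus-copy distinction can be verified with np.shares_memory. This sketch also includes a case where ravel() cannot return a view:

```python
import numpy as np

a = np.arange(12)
b = a.reshape(3, 4)

print(np.shares_memory(a, b))            # True: reshape returned a view
print(np.shares_memory(b, b.ravel()))    # True: ravel returned a view
print(np.shares_memory(b, b.flatten()))  # False: flatten always copies

# The transpose is not laid out row by row in memory,
# so ravel() has to fall back to a copy here.
print(np.shares_memory(b, b.T.ravel()))  # False
```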
Matrix Operations
NumPy treats 2D arrays as matrices and provides the full suite of linear algebra operations. There is an important distinction to keep in mind: the standard arithmetic operators (+, -, *, /) all operate element-wise, not as matrix math. For true matrix multiplication, you need @ or np.dot().
Element-wise arithmetic
import numpy as np
A = np.array([[1, 2],
[3, 4]])
B = np.array([[5, 6],
[7, 8]])
print(A + B)
# [[ 6 8]
# [10 12]]
print(A * B) # Element-wise multiplication, NOT matrix product
# [[ 5 12]
# [21 32]]
print(A / 2) # Scalar division applied to every element
# [[0.5 1. ]
# [1.5 2. ]]
Matrix multiplication
The @ operator (introduced in Python 3.5) performs matrix multiplication. For two 2D arrays of shapes (m, k) and (k, n), the result has shape (m, n). The inner dimensions must match.
import numpy as np
A = np.array([[1, 2],
[3, 4]])
B = np.array([[5, 6],
[7, 8]])
print(A @ B)
# [[19 22]
# [43 50]]
# Equivalent using np.dot
print(np.dot(A, B))
# [[19 22]
# [43 50]]
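The shape rule is easier to see with non-square arrays. A small sketch, using arrays of ones purely as placeholders:

```python
import numpy as np

A = np.ones((2, 3))
B = np.ones((3, 4))

# (2, 3) @ (3, 4): inner dimensions match, result is (2, 4).
print((A @ B).shape)  # (2, 4)

# (3, 4) @ (2, 3): inner dimensions 4 and 2 differ, so NumPy refuses.
try:
    B @ A
except ValueError as err:
    print("shape mismatch:", type(err).__name__)  # shape mismatch: ValueError
```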
Transpose
The transpose of a matrix flips it along its diagonal — rows become columns and columns become rows. Use the .T attribute or np.transpose().
import numpy as np
M = np.array([[1, 2, 3],
[4, 5, 6]])
print(M.shape) # (2, 3)
print(M.T.shape) # (3, 2)
print(M.T)
# [[1 4]
# [2 5]
# [3 6]]
Linear algebra with numpy.linalg
The np.linalg submodule handles more advanced operations: determinants, inverses, eigenvalues, and solving systems of linear equations.
import numpy as np
A = np.array([[3., 1.],
[1., 2.]])
# Determinant
print(np.linalg.det(A)) # approximately 5.0 (floating-point rounding may show extra digits)
# Inverse
print(np.linalg.inv(A))
# [[ 0.4 -0.2]
# [-0.2 0.6]]
# Solve Ax = b for x
b = np.array([9., 8.])
x = np.linalg.solve(A, b)
print(x) # [2. 3.]
# Verify: A @ x should equal b
print(A @ x) # [9. 8.]
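Eigenvalues are mentioned above but not shown. Because this particular A is symmetric, one reasonable choice is np.linalg.eigh, which is intended for symmetric (Hermitian) matrices; for a general square matrix you would use np.linalg.eig instead. A minimal sketch:

```python
import numpy as np

A = np.array([[3., 1.],
              [1., 2.]])

# eigh returns eigenvalues in ascending order, plus eigenvectors as columns.
eigenvalues, eigenvectors = np.linalg.eigh(A)
print(eigenvalues)  # approximately [1.382 3.618]

# Sanity checks: the eigenvalues sum to the trace and multiply to the determinant.
print(np.isclose(eigenvalues.sum(), np.trace(A)))        # True
print(np.isclose(eigenvalues.prod(), np.linalg.det(A)))  # True

# Each column v satisfies A @ v = lambda * v.
print(np.allclose(A @ eigenvectors, eigenvectors * eigenvalues))  # True
```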
Aggregation along axes
Functions like sum(), mean(), max(), and min() accept an axis argument that controls which dimension gets collapsed. axis=0 operates down the rows (column-wise result); axis=1 operates across columns (row-wise result).
import numpy as np
M = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
print(M.sum()) # 45 (total)
print(M.sum(axis=0)) # [12 15 18] (sum of each column)
print(M.sum(axis=1)) # [ 6 15 24] (sum of each row)
print(M.mean(axis=0)) # [4. 5. 6.] (mean of each column)
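With more than two dimensions, the "rows and columns" picture stops helping, and it is simpler to remember that the axis you name is the one that disappears from the shape. A short sketch on a 3D array:

```python
import numpy as np

t = np.arange(24).reshape(2, 3, 4)

print(t.sum(axis=0).shape)  # (3, 4)  axis 0 collapsed
print(t.sum(axis=1).shape)  # (2, 4)  axis 1 collapsed
print(t.sum(axis=2).shape)  # (2, 3)  axis 2 collapsed

# keepdims=True keeps the collapsed axis with size 1.
print(t.sum(axis=1, keepdims=True).shape)  # (2, 1, 4)
```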
Broadcasting
Broadcasting is how NumPy handles arithmetic between arrays of different shapes without copying data. It is one of the more conceptually unfamiliar parts of NumPy, but once it clicks it becomes indispensable.
The core idea is that when NumPy encounters two arrays with mismatched shapes, it attempts to "stretch" the smaller array across the larger one so their shapes align — but only along dimensions where the smaller array has size 1 (or no dimension at all). No actual data is duplicated in memory.
"Broadcasting provides a means of vectorizing array operations so that looping occurs in C instead of Python." — NumPy documentation
The rules NumPy follows: if the arrays have different numbers of dimensions, the shape of the smaller one is padded with 1s on the left. Then, for each dimension, the sizes must either match, or one of them must be 1. If neither condition holds, NumPy raises a ValueError.
import numpy as np
# Adding a scalar to a 2D array
M = np.ones((3, 3))
print(M + 10)
# [[11. 11. 11.]
# [11. 11. 11.]
# [11. 11. 11.]]
# Adding a 1D array (shape (3,)) to a 2D array (shape (3,3))
# The 1D array broadcasts across each row
row = np.array([1, 2, 3])
print(M + row)
# [[2. 3. 4.]
# [2. 3. 4.]
# [2. 3. 4.]]
# Adding a column vector (shape (3,1)) to a 2D array (shape (3,3))
# The column vector is stretched across all three columns
col = np.array([[10], [20], [30]])
print(M + col)
# [[11. 11. 11.]
# [21. 21. 21.]
# [31. 31. 31.]]
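The broadcasting rules described earlier can be written out in a few lines of plain Python. The broadcast_shape function below is a hypothetical helper for illustration only, not part of NumPy; the last line cross-checks one result against NumPy's own np.broadcast.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: apply the broadcasting rules to two shape tuples."""
    ndim = max(len(shape_a), len(shape_b))
    # Rule 1: pad the shorter shape with 1s on the left.
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    # Rule 2: per dimension, sizes must match or one of them must be 1.
    for size_a, size_b in zip(a, b):
        if size_a == size_b or size_b == 1:
            result.append(size_a)
        elif size_a == 1:
            result.append(size_b)
        else:
            raise ValueError(f"cannot broadcast {shape_a} with {shape_b}")
    return tuple(result)

print(broadcast_shape((3, 3), (3,)))       # (3, 3)
print(broadcast_shape((3, 1), (1, 4)))     # (3, 4)
print(broadcast_shape((2, 3, 4), (3, 1)))  # (2, 3, 4)

# Cross-check against NumPy itself.
print(np.broadcast(np.empty((3, 1)), np.empty((1, 4))).shape)  # (3, 4)
```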
A practical use case for broadcasting is mean-centering a dataset. If X is a 2D array where each row is an observation and each column is a feature, you can subtract the column means in one line: X_centered = X - X.mean(axis=0). NumPy broadcasts the 1D mean array across all rows automatically.
A practical broadcasting example
import numpy as np
# Dataset: 5 samples, 3 features
X = np.array([[2., 8., 1.],
[4., 6., 3.],
[6., 4., 5.],
[8., 2., 7.],
[10., 0., 9.]])
# Column means — shape (3,)
col_means = X.mean(axis=0)
print(col_means) # [6. 4. 5.]
# Mean-center the data — broadcasting subtracts col_means from every row
X_centered = X - col_means
print(X_centered)
# [[-4. 4. -4.]
# [-2. 2. -2.]
# [ 0. 0. 0.]
# [ 2. -2. 2.]
# [ 4. -4. 4.]]
# Verify: column means of centered data should be ~0
print(X_centered.mean(axis=0)) # [0. 0. 0.]
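For comparison, here is the same column centering written as an explicit loop, followed by the row-wise variant. This is a sketch on a smaller, made-up dataset; note that centering rows needs keepdims=True so the means have shape (4, 1) rather than (4,).

```python
import numpy as np

# Made-up data: 4 samples, 3 features
X = np.array([[2., 8., 1.],
              [4., 6., 3.],
              [6., 4., 5.],
              [8., 2., 7.]])
col_means = X.mean(axis=0)

# Loop version: subtract the column means one row at a time.
looped = np.empty_like(X)
for i in range(X.shape[0]):
    looped[i] = X[i] - col_means

# Broadcast version: one expression, same result.
broadcasted = X - col_means
print(np.array_equal(looped, broadcasted))  # True

# Centering each row instead: keep the reduced axis so shapes line up.
row_means = X.mean(axis=1, keepdims=True)
print(row_means.shape)        # (4, 1)
print((X - row_means).shape)  # (4, 3)
```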
Key Takeaways
- Know your shape: Check .ndim, .shape, and .dtype whenever you encounter an unfamiliar array. Understanding the structure before operating on data prevents many common errors.
- Use initializer functions: np.zeros(), np.ones(), np.eye(), and np.arange() let you build arrays of any shape quickly without writing out values by hand.
- Slices are views: NumPy slices share memory with the original array. Call .copy() when you need an independent array that will not affect the source.
- Use @ for matrix multiplication: The * operator is element-wise. The @ operator (or np.dot()) performs true matrix multiplication. Knowing the difference will save you from subtle, hard-to-debug errors.
- Broadcasting eliminates loops: When you find yourself writing a loop to apply an operation row-by-row or column-by-column, look for a broadcasting solution first. It will almost always be faster and shorter.
NumPy's ndarray is a genuinely powerful abstraction. The array manipulation patterns covered here — creating, indexing, reshaping, and operating on multi-dimensional data — form the backbone of libraries like pandas, scikit-learn, and PyTorch. Getting fluent with them at the NumPy level means you will be better prepared to understand what those higher-level tools are doing under the hood.