Introduction
NumPy is the fundamental package for numerical computing in Python. It provides N‑dimensional arrays, fast vectorized operations, linear algebra, random sampling, and tight integration with the scientific Python ecosystem.
Why NumPy matters:
- Dense, contiguous memory layout for speed and cache efficiency
- Vectorization removes Python loops and boosts performance
- Broadcasting enables operations across different shapes
- Foundation for Pandas, SciPy, scikit‑learn, and many ML libraries
Installation
# Using pip
pip install numpy
# Using conda
conda install numpy
For reproducible environments, pin versions in requirements.txt or use conda environments.
Arrays — Basics
import numpy as np
# create arrays
arr1 = np.array([1, 2, 3])
arr2 = np.array([[1, 2], [3, 4]])
print(arr1.shape, arr2.shape)
print(arr2[1, 1])
NumPy arrays are homogeneous and support fast element‑wise operations. Use them instead of Python lists for numerical workloads.
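Homogeneous means every element shares a single dtype, so mixed inputs are converted to a common type. A small sketch of that behavior (the variable names are illustrative only):

```python
import numpy as np

# Mixing ints and floats promotes every element to float64.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64

# Writing a float into an integer array truncates it to fit the dtype.
ints = np.array([1, 2, 3])
ints[0] = 9.7
print(ints[0])  # 9
```

The second case is a common source of silent precision loss, so it is worth checking dtype before assigning into an existing array.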
Array Attributes
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.ndim)      # number of dimensions
print(arr.shape)     # (rows, columns)
print(arr.size)      # total number of elements
print(arr.dtype)     # data type of the elements
print(arr.itemsize)  # bytes per element
Understanding shape, strides, and dtype is crucial for performance tuning and correct reshaping.
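Strides are not shown in the attribute listing above, so here is a minimal sketch of what they report. The dtype is fixed to int64 as an assumption so the byte counts do not vary by platform:

```python
import numpy as np

# Fix the dtype so the byte counts below do not depend on the platform.
arr = np.arange(6, dtype=np.int64).reshape(2, 3)

# strides = bytes to step along each axis: a row is 3 * 8 bytes, a column is 8.
print(arr.strides)    # (24, 8)

# Transposing swaps shape and strides without moving any data.
print(arr.T.shape)    # (3, 2)
print(arr.T.strides)  # (8, 24)
```

Because a transpose only rewrites this metadata, it is cheap, but the result is no longer C-contiguous, which can matter for later operations.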
Data Types & Type Conversion
arr = np.array([1, 2, 3])
print(arr.dtype)
arr_float = arr.astype(float)
print(arr_float, arr_float.dtype)
np.array([1, 2, 3], dtype=np.int32)
np.array([1.5, 2.3], dtype=np.int64)  # fractional parts are truncated: [1, 2]
Choosing the right dtype can reduce memory usage dramatically and speed up computations.
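To make the memory claim concrete, the sketch below compares the footprint of one million elements under three dtypes. The array names are made up for the example:

```python
import numpy as np

n = 1_000_000
as_f64 = np.zeros(n, dtype=np.float64)
as_f32 = np.zeros(n, dtype=np.float32)
as_u8 = np.zeros(n, dtype=np.uint8)

# nbytes equals size * itemsize.
print(as_f64.nbytes)  # 8000000
print(as_f32.nbytes)  # 4000000
print(as_u8.nbytes)   # 1000000

# The trade-off: smaller dtypes cover a smaller range of values.
print(np.iinfo(np.uint8).max)  # 255
```

Smaller dtypes save memory only when the data actually fits; values outside the range wrap around or lose precision.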
Array Operations
arr = np.array([1, 2, 3])
print(arr + 5)
print(arr * 2)
print(arr ** 2)
print(arr.mean(), arr.sum(), arr.max())
Operations are vectorized and run in compiled C loops, which is much faster than Python loops.
Indexing & Slicing
arr = np.array([10, 20, 30, 40, 50])
print(arr[0], arr[-1])
print(arr[1:4])
arr[2:4] = 100, 200
print(arr)
Basic slicing creates a view, not a copy. Mutating a slice changes the original array unless you call .copy() first.
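The difference between a slice and an explicit copy can be sketched as follows (variable names are illustrative):

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

view = arr[1:4]         # basic slice: shares memory with arr
safe = arr[1:4].copy()  # explicit copy: independent buffer

view[0] = -1            # writes through to arr[1]
safe[1] = -2            # leaves arr untouched

print(arr)   # [10 -1 30 40 50]
print(safe)  # [20 -2 40]
```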
Boolean Indexing & Masking
arr = np.array([10, 20, 30, 40, 50])
mask = arr > 25
print(mask)
print(arr[mask])
arr[arr < 30] = 0
print(arr)
Boolean masks are powerful for filtering, cleaning, and conditional updates without loops.
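A cleaning step of the kind described above might look like this sketch. The data and the -999 "missing" marker are invented for the example:

```python
import numpy as np

temps = np.array([12, -999, 18, 25, -999, 31])  # -999 is a made-up "missing" marker

# Combine conditions with & and |; each comparison needs its own parentheses.
valid = (temps != -999) & (temps < 30)
print(temps[valid])  # [12 18 25]

# np.where(cond, x, y) picks x where cond is True and y elsewhere.
cleaned = np.where(temps == -999, 0, temps)
print(cleaned)  # [12  0 18 25  0 31]
```

Unlike in-place mask assignment, np.where returns a new array and leaves the input unchanged.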
Multi-Dimensional Arrays
mat = np.array([[1, 2, 3], [4, 5, 6]])
print(mat.shape)
print(mat[:, 1])  # second column
print(mat[1, :])  # second row
Multi‑dimensional slicing uses : and index lists to target rows, columns, or sub‑matrices.
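The three selection styles mentioned above (slices, index lists, and sub-matrices) can be sketched on a small example matrix:

```python
import numpy as np

mat = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

sub = mat[0:2, 1:3]                  # rows 0-1, columns 1-2 (a view)
cols = mat[:, [0, 3]]                # first and last column (a copy)
block = mat[np.ix_([0, 2], [1, 3])]  # rows 0 and 2 crossed with columns 1 and 3

print(sub)    # [[1 2] [5 6]]
print(cols)   # [[0 3] [4 7] [8 11]]
print(block)  # [[1 3] [9 11]]
```

np.ix_ is used here because passing two index lists directly, as in mat[[0, 2], [1, 3]], pairs them element by element and returns only two values.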
Reshape & Flatten
arr = np.arange(12)
arr2d = arr.reshape(3, 4)
flat = arr2d.flatten()
print(arr2d)
print(flat)
reshape returns a view whenever the memory layout allows it; flatten always returns a copy.
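One way to check which calls share memory with the original array is np.shares_memory. A short sketch, which also shows ravel as the view-returning counterpart of flatten:

```python
import numpy as np

arr = np.arange(12)

grid = arr.reshape(3, -1)  # -1 lets NumPy infer the missing dimension (4 here)
flat = grid.flatten()      # always a new array
rav = grid.ravel()         # a view when the layout allows it

print(grid.shape)                   # (3, 4)
print(np.shares_memory(arr, grid))  # True
print(np.shares_memory(arr, flat))  # False
print(np.shares_memory(arr, rav))   # True
```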
Stacking & Splitting Arrays
a = np.array([1, 2])
b = np.array([3, 4])
h = np.hstack([a, b])
v = np.vstack([a, b])
print(h, v)
split = np.array_split(h, 2)
print(split)
Stacking combines arrays along different axes, while splitting divides arrays into smaller chunks.
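For 2-D inputs the axis can be stated explicitly. The sketch below uses np.concatenate, np.stack, and np.split on two small example matrices:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

rows = np.concatenate([a, b], axis=0)  # shape (4, 2), same result as np.vstack
cols = np.concatenate([a, b], axis=1)  # shape (2, 4), same result as np.hstack
new = np.stack([a, b], axis=0)         # shape (2, 2, 2): adds a new leading axis

top, bottom = np.split(rows, 2, axis=0)  # undo the row-wise concatenation
print(rows.shape, cols.shape, new.shape)
print(np.array_equal(top, a), np.array_equal(bottom, b))  # True True
```

np.split requires the axis length to divide evenly, while np.array_split tolerates uneven chunks.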
Sorting, Searching & Filtering
arr = np.array([40, 10, 30, 20])
print(np.sort(arr))
print(np.argsort(arr))
print(np.where(arr > 25))
print(np.any(arr > 50))
print(np.all(arr > 5))
Use argsort to get sorted indices and where for conditional selection.
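A typical use of argsort is reordering one array by the values of another. A sketch with made-up names and scores:

```python
import numpy as np

names = np.array(["dana", "alex", "cory", "blake"])  # made-up labels
scores = np.array([40, 10, 30, 20])

order = np.argsort(scores)     # indices that would sort scores ascending
print(names[order])            # ['alex' 'blake' 'cory' 'dana']
print(names[order[::-1]][:2])  # top two: ['dana' 'cory']

# np.where with one argument returns a tuple of index arrays, one per axis.
(idx,) = np.where(scores > 25)
print(idx)                     # [0 2]
```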
Array Creation Functions
NumPy provides many built-in functions to create arrays efficiently.
🔹 np.array()
import numpy as np
arr = np.array([1, 2, 3, 4])
print(arr)
🔹 np.zeros()
zeros = np.zeros((2, 3))
print(zeros)
🔹 np.ones()
ones = np.ones((3, 2))
print(ones)
🔹 np.full()
filled = np.full((2, 2), 9)
print(filled)
🔹 np.arange()
arr = np.arange(0, 10, 2)
print(arr)
🔹 np.linspace()
arr = np.linspace(0, 1, 5)
print(arr)
These constructors are optimized and often faster than manually building Python lists.
Universal Functions (ufuncs)
Universal functions apply element-wise operations.
arr = np.array([1, 4, 9, 16])
print(np.sqrt(arr))
print(np.square(arr))
print(np.abs([-3, -7, 4]))
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(np.add(a, b))
print(np.subtract(a, b))
print(np.multiply(a, b))
print(np.divide(a, b))
Ufuncs support broadcasting and can operate efficiently on large arrays.
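As a sketch of ufunc broadcasting, the example below multiplies a column by a row, then uses the reduce and accumulate methods that binary ufuncs provide:

```python
import numpy as np

col = np.array([[1], [2], [3]])  # shape (3, 1)
row = np.array([10, 20])         # shape (2,)

# The ufunc broadcasts (3, 1) against (2,) to produce a (3, 2) result.
table = np.multiply(col, row)
print(table)  # [[10 20] [20 40] [30 60]]

# Binary ufuncs also expose reduce and accumulate.
print(np.add.reduce(row))               # 30
print(np.add.accumulate([1, 2, 3, 4]))  # [ 1  3  6 10]
```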
Mathematical Functions
arr = np.array([1, 2, 3])
print(np.sqrt(arr))
print(np.exp(arr))
print(np.log(arr))
print(np.sin(arr))
These functions apply element‑wise and are optimized for performance and numerical stability.
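One concrete instance of the stability point, assuming it refers to helpers such as np.log1p: for very small x, computing log(1 + x) directly loses digits because 1 + x is rounded first.

```python
import numpy as np

x = 1e-10

naive = np.log(1 + x)  # 1 + x is rounded before the log is taken
stable = np.log1p(x)   # computes log(1 + x) without forming 1 + x

print(naive)
print(stable)
```

For small x the true value is very close to x itself, and the log1p result stays much closer to it than the naive one. np.expm1 plays the same role for exp(x) - 1.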
Random Numbers
print(np.random.rand(3))
print(np.random.randint(0, 10, 5))
print(np.random.randn(4))
For reproducibility, set a seed with np.random.seed(42) or, in new code, create a seeded generator with np.random.default_rng(42).
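A sketch of the generator-based equivalents of the calls above, using np.random.default_rng. The seed values are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(42)     # seeded, independent generator

print(rng.random(3))                # uniform floats in [0, 1)
print(rng.integers(0, 10, size=5))  # integers in [0, 10)
print(rng.standard_normal(4))       # standard normal samples

# Two generators built from the same seed produce the same stream.
a = np.random.default_rng(0).random(3)
b = np.random.default_rng(0).random(3)
print(np.array_equal(a, b))         # True
```

Passing a generator object around avoids the hidden global state that np.random.seed modifies.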
Linear Algebra
mat = np.array([[1, 2], [3, 4]])
inv = np.linalg.inv(mat)
det = np.linalg.det(mat)
eig = np.linalg.eig(mat)  # eigenvalues and eigenvectors
print(inv, det, eig)
NumPy’s linear algebra module supports matrix decompositions, eigenvalue problems, and solving linear systems.
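Solving a system is not shown above, so here is a minimal sketch with np.linalg.solve on a small made-up system, followed by a check of the eigen-decomposition:

```python
import numpy as np

# Solve  3x + y = 9  and  x + 2y = 8.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])

x = np.linalg.solve(A, b)     # generally preferred over inv(A) @ b
print(x)                      # approximately [2. 3.]
print(np.allclose(A @ x, b))  # True

# eig returns eigenvalues and eigenvectors (stored as columns).
vals, vecs = np.linalg.eig(A)
print(np.allclose(A @ vecs, vecs * vals))  # True
```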
Statistical Functions
data = np.array([10, 20, 30, 40, 50])
print(np.mean(data))
print(np.median(data))
print(np.std(data))
print(np.var(data))
print(np.min(data))
print(np.max(data))
print(np.percentile(data, 75))
Use these functions for quick descriptive statistics and data exploration.
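On 2-D data the same functions accept an axis argument. A sketch with made-up numbers, which also shows the ddof parameter of np.std:

```python
import numpy as np

# Rows are samples, columns are features (made-up numbers).
data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])

print(data.mean())        # 3.5, over all elements
print(data.mean(axis=0))  # [2.5 3.5 4.5], one value per column
print(data.mean(axis=1))  # [2. 5.], one value per row

# np.std defaults to the population formula (ddof=0); ddof=1 gives the sample estimate.
print(np.std(data[:, 0]))          # 1.5
print(np.std(data[:, 0], ddof=1))  # about 2.1213
```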
Copy vs View
a = np.array([1, 2, 3, 4])
b = a.view()  # shares memory with a
c = a.copy()  # independent copy
b[0] = 100
print(a)  # changed: a[0] is now 100
c[1] = 200
print(a)  # not changed by the write to c
Views are efficient but can cause unexpected mutations if you’re not careful.
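When it is unclear whether an indexing expression produced a view, np.shares_memory and the base attribute can tell. A short sketch comparing three indexing styles:

```python
import numpy as np

a = np.arange(6)

sliced = a[::2]         # basic slicing: view
picked = a[[0, 2, 4]]   # integer-list (fancy) indexing: copy
masked = a[a % 2 == 0]  # boolean indexing: copy

for name, result in [("sliced", sliced), ("picked", picked), ("masked", masked)]:
    # Prints the name, whether memory is shared, and whether a is the base array.
    print(name, np.shares_memory(a, result), result.base is a)
```

Only the basic slice reports shared memory; the other two hold the same values in separate buffers.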
Why NumPy is Faster than Python Lists
import time
import numpy as np
size = 1000000
list1 = list(range(size))
# perf_counter is a monotonic, high-resolution clock intended for timing
start = time.perf_counter()
[x*2 for x in list1]
print("List time:", time.perf_counter()-start)
arr = np.arange(size)
start = time.perf_counter()
arr * 2
print("NumPy time:", time.perf_counter()-start)
Vectorized operations and contiguous memory access provide significant speedups for large arrays.
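Single-shot timings are noisy. A somewhat more robust sketch of the same comparison uses timeit from the standard library; the array size and repeat counts are arbitrary choices, and the measured times will vary by machine:

```python
import timeit
import numpy as np

size = 100_000
values = list(range(size))
arr = np.arange(size)

# Take the best of several repeats to reduce noise from other processes.
list_time = min(timeit.repeat(lambda: [x * 2 for x in values], number=10, repeat=5))
numpy_time = min(timeit.repeat(lambda: arr * 2, number=10, repeat=5))

print(f"list comprehension: {list_time:.4f} s for 10 runs")
print(f"NumPy multiply:     {numpy_time:.4f} s for 10 runs")

# Both approaches compute the same values.
print(np.array_equal(arr * 2, [x * 2 for x in values]))  # True
```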
Advanced NumPy
- Broadcasting — operations between arrays of different shapes
- Masking & Boolean indexing
- Structured arrays & record arrays
- Vectorization — replace loops with array operations
- Integration with Pandas, SciPy, Matplotlib
arr = np.array([1, 2, 3, 4])
mask = arr > 2
print(arr[mask])
a = np.array([[1, 2], [3, 4]])
b = np.array([10, 20])
print(a + b)  # broadcasting
Broadcasting and vectorization are the key to writing fast, clean NumPy code.
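Structured arrays, listed above, give each element named fields with their own dtypes. A minimal sketch, with hypothetical field names and made-up values:

```python
import numpy as np

# Hypothetical record layout: a name, an age, and a weight per element.
people_dtype = np.dtype([("name", "U10"), ("age", np.int32), ("weight", np.float64)])
people = np.array(
    [("Ana", 31, 58.5), ("Bo", 24, 72.0), ("Cy", 45, 80.25)],
    dtype=people_dtype,
)

print(people["age"])                       # [31 24 45]
print(people[people["age"] > 30]["name"])  # ['Ana' 'Cy']
print(people["weight"].mean())             # 70.25
```

Field access returns a regular array, so masking and the statistical functions work on it unchanged. For larger tabular workloads, Pandas is usually the more convenient tool.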
NumPy Mastery Checklist
- Array creation functions
- Indexing & slicing
- Broadcasting
- ufuncs & vectorization
- Statistical methods
- Linear algebra basics
- Boolean masking
Review these topics regularly and practice with real datasets to build intuition.