NumPy

While Python’s built-in data structures, such as list, are powerful and flexible, sometimes you need more, especially when working with large datasets or doing scientific and numerical computing.

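Pure Python is comparatively slow for element-by-element numerical work, because every operation goes through the interpreter.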

NumPy is a popular external library that introduces the array data structure, which looks and feels a lot like a Python list but offers better performance for numerical work: its operations run in optimized, compiled code over elements of a single data type.
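As a small illustration of how an array differs from a list in practice, the sketch below doubles every value twice: once with a list comprehension and once with a single array expression. The variable names are illustrative only.

```python
import numpy as np

values = [1.0, 2.0, 3.0, 4.0]

# Plain Python list: arithmetic needs an explicit loop or comprehension
doubled_list = [v * 2 for v in values]

# NumPy array: the multiplication is applied to every element at once
arr = np.array(values)
doubled_arr = arr * 2

print(doubled_list)  # [2.0, 4.0, 6.0, 8.0]
print(doubled_arr)   # [2. 4. 6. 8.]
```

On four values the speed difference is negligible; it tends to matter once arrays hold many thousands of elements.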

Installation

pip ships with Python by default as the package manager for installing and managing packages from PyPI. To install NumPy, run:

pip install numpy
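One quick way to check that the installation worked is to import the package and print its version. The exact version string depends on your environment.

```python
import numpy as np

# If this import succeeds, NumPy is installed in the active environment
print(np.__version__)
```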

Example

Let’s say you have a grayscale image stored as a 2D NumPy array, with pixel values between 0 and 1. Even within that range, the average brightness can vary from image to image: one image might be mostly dark, another very bright. This variation can affect how well a model learns, since it might pick up on brightness differences instead of actual shapes or patterns.

By applying Z-score normalization, we shift the data so its mean becomes 0 and rescale it so its standard deviation becomes 1. This removes brightness bias and makes the data more stable and reliable for learning.

\[\text{normalized} = \frac{x - \text{mean}}{\text{std}}\]
import numpy as np
from numpy import typing as npt


# Simulate an image
image: npt.NDArray[np.float64] = np.random.rand(1000, 1000)
print("Raw:\n", image)


# Z-score normalization
mean: float = np.mean(image)
std: float = np.std(image)
normalized_image: npt.NDArray[np.float64] = (image - mean) / std
print("Normalized:\n", normalized_image)


print("Mean after normalization:", np.mean(normalized_image))
# Mean after normalization: -2.503490748040349e-16 ~0
print("Std after normalization:", np.std(normalized_image))
# Std after normalization: 0.9999999999999998 ~1
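If the same normalization is needed for many images, it could be wrapped in a small helper. The sketch below is one possible way to do that, not part of NumPy itself: the function name `zscore_normalize` and the `eps` parameter are hypothetical. The `eps` term is an assumed safeguard for a constant image, whose standard deviation is 0 and would otherwise cause a division by zero.

```python
import numpy as np
from numpy import typing as npt


def zscore_normalize(
    image: npt.NDArray[np.float64], eps: float = 1e-8
) -> npt.NDArray[np.float64]:
    """Return a new array shifted to mean 0 and scaled to roughly unit std."""
    mean = image.mean()
    std = image.std()
    # eps is an assumed small constant that keeps the division defined
    # when every pixel has the same value (std == 0)
    return (image - mean) / (std + eps)


# A tiny 2x2 "image" that is easy to check by hand (its mean is 0.5)
tiny = np.array([[0.0, 0.5], [0.5, 1.0]])
normalized = zscore_normalize(tiny)
print(normalized)

# A constant image normalizes to all zeros instead of raising a warning
flat = np.full((2, 2), 0.5)
print(zscore_normalize(flat))
```

Because of `eps`, the standard deviation of the result is very slightly below 1 rather than exactly 1. For typical image data this difference is negligible.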