DEV Community

Biraj


Part 1: Introduction to NumPy

NumPy is a Python library used mainly for working with arrays. An array is a collection of items stored next to each other in memory. For now, you can think of an array as being similar to a Python list.

NumPy is written in Python and C. The calculations in NumPy are done by the parts that are written in C, which makes them extremely fast compared to normal Python code.

Installation

Make sure Python and pip are installed on your computer. Then open a command prompt or terminal and run:

pip install numpy

Creating Arrays

We can create a NumPy array by using the numpy module's array() function.

import numpy as np

arr = np.array([3, 5, 7, 9])
print(type(arr))
Output:
<class 'numpy.ndarray'>

We just created a NumPy array from a list. The type of our arr variable is numpy.ndarray. Here ndarray stands for N-dimensional array.
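
As a small sketch of the same idea, array() also accepts other Python sequences such as tuples, and an array can be converted back to a list with tolist(). The variable names from_list and from_tuple are just for illustration.

```python
import numpy as np

# Build the same array from a list and from a tuple.
from_list = np.array([3, 5, 7, 9])
from_tuple = np.array((3, 5, 7, 9))

print(type(from_list) is np.ndarray)          # True
print(np.array_equal(from_list, from_tuple))  # True

# Convert the array back into a plain Python list.
print(from_list.tolist())                     # [3, 5, 7, 9]
```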

Dimensions or Axes

In NumPy, dimensions are called axes (the plural of axis). I like to think of an axis as a line along which items can be stored. A simple list or a 1-dimensional array can be visualized as:

[Figure: axis for a 1D array]

We will now look at the following:

  1. Scalars (0D Arrays)
  2. Vectors (1D Arrays)
  3. Matrices (2D Arrays)
  4. 3D Arrays
  5. 4D Arrays

1) Scalars (0D Arrays)

A scalar is just a single value.

import numpy as np

s = np.array(21)
print("Number of axes:", s.ndim)
print("Shape:", s.shape)
Output:
Number of axes: 0
Shape: ()

Here we have used two attributes of a NumPy array:

  • ndim: It returns the number of dimensions (or axes) in an array. It returns 0 here because a value in itself does not have any dimensions.
  • shape: It returns a tuple that contains the number of values along each axis of an array. Since a scalar has 0 axes, it returns an empty tuple.
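
To make the relationship between the two concrete, here is a minimal sketch that checks that ndim always equals the length of the shape tuple. The arrays scalar, vector and matrix are made-up examples.

```python
import numpy as np

scalar = np.array(21)
vector = np.array([1, 2, 3])
matrix = np.array([[1, 2, 3], [4, 5, 6]])

for name, a in [("scalar", scalar), ("vector", vector), ("matrix", matrix)]:
    # The number of axes is the number of entries in the shape tuple.
    print(name, a.ndim, a.shape, a.ndim == len(a.shape))
```

Each line should end in True, whatever the number of axes.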

2) Vectors (1D Arrays)

A vector is a collection of values.

import numpy as np

vec = np.array([-1, 2, 7, 9, 2])
print("Number of axes:", vec.ndim)
print("Shape:", vec.shape)
Output:
Number of axes: 1
Shape: (5,)

vec.shape[0] gives us the number of values in our vector, which is 5 here.
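
The trailing comma in (5,) just means the shape is a tuple with one entry. As a quick sketch, for a 1D array that entry matches both len() and the size attribute:

```python
import numpy as np

vec = np.array([-1, 2, 7, 9, 2])

print(vec.shape == (5,))  # True: a one-element tuple
print(vec.shape[0])       # 5
print(len(vec))           # 5
print(vec.size)           # 5: total number of values in the array
```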

3) Matrices (2D Arrays)

A matrix is a collection of vectors.

import numpy as np

mat = np.array([
    [1, 2, 3],
    [5, 6, 7]
])

print("Number of axes:", mat.ndim)
print("Shape:", mat.shape)
Output:
Number of axes: 2
Shape: (2, 3)

Here we created a 2x3 matrix (2D array) from a list of lists. Since a matrix has 2 axes, the mat.shape tuple contains two values: the first is the number of rows and the second is the number of columns.

[Figure: matrix]

Each item (row) in a 2D array is a vector (1D array).
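
A short sketch to check this: looping over the matrix yields one row at a time, and each row reports a single axis.

```python
import numpy as np

mat = np.array([
    [1, 2, 3],
    [5, 6, 7]
])

# Iterating over a 2D array walks along its first axis (the rows).
for row in mat:
    print(row.tolist(), "axes:", row.ndim, "shape:", row.shape)
```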

4) 3D Arrays

A 3D array is a collection of matrices.

import numpy as np

t = np.array([
    [[1, 3, 9],
     [7, -6, 2]],

    [[2, 3, 5],
     [0, -2, -2]],

    [[9, 6, 2],
     [-7, -3, -12]],

    [[2, 4, 5],
     [-1, 9, 8]]
])

print("Number of axes:", t.ndim)
print("Shape:", t.shape)
Output:
Number of axes: 3
Shape: (4, 2, 3)

Here we created a 3D array from a list of 4 lists, each of which contains 2 lists of 3 values.

[Figure: 3D array]

Each item in a 3D array is a matrix (2D array). Note that the last matrix in the array is the front-most in the image.
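
As a quick check, indexing the 3D array once along its first axis gives back a 2D array. This sketch reuses the array t from the example above:

```python
import numpy as np

t = np.array([
    [[1, 3, 9], [7, -6, 2]],
    [[2, 3, 5], [0, -2, -2]],
    [[9, 6, 2], [-7, -3, -12]],
    [[2, 4, 5], [-1, 9, 8]]
])

first = t[0]   # first matrix
last = t[-1]   # last matrix

print(first.ndim, first.shape)  # 2 (2, 3)
print(last.tolist())            # [[2, 4, 5], [-1, 9, 8]]
```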

5) 4D Arrays

[Figure: 4D array]

After looking at the above examples, we can see a pattern. An n-dimensional array is a collection of (n-1)-dimensional arrays, for n > 0.
I hope that you now have a better idea of how to visualize multidimensional arrays.
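
Here is a minimal sketch of that pattern with a 4D array. The array arr4 and its values are made up for illustration: it holds 2 3D arrays, each containing 2 matrices of shape (2, 3).

```python
import numpy as np

# Hypothetical 4D array built from nested lists.
arr4 = np.array([
    [[[1, 2, 3], [4, 5, 6]],
     [[7, 8, 9], [10, 11, 12]]],

    [[[13, 14, 15], [16, 17, 18]],
     [[19, 20, 21], [22, 23, 24]]]
])

print("Number of axes:", arr4.ndim)  # Number of axes: 4
print("Shape:", arr4.shape)          # Shape: (2, 2, 2, 3)

# Each indexing step removes one axis: 4D -> 3D -> 2D -> 1D -> 0D.
item = arr4
while item.ndim > 0:
    item = item[0]
    print(item.ndim, item.shape)
```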


Accessing Array Elements

Just like Python lists, the indexes in NumPy arrays start at 0.

import numpy as np

vec = np.array([-3, 4, 6, 9, 8, 3])
print("vec - 4th value:", vec[3])

vec[3] = 19
print("vec - 4th value (changed):", vec[3])

mat = np.array([
    [2, 4, 6, 8],
    [10, 12, 14, 16]
])
print("mat - 1st row:", mat[0])
print("mat - 2nd row's 1st value:", mat[1, 0])
print("mat - last row's last value:", mat[-1, -1])
Output:
vec - 4th value: 9
vec - 4th value (changed): 19
mat - 1st row: [2 4 6 8]
mat - 2nd row's 1st value: 10
mat - last row's last value: 16

NumPy arrays also support slicing.

# continuing the above code

print("vec - 2nd to 4th:", vec[1:4])
print("mat - 1st row's 1st to 3rd values:", mat[0, 0:3])
print("mat - 2nd column:", mat[:, 1])
Output:
vec - 2nd to 4th: [4 6 9]
mat - 1st row's 1st to 3rd values: [2 4 6]
mat - 2nd column: [ 4 12]

In the last example, [:, 1] means "get the 2nd value from every row". Hence, we get the 2nd column of the matrix as the output.
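
A few more slices on the same matrix, as a sketch. Note that a single index (like 1) drops that axis, while a range (like 1:3) keeps it:

```python
import numpy as np

mat = np.array([
    [2, 4, 6, 8],
    [10, 12, 14, 16]
])

col = mat[:, 1]      # 2nd value from every row
row = mat[1, :]      # every value from the 2nd row
block = mat[:, 1:3]  # 2nd and 3rd values from every row

print(col.tolist(), col.shape)      # [4, 12] (2,)
print(row.tolist(), row.shape)      # [10, 12, 14, 16] (4,)
print(block.tolist(), block.shape)  # [[4, 6], [12, 14]] (2, 2)
```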

Example: Indexing in a 4D Array

[Figure: indexing in a 4D array]

Let's say we want to access the circled value. It is located in the 2nd 3D array (index 1), in that array's last matrix (index -1), in the 2nd row (index 1) and the 2nd column (index 1). It's a lot, so take your time. Here's how to access it:

arr[1, -1, 1, 1]
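
To experiment with this kind of indexing yourself, here is a small sketch. The array below is a hypothetical stand-in, not the one from the picture: it is built with np.arange() and reshape(), which this post has not covered, so that every value is unique and easy to locate. It shows that comma-separated indexes pick the same element as indexing one axis at a time.

```python
import numpy as np

# Hypothetical array: 3 3D arrays, each with 2 matrices of 2 rows x 3 columns.
# The values run from 0 to 35 in order.
arr = np.arange(36).reshape(3, 2, 2, 3)

# 1st 3D array -> 2nd matrix -> 1st row -> 3rd column
a = arr[0, 1, 0, 2]
b = arr[0][1][0][2]
print(a, b)  # 8 8

# Negative indexes count from the end along each axis.
print(arr[-1, -1, -1, -1])  # 35
```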

Python VS NumPy

At the beginning of the post, I said that calculations in NumPy are extremely fast compared to normal Python code. Let's see the difference. We will create two lists with 10 million numbers from 0 to 9,999,999, add them element-wise and measure the time it takes. Then we will convert both lists to NumPy arrays and do the same.

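import time

import numpy as np

l1 = list(range(10000000))
l2 = list(range(10000000))
result = []  # not named "sum", which would shadow Python's built-in sum()

# Time element-wise addition with a plain Python loop
then = time.perf_counter()
for i in range(len(l1)):
    result.append(l1[i] + l2[i])

print(f"With just Python: {time.perf_counter() - then: .2f}s")

arr1 = np.array(l1)
arr2 = np.array(l2)

# Time the same addition with NumPy (the list-to-array conversion is not timed)
then = time.perf_counter()
result = arr1 + arr2
print(f"With NumPy: {time.perf_counter() - then: .2f}s")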
Output:
With just Python:  2.30s
With NumPy:  0.14s

In this run, NumPy was about 16x faster than the plain Python loop. The exact numbers will vary from machine to machine.
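
If you want to repeat the comparison, one possible variation is to use Python's timeit module, which runs each version several times so that a single slow run matters less. This is only a sketch: the size n, the helper functions add_lists and add_arrays, and the use of a list comprehension instead of the loop above are my own choices for illustration.

```python
import timeit

import numpy as np

n = 1_000_000  # smaller than 10 million so the script finishes quickly
l1 = list(range(n))
l2 = list(range(n))
arr1 = np.array(l1)
arr2 = np.array(l2)

def add_lists():
    # Element-wise addition in plain Python
    return [a + b for a, b in zip(l1, l2)]

def add_arrays():
    # Element-wise addition done by NumPy
    return arr1 + arr2

# Both versions must produce the same values.
assert add_lists() == add_arrays().tolist()

# Run each version 3 times and keep the fastest time.
py_time = min(timeit.repeat(add_lists, number=1, repeat=3))
np_time = min(timeit.repeat(add_arrays, number=1, repeat=3))

print(f"With just Python: {py_time:.4f}s")
print(f"With NumPy: {np_time:.4f}s")
```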
