keshavs759

Posted on • Originally published at vidyasheela.com

NumPy Tutorials [Beginners to Intermediate]

NumPy (short for Numerical Python) is an open-source library for scientific computing with Python, especially for data analysis. It is used for working with arrays in Python.

Installation of Numpy

NumPy is usually included as a basic package in most scientific Python distributions (such as Anaconda). If it is not present, it can be installed later.

With Anaconda (on Windows or any other platform), use:

conda install numpy

On Linux (Ubuntu and Debian), use:

sudo apt-get install python3-numpy

If you are using pip, use:

pip install numpy
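To confirm that the installation worked, a quick check is to import NumPy and print its version string. This is a minimal sketch; the version you see depends on what was installed.

```python
import numpy as np

# __version__ holds the installed NumPy version as a string, e.g. "1.26.4"
print(np.__version__)
```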

Ndarray

The array object in NumPy is called ndarray (N-dimensional array). It is a multidimensional array whose items are all of the same type (homogeneous) and whose number of items is fixed in advance.

NumPy arrays have a fixed size: it is defined at the time of creation and remains unchanged.
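Because the size is fixed, functions that appear to grow an array actually build a new one. A small sketch using np.append(), which returns a new array and leaves the original untouched (the variable names are illustrative):

```python
import numpy as np

original = np.array([1, 2, 3])

# np.append() does not resize `original`; it allocates and returns a new array
extended = np.append(original, 4)

print(original.size)  # 3: the original array is unchanged
print(extended.size)  # 4: a brand-new array with one extra element
```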

Let's look at some of the basic attributes of a NumPy array:

dtype - specifies the data type of array elements

shape - returns a tuple with the size of each dimension (rows, columns for a 2-D array)

ndim - returns the number of dimensions (axes) of the array, not the number of rows

size - returns the total number of elements contained in the array

A NumPy array can be created simply by passing a Python list to the function array(), e.g. myArray = np.array([1, 2, 3])

In :

#import numpy
import numpy as np

#creating a numpy array a
a = np.array([[1,2,3,4,5],[2,3,4,5,6]])
print(a)

Out:

[[1 2 3 4 5]
 [2 3 4 5 6]]

To check whether the created object a is a NumPy array, you can use the built-in function type():

In :

type(a)

Out:

numpy.ndarray

In :

a.dtype

Out:

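dtype('int32')

The default integer type is platform-dependent: you will typically see dtype('int64') on Linux and macOS, and dtype('int32') on Windows with NumPy versions before 2.0.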

In :

a.size

Out:

10

In :

a.ndim

Out:

2

In :

a.shape

Out:

(2, 5)

Data types supported by Numpy

Data Type - Description

bool_ - Boolean (True or False) stored as a byte
int_ - Default integer type (same as C long; normally either int64 or int32)
intc - Identical to C int (normally int32 or int64)
intp - Integer used for indexing (same as C size_t; normally either int32 or int64)
int8 - Byte (-128 to 127)
int16 - Integer (-32768 to 32767)
int32 - Integer (-2147483648 to 2147483647)
int64 - Integer (-9223372036854775808 to 9223372036854775807)
uint8 - Unsigned integer (0 to 255)
uint16 - Unsigned integer (0 to 65535)
uint32 - Unsigned integer (0 to 4294967295)
uint64 - Unsigned integer (0 to 18446744073709551615)
float_ - Shorthand for float64 (removed in NumPy 2.0; use float64)
float16 - Half precision float: sign bit, 5-bit exponent, 10-bit mantissa
float32 - Single precision float: sign bit, 8-bit exponent, 23-bit mantissa
float64 - Double precision float: sign bit, 11-bit exponent, 52-bit mantissa
complex_ - Shorthand for complex128 (removed in NumPy 2.0; use complex128)
complex64 - Complex number, represented by two 32-bit floats (real and imaginary components)
complex128 - Complex number, represented by two 64-bit floats (real and imaginary components)
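The ranges above need not be memorized: NumPy can report them at runtime. A short sketch using np.iinfo() for integer types and np.finfo() for floating-point types:

```python
import numpy as np

# Integer limits come from np.iinfo
print(np.iinfo(np.int8).min, np.iinfo(np.int8).max)  # -128 127
print(np.iinfo(np.uint16).max)                       # 65535

# Floating-point properties come from np.finfo
print(np.finfo(np.float32).bits)                     # 32
print(np.finfo(np.float64).eps)                      # machine epsilon for float64
```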

In :

list1 = [[1+1j,2+2j,3+2j,4+8j,5+6j],[1+1j,3,4,5,2]]
complex_array = np.array(list1)
complex_array.dtype

Out:

dtype('complex128')

In :

list1 = [[1,3,5,6],[1,2,4,5]]
cmp = np.array(list1,dtype = float)
print(cmp)
print(cmp.dtype)

Out:

[[1. 3. 5. 6.]
[1. 2. 4. 5.]]

float64

Numpy array generation

Most of these functions, such as zeros(), ones() and arange(), also accept a dtype argument that defines the data type of the array elements.

The NumPy library provides a set of functions that generate ndarrays with initial content, created with different values depending on the function.

zeros()

The zeros() function creates an array filled with zeros, with dimensions defined by the shape argument. By default, the array is created with the float64 data type. For example, to create a 2x2 two-dimensional array, you can use:

In :

np.zeros((2,2))

Out:

array([[0., 0.],
[0., 0.]])

ones()

The ones() function creates an array filled with ones, with dimensions defined by the shape argument. By default, the array is created with the float64 data type. For example, to create a 2x3 two-dimensional array, you can use:

In :

np.ones((2, 3))

Out:

array([[1., 1., 1.],
[1., 1., 1.]])
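Both functions accept a dtype argument when float64 is not what you want. A brief sketch (variable names are illustrative):

```python
import numpy as np

# Override the default float64 element type with dtype
int_zeros = np.zeros((2, 2), dtype=int)
bool_ones = np.ones((2, 3), dtype=bool)

print(int_zeros.dtype)  # the platform's default integer, e.g. int64
print(bool_ones)        # every element is True
```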

Diagonal matrix [diag()]

The function diag() generates a diagonal matrix whose diagonal elements are taken from the Python list passed as its argument:

In :

np.diag([5,6,5,3,4])

Out:

array([[5, 0, 0, 0, 0],
[0, 6, 0, 0, 0],
[0, 0, 5, 0, 0],
[0, 0, 0, 3, 0],
[0, 0, 0, 0, 4]])
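diag() also works in the other direction: given a 2-D array, it returns the elements on the main diagonal as a 1-D array. A small sketch:

```python
import numpy as np

M = np.arange(1, 10).reshape(3, 3)   # rows: [1 2 3], [4 5 6], [7 8 9]
print(np.diag(M))                    # [1 5 9]

# Passing the extracted diagonal back to diag() rebuilds a diagonal matrix
print(np.diag(np.diag(M)))
```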

arange()

The function arange() generates an array of evenly spaced values; its arguments define the start, the stop (which is not included) and the step.

For example, you can generate the values from 1 up to, but not including, 50 as follows:

In :

np.arange(1, 50)

Out:

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49])

By default, the step is 1, but you can change it by passing a third argument:

In :

np.arange(1, 50,5)

Out:

array([ 1,  6, 11, 16, 21, 26, 31, 36, 41, 46])

This generates numbers starting at 1 in steps of 5, stopping before 50. You can also use a float step, e.g. 2.5:

In :

np.arange(1, 50,2.5)

Out:

array([ 1. ,  3.5,  6. ,  8.5, 11. , 13.5, 16. , 18.5, 21. , 23.5, 26. ,
28.5, 31. , 33.5, 36. , 38.5, 41. , 43.5, 46. , 48.5])
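With a non-integer step, floating-point round-off can make the number of elements returned by arange() hard to predict, and NumPy's documentation suggests linspace() for such cases. linspace() takes the number of points instead of the step and includes the stop value by default. A minimal sketch:

```python
import numpy as np

# 5 evenly spaced points from 0 to 1, both endpoints included
points = np.linspace(0, 1, 5)
print(points)  # [0.   0.25 0.5  0.75 1.  ]
```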

reshape()

reshape() returns an array containing the same data arranged in a new shape, given by the arguments passed to it. You can use it as follows:

In :

print("Before Applying reshape function")
beforeArray = np.arange(1, 50,2.5)
print(beforeArray.shape)

print("After Applying reshape function")
afterArray = beforeArray.reshape(4,5)
print(afterArray.shape)

Out:

Before Applying reshape function
(20,)
After Applying reshape function
(4, 5)
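reshape() requires the new shape to hold exactly the same number of elements. You can pass -1 for one dimension to let NumPy infer it, and an incompatible shape raises a ValueError. A small sketch:

```python
import numpy as np

values = np.arange(1, 50, 2.5)      # 20 elements, as above

# -1 asks NumPy to compute this dimension: 20 / 4 = 5
print(values.reshape(4, -1).shape)  # (4, 5)

# 20 elements cannot fill a 3x7 array (21 slots), so this fails
try:
    values.reshape(3, 7)
except ValueError as err:
    print("reshape failed:", err)
```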

Generate random array

You can generate an array of random numbers in the interval [0, 1) using the np.random.random() function. The shape of the array to be formed is passed to the function as a tuple.

To get the same random numbers every time your program runs, use the np.random.seed() function. It takes a seed value, which can be any integer you choose. Each time you seed the generator with the same value, it produces the same sequence of numbers. Let's see this in the example below.

In :

np.random.seed(5)
firstRandomArray = np.random.random((3,3))
print(firstRandomArray)

print("\nAgain let's use the same seed value 5\n")

np.random.seed(5)
secondRandomArray = np.random.random((3,3))
print(secondRandomArray)

print("\nNow let's use a different seed value, say 10\n")

np.random.seed(10)
thirdRandomArray = np.random.random((3,3))
print(thirdRandomArray)

Out:

[[0.22199317 0.87073231 0.20671916]
 [0.91861091 0.48841119 0.61174386]
 [0.76590786 0.51841799 0.2968005 ]]

Again let's use the same seed value 5

[[0.22199317 0.87073231 0.20671916]
 [0.91861091 0.48841119 0.61174386]
 [0.76590786 0.51841799 0.2968005 ]]

Now let's use a different seed value, say 10

[[0.77132064 0.02075195 0.63364823]
 [0.74880388 0.49850701 0.22479665]
 [0.19806286 0.76053071 0.16911084]]
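np.random.seed() sets a global state shared by the whole program. For new code, NumPy's documentation recommends creating an independent Generator with np.random.default_rng(seed); seeding it with the same value also reproduces the same numbers. The Generator uses a different algorithm, so its numbers will not match the seed-5 output of the legacy functions. A sketch (rng_a and rng_b are illustrative names):

```python
import numpy as np

# Two generators created with the same seed produce identical streams
rng_a = np.random.default_rng(5)
rng_b = np.random.default_rng(5)

first = rng_a.random((3, 3))
second = rng_b.random((3, 3))

print(np.array_equal(first, second))  # True
```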

Mathematical Operations

Now let's look at some of the important mathematical operations that can be performed on NumPy arrays.

Arithmetic Operations

In :

a = np.arange(5)
print(a)
b = np.arange(5,10)
print(b)

Out:

[0 1 2 3 4]
[5 6 7 8 9]

Addition

You can add a scalar to the array.

To perform addition between two arrays, make sure that both have the same shape.

In :

print("adding any scalar to the array elements")
add = a+15
print(add)

print("addition between any two arrays")
add = a+b
print(add)

Out:

adding any scalar to the array elements
[15 16 17 18 19]

addition between any two arrays
[ 5  7  9 11 13]

Subtraction

You can subtract a scalar from the array, or the array from a scalar.

To perform subtraction between two arrays, make sure that both have the same shape.

In :

print("subtracting the array elements from a scalar")
diff = 15-a # a-15 is also valid
print(diff)

print("subtraction between any two arrays")
diff = a-b
print(diff)

Out:

subtracting the array elements from a scalar
[15 14 13 12 11]

subtraction between any two arrays
[-5 -5 -5 -5 -5]

Multiplication

You can multiply the array by any scalar.

To perform multiplication between two arrays, make sure that both have the same shape.

Multiplication between two arrays using the star operator ('*') is always element-wise multiplication.

In :

print("multiplying the array elements by a scalar")
mul = a*15
print(mul)

print("multiplication between any two arrays")
mul = a*b
print(mul)

Out:

multiplying the array elements by a scalar
[ 0 15 30 45 60]

multiplication between any two arrays
[ 0  6 14 24 36]
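When the shapes of two arrays do not match, element-wise operations generally fail with a ValueError. The exception is broadcasting: NumPy stretches a scalar (or, under its broadcasting rules, a compatible smaller array) to fit, which is why multiplying by 15 works. A short sketch of both cases (the array c is illustrative):

```python
import numpy as np

a = np.arange(5)        # shape (5,)
c = np.arange(3)        # shape (3,)

# A scalar is broadcast across every element
print(a * 15)           # [ 0 15 30 45 60]

# Shapes (5,) and (3,) are incompatible, so NumPy raises ValueError
try:
    a + c
except ValueError as err:
    print("cannot add:", err)
```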

Matrix Operations

Matrix Multiplication

Matrix multiplication is one of the most common operations in data science tasks. As seen in the example above, the * operator performs only element-wise multiplication, not matrix multiplication. To perform matrix multiplication, NumPy provides the dot() function. Let's see how to use it:

In :

A = np.ones((3, 3))
B = np.arange(0,9).reshape(3,3)

print("A is")
print(A)

print("B is")
print(B)

Out:

A is
[[1. 1. 1.]
[1. 1. 1.]
[1. 1. 1.]]

B is
[[0 1 2]
[3 4 5]
[6 7 8]]

If we use the * operator:

In :

A * B

Out:

array([[0., 1., 2.],
[3., 4., 5.],
[6., 7., 8.]])

Using the dot() function:

In :

product_AxB = np.dot(A,B) # A.dot(B) is also the same
product_BxA = np.dot(B,A) # B.dot(A) is also the same

print(" The matrix Mult AxB is")
print(product_AxB)

print(" The matrix Mult BxA is")
print(product_BxA)

Out:

The matrix Mult AxB is
[[ 9. 12. 15.]
[ 9. 12. 15.]
[ 9. 12. 15.]]

The matrix Mult BxA is
[[ 3.  3.  3.]
[12. 12. 12.]
[21. 21. 21.]]
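Since Python 3.5, the @ operator also performs matrix multiplication on NumPy arrays (it calls np.matmul). For 2-D arrays like A and B above, it gives the same result as dot(). A quick sketch:

```python
import numpy as np

A = np.ones((3, 3))
B = np.arange(0, 9).reshape(3, 3)

# For 2-D arrays, A @ B, np.matmul(A, B) and np.dot(A, B) agree
print(np.array_equal(A @ B, np.dot(A, B)))  # True
print(B @ A)
```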

Transpose of Matrix

You can obtain the transpose of a matrix using the syntax matrix_name.T as follows:

In :

A = np.arange(0,9).reshape(3,3)
print("A is")
print(A)

# transpose calculation
print()

A_transpose = A.T
print(" The transpose of A is")
print(A_transpose)

Out:

A is
[[0 1 2]
[3 4 5]
[6 7 8]]

The transpose of A is
[[0 3 6]
[1 4 7]
[2 5 8]]

Determinant Calculation

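You can calculate the determinant of a square matrix A using np.linalg.det(A). The matrix in the example below is singular (its rows are linearly dependent), so its exact determinant is 0; the tiny nonzero value in the output comes from floating-point round-off.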

In :

A = np.arange(1,10).reshape(3,3)
determinant = np.linalg.det(A)
print("Determinant of A is")
print(determinant)

Out:

Determinant of A is
-9.51619735392994e-16
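Because of floating-point round-off, a singular matrix rarely yields an exact 0.0, so it is safer to compare the determinant against zero with a tolerance. A sketch using np.isclose(); its default tolerance is an assumption that may need tuning for your data:

```python
import numpy as np

A = np.arange(1, 10).reshape(3, 3)
determinant = np.linalg.det(A)

# isclose() treats tiny round-off values such as 1e-16 as zero
print(np.isclose(determinant, 0.0))  # True: A is (numerically) singular
```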

Inverse Calculation

You can calculate the inverse of a non-singular matrix A using np.linalg.inv(A). Since the matrix used in the determinant example is singular, the example below uses a small non-singular matrix instead.

In :

A = np.array([[1, 2], [3, 4]])
print("A is")
print(A)

# Inverse calculation
print()

A_inv = np.linalg.inv(A)
print(" The Inverse of A is")
print(A_inv)

Out:

A is
[[1 2]
 [3 4]]

The Inverse of A is
[[-2.   1. ]
 [ 1.5 -0.5]]
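A quick way to check an inverse is to multiply it back with the original matrix: the product should be the identity matrix, up to floating-point round-off. A sketch using np.allclose() and np.eye():

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
A_inv = np.linalg.inv(A)

# A @ A_inv should be (approximately) the 2x2 identity matrix
print(np.allclose(A @ A_inv, np.eye(2)))  # True
```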

Pseudo-Inverse Calculation

The pseudo-inverse or Moore-Penrose pseudo inverse is a generalization of the matrix inverse when the matrix may not be invertible.

You can calculate the Pseudo-Inverse of a matrix A using np.linalg.pinv(A)

In :

A = np.arange(0,9).reshape(3,3)
print("A is")
print(A)

# Here A is a singular matrix: if you try to find its inverse
# with np.linalg.inv(A), as in the example above,
# NumPy raises a LinAlgError

# Pseudo-Inverse calculation
print()

A_pinv = np.linalg.pinv(A)
print(" The pseudo-Inverse of A is")
print(A_pinv)

Out:

A is
[[0 1 2]
[3 4 5]
[6 7 8]]

The pseudo-Inverse of A is
[[-5.55555556e-01 -1.66666667e-01 2.22222222e-01]
[-5.55555556e-02 1.83880688e-16 5.55555556e-02]
[ 4.44444444e-01 1.66666667e-01 -1.11111111e-01]]
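The Moore-Penrose pseudo-inverse P of a matrix A satisfies conditions such as A P A = A, even when A is singular. A small sketch that checks this property numerically with np.allclose():

```python
import numpy as np

A = np.arange(0, 9).reshape(3, 3)   # the singular matrix from the example above
A_pinv = np.linalg.pinv(A)

# One of the defining Moore-Penrose conditions: A @ A_pinv @ A == A
print(np.allclose(A @ A_pinv @ A, A))  # True
```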

Aggregate Functions

An aggregate function or aggregation function is a function that performs an operation on a set of values, for example, an array, and produces a single summary value. Common aggregate functions include:

sum() - calculate the sum of all elements in the array

min() - returns the element with minimum numeric value

max() - returns the element with maximum numeric value

mean() - returns the average of the array elements

std() - returns the standard deviation

In :

import numpy as np

A = np.arange(1,6,0.6)

print("the array is")
print(A)

print("the sum is")
print(A.sum())

print("the min is")
print(A.min())

print("the max is")
print(A.max())

print("the mean is")
print(A.mean())

print("the std is")
print(A.std())

Out:

the array is
[1.  1.6 2.2 2.8 3.4 4.  4.6 5.2 5.8]

the sum is
30.600000000000005

the min is
1.0

the max is
5.800000000000001

the mean is
3.4000000000000004

the std is
1.549193338482967
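On a 2-D array, these functions also accept an axis argument: axis=0 aggregates down each column and axis=1 along each row. A short sketch:

```python
import numpy as np

M = np.arange(1, 7).reshape(2, 3)   # rows: [1 2 3], [4 5 6]

print(M.sum())          # 21: all elements
print(M.sum(axis=0))    # [5 7 9]: one sum per column
print(M.max(axis=1))    # [3 6]: one maximum per row
```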

Indexing, Slicing, and Iterating

Indexing

Array indexing uses square brackets ([ ]) to refer to individual elements of the array, for uses such as extracting a value, selecting items, or assigning a new value.

In Python, indexing starts from 0 and increases by one for each subsequent element.

For example, if A = [1,2,3] is an array, the indexes of its elements are 0, 1 and 2 respectively.

To access a single element of an array, refer to its index:

In :

array = np.array([23,4,23,11,2])

print("element with Index 0 =>",array[0])
print("element with Index 1 =>",array[1])
print("element with Index 2 =>",array[2])
print("element with Index 3 =>",array[3])
print("element with Index 4 =>",array[4])

Out:

element with Index 0 => 23
element with Index 1 => 4
element with Index 2 => 23
element with Index 3 => 11
element with Index 4 => 2

NumPy also accepts negative indexes, which run from -1 to -(size of the array).

The index -1 refers to the last element, while -(size of the array) refers to the first element of the array.

Let's visualize it with an example

In :

array = np.array([23,4,23,11,2])

#size of array is 5

print("element with Index 0 or Index -5 =>",array[-5])
print("element with Index 1 or Index -4 =>",array[-4])
print("element with Index 2 or Index -3 =>",array[-3])
print("element with Index 3 or Index -2 =>",array[-2])
print("element with Index 4 or Index -1 =>",array[-1])

Out:

element with Index 0 or Index -5 => 23
element with Index 1 or Index -4 => 4
element with Index 2 or Index -3 => 23
element with Index 3 or Index -2 => 11
element with Index 4 or Index -1 => 2

In a multi-dimensional array, say a 2x2 array (i.e. a matrix), you can access a value using its row index and column index, i.e. array[row index, col index]:

In :

A = np.arange(1,5).reshape(2,2)
print(A)

print(A[0,0])
print(A[0,1])
print(A[1,0])
print(A[1,1])

Out:

[[1 2]
[3 4]]
1
2
3
4
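Indexing also works on the left-hand side of an assignment, which is how you give an element a new value. Negative indexes work per axis too. A sketch:

```python
import numpy as np

A = np.arange(1, 5).reshape(2, 2)   # rows: [1 2], [3 4]

A[0, 1] = 20            # overwrite the element at row 0, column 1
print(A)                # rows: [ 1 20] and [ 3  4]

print(A[-1, -1])        # 4: last row, last column
```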

Slicing

Slicing allows you to extract a portion of an array. We use a colon (:) within square brackets to slice an array. Let A be an array with 5 elements. If you want to slice it from index 2 to index 4 (index 4 itself is not included), use A[2:4]. You can also use a third number that defines the step (gap) in the sequence. For example, A[0:4:2] slices the array from index 0 to 4 with a step of 2, i.e. indexes 0 and 2.

Let's understand slicing better with some examples.

In :

A = np.arange(1, 10)
print(A)

print()

print("Slice from index 5 up to 8")
print(A[5:8])

print()

print("Slice from index 2 up to 8 with a gap of 3")
print(A[2:8:3])

Out:

[1 2 3 4 5 6 7 8 9]

Slice from index 5 up to 8
[6 7 8]

Slice from index 2 up to 8 with a gap of 3
[3 6]
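One detail worth knowing: a basic slice such as A[5:8] is a view that shares memory with the original array, so modifying the slice also modifies A. Calling .copy() gives an independent array. A sketch (view and independent are illustrative names):

```python
import numpy as np

A = np.arange(1, 10)

view = A[5:8]          # shares data with A
view[0] = 100          # also changes A[5]
print(A[5])            # 100

independent = A[5:8].copy()
independent[0] = -1    # A is not affected
print(A[5])            # still 100
```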

In the slicing syntax:

---->If you omit the first number, NumPy takes it as 0

---->If you omit the second number, NumPy slices up to the end of the array (the last element is included)

---->If you omit the last number, the step is taken as 1

Let's look at some examples

In :

A = np.arange(1, 10)
print(A)

print()

print("Omitting the first and second numbers")
print(A[::2])

print()

print("Omitting the first number only")
print(A[:7:2])

print()

print("Omitting the first and last numbers")
print(A[:7:])

Out:

[1 2 3 4 5 6 7 8 9]

Omitting the first and second numbers
[1 3 5 7 9]

Omitting the first number only
[1 3 5 7]

Omitting the first and last numbers
[1 2 3 4 5 6 7]

Slicing in 2-d array

Slicing also works for 2-D arrays, but the slice is defined separately for rows and columns, i.e. A[row slice, column slice] (the same holds for arrays with more dimensions).

All the other rules you have seen also hold for 2-D arrays.

Let's see an example:

In :

A = np.arange(10, 19).reshape((3, 3))
print(A)
print()
print("sliced array is")
sliced = A[0:2,0:2]
print(sliced)
print(sliced.shape)

Out:

[[10 11 12]
[13 14 15]
[16 17 18]]

sliced array is
[[10 11]
[13 14]]
(2, 2)
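The same row/column syntax can select a whole row or column: a bare colon means "all indexes along this axis". A sketch with the same 3x3 array:

```python
import numpy as np

A = np.arange(10, 19).reshape((3, 3))

print(A[1, :])    # second row: [13 14 15]
print(A[:, 1])    # second column: [11 14 17]
print(A[:, 1:])   # last two columns, as a 3x2 array
```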

Iterations

You can iterate over a NumPy array using a for loop:

In :

A = np.arange(1, 10)

for i in A:
    print(i)

Out:

1
2
3
4
5
6
7
8
9
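For a multi-dimensional array, a for loop walks over the first axis, so each step yields a whole row. To visit every element regardless of the number of dimensions, one option is the .flat iterator. A sketch:

```python
import numpy as np

A = np.arange(1, 7).reshape(2, 3)

for row in A:          # each `row` is a 1-D array
    print(row)         # [1 2 3], then [4 5 6]

total = 0
for value in A.flat:   # element by element, row after row
    total += value
print(total)           # 21
```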

Shape Manipulation

While performing calculations with arrays, in many situations you have to manipulate the shape of your array. Numpy provides a number of functions that can be used for the shape manipulation of your array.

Some of the most commonly used among them are as follows,

reshape()

You have used this function several times already: it takes the new dimensions as parameters and reshapes the array accordingly.

For example, reshape(3,3) reshapes the array into 3 rows and 3 columns.

In :

A = np.arange(1,10)
print(A)

print()

print("After reshaping to 3x3")
A_3x3 = A.reshape(3,3)
print(A_3x3)

Out:

[1 2 3 4 5 6 7 8 9]

After reshaping to 3x3
[[1 2 3]
[4 5 6]
[7 8 9]]

ravel()

This function is used to convert the multi-dimensional array into a single dimensional array.

In :

A_3x3 = np.arange(1,10).reshape(3,3)
print(A_3x3)

print()

print("After using ravel")
A_ravel = A_3x3.ravel()
print(A_ravel)

Out:

[[1 2 3]
[4 5 6]
[7 8 9]]

After using ravel
[1 2 3 4 5 6 7 8 9]

flatten()

This function is similar to ravel() as it also reshapes the multi-dimensional array into a single-dimensional array.

The difference between them is that

flatten() => exists only as a method of ndarray objects, so it can only be called on actual NumPy arrays.

ravel() => exists both as an ndarray method and as the library-level function np.ravel(), which can be called on any array-like object, such as a Python list.

In :

A_3x3 = np.arange(1,10).reshape(3,3)
print(A_3x3)

print()

print("After flattening")
A_flattened = A_3x3.flatten()
print(A_flattened)

Out:

[[1 2 3]
[4 5 6]
[7 8 9]]

After flattening
[1 2 3 4 5 6 7 8 9]
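Another practical difference between the two is that flatten() always returns a copy, while ravel() returns a view of the original data whenever possible (as it can for this contiguous array). The effect can be observed directly in a small sketch:

```python
import numpy as np

A_3x3 = np.arange(1, 10).reshape(3, 3)

raveled = A_3x3.ravel()      # a view here, because A_3x3 is contiguous
raveled[0] = 100
print(A_3x3[0, 0])           # 100: the original changed

flat_copy = A_3x3.flatten()  # always a copy
flat_copy[1] = -1
print(A_3x3[0, 1])           # 2: the original is untouched
```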

Joining and Splitting of Arrays

Joining of Arrays

Multiple arrays can be stacked together to form a new array. You can use the function vstack() for vertical stacking and hstack() for horizontal stacking.

In vertical stacking, the second array is placed below the first, growing the array in the vertical direction, i.e. the number of rows increases.

In horizontal stacking, the second array is placed to the right of the first, growing the array in the horizontal direction, i.e. the number of columns increases.

Note:

1. For vertical stacking, the number of columns should match
2. For horizontal stacking, the number of rows should match

vstack()

In :

A = np.ones((3, 3))
B = np.zeros((3, 3))
np.vstack((A, B))

Out:

array([[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]])

hstack()

In :

A = np.ones((3, 3))
B = np.zeros((3, 3))
np.hstack((A, B))

Out:

array([[1., 1., 1., 0., 0., 0.],
[1., 1., 1., 0., 0., 0.],
[1., 1., 1., 0., 0., 0.]])
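vstack() and hstack() are closely related to np.concatenate(), which joins arrays along an axis you choose. For 2-D arrays, axis=0 matches vstack() and axis=1 matches hstack(), as this sketch checks:

```python
import numpy as np

A = np.ones((3, 3))
B = np.zeros((3, 3))

print(np.array_equal(np.concatenate((A, B), axis=0), np.vstack((A, B))))  # True
print(np.array_equal(np.concatenate((A, B), axis=1), np.hstack((A, B))))  # True
```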

Splitting of Arrays

NumPy provides several functions to split an array into parts. Mirroring horizontal and vertical stacking, it provides hsplit() and vsplit() for horizontal and vertical splitting.

hsplit(array, number of sections) e.g. hsplit(A,2) => splits array A into two equal parts horizontally, i.e. column-wise

vsplit(array, number of sections) e.g. vsplit(A,2) => splits array A into two equal parts vertically, i.e. row-wise

hsplit()

In :

A = np.arange(16).reshape((4, 4))
print(A)

[a,b] = np.hsplit(A, 2)
print()

print(a)
print()

print(b)

Out:

[[ 0  1  2  3]
[ 4  5  6  7]
[ 8  9 10 11]
[12 13 14 15]]

[[ 0  1]
[ 4  5]
[ 8  9]
[12 13]]

[[ 2  3]
[ 6  7]
[10 11]
[14 15]]

vsplit()

In :

A = np.arange(16).reshape((4, 4))
print(A)
print()

[a,b] = np.vsplit(A, 2)
print(a)
print()
print(b)

Out:

[[ 0  1  2  3]
[ 4  5  6  7]
[ 8  9 10 11]
[12 13 14 15]]

[[0 1 2 3]
[4 5 6 7]]

[[ 8  9 10 11]
[12 13 14 15]]

Unsymmetrical Splitting

You can split an array into unequal parts using the split() function.

The function split() takes 3 arguments:

1. the array you want to split
2. a list of indices, e.g. [1,2,3] splits the array into 4 parts: [:1], [1:2], [2:3] and [3:] along the chosen axis
3. axis: axis = 0 means row-wise split, axis = 1 means column-wise split

In :

A = np.arange(16).reshape((4, 4))

[A1,A2,A3,A4] = np.split(A,[1,2,3],axis=1)

print(A1)
print()
print(A2)
print()
print(A3)
print()
print(A4)

Out:

[[ 0]
 [ 4]
 [ 8]
 [12]]

[[ 1]
 [ 5]
 [ 9]
 [13]]

[[ 2]
 [ 6]
 [10]
 [14]]

[[ 3]
 [ 7]
 [11]
 [15]]

In :

A = np.arange(16).reshape((4, 4))

[A1,A2,A3,A4] = np.split(A,[1,2,3],axis=0)

print(A1)
print()
print(A2)
print()
print(A3)
print()
print(A4)

Out:

[[0 1 2 3]]

[[4 5 6 7]]

[[ 8  9 10 11]]

[[12 13 14 15]]

Reading and Writing Array Data on Files

Numpy allows you to save and retrieve data to and from binary files. Functions save() and load() are used to save and load data.

save()

To save data, pass the name of the file as the first argument and the array you want to save as the second argument to save():

save('my_file_name',array)

The file is saved with the .npy extension, which NumPy appends to the file name if it does not already have it.

In :

array_to_save = np.array([1,2,3,4,5,6])
np.save("saved_data",array_to_save)

In :

loaded_data = np.load("saved_data.npy")
print(loaded_data)

Out:

[1 2 3 4 5 6]

You can also save and load data using CSV files. CSV files are often more convenient for sharing, as they can be opened easily in any text editor or spreadsheet software, while .npy files preserve the exact data type and shape of the array.

Saving Data

To save data into CSV format, there are several options provided by Numpy, one of them is by using savetxt() function.

Let's see an example,

In :

data = np.array([ [1.2,2,3], [4,5,6], [7,8,9] ])
np.savetxt("saved_data.csv", data,fmt="%f", delimiter=",")

Here the parameter fmt controls the format in which the data is stored.

For example, to store the data as integers use fmt="%d", and to store it as floats use fmt="%f".

Parameter delimiter specifies how you want to separate values.

delimiter = "," will separate the values with a comma
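To see how fmt affects the stored text, you can write to an in-memory text buffer instead of a real file; io.StringIO is used here only as an illustrative stand-in for a file. With fmt="%d" the value 1.2 is truncated to 1, while fmt="%f" keeps six decimal places:

```python
import io
import numpy as np

data = np.array([[1.2, 2, 3], [4, 5, 6], [7, 8, 9]])

as_int = io.StringIO()
np.savetxt(as_int, data, fmt="%d", delimiter=",")
print(as_int.getvalue().splitlines()[0])    # 1,2,3

as_float = io.StringIO()
np.savetxt(as_float, data, fmt="%f", delimiter=",")
print(as_float.getvalue().splitlines()[0])  # 1.200000,2.000000,3.000000
```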

You can load data from a CSV file using the function genfromtxt().

In :