Apiumhub

Posted on Dec 3, 2021 • Originally published at apiumhub.com on Sep 30, 2021

Getting Started with NumPy – Lesson 1


Introduction

NumPy is a third-party library for numerical computing, optimized for working with single- and multi-dimensional arrays. Its primary type is the array type called ndarray. This library contains many routines for statistical analysis.

Creating, Getting Info, Selecting and Util Functions

The Wine Quality dataset, published in 2009 by Cortez et al. and available from the UCI Machine Learning Repository, is a well-known dataset that contains the physicochemical properties of red and white wines, together with a quality score.

Before we start, let's take a look at the head of this dataset. In the examples below it is held in a two-dimensional NumPy array named wines_df.
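As a minimal sketch of how such an array could be built, the snippet below parses two sample rows (copied from the outputs shown later in this lesson) with np.genfromtxt. The semicolon delimiter, the header line, and the names sample_csv and wines_sample are assumptions made for illustration; the original post does not show how wines_df was loaded.

```python
import io

import numpy as np

# Two rows copied from the outputs shown later in this lesson.
# Assumption: the file is semicolon-separated and starts with a header line.
sample_csv = io.StringIO(
    "fixed acidity;volatile acidity;citric acid;residual sugar;chlorides;"
    "free sulfur dioxide;total sulfur dioxide;density;pH;sulphates;alcohol;quality\n"
    "7.4;0.7;0;1.9;0.076;11;34;0.9978;3.51;0.56;9.4;5\n"
    "7.8;0.88;0;2.6;0.098;25;67;0.9968;3.2;0.68;9.8;5\n"
)

# skip_header=1 drops the column names so that only numeric rows are parsed.
wines_sample = np.genfromtxt(sample_csv, delimiter=";", skip_header=1)

print(wines_sample.shape)  # two rows, twelve columns
print(wines_sample[:1])    # the "head": just the first row
```

With a real file you would pass its path instead of the in-memory buffer.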

Creating

In NumPy you can create arrays in different ways. We are going to look at the most common ones and those that are most useful for data processing.

One-dimensional array from a list:

import numpy as np

values = [1, 2, 3]  # named "values" so the built-in list is not shadowed
uni_numpy_array = np.array(values)

array([1, 2, 3])

Multidimensional array from a list of lists:

nested_values = [[1, 2, 3], [4, 5, 6]]  # each inner list becomes one row
multi_numpy_array = np.array(nested_values)

array([[1, 2, 3],
       [4, 5, 6]])

Multidimensional array filled with zeros:

zeros_array = np.zeros((3, 4))

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])
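Note the trailing dots in the output above: np.zeros produces floats by default. The following sketch shows how the dtype argument changes that, along with two closely related constructors, np.ones and np.full. The variable names are only illustrative.

```python
import numpy as np

zeros_float = np.zeros((3, 4))           # default dtype is float64
zeros_int = np.zeros((3, 4), dtype=int)  # ask for integers instead
ones_array = np.ones((2, 3))             # same idea, filled with 1.0
sevens = np.full((2, 2), 7)              # filled with an arbitrary constant

print(zeros_float.dtype)
print(zeros_int.dtype)
print(sevens)
```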

Multidimensional array filled with random values:

random_array = np.random.rand(3, 4)

array([[0.98195491, 0.34964712, 0.13426036, 0.55065786],
       [0.4180283 , 0.36018953, 0.44374156, 0.4366695],
       [0.69893273, 0.01089244, 0.4297768 , 0.6985924]])
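Because the values are random, your output will differ from the one above on every run. If you need reproducible results, one option is a seeded generator created with np.random.default_rng. This is a sketch of an alternative to np.random.rand, not the approach used in the rest of this lesson, and the seed value 42 is arbitrary.

```python
import numpy as np

# A seeded generator produces the same sequence every time it is created.
rng = np.random.default_rng(seed=42)
random_array = rng.random((3, 4))  # uniform floats in [0, 1), shape (3, 4)

# Creating a second generator with the same seed reproduces the array.
repeat = np.random.default_rng(seed=42).random((3, 4))
print(np.array_equal(random_array, repeat))  # True
```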

Getting Info

Several attributes and functions help us extract information from the data. We are going to go through them one by one, with an example of how each one works and why it is useful.

Get array dimensions:

For this we are going to use the shape attribute (note that it is not a function, so there are no parentheses), which returns a tuple with the number of rows and the number of columns: (rows, columns).

wines_df.shape

(1599, 12)
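To try this without the wine data, here is a small self-contained sketch on a 2 x 3 array. It also shows two related attributes, ndim and size. The array contents are made up for illustration.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])

print(matrix.shape)  # (2, 3)
print(matrix.ndim)   # 2, the number of dimensions
print(matrix.size)   # 6, the total number of elements

# shape is a plain tuple, so it can be unpacked.
rows, columns = matrix.shape

# A one-dimensional array has a one-element shape tuple.
vector = np.array([1, 2, 3])
print(vector.shape)  # (3,)
```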

Get data type:

NumPy has several different data types, most of which map to Python data types such as float and str. The most important NumPy data types are:

float – numeric floating point data.
int – integer data.
string – character data.
object – Python objects.

In this case we will use the dtype attribute, which returns the data type of the array's elements.

wines_df.dtype

dtype('float64')
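The sketch below illustrates the data types listed above on small made-up arrays, and shows how astype converts from one type to another. All variable names are illustrative.

```python
import numpy as np

floats = np.array([1.5, 2.0, 3.25])
text = np.array(["red", "white"])
mixed = np.array([1, "red", None], dtype=object)

print(floats.dtype)     # float64
print(text.dtype.kind)  # U, meaning a fixed-width Unicode string type
print(mixed.dtype)      # object

# astype returns a new array; casting floats to int truncates toward zero.
truncated = floats.astype(int)
print(truncated.tolist())  # [1, 2, 3]
```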

Selecting

Use the syntax array[i, j] to retrieve the element at row index i and column index j of an array.

To retrieve multiple elements, use the syntax array[(row_values), (column_values)], where row_values and column_values are tuples of the same size. Element k of the result is taken from row row_values[k] and column column_values[k].
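Both forms can be tried on the small 2 x 3 array from the Creating section. This is only a sketch; the name grid is illustrative.

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

# Single element: row index 1, column index 2.
single = grid[1, 2]
print(single)  # 6

# Multiple elements: the tuples are paired up position by position,
# so this picks grid[0, 2] and grid[1, 0].
picked = grid[(0, 1), (2, 0)]
print(picked.tolist())  # [3, 4]
```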

Now we are going to show different examples of how to select elements within an array:

Get first row:

first_row = wines_df[:1]

array([[ 7.4 , 0.7 , 0. , 1.9 , 0.076 , 11. , 34. ,
         0.9978, 3.51 , 0.56 , 9.4 , 5. ]])

Select the second element from the third row (the slice 1:2 returns it as a one-element array):

second_third = wines_df[2, 1:2]

array([0.76])
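The result above is an array rather than a plain number because a slice always keeps its dimension, while an integer index drops it. A short sketch of the difference on a made-up array:

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

as_array = grid[1, 1:2]  # slice on the column axis: one-element array
as_scalar = grid[1, 1]   # integer on both axes: a single value

print(as_array.shape)  # (1,)
print(as_scalar)       # 5
```

Use the integer form when you want the value itself, and the slice form when later code expects an array.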

Select the first three items from the fourth column:

first_three_items = wines_df[:3, 3]

array([1.9, 2.6, 2.3])

Select the entire fourth column:

fourth_column = wines_df[:, 3]

array([1.9, 2.6, 2.3, ..., 2.3, 2. , 3.6])

Util Functions

NumPy offers a very large number of mathematical functions, so we are going to summarize, in a few examples, the ones that we are most likely to use as Data Scientists.

Sum up the whole 12th column (index 11):

twelfth_column_sum = wines_df[:, 11].sum()

9012.0

Sum up all the columns:

all_columns_sum = wines_df.sum(axis=0)

array([13303.1 , 843.985 , 433.29 , 4059.55 , 139.859 ,
       25384. , 74302. , 1593.79794, 5294.47 , 1052.38 ,
       16666.35 , 9012. ])
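The axis argument decides which dimension is collapsed: axis=0 adds down the rows and returns one value per column, while axis=1 adds across the columns and returns one value per row. A small sketch on a made-up array:

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

column_sums = grid.sum(axis=0)  # one total per column
row_sums = grid.sum(axis=1)     # one total per row
total = grid.sum()              # no axis: sum of every element

print(column_sums.tolist())  # [5, 7, 9]
print(row_sums.tolist())     # [6, 15]
print(total)                 # 21
```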

Mean of the first row:

first_row_mean = wines_df[:1].mean()

6.211983333333333

Return a boolean array that is True where the value in the 12th column (index 11) is greater than 5, and False otherwise:

bool_array = wines_df[:,11] > 5

array([False, False, False, ..., True, False, True])
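A boolean array like this is most useful as a mask for selecting rows. The sketch below uses a small made-up array whose second column plays the role of the quality score; the values and names are hypothetical.

```python
import numpy as np

# Made-up sample: first column is a measurement, second is a quality score.
sample = np.array([[7.4, 5.0],
                   [7.8, 5.0],
                   [11.2, 6.0]])

mask = sample[:, 1] > 5     # True for rows whose score is above 5
high_quality = sample[mask] # keep only the rows where the mask is True
count = mask.sum()          # True counts as 1, so this counts the matches

print(mask.tolist())          # [False, False, True]
print(high_quality.tolist())  # [[11.2, 6.0]]
print(count)                  # 1
```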

Get the transpose of the wines matrix:

transposed = np.transpose(wines_df)
transposed.shape

(12, 1599)

Get the flattened array of wines:

flattened = wines_df.ravel()
flattened.shape

(19188,)
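One detail worth knowing: ravel returns a view of the original data whenever it can, whereas the flatten method always returns a copy. The following sketch, on a made-up array, shows the practical consequence.

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

flat_view = grid.ravel()    # shares memory with grid (contiguous array)
flat_copy = grid.flatten()  # independent copy

flat_view[0] = 100  # this write also changes grid[0, 0]

print(grid[0, 0])    # 100
print(flat_copy[0])  # 1, the copy was not affected
print(np.shares_memory(grid, flat_view))  # True
```

If you plan to modify the flattened array and want the original left untouched, flatten is the safer choice.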

Turn the second row of wines (index 1) into a 2-dimensional array with 3 rows and 4 columns:

wines_df[1:2].reshape((3,4))

array([[7.8 , 0.88 , 0. , 2.6],
       [0.098 , 25. , 67. , 0.9968],
       [3.2 , 0.68 , 9.8 , 5.]])
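Reshaping only works when the total number of elements stays the same (here 12 = 3 x 4). As a sketch on made-up data, the example below also shows that one dimension can be given as -1 so that NumPy infers it, and what happens when the sizes do not match.

```python
import numpy as np

row = np.arange(12)  # twelve values: 0, 1, ..., 11

grid = row.reshape((3, 4))  # explicit shape
auto = row.reshape(3, -1)   # -1 lets NumPy work out the 4 by itself

print(grid.shape)  # (3, 4)
print(auto.shape)  # (3, 4)

# 12 elements cannot fill a 5 x 2 array, so NumPy raises a ValueError.
try:
    row.reshape((5, 2))
except ValueError as error:
    print("reshape failed:", error)
```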

Training your abilities

If you want to take your Data Science skills further, we have created a course that you can download for free here.
