Apiumhub

Originally published at apiumhub.com

Getting Started with NumPy – Lesson 1

Introduction

NumPy is a third-party library for numerical computing, optimized for working with single- and multi-dimensional arrays. Its primary type is the array type called ndarray. This library contains many routines for statistical analysis.

Creating, Getting Info, Selecting and Util Functions

The 2009 Wine Quality Dataset, compiled by Cortez et al. and available at the UCI Machine Learning Repository, is a well-known dataset of wine quality information. It includes the physicochemical properties and a quality score for red and white wines.

Before we start, let's look at the first rows of the dataset:

[Image: first rows of the Wine Quality dataset]
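The examples in this lesson operate on an array called wines_df, but its construction is not shown. Here is a minimal sketch of one way to build it, assuming the semicolon-delimited red-wine CSV from UCI. The file name winequality-red.csv and the inline two-row sample are assumptions for illustration only:

```python
import io
import numpy as np

# Stand-in for the real file: the header line plus two data rows.
# With the downloaded dataset you would pass a path such as
# "winequality-red.csv" (assumed file name) instead of this buffer.
sample_csv = io.StringIO(
    '"fixed acidity";"volatile acidity";"citric acid";"residual sugar";'
    '"chlorides";"free sulfur dioxide";"total sulfur dioxide";"density";'
    '"pH";"sulphates";"alcohol";"quality"\n'
    "7.4;0.7;0;1.9;0.076;11;34;0.9978;3.51;0.56;9.4;5\n"
    "7.8;0.88;0;2.6;0.098;25;67;0.9968;3.2;0.68;9.8;5\n"
)

# skip_header=1 drops the column names so every value parses as a float.
wines_df = np.genfromtxt(sample_csv, delimiter=";", skip_header=1)

print(wines_df.shape)  # (2, 12) for this two-row sample
print(wines_df.dtype)  # float64
```

Despite the _df suffix, wines_df is a plain NumPy ndarray here, not a pandas DataFrame.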

Creating

NumPy lets you create arrays in several ways. We will look at the most common ones, which are also the most useful for data processing.

Unidimensional array from list:

import numpy as np
values = [1, 2, 3]  # avoid naming this "list", which would shadow the built-in
uni_numpy_array = np.array(values)

array([1, 2, 3])

Multidimensional array from list:

nested_values = [[1, 2, 3], [4, 5, 6]]  # each inner list becomes one row
multi_numpy_array = np.array(nested_values)

array([[1, 2, 3],
       [4, 5, 6]])

Multidimensional array filled with zeros:

zeros_array = np.zeros((3, 4))

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])
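np.zeros has close relatives for other fill values. A short sketch, assuming you want ones, integer zeros, or an arbitrary constant (the variable names are illustrative):

```python
import numpy as np

ones_array = np.ones((2, 3))             # floats by default, like np.zeros
int_zeros = np.zeros((2, 3), dtype=int)  # request integers instead of floats
sevens = np.full((2, 3), 7)              # every cell holds the value 7

print(ones_array.dtype)  # float64
print(sevens)
```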

Multidimensional array filled with random values:

random_array = np.random.rand(3, 4)

array([[0.98195491, 0.34964712, 0.13426036, 0.55065786],
       [0.4180283 , 0.36018953, 0.44374156, 0.4366695],
       [0.69893273, 0.01089244, 0.4297768 , 0.6985924]])
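The values above change on every run. If you need reproducible numbers, one option is a seeded generator. This is a sketch using np.random.default_rng, which is a different interface from the np.random.rand call shown above; the seed value 42 is arbitrary:

```python
import numpy as np

# A seeded Generator yields the same "random" numbers on every run.
rng = np.random.default_rng(seed=42)
random_array = rng.random((3, 4))  # uniform floats in [0.0, 1.0)

# Re-creating the generator with the same seed reproduces the array.
same_again = np.random.default_rng(seed=42).random((3, 4))
print(np.array_equal(random_array, same_again))  # True
```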

Getting Info

Several attributes and functions help us extract information about the data. We will go through them one by one, with an example of how each works and why it is useful.

Get array dimensions:

For this we use the shape attribute (it is not a function, so no parentheses), which returns a tuple with the number of rows and the number of columns: (rows, columns).

wines_df.shape

(1599, 12)

Get data type:

NumPy has several data types, most of which map to Python types such as float and str. The most important ones are:

  1. float – numeric floating point data.

  2. int – integer data.

  3. string – character data.

  4. object – Python objects.

In this case we use the dtype attribute, which returns the data type of the array's elements.

wines_df.dtype

dtype('float64')
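To see the four kinds of data type listed above side by side, here is a small sketch built from made-up arrays. The exact integer width (for example int64) depends on the platform, so the code inspects dtype.kind instead:

```python
import numpy as np

float_array = np.array([1.5, 2.0])
int_array = np.array([1, 2, 3])
str_array = np.array(["red", "white"])
obj_array = np.array([{"a": 1}, None], dtype=object)

# dtype.kind is a one-letter code: 'f' float, 'i' integer,
# 'U' unicode string, 'O' Python object
for arr in (float_array, int_array, str_array, obj_array):
    print(arr.dtype, arr.dtype.kind)

# astype returns a converted copy; float-to-int conversion truncates
as_int = np.array([5.0, 6.9]).astype(int)
print(as_int)  # [5 6]
```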

Selecting

Use the syntax array[i, j] to retrieve the element at row index i and column index j. Indices start at 0, so array[2, 1] is the second element of the third row.

To retrieve multiple elements, use the syntax array[row_values, column_values], where row_values and column_values are sequences of the same length. The indices are paired position by position.
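Both forms can be tried on a small made-up array (grid is illustrative and not part of the wine data):

```python
import numpy as np

grid = np.array([[10, 11, 12],
                 [20, 21, 22],
                 [30, 31, 32]])

single = grid[1, 2]  # row index 1, column index 2
print(single)        # 22

# Paired lookup: picks the elements at (0, 1) and (2, 0),
# not a 2x2 block of rows 0 and 2 with columns 1 and 0
pairs = grid[[0, 2], [1, 0]]
print(pairs)         # [11 30]
```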

Now we are going to show different examples of how to select elements within an array:

Get first row:

first_row = wines_df[:1]

array([[ 7.4 , 0.7 , 0. , 1.9 , 0.076 , 11. , 34. ,
         0.9978, 3.51 , 0.56 , 9.4 , 5. ]])

Select the second element from the third row:

second_third = wines_df[2, 1:2]

array([0.76])

Select the first three items from the fourth column:

first_three_items = wines_df[:3, 3]

array([1.9, 2.6, 2.3])

Select the entire fourth column:

fourth_column = wines_df[:, 3]

array([1.9, 2.6, 2.3, ..., 2.3, 2. , 3.6])

Util Functions

NumPy offers a very large number of mathematical functions, so here we summarize, with examples, the ones a data scientist is most likely to use.

Sum up the whole 12th column (index 11, the quality score):

twelfth_column_sum = wines_df[:, 11].sum()

9012.0

Sum up all the columns:

all_columns_sum = wines_df.sum(axis=0)

array([13303.1 , 843.985 , 433.29 , 4059.55 , 139.859 ,
       25384. , 74302. , 1593.79794, 5294.47 , 1052.38 ,
       16666.35 , 9012. ])
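The axis argument decides which dimension is collapsed. A small sketch on a made-up 2x3 array shows the difference between axis=0, axis=1, and no axis at all:

```python
import numpy as np

table = np.array([[1, 2, 3],
                  [4, 5, 6]])

column_totals = table.sum(axis=0)  # collapse rows: one total per column
row_totals = table.sum(axis=1)     # collapse columns: one total per row
grand_total = table.sum()          # no axis: sum of every element

print(column_totals)  # [5 7 9]
print(row_totals)     # [ 6 15]
print(grand_total)    # 21
```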

Mean of the first row:

first_row_mean = wines_df[:1].mean()

6.211983333333333

Return a boolean array that is True where the value in the 12th column (index 11) is greater than 5 and False otherwise:

bool_array = wines_df[:,11] > 5

array([False, False, False, ..., True, False, True])
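A boolean array like this is commonly used as a mask to keep only the matching rows. Here is a minimal sketch on a made-up two-column table, where the last column plays the role of the quality score:

```python
import numpy as np

# Hypothetical mini table: the second column stands in for quality
sample = np.array([[7.4, 5.0],
                   [7.8, 6.0],
                   [11.2, 7.0]])

mask = sample[:, 1] > 5     # [False, True, True]
good = sample[mask]         # keep only the rows where mask is True
count = int(mask.sum())     # True counts as 1, so this counts matches

print(good)
print(count)  # 2
```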

Get the transpose of the wines matrix:

transposed = np.transpose(wines_df)
transposed.shape

(12, 1599)
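Arrays also expose the transpose through the .T attribute, which is a common shorthand. A quick sketch on a made-up 2x3 array, checking that both spellings agree:

```python
import numpy as np

matrix = np.arange(6).reshape((2, 3))  # [[0 1 2], [3 4 5]]

via_attribute = matrix.T
via_function = np.transpose(matrix)

print(via_attribute.shape)                         # (3, 2)
print(np.array_equal(via_attribute, via_function))  # True
# Element (i, j) of the original sits at (j, i) after transposing
print(via_attribute[2, 0] == matrix[0, 2])          # True
```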

Get the flattened (one-dimensional) array of wines:

flatten = wines_df.ravel()
flatten.shape

(19188,)
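One detail worth knowing: ravel returns a view of the original data when it can, while the similarly named flatten method always copies. A small sketch on a made-up array shows the practical consequence:

```python
import numpy as np

matrix = np.zeros((2, 3))

view = matrix.ravel()    # for a contiguous array like this one, a view
copy = matrix.flatten()  # always an independent copy

view[0] = 99.0           # writes through to matrix
copy[1] = -1.0           # does not touch matrix

print(matrix[0, 0])                    # 99.0
print(matrix[0, 1])                    # 0.0
print(np.shares_memory(matrix, view))  # True
```

If you plan to modify the flattened result without affecting the source array, flatten (or an explicit copy) may be the safer choice.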

Turn the second row of wines (12 values) into a 2-dimensional array with 3 rows and 4 columns:

wines_df[1:2].reshape((3,4))

array([[7.8 , 0.88 , 0. , 2.6],
       [0.098 , 25. , 67. , 0.9968],
       [3.2 , 0.68 , 9.8 , 5.]])
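reshape only works when the new shape holds exactly the same number of elements. A short sketch on a made-up array of 12 values, including the -1 placeholder that lets NumPy infer one dimension:

```python
import numpy as np

row = np.arange(12)          # 12 values: 0..11

grid = row.reshape((3, 4))   # 3 * 4 == 12, so this fits
auto = row.reshape((-1, 6))  # -1 asks NumPy to infer the row count
print(auto.shape)            # (2, 6)

failed = False
try:
    row.reshape((5, 3))      # 15 cells cannot hold 12 values
except ValueError as err:
    failed = True
    print("reshape failed:", err)
```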

Training your abilities

If you want to take your data science skills further, we have created a course that you can download for free here.
