Kiran U Kamath

Posted on • Originally published at blog.learnwithdata.me

A peek into the basics of NumPy and Pandas

This post is written as a Jupyter notebook, so you can experiment and learn by editing the notebook.

Click here for notebook.

Just change the input and check the output.

Learning by experiment and hands-on exercises is always better.

The purpose of this notebook is to revise the basics of NumPy and Pandas in Python.

Let's get started.

1. NUMPY BASICS

NumPy is a numerical computing library for Python, built around multidimensional arrays and including linear algebra routines.

NumPy brings the best of two worlds:

  • the computational efficiency of C/Fortran,
  • the easy syntax of the Python language.

import numpy as np
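To make the "best of two worlds" point concrete, here is a small sketch that squares a few numbers twice: once with a plain Python loop and once with a single NumPy expression. The names `values`, `squares_loop` and `squares_np` are made up for this illustration. Both give the same numbers, but the NumPy version runs its loop in compiled code.

```python
import numpy as np

values = list(range(1, 6))  # hypothetical sample data: [1, 2, 3, 4, 5]

# Pure Python: square every element with an explicit loop
squares_loop = [v ** 2 for v in values]

# NumPy: the same operation applied to the whole array at once
squares_np = np.array(values) ** 2

print(squares_loop)  # [1, 4, 9, 16, 25]
print(squares_np)    # [ 1  4  9 16 25]
```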

# Let's define a one-dimensional array 
my_list = [10, 20, 30, 40, 50, 60, 70, 80]
my_list
[10, 20, 30, 40, 50, 60, 70, 80]

Let's create a numpy array from the list "my_list"

x = np.array(my_list)
x
array([10, 20, 30, 40, 50, 60, 70, 80])

Get shape

x.shape
(8,)

Let's create a multi-dimensional NumPy array (a matrix) from a nested list


matrix = np.array([[5, 8], [9, 13]])
matrix
array([[ 5,  8],
       [ 9, 13]])

# "rand()" draws samples from a uniform distribution over [0, 1)
xy = np.random.rand(7)
xy
array([0.40408966, 0.12527144, 0.04465052, 0.39450693, 0.93339664,
       0.14009694, 0.94461679])

You can also create a matrix of random numbers by passing the dimensions to random.rand


xy = np.random.rand(2, 2)
xy
array([[0.86152202, 0.22526627],
       [0.41562272, 0.33467273]])

# "randn()" draws samples from the standard normal distribution (mean 0, standard deviation 1)
xy = np.random.randn(7)
xy
array([-1.27678101,  1.20667812,  0.7945132 ,  0.62421099, -0.44447512,
       -0.57038096,  2.19949273])
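As a quick check that "randn()" samples a standard normal distribution rather than values limited to the range 0 to 1, the sketch below draws a large sample and looks at its statistics. The sample size and the seed are arbitrary choices made for this illustration.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed so the run is repeatable

samples = np.random.randn(100_000)

print(samples.mean())  # close to 0
print(samples.std())   # close to 1
print(samples.min() < 0, samples.max() > 1)  # values fall outside the 0..1 range
```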

"randint" generates a random integer between a lower bound (inclusive) and an upper bound (exclusive)


xy = np.random.randint(1, 10)
xy
9
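The bounds are easier to see with many draws. This is a minimal sketch, assuming you want an array of integers instead of a single one; the variable names and the sample size of 1000 are made up for the example.

```python
import numpy as np

# Lower bound 1 is included, upper bound 10 is excluded
single = np.random.randint(1, 10)

# The optional "size" argument returns an array instead of a single integer
many = np.random.randint(1, 10, size=1000)

print(single)
print(many.min(), many.max())  # always within 1..9
print(many.shape)              # (1000,)
```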

Create evenly spaced values from 1 up to (but not including) 50 with a step of 7

xy = np.arange(1, 50, 7)
xy
array([ 1,  8, 15, 22, 29, 36, 43])

# Array of ones
xy = np.ones(7)
xy
array([1., 1., 1., 1., 1., 1., 1.])

# Matrix of ones
xy = np.ones((2, 2))
xy
array([[1., 1.],
       [1., 1.]])

# Array of zeros
xy = np.zeros(5)
xy
array([0., 0., 0., 0., 0.])

Reshape 1D array into a matrix

z = x.reshape(2,4)
print(x)
print(z)
[10 20 30 40 50 60 70 80]
[[10 20 30 40]
 [50 60 70 80]]
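Two details of reshape are worth trying out: the new shape must account for every element, and one dimension can be left as -1 so NumPy infers it. The sketch below rebuilds the same array `x` so it can run on its own. The failing 3 x 3 shape is a made-up example.

```python
import numpy as np

x = np.array([10, 20, 30, 40, 50, 60, 70, 80])

# -1 asks NumPy to work out that dimension from the array size (8 / 2 = 4)
z = x.reshape(2, -1)
print(z.shape)  # (2, 4)

# For a contiguous array like this one, reshape returns a view, not a copy
print(np.shares_memory(x, z))  # True

# A shape that does not match the 8 elements raises a ValueError
try:
    x.reshape(3, 3)
except ValueError as err:
    print("reshape failed:", err)
```

Because `z` is a view here, a later assignment to part of `x` is also visible through `z`.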

Obtain the maximum element (value)

x.max()
80

Obtain the minimum element (value)

x.min()
10

Obtain the location of the max element

x.argmax()
7

# Obtain the location of the min element
x.argmin()
0

# Access a specific index of the NumPy array
x[0]
10

# Slice from index 0 up to, but NOT including, index 3
x[0:3]
array([10, 20, 30])

# Broadcasting: assign one value to several elements of a NumPy array at once
x[0:2] = 10
x
array([10, 10, 30, 40, 50, 60, 70, 80])
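Broadcasting is not limited to assignment. As a minimal sketch with made-up arrays, the same rule lets a scalar or a smaller array take part in arithmetic with a larger one:

```python
import numpy as np

x = np.array([10, 20, 30, 40])

# A scalar is stretched across every element
print(x + 5)   # [15 25 35 45]
print(x * 2)   # [20 40 60 80]

# A 1-D row is stretched across each row of a 2-D array
matrix = np.array([[1, 2], [3, 4]])
row = np.array([10, 100])
print(matrix * row)
```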

2. PANDAS BASICS

Pandas is a data manipulation and analysis library built on top of NumPy.

Pandas uses a data structure known as a DataFrame (think of it as a Microsoft Excel spreadsheet in Python).

DataFrames let programmers store and manipulate data in tabular form (rows and columns).

Series vs. DataFrame: a Series can be thought of as a single column of a DataFrame.

import pandas as pd

# Let's define two lists as shown below:
stock_list = ['Reliance', 'AMZN', 'facebook']
stock_list
['Reliance', 'AMZN', 'facebook']

label = ['stock#1', 'stock#2', 'stock#3']
label
['stock#1', 'stock#2', 'stock#3']

Let's create a one-dimensional Pandas Series

Note that a Series consists of data and associated labels (its index)


x_series = pd.Series(data = stock_list, index = label)
# Let's view the series
x_series
stock#1    Reliance
stock#2        AMZN
stock#3    facebook
dtype: object
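The labels are what set a Series apart from a plain list. The sketch below rebuilds the same series so it runs on its own and shows a lookup by label and by position. It is an illustration added here, not part of the original notebook.

```python
import pandas as pd

x_series = pd.Series(data=['Reliance', 'AMZN', 'facebook'],
                     index=['stock#1', 'stock#2', 'stock#3'])

# Look up a value by its label
print(x_series['stock#2'])       # AMZN

# Look up a value by its integer position
print(x_series.iloc[0])          # Reliance

# The labels and the raw values are available separately
print(list(x_series.index))      # ['stock#1', 'stock#2', 'stock#3']
print(list(x_series.values))
```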

Let's obtain the datatype

type(x_series)
pandas.core.series.Series

Let's define a two-dimensional Pandas DataFrame

Note that you can create a Pandas DataFrame from a Python dictionary


bank_client_df = pd.DataFrame({'Bank client ID':[1111, 2222, 3333, 4444], 
                               'Bank Client Name':['Kiran', 'Chaitanya', 'dheeraj', 'shreyas'], 
                               'Net worth [$]':[3500, 29000, 10000, 2000], 
                               'Years with bank':[3, 4, 9, 5]})
bank_client_df
   Bank client ID Bank Client Name  Net worth [$]  Years with bank
0            1111            Kiran           3500                3
1            2222        Chaitanya          29000                4
2            3333          dheeraj          10000                9
3            4444          shreyas           2000                5

Let's obtain the data type


type(bank_client_df)
pandas.core.frame.DataFrame
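The idea that a Series is a single column of a DataFrame can be checked directly. This is a small sketch using a cut-down, two-column version of the bank client table. The variable names `net_worth` and `subset` are made up for the example.

```python
import pandas as pd

bank_client_df = pd.DataFrame({'Bank client ID': [1111, 2222, 3333, 4444],
                               'Net worth [$]': [3500, 29000, 10000, 2000]})

# Single brackets select one column and return a Series
net_worth = bank_client_df['Net worth [$]']
print(type(net_worth).__name__)   # Series

# Double brackets select a list of columns and return a DataFrame
subset = bank_client_df[['Net worth [$]']]
print(type(subset).__name__)      # DataFrame

print(net_worth.max())            # 29000
```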

You can view only the first few rows using .head()

bank_client_df.head(2)
   Bank client ID Bank Client Name  Net worth [$]  Years with bank
0            1111            Kiran           3500                3
1            2222        Chaitanya          29000                4

You can view only the last few rows using .tail()

bank_client_df.tail(1)
   Bank client ID Bank Client Name  Net worth [$]  Years with bank
3            4444          shreyas           2000                5
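Both methods also work without an argument, in which case they return five rows. A minimal sketch with a made-up ten-row frame:

```python
import pandas as pd

df = pd.DataFrame({'n': range(10)})  # made-up 10-row frame

print(len(df.head()))            # 5, the default number of rows
print(len(df.tail()))            # 5
print(len(df.head(3)))           # 3
print(df.tail(2)['n'].tolist())  # [8, 9]
```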

Pandas can read a CSV file and store its data in a DataFrame

bank_df = pd.read_csv('sample.csv')

Write a DataFrame to a CSV file without the index

bank_df.to_csv('sample_output.csv', index = False)
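Since 'sample.csv' is not included with this post, here is a self-contained round trip you can run instead. It is a sketch under stated assumptions: the data, the temporary directory and the file name `round_trip.csv` are all made up for the illustration.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'name': ['Kiran', 'Chaitanya'], 'years': [3, 4]})  # made-up data

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'round_trip.csv')  # hypothetical file name

    df.to_csv(path, index=False)   # no index column is written
    restored = pd.read_csv(path)   # a fresh 0..n-1 index is created on read

print(restored)
print(restored.equals(df))
```

If you leave out `index = False`, the index is written as an extra unnamed column, which comes back as a column called "Unnamed: 0" when the file is read again.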

CONCATENATING AND MERGING WITH PANDAS

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']},
                    index=[0, 1, 2, 3])
df1
    A   B   C   D
0  A0  B0  C0  D0
1  A1  B1  C1  D1
2  A2  B2  C2  D2
3  A3  B3  C3  D3

df2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'],
                    'B': ['B4', 'B5', 'B6', 'B7'],
                    'C': ['C4', 'C5', 'C6', 'C7'],
                    'D': ['D4', 'D5', 'D6', 'D7']},
                    index=[4, 5, 6, 7])
df2
    A   B   C   D
4  A4  B4  C4  D4
5  A5  B5  C5  D5
6  A6  B6  C6  D6
7  A7  B7  C7  D7

df3 = pd.DataFrame({'A': ['A8', 'A9', 'A10', 'A11'],
                    'B': ['B8', 'B9', 'B10', 'B11'],
                    'C': ['C8', 'C9', 'C10', 'C11'],
                    'D': ['D8', 'D9', 'D10', 'D11']},
                    index=[8, 9, 10, 11])
df3
      A    B    C    D
8    A8   B8   C8   D8
9    A9   B9   C9   D9
10  A10  B10  C10  D10
11  A11  B11  C11  D11

# Concatenate the three DataFrames row-wise (stacked on top of each other)
pd.concat([df1, df2, df3])
      A    B    C    D
0    A0   B0   C0   D0
1    A1   B1   C1   D1
2    A2   B2   C2   D2
3    A3   B3   C3   D3
4    A4   B4   C4   D4
5    A5   B5   C5   D5
6    A6   B6   C6   D6
7    A7   B7   C7   D7
8    A8   B8   C8   D8
9    A9   B9   C9   D9
10  A10  B10  C10  D10
11  A11  B11  C11  D11
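The section title also mentions merging, which the notebook does not show. As a hedged sketch, the example below contrasts the two operations on small made-up frames: `left`, `right` and their columns are invented for the illustration and are not from the original notebook.

```python
import pandas as pd

# Made-up frames for illustration
left = pd.DataFrame({'id': [1, 2, 3], 'name': ['Kiran', 'Chaitanya', 'dheeraj']})
right = pd.DataFrame({'id': [2, 3, 4], 'net_worth': [29000, 10000, 2000]})

# concat stacks frames; ignore_index=True renumbers the rows 0..n-1
stacked = pd.concat([left, left], ignore_index=True)
print(stacked.index.tolist())  # [0, 1, 2, 3, 4, 5]

# merge joins on a shared key column; the default is an inner join,
# so only ids present in both frames (2 and 3) are kept
merged = pd.merge(left, right, on='id')
print(merged)
```

In short, concat glues frames together along an axis, while merge matches rows by the values in a key column.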
