# Peep into the basics of Numpy and Pandas

Kiran U Kamath Originally published at blog.learnwithdata.me ・7 min read

This blog post was written as a Jupyter notebook, so you can experiment and learn by editing the notebook.

Just change the input and check the output.

Learning by experiment and hands-on exercises is always better.

The purpose of this notebook is to review the basics of two Python libraries: NumPy and Pandas.

Let's get started.

# 1. NUMPY BASICS

NumPy is a Python library for working with multidimensional arrays, including linear algebra routines that operate on them.

NumPy brings the best of two worlds:

• the computational efficiency of C/Fortran,
• the easy syntax of Python.
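
To make these two points concrete, here is a minimal sketch that doubles every value once with plain Python and once with NumPy. The list `prices` is a made-up example, not part of the original notebook.

```python
import numpy as np

# Hypothetical example data, not from the original notebook
prices = [10, 20, 30, 40]

# Plain Python: an explicit loop (here a list comprehension) over every element
doubled_list = [p * 2 for p in prices]

# NumPy: a single vectorized expression; the loop runs in compiled code
doubled_array = np.array(prices) * 2

print(doubled_list)
print(doubled_array)
```

Both versions give the same numbers. On large arrays the vectorized form is usually much faster, although the exact gain depends on your data and machine.
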
``````import numpy as np

# Let's define a one-dimensional array
my_list = [10, 20, 30, 40, 50, 60, 70, 80]
my_list
``````
``````[10, 20, 30, 40, 50, 60, 70, 80]
``````

Let's create a numpy array from the list "my_list"

``````x = np.array(my_list)
x
``````
``````array([10, 20, 30, 40, 50, 60, 70, 80])
``````

Get shape

``````x.shape
``````
``````(8,)
``````

Let's create a multi-dimensional NumPy array (a matrix) from a nested list

``````
matrix = np.array([[5, 8], [9, 13]])
matrix
``````
``````array([[ 5,  8],
       [ 9, 13]])
``````
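
Once you have a matrix, you will probably want to check its dimensions and pick out rows, columns, or single elements. The short sketch below reuses the same `matrix` values and shows one way to do that.

```python
import numpy as np

matrix = np.array([[5, 8], [9, 13]])

print(matrix.shape)   # (2, 2): two rows, two columns
print(matrix.ndim)    # 2
print(matrix[0])      # first row: [5 8]
print(matrix[:, 1])   # second column: [ 8 13]
print(matrix[1, 0])   # element at row 1, column 0: 9
```
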
``````# "rand()" draws samples from a uniform distribution over [0, 1)
xy = np.random.rand(7)
xy
``````
``````array([0.40408966, 0.12527144, 0.04465052, 0.39450693, 0.93339664,
       0.14009694, 0.94461679])
``````

You can create a matrix of random numbers by passing the dimensions to random.rand

``````
xy = np.random.rand(2, 2)
xy
``````
``````array([[0.86152202, 0.22526627],
       [0.41562272, 0.33467273]])
``````
``````# "randn()" draws samples from the standard normal distribution (mean 0, standard deviation 1)
xy = np.random.randn(7)
xy
``````
``````array([-1.27678101,  1.20667812,  0.7945132 ,  0.62421099, -0.44447512,
       -0.57038096,  2.19949273])
``````
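
Random outputs change on every run, which makes experiments hard to repeat. A minimal sketch, assuming NumPy 1.17 or newer: create a seeded generator with `np.random.default_rng` and check that the samples behave as described. The seed value 42 and the sample size are arbitrary choices for illustration.

```python
import numpy as np

# The seed 42 is an arbitrary choice; any integer gives a repeatable stream
rng = np.random.default_rng(42)

uniform_samples = rng.random(100_000)          # uniform over [0, 1)
normal_samples = rng.standard_normal(100_000)  # mean 0, standard deviation 1

# Uniform samples stay inside [0, 1)
print(uniform_samples.min(), uniform_samples.max())

# Normal samples are centred near 0 with a spread near 1
print(normal_samples.mean(), normal_samples.std())
```

Running the cell again with the same seed should give the same numbers.
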

"randint" is used to generate random integers between upper and lower bounds

``````
xy = np.random.randint(1, 10)
xy
``````
``````9
``````
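
One detail worth knowing: `randint` includes the lower bound but excludes the upper bound, so `randint(1, 10)` can return 1 through 9. The sketch below draws many values to show this, and uses the `size` argument to get an array instead of a single number. The variable names are only for illustration.

```python
import numpy as np

# 1000 integers drawn from 1 (inclusive) up to 10 (exclusive)
draws = np.random.randint(1, 10, size=1000)
print(draws.min(), draws.max())

# "size" can also be a shape: a 2x3 matrix of simulated dice rolls (1 to 6)
dice = np.random.randint(1, 7, size=(2, 3))
print(dice)
```
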

Create evenly spaced values with a step of 7

``````xy = np.arange(1, 50, 7)
xy
``````
``````array([ 1,  8, 15, 22, 29, 36, 43])
``````
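
`arange` fixes the step and lets NumPy work out how many values fit. If you would rather fix the number of values, `np.linspace` is the usual alternative. A small sketch comparing the two:

```python
import numpy as np

# Fixed step: start at 1, stop before 50, step of 7
by_step = np.arange(1, 50, 7)

# Fixed count: 7 values from 1 to 43, both endpoints included
by_count = np.linspace(1, 43, 7)

print(by_step)    # [ 1  8 15 22 29 36 43]
print(by_count)   # same values, but stored as floats
```
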
``````# Array of ones
xy = np.ones(7)
xy
``````
``````array([1., 1., 1., 1., 1., 1., 1.])
``````
``````# Matrices of ones
xy = np.ones((2, 2))
xy
``````
``````array([[1., 1.],
       [1., 1.]])
``````
``````# Array of zeros
xy = np.zeros(5)
xy
``````
``````array([0., 0., 0., 0., 0.])
``````

Reshape 1D array into a matrix

``````z = x.reshape(2,4)
print(x)
print(z)
``````
``````[10 20 30 40 50 60 70 80]
[[10 20 30 40]
 [50 60 70 80]]
``````
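
Two things about `reshape` are easy to miss. You can pass `-1` for one dimension and let NumPy infer it, and for a simple array like this one the result is a view that shares memory with the original. The sketch below shows both, and uses `.copy()` when an independent array is wanted.

```python
import numpy as np

x = np.array([10, 20, 30, 40, 50, 60, 70, 80])

# -1 asks NumPy to infer the number of columns from the array size
z = x.reshape(2, -1)
print(z.shape)                 # (2, 4)

# The reshaped array is a view here: it shares memory with x
print(np.shares_memory(x, z))  # True
z[0, 0] = 99
print(x[0])                    # 99, the change shows up in x as well

# Use .copy() when you need an independent matrix
independent = x.reshape(2, 4).copy()
independent[0, 0] = -1
print(x[0])                    # still 99
```
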

Obtain the maximum element (value)

``````x.max()
``````
``````80
``````

Obtain the minimum element (value)

``````x.min()
``````
``````10
``````

Obtain the location of the max element

``````x.argmax()
``````
``````7
``````
``````# Obtain the location of the min element
x.argmin()
``````
``````0
``````
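
The same methods also work on matrices. Here is a short sketch, reusing the 2x2 `matrix` values from earlier, that uses the `axis` argument to get per-column and per-row results and converts a flat `argmax` position back into a row and column.

```python
import numpy as np

matrix = np.array([[5, 8], [9, 13]])

print(matrix.max(axis=0))   # maximum of each column: [ 9 13]
print(matrix.max(axis=1))   # maximum of each row: [ 8 13]

# Without an axis, argmax returns the position in the flattened array
flat_position = matrix.argmax()
print(flat_position)        # 3

# Convert the flat position back to (row, column)
row, col = np.unravel_index(flat_position, matrix.shape)
print(int(row), int(col))   # 1 1
```
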
``````# Access specific index from the numpy array
x[0]
``````
``````10
``````
``````# Slice from index 0 up to, but NOT including, index 3
x[0:3]
``````
``````array([10, 20, 30])
``````
``````# Broadcasting: assign one value to several elements of a NumPy array at once
x[0:2] = 10
x
``````
``````array([10, 10, 30, 40, 50, 60, 70, 80])
``````
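
Broadcasting is not limited to assigning a single value. NumPy also stretches a scalar or a smaller array to match a larger one in arithmetic. A minimal sketch, where `row` is a made-up array for illustration:

```python
import numpy as np

matrix = np.array([[5, 8], [9, 13]])
row = np.array([100, 200])   # hypothetical values, one per column

# A scalar is applied to every element
print(matrix * 2)
# [[10 16]
#  [18 26]]

# A 1D array with 2 elements is added to each row of the 2x2 matrix
print(matrix + row)
# [[105 208]
#  [109 213]]
```
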

# 2. Pandas

Pandas is a data manipulation and analysis tool that is built on NumPy.

Pandas uses a data structure known as a DataFrame (think of it as a Microsoft Excel sheet in Python).

DataFrames let programmers store and manipulate data in a tabular fashion (rows and columns).

Series vs. DataFrame: a Series can be thought of as a single column of a DataFrame.
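
To see the Series and DataFrame relationship in practice, here is a small sketch. The column names and values are hypothetical and only serve the illustration.

```python
import pandas as pd

# Hypothetical two-column table
df = pd.DataFrame({'ticker': ['AAA', 'BBB'],
                   'price': [101.5, 99.0]})

# Selecting one column with single brackets gives a Series
single_column = df['price']
print(type(single_column))

# Selecting with a list of column names keeps a DataFrame
one_column_frame = df[['price']]
print(type(one_column_frame))
```
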

``````import pandas as pd
``````
``````# Let's define two lists as shown below:
stock_list = ['Reliance', 'AMZN', 'facebook']
stock_list
``````
``````['Reliance', 'AMZN', 'facebook']
``````
``````label   = ['stock#1', 'stock#2', 'stock#3']
label
``````
``````['stock#1', 'stock#2', 'stock#3']
``````

Let's create a one-dimensional Pandas Series

Note that a Series consists of data and its associated labels (the index)

``````
x_series = pd.Series(data = stock_list, index = label)
``````
``````# Let's view the series
x_series
``````
``````stock#1    Reliance
stock#2        AMZN
stock#3    facebook
dtype: object
``````
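
Because every value carries a label, you can look items up by label as well as by position. A short sketch using the same data and labels:

```python
import pandas as pd

stock_list = ['Reliance', 'AMZN', 'facebook']
label = ['stock#1', 'stock#2', 'stock#3']
x_series = pd.Series(data=stock_list, index=label)

print(x_series['stock#2'])      # lookup by label: AMZN
print(x_series.iloc[0])         # lookup by position: Reliance
print(x_series.index.tolist())  # the labels themselves
```
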

Let's obtain the datatype

``````type(x_series)
``````
``````pandas.core.series.Series
``````

Let's define a two-dimensional Pandas DataFrame

Note that you can create a Pandas DataFrame from a Python dictionary

``````
bank_client_df = pd.DataFrame({'Bank client ID': [1111, 2222, 3333, 4444],
                               'Bank Client Name': ['Kiran', 'Chaitanya', 'dheeraj', 'shreyas'],
                               'Net worth [$]': [3500, 29000, 10000, 2000],
                               'Years with bank': [3, 4, 9, 5]})
bank_client_df
``````
|   | Bank client ID | Bank Client Name | Net worth [$] | Years with bank |
|---|---|---|---|---|
| 0 | 1111 | Kiran | 3500 | 3 |
| 1 | 2222 | Chaitanya | 29000 | 4 |
| 2 | 3333 | dheeraj | 10000 | 9 |
| 3 | 4444 | shreyas | 2000 | 5 |
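
With the table in place, a natural next step is to pull out a column or filter rows. The sketch below rebuilds the same table so it runs on its own. The 5000 threshold is an arbitrary value for illustration, and the column name is written as `'Net worth [$]'` without a backslash.

```python
import pandas as pd

bank_client_df = pd.DataFrame({
    'Bank client ID': [1111, 2222, 3333, 4444],
    'Bank Client Name': ['Kiran', 'Chaitanya', 'dheeraj', 'shreyas'],
    'Net worth [$]': [3500, 29000, 10000, 2000],
    'Years with bank': [3, 4, 9, 5],
})

# Boolean mask: keep rows where net worth is above the (arbitrary) threshold
wealthy = bank_client_df[bank_client_df['Net worth [$]'] > 5000]
print(wealthy['Bank Client Name'].tolist())    # ['Chaitanya', 'dheeraj']

# Column statistics work directly on a single column
average_years = bank_client_df['Years with bank'].mean()
print(average_years)                           # 5.25
```
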

Let's obtain the data type

``````
type(bank_client_df)
``````
``````pandas.core.frame.DataFrame
``````

You can view just the first few rows using .head()

``````bank_client_df.head(2)
``````
|   | Bank client ID | Bank Client Name | Net worth [$] | Years with bank |
|---|---|---|---|---|
| 0 | 1111 | Kiran | 3500 | 3 |
| 1 | 2222 | Chaitanya | 29000 | 4 |

You can view just the last few rows using .tail()

``````bank_client_df.tail(1)
``````
|   | Bank client ID | Bank Client Name | Net worth [$] | Years with bank |
|---|---|---|---|---|
| 3 | 4444 | shreyas | 2000 | 5 |

### Write to a CSV file without an index

bank_client_df.to_csv('sample_output.csv', index=False)
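
To check what actually gets written, you can produce the CSV as text and read it back. The sketch below uses a smaller two-column version of the table and an in-memory buffer instead of a file on disk. Both are choices made for this illustration only.

```python
import io

import pandas as pd

# Smaller stand-in for the bank client table
bank_client_df = pd.DataFrame({'Bank client ID': [1111, 2222],
                               'Bank Client Name': ['Kiran', 'Chaitanya']})

# With no path given, to_csv returns the CSV content as a string
csv_text = bank_client_df.to_csv(index=False)
print(csv_text)

# Read the text back into a DataFrame
restored = pd.read_csv(io.StringIO(csv_text))
print(restored)
```

Because `index=False` was used, the first line of the CSV holds only the column names and no extra index column appears when the data is read back.
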

## CONCATENATING AND MERGING WITH PANDAS

``````df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']},
                   index=[0, 1, 2, 3])
``````
``````df1
``````
|   | A | B | C | D |
|---|---|---|---|---|
| 0 | A0 | B0 | C0 | D0 |
| 1 | A1 | B1 | C1 | D1 |
| 2 | A2 | B2 | C2 | D2 |
| 3 | A3 | B3 | C3 | D3 |

``````df2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'],
                    'B': ['B4', 'B5', 'B6', 'B7'],
                    'C': ['C4', 'C5', 'C6', 'C7'],
                    'D': ['D4', 'D5', 'D6', 'D7']},
                   index=[4, 5, 6, 7])
``````
``````df2
``````
|   | A | B | C | D |
|---|---|---|---|---|
| 4 | A4 | B4 | C4 | D4 |
| 5 | A5 | B5 | C5 | D5 |
| 6 | A6 | B6 | C6 | D6 |
| 7 | A7 | B7 | C7 | D7 |

``````df3 = pd.DataFrame({'A': ['A8', 'A9', 'A10', 'A11'],
                    'B': ['B8', 'B9', 'B10', 'B11'],
                    'C': ['C8', 'C9', 'C10', 'C11'],
                    'D': ['D8', 'D9', 'D10', 'D11']},
                   index=[8, 9, 10, 11])
``````
``````df3
``````
|   | A | B | C | D |
|---|---|---|---|---|
| 8 | A8 | B8 | C8 | D8 |
| 9 | A9 | B9 | C9 | D9 |
| 10 | A10 | B10 | C10 | D10 |
| 11 | A11 | B11 | C11 | D11 |

``````pd.concat([df1, df2, df3])
``````
|   | A | B | C | D |
|---|---|---|---|---|
| 0 | A0 | B0 | C0 | D0 |
| 1 | A1 | B1 | C1 | D1 |
| 2 | A2 | B2 | C2 | D2 |
| 3 | A3 | B3 | C3 | D3 |
| 4 | A4 | B4 | C4 | D4 |
| 5 | A5 | B5 | C5 | D5 |
| 6 | A6 | B6 | C6 | D6 |
| 7 | A7 | B7 | C7 | D7 |
| 8 | A8 | B8 | C8 | D8 |
| 9 | A9 | B9 | C9 | D9 |
| 10 | A10 | B10 | C10 | D10 |
| 11 | A11 | B11 | C11 | D11 |
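
Concatenation stacks tables on top of each other, while merging joins them on a shared column. The sketch below shows both on small made-up frames. The names `top`, `bottom`, `left`, `right` and the `key` column are hypothetical and not part of the notebook above.

```python
import pandas as pd

# Two frames whose index labels overlap
top = pd.DataFrame({'A': ['A0', 'A1']}, index=[0, 1])
bottom = pd.DataFrame({'A': ['A2', 'A3']}, index=[0, 1])

# Plain concat keeps the original labels, so 0 and 1 appear twice
stacked = pd.concat([top, bottom])
print(stacked.index.tolist())       # [0, 1, 0, 1]

# ignore_index=True renumbers the rows from 0
renumbered = pd.concat([top, bottom], ignore_index=True)
print(renumbered.index.tolist())    # [0, 1, 2, 3]

# Merge joins rows that share a value in the 'key' column
left = pd.DataFrame({'key': ['K0', 'K1', 'K2'], 'A': ['A0', 'A1', 'A2']})
right = pd.DataFrame({'key': ['K0', 'K1', 'K3'], 'B': ['B0', 'B1', 'B3']})

# The default is an inner join: only keys present in both frames survive
merged = pd.merge(left, right, on='key')
print(merged)
```

Passing `how='left'`, `how='right'`, or `how='outer'` to `pd.merge` changes which keys are kept.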