Today we are going to cover the Pandas library.

**Pandas** is a Python library commonly used for data manipulation and data analysis, mostly in **Data Science** and **Machine Learning**. In this notebook we are going to show how powerful the pandas library is!

Let's get started!

Let's import the NumPy and pandas libraries into our workspace. Here we are using a Kaggle notebook, where these libraries are already installed.

```
import numpy as np
import pandas as pd
```

If these libraries aren't installed in your IDE or environment, install them first (for example with `pip install numpy pandas`) before importing them.

## 1. Pandas Series

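Let's create a Series with pandas. A Series is a one-dimensional array of values with an index of labels. First we define the labels, the values as a list, the same values as a NumPy array, and a dictionary: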

```
a1=['a','b','c']
my_data=[50,70,30]
ar=np.array(my_data)
d={'a':50,'b':70,'c':30}
```

```
pd.Series(data=my_data, index=a1)
```

The same thing can be done by passing the data and the index positionally:

```
pd.Series(my_data,a1)
```

and also from a dictionary, whose keys become the index:

```
pd.Series(d)
```
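The snippet above defines the NumPy array `ar` but never uses it. Below is a minimal sketch showing that an array can be passed as the data just like a list. It also checks that the dictionary version produces the same labeled values. The variable names `labels`, `s_from_array` and `s_from_dict` are illustrative only.

```python
import numpy as np
import pandas as pd

labels = ['a', 'b', 'c']
ar = np.array([50, 70, 30])

# A NumPy array works as the data argument just like a plain list
s_from_array = pd.Series(ar, labels)

# A dictionary needs no separate index: its keys become the labels
s_from_dict = pd.Series({'a': 50, 'b': 70, 'c': 30})

print(s_from_array)
# Element-wise comparison of two identically labeled Series
print((s_from_array == s_from_dict).all())  # True
```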

**Indexing a Series**

```
series1=pd.Series([1,2,3,4],['A','B','C','D'])
series1
```

```
series1['C']
```
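`series1['C']` looks a value up by its label. As a small sketch for comparison, `iloc` looks values up by 0-based position instead. A list of labels returns a smaller Series rather than a single value.

```python
import pandas as pd

series1 = pd.Series([1, 2, 3, 4], ['A', 'B', 'C', 'D'])

print(series1['C'])         # label-based lookup -> 3
print(series1.iloc[2])      # position-based lookup (0-based) -> 3
print(series1[['A', 'D']])  # a list of labels returns a Series with A and D
```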

## 2. Pandas DataFrames

Import the libraries required to create a DataFrame with pandas.

```
import numpy as np
import pandas as pd
```

```
from numpy.random import randn
```

We set a fixed seed so that the same set of random numbers is drawn each time we run the code. Otherwise, the results would vary on every run.

```
np.random.seed(1011)
```

```
df=pd.DataFrame(randn(5,4),['A','B','C','D','E'],['W','X','Y','Z'])
df
```
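To see why the seed matters, the sketch below builds the same DataFrame twice with the same seed and checks that the numbers match. It passes `index` and `columns` as keyword arguments, which is equivalent to the positional call above. The helper `make_df` is hypothetical and exists only for this illustration.

```python
import numpy as np
import pandas as pd

def make_df(seed):
    """Hypothetical helper: seed NumPy's global RNG, then build the 5x4 frame."""
    np.random.seed(seed)
    return pd.DataFrame(np.random.randn(5, 4),
                        index=['A', 'B', 'C', 'D', 'E'],
                        columns=['W', 'X', 'Y', 'Z'])

df1 = make_df(1011)
df2 = make_df(1011)
print(df1.shape)        # (5, 4)
print(df1.equals(df2))  # True: same seed, same numbers
```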

Here is our DataFrame: 5 rows labeled A to E and 4 columns labeled W to Z.

If we grab the column 'W', the output is a Series:

```
df['W']
```

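Another way to grab a column is attribute access, similar to SQL's `table.column` notation. It only works when the column name is a valid Python identifier that doesn't clash with an existing DataFrame attribute or method: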

```
df.W
```

If we grab multiple columns by passing a list of names, the output is a DataFrame:

```
df[['W','Z']]
```
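A quick way to confirm which type each selection returns is to check it directly. This sketch uses `np.arange` instead of random numbers so the frame is predictable. It also shows that a list with a single column name still gives a DataFrame.

```python
import numpy as np
import pandas as pd

# Deterministic 5x4 frame with the same labels as above
df = pd.DataFrame(np.arange(20).reshape(5, 4),
                  index=['A', 'B', 'C', 'D', 'E'],
                  columns=['W', 'X', 'Y', 'Z'])

print(type(df['W']).__name__)         # Series
print(type(df.W).__name__)            # Series (attribute access)
print(type(df[['W', 'Z']]).__name__)  # DataFrame
print(type(df[['W']]).__name__)       # DataFrame, even with one column
```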

**Add a column**

Let's add a new column 'H' to the DataFrame, holding the sum of columns 'W' and 'Z':

```
df['H']=df['W']+df['Z']
```
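With random numbers it is hard to check the result by eye, so here is a small sketch with hand-picked values (an assumption for illustration). It shows that column arithmetic is element-wise and matched row by row on the index.

```python
import pandas as pd

df = pd.DataFrame({'W': [1, 2, 3], 'Z': [10, 20, 30]}, index=['A', 'B', 'C'])

# Each row's W is added to the same row's Z
df['H'] = df['W'] + df['Z']
print(df['H'].tolist())  # [11, 22, 33]
```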

**Delete a column**

To delete a column we use the `drop()` method. `axis=1` tells pandas to drop a column rather than a row:

```
df.drop('H',axis=1)
```

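However, `drop()` returns a new DataFrame and leaves the original unchanged, so if you display `df` again the new column is still there. To modify `df` itself, we add another argument, `inplace=True`: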

```
df.drop('H',axis=1,inplace=True)
```

This deletes the column from `df` itself instead of returning a modified copy.
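The sketch below, on a tiny hand-made frame, shows the difference between the two calls. It also shows a common alternative to `inplace=True`: reassigning the result of `drop()`, here using its `columns=` keyword instead of `axis=1`.

```python
import pandas as pd

df = pd.DataFrame({'W': [1, 2], 'Z': [3, 4]})
df['H'] = df['W'] + df['Z']

trimmed = df.drop('H', axis=1)  # returns a new DataFrame
print('H' in df.columns)        # True: df itself is unchanged
print('H' in trimmed.columns)   # False

df = df.drop(columns='H')       # same effect as inplace=True, via reassignment
print(list(df.columns))         # ['W', 'Z']
```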

**Selecting rows, label-based index**: `loc` takes row labels first, then column labels.

```
df.loc[['A','B'],['W','Y']]
```
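A few more `loc` patterns are sketched below on a small hand-made frame. A single row label returns a Series, and a label slice includes both endpoints. For comparison, the last line uses the positional indexer `iloc`, whose slices exclude the end, like ordinary Python slicing.

```python
import pandas as pd

df = pd.DataFrame({'W': [1, 2, 3], 'Y': [4, 5, 6]}, index=['A', 'B', 'C'])

print(df.loc['B'])                     # one row label -> Series indexed by column
print(df.loc[['A', 'B'], ['W', 'Y']])  # lists of labels -> DataFrame
print(df.loc['A':'B', 'W'].tolist())   # label slice includes both ends -> [1, 2]
print(df.iloc[0:1, 0].tolist())        # positional slice excludes the end -> [1]
```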

**Conditional selection**

Select the rows where the value in column W is greater than zero, keeping only columns Y and X:

```
df[df['W']>0][['Y','X']]
```
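Conditional selection works in two steps: `df['W'] > 0` first builds a boolean Series, and indexing with it keeps only the rows marked `True`. The sketch below makes that mask visible on hand-picked values. It also shows a single `df.loc[mask, columns]` call that gives the same result as the chained version.

```python
import pandas as pd

df = pd.DataFrame({'W': [0.5, -1.0, 2.0], 'X': [1, 2, 3], 'Y': [4, 5, 6]},
                  index=['A', 'B', 'C'])

mask = df['W'] > 0              # boolean Series, one entry per row
print(mask.tolist())            # [True, False, True]

chained = df[mask][['Y', 'X']]     # filter rows, then pick columns
single = df.loc[mask, ['Y', 'X']]  # same selection in one loc call
print(chained.equals(single))      # True
```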

**Multiple selection**: Can you explain what result each of the following gives?

```
df[(df['W']>0) & (df['Y']>1)]
```

```
df[(df['W']>0) | (df['Y']>1)]
```
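To check your answer, here is a sketch on a small hand-picked frame rather than the random one above. `&` keeps rows that meet both conditions and `|` keeps rows that meet at least one. Each condition needs its own parentheses because `&` and `|` bind more tightly than `>`. Python's `and`/`or` keywords do not work here and raise a "truth value of a Series is ambiguous" error.

```python
import pandas as pd

df = pd.DataFrame({'W': [1, -1, 2, -2], 'Y': [2, 3, 0, 0]},
                  index=['A', 'B', 'C', 'D'])

both = df[(df['W'] > 0) & (df['Y'] > 1)]    # rows meeting BOTH conditions
either = df[(df['W'] > 0) | (df['Y'] > 1)]  # rows meeting AT LEAST ONE
print(list(both.index))    # ['A']
print(list(either.index))  # ['A', 'B', 'C']
```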

**Multi-level index or index hierarchy**

Now we will create a DataFrame whose index has more than one level.

```
outside=['G1','G1','G1','G2','G2','G2']
inside=[1,2,3,1,2,3]
hi_index=list(zip(outside,inside))
hi_index=pd.MultiIndex.from_tuples(hi_index)
```
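Since every outer label here is paired with every inner label, the same index can also be built with `pd.MultiIndex.from_product`. This avoids writing out the repeated lists. The sketch below confirms that both constructions give the same index.

```python
import pandas as pd

outside = ['G1', 'G1', 'G1', 'G2', 'G2', 'G2']
inside = [1, 2, 3, 1, 2, 3]
from_tuples = pd.MultiIndex.from_tuples(list(zip(outside, inside)))

# from_product builds every (outer, inner) combination in the same order
from_product = pd.MultiIndex.from_product([['G1', 'G2'], [1, 2, 3]])

print(from_tuples.nlevels)               # 2
print(from_tuples[0])                    # ('G1', 1)
print(from_tuples.equals(from_product))  # True
```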

```
df=pd.DataFrame(randn(6,2),hi_index,['A','B'])
```

```
df
```

To grab everything under the outer label G1:

```
df.loc['G1']
```
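`df.loc['G1']` drops the outer level, so the result is indexed by the inner labels 1, 2, 3. To select by an inner label across all groups instead, `xs()` takes a cross-section on a chosen level. The sketch uses `np.arange` values in place of random ones so the output is predictable.

```python
import numpy as np
import pandas as pd

hi_index = pd.MultiIndex.from_product([['G1', 'G2'], [1, 2, 3]])
df = pd.DataFrame(np.arange(12).reshape(6, 2), hi_index, ['A', 'B'])

g1 = df.loc['G1']             # outer label -> inner level becomes the index
print(list(g1.index))         # [1, 2, 3]

# Cross-section on the inner level (level=1): label 1 in every group
inner_one = df.xs(1, level=1)
print(list(inner_one.index))  # ['G1', 'G2']
```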

Try to explain which value the following code grabs:

```
df.loc['G2'].loc[2]['B']
```

## 3. Read CSV file

CSV (comma-separated values) files contain plain text and are a well-known format that can be read by many tools, including pandas.

```
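# Path of the data set inside the Kaggle notebook; adjust it to where your CSV lives
df = pd.read_csv('/kaggle/input/pandas/data_set.csv')
# to_string() prints every row and column instead of a truncated preview
print(df.to_string())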
```
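If you don't have the Kaggle file at hand, `read_csv` also accepts any file-like object. The sketch below reads a small hypothetical CSV held in a string through `io.StringIO`. It then uses `head()`, which shows only the first rows and is often handier than `to_string()` for large files.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for a real file on disk
csv_text = "x,y\n1,2\n2,4\n3,7\n"

df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)          # (3, 2)
print(list(df.columns))  # ['x', 'y']
print(df.head(2))        # only the first 2 rows
```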

## 4. Correlations

The `corr()` method calculates the pairwise correlation between the columns of your data set. Here is the correlation between the columns of our data:

```
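# numeric_only=True skips text columns; since pandas 2.0, corr() raises an error on them otherwise
df.corr(numeric_only=True)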
```

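Correlation values range from -1 to 1. A negative value indicates a negative relationship: as the values of one variable increase, the values of the other tend to decrease. A positive value indicates a positive relationship: as one increases, the other tends to increase too. A value of 1 indicates a perfect positive relationship, -1 a perfect negative one, and values near 0 little or no linear relationship. By default, `corr()` computes the Pearson correlation coefficient, which measures linear relationships.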
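A small worked example makes these values concrete. In the sketch below the columns are made up for illustration. `up` rises exactly with `x`, `down` falls exactly as `x` rises, and `mixed` has no clean pattern, so the correlations come out as 1, -1 and something in between.

```python
import pandas as pd

x = [1, 2, 3, 4, 5]
df = pd.DataFrame({
    'x': x,
    'up': [2 * v + 1 for v in x],  # rises in lockstep with x
    'down': [10 - v for v in x],   # falls in lockstep as x rises
    'mixed': [3, 1, 4, 1, 5],      # no clean linear pattern
})

corr = df.corr()  # Pearson correlation by default
print(round(corr.loc['x', 'up'], 6))     # 1.0
print(round(corr.loc['x', 'down'], 6))   # -1.0
print(round(corr.loc['x', 'mixed'], 2))  # 0.35
```

Rounding hides tiny floating-point errors, since a perfect correlation can be computed as something like 0.9999999999999998.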

You can practice more examples on your own. The notebook link is given below; go to the link and practice.

Notebook Link: [https://www.kaggle.com/code/azizaafrin/powerful-pandas-part-1](https://www.kaggle.com/code/azizaafrin/powerful-pandas-part-1)

Happy Learning!❤️

*Aziza Afrin*
