Beginner guide on python🐍🐼pandas

#python #panda #datascience #machinelearning

What is pandas?

pandas is an open-source Python library that is widely used for data analysis.
It is commonly used for reading and manipulating data in machine learning (ML) and data science.

pip install pandas

Run the pip command above to install pandas on your system.
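To confirm the install worked, a quick check along these lines can help. It is only a sketch: it imports the library and prints the installed version, which will differ from system to system.

```python
import pandas as pd

# If this import succeeds, pandas is installed in the current environment.
# The version string printed depends on what pip installed.
print(pd.__version__)
```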

What is a DataFrame?

A pandas DataFrame is a two-dimensional data structure, like a table with rows and columns.

Create a DataFrame in pandas:

import pandas as pd
# Each dictionary key becomes a column name; each list holds that column's values
car_dataset = {
    'cars': ['Tata', 'Maruti', 'Tesla'],
    'Model': ['Nano', 'i10', '11x3'],
    'Range': [300, 315, 400]
}
car_df = pd.DataFrame(car_dataset)
print(car_df)
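To see how the dictionary maps onto the table, here is a small sketch that reuses two of the columns from the example above and inspects the result. The dictionary keys become the column labels, and pandas assigns default row labels starting at 0.

```python
import pandas as pd

# Two columns taken from the car example in this guide.
car_dataset = {
    "cars": ["Tata", "Maruti", "Tesla"],
    "Range": [300, 315, 400],
}
car_df = pd.DataFrame(car_dataset)

print(car_df.shape)          # (number of rows, number of columns)
print(list(car_df.columns))  # column labels come from the dictionary keys
print(list(car_df.index))    # default row labels: 0, 1, 2
```

Every list in the dictionary needs the same length, because each one supplies exactly one value per row.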

Basic column operations on a DataFrame
You can access a DataFrame column using square brackets, and you can assign or update its values the same way.
Below are some basic operations you can perform on a DataFrame column.

# Access a single column (double brackets return a one-column DataFrame)
print(car_df[['cars']])
# Single brackets also work and return a Series: car_df['cars']
# Access multiple columns
print(car_df[['Model', 'Range']])
# Add a new column (one value per row)
car_df['new_column_name'] = [1, 2, 3]
# Delete a column (the name must match an existing column)
car_df.drop(columns=['new_column_name'], inplace=True)
# Rename a column
# Syntax: df.rename(columns={"oldName": "newName"}, inplace=True)
car_df.rename(columns={'Model': 'model'}, inplace=True)
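The difference between single and double brackets is easy to miss, so here is a short sketch of it, along with updating a column in place. It also shows `rename` and `drop` without `inplace=True`, in which case they return a new DataFrame and leave the original untouched. The variable names (`single`, `double`, `renamed`, `trimmed`) are just illustrative.

```python
import pandas as pd

car_df = pd.DataFrame({"cars": ["Tata", "Maruti", "Tesla"], "Range": [300, 315, 400]})

single = car_df["cars"]    # single brackets return a Series
double = car_df[["cars"]]  # double brackets return a one-column DataFrame

print(type(single).__name__)  # Series
print(type(double).__name__)  # DataFrame

# Update an existing column by assigning to it
car_df["Range"] = car_df["Range"] + 10

# Without inplace=True, rename and drop return new DataFrames
renamed = car_df.rename(columns={"cars": "brand"})
trimmed = renamed.drop(columns=["Range"])

print(list(car_df.columns))   # original is unchanged by rename/drop
print(list(trimmed.columns))
```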

Read CSV File:

A simple way to store large data sets is to use CSV (comma-separated values) files.
CSV is one of the most common file types you will work with in machine learning and data science.

import pandas as pd
df = pd.read_csv('Housing.csv')
print(df)
# print(df.to_string())
# Use to_string() to print the entire DataFrame.
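If you do not have a CSV file at hand, the same call can be tried on in-memory text. The sketch below assumes a made-up three-column CSV (the values are hypothetical, not taken from Housing.csv) and wraps it in `io.StringIO` so `read_csv` treats it like a file. It also shows the `usecols` and `nrows` parameters for loading only part of the data.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for a file on disk.
csv_text = """price,area,bedrooms
100,50,2
150,70,3
"""

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))
print(df)
print(df.dtypes)  # pandas infers a data type for each column

# Load only some columns and only the first row
subset = pd.read_csv(io.StringIO(csv_text), usecols=["price", "area"], nrows=1)
print(subset)
```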

Peek Into The Data:

To get a high-level overview of the data, pandas offers several functions. Some of them are:

import pandas as pd
# Read CSV File
df = pd.read_csv('Housing.csv')
# head of the data
print(df.head(10))  # print first 10 rows of the DataFrame
# tail of the data
print(df.tail(10))  # print last 10 rows of the DataFrame
# shape: the dimensions of the data
print(df.shape)
# (545, 13) means 545 rows and 13 columns
# Features
print(df.columns)  # returns the column names
# Index(['price', 'area', 'bedrooms', 'bathrooms', 'stories', 'mainroad',
#        'guestroom', 'basement', 'hotwaterheating', 'airconditioning',
#        'parking', 'prefarea', 'furnishingstatus'], dtype='object')
# info
df.info()
# prints the non-null count and the data type of each column
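The same functions can be tried on a tiny DataFrame where the results are easy to check by eye. This is a sketch with made-up values (the column names `area` and `furnishing` are hypothetical), including one missing entry so the null counts are not all zero.

```python
import pandas as pd

# Small hypothetical data set with one missing value in "furnishing"
df = pd.DataFrame({
    "area": [50, 70, 65, 90, 120],
    "furnishing": ["furnished", None, "unfurnished", "furnished", "semi-furnished"],
})

print(df.head(2))          # first 2 rows
print(df.tail(2))          # last 2 rows
print(df.shape)            # (rows, columns)
print(df.isnull().sum())   # number of missing values per column

# info() prints its summary directly and returns None,
# so it does not need to be wrapped in print()
df.info()
```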

Statistical Analysis Using Pandas:
Pandas offers functions that help you dig deeper and find more useful insights in the data. Some of the useful ones are:

# describe: returns statistical measures such as min and max values, mean, standard deviation and more.
df.describe()
# unique: returns all the unique values in a column.
df['columnName'].unique()
# value_counts: returns the frequency of each value.
df['columnName'].value_counts()
# correlation: finds the pairwise correlation among the numeric features.
df.corr(numeric_only=True)  # numeric_only=True (pandas 1.5+) skips text columns
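Here is a small sketch of these functions on made-up data, so the results can be verified by hand. The column names and values are hypothetical. `price` is set to exactly twice `area`, so the correlation between them comes out as 1. The sketch selects the numeric columns before calling `corr()`, since correlation is only defined for numbers.

```python
import pandas as pd

# Hypothetical data: price is exactly 2 * area
df = pd.DataFrame({
    "area": [50, 70, 65, 90],
    "price": [100, 140, 130, 180],
    "furnishing": ["furnished", "unfurnished", "furnished", "furnished"],
})

summary = df.describe()                  # count, mean, std, min, quartiles, max
kinds = df["furnishing"].unique()        # distinct values, in order of appearance
counts = df["furnishing"].value_counts() # how often each value occurs

# Keep only numeric columns, then compute pairwise correlation
corr = df.select_dtypes(include="number").corr()

print(summary)
print(kinds)
print(counts)
print(corr)
```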

Pandas also has functions for other statistical measures such as mean, median, and mode.
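As a quick illustration, the sketch below calls `mean()`, `median()` and `mode()` on a small made-up Series. Note that `mode()` returns a Series rather than a single number, because more than one value can be tied for most frequent.

```python
import pandas as pd

# Hypothetical values chosen so the results are easy to check by hand
prices = pd.Series([100, 140, 130, 140])

print(prices.mean())    # 127.5
print(prices.median())  # 135.0
print(prices.mode())    # a Series holding the most frequent value(s), here 140
```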
