Zaynul Abedin Miah

Posted on Dec 8, 2022 • Edited on Dec 27, 2022

Pandas Library

#pandas #machinelearning #ai #learnwithmitul

Pandas is an open-source library built on top of NumPy. It allows fast data analysis, cleaning, and preparation. It is designed for high performance and productivity, and it also has built-in visualization.

Pandas Series
A Series is a one-dimensional labeled array. You can make a Series in pandas from many kinds of data, including a list, a dictionary, or a scalar value, by passing the data to the pd.Series() constructor.

A NumPy array can also be converted into a Series: create the array with the numpy module's array() function and pass it to pd.Series().
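A minimal sketch of each way of building a Series. The values and variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

# From a list: pandas assigns a default integer index 0..n-1
s_list = pd.Series([10, 20, 30])

# From a dictionary: the keys become the index labels
s_dict = pd.Series({"a": 1, "b": 2, "c": 3})

# From a scalar: the value is repeated for every label in the index
s_scalar = pd.Series(5, index=["x", "y", "z"])

# From a NumPy array created with numpy.array()
arr = np.array([1.5, 2.5, 3.5])
s_array = pd.Series(arr)

print(s_list)
print(s_dict)
print(s_scalar)
print(s_array)
```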

Missing data occurs when no value is stored for an item or observation. Missing values are a major issue in real-world data. Pandas refers to missing data as NA values. Many DataFrame datasets contain missing data, either because the data never existed or because it was never collected.
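Before filling anything in, it helps to see where the gaps are. The sketch below uses a small made-up DataFrame; isna(), sum() and dropna() are standard pandas methods.

```python
import numpy as np
import pandas as pd

# Illustrative data: both None and np.nan are treated as missing
df = pd.DataFrame({
    "name": ["Ann", "Bob", None],
    "score": [90.0, np.nan, 75.0],
})

# isna() returns a same-shaped DataFrame of booleans (True where missing)
print(df.isna())

# Count the missing values in each column
missing_per_column = df.isna().sum()
print(missing_per_column)

# dropna() removes every row that contains at least one missing value
complete_rows = df.dropna()
print(complete_rows)
```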

Pandas DataFrames
A pandas DataFrame is a two-dimensional, size-mutable, tabular data structure with labeled axes (rows and columns). Like a table, its data is arranged in rows and columns. A DataFrame is made up of three main parts: the data, the rows, and the columns.
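A short sketch showing the three parts of a DataFrame. The column names, values and row labels are hypothetical.

```python
import pandas as pd

# Each dictionary key becomes a column label; index= sets the row labels
df = pd.DataFrame(
    {"name": ["Ann", "Bob", "Cara"], "age": [28, 35, 42]},
    index=["r1", "r2", "r3"],
)

print(df.index)    # the row labels
print(df.columns)  # the column labels
print(df.values)   # the underlying data as a NumPy array
print(df.shape)    # (number of rows, number of columns)
```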

pandas.DataFrame.drop
DataFrame.drop(labels=None, *, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')

You can remove rows or columns by passing their label names together with the axis they belong to, or by passing the index or columns argument directly. With a MultiIndex, you can remove labels from a particular level by specifying level.
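A minimal sketch of the calling styles described above, on a small made-up DataFrame:

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]},
    index=["x", "y", "z"],
)

# Drop a column by label plus axis (axis=1 means columns)
no_b = df.drop("B", axis=1)

# Equivalent: name the columns directly, no axis needed
no_b_too = df.drop(columns=["B"])

# Drop a row by its index label
no_y = df.drop(index="y")

# drop() returns a new DataFrame; df itself is unchanged unless inplace=True
print(no_b)
print(no_y)
print(df)
```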

loc vs iloc
The .loc[] indexer selects data by the names or labels of the index. The .iloc[] indexer, on the other hand, selects data by integer position. It works like normal Python slicing: we just give the positional index numbers and get the matching slice. One difference to keep in mind is that a .loc[] slice includes its end label, while an .iloc[] slice excludes its end position.
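A small sketch contrasting the two indexers. The row labels and scores are invented; note how the end point of each slice is treated.

```python
import pandas as pd

df = pd.DataFrame(
    {"score": [88, 92, 79, 95]},
    index=["a", "b", "c", "d"],
)

# .loc selects by label; a label slice INCLUDES the end label
by_label = df.loc["a":"c"]       # rows a, b, c

# .iloc selects by integer position; the slice EXCLUDES the end position,
# exactly like ordinary Python list slicing
by_position = df.iloc[0:2]       # rows at positions 0 and 1 (a, b)

# The same single cell, addressed by label and then by position
print(df.loc["d", "score"])
print(df.iloc[3, 0])
print(by_label)
print(by_position)
```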

Boolean DataFrames
Pandas DataFrames support boolean indexing, an effective way to filter a DataFrame by one or more conditions. In boolean indexing, a condition produces a boolean vector (True or False for each row), and that vector is used to keep only the rows where it is True.
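A minimal sketch of boolean indexing, using a made-up product table:

```python
import pandas as pd

df = pd.DataFrame({
    "product": ["pen", "book", "bag", "lamp"],
    "price": [2, 15, 40, 25],
    "in_stock": [True, False, True, True],
})

# A comparison produces a boolean Series (the "boolean vector")
mask = df["price"] > 10
print(mask)

# Indexing with the mask keeps only the rows where it is True
expensive = df[mask]
print(expensive)

# Combine conditions with & (and), | (or), ~ (not);
# wrap each comparison in parentheses because & binds tighter than >
expensive_in_stock = df[(df["price"] > 10) & (df["in_stock"])]
print(expensive_in_stock)
```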

Subset selection

Indexing in pandas means selecting particular rows and columns of data from a DataFrame. You can select all the rows and some of the columns, some of the rows and all the columns, or some of the rows and some of the columns. Indexing is also known as subset selection.
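The three kinds of subset selection can be sketched as follows. The data is made up, and this is only one of several equivalent ways to write each selection.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cara", "Dan"],
    "age": [28, 35, 42, 19],
    "city": ["Paris", "Rome", "Oslo", "Lima"],
})

# All rows, some columns: pass a list of column names
cols_only = df[["name", "city"]]

# Some rows, all columns: slice the rows by position
rows_only = df.iloc[1:3]

# Some rows and some columns: row labels and column labels together
both = df.loc[[0, 2], ["name", "age"]]

print(cols_only)
print(rows_only)
print(both)
```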

Working with missing data

We use the fillna(), replace(), and interpolate() methods to fill in NaN values in a dataset. All of them help fill in missing data in a DataFrame. fillna() and replace() substitute the NaN values with a value you supply. The interpolate() method also fills in NA values, but instead of using a hard-coded value, it estimates each missing value with an interpolation technique (linear by default).
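A minimal sketch comparing the three methods on the same made-up Series:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0])

# fillna(): put a single fixed value wherever data is missing
filled = s.fillna(0)

# replace(): swap one value (here NaN) for another
replaced = s.replace(np.nan, -1)

# interpolate(): estimate each gap from its neighbours (linear by default)
interpolated = s.interpolate()

print(filled.tolist())        # [1.0, 0.0, 3.0, 0.0, 5.0]
print(replaced.tolist())      # [1.0, -1.0, 3.0, -1.0, 5.0]
print(interpolated.tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0]
```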

groupby
Pandas groupby is used to split data into groups based on their categories and apply a function to each group. It makes aggregating data much more efficient.

The DataFrame.groupby() method splits the data into groups based on some criteria. Pandas objects can be split on any of their axes. In the abstract, grouping means providing a mapping from labels to group names.
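A minimal split-apply-combine sketch, assuming a hypothetical sales table with a region column:

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["North", "South", "North", "South", "East"],
    "units": [10, 4, 6, 8, 5],
})

# Split the rows into groups by region, then apply sum() to each group
total_by_region = sales.groupby("region")["units"].sum()
print(total_by_region)

# Several aggregations at once with agg()
summary = sales.groupby("region")["units"].agg(["sum", "mean", "count"])
print(summary)
```

By default the group keys are sorted, so the regions appear in alphabetical order in both results.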

pandas.concat
pandas.concat(objs, *, axis=0, join='outer', ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=False, copy=True)

Two or more DataFrames can be concatenated with the pandas.concat() function. It combines DataFrames either along rows (axis=0, the default) or along columns (axis=1).
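A small sketch of both directions. The DataFrames are invented; ignore_index is one of the parameters in the signature shown above.

```python
import pandas as pd

df_a = pd.DataFrame({"id": [1, 2], "value": ["a", "b"]})
df_b = pd.DataFrame({"id": [3, 4], "value": ["c", "d"]})

# axis=0 (default): stack rows; ignore_index=True renumbers the result 0..n-1
rows = pd.concat([df_a, df_b], ignore_index=True)

# axis=1: place DataFrames side by side, aligned on their index
extra = pd.DataFrame({"score": [0.5, 0.9]})
cols = pd.concat([df_a, extra], axis=1)

print(rows)
print(cols)
```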

pandas.DataFrame.merge

# importing the module
import pandas as pd

# creating the first DataFrame
df1 = pd.DataFrame({"fruit": ["apple", "banana",
                              "avocado", "grape"],
                    "market_price": [21, 14, 35, 38]})
print("The first DataFrame")
print(df1)

# creating the second DataFrame
df2 = pd.DataFrame({"fruit": ["apple", "banana", "grape"],
                    "wholesaler_price": [65, 68, 71]})
print("The second DataFrame")
print(df2)

# joining the DataFrames
# an inner join keeps only the fruits present in both df1 and df2,
# so the result contains apple, banana and grape (avocado is dropped)
merged = pd.merge(df1, df2, on="fruit", how="inner")
print("The merged DataFrame")
print(merged)
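The same merge can also be written with the DataFrame.merge method named in the heading. The sketch below reuses the fruit data from the example above but switches to how="left" to show how a row without a match is kept.

```python
import pandas as pd

df1 = pd.DataFrame({"fruit": ["apple", "banana", "avocado", "grape"],
                    "market_price": [21, 14, 35, 38]})
df2 = pd.DataFrame({"fruit": ["apple", "banana", "grape"],
                    "wholesaler_price": [65, 68, 71]})

# how="left" keeps every row of df1; avocado has no match in df2,
# so its wholesaler_price becomes NaN
left = df1.merge(df2, on="fruit", how="left")
print(left)
```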


Dataframe.join()
Pandas DataFrame.join() combines the columns of another DataFrame with the calling DataFrame, either on the index or on a key column. The column used to match rows between the DataFrames is called the join key.
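A minimal sketch of both forms of join(), using hypothetical price, stock and order tables:

```python
import pandas as pd

prices = pd.DataFrame({"price": [1.2, 0.5, 3.0]},
                      index=["apple", "banana", "cherry"])
stock = pd.DataFrame({"qty": [10, 25]},
                     index=["apple", "cherry"])

# join() aligns on the index; the default how="left" keeps every row of prices
joined = prices.join(stock)
print(joined)

# With on=, a column of the caller is matched against the other's index
orders = pd.DataFrame({"fruit": ["cherry", "apple"], "n": [2, 3]})
orders_priced = orders.join(prices, on="fruit")
print(orders_priced)
```

Because banana has no stock entry, its qty is NaN in the first result.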

Data Input and Output
You can also read data from sources such as CSV files, Excel files, HTML tables, and SQL databases.
To work with HTML files and SQL databases, you may also need to install the following libraries alongside pandas:

conda install sqlalchemy
conda install lxml
conda install html5lib
conda install beautifulsoup4
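Pandas readers and writers follow a read_*/to_* naming pattern (for example read_csv, read_excel, read_html and read_sql). The sketch below shows a CSV round trip; an in-memory io.StringIO buffer stands in for a real file so the example runs without touching disk.

```python
import io

import pandas as pd

df = pd.DataFrame({"name": ["Ann", "Bob"], "age": [28, 35]})

# Write to CSV; index=False leaves out the row labels
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Read it back; with a real file you would pass a path such as "data.csv"
buffer.seek(0)
loaded = pd.read_csv(buffer)
print(loaded)
```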

All the exercises I have solved with pandas are available here:

https://github.com/azaynul10/Python-For-Data-Science-And-Machine-Learing-Bootcamp-Exercise-Solutions/blob/78aa3f5acb9bea8a751ebb5395af72925e1a74ad/Pandas_Library1.py
