
Abisola Oyetunji


Python Pandas Beginner's Tutorial.

Python is a widely used programming language that is renowned for its simplicity and adaptability.
Pandas is a versatile and user-friendly Python package that is mostly used for working with data sets.
It is particularly well suited to cleaning, exploring, manipulating, and analyzing data.

In this tutorial I will be sharing some basic operations using pandas.
Let's dive right in!

Installing Pandas.

Installing pandas is quite easy. Open your terminal if you are using a Mac, or the command prompt if you are on a PC.
Enter the following command: pip install pandas


Next, we want to import pandas.

Use this:

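By convention, pandas is imported under the alias pd, and the rest of this tutorial assumes that alias. A minimal sketch (printing the version is just my way of confirming the import worked):

```python
# Import pandas under its conventional alias
import pandas as pd

# Print the installed version to confirm the import worked
print(pd.__version__)
```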

Some quick background:
The Series and the DataFrame are the two core data structures in pandas. A Series is essentially a single column, whereas a DataFrame is a two-dimensional table made up of a collection of Series.
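To make the relationship concrete, here is a small sketch. The column name and the numbers are made up for illustration.

```python
import pandas as pd

# A Series is a single labelled column of values (hypothetical numbers)
scores = pd.Series([3, 2, 0, 1], name="scores")

# A DataFrame is a two-dimensional table; each of its columns is a Series
table = pd.DataFrame({"scores": scores})

print(type(scores).__name__)           # Series
print(type(table["scores"]).__name__)  # Series
print(scores.ndim, table.ndim)         # 1 2
```

Pulling a column back out of the DataFrame with table["scores"] gives you a Series again.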

Let's look at how to create a DataFrame.

There are several ways to build a DataFrame from scratch, and using a simple dictionary is one of your best options.

Let's imagine we operate a fruit stand with a focus on selling apples and oranges. Our goal is to have a row for each customer's purchase and a column for each fruit.
To organize this data as a Python dictionary that pandas can use, we can take the following approach:

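One possible dictionary for the fruit stand is sketched here. The purchase counts are hypothetical; the only requirement is that every list has the same length, one entry per customer.

```python
# Each key becomes a column name; each list holds one value per customer purchase.
# The numbers are made up for illustration.
data = {
    "apples": [3, 2, 0, 1],
    "oranges": [0, 3, 7, 2],
}

print(data)
```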

Then pass it to the pandas DataFrame constructor to create the DataFrame:

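A minimal sketch of the constructor call, using hypothetical purchase counts and a variable name (purchases) of my own choosing:

```python
import pandas as pd

# Hypothetical purchase data: one list entry per customer
data = {"apples": [3, 2, 0, 1], "oranges": [0, 3, 7, 2]}

# The DataFrame constructor turns each dictionary key into a column
purchases = pd.DataFrame(data)

print(purchases)
```

The rows are labelled 0 to 3 by default. If you would rather label them yourself, the constructor also accepts an index argument with one label per row.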

How to read data in pandas

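Here is a rough sketch of reading CSV data. So that it runs without a file on disk, it reads from in-memory text; the file name bikes.csv, the variable name bikes_df, and the columns and values are all assumptions for illustration.

```python
import io

import pandas as pd

# Stand-in for the contents of a CSV file; column names and values are hypothetical
csv_text = """date,season,rentals
2011-01-01,winter,985
2011-01-02,winter,801
2011-01-03,winter,1349
"""

# read_csv accepts a file path or a file-like object.
# With a real file this would be: pd.read_csv("bikes.csv")  (file name assumed)
bikes_df = pd.read_csv(io.StringIO(csv_text))

print(bikes_df)
```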

We used pd.read_csv here because we are working with a CSV file; an Excel file is read with pd.read_excel.

Loading the bikes dataset:
Let's take a look at the dataset:

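A quick sketch of previewing rows, using a small hypothetical stand-in for the bikes dataset (the column names and numbers are made up):

```python
import pandas as pd

# Hypothetical stand-in for the bikes dataset: 8 rows, 2 columns
bikes_df = pd.DataFrame({
    "day": list(range(1, 9)),
    "rentals": [985, 801, 1349, 1562, 1600, 1606, 1510, 959],
})

first_five = bikes_df.head()    # default: first 5 rows
first_three = bikes_df.head(3)  # first 3 rows

print(first_five)
print(first_three)
```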

.head() returns the first five rows of your DataFrame by default. You can also pass the number of rows you want, for example .head(10).

To get information about your data, run this command:

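A small sketch of info() on hypothetical data (the columns and values are assumptions, with one missing value so the non-null counts differ):

```python
import pandas as pd

# Hypothetical stand-in for the bikes dataset, with one missing temperature
bikes_df = pd.DataFrame({
    "season": ["winter", "winter", "spring"],
    "rentals": [985, 801, 1349],
    "temp": [0.34, None, 0.20],
})

# info() prints a summary (index range, column names, non-null counts, dtypes)
# and returns None, so call it for its printed output
bikes_df.info()
```

The non-null counts in this summary are a quick first hint about which columns have missing values.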

To see the shape of your dataset:

shape is an attribute (not a method, so it takes no parentheses) that lets you quickly see the number of rows and columns in your dataset. It returns a tuple of the form (rows, columns).

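A minimal sketch with made-up data:

```python
import pandas as pd

# Hypothetical stand-in for the bikes dataset: 3 rows, 2 columns
bikes_df = pd.DataFrame({
    "season": ["winter", "winter", "spring"],
    "rentals": [985, 801, 1349],
})

# shape is a (rows, columns) tuple, so it can be unpacked directly
rows, columns = bikes_df.shape

print(bikes_df.shape)  # (3, 2)
```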

Dropping Duplicates

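drop_duplicates() returns a copy of the DataFrame with duplicate rows removed. The original DataFrame is left unchanged unless you assign the result back to it.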
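A short sketch on hypothetical data in which the last row repeats the first:

```python
import pandas as pd

# Hypothetical data: row 3 is an exact copy of row 0
bikes_df = pd.DataFrame({
    "season": ["winter", "winter", "spring", "winter"],
    "rentals": [985, 801, 1349, 985],
})

# The first occurrence is kept and the later copy is dropped
deduped = bikes_df.drop_duplicates()

print(bikes_df.shape)  # (4, 2)
print(deduped.shape)   # (3, 2)
```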

Selecting Column:

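The original screenshot is not available, so this is only a sketch of one common approach: list the column names, tidy them up, then select a column by name. The messy column names here are invented for illustration.

```python
import pandas as pd

# Hypothetical data with untidy column names (extra spaces, mixed case)
bikes_df = pd.DataFrame({
    " Season ": ["winter", "spring"],
    "Total Rentals": [985, 1349],
})

print(bikes_df.columns.tolist())

# Normalize names: trim whitespace, lowercase, replace spaces with underscores
bikes_df.columns = (
    bikes_df.columns.str.strip().str.lower().str.replace(" ", "_")
)

print(bikes_df.columns.tolist())  # ['season', 'total_rentals']

# Select one column by name; the result is a Series
rentals = bikes_df["total_rentals"]
print(rentals)
```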

Using this method makes it simple to choose a column so that you can clean it up as needed, since some datasets have column names that contain symbols, mixed upper- and lowercase words, spaces, and typos.

Checking for Missing Values:

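A small sketch on hypothetical data with two missing cells. Chaining sum() onto the result is an extra step of mine that counts the missing values per column.

```python
import pandas as pd

# Hypothetical data: one missing season and one missing rental count
bikes_df = pd.DataFrame({
    "season": ["winter", None, "spring"],
    "rentals": [985, 801, None],
})

# Boolean DataFrame with the same shape: True where a value is missing
mask = bikes_df.isnull()
print(mask)

# True counts as 1, so summing gives the number of missing values per column
missing_per_column = bikes_df.isnull().sum()
print(missing_per_column)
```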

You'll probably come across missing or null values when analyzing data, which are simply placeholders for values that don't exist.
isnull() returns a DataFrame of the same shape in which each cell is True if the original value is null and False otherwise.

Removing Missing Values:


dropna() drops every row that contains at least one missing value. You can also drop columns that contain null values by setting axis=1.
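Both variants can be sketched on the same hypothetical data, where only the temp column has a missing value:

```python
import pandas as pd

# Hypothetical data: the middle row is missing its temperature
bikes_df = pd.DataFrame({
    "season": ["winter", "winter", "spring"],
    "rentals": [985, 801, 1349],
    "temp": [0.34, None, 0.20],
})

rows_dropped = bikes_df.dropna()        # drop rows containing any null
cols_dropped = bikes_df.dropna(axis=1)  # drop columns containing any null

print(rows_dropped.shape)  # (2, 3)
print(cols_dropped.shape)  # (3, 2)
```

Like drop_duplicates(), dropna() returns a new DataFrame and leaves the original untouched. Dropping a whole column for a single missing value throws away a lot of data, so it is worth checking how many values are missing first.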

There are many methods and functions not covered in this tutorial; it is only meant to introduce you to basic analysis in pandas.

The pandas methods we didn't cover here, such as nunique(), describe(), merge(), pivot(), unique(), and many others, will be covered in my next post.

Wrapping up

Data cleaning is one of the most important parts of analyzing data; it is often said to take up roughly 80% of an analyst's time.
The best way to improve is to work on more projects. You can also read more in the pandas documentation: https://pandas.pydata.org/docs/reference/general_functions.html.

Keep working!
