Jubaeir Islam

Let's use pandas effectively in our code

Analyzing large datasets calls for tools that are both fast and flexible. One of the most widely used is pandas, a Python library for data analysis and manipulation built around two data structures, the DataFrame and the Series. In this blog post, we will explore some of the most commonly used pandas functions and methods and how to use them effectively in data science work.

One of the most commonly used pandas functions is read_csv(), which reads a CSV file into a DataFrame. It is a quick and easy way to load data for analysis.

import pandas as pd
data = pd.read_csv('data.csv')
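
To try read_csv() without a file on disk, here is a minimal sketch that reads from an in-memory string instead. The CSV text and the column names are made up for illustration; usecols is a real read_csv() parameter that limits which columns are loaded.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for 'data.csv'
csv_text = "col1,col2,col3\n1,2,3\n4,5,6\n7,8,9\n"

# read_csv() accepts a file path or any file-like object
data = pd.read_csv(io.StringIO(csv_text))
print(data.shape)   # (3, 3)
print(data.dtypes)  # the type pandas inferred for each column

# Load only the columns you need
subset = pd.read_csv(io.StringIO(csv_text), usecols=["col1", "col3"])
print(list(subset.columns))  # ['col1', 'col3']
```

Loading only the needed columns can help keep memory use down on wide files.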

Another frequently used method is head(), which returns the first rows of a DataFrame (five by default). It is a quick way to inspect the structure and contents of a dataset right after loading it.

data.head()

Output:

   col1  col2  col3
0     1     2     3
1     4     5     6
2     7     8     9
3    10    11    12
4    13    14    15
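
head() also takes the number of rows to return, and tail() does the same from the end of the DataFrame. The sketch below builds a small hypothetical frame whose first five rows match the output above; the remaining rows are assumed for illustration.

```python
import pandas as pd

# Hypothetical ten-row frame; only the first five rows come from the output above
data = pd.DataFrame({
    "col1": list(range(1, 31, 3)),
    "col2": list(range(2, 32, 3)),
    "col3": list(range(3, 33, 3)),
})

first_five = data.head()    # default: first 5 rows
first_two = data.head(2)    # first 2 rows
last_three = data.tail(3)   # last 3 rows

print(first_two)
print(last_three)
```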

The describe() method returns a statistical summary (count, mean, standard deviation, minimum, quartiles, and maximum) of the numeric columns in a DataFrame.

data.describe()

Output:

           col1       col2       col3
count  10.00000  10.000000  10.000000
mean   17.50000  18.500000  19.500000
std    11.77439  11.774437  11.774437
min     1.00000   2.000000   3.000000
25%     9.25000  10.250000  11.250000
50%    17.50000  18.500000  19.500000
75%    25.75000  26.750000  27.750000
max    34.00000  35.000000  36.000000
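
By default describe() skips non-numeric columns. As a small sketch, assuming a hypothetical frame with one numeric and one text column, include="all" adds the text column to the summary as well:

```python
import pandas as pd

# Hypothetical data: one numeric column and one text column
data = pd.DataFrame({
    "price": [10.0, 12.5, 9.0, 11.5],
    "region": ["north", "south", "north", "east"],
})

# Numeric columns only (the default)
numeric_summary = data.describe()
print(numeric_summary)

# All columns: text columns get count, unique, top and freq
full_summary = data.describe(include="all")
print(full_summary)
```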

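pandas also provides methods for data cleaning and preparation, such as dropna() and fillna(). dropna() removes rows (or, with axis=1, columns) that contain missing values, while fillna() replaces missing values with a specific value; the related ffill() and bfill() methods fill gaps from neighboring rows instead. All of these return a new DataFrame and leave the original unchanged, so assign the result if you want to keep it.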

cleaned = data.dropna()        # drop every row that has a missing value
filled = data.fillna(value=0)  # replace missing values with 0
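
The following sketch shows a few common variations on a small hypothetical frame with two missing values. The data and variable names are assumptions for illustration; axis and subset are real dropna() parameters.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one gap in col1 and one in col2
data = pd.DataFrame({
    "col1": [1.0, np.nan, 7.0],
    "col2": [2.0, 5.0, np.nan],
    "col3": [3.0, 6.0, 9.0],
})

rows_dropped = data.dropna()                    # keep rows with no missing values
cols_dropped = data.dropna(axis="columns")      # keep columns with no missing values
subset_dropped = data.dropna(subset=["col1"])   # only look at col1 when dropping rows

zero_filled = data.fillna(0)                    # replace every gap with 0
mean_filled = data.fillna(data.mean())          # replace each gap with its column mean

# The original frame still has its two missing values
print(data.isna().sum().sum())  # 2
```

Filling with the column mean keeps every row, at the cost of pulling the filled values toward the average, so which option fits depends on the analysis.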

pandas also provides powerful methods for data manipulation and transformation, such as groupby() and pivot_table(). groupby() splits the rows into groups based on the values in one or more columns so that each group can be aggregated, while pivot_table() reshapes the data into a summary table.

data.groupby('grouping_column')['value_column'].mean()
data.pivot_table(values='value_column', index='grouping_column', aggfunc='mean')

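Let's break it down. The first line groups the dataset by a column and then calls mean() explicitly; groupby() on its own only defines the groups and does not aggregate anything. The second line computes the same kind of per-group average with pivot_table(), which groups rows by the index column and aggregates the values column. Its aggfunc argument defaults to 'mean', and the call can be easier to read when you need several aggregations at once or a second grouping dimension across the columns.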
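
To compare the two approaches side by side, here is a minimal sketch on a hypothetical four-row frame. The column names grouping_column and value_column are placeholders, not names from a real dataset.

```python
import pandas as pd

# Hypothetical data: two groups, two rows each
data = pd.DataFrame({
    "grouping_column": ["a", "b", "a", "b"],
    "value_column": [10, 20, 30, 40],
})

# groupby: select the value column, then aggregate (returns a Series)
grouped = data.groupby("grouping_column")["value_column"].mean()
print(grouped)

# pivot_table: the same per-group mean, returned as a DataFrame
pivoted = data.pivot_table(
    values="value_column", index="grouping_column", aggfunc="mean"
)
print(pivoted)

# Several aggregations in one call
summary = data.pivot_table(
    values="value_column", index="grouping_column", aggfunc=["mean", "sum"]
)
print(summary)
```

Both calls give a mean of 20.0 for group a and 30.0 for group b; the main difference is the shape of the result.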
