Category type in pandas

#pandas #dataframe #python

The Python pandas library supports a data type called Category. When working with pandas DataFrames, using Category can reduce memory use and speed up processing. Let's look at the Category data type.

What is Category data type in pandas?

Category is a data type for columns that hold a fixed, limited set of string values, such as:
- Months(Jan, Feb)
- Country Names(India, Singapore)
- Size(Small, Medium, Large)
Put simply, pandas stores each distinct string once and represents every row with an integer code that points to it (conceptually, Jan - 1, Feb - 2, and so on).
Categories are similar to enum types in other programming languages such as C/C++ and Java.
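
To make the integer mapping concrete, here is a minimal sketch using a small made-up Series (the month values are hypothetical sample data, not taken from the dataset used later). It relies on the `.cat.categories` and `.cat.codes` accessors. Note that the codes pandas actually assigns start at 0 and, for plain strings with no explicit category list, follow the sorted order of the labels.

```python
import pandas as pd

# Hypothetical sample data, only for illustration
months = pd.Series(["Jan", "Feb", "Jan", "Mar", "Feb"], dtype="category")

# Each distinct label is stored once; strings are sorted lexically by default
labels = list(months.cat.categories)

# Every row holds a small integer code that indexes into the labels
codes = list(months.cat.codes)

print(labels)  # ['Feb', 'Jan', 'Mar']
print(codes)   # [1, 0, 1, 2, 0]
```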

Advantages of using Category:

- Lower memory use, because each distinct string is stored only once
- Often faster processing, for example when grouping or sorting by the column

How to use Category in pandas dataframe:

- While reading the CSV file:

We can have pandas load columns as category instead of object by passing dtype to read_csv, like below

import pandas as pd

filename = "~/Downloads/US_Accidents_Dec20.csv"
# Converting into category data type while reading CSV file
us_accidents_dec20_cat = pd.read_csv(filename, dtype={'State': 'category', 'City': 'category'})
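
If you don't have the CSV file at hand, the same `dtype` argument can be tried on an in-memory CSV. This is only a sketch: the rows and the `Severity` column below are made up for illustration and are not taken from the real dataset.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for the real file
csv_text = (
    "State,City,Severity\n"
    "CA,Los Angeles,2\n"
    "TX,Austin,3\n"
    "CA,San Diego,2\n"
)

# Only the columns named in dtype become category; the rest are inferred as usual
df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"State": "category", "City": "category"},
)

print(df.dtypes)
```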

- Converting column into category type:

We can also convert an existing column after the data is loaded, like below

# Loading the CSV file into a DataFrame
import pandas as pd

filename = "~/Downloads/US_Accidents_Dec20.csv"
us_accidents_dec20 = pd.read_csv(filename)

# Normal column access
us_accidents_dec20['State']

# Converting to category data type (returns a new Series; the DataFrame itself is unchanged)
us_accidents_dec20['State'].astype('category')
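
One thing to keep in mind is that `astype` returns a new Series and does not modify the DataFrame. To keep the conversion, assign the result back to the column. A minimal sketch with a small made-up DataFrame (hypothetical values):

```python
import pandas as pd

# Hypothetical sample data, only for illustration
df = pd.DataFrame({"State": ["CA", "TX", "CA", "NY"]})

# astype() returns a converted copy; df["State"] keeps its original dtype
converted = df["State"].astype("category")
dtype_before = str(df["State"].dtype)

# Assign back to make the change stick
df["State"] = converted
dtype_after = str(df["State"].dtype)

print(dtype_before, "->", dtype_after)
```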

Memory comparison between Object vs Category data types:

Normal object column:

us_accidents_dec20['State'].memory_usage(deep=True) / 1e6

Result:
249.720047

Category column:

us_accidents_dec20['State'].astype('category').memory_usage(deep=True) / 1e6

Result:
4.23684

We can clearly see that memory use drops from about 249.7 MB to about 4.2 MB, roughly 59 times smaller.
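
The same comparison can be reproduced without downloading the dataset. The sketch below uses a synthetic column (five made-up labels repeated many times), so the absolute numbers will differ from the ones above; only the direction of the difference is expected to match.

```python
import pandas as pd

# Synthetic, hypothetical data: 5 distinct labels repeated 20,000 times each
states = pd.Series(["CA", "TX", "NY", "FL", "WA"] * 20_000, dtype=object)

# deep=True counts the Python string objects, not just the pointers to them
object_mb = states.memory_usage(deep=True) / 1e6
category_mb = states.astype("category").memory_usage(deep=True) / 1e6

print(f"object:   {object_mb:.3f} MB")
print(f"category: {category_mb:.3f} MB")
```

The saving depends on how often values repeat: a column where nearly every value is unique gains little or nothing from the conversion.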

Converting to the Category data type can noticeably improve processing speed and memory use on large datasets, especially when a column has few distinct values relative to its number of rows.

Happy Learning!!

P.S. The examples use US accidents data from December 2020 (US_Accidents_Dec20.csv), which you can get from Kaggle.
