DEV Community

Cover image for Python Pandas: Introduction to pandas (part 2)
justkmike
justkmike

Posted on

Python Pandas: Introduction to pandas (part 2)

In the previous article, we looked at what DataFrames and Series are in pandas. We also explored how to create a pandas DataFrame (referred to as df here) from a list and a dictionary. I hope you've also researched the key differences between a DataFrame and a Series in pandas. In this article, we will build on that knowledge.

To understand better, in data science, data can be sourced from various sources, and each source may store the data in different formats, such as comma-separated values (CSV), Excel sheets, SQL files, and more. It's up to you to know what tools to use. Now, let's dive into the available functions in pandas.

Reading and Writing CSV Files

Pandas provides read_csv() and to_csv() functions to work with CSV files.

Reading CSV Files

read_csv() is used to read data from CSV files. For example:

import pandas as pd

csvdf = pd.read_csv("file_path.csv")
Enter fullscreen mode Exit fullscreen mode

By default, read_csv() assumes the first row in your file is the header row. If this is not the case, you can use the header parameter to specify the row number to use as the header:

csvdf = pd.read_csv("file_path.csv", header=3)
Enter fullscreen mode Exit fullscreen mode

Note that all rows above the specified header will be ignored when creating the DataFrame.

Reading CSV Files with Multiple Headers

If your CSV file has multiple header rows, you can specify them using a list with the header parameter:

csvdf = pd.read_csv("file_path.csv", header=[0, 2])
Enter fullscreen mode Exit fullscreen mode
Reading CSV Files Without Headers

If your CSV file doesn't include headers, but you have the column names separately, you can use the names parameter to provide a list of column names:

csvdf = pd.read_csv("file_path.csv", names=["first_name", "last_name", "gender"])
Enter fullscreen mode Exit fullscreen mode

You can check the set headers using csvdf.columns.

Adding a Prefix to Column Names

If you have many columns and labeling them is cumbersome, you can add a prefix to the columns using the prefix parameter:

csvdf = pd.read_csv("file_path.csv", header=None, prefix="Col_")
Enter fullscreen mode Exit fullscreen mode

Setting the Index Column

By default, DataFrames are indexed from 0 to n-1. If you want to use a specific column as the index, you can do so with the index_col parameter:

csvdf = pd.read_csv("file_path.csv", index_col='company')
# Alternate approach using column index
csvdf = pd.read_csv("file_path.csv", index_col=0)
Enter fullscreen mode Exit fullscreen mode

If it's a multi-index DataFrame, use a list to indicate the index, similar to how we did for the headers.

Reading CSV Files with Defined Columns and Rows

If you don't need all the columns and rows in the provided file, you can filter them using the usecols and nrows parameters. For example:

csvdf = pd.read_csv("file_path.csv", index_col=0, usecols=['names', 'gender'], nrows=10)
Enter fullscreen mode Exit fullscreen mode

This means you'll get a DataFrame with only the 'names' and 'gender' columns and only 10 rows of the available rows.

Skipping Rows

If you need to skip specific rows, like those with even row numbers, you can use the skiprows functionality:

csvdf = pd.read_csv("file_path.csv", index_col=0, skiprows=lambda x: x % 2 == 0)
Enter fullscreen mode Exit fullscreen mode

This uses a lambda function to skip rows where the row number is even.

Writing to CSV

To save a DataFrame to a CSV file, use the to_csv() method. By default, it also includes the index in the file. You can disable this by setting the index parameter to `False:

python
csvdf.to_csv("file_name.csv", index=False)

Most of these functionalities ie read_json() and read_excel() are similar _(and I will not be handling that at least for now)_when writing to CSV as when reading from CSV. Note that this is not an exhaustive guide on read_csv, so I encourage you to explore further and practice to better understand. You can also find free courses on platforms like Simplilearn to boost your skills and resume.

Top comments (3)

Collapse
 
sreno77 profile image
Scott Reno

Good stuff! I've just started learning pandas myself. Please keep posting these and continue getting more in depth.

Collapse
 
justkmike profile image
justkmike

I will, and thank you

Collapse
 
aman-2018 profile image
Aman kumar

Interesting, please keep posting