DEV Community

John Enad
John Enad

Posted on

Day 39-40: Reading and Writing CSV Files in Python

For the past couple of days I focused on reading and writing Comma-Separated Values (CSV) files. I figured this would a very important skill to know as this text format is widely used and therefore mastery of CSV file-handling would be necessary in order to become a successful Python developer.

Python has a built-in csv module that provides the functionality needed for reading and writing CSV files.

For the examples I have a few rows of data in a CSV file called: employees.csv

EMPID,FNAME,LNAME,HIREDT,JOBID,SAL
1,Blaine,Calhoun,1998-03-02,55,8400
2,Mark,Farley,1996-08-03,55,8100
3,April,Chang,1985-05-23,12,10800
4,Anne,Guevara,1990-11-24,15,12000
5,Luka,Ryan,1995-04-09,20,8700
Enter fullscreen mode Exit fullscreen mode

Let's start with the open() method for reading CSV files.

import csv

with open('employees.csv', 'r') as file:
    reader = csv.reader(file)
    for row in reader:
        print(row)

Output:
['EMPID', 'FNAME', 'LNAME', 'HIREDT', 'JOBID', 'SAL']
['1', 'Blaine', 'Calhoun', '1998-03-02', '55', '8400']
['2', 'Mark', 'Farley', '1996-08-03', '55', '8100']
['3', 'April', 'Chang', '1985-05-23', '12', '10800']
['4', 'Anne', 'Guevara', '1990-11-24', '15', '12000']
['5', 'Luka', 'Ryan', '1995-04-09', '20', '8700']
Enter fullscreen mode Exit fullscreen mode

Here, we are using the open() function to open the CSV file in 'read' mode ('r'). Then, this file object is passed to the csv.reader(), which returns a reader object that we iterate over, printing each row. Each row is then returned as a list of string items.

While the csv.reader() function is pretty straightforward, it can be less readable when dealing with large files. For a more sophisticated approach, csv.DictReader() is something to consider. With this method, it reads the CSV file and maps the information into a dictionary, where the keys are the fieldnames (column names).

import csv

with open('employees.csv', 'r') as file:
    reader = csv.DictReader(file)
    for row in reader:
        print(row)

Output:
{'EMPID': '1', 'FNAME': 'Blaine', 'LNAME': 'Calhoun', 'HIREDT': '1998-03-02', 'JOBID': '55', 'SAL': '8400'}
{'EMPID': '2', 'FNAME': 'Mark', 'LNAME': 'Farley', 'HIREDT': '1996-08-03', 'JOBID': '55', 'SAL': '8100'}
{'EMPID': '3', 'FNAME': 'April', 'LNAME': 'Chang', 'HIREDT': '1985-05-23', 'JOBID': '12', 'SAL': '10800'}
{'EMPID': '4', 'FNAME': 'Anne', 'LNAME': 'Guevara', 'HIREDT': '1990-11-24', 'JOBID': '15', 'SAL': '12000'}
{'EMPID': '5', 'FNAME': 'Luka', 'LNAME': 'Ryan', 'HIREDT': '1995-04-09', 'JOBID': '20', 'SAL': '8700'}
Enter fullscreen mode Exit fullscreen mode

Here, each row is an OrderedDict and these rows can be accessed using the field names. This approach is considered to be more intuitive when dealing with complex datasets.

Just like when reading, the csv module that comes with Python allows for simple but powerful writing operations.

At it's most basic, csv.writer() can be used to write to CSV files.

import csv

data = [
    ['EMPID', 'FNAME', 'LNAME', 'HIREDT', 'JOBID', 'SAL'],
    [1, 'Blaine', 'Calhoun', '1998-03-02', 55, 8400],
    [2, 'Mark', 'Farley', '1996-08-03', 55, 8100]
]

with open('employees-out-1.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerows(data)

Output:
EMPID,FNAME,LNAME,HIREDT,JOBID,SAL
1,Blaine,Calhoun,1998-03-02,55,8400
2,Mark,Farley,1996-08-03,55,8100
Enter fullscreen mode Exit fullscreen mode

Here, we use the open() function to open a file in 'write' mode ('w'), then create a writer object using csv.writer(). We then use writer.writerows() to write the list of lists to the file. Each inner list is then written as a row in the CSV file.

For more complex datasets, csv.DictWriter() may come in handy because it allows writing dictionaries into a CSV file.

import csv

fieldnames = ['EMPID', 'FNAME', 'LNAME', 'HIREDT', 'JOBID', 'SAL']
data = [
    {'EMPID': 1, 'FNAME': 'Blaine', 'LNAME': 'Calhoun', 'HIREDT': '1998-03-02', 'JOBID': 55, 'SAL': 8400},
    {'EMPID':2, 'FNAME': 'Mark', 'LNAME': 'Farley', 'HIREDT': '1996-08-03', 'JOBID': 55, 'SAL': 8100}
]

with open('employees-out-2.csv', 'w', newline='') as file:
    writer = csv.DictWriter(file, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(data)

Output:
EMPID,FNAME,LNAME,HIREDT,JOBID,SAL
1,Blaine,Calhoun,1998-03-02,55,8400
2,Mark,Farley,1996-08-03,55,8100
Enter fullscreen mode Exit fullscreen mode

Each dictionary in this example represents a row in the CSV file.
The keys in the dictionary correspond to the fieldnames (column names) in the CSV file.

While Python's csv module can do the job, the Pandas library has a more powerful and user-friendly interface for reading and writing CSV files. Check out this example for reading:

import pandas as pd

df = pd.read_csv('employees.csv')
print(df)
df.to_csv('employees-out-3.csv', index=False)

Output:
EMPID,FNAME,LNAME,HIREDT,JOBID,SAL
1,Blaine,Calhoun,1998-03-02,55,8400
2,Mark,Farley,1996-08-03,55,8100
3,April,Chang,1985-05-23,12,10800
4,Anne,Guevara,1990-11-24,15,12000 
5,Luka,Ryan,1995-04-09,20,8700
Enter fullscreen mode Exit fullscreen mode

This loads the CSV file into a DataFrame, a powerful data structure that allows for efficient data manipulation and just writes to a employees-out-3.csv file again with Pandas which is equally as simple:

Pandas handles complex data types and missing data better and also offers advanced functionalities like selecting specific columns, skipping rows, reading chunks of the file, and many more. It is quite an awesome tool.

Mastery of these various ways of handling CSV files is quite useful in data processing, data analysis, and machine learning.

Top comments (0)