One of the most common things you will do in Python is working with CSV files. Whether it is to create a CSV file with data from an API or perform some analysis on an existing CSV, working with CSV files is an essential step in learning Python.
Opening Files in Python
When working with files, your first step is to open the file. In Python, this looks like this:
fh = open('somefile.csv')
The open function the file and return a file object, usually set to an fh
variable for "File Handler". By default, Python's open function will open in reading mode. You can change this with the mode
argument. If you wanted to open a file, delete everything in it, and then start writing to it, you would use:
fh = open('somefile.csv' mode='w')
Or, to open for reading and writing, you would use:
fh = open('somefile.csv' mode='r+')
Lastly, if you wanted to open the file to write to but keep the existing data, you use a
for "append" like this:
fh = open('somefile.csv' mode='a')
Once you open the file, you need to close it using the close()
method:
fh.close()
Instead of using the close()
method, you can also use the with
statement. Since I often forget to close the file (resulting in many issues), this is my preferred approach when I only need to open one file.
fh = open('somefile.csv' mode='w')
# Do stuff
fh.close()
is the same as
with open('somefile.csv' mode='w') as fh:
# Do stuff
Opening CSV Files in Python
While the above works with a lot of CSV files, there are two more arguments to the open()
function you may need to use.
Newlines
The first is the newlines
argument. Depending on how the CSV was created and where it may be used, the newlines may be saved a few different ways. So, we want to turn on "universal newlines mode" but return the actual unaltered line endings by setting the newline argument like this:
with open('somefile.csv', newline='') as fh:
# Do stuff
Encoding
Depending on the encoding of the file, you may also need to specify the encoding
argument. This would look like this:
with open('somefile.csv', newline='', encoding='utf-8') as fh:
# Do stuff
Reading and Writing Your First CSV File
To get started, let's do a simple exercise of writing a row to a new file and then reading that row.
To work with CSV files, we first need to import the csv
module.
import csv
Then, we will use it's writer()
method to create a writer object that will perform the writing for us. We will then use its writerow()
method to write a row in our CSV.
with open('example_one.csv', mode='w', newline='') as fh: # Open our file.
csv_writer = csv.writer(fh) # Create our writer object.
csv_writer.writerow(['Frank', 'Corso']) # Write our row of data.
Each item in the list given to writerow()
will be in a different column within the CSV.
Now that we have our CSV created, we can now read it to get the first row. This time, we will use csv's reader()
method to create a reader object which can be iterated over.
with open('example_one.csv', newline='') as fh: # Open our file.
csv_reader = csv.reader(fh) # Create our reader object.
for row in csv_reader: # Iterate over each row.
print(row) # Print entire row
print(row[0], row[1]) # Print only certain columns
['Frank', 'Corso']
Frank Corso
Each row
in this example would be one row within the CSV file with each item corresponding to a column in the row.
Reading and Writing CSV Files With Headers
Instead of using writer()
and reader()
, we can use DictReader()
and DictWriter()
when working with headers in our CSV file.
If we wanted to write the same data from above but set the columns as "FirstName" and "LastName", our code would look very similar. The main difference is that DictWriter()
requires fieldnames to know the names of the columns.
with open('example_two.csv', mode='w', newline='') as fh: # Open our file.
columns = ['FirstName', 'LastName'] # Define our columns.
csv_writer = csv.DictWriter(fh, fieldnames=columns) # Create our writer object.
csv_writer.writeheader() # Write our header row.
csv_writer.writerow({'FirstName': 'Frank', 'LastName': 'Corso'}) # Write our row of data.
If the CSV you are reading from has headers, you can use the DictReader()
method instead to not only skip the header row but also create dictionaries for the rows instead of just lists.
We will use similar code to before with only a slight change:
with open('example_two.csv', newline='') as fh:
csv_reader = csv.DictReader(fh) # Create our reader object, automatically using the first row as the column names.
for row in csv_reader: # Iterate over each row.
print(row) # Print entire row
print(row['FirstName'], row['LastName']) # Print only certain columns
{'FirstName': 'Frank', 'LastName': 'Corso'}
Frank Corso
Next Steps
Now that you can read and write CSV files, what will you do with them? You can:
- download data from an API and save it to a CSV
- clean data from a raw CSV and create a new "processed" CSV
- analyze data from a CSV
- and much more!
If you are analyzing data from CSV's, your next step may be learning pandas, the data analysis module, which can load CSV files and Excel files.
Top comments (0)