DEV Community

Main
Main

Posted on • Originally published at pynerds.com on

csv module in Python

The csvmodule provide tools for working with and manipulating csv(comma separated values) files.

csv files are used to store tabular data in plain text formats. csv data consists of records and each record is made up of fields. Normally, a line in a csv file represents a single record while fields in a record are separated by commas.

The following is an example of csv data.


first_name,last_name,age,gender
John,Main,30,male
Jane,Doe,24,female
Morrison,Smith,22,male
Brian,Gates,32,male
Mary,Reagan,25,female
Enter fullscreen mode Exit fullscreen mode

Typically, files with a .csvextension are normally associated with csv data.

Reading csv data

To read data from a csv file, use the reader()function. The function has the following basic syntax:


csv.reader(iterable)
Enter fullscreen mode Exit fullscreen mode

The reader()function takes any iterable that yields a line during each iteration. This can be a file object, a list or any other iterable made of lines. It returns an iterator of lists, where each list represents a record in the csv data.

Consider if the previous csv data exists in a file called demo.csv.


import csv

with open('demo.csv') as file:
  reader = csv.reader(file)

  for row in reader:
     print(row)
Enter fullscreen mode Exit fullscreen mode

['first_name', 'last_name', 'age', 'gender']

['John', 'Main', '30', 'male']

['Jane', 'Doe', '24', 'female']

['Morrison', 'Smith', '22', 'male']

['Brian', 'Gates', '32', 'male']

['Mary', 'Reagan', '25', 'female']

In the above example, we used data stored in a file object but as earlier mentioned, you can literally use any iterable. In the following example, we use a list instead of a file object.


import csv

data = """first_name,last_name,age,gender
John,Main,30,male
Jane,Doe,24,female
Morrison,Smith,22,male
Brian,Gates,32,male
Mary,Reagan,25,female"""

reader = csv.reader(data.splitlines())
for row in reader:
  print(row)
Enter fullscreen mode Exit fullscreen mode

['first_name', 'last_name', 'age', 'gender']

['John', 'Main', '30', 'male']

['Jane', 'Doe', '24', 'female']

['Morrison', 'Smith', '22', 'male']

['Brian', 'Gates', '32', 'male']

['Mary', 'Reagan', '25', 'female']

Writing csv data

The writer() function creates an object for writing csv data. It has the following basic syntax:


csv.writer(fileobj)
Enter fullscreen mode Exit fullscreen mode

Where the fileobjis the opened file where the csv data will be written to.

After creating the writer object, we can then usewriterow() method to write a single record into the file or writerows() to write multiple rows at once.


import csv

with open('demo.csv', 'a') as file:
  writer = csv.writer(file)

  writer.writerow(('Ruth', 'Miller', 24, 'female'))
  writer.writerow(('Brandy', 'Jones', 22, 'male'))
Enter fullscreen mode Exit fullscreen mode

Note that in the above example, we opened the demo.csv file with the "a" mode. This ensures that the new rows are appended to the file, if we had used the "w" mode instead, the existing rows would have been overwritten.


import csv

with open('demo.csv') as file:
  print(file.read())
Enter fullscreen mode Exit fullscreen mode

first_name,last_name,age,gender

John,Main,30,male

Jane,Doe,24,female

Morrison,Smith,22,male

Brian,Gates,32,male

Mary,Reagan,25,female

Ruth,Miller,24,female

Brandy,Jones,22,male

As you can see above, the new records have been added successfully.

To add multiple rows at once, use the writer.writerows() method.


import csv

rows = [
   ('Stephen', 'King', 32, 'male'),
   ('Molly', 'Jones', 22, 'female')
  ]

with open('demo.csv', 'a') as file:
  writer = csv.writer(file)
  writer.writerows(rows)
Enter fullscreen mode Exit fullscreen mode

Quoting behavior

In cases where you want records of certain types to be stored in quotes, you can specify the quotingflag when creating the writer object. For example, if you want string items to be in quotes but numeric values not to be, you can specify the quoting parameter as csv.QUOTE_NONNUMERIC.


import csv

rows = [
   ('Stephen', 'King', 32, 'male'),
   ('Molly', 'Jones', 22, 'female')
  ]

with open('demo.csv', 'w') as file:
  writer = csv.writer(file, quoting = csv.QUOTE_NONNUMERIC)
  writer.writerows(rows)
Enter fullscreen mode Exit fullscreen mode

If you open thedemo.csvfile, you will see that in the written records , string fields are quoted.


import csv

with open('demo.csv') as file:
  print(file.read())
Enter fullscreen mode Exit fullscreen mode

"Stephen","King",32,"male"

"Molly","Jones",22,"female"

All the quoting options are as shown in the following table:

| QUOTE_ALL | Quote any field regardless of type |
| QUOTE_NONE | Don't quote any field. |
| QUOTE_MINIMAL | Quote only fields containing some special characters. This is the default quoting behavior. |
| QUOTE_NONNUMERIC | Quote all fields except those with numeric values like integers and floats. |

csv dialects

Separating csv fields with a comma is just the popular convention, there is really no well defined standard on csv format. For example, you could as well separate the fields with spaces instead of commas. This means that any csv parser needs to be very flexible when dealing with csv delimiters.

Dialects allow us to specify to the parser the delimiter and other important parameters that will be used when parsing the csv files. The csv.list_dialects() function returns a list of the registered dialects.


import csv

print(csv.list_dialects())
Enter fullscreen mode Exit fullscreen mode

['excel', 'excel-tab', 'unix']

The default dialect is excel, in which the fields are delimited with commas.

If for example, you have csv data in which the fields are delimited by a pipe character"|" or any other character instead of a comma, as shown below:


first_name|last_name|age|gender
John|Main|30|male
Jane|Doe|24|female
Morrison|Smith|22|male
Brian|Gates|32|male
Mary|Reagan|25|female
Enter fullscreen mode Exit fullscreen mode

You can register a new dialect using thecsv.register_dialect(). You can then use the registered dialect with readerand writerobjects.


import csv

data = """first_name|last_name|age|gender
John|Main|30|male
Jane|Doe|24|female
Morrison|Smith|22|male
Brian|Gates|32|male
Mary|Reagan|25|female"""

csv.register_dialect('pipe', delimiter = '|')
print("Dialects: ", csv.list_dialects())

reader = csv.reader(data.splitlines(), dialect = 'pipe') #use the dialect

for row in reader:
  print(row)

Enter fullscreen mode Exit fullscreen mode

Apart from delimiter, you can specify other dialect parameters depending on the nature of your csv data. The supported parameters are shown in the following tables:

parameter default value description
delimiter , Field separator.
quotechar " The character to enclose fields where quoting is allowed.
lineterminator \r \n The string/character used to terminate a line
doublequote True Whether quotchar parameters are doubled.
escapechar None Character used to indicate an escape sequence
skipintialspace False Ignore whitespace after the delimiter

Automatic dialect detection

In cases where the csv format is not well known, we can use the csv.Snifferclass, which automatically constructs a dialect object given a sample of csv data.

Consider if we have the following csv data.


​first_name&last_name&age&gender
John&Main&30&male
Jane&Doe&24&female
Morrison&Smith&22&male
Brian&Gates&32&male
Mary&Reagan&25&female
Enter fullscreen mode Exit fullscreen mode

In the above example, we used the "&"character as the delimiter.


import csv

data="""first_name&last_name&age&gender
John&Main&30&male
Jane&Doe&24&female
Morrison&Smith&22&male
Brian&Gates&32&male
Mary&Reagan&25&female"""

sniffer=csv.Sniffer()
dialect=sniffer.sniff(sample=data)

print("delimiter: ", dialect.delimiter)

reader = csv.reader(data.splitlines(), dialect = dialect)
for row in reader:
  print(row)
Enter fullscreen mode Exit fullscreen mode

As you can see, the sniffer correctly detected that "&" is used as the delimiter character. However, you should keep in mind that the sniffer is not always correct, it can only be said to be making educated guesses.

Using named fields

You can usecsv.DictReader and csv.DictWriterclasses to use named fields when reading or writing csv data.

The two classes allows you to specify in advance a list of field names to alias the fields in the csv data.

csv.DictReader

The DictReaderclass allows you to read rows with the fields having an alias name.

Consider the following example:


import csv

data = """first_name,last_name,age,gender
John,Main,30,male
Jane,Doe,24,female
Morrison,Smith,22,male
Brian,Gates,32,male
Mary,Reagan,25,female"""

fields = 'fname', 'lname', 'age', 'gender'

reader = csv.DictReader(data.splitlines(), fieldnames = fields)
for row in reader:
  print(row['fname'], row['lname'])
Enter fullscreen mode Exit fullscreen mode

Iterating through the DictReaderobject will yield a list of dictionaries, where each dictionary represents a row in the csv data.

csv.DictWriter

DictWriter objects writes data into a csv file with argument given as dictionaries. Consider the following example:


import csv

fields = 'fname', 'lname', 'age', 'gender'
data_to_wite = [
    {'fname':'Stephen', 'lname':'King', 'age':32, 'gender':'male'},
    {'age':22, 'lname':'Jones', 'fname':'Molly', 'gender':'female'},
]

with open('pynerds.txt', 'w') as file:
  writer = csv.DictWriter(file, fieldnames = fields)
  writer.writerows(data_to_wite)
Enter fullscreen mode Exit fullscreen mode

As shown above, when using named fields, the order is not important as long as the field names are correct.


import csv

with open('demo.csv') as file:
  print(file.read())
Enter fullscreen mode Exit fullscreen mode

Stephen,King,32,male

Molly,Jones,22,female

Related articles


Top comments (0)