Dumping Data with Python's CSV DictWriter

#python #csv

I really love Python's csv module. But I do wish it was a little better documented.

The DictWriter lets you write CSV files very neatly and semantically by defining each row as a Python dict.

Here's the example right out of the docs:

import csv

with open('names.csv', 'w', newline='') as csvfile:
    fieldnames = ['first_name', 'last_name']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)

    writer.writeheader()
    writer.writerow({'first_name': 'Baked', 'last_name': 'Beans'})
    writer.writerow({'first_name': 'Lovely', 'last_name': 'Spam'})
    writer.writerow({'first_name': 'Wonderful', 'last_name': 'Spam'})

Alas, it overlooks what I would consider a (if not the) standard use case, the one I come across all the time: writing the rows out in a loop rather than literally, one by one. More like:

import csv

data = [('Baked', 'Beans'),
        ('Lovely', 'Spam'),
        ('Wonderful', 'Spam'),
       ]

with open('names.csv', 'w', newline='') as csvfile:
    fieldnames = ['first_name', 'last_name']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)

    writer.writeheader()
    for datum in data:
        writer.writerow({'first_name': datum[0], 
                         'last_name': datum[1]})

Of course:

The source won't generally be a list of tuples, but rather an iterator of objects, and I'm dumping some of their properties.
I will be working with a much longer list of field names.

This raises the spectre of repeating the same list of field names, which is most definitely not DRY, and is undesirable and hard to maintain as I tweak the list of fields I want to dump to CSV.

And yet I find no documentation that gets around that. So, having just nutted out a solution and tested it, it's worth putting down in a document (right here and now).

The Problem

The problem in a nutshell is that csv.DictWriter demands to know the fieldnames up front (and writer.writeheader() needs them too), but they are only specified where the dictionary is built, inside the loop. fieldnames is not even an optional argument to csv.DictWriter, and the writer is poorly documented.
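To see that fieldnames really is required, here is a small check. It is only a sketch: the in-memory buffer is a stand-in for an open file, and the variable names are mine.

```python
import csv
import io

buffer = io.StringIO()  # in-memory stand-in for an open CSV file

try:
    csv.DictWriter(buffer)  # fieldnames left out entirely
    omitted_ok = True
except TypeError:
    # fieldnames is a required argument, so omitting it fails
    omitted_ok = False

print(omitted_ok)
```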

The Solution

The solution rests on two empirically determined (for lack of documentation) facts:

csv.DictWriter accepts fieldnames=None
the writer it returns has a fieldnames attribute that can be set post creation.
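Both facts can be checked in a few lines. This is a minimal sketch, assuming an in-memory io.StringIO buffer in place of a real file:

```python
import csv
import io

buffer = io.StringIO()  # stand-in for an open CSV file

# Fact 1: the writer can be created with fieldnames=None
writer = csv.DictWriter(buffer, fieldnames=None)

# Fact 2: fieldnames can be assigned after creation
writer.fieldnames = ['first_name', 'last_name']

writer.writeheader()
writer.writerow({'first_name': 'Baked', 'last_name': 'Beans'})

output = buffer.getvalue()
print(output)
```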

To wit, this works beautifully:

import csv

data = [('Baked', 'Beans'),
        ('Lovely', 'Spam'),
        ('Wonderful', 'Spam'),
       ]

with open('names.csv', 'w', newline='') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=None)

    for i, datum in enumerate(data):
        row = {'first_name': datum[0], 
               'last_name': datum[1]}

        if i == 0:
            # Take the field names from the first row, then write the header
            writer.fieldnames = list(row)
            writer.writeheader()

        writer.writerow(row)

And now the row keys are not doubly specified. And the CSV file receives its header row.

It's a simple paradigm once discovered, and I am using it to rapidly dump CSV files describing objects, mostly for testing and study purposes. It means I can play with the row definition in situ, adding and changing fields, without having to change them in two places, and I still get the benefit of a DictWriter and its simple syntax for writing CSV files.
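For the case I actually care about, an iterator of objects with some properties to dump, the same idea might look something like this. It is a sketch only: the Person class and the dump_csv helper are hypothetical names made up for illustration, and the buffer stands in for a real file.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Person:  # hypothetical class, for illustration only
    first_name: str
    last_name: str

    @property
    def full_name(self):
        return f'{self.first_name} {self.last_name}'


def dump_csv(rows, csvfile):
    """Write an iterable of dicts, taking the header from the first row."""
    writer = csv.DictWriter(csvfile, fieldnames=None)
    for i, row in enumerate(rows):
        if i == 0:
            writer.fieldnames = list(row)
            writer.writeheader()
        writer.writerow(row)


people = iter([Person('Baked', 'Beans'), Person('Lovely', 'Spam')])

# The row definition lives in exactly one place
rows = ({'first_name': p.first_name,
         'last_name': p.last_name,
         'full_name': p.full_name} for p in people)

buffer = io.StringIO()  # stand-in for open('names.csv', 'w', newline='')
dump_csv(rows, buffer)
output = buffer.getvalue()
print(output)
```

One caveat: the header comes from the first row only, so every later row should carry the same keys. With DictWriter's default extrasaction='raise', a row with a key not in fieldnames raises a ValueError, while a missing key is filled with restval (an empty string by default).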