loading...
Cover image for Parsing CSV Files in Node.js with fs.createReadStream() and csv-parser

Parsing CSV Files in Node.js with fs.createReadStream() and csv-parser

isalevine profile image Isa Levine ・5 min read

Junior JavaScript Jobhunting: Quick Tips for Technicals and Takehomes (5 Part Series)

1) Learning JavaScript Testing Quickly with Mocha, Chai, and Sinon (and a Lot of Gaps) 2) Passing Command-Line Arguments in Node.js 3) Three Ways to Retrieve JSON from the Web using Node.js 4) A Quick Review of JavaScript’s Array.prototype.sort() Method 5) Parsing CSV Files in Node.js with fs.createReadStream() and csv-parser

Cover image credit: Hunter x Hunter manga by Yoshihiro Togashi, meme-ified by yours truly. <3

Ah, good ol' CSV files: spreadsheet/database data rendered as pure text. Beautiful!

Parsing CSV (Comma-Separated Value) files involves reading a sequence of records row-by-row. Typically (but not necessarily), the first row is interpreted as the header for each column. All rows after the first are considered individual records, with each comma-separated-value belonging to a single column. Because of this, each row of a CSV must have the exact same number of values as each other row. (OK, I'm realizing that writing about CSVs is very tricky--see below for a visualization!)

The Challenge

In a recent takehome challenge, I was asked to parse a CSV of purchases that included:

  • date (as a String)
  • item (as an Integer) -- for this example, we'll use candy!
  • price (as a Float)
  • name (as a String)

Here's the example code we'll be working with (and a reveal about my recent TV-watching):

Raw CSV:

PURCHASE DATE,CANDY PURCHASED,CASH PAID,BUYER NAME
2016-04-03,1000,10000.11,Charlie Kelly
2017-11-14,1000,12000.22,Frank Reynolds
2018-01-20,2000,40000.33,Frank Reynolds
2018-03-20,2000,40000.44,Mac
2019-01-02,2000,50000.55,Sweet Dee
2019-01-02,1500,13500.66,Dennis Reynolds

More readable formatting:

PURCHASE DATE, | CANDY PURCHASED, | CASH PAID, | BUYER NAME
---------------|------------------|------------|-----------------
2016-04-03,    | 1000,            | 10000.11,  | Charlie Kelly
2017-11-14,    | 1000,            | 12000.22,  | Frank Reynolds
2018-01-20,    | 2000,            | 40000.33,  | Frank Reynolds
2018-03-20,    | 2000,            | 40000.44,  | Mac
2019-01-02,    | 2000,            | 50000.55,  | Sweet Dee
2019-01-02,    | 1500,            | 13500.66,  | Dennis Reynolds

Our Tools

We'll use two particular Node.js tools to accomplish our goal:

  1. The Node.js fs (file system) module, specifically the fs.createReadStream() method
  2. The npm package, csv-parser, which will convert our CSV into JSON

Since the fs module is native to Node.js, no external packages are needed. For our csv-parser npm package, go ahead and install it by running $ npm install csv-parser in your terminal.

Let's start off our app.js code by importing both with require:

// app.js

const csvParser = require('csv-parser');
const fs = require('fs');

Importing csv-parser this way will allow us to call csvParser() as a method, which will allow us to pipe in its data row-by-row. Now we're ready to dive in!

Reading the CSV File

gif of anime hand turning pages on a black book with a pentagram on the cover
Let's read some external data!

The CSV data above is currently stored in example_data.csv, located in the same directory as app.js. Let's create a variable that holds the filepath:

// app.js

const filepath = './example_data.csv'

Now let's put that fs.createReadStream() method to use:

fs.createReadStream(filepath)
    .on('error', () => {
        // handle error
    })

    .pipe(csvParser())
    .on('data', (row) => {
        // use row data
    })

    .on('end', () => {
        // handle end of CSV
    })

Let's break down each step:

  • fs.createReadStream(filepath) initializes the fs.ReadStream object -- since this is under-the-hood, we won't interact with fs.ReadStream explicitly.
  • .on('error', () => {} ) checks for errors with the given filepath before we start trying to pipe in its data.
  • .pipe(csvParser()) begins to pipe data into our ReadStream, which is now listening for the next two events:
    • .on('data', (row) => {} ) returns each line of the CSV row-by-row, accessible in its callback as row.
    • .on('end', () => {} ) listens for the end of the CSV. You can use this event to call methods you need after the entire file is read.

csv-parser converts CSV data to JSON

Let's take a look at how csv-parser is converting each row into usable JSON data.

Remember, the first line of a CSV is recorded as the column-headers! In our JSON, this will show up as the keys of each row Object, with their values being the data in their respective fields. This will make accessing the data in each row much simpler.

Here's our complete code:

const csv = require('csv-parser');
const fs = require('fs');

const filepath = "./example_data.csv"

fs.createReadStream(filepath)
    .on('error', () => {
        // handle error
    })

    .pipe(csv())
    .on('data', (row) => {
        console.log(row);
    })

    .on('end', () => {
        // handle end of CSV
    })

Running $ node app.js outputs:

// $ node app.js

Row {
'PURCHASE DATE': '2016-04-03',
'CANDY PURCHASED': '1000',
'CASH PAID': '10000.11',
'BUYER NAME': 'Charlie Kelly' }
Row {
'PURCHASE DATE': '2017-11-14',
'CANDY PURCHASED': '1000',
'CASH PAID': '12000.22',
'BUYER NAME': 'Frank Reynolds' }
Row {
'PURCHASE DATE': '2018-01-20',
'CANDY PURCHASED': '2000',
'CASH PAID': '40000.33',
'BUYER NAME': 'Frank Reynolds' }
Row {
'PURCHASE DATE': '2018-03-20',
'CANDY PURCHASED': '2000',
'CASH PAID': '40000.44',
'BUYER NAME': 'Mac' }
Row {
'PURCHASE DATE': '2019-01-02',
'CANDY PURCHASED': '2000',
'CASH PAID': '50000.55',
'BUYER NAME': 'Sweet Dee' }
Row {
'PURCHASE DATE': '2019-01-02',
'CANDY PURCHASED': '1500',
'CASH PAID': '13500.66',
'BUYER NAME': 'Dennis Reynolds' }

A couple things to note:

  • csv-parser does some pretty-printing for its console.log()-ing of the row-JSON. Keep this in mind if you have a challenge where the console output must match a specific format!
  • The keys from the first row have been stored as Strings, so in order to access each value, we must use the bracket notation: row["PURCHASE DATE"]. If the titles in the first row are single words with no special characters or spaces (i.e. if we had DATE instead of PURCHASE DATE in the first row), they can be accessed with dot notation: row.DATE

Using the row data

Let's revise our .on('data', (row) => {} ) callback to use the row's data, and print out a sentence describing each candy-purchase:

    .pipe(csv())
    .on('data', (row) => {
        let str = `${row["BUYER NAME"]} bought ${row["CANDY PURCHASED"]} pieces of candy on ${row["PURCHASE DATE"]} and paid $${row["CASH PAID"]}.`;
        console.log(str)
    })

Of course, you could use destructuring on each JSON-row to avoid writing out row["PURCHASE DATE"] and the other verbose key names inside the string...but I'll save that for another article. ;)

Running $ node app.js now outputs:

// $ node app.js

Charlie Kelly bought 1000 pieces of candy on 2016-04-03 and paid $10000.11.
Frank Reynolds bought 1000 pieces of candy on 2017-11-14 and paid $12000.22.
Frank Reynolds bought 2000 pieces of candy on 2018-01-20 and paid $40000.33.
Mac bought 2000 pieces of candy on 2018-03-20 and paid $40000.44.
Sweet Dee bought 2000 pieces of candy on 2019-01-02 and paid $50000.55.
Dennis Reynolds bought 1500 pieces of candy on 2019-01-02 and paid $13500.66.

Voila! CSV parsed!

Conclusion

Working with CSVs in Node.js is very easy and straightforward with the csv-parser package. However, there are plenty of other ways to approach this, including numerous other npm package options (surprise surprise), as well as purely using the fs module.

This particular code snippet from Dev.to user ItsASine does a great job of illustrating how to use fs only -- long story short, use fs.createReadStream() to return a Promise that resolves to a utf-8 encoded String. However, you MUST handle the first row headers and newline characters manually!

Have any other strategies to recommend? Please feel free to share and comment below!

Junior JavaScript Jobhunting: Quick Tips for Technicals and Takehomes (5 Part Series)

1) Learning JavaScript Testing Quickly with Mocha, Chai, and Sinon (and a Lot of Gaps) 2) Passing Command-Line Arguments in Node.js 3) Three Ways to Retrieve JSON from the Web using Node.js 4) A Quick Review of JavaScript’s Array.prototype.sort() Method 5) Parsing CSV Files in Node.js with fs.createReadStream() and csv-parser

Posted on by:

isalevine profile

Isa Levine

@isalevine

Isa (ee-suh). She/her pronouns. Full stack developer working with Rails and Vue. Drinks too much bubbly water.

Discussion

markdown guide
 

I'm running this. Is there something weird with Dates and CSV, for some reason when I try to console.log my statements, Date reads as undefined and the others read fine.

 

Hi Willina! As-is, the dates should just be processed as strings, so if the other string is printing fine, the date should too...

Just to test, I created a new directory, did a local install of csv-parser, and here's the code I'm running (copied verbatim):

// app.js

const csv = require('csv-parser');
const fs = require('fs');

const filepath = "./example_data.csv"

fs.createReadStream(filepath)
  .on('error', () => {
    // handle error
  })

  .pipe(csv())
  .on('data', (row) => {
    let str = `${row["BUYER NAME"]} bought ${row["CANDY PURCHASED"]} pieces of candy on ${row["PURCHASE DATE"]} and paid $${row["CASH PAID"]}.`;
    console.log(str)
  })

  .on('end', () => {
    // handle end of CSV
  })



// example_data.csv

PURCHASE DATE,CANDY PURCHASED,CASH PAID,BUYER NAME
2016-04-03,1000,10000.11,Charlie Kelly
2017-11-14,1000,12000.22,Frank Reynolds
2018-01-20,2000,40000.33,Frank Reynolds
2018-03-20,2000,40000.44,Mac
2019-01-02,2000,50000.55,Sweet Dee
2019-01-02,1500,13500.66,Dennis Reynolds



// console

Isas-MacBook-Pro:node-csv-parser isalevine$ node app.js 
Charlie Kelly bought 1000 pieces of candy on 2016-04-03 and paid $10000.11.
Frank Reynolds bought 1000 pieces of candy on 2017-11-14 and paid $12000.22.
Frank Reynolds bought 2000 pieces of candy on 2018-01-20 and paid $40000.33.
Mac bought 2000 pieces of candy on 2018-03-20 and paid $40000.44.
Sweet Dee bought 2000 pieces of candy on 2019-01-02 and paid $50000.55.
Dennis Reynolds bought 1500 pieces of candy on 2019-01-02 and paid $13500.66.

Any discrepancies between the code above and what you're running? Feel free to reply with the code itself and I can take a look :)

 

I'm sorry, I was running this with my own data, everything shows up except the date which is formatted like this: 2017:12:16 12:45. The header shows up, but when I try to access the rows, I get undefined. Could the formatting be the issue?