Cover image credit: Hunter x Hunter manga by Yoshihiro Togashi, meme-ified by yours truly. <3
Ah, good ol' CSV files: spreadsheet/database data rendered as pure text. Beautiful!
Parsing CSV (Comma-Separated Values) files means reading a sequence of records row by row. Typically (but not necessarily), the first row is interpreted as the header, naming each column. Every row after the first is an individual record, with each comma-separated value belonging to a single column. Because of this, every row of a CSV must contain exactly the same number of values as every other row. (OK, I'm realizing that writing about CSVs is very tricky -- see below for a visualization!)
The Challenge
In a recent takehome challenge, I was asked to parse a CSV of purchases that included:
- date (as a String)
- item (as an Integer) -- for this example, we'll use candy!
- price (as a Float)
- name (as a String)
Here's the example code we'll be working with (and a reveal about my recent TV-watching):
Raw CSV:
PURCHASE DATE,CANDY PURCHASED,CASH PAID,BUYER NAME
2016-04-03,1000,10000.11,Charlie Kelly
2017-11-14,1000,12000.22,Frank Reynolds
2018-01-20,2000,40000.33,Frank Reynolds
2018-03-20,2000,40000.44,Mac
2019-01-02,2000,50000.55,Sweet Dee
2019-01-02,1500,13500.66,Dennis Reynolds
More readable formatting:
PURCHASE DATE, | CANDY PURCHASED, | CASH PAID, | BUYER NAME
---------------|------------------|------------|-----------------
2016-04-03, | 1000, | 10000.11, | Charlie Kelly
2017-11-14, | 1000, | 12000.22, | Frank Reynolds
2018-01-20, | 2000, | 40000.33, | Frank Reynolds
2018-03-20, | 2000, | 40000.44, | Mac
2019-01-02, | 2000, | 50000.55, | Sweet Dee
2019-01-02, | 1500, | 13500.66, | Dennis Reynolds
Our Tools
We'll use two particular Node.js tools to accomplish our goal:
- The Node.js fs (file system) module, specifically the fs.createReadStream() method
- The npm package csv-parser, which will convert our CSV data into JSON
Since the fs module is native to Node.js, no external packages are needed. For our csv-parser npm package, go ahead and install it by running $ npm install csv-parser in your terminal.
Let's start off our app.js code by importing both with require:
// app.js
const csvParser = require('csv-parser');
const fs = require('fs');
Importing csv-parser this way lets us call csvParser() as a function, which we can then pipe data into row-by-row. Now we're ready to dive in!
Reading the CSV File
Let's read some external data!
The CSV data above is currently stored in example_data.csv, located in the same directory as app.js. Let's create a variable that holds the filepath:
// app.js
const filepath = './example_data.csv'
Now let's put that fs.createReadStream() method to use:
fs.createReadStream(filepath)
.on('error', () => {
// handle error
})
.pipe(csvParser())
.on('data', (row) => {
// use row data
})
.on('end', () => {
// handle end of CSV
})
Let's break down each step:
- fs.createReadStream(filepath) initializes the fs.ReadStream object -- since this happens under the hood, we won't interact with fs.ReadStream explicitly.
- .on('error', () => {}) checks for errors with the given filepath before we start trying to pipe in its data.
- .pipe(csvParser()) pipes the stream's data into csvParser(), which is now listening for the next two events:
- .on('data', (row) => {}) fires for each line of the CSV, accessible in its callback as row.
- .on('end', () => {}) listens for the end of the CSV. You can use this event to call any methods you need after the entire file has been read.
csv-parser converts CSV data to JSON
Let's take a look at how csv-parser converts each row into usable JSON data.
Remember, the first line of a CSV is recorded as the column headers! In our JSON, these will show up as the keys of each row object, with their values being the data in their respective fields. This makes accessing the data in each row much simpler.
Here's our complete code:
const csv = require('csv-parser');
const fs = require('fs');
const filepath = "./example_data.csv"
fs.createReadStream(filepath)
.on('error', () => {
// handle error
})
.pipe(csv())
.on('data', (row) => {
console.log(row);
})
.on('end', () => {
// handle end of CSV
})
Running $ node app.js outputs:
// $ node app.js
Row {
'PURCHASE DATE': '2016-04-03',
'CANDY PURCHASED': '1000',
'CASH PAID': '10000.11',
'BUYER NAME': 'Charlie Kelly' }
Row {
'PURCHASE DATE': '2017-11-14',
'CANDY PURCHASED': '1000',
'CASH PAID': '12000.22',
'BUYER NAME': 'Frank Reynolds' }
Row {
'PURCHASE DATE': '2018-01-20',
'CANDY PURCHASED': '2000',
'CASH PAID': '40000.33',
'BUYER NAME': 'Frank Reynolds' }
Row {
'PURCHASE DATE': '2018-03-20',
'CANDY PURCHASED': '2000',
'CASH PAID': '40000.44',
'BUYER NAME': 'Mac' }
Row {
'PURCHASE DATE': '2019-01-02',
'CANDY PURCHASED': '2000',
'CASH PAID': '50000.55',
'BUYER NAME': 'Sweet Dee' }
Row {
'PURCHASE DATE': '2019-01-02',
'CANDY PURCHASED': '1500',
'CASH PAID': '13500.66',
'BUYER NAME': 'Dennis Reynolds' }
A couple things to note:
- csv-parser does some pretty-printing when console.log()-ing the row JSON. Keep this in mind if you have a challenge where the console output must match a specific format!
- The keys from the first row are stored as Strings, so to access each value we must use bracket notation: row["PURCHASE DATE"]. If the titles in the first row were single words with no special characters or spaces (i.e. if we had DATE instead of PURCHASE DATE in the first row), they could be accessed with dot notation: row.DATE
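Note too that every value -- even the candy counts and prices -- arrives as a String, so convert before doing arithmetic. A quick sketch using one row shaped like csv-parser's output (the sample values are taken from the data above):

```javascript
// A row object shaped like csv-parser's output for our first record
const row = {
  'PURCHASE DATE': '2016-04-03',
  'CANDY PURCHASED': '1000',
  'CASH PAID': '10000.11',
};

// Keys containing spaces require bracket notation:
const date = row['PURCHASE DATE'];

// Every parsed value is a String, so convert before doing math:
const pieces = Number(row['CANDY PURCHASED']);
const paid = parseFloat(row['CASH PAID']);

console.log(pieces + 500); // 1500 -- without Number(), '1000' + 500 is '1000500'
```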
Using the row data
Let's revise our .on('data', (row) => {}) callback to use the row's data and print out a sentence describing each candy purchase:
.pipe(csv())
.on('data', (row) => {
let str = `${row["BUYER NAME"]} bought ${row["CANDY PURCHASED"]} pieces of candy on ${row["PURCHASE DATE"]} and paid $${row["CASH PAID"]}.`;
console.log(str)
})
Of course, you could use destructuring on each JSON row to avoid writing out row["PURCHASE DATE"] and the other verbose key names inside the string...but I'll save that for another article. ;)
Running $ node app.js now outputs:
// $ node app.js
Charlie Kelly bought 1000 pieces of candy on 2016-04-03 and paid $10000.11.
Frank Reynolds bought 1000 pieces of candy on 2017-11-14 and paid $12000.22.
Frank Reynolds bought 2000 pieces of candy on 2018-01-20 and paid $40000.33.
Mac bought 2000 pieces of candy on 2018-03-20 and paid $40000.44.
Sweet Dee bought 2000 pieces of candy on 2019-01-02 and paid $50000.55.
Dennis Reynolds bought 1500 pieces of candy on 2019-01-02 and paid $13500.66.
Voila! CSV parsed!
Conclusion
Working with CSVs in Node.js is easy and straightforward with the csv-parser package. However, there are plenty of other ways to approach this, including numerous other npm packages (surprise surprise), as well as using the fs module on its own.
This particular code snippet from Dev.to user ItsASine does a great job of illustrating how to use fs alone -- long story short, use fs.createReadStream() to return a Promise that resolves to a utf-8 encoded String. However, you MUST handle the first-row headers and newline characters manually!
Have any other strategies to recommend? Please feel free to share and comment below!
Comments
Hi Isa!
Great code!
Thank you for sharing that!
I am using this code, but I'm stuck...
I did the imports and everything, and I push the processed CSV data into an array. After the CSV processing I wish to consume the data from the array, but the array is empty. Please see my code below:
The console.logs show me that the CSV processing is going fine, but the last console.log, the one labeled 'transactionsArray outside loop: ', which should show me the final array, is printed before the processing occurs:
csvFilePath: E:\tmp\transactions.csv
transactionsArray outside loop: []
{ title: 'Loan', type: 'income', value: '1500', category: 'Others' }
transactionsArray inside loop: [
{ title: 'Loan', type: 'income', value: '1500', category: 'Others' }
]
{
title: 'Cellphone',
type: 'outcome',
value: '50',
category: 'Others'
}
transactionsArray inside loop: [
{ title: 'Loan', type: 'income', value: '1500', category: 'Others' },
{
title: 'Cellphone',
type: 'outcome',
value: '50',
category: 'Others'
}
]
{ title: 'Ice cream', type: 'outcome', value: '3', category: 'Food' }
transactionsArray inside loop: [
{ title: 'Loan', type: 'income', value: '1500', category: 'Others' },
{
title: 'Cellphone',
type: 'outcome',
value: '50',
category: 'Others'
},
{ title: 'Ice cream', type: 'outcome', value: '3', category: 'Food' }
]
Csv file successfully processed!
I understand that fs.createReadStream is an async operation, but I am struggling on how to make the code wait for that to finish, before going on with the execution.
Please help!
Thank you!
Hi Lejepra! I didn't actually use async / await in my own example code, so I'm basically guessing here. I'd try moving the async to before the fs.createReadStream call, and seeing if that forces the await call to let the CSV parsing finish. Let me know if you're still working on this, and what you find out! :)
Hi. I came across this situation in an academic project, and we had to use the traditional Promise() constructor. csv-parser does not support promises natively, and the same goes for the fs stream and console.log methods. Your code would look more or less like this, keeping it as close to the original as possible.
I'll take the opportunity and thank the author for the post that clarified other things for me.
Thank you!
I have fixed this issue by using Promises. That issue occurs because fs.createReadStream executes asynchronously:
// Wrap the stream in a Promise so callers can await the parsed rows
function createReadStream() {
  const csvData = [];
  return new Promise((resolve, reject) => {
    fs.createReadStream('myfile.csv')
      .on('error', reject)
      .pipe(csv())
      .on('data', (data) => csvData.push(data))
      .on('end', () => resolve(csvData));
  });
}

// Note: await requires an ES module or an enclosing async function
const finalData = await createReadStream()
I tried the code above but it is telling me
TypeError: csv is not a function
at Object.<anonymous> (C:\Users\user\Desktop\Kepler Project\index.js:19:9)
at Module._compile (node:internal/modules/cjs/loader:1126:14)
at Object.Module._extensions..js (node:internal/modules/cjs/loader:1180:10)
at Module.load (node:internal/modules/cjs/loader:1004:32)
at Function.Module._load (node:internal/modules/cjs/loader:839:12)
at Function.executeUserEntryPoint [as runMain]
at node:internal/main/run_main_module:17:47
I'm running this, and is there something weird with Dates and CSV? For some reason, when I try to console.log my statements, the Date reads as undefined while the others read fine.
Hi Willina! As-is, the dates should just be processed as strings, so if the other string is printing fine, the date should too...
Just to test, I created a new directory, did a local install of csv-parser, and here's the code I'm running (copied verbatim). Any discrepancies between the code above and what you're running? Feel free to reply with the code itself and I can take a look :)
I'm sorry, I was running this with my own data, everything shows up except the date which is formatted like this: 2017:12:16 12:45. The header shows up, but when I try to access the rows, I get undefined. Could the formatting be the issue?
Hi Willina, I'm so sorry I missed your reply!! I hope you were able to get unstuck -- if not, I'd be happy to take a look at the data you were using and see if we can find what's resulting in undefined! :)
Try this package: npmjs.com/package/select-csv
Hello, if I use this code in a helper function, how can I return the array to the main function without doing console.log? Thank you for helping!