Hey all, this is my first blog post. I recently started working as a full-stack web developer at Adda52Rummy and was assigned a task to insert some records into the database, which I thought I handled appropriately. Then my manager gave me a CSV file of 5K records to insert into the database. My code was synchronous, and it took a long time.
About the tech stack: I was working in Node.js with MariaDB and the Fastify framework (which performs better than Express and other server frameworks or libraries). I used Sequelize as the ORM library; it is a promise-based Node.js ORM for Postgres, MySQL, MariaDB, SQLite and Microsoft SQL Server. To improve the performance of my operations, I needed to run the insert operations in multiple batches, and those batches had to run in parallel. Aha! Sounds nice, but how to implement it? After spending ample time searching, I found a solution.
Thanks to Evan Siroky.
The solution was to use bulkCreate to insert multiple batches and to use the async.cargo function to run those batches in parallel. After using this, I was able to insert the data in 2 seconds, and guess what, I was able to insert 100K records in 5 to 7 seconds.
Let's jump straight to code:
In my controllers folder, coupon.js:
const Coupon = require("../models/coupon");

// Old single-row approach (kept for reference): one INSERT per record
// exports.createCoupon = async (row) => {
//   await Coupon.sync();
//   return Coupon.create({
//     coupon_code: row.coupon_code,
//   });
// };

// New approach: insert a whole batch of rows with a single bulkCreate call.
// "tasks" is the array of rows handed over by async.cargo, and "inserterCb"
// tells cargo that this batch is done (or failed).
exports.createCoupon = (tasks, inserterCb) => {
  Coupon.bulkCreate(tasks)
    .then(function () {
      inserterCb();
    })
    .catch(inserterCb); // pass the error back to async.cargo instead of swallowing it
};
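The Coupon model required above isn't shown in the original code; for context, here is a minimal sketch of what models/coupon.js could look like with Sequelize and MariaDB (the connection settings and the single coupon_code column are assumptions, adjust them to your setup):

// models/coupon.js — illustrative sketch, not the actual model from the project
const { Sequelize, DataTypes } = require("sequelize");

// connection details are placeholders; adjust to your environment
const sequelize = new Sequelize("mydb", "db_user", "db_password", {
  host: "localhost",
  dialect: "mariadb",
});

// a single coupon_code column, matching what createCoupon inserts
const Coupon = sequelize.define("coupon", {
  coupon_code: {
    type: DataTypes.STRING,
    allowNull: false,
  },
});

module.exports = Coupon;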
I have created a utility folder with a csv-to-db.js file:
const fs = require("fs");
const path = require("path");
const async = require("async");
const csv = require("csv");

function csvToDb(filename, fn) {
  let input = fs.createReadStream(
    path.resolve(__dirname, "../public/assets", filename)
  );

  let parser = csv.parse({
    columns: true, // use the first row as column names, so each record becomes an object
    relax: true,
  });

  // async.cargo collects the pushed rows and hands them to the worker (fn)
  // in batches of up to 1000 at a time.
  let inserter = async.cargo(function (tasks, inserterCb) {
    fn(tasks, inserterCb);
  }, 1000); // -> batch (payload) size

  /* You can specify how many tasks you want to handle per batch; I have set it
     to 1000 tasks at a time. For 100K records you might think "let's handle
     50K at a time to take much less time", but it doesn't work that way: a
     50K batch is a large input and may itself take two or more seconds to
     execute. So pick a size that executes quickly while still handling as
     many tasks per batch as possible. */

  parser.on("readable", function () {
    let line;
    while ((line = parser.read())) {
      inserter.push(line);
    }
  });

  parser.on("error", function (err) {
    throw err;
  });

  parser.on("end", function () {
    inserter.drain(); // async v3: drain() resolves once the last batch has finished
  });

  input.pipe(parser);
}

module.exports = csvToDb;
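The payload-size trade-off described in the comment above is easier to see with a tiny standalone async.cargo sketch (the items and the payload size here are made up): pushed items are collected and handed to the worker in arrays of up to the payload size, which is the same mechanism the inserter relies on.

const async = require("async");

// the worker receives an array of up to 2 queued items at a time
const cargo = async.cargo(function (tasks, done) {
  console.log("processing batch:", tasks);
  done(); // signal that this batch is finished
}, 2); // payload size

cargo.push({ coupon_code: "A" });
cargo.push({ coupon_code: "B" }); // batched together with "A"
cargo.push({ coupon_code: "C" }); // goes into the next batch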
Now, when I call the route 'http://localhost:3000/coupon', this is the code that gets executed.
const Coupon = require("../controllers/coupon");
// adjust this path to wherever you placed csv-to-db.js (assumed here: the utility folder)
const csvToDb = require("../utility/csv-to-db");

module.exports = function routes(fastify, options, done) {
  fastify.get("/coupon", async (req, res) => {
    // kick off the CSV import; the response is returned immediately
    csvToDb("filename.csv", Coupon.createCoupon);
    return { statusCode: 200, message: "success" };
  });
  done();
};
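The post doesn't show the server entry point; for completeness, here is a minimal sketch of how this routes plugin could be registered and the server started (the file paths, port and Fastify v4-style listen options are assumptions):

// app.js — illustrative sketch, adjust paths to your project layout
const fastify = require("fastify")({ logger: true });

// register the routes plugin shown above
fastify.register(require("./routes/coupon"));

fastify.listen({ port: 3000 }, (err) => {
  if (err) {
    fastify.log.error(err);
    process.exit(1);
  }
});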
Thanks for reading.
Top comments (3)
What format/delimiter does the CSV use? Can you provide file.csv?
It had one column.
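Since the parser uses columns: true, the first row is treated as the header and each following row becomes an object keyed by it. A one-column file shaped like this (the codes below are made-up examples) maps every row to { coupon_code: ... }, which is exactly the shape bulkCreate expects:

coupon_code
SAVE10
WELCOME50
NEWUSER25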