I recently had to fix a Node.js Lambda function that was abruptly stopped by AWS before it finished its work, because it hit the maximum memory given to the function. On a bare metal server you can add one more RAM module and hope for the best. In a serverless environment there are limits. In AWS in particular, the most you can give to a Lambda function is 3,008 MB. Quite enough, you would think... And you would be wrong, as we were.
This function was not particularly complicated. It had to parse a CSV and, for each record, do a bunch of things: validate it, read something from DynamoDB, then do one or two writes per record, depending on some of the data.
The complicated part was that it had to wait until all rows were processed and return a result: when the overall process completed, how many of the rows were processed successfully, how many gave an error, and which (validation) error.
The even more complicated part was that at some point someone wanted to process a 70k-record file. The 3,008 MB were not enough for this, it seems.
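For context, here is a rough sketch of what the function had to do per invocation. This is not the real code: parseCsv, validate and the table name are hypothetical stand-ins, and the real thing did the per-record work in parallel with callbacks.

```js
const AWS = require('aws-sdk');
const db = new AWS.DynamoDB.DocumentClient();

// Hypothetical helpers standing in for the real parsing/validation logic.
const { parseCsv, validate } = require('./lib');

exports.handler = async (event) => {
  const records = parseCsv(event.body);
  const summary = { succeeded: 0, failed: 0, errors: [] };

  for (const record of records) {
    const validationError = validate(record);
    if (validationError) {
      summary.failed += 1;
      summary.errors.push(validationError);
      continue;
    }

    // One read per record...
    const existing = await db.get({ TableName: 'records-table', Key: { id: record.id } }).promise();
    // ...then one or two writes, depending on the data.
    await db.put({ TableName: 'records-table', Item: Object.assign({}, existing.Item, record) }).promise();
    summary.succeeded += 1;
  }

  // Only returned once every row has been handled.
  return summary;
};
```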
Proposed solutions
1. Don't do serverless
Of course, the first thing we thought of was to just move this outside of Lambda. In AWS this could be done with ECS (Elastic Container Service). It could work, but it would add yet another service to maintain and know about.
2. Split the CSV
Possible, but error prone. How small is small enough, and how do we make sure the split actually happens? The CSVs were uploaded by a third party, most synchronized nightly, probably automated. Ugly.
3. Try to improve the code
Probably time consuming, but easily the solution that scales best, if it proves effective.
Implementing solution #3
The code was pretty outdated, built on Node v6, with the well-known callback hell, somewhat tamed by the famous async library.
Step 0: Refactor
Tasks:
- use a newer version of node
- rethink the logic
At the time, AWS Lambda supported Node 6 and 8.10, so we went with 8.10, which brings native async/await on top of Promises and gets rid of some of that callback hell.
The initial implementation had a pretty major issue: each record was processed individually, even though some of its data was shared with other records. So there were duplicate reads from DynamoDB. A lot of them.
A better solution was to group the records by the common criteria, then process the groups in parallel and, for each group, all its records in parallel. Promise and async/await FTW! The resulting code was way smaller, easier to understand, did ~90% fewer reads from the DB and... still reached the memory limit.
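In code, the new shape looked roughly like this (a sketch; the grouping key, readSharedDataOnce and processRecord are hypothetical stand-ins for the real logic):

```js
// Group records by the shared key so the DynamoDB read happens once per group.
function groupBy(records, key) {
  return records.reduce((groups, record) => {
    (groups[record[key]] = groups[record[key]] || []).push(record);
    return groups;
  }, {});
}

async function processAll(records) {
  const groups = groupBy(records, 'someSharedKey');

  // All groups in parallel, and within each group all records in parallel.
  return Promise.all(
    Object.keys(groups).map(async (key) => {
      const shared = await readSharedDataOnce(key); // one read per group instead of one per record
      return Promise.all(groups[key].map((record) => processRecord(record, shared)));
    })
  );
}
```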
Here is the result from a demo repo I set up to test this (processing 100 groups with 1000 records each):
$ node index.js
Memory used before processing all records: 9.17 MB
Memory used after processing all records: 92.79 MB
Process time: 3352.570ms
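Those numbers come from the usual suspects, process.memoryUsage() and console.time. A sketch of the measurement (not necessarily the exact demo script; processAll is the function sketched above and records is the generated demo data):

```js
const usedMb = () => (process.memoryUsage().heapUsed / 1024 / 1024).toFixed(2);

console.log(`Memory used before processing all records: ${usedMb()} MB`);
console.time('Process time');

processAll(records).then(() => {
  console.log(`Memory used after processing all records: ${usedMb()} MB`);
  console.timeEnd('Process time'); // prints "Process time: <n>ms"
});
```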
Step 1
After digging into what could eat up all that juicy RAM, it turned out that the native Promise is not particularly memory friendly. Bluebird was suggested, so let's try it.
Changes required:
$ npm i bluebird
const Promise = require('bluebird');
Easy fix. Memory dropped. By ~30%. But the function still timed out for the big files. Not good.
Here's the test output:
$ node index.js
Memory used before processing all records: 9.3 MB
Memory used after processing all records: 67.32 MB
Process time: 3169.421ms
Step 2
It turns out that waiting for all the promises to complete means that we keep all those promises (and their results) in memory. Go figure...
So we need to reduce the number of requests we make in parallel. Bluebird to the rescue again, with Promise.map. Using the concurrency option of this function, we can set how many items should be processed in parallel at any given time.
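With that, the nested Promise.all calls from the earlier sketch turn into Promise.map calls with a concurrency cap (the value 10 below is just an illustration; the helpers are the same hypothetical ones as before):

```js
const Promise = require('bluebird');

async function processAll(records) {
  const groups = groupBy(records, 'someSharedKey');

  // At most 10 groups are in flight at any moment...
  return Promise.map(
    Object.keys(groups),
    async (key) => {
      const shared = await readSharedDataOnce(key);
      // ...and within a group, at most 10 records are processed at a time.
      return Promise.map(groups[key], (record) => processRecord(record, shared), { concurrency: 10 });
    },
    { concurrency: 10 }
  );
}
```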
And the final test output:
$ node index.js
Memory used before processing all records: 9.29 MB
Memory used after processing all records: 17.34 MB
Process time: 30132.855ms
What's even better is that with this approach the memory peak is stable. It does not increase with the number of items to process, because after each batch of records is processed, the GC kicks in.
Granted, this did increase the total time it takes to process the entire set, but for this particular scenario we're only interested in not consuming all the memory.
The real world code uses ~400 MB of memory and processes 10k records in about 30 seconds. We deemed that acceptable.
Check the commits in this GitHub repository to follow the above steps:
Promise performance improvements in Node.js (v8.10.0)
The script tries to emulate processing a matrix of records, e.g.:
const records = [[1, 2], [3, 4]];
To know when all records are processed, we need to know when each row has been processed and when all rows have been processed.
Improvements
Step 0 (no improvements)
The idea is to handle each record with a promise and, for each row, await Promise.all on the row's records, returning only after all records in the row have been processed. Then, for the entire set, await Promise.all on the promises returned for the rows.
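In other words, something along these lines (a simplified sketch; processRecord stands in for the per-record work):

```js
async function processMatrix(records) {
  // One promise per row: it resolves only after all records in that row are done.
  const rowPromises = records.map((row) =>
    Promise.all(row.map((record) => processRecord(record)))
  );

  // Resolves only after every row (and therefore every record) has been processed.
  return Promise.all(rowPromises);
}
```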
Observation
Memory usage is high. The script uses ~99.8 MB and does not free up memory after each row has been processed. Quite interesting...
Step 1
Looks like Bluebird could help: nodejs/node#6673
Changes required:
$ npm i bluebird
const Promise = require('bluebird');
Observation
Memory usage dropped…
Top comments (2)
Really interesting problem! Thanks for sharing!
I don't know if this is of any help, but here is what might be an alternative way of solving your problem.
The gist is to process the requests in batches: as soon as one resolves, you start the next promise that hasn't been kicked off yet, and so forth.
Best of luck! 🍻
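The idea, as a rough sketch (hypothetical code, not the commenter's actual solution):

```js
// Keep at most `limit` items in flight; whenever one finishes, the next one starts.
async function processWithLimit(items, worker, limit) {
  const results = [];
  let next = 0;

  async function runner() {
    while (next < items.length) {
      const index = next;
      next += 1;
      results[index] = await worker(items[index]);
    }
  }

  await Promise.all(Array.from({ length: Math.min(limit, items.length) }, () => runner()));
  return results;
}
```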
Thanks for sharing your solution as well. I believe that what you did (take3subtake1part1) can be achieved with Bluebird's Promise.map with the concurrency option. From the docs:
"The concurrency limit applies to Promises returned by the mapper function and it basically limits the number of Promises created. For example, if concurrency is 3 and the mapper callback has been called enough so that there are three returned Promises currently pending, no further callbacks are called until one of the pending Promises resolves. So the mapper function will be called three times and it will be called again only after at least one of the Promises resolves."