Extracting files from large (i.e. > 1 GB) .zip files can be a challenging task, especially when resources are limited or when you are billed based on memory usage and execution time (as is the case with AWS Lambda).
Most Node.js packages used for this task work the same way: load the entire file into memory and then extract its contents. This results in a huge memory footprint (as large as the file itself) and long execution times.
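For contrast, here is a rough sketch of that buffer-based approach (using adm-zip purely as an example of such a package; the bucket, filename and entry name are placeholders): both the whole archive and the extracted entry end up in memory at the same time.

const AWS = require("aws-sdk");
const AdmZip = require("adm-zip");
const s3 = new AWS.S3({ apiVersion: "2006-03-01" });

// Buffer-based extraction: the entire zip is downloaded into memory
// before a single entry can be read (placeholder names throughout).
async function extractInMemory(bucket, filename) {
  const { Body } = await s3
    .getObject({ Bucket: bucket, Key: filename })
    .promise(); // Body is a Buffer holding the whole archive
  const zip = new AdmZip(Body);
  return zip.readFile("file_name_inside_zip.ext"); // yet another in-memory Buffer
}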
The unzipper package, on the other hand, is built on Node.js streams. In short, streams allow us to process (read/write) data in chunks, keeping both the memory footprint and the execution time very low.
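As a minimal illustration of the streaming model (not specific to unzipper, and with made-up file names), data flows from a readable stream to a writable one one chunk at a time, so memory usage stays around the size of a single chunk rather than the size of the file:

const fs = require("fs");

// Copy a large file chunk by chunk; only one chunk (64 KB by default
// for fs streams) is held in memory at any point, not the whole file.
fs.createReadStream("big-archive.zip")
  .on("data", (chunk) => console.log(`read ${chunk.length} bytes`))
  .pipe(fs.createWriteStream("copy-of-big-archive.zip"))
  .on("finish", () => console.log("copy finished"));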
The following snippet shows an example of how to use this package inside a Lambda handler.
const AWS = require("aws-sdk");
const s3 = new AWS.S3({ apiVersion: "2006-03-01" });
const unzipper = require("unzipper");

exports.handler = async (event) => {
  //...initialize bucket, filename and target_filename here

  try {
    /**
     * Step 1: Get stream of the file to be extracted from the zip
     */
    const file_stream = s3
      .getObject({ Bucket: bucket, Key: filename })
      .createReadStream()
      .on("error", (e) => console.log(`Error extracting file: `, e))
      .pipe(
        unzipper.ParseOne("file_name_inside_zip.ext", {
          forceStream: true,
        })
      );

    /**
     * Step 2: upload extracted stream back to S3: this method supports a readable stream in the Body param as per
     * https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#upload-property
     */
    await s3
      .upload({ Bucket: bucket, Key: target_filename, Body: file_stream })
      .promise();
  } catch (error) {
    console.log("Error: ", error.message, error.stack);
  }
};
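As for the initialization step elided above, if the Lambda is triggered by an S3 event, the names could be derived along these lines (the target key is an arbitrary choice for this example):

// Hypothetical initialization for an S3-triggered Lambda; adapt the key
// names to your own event source and naming scheme.
const bucket = event.Records[0].s3.bucket.name;
const filename = decodeURIComponent(
  event.Records[0].s3.object.key.replace(/\+/g, " ")
);
const target_filename = `extracted/${filename}`;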
Hope this helps!