Extracting files from large (i.e. > 1 GB) .zip files can be a challenging task, especially when resources are limited or when you are billed based on memory usage and execution time (as is the case with AWS Lambda).
Most Node.js packages used for this task work the same way: load the entire file into memory and then extract its contents. This results in a huge memory footprint (as large as the file itself) and long execution times.
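For contrast, here is a rough sketch of that buffer-based approach (using adm-zip purely as an example of such a package; the bucket, filename and entry name are placeholders): both the whole archive and the extracted entry end up in memory at the same time.

const AWS = require("aws-sdk");
const AdmZip = require("adm-zip");
const s3 = new AWS.S3({ apiVersion: "2006-03-01" });

// Buffer-based extraction: the entire zip is downloaded into memory
// before a single entry can be read (placeholder names throughout).
async function extractInMemory(bucket, filename) {
  const { Body } = await s3
    .getObject({ Bucket: bucket, Key: filename })
    .promise(); // Body is a Buffer holding the whole archive
  const zip = new AdmZip(Body);
  return zip.readFile("file_name_inside_zip.ext"); // yet another in-memory Buffer
}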
The unzipper package, on the other hand, is built on Node.js streams. In short, streams allow us to process (read/write) data in chunks, keeping both the memory footprint and the execution time very low.
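As a minimal illustration of the streaming model (not specific to unzipper, and with made-up file names), data flows from a readable stream to a writable one one chunk at a time, so memory usage stays around the size of a single chunk rather than the size of the file:

const fs = require("fs");

// Copy a large file chunk by chunk; only one chunk (64 KB by default
// for fs streams) is held in memory at any point, not the whole file.
fs.createReadStream("big-archive.zip")
  .on("data", (chunk) => console.log(`read ${chunk.length} bytes`))
  .pipe(fs.createWriteStream("copy-of-big-archive.zip"))
  .on("finish", () => console.log("copy finished"));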
The following snippet shows an example of how to use this package inside a Lambda handler.
const AWS = require("aws-sdk");
const s3 = new AWS.S3({ apiVersion: "2006-03-01" });
const unzipper = require("unzipper");

exports.handler = async (event) => {
  //...initialize bucket, filename and target_filename here

  try {
    /**
     * Step 1: Get stream of the file to be extracted from the zip
     */
    const file_stream = s3
      .getObject({ Bucket: bucket, Key: filename })
      .createReadStream()
      .on("error", (e) => console.log(`Error extracting file: `, e))
      .pipe(
        unzipper.ParseOne("file_name_inside_zip.ext", {
          forceStream: true,
        })
      );

    /**
     * Step 2: upload extracted stream back to S3: this method supports a readable stream in the Body param as per
     * https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#upload-property
     */
    await s3
      .upload({ Bucket: bucket, Key: target_filename, Body: file_stream })
      .promise();
  } catch (error) {
    console.log("Error: ", error.message, error.stack);
  }
};
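As for the initialization step elided above, if the Lambda is triggered by an S3 event, the names could be derived along these lines (the target key is an arbitrary choice for this example):

// Hypothetical initialization for an S3-triggered Lambda; adapt the key
// names to your own event source and naming scheme.
const bucket = event.Records[0].s3.bucket.name;
const filename = decodeURIComponent(
  event.Records[0].s3.object.key.replace(/\+/g, " ")
);
const target_filename = `extracted/${filename}`;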
Hope this helps!