Mitansh Gor for Distinction Dev
S3 Multi-Part Upload: Part 2 Conclusion


Howdy, my tech-savvy pals! 🌟 Remember our last rendezvous? We chatted about the multipart upload basics - the whole shebang! Today, get ready to roll up your sleeves because we're plunging into the deep end! 💦 We're talking all about those tricky low-level hurdles you might encounter while playing around with multipart stuff using Node.js + the Serverless framework. But hey, fear not! We're a dynamic duo, and together, we'll smash these challenges and soar to victory! 💪🚀

🌟 Remember our chat about multipart upload (Part 1)? It's like a three-step dance for your data! First, you start the upload party. Then, you groove through uploading the object parts. Finally, when all the parts are in place, you wrap up the multipart upload! 🚀📦


But before we take the plunge into each step's deep end, let's first secure our multipart permission set. Ready, set, go! 🏁

# serverless.yml
provider:
  ....
  iamRoleStatements:
    ......
    - Effect: Allow
      Action:
        - s3:GetObject
        - s3:PutObject
        - s3:AbortMultipartUpload
      Resource:
        - arn:aws:s3:::${Bucket}/*
    - Effect: Allow
      Action:
        - s3:ListBucketMultipartUploads
      Resource:
        - arn:aws:s3:::${Bucket}

🛠️ "PERMISSIONS ADDED!" 🛠️

Now that the permissions are in check, nothing's holding us back from diving deep into each thrilling step of the multipart saga! 💪🌊


Multipart Upload Initiation


Let's dive right into the exciting world of Multipart Upload Initiation with this code snippet! This magical piece of code helps us kickstart the multipart upload process for an S3 bucket and key. Remember, it's crucial to save the response (upload ID) this function gives us. This upload ID is like the secret key to the opened stream, and we'll need it for the rest of our multipart adventure! 🗝️✨

// AWS SDK v2 client, shared by all of the helpers in this post
const AWS = require('aws-sdk')
const S3 = new AWS.S3()

/**
 * Create a multipart upload for a given S3 bucket and key.
 *
 * @param {string} bucket - The S3 bucket name.
 * @param {string} key - The S3 object key.
 * @returns {Promise<string>} The upload ID for the multipart upload.
 * @throws {Error} If there's an issue with the multipart upload creation.
 */
const createMultipartUpload = async (bucket, key) => {
  if (typeof bucket !== 'string' || !bucket.trim()) {
    throw new Error('Invalid bucket name. Please provide a valid string value for the bucket.')
  }

  if (typeof key !== 'string' || !key.trim()) {
    throw new Error('Invalid object key. Please provide a valid string value for the key.')
  }

  try {
    const params = {
      Bucket: bucket,
      Key: key
    }

    // Use S3's createMultipartUpload with promise()
    const data = await S3.createMultipartUpload(params).promise()

    return data.UploadId
  } catch (error) {
    throw new Error(`Error creating multipart upload: ${error.message}`)
  }
}

Uploading Parts - The 'Part-y' Begins


Uploading chunks or parts might seem like a breeze, but here's the catch: multipart upload has its own set of rules. The big one? Every part except the last must be at least 5 MiB (and no single part can exceed 5 GiB). But sometimes we run into situations where certain chunks/parts come in under that 5 MiB floor. 🤔😅

Handling and validating these scenarios is crucial. Imagine our multipart adventure running into these unexpectedly small chunks - we need to be ready to address them! 💡📦💻

To tackle this challenge, we're diving into 'chunk mode'. Picture this: we've got an array of files fetched from S3 along with their sizes. Now we're slicing and dicing this array into chunks, making sure each chunk carries at least 5 MiB of data. 🎲✂️
Check out the magic of chunkifying:


// Below is the chunked result of the file metadata fetched from S3.
// Grouped entries come back as { content: [...], size } objects, matching the
// convertToChunks helper shown a bit further down; standalone big files stay as plain metadata.
chunkifyArray = [
  {
    content: [
      { fileName: 'A.csv', ContentLength: 1728033 },  // 1.65 MiB
      { fileName: 'B.csv', ContentLength: 53326970 }, // 50.86 MiB
    ],
    size: 55055003, // 52.51 MiB total
  },
  { fileName: 'C.csv', ContentLength: 21646619 },     // 20.64 MiB
  {
    content: [
      { fileName: 'D.csv', ContentLength: 1728033 },  // 1.65 MiB
      { fileName: 'E.csv', ContentLength: 5226970 },  // 4.98 MiB
    ],
    size: 6955003, // 6.63 MiB total
  },
]

See how we cleverly grouped those files into chunks? It's like solving a puzzle! 🧩💻 This way, we ensure every chunk (except possibly the last) meets the 5 MiB minimum.

So, picture this: file A, at roughly 1.65 MiB, is too small to be a part on its own, so file B gets bundled with it and together they sail past the mark. Meanwhile, file C is a lone ranger, confidently exceeding the 5 MiB mark all by itself. Then we've got files D and E, best buddies who are each under 5 MiB individually, teaming up to go beyond that limit together.

This clever strategy ensures our chunks are just the right size for this multipart upload adventure! 🚀🔍

Now, for files A and B, we're planning a little readStream party! 🎉📚 We'll grab the records from both files, blend them into one mighty string, and that fusion will become the uploadable part. Think of it as a superhero team-up! 💪🦸‍♂️ The same goes for the dynamic duo, files D and E.

But hey, file C is a solo act. We'll simply read its data and smoothly upload it via stream. 🌟💾
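
The snippets further down lean on a helper called fetchCombinedRecordsFromMultipleFileObjects without showing it, so here's a minimal sketch of what that merge could look like. Treat it as an assumption-laden illustration: it presumes fileName is the full S3 object key, that a chunk's files comfortably fit in memory, and that plain concatenation is the right way to combine them (for CSVs you may need to strip repeated headers).

/**
 * Hypothetical merge helper: reads every file in a chunk from S3 and
 * concatenates the bodies into a single Buffer that can be uploaded as one part.
 *
 * @param {Object} chunk - Either a bare metadata object (big file) or a { content, size } group.
 * @param {string} bucket - The S3 bucket name.
 * @returns {Promise<Buffer>} The combined body for this part.
 */
const fetchCombinedRecordsFromMultipleFileObjects = async (chunk, bucket) => {
  // Standalone chunks are plain metadata objects; grouped chunks carry a `content` array.
  const files = chunk.content ? chunk.content : [chunk]

  const bodies = []
  for (const file of files) {
    // Assumes fileName is the object's key; adjust if your keys carry a prefix.
    const obj = await S3.getObject({ Bucket: bucket, Key: file.fileName }).promise()
    bodies.push(obj.Body)
  }

  return Buffer.concat(bodies)
}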

Imagine this snippet as our trusty guide to converting an array of files' metadata into chunks. Buckle up, we're diving into some code magic! ✨🚀


/**
 * Converts an array of file metadata into chunks based on a size threshold.
 *
 * Files at or above the threshold become standalone chunks; smaller files are
 * grouped together until their combined size crosses the threshold.
 *
 * @param {Object[]} data - Array of file metadata objects (with ContentLength).
 * @param {number} THRESHOLDLIMIT_5MB - Size threshold for chunking, in bytes.
 * @returns {{ chunkifyArray: Object[], totalSize: number }} The chunks and the total size of all files.
 */
const convertToChunks = (data, THRESHOLDLIMIT_5MB) => {
  const chunkifyArray = [];
  let totalSize = 0;

  data.forEach((file) => {
    if (!file.ContentLength) return;

    totalSize += file.ContentLength;

    const lastChunk = chunkifyArray[chunkifyArray.length - 1];

    if (
      chunkifyArray.length === 0 ||
      lastChunk.size === undefined || // last chunk is a standalone (big) file
      lastChunk.size > THRESHOLDLIMIT_5MB // last grouped chunk is already big enough
    ) {
      // Start a new chunk: big files stand alone, small files open a new group
      if (file.ContentLength >= THRESHOLDLIMIT_5MB) {
        chunkifyArray.push(file);
      } else {
        chunkifyArray.push({ content: [file], size: file.ContentLength });
      }
    } else {
      // The last grouped chunk is still under the threshold - keep filling it
      lastChunk.content.push(file);
      lastChunk.size += file.ContentLength;
    }
  });

  return { chunkifyArray, totalSize };
};

Now that we've prepared our readStream data from individual files, it's time for the grand finale of this step: uploading each chunk or part to our multipart stream. Enter our superhero function, uploadMultiPartHelper! 💪📤

/**
 * Uploads a part of a multipart upload to an S3 bucket.
 *
 * @param {Buffer | Uint8Array | string} body - The content of the part to upload.
 * @param {string} bucket - The name of the S3 bucket.
 * @param {string} key - The key (path) where the part will be stored in the bucket.
 * @param {number} partNumber - The part number for the multipart upload.
 * @param {string} uploadId - The ID of the multipart upload.
 * @returns {object} - The ETag and partNumber of the uploaded part.
 * @throws {Error} - If any validation or upload error occurs.
 */
const uploadMultiPartHelper = async (body, bucket, key, partNumber, uploadId) => {
  try {
    const params = {
      Body: body,
      Bucket: bucket,
      Key: key,
      PartNumber: partNumber,
      UploadId: uploadId
    }
    const data = await S3.uploadPart(params).promise()
    return {
      ETag: data.ETag,
      PartNumber: partNumber
    }
  } catch (error) {
    throw new Error(`Upload failed: ${error.message}`)
  }
}


With this uploadMultiPartHelper function ready to roll, our multipart upload strategy is almost complete! 🎉 But wait, there's a twist in the tale! What if the total size of all our files doesn't exceed the 5 MiB mark? 🤔 Let's tackle that scenario head-on with another code snippet:

/**
 * Chunks the file metadata and writes every chunk into the multipart stream.
 *
 * @param {Object[]} initialArrayOfFileMetaData - Array of file metadata objects.
 * @param {number} THRESHOLDLIMIT_5MB - Size threshold for chunking, in bytes.
 * @returns {Promise<Object[]|undefined>} The uploaded parts ({ ETag, PartNumber }) when multipart is used.
 */
const chunkDataWriteIntoStream = async (initialArrayOfFileMetaData, THRESHOLDLIMIT_5MB) => {
  const { chunkifyArray, totalSize } = convertToChunks(initialArrayOfFileMetaData, THRESHOLDLIMIT_5MB);

  if (totalSize < THRESHOLDLIMIT_5MB) {
    // If we're under the 5 MiB mark, skip multipart entirely:
    // combine all the files into one body and upload it with a plain s3.upload().
  } else {
    // The total size is bigger than 5 MiB, so handle it with multipart chunk uploads.
    const uploadId = await createMultipartUpload(bucket, key); // bucket and key come from the surrounding scope
    const respArr = [];

    for (let i = 0; i < chunkifyArray.length; i++) {
      const partNumber = i + 1;
      // Read and combine the records of the files in this chunk (see the merge sketch earlier).
      const body = await fetchCombinedRecordsFromMultipleFileObjects(chunkifyArray[i], bucket);
      // Push this chunk into the multipart stream for the given uploadId and part number.
      const uploadResponse = await uploadMultiPartHelper(body, bucket, key, partNumber, uploadId);
      respArr.push(uploadResponse);
    }

    return respArr;
    // respArr = [
    //   { ETag: '"..."', PartNumber: 1 },
    //   { ETag: '"..."', PartNumber: 2 },
    //   ...
    // ]
  }
};
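For completeness, here's one way the under-5-MiB branch could be filled in - a hedged sketch, not the post's exact implementation. It reuses the hypothetical merge helper from earlier and falls back to a single s3.upload() call, since a payload that small never needs multipart at all.

/**
 * Possible shape of the small-payload fallback: combine everything and do one regular upload.
 * (Sketch only - wire the bucket/key handling to match the rest of your handler.)
 */
const uploadCombinedSmallPayload = async (chunkifyArray, bucket, key) => {
  const bodies = []
  for (const chunk of chunkifyArray) {
    bodies.push(await fetchCombinedRecordsFromMultipleFileObjects(chunk, bucket))
  }

  // One plain upload - no uploadId, no parts, no completion step needed.
  const data = await S3.upload({ Bucket: bucket, Key: key, Body: Buffer.concat(bodies) }).promise()
  return data.Location
}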

Turbocharging Processing Speed


Hey, so you know that loop (for (let i = 0; i < chunkifyArray.length; i++)) we've got above, running through our chunks one by one? 🔄 In tech terms, it's a bit of a slowpoke when we're in a rush, especially with time limits like the 15-minute cap on Lambda functions. ⏳

But guess what? We've got a secret recipe to speed things up! 🌟✨

Ingredient 1: Let's chop our chunkifyArray into smaller batches and use the power of promises to run each batch's uploads at the same time! 🎉🔪🚀 Imagine it like a well-coordinated dance where multiple chunks perform their tasks simultaneously.
But in the world of Lambda functions, there's a limit on how much they can handle within that 15-minute timeframe. Based on some real-world testing and tinkering, a Lambda typically gets through around 95 to 100 MB of files within that span. 🕒📏
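
Here's a rough sketch of Ingredient 1, assuming the helpers shown earlier. The BATCH_SIZE value is a made-up knob - tune it against your Lambda's memory and network headroom.

// Upload the parts in small concurrent batches with Promise.all.
const BATCH_SIZE = 4 // assumption: how many parts to upload at once

const uploadChunksInBatches = async (chunkifyArray, bucket, key, uploadId) => {
  const respArr = []

  for (let i = 0; i < chunkifyArray.length; i += BATCH_SIZE) {
    const batch = chunkifyArray.slice(i, i + BATCH_SIZE)

    // All parts in this batch upload in parallel; batches still run one after another.
    const results = await Promise.all(
      batch.map(async (chunk, j) => {
        const partNumber = i + j + 1 // part numbers stay unique and in order
        const body = await fetchCombinedRecordsFromMultipleFileObjects(chunk, bucket)
        return uploadMultiPartHelper(body, bucket, key, partNumber, uploadId)
      })
    )

    respArr.push(...results)
  }

  return respArr // [{ ETag, PartNumber }, ...]
}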

Now, imagine this: what if we've got larger files than that? 🤯📦 That's where Lambda might start feeling a bit overwhelmed, like trying to fit an elephant through a mouse hole! 🐘🕳️

Ingredient 2: Now, here's the cherry on top! Instead of relying on Lambda's time constraints, let's switch gears and implement this in Step Functions. It's like upgrading to a turbocharged engine for processing! 🏎️💨 By using Step Functions' Map state and iterating through each chunk in parallel, we're hitting the fast lane! And to keep that parallelism under control (or crank it up), we can tune the MaxConcurrency setting while configuring the Map state! 🌪️🔥🌐
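
To keep the code samples in Node.js, here's a hedged sketch of how that Map-state fan-out could be wired from the Lambda side: one function prepares the chunk list and opens the multipart stream, and a second one uploads a single part per Map iteration. The event shapes (files, parts, and so on) are assumptions, not an official contract, and the Map state itself - with its MaxConcurrency - lives in your state machine definition.

// Hedged sketch of the two Lambda handlers a Map state could fan out over.

// 1) "Prepare" Lambda: builds the chunk list and opens the multipart stream.
module.exports.prepareChunks = async ({ bucket, key, files }) => {
  const THRESHOLDLIMIT_5MB = 5 * 1024 * 1024
  const { chunkifyArray } = convertToChunks(files, THRESHOLDLIMIT_5MB)
  const uploadId = await createMultipartUpload(bucket, key)

  // The Map state iterates over `parts`; each item becomes one uploadOnePart invocation.
  return {
    uploadId,
    parts: chunkifyArray.map((chunk, i) => ({ bucket, key, uploadId, partNumber: i + 1, chunk }))
  }
}

// 2) "Worker" Lambda: uploads a single part; MaxConcurrency on the Map state caps the fan-out.
module.exports.uploadOnePart = async ({ bucket, key, uploadId, partNumber, chunk }) => {
  const body = await fetchCombinedRecordsFromMultipleFileObjects(chunk, bucket)
  return uploadMultiPartHelper(body, bucket, key, partNumber, uploadId) // { ETag, PartNumber }
}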


With this added perspective, we're preparing a recipe for success that considers all the ingredients and ensures we handle any file size without breaking a sweat! 🌟🚀🔍


The Grande Finale - Completing the Multipart Upload


After uploading every relevant part, it's showtime! We call in the big guns with the "Complete Multipart Upload" action. 🎉

Here's where the magic happens: Amazon S3 takes all those parts, arranges them in ascending order by part number, and voilà! 🎩✨ A brand new object is born! 🌟🧬 It's like assembling the Avengers - each part plays a vital role in creating the ultimate superhero object!

But wait, there's a catch! 🤔📏 Your proposed upload should be larger than the minimum allowed object size. Each part has to be at least 5 MiB in size, except for the very last part. It's like ensuring each puzzle piece is big enough, fitting the puzzle guidelines! 🧩📏

Now, let's dive into this helper function, the secret sauce that makes all of this possible! 🍝✨

/**
 * Completes a multipart upload to an S3 bucket and returns the uploaded object's location.
 *
 * @param {string} bucket - The name of the S3 bucket.
 * @param {string} key - The key or path for the uploaded object.
 * @param {Array<{ ETag: string, PartNumber: number }>} partArray - An array of parts with ETag and PartNumber.
 * @param {string} uploadId - The unique upload identifier for the multipart upload.
 * @returns {Promise<string>} A Promise that resolves to the uploaded object's location.
 * @throws {Error} Throws an error if the upload fails.
 */
const completeMultiPartUpload = async (bucket, key, partArray, uploadId) => {
  try {
    const params = {
      Bucket: bucket,
      Key: key,
      MultipartUpload: {
        Parts: partArray
      },
      UploadId: uploadId
    }

    const data = await S3.completeMultipartUpload(params).promise()
    return data.Location
  } catch (error) {
    throw new Error(error.message)
  }
}
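To see how the pieces fit together inside a single Lambda, here's a hedged end-to-end sketch under the same assumptions as the earlier snippets. It reuses the batching helper from the Ingredient 1 sketch, and the abortMultiPartHelper that's introduced in the next section.

// Rough end-to-end flow: create the upload, push the parts, complete - or abort on failure.
const runMultipartUpload = async (bucket, key, initialArrayOfFileMetaData) => {
  const THRESHOLDLIMIT_5MB = 5 * 1024 * 1024
  const { chunkifyArray } = convertToChunks(initialArrayOfFileMetaData, THRESHOLDLIMIT_5MB)

  const uploadId = await createMultipartUpload(bucket, key)
  try {
    const parts = await uploadChunksInBatches(chunkifyArray, bucket, key, uploadId)
    return await completeMultiPartUpload(bucket, key, parts, uploadId)
  } catch (error) {
    // Close the stream on failure so orphaned parts don't keep costing money
    // (abortMultiPartHelper is covered in the next section).
    await abortMultiPartHelper(bucket, key, uploadId)
    throw error
  }
}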

Aborting Multipart Uploads: When Things Go Awry


Ever wondered what happens if an error sneaks into the multipart process? Money talks, and in this case, it's about those unwanted charges! 💰💸


Here's the deal: if an error occurs midway through the multipart process, the multipart stream remains open, and S3 keeps charging you for the parts already stored until the upload is completed or aborted. Yikes! 😱 It's like a stage curtain that should be closed after the show - it's gotta come down for the costs to stop! 🎭🚫

That's where the magic of aborting the multipart stream comes into play! 🌟✨


So, let's dive into the superhero function, the abortMultiPartHelper! This function performs the crucial task of aborting a multipart upload in an S3 bucket. It's like the emergency exit button for our multipart process! 🚀🛑

/**
 * Aborts a multipart upload in an S3 bucket.
 *
 * @param {string} bucket - The S3 bucket name.
 * @param {string} key - The S3 object key.
 * @param {string} uploadId - The upload ID of the multipart upload.
 * @returns {Promise<Object>} A promise that resolves with the response from the S3 service.
 * @throws {Error} If any validation fails or an error occurs during the operation.
 */
const abortMultiPartHelper = async (bucket, key, uploadId) => {
  // Validation checks for bucket, key, and uploadId
  // It's like checking the keys before opening the treasure chest! 🔑💰

  try {
    const params = {
      Bucket: bucket,
      Key: key,
      UploadId: uploadId
    }
    const data = await S3.abortMultipartUpload(params).promise()
    return data
  } catch (error) {
    throw new Error(`Error during abortMultiPartHelper: ${error.message}`)
  }
}

Remember, this function helps in preventing those unwanted charges by stopping the multipart upload in its tracks! 🛑💼 It's the safety net we need backstage to ensure everything runs smoothly. 🌟🔧

Best Practices: Aborting Multipart Streams Safely

Let's talk about some genius moves to avoid those unexpected wallet withdrawals! 💰💸

Imagine this scenario: an open stream silently siphoning money from your pocket - not cool, right? As savvy backend devs, it's always recommended to create two Lambda functions that act like financial guards when working with multipart! 🦸‍♂️🔒
1️⃣ The Specific Stream Terminator: This Lambda function is your go-to buddy! It's like having a specific key to shut down any particular multipart stream gone rogue! 🗝️🛑

/**
 * Aborts a specific multipart stream based on the provided uploadId.
 *
 * @param {string} bucket - The S3 bucket name.
 * @param {string} key - The S3 object key.
 * @param {string} uploadId - The unique ID of the multipart stream to be aborted.
 * @returns {Promise<Object>} Resolves with the response from S3.
 * @throws {Error} If any validation fails or an error occurs during the process.
 */
const abortSpecificStream = async (bucket, key, uploadId) => {
  try {
    const params = {
      Bucket: bucket,
      Key: key,
      UploadId: uploadId
    }
    const data = await S3.abortMultipartUpload(params).promise()
    return data 
  } catch (error) {
    throw new Error(`Error during abortSpecificStream: ${error.message}`)
  }
}

2️⃣ The Stream Terminator Deluxe: This Lambda function is your ultimate guardian! It's designed to sweep through and close any open multipart streams from the past. 🌪️🔒


/**
 * Lists in-progress multipart uploads on a specific bucket.
 *
 * @param {string} bucket - The S3 bucket name.
 * @returns {Promise<Object>} A promise that resolves with the response containing in-progress multipart uploads.
 * @throws {Error} If any validation fails or an error occurs during the operation.
 */
const listMultiPartUploads = async (bucket) => {
  try {
    const params = {
      Bucket: bucket
    };
    const data = await S3.listMultipartUploads(params).promise();
    return data;
  } catch (error) {
    throw new Error(`Error during listMultiPartUploads: ${error.message}`);
  }
};
/**
 * Aborts all open multipart uploads for a given S3 bucket.
 *
 * @param {string} bucket - The S3 bucket name.
 * @returns {Promise<Object[]>} A promise resolving to an array containing information about aborted uploads.
 * @throws {Error} Throws an error if the operation encounters any issues.
 */
module.exports.abortMultiPart = async (bucket) => {
  try {
    // Fetches information about open multipart uploads
    const data = await listMultiPartUploads(bucket);
    const output = [];

    // Iterates through each in-progress upload (data.Uploads) and aborts it
    for (const obj of data.Uploads || []) {
      const key = obj.Key;
      const uploadId = obj.UploadId;
      // Aborts the multipart upload
      const response = await abortMultiPartHelper(bucket, key, uploadId);
      // Records the abort response for the upload
      output.push(response);
    }

    return output; // Returns an array containing information about aborted uploads
  } catch (e) {
    throw new Error(e.message); // Throws an error if any issues occur during the operation
  }
}

That's the secret sauce! With these Lambda heroes on our side, we're safeguarding against any unwanted ongoing expenses. 🌟💼 Now, that's smart backend development! 🔧👨‍💻


We've reached the finish line, folks! 🏁🎉 Across these two blogs, we've dived deep into everything about multipart upload - theory, practice, highs, lows, you name it! Hope you had a blast and picked up some cool new tricks along the way! 🚀📚
Dear data trailblazers! 🚀📊 Thanks a million for joining this thrilling data adventure! 🛡️🎸 As we navigate the digital realm, remember to stay safe, keep chasing those data dreams, and always seek out new knowledge! 🌟✨

See you down the data highway, fellow voyagers! Until next time - farewell! 👋🌟
