It's always amazing to me that AWS S3 hasn't offered a bulk download service from their console. For the systems I'm working on, downloading a set of files is a common activity.
Our original design for bulk downloading was an EC2 machine with the AWS CLI, RabbitMQ, and a small Node.js service installed. The Node service would consume messages off RabbitMQ, use the AWS CLI to retrieve files from S3, zip them, and write the result back to an S3 location. This solution worked OK, but what if there were a way to do it more reliably and more cost effectively?
AWS Lambda and API Gateway to the rescue
We settled on the following strategy:
- AWS Lambda would be responsible for zipping
- AWS API Gateway using WebSockets would be used for triggering and managing the zipping process
- When the .zip file is complete, generate a pre-signed URL for it and pass it back to the WebSocket client.
There are some terrific articles and examples that I'll list below, but we ended up using this example from s3-zip as our baseline.
To wrap the zip lambda function in a websocket we used the aws simple-websockets-chat-app as a blueprint.
Finally, to make deployment easier, we created the SAM yaml files for pushing our code to AWS Cloudformation.
S3 Zip Lambda Function
The payload to the Lambda function is a set of files to zip and the destination instructions for the resulting .zip file:
const payload = {
  region: 'us-east-1',
  bucket: 'demobucket', // all assets in same bucket
  folder: 'audio/', // must have trailing slash
  files: [
    'file1.mp3', // files in above folder
    'file2.mp3'
  ],
  zipBucket: 'demobucket', // bucket for resulting .zip file
  zipFolder: 'temp/', // must have trailing slash
  zipFileName: 'demo1.zip',
  signedUrlExpireSeconds: 60 * 60 * 10 // expiration time of the signed URL for the S3 object
}
API Gateway WebSocket
I've set up a simple API Gateway using CloudFormation:
DemoDevS3ZipWebSocket:
  Type: AWS::ApiGatewayV2::Api
  Properties:
    Name: DemoDevS3ZipWebSocket
    ProtocolType: WEBSOCKET
    RouteSelectionExpression: "$request.body.action"
Deployment:
  Type: AWS::ApiGatewayV2::Deployment
  DependsOn:
    - ZipRoute
  Properties:
    ApiId: !Ref DemoDevS3ZipWebSocket
Stage:
  Type: AWS::ApiGatewayV2::Stage
  Properties:
    StageName: demo
    Description: Demo Deployment
    DeploymentId: !Ref Deployment
    ApiId: !Ref DemoDevS3ZipWebSocket
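The Deployment above depends on a ZipRoute resource that isn't shown in this excerpt. A sketch of what that route might look like, following the simple-websockets-chat-app blueprint (the ZipInteg resource name is an assumption, standing in for the integration that points at the zip Lambda):

```yaml
ZipRoute:
  Type: AWS::ApiGatewayV2::Route
  Properties:
    ApiId: !Ref DemoDevS3ZipWebSocket
    RouteKey: onzip
    AuthorizationType: NONE
    Target: !Join
      - '/'
      - - integrations
        - !Ref ZipInteg # assumed AWS::ApiGatewayV2::Integration for the zip Lambda
```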
The important thing to note here is the RouteSelectionExpression value. The "action" parameter will trigger our "onzip" route.
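To make the routing concrete, here is a hedged sketch of a client-side trigger: API Gateway evaluates `$request.body.action` on each message, so `action: 'onzip'` selects the zip route. The wss:// endpoint URL is a placeholder, not from the post.

```javascript
// Hypothetical client-side message. The "action" field is what the
// RouteSelectionExpression ($request.body.action) matches against.
const message = {
  action: 'onzip', // routes the message to the onzip Lambda integration
  params: {
    region: 'us-east-1',
    bucket: 'demobucket',
    folder: 'audio/',
    files: ['file1.mp3', 'file2.mp3'],
    zipBucket: 'demobucket',
    zipFolder: 'temp/',
    zipFileName: 'demo1.zip',
    signedUrlExpireSeconds: 60 * 60 * 10
  }
}

// In the browser (placeholder URL):
// const socket = new WebSocket('wss://<api-id>.execute-api.us-east-1.amazonaws.com/demo')
// socket.onopen = function () { socket.send(JSON.stringify(message)) }
console.log(message.action) // 'onzip'
```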
To actually facilitate the zipping, we are using s3-zip. When the 'onzip' route is invoked, the params are parsed from event.body.params:
const params = JSON.parse(event.body).params
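The handler excerpts that follow use names like files, folder, and zipBucket without showing where they come from. A minimal sketch of how they'd be pulled out of the parsed payload (the event shape mirrors what API Gateway hands the Lambda):

```javascript
// Sketch: unpack the handler's working variables from the parsed payload.
// The destructured names match those used throughout the handler.
const event = {
  body: JSON.stringify({
    action: 'onzip',
    params: {
      region: 'us-east-1',
      bucket: 'demobucket',
      folder: 'audio/',
      files: ['file1.mp3', 'file2.mp3'],
      zipBucket: 'demobucket',
      zipFolder: 'temp/',
      zipFileName: 'demo1.zip',
      signedUrlExpireSeconds: 60 * 60 * 10
    }
  })
}

const params = JSON.parse(event.body).params
const { region, bucket, folder, files, zipBucket, zipFolder, zipFileName, signedUrlExpireSeconds } = params

console.log(folder + files[0])       // audio/file1.mp3
console.log(zipFolder + zipFileName) // temp/demo1.zip
```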
Do a quick validation to make sure there are actually files to zip:
if (!(files.length > 0)) {
  return {
    statusCode: 500,
    body: JSON.stringify({
      statusCode: 'error',
      message: 'No files to zip'
    })
  }
}
In order to communicate back through the WebSocket, we needed to use the ApiGatewayManagementApi. The constructor requires the deployed WebSocket address as the endpoint:
const stage = event.requestContext.stage
const domainName = event.requestContext.domainName

// Allows the Lambda function to communicate with the WebSocket client
const api = new AWS.ApiGatewayManagementApi({
  endpoint: 'https://' + domainName + '/' + stage
})

// event.requestContext.connectionId contains the WebSocket id
const apiParams = {
  ConnectionId: event.requestContext.connectionId,
  Data: null
}
Once the ApiGatewayManagementApi object is available, we can communicate back through the WebSocket by calling:
apiParams.Data = 'some msg -or- JSON.stringify(obj)'
await api.postToConnection(apiParams).promise()
This call will trigger a WebSocket 'onmessage' event.
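To make the round trip concrete, here is a small sketch of the message shape (the field names mirror the handler above; the ConnectionId value is a placeholder). The Lambda stringifies an object into apiParams.Data, and the browser receives that exact string as event.data in its onmessage handler:

```javascript
// Sketch of the message round trip between the Lambda and the browser.
const progressUpdate = { statusCode: 'progress', pctComplete: 42.5 }

// Server side: what gets handed to postToConnection
const apiParams = { ConnectionId: 'abc123', Data: JSON.stringify(progressUpdate) }

// Client side: what onmessage sees as event.data, and how it's decoded
const received = JSON.parse(apiParams.Data)
console.log(received.statusCode)              // 'progress'
console.log(Math.round(received.pctComplete)) // 43
```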
Next, the main try/catch block uses the s3-zip archive method and AWS.S3.upload to stream the content into a zip file. Streaming through the archive method means the process never needs to download or store the target files in any kind of temporary location.
try {
  const body = s3Zip.archive({ region: region, bucket: bucket }, folder, files)
  const zipKey = zipFolder + zipFileName
  const zipParams = { params: { Bucket: zipBucket, Key: zipKey } }
  const zipFile = new AWS.S3(zipParams)
  const promise = new Promise((resolve, reject) => {
    zipFile.upload({ Body: body })
      .on('httpUploadProgress',
        async function (evt) {
          evt.statusCode = 'progress'
          // totalBytes (the combined size of the source files) is assumed
          // to have been computed earlier in the handler
          evt.pctComplete = (100 * evt.loaded / totalBytes)
          apiParams.Data = JSON.stringify(evt)
          // communicate with the WebSocket client, returning the pctComplete
          await api.postToConnection(apiParams).promise()
        })
      .send(async function (e, r) {
        if (e) {
          e.statusCode = 'error'
          reject(e)
        } else {
          r.statusCode = 'success'
          r.Files = files
          r.Folder = folder
          // s3 is a plain AWS.S3 client created earlier in the handler
          r.SignedUrl = s3.getSignedUrl('getObject', {
            Bucket: zipBucket,
            Key: r.Key,
            Expires: signedUrlExpireSeconds
          })
          resolve(r)
        }
      })
  })
  // wait for the s3-zip process to complete
  const res = await promise
  // push the zip results back through the WebSocket
  apiParams.Data = JSON.stringify(res)
  await api.postToConnection(apiParams).promise()
  return {
    statusCode: 200,
    body: JSON.stringify(res)
  }
} catch (e) {
  // send error messages back to the WebSocket client
  apiParams.Data = JSON.stringify(e)
  await api.postToConnection(apiParams).promise()
  return {
    statusCode: 500,
    body: JSON.stringify(e)
  }
}
As the upload to S3 happens, the S3 upload method emits 'httpUploadProgress' events carrying a value for event.loaded. We push that event object back through the WebSocket using await api.postToConnection(apiParams).promise(). What's nice is that this lets our web frontend give the user a real-time progress display on the zipping process.
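The progress handler divides by totalBytes, which isn't defined in the excerpt. One way to derive it, and this helper is my sketch rather than code from the post, is to sum ContentLength across the source objects using the standard AWS SDK v2 headObject call:

```javascript
// Sketch: compute totalBytes by summing the sizes of the source objects.
// s3 is an AWS.S3 client; bucket/folder/files match the payload fields.
async function getTotalBytes (s3, bucket, folder, files) {
  let total = 0
  for (const file of files) {
    const head = await s3.headObject({ Bucket: bucket, Key: folder + file }).promise()
    total += head.ContentLength
  }
  return total
}

// Usage (inside the handler, before starting the upload):
// const totalBytes = await getTotalBytes(new AWS.S3(), bucket, folder, files)
```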
So far, we haven't hit any problems; we've tested this process on 5 GB worth of target files without issue.
When the zip file is complete, the send() callback is invoked, where we make a quick call to getSignedUrl():
r.SignedUrl = s3.getSignedUrl('getObject', {
  Bucket: zipBucket,
  Key: r.Key,
  Expires: signedUrlExpireSeconds
})
On a successful zip, the signed URL is generated and the Lambda function sends it in a message through the WebSocket. On the client side, a simple parse extracts the SignedUrl and triggers the browser to begin the download, as if the user had clicked on the .zip file.
The JavaScript on the calling .html page looks something like this:
socket.onmessage = function (event) {
  if (event.data) {
    var data = JSON.parse(event.data)
    // zip progress
    if (data.statusCode === 'progress') {
      var pct = Math.round(data.pctComplete)
      progressElement.text('Processing ... ' + pct + '%')
    }
    // zip completed
    if (data.statusCode === 'success') {
      if (Object.prototype.hasOwnProperty.call(data, 'SignedUrl')) {
        window.location.assign(data.SignedUrl)
      } else {
        console.error('api did not return SignedUrl')
      }
      socket.close()
    }
  }
}
So that's it. This satisfied our requirement for a Lambda-based zip function with a progress meter, using Node.js.
Have a look at the README in the source. It will give instructions for loading both the AWS Api Gateway Websocket and Lambda function into AWS from the command line using SAM and CloudFormation.
There is also a test script for connecting to the websocket and confirming that the SignedUrl for the zipped assets is being returned correctly.
Plus there is a simple web page with the javascript to show you how to connect to the websocket and invoke the 'onzip' route.
Thank you for reading!
The source code is available on GitHub.