Alexander Kogan

Data Processing With Lambda

What is the most valuable part of an app? Of course, high uptime is important. A good UX and an intuitive interface benefit the user. Fast response times save money, because as we all know: “Time is money”. But all of this has no value without data. Data is the reason why the customer uses your app. Data is the backbone that your app operates on.

So it’s no surprise that you want the data to be correct and up to date. That is why you spend time thinking about where to get the data, how to convert it, and how to make sure it’s the latest information you can get.

Processing data can be a bothersome task. Not everybody is ready to sink their teeth into it. Can we automate these tasks without giving away too much control over our data pipeline?

To solve this dilemma we set off to explore AWS Lambda for our needs. In the following sections I would like to share some of our experiences.

Note: Most comparisons are heavily influenced by our use of AWS and Kubernetes.

Event Driven Approach

A data pipeline is a perfect fit for a Lambda function. You have an input and convert it to an output. Furthermore, those inputs are usually independent, and processing them is idempotent, so you can break them down into single events and process them in parallel. The processing is the only part the developer needs to worry about, since the surrounding services integrate really well.

Take the triggers for example. There are a lot of ways to provide an input for the Lambda functions. Just plug and play.

  • Event driven with queues and notifications
  • File driven with S3 file storage or database triggers
  • Incident driven by logs and alerts
  • More exotic triggers like AWS IoT button or Alexa intents
  • and some more…
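As a rough sketch of what the processing part looks like for an S3 trigger (the bucket and key names here are invented), the function receives one event per batch of uploaded files and handles each record independently:

```python
import urllib.parse

def handler(event, context):
    """Entry point for a hypothetical S3-triggered importer.

    Each record describes one uploaded file; because records are
    independent, Lambda can run many of these handlers in parallel.
    """
    processed = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # S3 delivers object keys URL-encoded (spaces arrive as '+').
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        # ... here you would fetch the file, convert it, and store the result ...
        processed.append(f"{bucket}/{key}")
    return {"processed": processed}
```

Locally you can call `handler` with a hand-written event dict, which is also what makes testing these functions so pleasant.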

Even setting up the pipeline is such a fun experience that you might be sad when it’s already over. You don’t have to worry about polling and retry mechanisms, and faulty events that you cannot process can be separated out with a simple setting (a dead-letter queue). Lambda takes care of this out of the box.

Instant Profit

Some points strike us as instant improvements when using Lambda instead of our classical approach with a dedicated instance. I won’t go into much detail with these, to get them out of the way quickly.

  • No configuration of instances — Bundle up your code and the dependencies in a zip file and it’s good to go.
  • Automatic scaling — If there is more input, you get more parallel executions without additional configuration.
  • No input synchronisation/distribution — AWS takes care of who gets which message and retries or clears messages from the input queue.
  • Cost-efficient — Instead of stopping instances when they are not needed, we spin instances up quickly when they are needed. This eliminates the idle time that you would otherwise pay for.
  • Logging, Monitoring, Alerting — Lambda automatically collects metrics and logs during execution and provides them in Cloudwatch. Based on these you can create alerts and debug the function. Or you create dashboards to see the data flow through your pipeline.

Testing

With all the setup and fuss gone, the only thing left to test is the business logic itself. This is done in fast unit tests. Integration tests then check the few touch points with the database or other services further down the pipe. This leads to understandable and concise tests.
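As an illustration (the transform itself is made up), the business logic stays a plain function, so a unit test needs no Lambda runtime and no AWS mocks at all:

```python
def normalise_record(raw):
    """Hypothetical business logic: clean up one input record."""
    return {"id": int(raw["id"]), "name": raw["name"].strip().lower()}

def test_normalise_record():
    # Plain input/output assertion, runnable with any test runner.
    assert normalise_record({"id": "7", "name": "  Alice "}) == {"id": 7, "name": "alice"}
```

The integration test then only has to cover the call into the database or the next service in the pipe.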

Debugging

If you want to debug a function directly inside AWS, the cycle time is still quite fast. Just upload an updated zip file and execute a prepared test statement to see how the function reacts.

If you write your function in Python you can even edit the code in the AWS web console. This is useful for debugging it in place, but it’s unfortunately not possible for other languages. Because of this I consider writing more functions in Python.

Automated Deployment

Typically, deploying an importer consists of creating a Docker container and running it in a Kubernetes cluster. But since deploying containers is not an option anymore, we have to find an automated and monitored way without much overhead. Automated, monitored, without much overhead: that strikes a chord. The way to deploy Lambda functions, we decided, is of course with Lambda itself.

When a new version of a function is deployed, it is hot-swapped by a triggered deploy function. A nice side effect: if you want pre-deploy hooks executed, you can trigger them in between with little effort. That is what we use for database migrations, for example.

Try it Yourself

Our journey is definitely not over yet. There are still a lot of things to discover about Lambda. But the experience gained so far has definitely whetted our appetite. Don’t be afraid to give AWS Lambda a try yourself.

Postscriptum

I wrote this article at the end of 2018, more than a year ago. Since then we have had more time with AWS Lambda and serverless computing. While planning an article about our further journey, I decided to also publish this one. Because of that, some of the information might be a bit outdated. (You can actually edit small Node.js functions in the web console too.)

Stay tuned for the next part.

This post was sponsored by itemis AG.
