Vincenzo Marragony

Posted on Apr 24, 2023 • Updated on Apr 30, 2023 • Originally published at Medium

Saving 30% on costs and improving infrastructure reliability with profiling

#profiling #node #troubleshooting #reliability

Scenario

As the technical co-founder of MyStoryViewer, a SaaS product that automated Instagram tasks for customers, I was responsible for full-stack web development and cloud engineering. However, as we began to scale and reached 600 concurrent workers, the product's performance suffered, and our customers began to complain.

If you want to read more about the underlying infrastructure, I've written a separate post about how I set up the workers.

Problem

The software responsible for processing customer data was a Node.js (TypeScript) application, with many instances running concurrently on large servers.

To troubleshoot the degraded performance, I profiled the application at runtime to find out what was causing it.

Solution

I started using Clinic.js Doctor to profile my software, and the results showed issues with memory usage and event loop delays.

The problem was clear: something was using too much memory.

Since I was also the developer, I knew that a function I had written was retrieving data from the Instagram API. The API provided information about a profile, and I was storing all of the retrieved data in an array. Since I was processing 500 profiles for each iteration, I had megabytes of unnecessary data stored in memory. The solution was to only save relevant data from the API in memory.
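
A minimal sketch of that kind of fix is shown here. It is not the original code: the field names, the `RawProfile` and `SlimProfile` types, and the `fetchRaw` callback are all hypothetical, and the callback is synchronous only to keep the example short (a real API call would be asynchronous). The idea is to copy the few fields the worker needs into a small object, so the large raw response is no longer referenced once each iteration ends.

```typescript
// Hypothetical shape of a raw profile payload. The real Instagram API response
// is not shown in this article, so every field name here is an assumption.
interface RawProfile {
  id: string;
  username: string;
  followerCount: number;
  [extra: string]: unknown; // biography, media lists, nested objects, ...
}

// Only the fields the worker actually needs (assumed for illustration).
interface SlimProfile {
  id: string;
  username: string;
  followerCount: number;
}

// Copy the needed fields into a fresh object so the large raw payload
// is not kept alive by the results array.
function toSlimProfile(raw: RawProfile): SlimProfile {
  return {
    id: raw.id,
    username: raw.username,
    followerCount: raw.followerCount,
  };
}

// Fetch profiles one by one and store only the slim version.
// `fetchRaw` is a synchronous stand-in for the real API call.
function collectSlimProfiles(
  usernames: string[],
  fetchRaw: (username: string) => RawProfile
): SlimProfile[] {
  const kept: SlimProfile[] = [];
  for (const username of usernames) {
    const raw = fetchRaw(username);
    kept.push(toSlimProfile(raw)); // `raw` becomes unreachable after this iteration
  }
  return kept;
}
```

With 500 profiles per iteration, the array then holds 500 small objects instead of 500 full API responses.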

Results

With a single fix, I managed to cut costs and stop receiving complaints from customers.

The workers were running in parallel on big servers, and with this improvement in memory usage, I could run more instances per server, reducing costs by around 30%. Additionally, since the workers stopped crashing, I was able to improve infrastructure reliability and customer satisfaction.
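
The link between memory usage and cost can be made concrete with a little arithmetic. The sketch below is illustrative only: the 600 workers come from the scenario above, while the instances-per-server figures (7 before the fix, 10 after) are made-up numbers chosen to land near the 30% mentioned, not measurements from the real system. It also assumes every server costs the same.

```typescript
// Number of servers required to host a given number of worker instances.
function serversNeeded(workers: number, instancesPerServer: number): number {
  return Math.ceil(workers / instancesPerServer);
}

// Fraction of servers (and, assuming identical server pricing, of cost) saved
// when each server can host more instances than before.
function costSaving(workers: number, before: number, after: number): number {
  const serversBefore = serversNeeded(workers, before);
  const serversAfter = serversNeeded(workers, after);
  return 1 - serversAfter / serversBefore;
}

// Hypothetical figures: 7 instances per server before the fix, 10 after.
const saving = costSaving(600, 7, 10);
console.log(`${(saving * 100).toFixed(1)}% fewer servers`);
```

With these hypothetical figures the fleet shrinks from 86 servers to 60, which is roughly a 30% reduction.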

On a personal level, I discovered a new way to troubleshoot and improve performance issues.

Conclusion

This article is part of a series where I write about production use cases as a cloud engineer. The focus is on meeting business needs, working as a team, and personal growth.

If you find this article useful or want to share your thoughts about it, feel free to write a comment. I would be happy to read it.

As a side note, I'd like to add that this article was written with the help of AI.

See you in the next chapter!
