EVERYONE IN THE CITY, EVERYWHERE IN 15 MINUTES.
That's our motto at Check Technologies, a shared-mobility operator in the Netherlands where users can rent e-mopeds, e-kickscooters or e-cars. When founding the company, Check decided to hire a team of engineers to build a custom platform rather than use an off-the-shelf SaaS product.
This team of just 6 engineers is responsible not only for building, maintaining and improving the Check application used by over 800K users today, but also for building internal tooling, performing data analyses and taking care of hosting the platform.
From launching back in February 2020 until now, the company has seen significant growth in users, trips and vehicles.
| Date | Number of vehicles |
|---|---|
| 1 Jan 2021 | 1,170 |
| 1 Jan 2022 | 3,146 |
| 1 Jan 2023 | 8,160 |
With this new blog, we (Check's engineering team) would like to share some of the technical challenges we had to overcome, the solutions we came up with, and the insights we've gained along the way. Expect write-ups from different engineers within the team who will share their thoughts on topics related to their domain, such as app development, cloud infrastructure and data engineering.
First up: how one million API requests an hour pushed us to build a Rust microservice that processes fleet updates
A microservice, why?
Up until the start of 2022, the Check backend was hosted as an Elastic Beanstalk web application. Even though this AWS service proved reliable for getting us off the ground, we had run into its limits multiple times: getting the autoscaling configuration right was rough, costs were growing month after month and, most importantly, it's not made for hosting a microservice infrastructure.
By making the move to Kubernetes starting that year, we paved the way for building smaller applications that can run (and scale) independently. Microservices, as you'd call them.
Over 1 million requests every hour
It all started years back, during a moment of celebration: we had reached the impressive number of 1 million requests an hour. A moment worth cheering, yet also one in which we discovered something remarkable. When we analysed the distribution of these requests, we concluded that over 60% of them were webhooks.
Webhooks
At Check, users can rent different types of vehicles in our app. At the time of this project, we had integrated the NIU N1S and Segway E110S mopeds, as well as the Segway Ninebot MAX kickscooter. Both of these providers have developed APIs for executing commands on their vehicles (turning them on and off) and for receiving information about their vehicles (location, mileage, battery percentage). Our backend exposed an API route that these providers used to POST vehicle information to, in the form of a webhook.
For a moped's location, this API route processed the request as follows (a rough code sketch follows the list):
- Receive the update (e.g. moped [x] is now at [coordinates])
- Store raw location and time in the database
- Update the corresponding vehicle's location in the database
- Send a successful response
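For illustration, such a synchronous handler could look roughly like the sketch below. The route, schema and names are made up, and the pre-microservice backend wasn't written in Rust; the sketch merely expresses the flow, with all database work happening before the response is sent:

```rust
use rocket::http::Status;
use rocket::{post, routes, State};
use tokio_postgres::{Client, NoTls};

// Hypothetical route and table names; every step runs before the provider
// gets its response.
#[post("/webhooks/locations/<vehicle_id>", data = "<body>")]
async fn location_webhook(vehicle_id: i64, body: String, db: &State<Client>) -> Status {
    // Store the raw location and time in the database ...
    if db
        .execute(
            "INSERT INTO raw_locations (vehicle_id, received_at, payload) VALUES ($1, now(), $2)",
            &[&vehicle_id, &body],
        )
        .await
        .is_err()
    {
        return Status::InternalServerError;
    }
    // ... then update the corresponding vehicle's location (payload parsing omitted).
    if db
        .execute(
            "UPDATE vehicles SET last_seen = now(), last_payload = $2 WHERE id = $1",
            &[&vehicle_id, &body],
        )
        .await
        .is_err()
    {
        return Status::InternalServerError;
    }
    Status::Ok // only now does the provider receive its response
}

#[rocket::main]
async fn main() -> Result<(), rocket::Error> {
    // Hypothetical connection string.
    let (db, conn) = tokio_postgres::connect("host=localhost user=app dbname=app", NoTls)
        .await
        .expect("database connection");
    tokio::spawn(conn); // background task that drives the socket I/O
    rocket::build()
        .manage(db)
        .mount("/", routes![location_webhook])
        .launch()
        .await?;
    Ok(())
}
```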
Even though this request was kept as small as possible, it still took our backend around 250ms to process. We had around 5,000 vehicles back then, each sending an update every 5 seconds while turned on, so these updates alone kept our backend busy for quite some time, all while it also had to process app users' requests.
Third-party bombs
Our platform heavily depends on this integration for processing a provider's constant stream of vehicle updates. Even though this integration worked flawlessly most of the time, every once in a while one of the providers would have a small hiccup on their side. These hiccups not only meant not receiving their vehicles' updates for a few minutes, but also that we were about to receive something we internally referred to as 'a bomb': a big batch of vehicle updates containing everything that happened during the hiccup. In short: we would sometimes get half an hour's worth of vehicle updates within a few seconds.
Depending on how big they were, these 'bombs' were notorious for causing instability within our platform. Our backend was unable to process both the user traffic and all these vehicle updates at the same time.
Off to better things
We longed for a situation where user traffic and fleet-update traffic would no longer be processed by the same service. Given that over 60% of incoming traffic during peak hours consisted of webhooks, we decided this was the perfect chance to put our new Kubernetes infrastructure to the test, and so we started building our first microservice.
Rust
Due to the sheer volume and relative simplicity of these webhook requests, with a clear input (the webhook) and output (a 200 OK status code), we decided to build a proof of concept using Rust. Rust is a low-level language, primarily known for being strongly typed, memory safe and offering great performance.
The stack
The proof of concept was built using the following Cargo crates:
- rocket (web framework)
- serde (serialization and deserialization)
- tokio (async runtime)
- postgres (PostgreSQL client)
- redis (Redis client)
The project compiles into two separate binaries: one for the API service and one for the consumer service.
Fleet webhook API service
The first component of our Rust microservice is the 'Fleet webhook API'. This service exposes a rocket API layer with an endpoint for each provider to send their vehicle updates to.
Once this service receives a webhook, it pushes the raw body onto a Redis queue and immediately responds with a '200 OK'. By not reading from or writing to the database during the request, we cut the response time by more than 10x: these little requests now take at most 25ms!
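In essence, the handler is just an enqueue. Below is a minimal sketch of that idea; the route, queue name and connection handling are illustrative assumptions, not our production code:

```rust
use redis::aio::MultiplexedConnection;
use redis::AsyncCommands;
use rocket::http::Status;
use rocket::{post, routes, State};

// Hypothetical route and queue name. Requires the redis crate's
// "tokio-comp" feature for the async connection.
#[post("/webhooks/niu", data = "<body>")]
async fn niu_webhook(body: String, redis: &State<MultiplexedConnection>) -> Status {
    // A multiplexed connection is cheap to clone; all clones share one socket.
    let mut conn = redis.inner().clone();
    // Enqueue the raw payload; no database reads or writes happen in-request.
    match conn.lpush::<_, _, ()>("fleet:updates", body).await {
        Ok(()) => Status::Ok,
        Err(_) => Status::InternalServerError,
    }
}

#[rocket::main]
async fn main() -> Result<(), rocket::Error> {
    let client = redis::Client::open("redis://127.0.0.1/").expect("valid redis URL");
    let conn = client
        .get_multiplexed_async_connection()
        .await
        .expect("redis connection");
    rocket::build()
        .manage(conn)
        .mount("/", routes![niu_webhook])
        .launch()
        .await?;
    Ok(())
}
```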
Fleet consumer service
The second component of our Rust microservice is the 'Fleet consumer'. This binary is connected to the same Redis queue and is responsible for actually processing the updates.
It updates the corresponding vehicle in the application's database and stores a raw entry of the update in a Timescale database (PostgreSQL extended to handle large sets of time-series event data).
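Sketched with the same illustrative queue name, and a made-up schema for the raw entries, the consumer loop looks roughly like this:

```rust
use redis::AsyncCommands;
use tokio_postgres::NoTls;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = redis::Client::open("redis://127.0.0.1/")?;
    let mut queue = client.get_multiplexed_async_connection().await?;

    // Hypothetical connection string; in our case this points at Timescale.
    let (db, conn) =
        tokio_postgres::connect("host=localhost user=fleet dbname=fleet", NoTls).await?;
    tokio::spawn(conn); // background task that drives the socket I/O

    loop {
        // Block until an update arrives; a timeout of 0.0 means "wait forever"
        // (recent versions of the redis crate take the timeout in seconds as a float).
        let item: Option<(String, String)> = queue.brpop("fleet:updates", 0.0).await?;
        let Some((_queue_name, payload)) = item else { continue };

        // Store the raw entry (made-up schema) ...
        db.execute(
            "INSERT INTO raw_updates (received_at, payload) VALUES (now(), $1)",
            &[&payload],
        )
        .await?;
        // ... then update the vehicle's state in the application database
        // (payload parsing and the UPDATE are omitted here).
    }
}
```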
Separate binaries
The great thing about this setup is that we're able to scale both of these components independently. Because the consumers that process the updates do most of the heavy lifting, we usually run around three times as many Kubernetes Pods for the consumer as for the webhook API.
New situation
Dealing with bombs
This means that user traffic, as well as back-office traffic, is now handled independently from fleet-update traffic. When a third party has a hiccup, resulting in loads of fleet updates to process at once, our users won't experience any latency in their apps: the microservice may be busy working through the backlog, but the main API keeps sailing smoothly.
Extensibility
This microservice was built before Check released e-cars on its platform. Yet when we integrated e-cars using Invers' Cloudboxx, we were able to swiftly implement their AMQP functionality to process live information about our cars, proving the extensibility of the service.
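As an illustration of what that consuming end can look like: the sketch below uses the lapin crate (an assumption on our part, as is the broker URI and queue name) to subscribe to an AMQP queue and feed deliveries into the same processing path as the webhooks:

```rust
use futures_lite::stream::StreamExt;
use lapin::{options::*, types::FieldTable, Connection, ConnectionProperties};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Hypothetical broker URI and queue name.
    let conn = Connection::connect(
        "amqp://guest:guest@localhost:5672/%2f",
        ConnectionProperties::default(),
    )
    .await?;
    let channel = conn.create_channel().await?;
    let mut consumer = channel
        .basic_consume(
            "cloudboxx.updates",
            "fleet-consumer",
            BasicConsumeOptions::default(),
            FieldTable::default(),
        )
        .await?;

    while let Some(delivery) = consumer.next().await {
        let delivery = delivery?;
        // Hand the raw payload to the same processing pipeline as the webhooks.
        println!("received {} bytes", delivery.data.len());
        delivery.ack(BasicAckOptions::default()).await?;
    }
    Ok(())
}
```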
Independent scaling
We're able to scale our main API independently from this fleet-update microservice. With 60% of our traffic being fleet updates, we were able to significantly downscale our main API. Additionally, Rust's focus on performance and minimal resource usage allowed us to reduce costs along the way.
Conclusions
At Check Technologies, our engineering efforts go beyond just adding new features for our users; we actively strive to enhance the efficiency, scalability and resilience of the platform. By transitioning to Kubernetes and developing our first microservice in Rust, we were able to overcome the challenges associated with a high volume of webhook requests, ensuring a smooth experience for our users.
The adoption of a microservices architecture, in combination with our first Rust-based solution, has revolutionised the way we process fleet updates. The 'Fleet webhook API' and 'Fleet consumer' services, operating as independent components, enable independent scaling, reducing latency and enhancing overall system stability. We have effectively mitigated the impact of 'third-party bombs', allowing our main API to sail smoothly even during peak traffic hours.
As we look back on the progress achieved in our technology stack, we are enthusiastic about the opportunities that lie ahead. At Check Technologies, we are dedicated to raising the bar, finding innovative solutions, and ensuring that our platform continues to be at the forefront of shared mobility technology. The journey has been challenging, but the success of our fleet microservice marks a high note, laying the foundation for sustained growth and ongoing technological advancements in the field of shared mobility.
Top comments (3)
Rust performance is awesome, but the extra safety and great tooling make it my go-to default today.
I would advise switching from Rocket to Axum, or Loco (Axum-based). It seems the Rocket team doesn't maintain it at the same level.
Congratulations on this success. You did it.
I also experienced a similar challenge with my WhatsApp AI Assistant.
It all started as a small OpenAI assistant that I built for fun. Then I showed it to some people around me. The WhatsApp integration was a wow moment for many people in my country, Burkina Faso, and the neighbouring countries of West Africa. In only one day I got more than 200k requests in an hour. This is nothing compared to your webhook API load, but it was significant to me.
Late that night, I split my script into 3 Node.js/Total.js microservices (the WhatsApp integration side, the API side and a Total.js Flow instance for integration), and of course Redis.
Basically, Total.js Flow along with TMS was the Pub/Sub broker that distributes data via WebSockets.
The first microservice, connected to WhatsApp, is a producer when users ask the assistant questions and a consumer when the assistant replies.
The API microservice plays the inverse role.
Total.js Flow sits in between to queue and distribute data.
Redis was used to cache users' question history for the OpenAI API.
Thanks to the cluster capabilities of Total.js, I was able to autoscale all 3 main microservices.
Today I've added more microservices for storing files, printing files to PDF/DOCX for users, and many more, and I handle up to 300-400k requests per hour.
For now I don't use Rust. I learned Rust for machine learning 2 years ago and I know it's an amazing language. I can't wait to use it in one of my future big projects.
It was a pleasure reading how you tackled this. Looking forward to following you and learning about your future challenges. Wishing all of you at Check Technologies the best, and happy coding!
Managing one million API requests an hour is no small feat. The decision to build a Rust microservice for processing fleet updates is both strategic and forward-thinking. Your team's dedication to custom solutions has clearly fueled the impressive growth in users, trips, and vehicles. Kudos to Check Technologies for this remarkable achievement.