DEV Community

Should we consider migrating to Amazon EventBridge from Amazon SNS + SQS?

In early 2019, Betclic was at a turning point: we decided to move to a microservices approach and to migrate to the cloud.

The first step of this change was the implementation of an Event-Driven Architecture (EDA).

Betclic presentation

Before discussing architecture and EDA, it helps to know what Betclic does in order to understand our choices.

Betclic is a French online gambling company. It operates in sports betting, horse race betting, and online poker.

A particularity of gambling is that our traffic is not predictable. We see high traffic peaks during major events and on weekends. Here is an example of traffic during a soccer event:

Example of traffic

EDA wishlist

At the beginning of 2019, we ran a workshop to identify the key features of our ideal EDA. We wanted something:

  • allowing multiple reads: the same message can be read by various services

  • with error recovery: in case of error, we would like to replay events

  • easy to monitor: particularly with Datadog, our centralized monitoring solution

  • with schema registry

  • language-agnostic: we use 3 main languages, .NET, Kotlin, and TypeScript (for Lambda), and the chosen solution must work with all of them

  • with analytics capabilities: we would like to load all our EDA events into our data lake

  • fully managed

  • with auto scaling: an essential feature for us; we have unpredictable traffic with big spikes at the end of matches, so the EDA must be able to absorb them

  • with low latency and high availability: also essential; we want our EDA to serve critical use cases for our end users, so it must be as close to real time and as available as possible

  • well-integrated with AWS services: EDA is one of our first building blocks in the cloud, and we want it to help us migrate to AWS

  • easy to use: we always want something easy to use :) but it matters even more here because the EDA is a centralized solution that most of our developers will use (and back in 2019, most of our developers weren't cloud developers)

  • cost-effective: how much does the solution cost for what it delivers?

2019: Amazon SNS + Amazon SQS: The best solution

Remember, this was the beginning of 2019: neither Amazon EventBridge (announced in July 2019) nor Amazon MQ for RabbitMQ (announced in November 2020) was available yet.

Based on our needs, we decided to combine Amazon SNS with Amazon SQS.

Amazon SNS is essential in our solution because it enables multiple reads of the same event and gives us a single, centralized place to publish.

Amazon SNS — Publisher

  • All events are published in a centralized Amazon SNS

  • Each functional domain has its own topic

  • Each functional domain can publish only on its own topic

  • Lambda/Fargate/On-Premise publish in the same manner
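
As a rough illustration of what publishing "in the same manner" can look like, here is a minimal sketch that builds the parameters of an SNS `Publish` call. It is not Betclic's code: the topic ARNs (account ID and region included), the `eventType` attribute, and the payload fields are all hypothetical. The resulting dictionary uses the real `TopicArn`, `Message`, and `MessageAttributes` parameter names, so it could be passed to an SDK client, for example boto3's `sns.publish(**request)`.

```python
import json

# Hypothetical mapping: each functional domain owns exactly one SNS topic.
TOPIC_ARNS = {
    "betting": "arn:aws:sns:eu-west-1:123456789012:betting-events",
    "payment": "arn:aws:sns:eu-west-1:123456789012:payment-events",
}


def build_publish_request(domain: str, event_type: str, payload: dict) -> dict:
    """Build the parameters of an SNS Publish call for one domain event."""
    if domain not in TOPIC_ARNS:
        raise ValueError(f"unknown functional domain: {domain}")
    return {
        "TopicArn": TOPIC_ARNS[domain],
        "Message": json.dumps(payload),
        # Message attributes are what subscription filters match on.
        "MessageAttributes": {
            "eventType": {"DataType": "String", "StringValue": event_type},
        },
    }


request = build_publish_request("betting", "BetPlaced", {"betId": "42", "stake": 10})
print(request["TopicArn"])
```

Because the function only builds a dictionary, the same code path can be shared by a Lambda function, a Fargate task, or an on-premise service, with only the SDK call differing per runtime.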

Amazon SNS + Amazon SQS — Consumer

  • A service consumes only events from another functional domain

  • Each service has its own queue with its own filter

  • A filter is based on Amazon SNS message attributes and is limited to 10 attributes

  • Fargate and on-premise services consume in the same manner

  • For Lambda, we prefer to create an Amazon SNS subscription of type Lambda with a filter and consume directly from Amazon SNS
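
To make the per-service filter concrete, here is a minimal sketch of how an attribute-based filter decides whether a message reaches a queue. The policy format mirrors the JSON that SNS accepts in a subscription's `FilterPolicy` attribute, but the `eventType` attribute and its values are hypothetical, and the matcher below only handles exact string matching (real filter policies support more operators).

```python
# Hypothetical filter policy for one consumer queue: keep only two event types.
FILTER_POLICY = {"eventType": ["BetPlaced", "BetSettled"]}


def matches_filter(filter_policy: dict, message_attributes: dict) -> bool:
    """Simplified exact-match check on SNS-style message attributes."""
    for name, allowed_values in filter_policy.items():
        attribute = message_attributes.get(name)
        if attribute is None or attribute["StringValue"] not in allowed_values:
            return False
    return True


placed = {"eventType": {"DataType": "String", "StringValue": "BetPlaced"}}
cancelled = {"eventType": {"DataType": "String", "StringValue": "BetCancelled"}}

print(matches_filter(FILTER_POLICY, placed))     # True
print(matches_filter(FILTER_POLICY, cancelled))  # False
```

The filtering itself is done by SNS before delivery, so a queue never receives (or pays for) the events its service does not care about.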

Custom features

This architecture covers most of the wishlist features listed above, except:

  • schema registry

  • error recovery

  • analytics capabilities

We decided to make a custom implementation for these features.

Error recovery and analytics capabilities

  • Error recovery and analytics capabilities are implemented in the same architecture

  • All events are stored in our Data Lake with the help of Kinesis Data Firehose and Amazon S3

  • In case of an issue, we can replay events by extracting them from our data lake to Amazon S3. The new S3 object triggers a Lambda function, which pushes the events to the correct topic in Amazon SNS.

  • If connectivity between our on-premise infrastructure and AWS fails, we store events in a file. Once connectivity is restored, Fluentd pushes the events from the file to the correct topic in Amazon SNS

One feature still missing from error recovery is the ability to replay events to a specific SQS queue. All events are replayed to an SNS topic, so every service that consumes from that topic receives them again. **Idempotence** is therefore necessary for consumer services.
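
A minimal sketch of what idempotent consumption can look like is shown below. It assumes each event carries a unique identifier (the `eventId` field is hypothetical) and keeps the already-processed IDs in memory; a real service would need a durable store for them instead.

```python
class IdempotentConsumer:
    """Processes each event at most once, keyed by a unique event ID."""

    def __init__(self, handler):
        self._handler = handler
        # In-memory set for illustration only; not shared or durable.
        self._seen_ids = set()

    def consume(self, event: dict) -> bool:
        """Return True if the event was processed, False if it was a duplicate."""
        event_id = event["eventId"]  # hypothetical field name
        if event_id in self._seen_ids:
            return False
        self._handler(event)
        self._seen_ids.add(event_id)
        return True


processed = []
consumer = IdempotentConsumer(processed.append)

consumer.consume({"eventId": "e-1", "type": "BetPlaced"})
consumer.consume({"eventId": "e-1", "type": "BetPlaced"})  # replayed duplicate
print(len(processed))  # 1
```

With a guard like this, replaying a whole topic is safe for services that already handled the events the first time.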

As for the schema registry, nothing is in production for now. We have only developed a portal listing all the available events and a debug console to validate the event format in the Stage environment.

Cost

  • 550 million events are published per month

  • The average size of an event is 630 B

  • 1.1 billion events are delivered to consumers (on average, 2 services consume each event)

  • 1.5 billion Amazon SQS requests are made to consume and delete events

Publication cost in Amazon SNS

  • 550 million events published = $ 274.50

Consumption cost in Amazon SNS + Amazon SQS

  • Amazon SNS: Data transfer = 1.1 billion events x 630 B = 693 GB transferred, at about $ 0.09 per GB out = $ 63.78

  • Amazon SQS: Events sent = 1.1 billion events = $ 440

  • Amazon SQS: Receive + Delete = 1.5 billion requests = $ 600

Consumption cost = $ 1103.78

Error recovery + analytics capabilities

  • 550 million events published to Kinesis Data Firehose = $ 104.50

  • 346.5 GB of data transferred from Amazon SNS to Kinesis Data Firehose = $ 31.09

  • Kinesis Data Firehose cost = $ 81.30

  • S3 cost with 1 month retention = 346.5 GB of data = $ 7.97

Total cost = $ 1603.14

The Lambda cost for replay isn’t estimated because we don’t know how many replays we run each month.

The price details are available in the AWS calculator here.
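
The monthly total can be re-derived with a short calculation. This is only a sketch: the per-million unit prices are assumptions inferred from the figures above rather than quoted price-list values, and the data transfer, Firehose, and S3 lines are taken as quoted. The publication line is computed on 549 million billable requests (assuming the first million SNS requests each month are free), which gives $ 274.50 and is the value that makes the line items add up to the stated total of $ 1603.14.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)

published = Decimal(550_000_000)             # events published per month
delivered = Decimal(1_100_000_000)           # events delivered to consumers
sqs_receive_delete = Decimal(1_500_000_000)  # SQS receive + delete requests

# Assumed unit prices in USD per million requests, inferred from the figures.
SNS_PUBLISH = Decimal("0.50")
SQS_REQUEST = Decimal("0.40")
SNS_TO_FIREHOSE = Decimal("0.19")

# Assumption: the first million SNS requests of the month are free.
publication = (published - MILLION) / MILLION * SNS_PUBLISH

consumption = (
    Decimal("63.78")                              # SNS data transfer, as quoted
    + delivered / MILLION * SQS_REQUEST           # SQS events sent
    + sqs_receive_delete / MILLION * SQS_REQUEST  # SQS receive + delete
)

recovery_and_analytics = (
    published / MILLION * SNS_TO_FIREHOSE  # SNS deliveries to Firehose
    + Decimal("31.09")                     # SNS to Firehose transfer, as quoted
    + Decimal("81.30")                     # Kinesis Data Firehose, as quoted
    + Decimal("7.97")                      # S3 storage, as quoted
)

total = publication + consumption + recovery_and_analytics
print(publication, consumption, recovery_and_analytics, total)
```

`Decimal` is used instead of floats so that the cents add up exactly.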

2021: Amazon EventBridge?

As we saw, in 2019 our best choice for EDA was Amazon SNS + Amazon SQS. Is that still true in 2021?

Many interesting features have been released for Amazon EventBridge over the past 2 years:

According to this, all the features of our wishlist are natively supported.

We can imagine doing something like this for publishing events with Amazon EventBridge:

Amazon EventBridge — Publisher

  • All events are published in a centralized Amazon EventBridge

  • Each functional domain has its own Event Bus

  • Each functional domain can publish only in its own Event Bus

  • Lambda/Fargate/On-Premise publish in the same manner

  • Schema Registry is natively supported: events are defined in Schema Registry, and developers generate code based on these events (code bindings aren’t supported in the languages that we use at Betclic)

  • Schema Discovery shows which events are published to Amazon EventBridge and helps identify events that don’t respect the contract

  • Schema Discovery and Schema Registry are fully managed

Schema Discovery is generally activated on-demand in Production.
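
For comparison with SNS, here is a minimal sketch of the entry a publisher would send to EventBridge. The field names (`EventBusName`, `Source`, `DetailType`, `Detail`) are those of the `PutEvents` API, but the bus naming scheme, the `betclic.<domain>` source convention, and the payload are hypothetical. The entry could be sent with an SDK client, for example boto3's `events.put_events(Entries=[entry])`.

```python
import json


def build_put_events_entry(domain: str, event_type: str, payload: dict) -> dict:
    """Build one entry for the EventBridge PutEvents API."""
    return {
        "EventBusName": f"{domain}-bus",  # hypothetical: one event bus per domain
        "Source": f"betclic.{domain}",    # hypothetical source naming convention
        "DetailType": event_type,
        "Detail": json.dumps(payload),    # Detail must be a JSON string
    }


entry = build_put_events_entry("betting", "BetPlaced", {"betId": "42", "stake": 10})
print(entry["EventBusName"], entry["DetailType"])
```

Unlike SNS message attributes, the whole `Detail` body is visible to rules, which is what makes content-based filtering possible on the consumer side.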

For consuming events, we can imagine something like this:

Amazon EventBridge — Consumer

  • A service consumes only events from another functional domain

  • To send events to other accounts, we need to publish to a new Amazon EventBridge Event Bus

  • Events can be filtered on all the data of the events

  • Amazon EventBridge can’t publish directly to Fargate or On-Premise; in this case, we need to go through an Amazon SQS queue to consume

  • Each service has its own EventBridge Rule, its own queue with its own filter

  • Fargate and on-premise services consume in the same manner

  • For Lambda, we consume directly from Amazon EventBridge through a rule
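
Filtering "on all the data of the events" can be sketched as follows. The pattern has the same shape as the JSON given to a rule's `EventPattern`, but the event types and the `sport` field are hypothetical, and the matcher only covers exact matching on nested fields (real event patterns support more operators, such as prefix or numeric matching).

```python
# Hypothetical rule pattern: placed bets, filtered on a field of the event body.
EVENT_PATTERN = {
    "detail-type": ["BetPlaced"],
    "detail": {"sport": ["football", "tennis"]},
}


def matches_pattern(pattern: dict, event: dict) -> bool:
    """Simplified exact-match check for an EventBridge-style event pattern."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            # Nested object: every nested field must match as well.
            nested = event[key]
            if not isinstance(nested, dict) or not matches_pattern(expected, nested):
                return False
        elif event[key] not in expected:
            return False
    return True


football_bet = {"detail-type": "BetPlaced", "detail": {"sport": "football", "stake": 10}}
poker_event = {"detail-type": "HandPlayed", "detail": {"game": "poker"}}

print(matches_pattern(EVENT_PATTERN, football_bet))  # True
print(matches_pattern(EVENT_PATTERN, poker_event))   # False
```

The main difference from the SNS filter is that the pattern reaches into the event body itself, so publishers do not have to duplicate filterable fields into separate attributes.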

Error recovery and analytics capabilities

Amazon EventBridge — Error recovery and analytics capabilities

  • Error recovery and analytics capabilities are fully managed with Amazon EventBridge

  • All events are stored in our Data Lake with the help of Kinesis Data Firehose and Amazon S3

  • In case of an issue, we can replay events directly from Amazon EventBridge

  • Events can be replayed to a specific rule or to all the rules

Cost

Using the same volumes as for Amazon SNS + SQS, we get the following.

Publication cost in Amazon EventBridge

  • 550 million events published = $ 550

Consumption cost in Amazon EventBridge

  • Amazon EventBridge: Events transferred to other Event Buses = 1.1 billion events = $ 1100

  • Amazon SQS: Events sent = 1.1 billion events = $ 440

  • Amazon SQS: Receive + Delete = 1.5 billion requests = $ 600

Consumption cost = $ 2140

Error recovery + analytics capabilities

  • Kinesis Data Firehose cost = $ 81.30

  • S3 cost with 1 month retention = 346.5 GB of data = $ 7.97

  • EventBridge archive for 3 months = 346.5 GB x 3 months = 1039 GB x $ 0.11 = $ 114.29

Total cost = $ 2893.56

EventBridge replay and Schema Registry aren’t estimated because we don’t know how many replays/discoveries we will do monthly.

The price details (without the EventBridge archive) are available in the AWS calculator here.
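
The EventBridge total can be re-derived in the same way. As a sketch under stated assumptions: the unit prices ($ 1.00 per million EventBridge events, for both publication and cross-bus transfer, and $ 0.40 per million SQS requests) are inferred from the figures above rather than quoted price-list values, and the Firehose, S3, and archive lines are taken as quoted.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)

published = Decimal(550_000_000)             # events published per month
delivered = Decimal(1_100_000_000)           # events sent to other event buses
sqs_receive_delete = Decimal(1_500_000_000)  # SQS receive + delete requests

# Assumed unit prices in USD per million, inferred from the figures.
EVENTBRIDGE_EVENT = Decimal("1.00")
SQS_REQUEST = Decimal("0.40")

publication = published / MILLION * EVENTBRIDGE_EVENT

consumption = (
    delivered / MILLION * EVENTBRIDGE_EVENT       # cross-bus transfer
    + delivered / MILLION * SQS_REQUEST           # SQS events sent
    + sqs_receive_delete / MILLION * SQS_REQUEST  # SQS receive + delete
)

recovery_and_analytics = (
    Decimal("81.30")     # Kinesis Data Firehose, as quoted
    + Decimal("7.97")    # S3 storage, as quoted
    + Decimal("114.29")  # EventBridge archive for 3 months, as quoted
)

total = publication + consumption + recovery_and_analytics
print(publication, consumption, recovery_and_analytics, total)
```

Written this way, the calculation shows that the per-event price, paid once on publication and again on each cross-bus transfer, accounts for most of the EventBridge total.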

Conclusion

Amazon EventBridge natively supports all the features that we want, so there is no need to implement custom features.

However, cross-account EventBridge is more complicated than Amazon SNS because 2 rules need to be updated to consume a new event in a domain. In contrast, SNS needs only 1 new subscription with the correct filter.

It would have been nice if EventBridge could deliver events directly to Fargate and, even better, to on-premise servers. Because this feature is missing, we still need to consume from Amazon SQS.

So in terms of features, Amazon EventBridge wins over Amazon SNS + SQS.

On price, the picture is different.

In our case, the architecture with Amazon SNS + SQS costs $ 1603.14 per month.

With EventBridge, the same architecture costs $ 2893.56 per month.

An 80% cost increase is huge. Is it the price to pay for the features that Amazon SNS lacks compared to EventBridge?
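
For reference, the 80% figure follows directly from the two totals; the short calculation below only restates them.

```python
from decimal import Decimal

sns_sqs_total = Decimal("1603.14")      # Amazon SNS + SQS, per month
eventbridge_total = Decimal("2893.56")  # Amazon EventBridge, per month

extra = eventbridge_total - sns_sqs_total
increase_percent = extra / sns_sqs_total * 100

print(f"{extra} USD more per month, a {increase_percent:.1f}% increase")
```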

At Betclic, the choice is quickly made: we have already developed the missing features that we wanted, so we are not interested in migrating to Amazon EventBridge. But we follow Amazon EventBridge news carefully because it’s a constantly evolving service.

Top comments (3)

Jing Xue • Edited

Interesting comparison. I was wondering if you looked into comparing the performances. I think SNS has higher throughput than EventBridge.

I haven't actually tried it but API Destination apparently allows you to send events from EB to any HTTP API endpoint, such as on-prem? aws.amazon.com/blogs/compute/using...

Guillaume LANNEBERE

No, I didn't compare performance. But the Amazon SNS + SQS solution has been in production for 2 years, performance is good (near real-time), and it's enough for our use case.

Concerning EB with an API Destination, the issue is that it's complicated to manage, and we start to move away from an EDA.
There is also an additional cost to use API Destination: $ 0.20 / million.

Simran Kaur

AWS has now released global endpoints in Eventbridge that support failover. SNS/SQS don't have that yet. What are your thoughts on it? Is it worth migrating to Eventbridge for that? Have you built something similar with SNS/SQS and has it been successful?