Sahand Seifi for NotificationAPI

Posted on • Originally published at notificationapi.com

Notification Service Design - with diagrams

In our Ultimate Guide on Notification Services, we discussed if and when you should build a Notification Service. This article proposes a notification system architecture that you can use as your in-house notification system.

Motivation

Notifications are a common culprit for code smells, technical debt, and endless HTML/CSS templates lying around your code-base. They usually end up with no tests, no oversight or ownership, and cause pains such as repeated notifications or banned email accounts.

Objective

To design a notification service that can send product-to-user notifications across many channels at scale.

Requirements:

  • Send API: Expose an authenticated end-point so we can trigger notifications from any back-end or microservice
  • Supported Channels: Support sending notifications to any channel that exposes an API, e.g., Email, SMS, Push
  • User Preferences: Allow users to set their preferences for each notification and channel
  • Respecting downstream service limits: Avoid getting throttled or suspended by your email or SMS service
  • Scalable: Support horizontal scaling so that capacity is (theoretically) unlimited

High-Level Architecture

Notification Service Architecture Diagram

A Quick Overview

Let's imagine that your code should send a notification. The numbers below correspond with the numbers you see on the diagram.

  1. Your code calls the POST /send end-point. The request contains the userId of the recipient, the type of the notification, and the contents of the notification for every supported channel.
  2. The /send end-point authenticates the request using OAuth2's Client Credentials Flow.
  3. It then requests the user's notification preferences from the database. The preferences indicate whether the user is subscribed to a particular notification and channel or not.
  4. It then reads the user attributes, such as email address or phone number, from the database.
  5. The end-point forms a message object containing the user attributes from step (4) along with the channels and the content for each channel. However, it excludes the channels disabled in step (3). Finally, the message is sent to a fanout service.
  6. The fanout service is configured to broadcast incoming messages to job queues. However, there is filtering in place to skip the job queues of channels that are not listed inside the message.
  7. There is a job queue and processor per channel. The processor picks up the job and calls the appropriate service, e.g., a transactional email or SMS service.
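
Steps (3) to (5) above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the function name build_message, the field names in the message, and the choice to treat a missing preference as "enabled" are all assumptions made for the example.

```python
from typing import Any


def build_message(
    request: dict[str, Any],
    preferences: dict[str, bool],
    attributes: dict[str, str],
) -> dict[str, Any]:
    """Combine the request, preferences, and user attributes into one fanout message."""
    # Step 3: keep a channel unless the user explicitly disabled it.
    # Treating a missing preference as enabled is an assumed default.
    enabled = {
        channel: content
        for channel, content in request["channels"].items()
        if preferences.get(channel, True)
    }
    return {
        "userId": request["userId"],
        "type": request["type"],
        "user": attributes,   # step 4: email address, phone number, ...
        "channels": enabled,  # step 5: only the channels that remain enabled
    }


# Hypothetical request body for POST /send.
request = {
    "userId": "user_123",
    "type": "order_shipped",
    "channels": {
        "email": {"subject": "Your order shipped", "body": "..."},
        "sms": {"text": "Your order shipped"},
    },
}

message = build_message(
    request,
    preferences={"email": True, "sms": False},
    attributes={"email": "jane@example.com", "phone": "+15555550100"},
)
print(sorted(message["channels"]))  # ['email']
```

Because the caller only passes a userId, the email address and phone number enter the message inside the notification service and nowhere else.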

Important Architecture Decisions:

POST /send

  • Notice that the request to this end-point only contains the userId and not the email or phone number. This allows the services that send notifications to have no knowledge of your users.
  • The end-point is behind a load balancer to ensure scalability.
  • The end-point is not protected with your regular user-facing authentication. Since the service that makes the request is a "program" itself, you need to use a different authentication mechanism known as the OAuth2 Client Credentials Flow used for server-to-server communication. Here are links to how to do this in Auth0 and Cognito.
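
As a rough sketch of what a calling service sends to obtain a token, the snippet below builds a Client Credentials token request in the generic shape described by the OAuth 2.0 specification (HTTP Basic client authentication plus a form-encoded grant_type). The client ID, secret, and scope are made up, and providers such as Auth0 and Cognito each have their own variations, so treat this as an illustration and follow your provider's documentation.

```python
import base64
from urllib.parse import urlencode


def build_token_request(
    client_id: str, client_secret: str, scope: str
) -> tuple[dict[str, str], str]:
    """Return the headers and form-encoded body for a client-credentials token request."""
    # The client authenticates as itself; no end user is involved.
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    headers = {
        "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii"),
        "Content-Type": "application/x-www-form-urlencoded",
    }
    body = urlencode({"grant_type": "client_credentials", "scope": scope})
    return headers, body


# Hypothetical credentials for a back-end service that triggers notifications.
headers, body = build_token_request("billing-service", "s3cret", "notifications:send")
print(body)
```

The calling service would POST this to the provider's token end-point and then pass the returned access token as a bearer token on each request to /send.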

Do we need a whole end-point for this?

There will be many parts of your application that will be issuing notifications. Implementing the send function as an end-point behind a load-balancer ensures that it is independently scalable and allows you to use it from virtually anywhere, e.g., from a new codebase or your build pipeline.

User Preferences

  • Use a highly scalable NoSQL or key/value pair database. Structure the records as: KEY: sample_user_id:sample_notification_id, VALUE: [{channel: "email", state: true}, {channel: "sms", state: false}]
  • When the send end-point sees "false" values in the records, it will remove the related channel from the message sent to the fanout. If a record for a channel does not exist, it means the user has not explicitly set their preferences. In this case, you need to agree on a default.
  • The information in the user_preferences table is updated by the user through your UI, through a normal end-point protected by your common authentication mechanisms.
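
A minimal sketch of the lookup described above, using a plain dictionary as a stand-in for the key/value database. The helper names and the DEFAULT_STATE constant are assumptions for illustration; the key and value shapes follow the record structure given in the first bullet.

```python
# Assumed default for channels the user never configured. Pick whatever your team agrees on.
DEFAULT_STATE = True


def preference_key(user_id: str, notification_id: str) -> str:
    """Build the record key: <user_id>:<notification_id>."""
    return f"{user_id}:{notification_id}"


def channel_enabled(store: dict, user_id: str, notification_id: str, channel: str) -> bool:
    """Return the stored state for a channel, or the default if none was set."""
    records = store.get(preference_key(user_id, notification_id), [])
    for record in records:
        if record["channel"] == channel:
            return record["state"]
    return DEFAULT_STATE


# A dict standing in for the user_preferences table.
store = {
    "sample_user_id:sample_notification_id": [
        {"channel": "email", "state": True},
        {"channel": "sms", "state": False},
    ]
}

print(channel_enabled(store, "sample_user_id", "sample_notification_id", "sms"))   # False
print(channel_enabled(store, "sample_user_id", "sample_notification_id", "push"))  # True
```

With this key layout, a send request needs a single key lookup per user and notification, which is the access pattern key/value stores handle best.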

Why do we need user preferences?

Not allowing users to control their notification preferences will only frustrate them and force them to mark your notifications as spam or mute your notifications. This will further damage your user experience and mark your account for suspension by email or SMS delivery services.

Fan Out

  • A fanout service takes a message and duplicates it to multiple destinations. Fanout services are cheap and highly scalable. In AWS, use SNS. In Google Cloud Platform, use Pub/Sub, and in Azure, use Service Bus topics and subscriptions.
  • You can configure filtering between the fanout and job queues to avoid sending unnecessary messages to job queues of channels that have been excluded. For example, in AWS SNS, you can specify that the email job queue should only receive the fanout message if the message contains the "email" property inside the "channels" property.
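
To make the filtering idea concrete, here is a hedged sketch. It assumes the publisher attaches a "channels" message attribute listing the enabled channel names, and that the email queue's subscription carries the filter policy shown. In SNS the policy would be set as the subscription's FilterPolicy attribute. The matches function is only a simplified local stand-in for exact-match filtering, written to show the behavior, and is not how SNS itself is implemented.

```python
import json

# Filter policy for the email job queue's subscription (assumed attribute name: "channels").
EMAIL_QUEUE_FILTER_POLICY = {"channels": ["email"]}


def matches(policy: dict, attributes: dict) -> bool:
    """Simplified exact-match check: every policy key must be present in the
    message attributes, and at least one of its values must be allowed."""
    for key, allowed in policy.items():
        values = attributes.get(key)
        if values is None:
            return False
        if isinstance(values, str):
            values = [values]
        if not any(value in allowed for value in values):
            return False
    return True


print(json.dumps(EMAIL_QUEUE_FILTER_POLICY))
print(matches(EMAIL_QUEUE_FILTER_POLICY, {"channels": ["email", "sms"]}))  # True
print(matches(EMAIL_QUEUE_FILTER_POLICY, {"channels": ["sms"]}))           # False
```

A message whose email channel was removed by the send end-point never reaches the email queue, so the email processor does no work for it.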

Why do we need a fanout?

You could write code that puts the same message into the necessary job queues, but fanout is cheaper, and writing less code is good. Another benefit of fanout is being able to easily add/remove queues, thus allowing you to refactor and extend your channels.

Job Processing

  • Queues hold on to messages until your job processors process them. They are also cheap and highly scalable. Job processors are code that takes messages from the job queues and processes them. They can scale based on the number of messages in the queue.
  • In our case, the job processor should make an API call to the appropriate delivery service, e.g., a transactional email service, to send out the notification.
  • Most email, SMS, or similar delivery services have strict guidelines on the amount and quality of messages you send. You should also carefully review these and put proper systems in place. Here is our guide on how to prevent getting suspended on AWS SES.
  • You can configure a max number of job processors to avoid hitting the rate limits of the delivery services.
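
One rough way to pick that maximum is a back-of-the-envelope calculation. The sketch below assumes each processor makes one delivery call at a time and that calls take a roughly constant time; the numbers are invented for the example and are not the limits of any real provider.

```python
import math


def max_processors(provider_limit_per_second: float, seconds_per_call: float) -> int:
    """Estimate how many single-threaded processors fit under a provider's rate limit."""
    # A processor that makes one call at a time completes 1 / seconds_per_call sends per second.
    per_processor_rate = 1.0 / seconds_per_call
    # Round down so the combined rate stays at or below the limit, but keep at least one processor.
    return max(1, math.floor(provider_limit_per_second / per_processor_rate))


# Hypothetical numbers: the provider allows 14 sends per second, each call takes 0.2 seconds.
print(max_processors(14, 0.2))  # 2
```

If even a single processor would exceed the limit, a cap on processor count is not enough on its own and the processor also needs to pause between calls.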

Why do we need job queues and processors?

Multiple reasons: A) External delivery services are generally slow. The queue mechanism allows you to process these jobs asynchronously from the rest of your code. B) The queue mechanism allows you to control the rate of your jobs, thus avoiding getting throttled. C) These external services could face outages. A job queue mechanism lets you decide what to do in cases of job failure without a single line of code, e.g., retry the job every 30 minutes for a maximum of 3 times.
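
As an illustration of point C, here is how the "every 30 minutes, maximum of 3 times" example could be expressed as queue settings if the job queue were an AWS SQS queue. That choice, the queue names, and the ARN are assumptions for the example. In SQS, a job that the processor does not delete becomes visible again after the visibility timeout, and maxReceiveCount counts total receives, so one initial attempt plus three retries is four.

```python
import json

RETRY_INTERVAL_SECONDS = 30 * 60  # a failed job reappears after 30 minutes
MAX_RETRIES = 3

# Attributes for a hypothetical "email-jobs" queue. SQS expects attribute values as strings.
queue_attributes = {
    "VisibilityTimeout": str(RETRY_INTERVAL_SECONDS),
    "RedrivePolicy": json.dumps(
        {
            # Placeholder ARN of a dead-letter queue that collects jobs that kept failing.
            "deadLetterTargetArn": "arn:aws:sqs:us-east-1:123456789012:email-jobs-dlq",
            # One initial attempt plus MAX_RETRIES retries.
            "maxReceiveCount": MAX_RETRIES + 1,
        }
    ),
}

print(json.dumps(queue_attributes, indent=2))
```

Jobs that land in the dead-letter queue can then be inspected or replayed once the delivery service has recovered.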

Further Improvements

Here are a few things that are possible but that we haven't covered. If you need any of these capabilities, read the next section.

  • The architecture of a scalable in-app notification service - they require their own APIs, tables, etc., so they deserve their own article
  • Removing notification contents from the code and instead allowing your product and design team to edit the notifications visually without code change
  • Dashboard for your team to enable/disable notifications or specific channels without code changes
  • Collecting and displaying open/click reports

A Quicker Approach

There is so much work that goes into building a scalable notification service. That is precisely why we made NotificationAPI: a plug-and-play notification-as-a-service solution. It takes 5 minutes to set up. It is scalable and has every feature you can imagine for your notifications, such as a dashboard for your team to design and configure notifications visually without developer involvement.

At its core, all engineering comes down to making tradeoffs between the perfect and the workable.
-- Katie Hafner

Conclusion

In this article, we learned about the architecture of a scalable notification service. We used tools available in all the major cloud providers so that you can build your own notification service based on this design. You also learned of a notification-as-a-service product that could save you all the trouble. Software engineering, like any other engineering discipline, comes down to trade-offs. Perhaps the decision tree and table in our ultimate guide on notification services can help you figure out which trade-offs to make.
