DEV Community

Emil Pearce


Moving User Management from In-House to a Product: Why We Did It and What We Learned



This post was brought to you by Novu

novuhq / novu on GitHub

Open-Source Notification Platform. Embeddable Notification Center, E-mail, Push and Slack Integrations.



The open-source notification infrastructure for developers

The ultimate service for managing multi-channel notifications with a single API




⭐️ Why Novu?

Novu provides a unified API that makes it simple to send notifications through multiple channels, including In-App, Push, Email, SMS, and Chat. With Novu, you can create custom workflows and define conditions for each channel, ensuring that your notifications are delivered in the most effective way possible.

✨ Features

  • 🌈 Single API for all messaging providers (In-App, Email, SMS, Push, Chat)
  • 💅 Fully managed GitOps Flow, deployed from your CI
  • 🔥 Define workflow and step validations with Zod or JSON Schema
  • 💌 React Email/Maizzle/MJML integrations
  • 🚀 Equipped with a CMS for advanced layouts and design management
  • 🛡 Debug and analyze multi-channel messages in a single dashboard
  • 📦 Embeddable notification center…

TL;DR

Novu adopted Clerk as its user management solution (authentication infrastructure), laying the groundwork for SAML Single Sign-On (SSO), Google and GitHub as OAuth providers, multi-factor authentication, Role-Based Access Control (RBAC), and more.

One developer named Adam implemented it, with a solid assist from platform engineer Denis.



Like most projects, this one started in the backlog. It moved up only after dozens of customers asked for it and heavily upvoted the request on our roadmap.

As a notification infrastructure solution, part of our app and architecture involves managing users: from sign-ups, log-ins, and session logs to letting users invite team members to their organization and manage each role’s access.

It’s all about priorities. Novu’s core focus is to solve all things related to notifications and message management so our users won’t have to. Therefore, we spend our cycles on providing the best experience for building and managing notification workflows, empowering developers to protect their time, and smoothing out collaboration between product and marketing teams.
Effective user management doesn’t fall under that “core value” that we are dedicated to.

In the same way we expect you to offload the burden of engineering notifications to our expertise, we offloaded the burden of engineering effective user management to the expertise of Clerk.

To be clear, our team had built a great authentication and authorization infrastructure in-house from day one, based on a custom-tailored and well-designed architecture.

As we level up, we focus even more on perfecting the notification development experience.

We expect developers and engineers to avoid reinventing the wheel and let Novu handle notifications, just as we choose to leverage proven, tested, and leading solutions for other aspects of our product: MongoDB for the database, Stripe for payments, and now Clerk for user management. We walk the talk.


Our primary objective

Create a secure and easy-to-use experience for our users.

When we outlined the initial draft for this project, it looked brief and straightforward, possibly even like something that could be completed over a weekend.

The initial draft checklist:

  • OAuth Provider (GitHub, Google)
  • SAML SSO
  • Secure session management
  • RBAC
  • Magic link auth

Note that if the initial draft hasn’t changed, then the project hasn’t received enough feedback and input. Naturally, the list got longer.

The actual checklist:

  • Sign up with user credentials
  • Sign up with OAuth providers (GitHub, Google)
  • Sign in with user credentials
  • Sign in with OAuth providers (GitHub, Google)
  • Sign in with SSO (SAML)
  • Sign in from Novu CLI
  • Sign in/up from Vercel Marketplace
  • Create organization
  • Organization management
  • User management (update user info, credentials, etc…)
  • MFA/2FA (OTP via SMS/email, TOTP, passkey, biometric, etc.)
  • Invitations
  • RBAC: Two roles admin & editor
    • admin = The admin can access and interact with any page on the web platform (so, including team members and settings)
    • editor = The editor role remains the "main content manager" (aka product manager or marketing manager)
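To make the two-role model concrete, here is a minimal sketch of a role-to-permission check. The permission names and the can and isRole helpers are hypothetical and do not reflect Novu's actual permission set. The roles array matches the shape the Clerk strategy produces from the org_role claim (for example, "org:admin" becomes "admin").

```typescript
// Hypothetical permissions; Novu's real permission set is not described in this post.
type Role = 'admin' | 'editor';
type Permission = 'workflows:write' | 'content:write' | 'team:manage' | 'settings:manage';

const ROLE_PERMISSIONS: Record<Role, ReadonlySet<Permission>> = {
  // admin: any page, including team members and settings
  admin: new Set<Permission>(['workflows:write', 'content:write', 'team:manage', 'settings:manage']),
  // editor: the "main content manager", without team or settings access
  editor: new Set<Permission>(['workflows:write', 'content:write']),
};

// Narrows an arbitrary role string (e.g. from a JWT) to a known Role.
function isRole(role: string): role is Role {
  return role === 'admin' || role === 'editor';
}

// True if any of the user's roles grants the permission; unknown roles grant nothing.
function can(roles: readonly string[], permission: Permission): boolean {
  return roles.some((role) => isRole(role) && ROLE_PERMISSIONS[role].has(permission));
}
```

For example, can(['editor'], 'team:manage') returns false. The explicit isRole guard is used instead of an "in" check so that inherited keys such as "toString" cannot be mistaken for roles.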

Research and evaluation

After identifying the project's scope, the next step is to conduct research and evaluate the resources needed to achieve the desired outcome.

This process includes:

  • Get a clear understanding of the current state and each layer of the product:

    • Dependencies
    • Endpoints
    • Architecture
    • Client layer components and representation (frontend)
    • Tests

    And more.

  • Outline the migration specs (what remains in-house and what should be abstracted away)

  • Plan for backward compatibility

  • Look for references to similar projects, perhaps from former colleagues, and learn from their process and recommendations

  • Look for open-source solutions

  • Check whether there are vendors (third-party solutions) and compare them.

And more.

In another blog post, we will explore how we evaluate and compare third-party solutions (or products) as a service/infrastructure as a service company.

Insufficient research or inaccurate evaluation usually leads to technical debt and future losses of resources. One example is engineering time spent on new features and maintenance that end up requiring a refactor of the entire thing. So, look for the hidden costs of each option.

Experienced team leads know how to evaluate each option's return on investment (ROI), which helps them make the best decision for the business.

That’s exactly how we ended up with Clerk. Their solution covers most of our use cases, and from a business point of view, the ROI of using it to manage the users and organizations layer makes sense.


Implementation

The Novu service contains many microservices and areas, such as:

  • Notification Channels (SMS, Email, In-App, Push, Chat, etc.)
  • Notification Orchestration (Cross-Device Sync, Digest Engine, Delay, Timezone Awareness, etc.)
  • Notification Observability (Debugging, Insights, etc.)
  • Notification Content Management (Editor, Branding, Layouts, Translations, Variable Management, etc.)
  • End-user management (User Preferences, Subscribers, Topics, Segments, Subscription Management, etc.)
  • Account Management (SSO, Role-Based Access Control, Multi-Tenancy, Billing, etc.)

The diagram below shows a simplified version of Novu’s API structure, focusing only on the authentication and authorization of Novu users and organizations before implementing Clerk.

flowchart 1

We use MongoDB to store all the data Novu requires, every user, organization, tenant, subscriber, topic… in short, everything.

Because Clerk has its own database for managing users, we needed to handle the migration and the sync between the two databases carefully and precisely.


JWT strategy vs. Clerk strategy

One of the main things we needed to ensure was that the UserSessionData object would not change, so that users’ sessions in Novu would not break. It had to remain compatible.

Here is the jwt.strategy.ts file:

// jwt.strategy.ts
import type http from 'http';
import { ExtractJwt, Strategy } from 'passport-jwt';
import { PassportStrategy } from '@nestjs/passport';
import { Injectable, UnauthorizedException } from '@nestjs/common';
import { ApiAuthSchemeEnum, HttpRequestHeaderKeysEnum, UserSessionData } from '@novu/shared';
import { AuthService, Instrument } from '@novu/application-generic';
import { EnvironmentRepository } from '@novu/dal';

@Injectable()
export class JwtStrategy extends PassportStrategy(Strategy) {
  constructor(private readonly authService: AuthService, private environmentRepository: EnvironmentRepository) {
    super({
      jwtFromRequest: ExtractJwt.fromAuthHeaderAsBearerToken(),
      secretOrKey: process.env.JWT_SECRET,
      passReqToCallback: true,
    });
  }
  @Instrument()
  async validate(req: http.IncomingMessage, session: UserSessionData) {
    // Set the scheme to Bearer, meaning the user is authenticated via a JWT coming from Dashboard
    session.scheme = ApiAuthSchemeEnum.BEARER;

    const user = await this.authService.validateUser(session);
    if (!user) {
      throw new UnauthorizedException();
    }

    await this.resolveEnvironmentId(req, session);

    return session;
  }

  @Instrument()
  async resolveEnvironmentId(req: http.IncomingMessage, session: UserSessionData) {
    // Fetch the environmentId from the request header
    const environmentIdFromHeader =
      (req.headers[HttpRequestHeaderKeysEnum.NOVU_ENVIRONMENT_ID.toLowerCase()] as string) || '';

    /*
     * Ensure backwards compatibility with existing JWTs that contain environmentId
     * or cached SPA versions of Dashboard as there is no guarantee all current users
     * will have environmentId in localStorage instantly after the deployment.
     */
    const environmentIdFromLegacyAuthToken = session.environmentId;

    let currentEnvironmentId = '';

    if (environmentIdFromLegacyAuthToken) {
      currentEnvironmentId = environmentIdFromLegacyAuthToken;
    } else {
      const environments = await this.environmentRepository.findOrganizationEnvironments(session.organizationId);
      const environmentIds = environments.map((env) => env._id);
      const developmentEnvironmentId = environments.find((env) => env.name === 'Development')?._id || '';

      currentEnvironmentId = developmentEnvironmentId;

      if (environmentIds.includes(environmentIdFromHeader)) {
        currentEnvironmentId = environmentIdFromHeader;
      }
    }

    session.environmentId = currentEnvironmentId;
  }
}
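Most of the backward-compatibility logic lives in resolveEnvironmentId. The sketch below pulls its precedence rules out into a pure function that is easy to unit-test. The names pickEnvironmentId and EnvironmentRecord are hypothetical and not part of Novu's code. The repository lookup is replaced by an already-fetched list of the organization's environments.

```typescript
// Minimal stand-in for an environment document (assumed shape).
interface EnvironmentRecord {
  _id: string;
  name: string;
}

// Mirrors the precedence used in resolveEnvironmentId, without the repository call.
function pickEnvironmentId(
  legacyTokenEnvironmentId: string | undefined,
  headerEnvironmentId: string,
  orgEnvironments: EnvironmentRecord[]
): string {
  // 1. Legacy JWTs that still carry environmentId win, for backward compatibility.
  if (legacyTokenEnvironmentId) {
    return legacyTokenEnvironmentId;
  }
  // 2. The header value is accepted only if it belongs to the user's organization.
  if (orgEnvironments.some((env) => env._id === headerEnvironmentId)) {
    return headerEnvironmentId;
  }
  // 3. Otherwise fall back to the Development environment, or '' if there is none.
  return orgEnvironments.find((env) => env.name === 'Development')?._id ?? '';
}
```

Step 2 matters for security: an environment ID sent in the header by a client is only trusted if it belongs to the organization in the token.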

flowchart 2

To maintain compatibility with the rest of the app, we needed to transform the JWT payload from Clerk into the previously existing JWT format.

This is how we have done it:

async validate(payload: ClerkJwtPayload): Promise<IJwtClaims> {
  const jwtClaims: IJwtClaims = {
    // On first sign-in this is the Clerk ID; after sync it is Novu's internal ID
    _id: payload.externalId || payload._id,
    firstName: payload.firstName,
    lastName: payload.lastName,
    email: payload.email,
    profilePicture: payload.profilePicture,
    // On first sign-in this is the Clerk org ID; after sync it is Novu's internal ID
    organizationId: payload.externalOrgId || payload.org_id,
    environmentId: payload.environmentId,
    roles: payload.org_role ? [payload.org_role.replace('org:', '')] : [],
    exp: payload.exp,
  };

  return jwtClaims;
}

Here is the clerk.strategy.ts file (abridged):

import type http from 'http';
import { Injectable } from '@nestjs/common';
import { PassportStrategy } from '@nestjs/passport';
import { ExtractJwt, Strategy } from 'passport-jwt';
import { passportJwtSecret } from 'jwks-rsa';
import {
  ApiAuthSchemeEnum,
  ClerkJwtPayload,
  HttpRequestHeaderKeysEnum,
  PassportStrategyEnum,
  UserSessionData,
} from '@novu/shared';
import { EnvironmentRepository, EnvironmentEntity } from '@novu/dal';
import { LinkEntitiesService } from '../services/link-entities.service';

@Injectable()
export class ClerkStrategy extends PassportStrategy(Strategy, PassportStrategyEnum.JWT_CLERK) {
  constructor(private environmentRepository: EnvironmentRepository, private linkEntitiesService: LinkEntitiesService) {
    super({
      // ...configuration details
    });
  }

  async validate(req: http.IncomingMessage, payload: ClerkJwtPayload) {
    const { internalUserId, internalOrgId } = await this.linkEntitiesService.linkInternalExternalEntities(req, payload);

    const session: UserSessionData = {
      _id: internalUserId,
      firstName: payload.firstName,
      lastName: payload.lastName,
      email: payload.email,
      profilePicture: payload.profilePicture,
      organizationId: internalOrgId,
      roles: payload.org_role ? [payload.org_role.replace('org:', '')] : [],
      exp: payload.exp,
      iss: payload.iss,
      scheme: ApiAuthSchemeEnum.BEARER,
      environmentId: undefined,
    };

    await this.resolveEnvironmentId(req, session);

    return session;
  }

  // Other functions...
}


flowchart 3


Sync between Clerk and Novu

Ideally, we would use only Clerk for creating and retrieving users, organizations, and so on. Unfortunately, that isn’t fully possible, because we need to store and query some user and organization metadata in a performant manner.

Here is an example of a method in Novu’s organization repository:

  async findPartnerConfigurationDetails(organizationId: string, userId: string, configurationId: string) {
    const organizationIds = await this.getUsersMembersOrganizationIds(userId);

    return await this.find(
      {
        _id: { $in: organizationIds },
        'partnerConfigurations.configurationId': configurationId,
      },
      { 'partnerConfigurations.$': 1 }
    );
  }

This method uses MongoDB-specific constructs ($in filtering and the positional $ projection on partnerConfigurations) to filter documents. This can’t be reproduced with Clerk in a performant way, since Clerk isn’t a database designed for such queries.

What we can do is store this organization metadata in our MongoDB organizations collection and link each document to its Clerk counterpart using externalId.

Database flowchart

Now we can combine both Clerk and MongoDB to query the metadata if needed.

async findPartnerConfigurationDetails(
  organizationId: string,
  userId: string,
  configurationId: string
): Promise<OrganizationEntity[]> {
  const clerkOrganizations = await this.getUsersMembersOrganizations(userId);

  return await this.communityOrganizationRepository.find(
    {
      _id: { $in: clerkOrganizations.map((org) => org.id) },
      'partnerConfigurations.configurationId': configurationId,
    },
    { 'partnerConfigurations.$': 1 }
  );
}

private async getUsersMembersOrganizations(userId: string): Promise<Organization[]> {
  const userOrgMemberships = await this.clerkClient.users.getOrganizationMembershipList({
    userId,
  });

  return userOrgMemberships.data.map((membership) => membership.organization);
}

By calling getUsersMembersOrganizations, findPartnerConfigurationDetails gets the necessary organization data to perform a filtered search on the communityOrganizationRepository, ensuring only relevant configurations are returned.

We only need to sync users and organizations between Clerk and Novu; organization members don’t need to be synced.


Syncing users and organizations

Database IDs are synced in two ways:

  • middleware - any API endpoint syncs the IDs if it detects that the JWT doesn’t yet contain an internal ID.
  • webhook - as soon as a user/org is registered in Clerk, Clerk calls Novu’s API webhook, and we sync it.
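The webhook path can be sketched as a small dispatcher that routes Clerk events to the same sync logic the middleware uses. The event types user.created and organization.created are Clerk webhook events. The ClerkWebhookEvent shape here is trimmed to the fields used, and SyncHandlers and handleClerkWebhook are hypothetical names. Clerk delivers webhooks through Svix, so a real endpoint must verify the signature before dispatching; that step is omitted here.

```typescript
// Trimmed Clerk webhook event: only the fields this sketch reads (assumed shape).
interface ClerkWebhookEvent {
  type: string;
  data: { id: string };
}

// Each sync function takes a Clerk ID and resolves to Novu's internal ID.
type SyncFn = (clerkId: string) => Promise<string>;

interface SyncHandlers {
  syncUser: SyncFn;
  syncOrganization: SyncFn;
}

// Routes an already signature-verified event to the matching sync function.
async function handleClerkWebhook(
  event: ClerkWebhookEvent,
  handlers: SyncHandlers
): Promise<string | null> {
  switch (event.type) {
    case 'user.created':
      return handlers.syncUser(event.data.id);
    case 'organization.created':
      return handlers.syncOrganization(event.data.id);
    default:
      // Other event types are ignored in this sketch.
      return null;
  }
}
```

Sharing the sync functions between the webhook and the middleware means whichever path runs first does the work, and the other path finds the IDs already linked.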

Novu diagram

Here is the flow we had in mind:

  1. A user creates a new account via the frontend using the Clerk component.
  2. The user gets a new JWT containing the Clerk user ID.
  3. Any request that hits the API triggers the syncing process (if it hasn’t happened yet).
  4. A new user is created in Novu’s MongoDB, storing the Clerk user ID as its externalId.
  5. The Clerk user object gets updated with Novu’s internal object ID (saved as externalId in Clerk).
  6. The new token returned from Clerk now contains an externalId equal to Novu’s internal user ID.
  7. In the Clerk strategy’s validate() function on the API, we set _id equal to externalId so it is compatible with the rest of the app.
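The seven steps above can be sketched with in-memory maps standing in for Clerk and MongoDB. Everything here is hypothetical: InMemoryUserSync, resolveInternalUserId, and the fake ID generator are illustrations, not Novu's code. A real version would call the Clerk Backend API and Novu's users collection, and would generate a real ObjectId. The 24-hex check reflects the requirement that internal IDs be MongoDB ObjectIds.

```typescript
// Hypothetical in-memory stand-ins for Clerk and Novu's MongoDB.
interface ClerkUser {
  id: string; // Clerk user ID, e.g. "user_..."
  externalId: string | null; // Novu internal ID once synced
}

interface NovuUser {
  _id: string; // Novu internal ID (24-hex ObjectId string)
  externalId: string; // Clerk user ID
}

const OBJECT_ID_HEX = /^[0-9a-f]{24}$/i;

class InMemoryUserSync {
  private counter = 0;
  readonly clerkUsers = new Map<string, ClerkUser>();
  readonly novuUsers = new Map<string, NovuUser>();

  // Fake 24-hex ID; a real system would use new ObjectId() from the mongodb driver.
  private newObjectIdHex(): string {
    this.counter += 1;
    return this.counter.toString(16).padStart(24, '0');
  }

  // Steps 4-5: create the Novu user and write its ID back to Clerk, once.
  linkUser(clerkUserId: string): string {
    const clerkUser = this.clerkUsers.get(clerkUserId);
    if (!clerkUser) throw new Error(`Unknown Clerk user ${clerkUserId}`);
    if (clerkUser.externalId) return clerkUser.externalId; // already synced

    const internalId = this.newObjectIdHex();
    this.novuUsers.set(internalId, { _id: internalId, externalId: clerkUserId });
    clerkUser.externalId = internalId;
    return internalId;
  }
}

// Steps 3 and 7: return the internal _id the rest of the API expects.
function resolveInternalUserId(
  claims: { _id: string; externalId?: string },
  sync: InMemoryUserSync
): string {
  // Step 6: a refreshed token already carries the internal ID as externalId.
  if (claims.externalId && OBJECT_ID_HEX.test(claims.externalId)) return claims.externalId;
  // Otherwise the token only has the Clerk ID, so trigger the sync.
  return sync.linkUser(claims._id);
}
```

Because linkUser returns the existing externalId when one is present, requests that arrive before the refreshed token do not create duplicate users in this sketch. A real implementation would also need to guard against concurrent first requests, for example with a unique index on externalId.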

Note
In the application, we always expect Novu’s internal ID on input and always return the internal ID on output. This is important for the application to work as-is, without major changes to the rest of the code.
The API expects the internal _id everywhere, and it must be a MongoDB ObjectId, because the API parses this user ID back into an ObjectId, e.g. when creating a new environment or any other entity that needs a reference to the user.

The same logic applies to organizations; just the endpoint is different.


What is stored in Clerk vs Novu

Users

For the users, we store everything in Clerk. All the properties are mostly just simple key/value pairs and we don’t need any advanced filtering on them, therefore they can be retrieved and updated directly in Clerk.

In our internal MongoDB, we store only the user’s internal and external IDs.

The original Novu user properties are stored in Clerk’s publicMetadata:

export type UserPublicMetadata = {
  profilePicture?: string | null;
  showOnBoardingTour?: number;
};

There are also many other attributes coming from Clerk which can be set on the user.

Organizations

For organizations, we store everything in Clerk except apiServiceLevel, partnerConfigurations, and branding, which stay in our MongoDB. Attributes that are “native” to Clerk, such as the organization name or logo, are updated directly in Clerk via frontend components, so we don’t need to sync our internal DB after they are changed through a Clerk component.
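Here is a minimal sketch of that split, assuming an organization arrives as a flat object. The three MongoDB-only field names come from this post. The splitOrganization function is hypothetical, and it ignores how each half is actually written to Clerk or MongoDB.

```typescript
// Fields that stay in Novu's MongoDB; everything else is stored in Clerk.
const MONGO_ONLY_FIELDS = ['apiServiceLevel', 'partnerConfigurations', 'branding'] as const;
type MongoOnlyField = (typeof MONGO_ONLY_FIELDS)[number];

interface SplitOrganization {
  clerk: Record<string, unknown>;
  mongo: Partial<Record<MongoOnlyField, unknown>>;
}

// Partitions an organization's attributes by the system that owns them.
function splitOrganization(org: Record<string, unknown>): SplitOrganization {
  const clerk: Record<string, unknown> = {};
  const mongo: Partial<Record<MongoOnlyField, unknown>> = {};
  for (const [key, value] of Object.entries(org)) {
    if ((MONGO_ONLY_FIELDS as readonly string[]).includes(key)) {
      mongo[key as MongoOnlyField] = value;
    } else {
      clerk[key] = value;
    }
  }
  return { clerk, mongo };
}
```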

screenshot 1


Injection of Enterprise Edition providers

The goal here was to replace the community (open source) implementation with Clerk while being minimally invasive to the application and to keep the Clerk implementation in a separate package.

This means we need to keep the changed providers (OrganizationRepository, AuthService…) in the same place with the same names, so we don’t break imports all over the codebase, while changing their implementation based on a feature flag.

The other option would be to change all of these providers in 100+ files and import the EE (Enterprise Edition) package everywhere, which is probably not a good idea.

This turned out to be quite challenging, because users, organizations, and members are deeply integrated into the application, referenced in a lot of places, and tied to MongoDB specifics such as ObjectId or queries (create, update, findOne…).

The idea is to provide different implementations using NestJS dynamic custom providers, which let us inject a different class or service at startup based on the enterprise feature flag.

This was the most promising solution we found for keeping the rest of the app mostly untouched, though it has some drawbacks, explained later.


AuthService & AuthModule - dynamic injection

flowchart 4

We have two implementations of AuthService, a community one and an enterprise one (in a private package), and we inject one of them as the AUTH_SERVICE provider.

Both, however, need to share a common interface: IAuthService.
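Without the NestJS wiring, the selection comes down to one shared interface and a factory keyed on the flag. This is a sketch. The members of IAuthService shown here are hypothetical placeholders, and createAuthService is an illustrative name. The AUTH_SERVICE token and the NOVU_ENTERPRISE === 'true' check come from this post.

```typescript
// Hypothetical, trimmed-down contract; the real IAuthService has many more methods.
interface IAuthService {
  readonly edition: 'community' | 'enterprise';
  // Which system owns user records in this edition.
  identityProvider(): 'mongodb' | 'clerk';
}

class CommunityAuthService implements IAuthService {
  readonly edition = 'community' as const;
  identityProvider(): 'mongodb' {
    return 'mongodb';
  }
}

class EnterpriseAuthService implements IAuthService {
  readonly edition = 'enterprise' as const;
  identityProvider(): 'clerk' {
    return 'clerk';
  }
}

// The same decision getModuleConfig() makes, reduced to one function.
// In NestJS it could back a custom provider, for example:
//   { provide: 'AUTH_SERVICE', useFactory: () => createAuthService(process.env.NOVU_ENTERPRISE) }
function createAuthService(enterpriseFlag: string | undefined): IAuthService {
  return enterpriseFlag === 'true' ? new EnterpriseAuthService() : new CommunityAuthService();
}
```

Passing the flag in as a parameter, instead of reading process.env inside the factory, keeps the choice testable. Callers depend only on IAuthService, so they never see which class was chosen.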

Since we also need to change the AuthModule, we initialize two different modules based on the feature flag like this:

function getModuleConfig(): ModuleMetadata {
  if (process.env.NOVU_ENTERPRISE === 'true') {
    return getEEModuleConfig();
  } else {
    return getCommunityAuthModuleConfig();
  }
}

@Global()
@Module(getModuleConfig())
export class AuthModule {
  public configure(consumer: MiddlewareConsumer) {
    if (process.env.NOVU_ENTERPRISE !== 'true') {
      configure(consumer);
    }
  }
}


The EE module isn’t a standalone module in the @novu/ee-auth package that we could simply import instead of the original AuthModule. Instead, we initialize one of the two modules conditionally inside the API, because the EE module reuses some original providers, e.g. ApiKeyStrategy, RolesGuard, and EnvironmentGuard, which reside directly in the API.

Importing them into @novu/ee-auth would require exporting them from somewhere (probably a shared package), which introduces other issues such as circular dependencies. This can, however, be refactored later.

Repositories - users, organizations, members

The same logic applies to the repositories. No module is initialized here; the enterprise implementations are injected directly into the original repository classes.
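Here is a minimal sketch of that injection, assuming constructor injection of an optional enterprise implementation. IOrganizationBackend and its single findById method are hypothetical, and a Map stands in for the MongoDB collection.

```typescript
interface OrganizationRecord {
  _id: string;
  name: string;
}

// Hypothetical enterprise data source (for example, one backed by Clerk).
interface IOrganizationBackend {
  findById(id: string): OrganizationRecord | null;
}

// Keeps its original name and import path, so existing callers are untouched;
// only the body decides where the data comes from.
class OrganizationRepository {
  constructor(
    private readonly communityStore: Map<string, OrganizationRecord>,
    private readonly enterpriseBackend?: IOrganizationBackend
  ) {}

  findById(id: string): OrganizationRecord | null {
    // Delegate when an enterprise implementation was injected.
    if (this.enterpriseBackend) {
      return this.enterpriseBackend.findById(id);
    }
    return this.communityStore.get(id) ?? null;
  }
}
```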

flowchart 5


Controllers

The controllers are conditionally imported from inside @novu/api. The reason is the same as for the auth module: the controllers use too many imports, which we would need to move either to @novu/ee-auth or to a separate shared package. That would trigger a much bigger change to other, unrelated parts of the app and increase the scope of this project.

function getControllers() {
  if (process.env.NOVU_ENTERPRISE === 'true') {
    return [EEOrganizationController];
  }

  return [OrganizationController];
}

@Module({
  controllers: [...getControllers()],
})
export class OrganizationModule implements NestModule {
    ...
}


Issues with this approach

The main issue here is the need for a common interface for both classes, community and enterprise. You want to remain compatible in both versions. If a this.organizationService.getOrganizations() method is called in 50 places in the app, you need an enterprise equivalent with the same name. Otherwise, you need to change 50 places to call something else.

This results in less strict typing and methods without an implementation.

flowchart 6

Both classes need a common interface, but the community one relies on MongoDB methods and needs different method arguments than the enterprise one, which forces the use of any to make both classes fit.
In some cases one edition doesn’t need a method at all, so its implementation has to throw a Not Implemented error.
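The compromise can be sketched as follows. The method names and data shapes are hypothetical. The loose any parameters and the thrown error mirror the two problems described above. NotImplementedError is a local stand-in for NestJS's NotImplementedException, so the sketch stays self-contained.

```typescript
class NotImplementedError extends Error {
  constructor(method: string) {
    super(`${method} is not implemented in this edition`);
    this.name = 'NotImplementedError';
  }
}

// One contract for both editions; `any` lets each accept its own argument shape.
interface IOrganizationService {
  getOrganizations(query: any): string[];
  findByMongoQuery(query: any): string[];
}

type Org = { _id: string; name: string; memberIds: string[] };

// Community edition: MongoDB-style filters over its own collection.
class CommunityOrganizationService implements IOrganizationService {
  constructor(private readonly orgs: Org[]) {}

  getOrganizations(filter: { memberId: string }): string[] {
    return this.orgs.filter((o) => o.memberIds.includes(filter.memberId)).map((o) => o._id);
  }

  findByMongoQuery(query: { name: string }): string[] {
    return this.orgs.filter((o) => o.name === query.name).map((o) => o._id);
  }
}

// Enterprise edition: Clerk-style lookup by user, no MongoDB query support.
class EnterpriseOrganizationService implements IOrganizationService {
  constructor(private readonly membershipsByUser: Map<string, string[]>) {}

  getOrganizations(params: { userId: string }): string[] {
    return this.membershipsByUser.get(params.userId) ?? [];
  }

  findByMongoQuery(_query: unknown): string[] {
    throw new NotImplementedError('findByMongoQuery');
  }
}
```

Because both signatures collapse to any at the interface, a call like enterprise.getOrganizations({ memberId: 'u1' }) still type-checks. In this sketch it silently returns an empty array, which is exactly the loss of strictness described above.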


Endpoints modification

We modified the endpoints as follows:

  • AuthController: Removed and replaced by frontend calls to Clerk.
  • UserController: Removed, added a sync endpoint for Clerk users with MongoDB.
  • OrganizationController: Removed several endpoints, which can be migrated later.
  • InvitesController: Completely removed.
  • StorageModule: Completely removed.

Key points to consider and avoid

  1. Avoid Storing Frequently Changing Properties in JWT
    • Example: environmentId
    • Changing these properties means issuing a new token, which can be cumbersome.
  2. Simplify Stored Data Structures
    • Avoid storing complex structures in user, organization, or member records.
    • Clerk performs optimally with simple key:value pairs, not arrays of objects.
  3. Implement a User/Organization Replication Mechanism
    • This helps bridge the gap during the migration period before Clerk is fully enabled.
    • Use MongoDB triggers to replicate newly created users and organizations to both Clerk and your internal database.
  4. Store Original Emails
    • Do not sanitize emails as Clerk uses the original email as a unique identifier.

Team Spotlight

Lead Engineer: Adam Chmara

Platform Team Lead: Denis Kralj


Summary

In short, we offloaded user, organization, and member management to Clerk.

The “Enterprise” feature flag decides which implementation is injected into Novu’s controller (endpoint), business, and data layers.

We are leveraging pre-built Clerk components on the frontend, which reduces the need to build and maintain our own custom implementation on the backend.

The diagram below shows the current state after implementing Clerk.

Novu flowchart 7


Hindsight Bonus Points

When we decided to implement Clerk for user management, we also opted in to the long-term benefit of the expanding capabilities and features Clerk will support and offer in the future.

Here are some examples of what we might consider supporting in the near future:

  • Fine-grained access control (FGAC)
    • Attribute-based: FGAC is often implemented using Attribute-Based Access Control (ABAC), where access decisions are based on various attributes of users, resources, and the environment. Attributes can include user role, department, resource type, time of day, and more.
    • Flexibility: FGAC offers greater flexibility and granularity by allowing detailed, condition-based access controls. This means permissions can be fine-tuned to very specific scenarios.
    • Dynamic: FGAC can adapt dynamically to changes in the environment, such as time-sensitive access or location-based restrictions.
    • Detailed permissions: Permissions in FGAC are more specific and can be tailored to individual actions, users, or situations.

Providing this level of flexibility in Novu ourselves might have been out of scope, or even scrapped at the gate, because of the potential implementation complexity.

  • User impersonation

    Our customer success or support teams could use this to troubleshoot issues, provide support, or test functionality from the perspective of the impersonated user without needing to know their password or other authentication details.

    • Reduces the time and complexity involved in diagnosing and resolving user issues.
    • Ensures that the support or administrative actions taken are accurate since the support staff can see and interact with the system exactly as the user would.
    • Enhances user satisfaction by providing faster and more effective support.
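Here is a minimal sketch of the attribute-based model described under FGAC. This is something Novu might consider, not something it has built. Each policy is a predicate over user, resource, and context attributes, and access is denied unless some policy allows it. The attribute names and both policies are hypothetical, not a Clerk or Novu API.

```typescript
// Generic ABAC illustration; attributes and policies are made up for this sketch.
interface AccessRequest {
  user: { role: string; department: string };
  resource: { type: string; ownerDepartment: string };
  context: { hourUtc: number }; // 0-23
}

type Policy = (req: AccessRequest) => boolean;

// Admins may do anything.
const adminPolicy: Policy = (req) => req.user.role === 'admin';

// Editors may edit workflows owned by their own department, 08:00-17:59 UTC.
const editorWorkflowPolicy: Policy = (req) =>
  req.user.role === 'editor' &&
  req.resource.type === 'workflow' &&
  req.resource.ownerDepartment === req.user.department &&
  req.context.hourUtc >= 8 &&
  req.context.hourUtc < 18;

// Deny by default: access is granted only if at least one policy allows it.
function isAllowed(req: AccessRequest, policies: readonly Policy[]): boolean {
  return policies.some((policy) => policy(req));
}
```

Compared with the two fixed RBAC roles, the decision here depends on resource ownership and time of day, which is the kind of condition-based control described under FGAC.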

Simply put, we can now improve the experience for Novu users much more easily, because the authentication infrastructure is laid out for it.


If you would like to suggest additional features involving AuthN (or any other), visit our Roadmap to review and upvote requests, or submit your idea.


Liked what you read? Hit follow for more updates and drop a comment below. I’d ❤️ to hear your 💭

Top comments (2)

Sawyer Wolfe

Could you elaborate a bit more on the challenges you faced with backward compatibility during the migration to Clerk? It would be interesting to see a deeper dive into how this affected the deployment process. Great insights overall!

Justin Nemmers

Hey Sawyer--

The biggest challenge was related to latency. While components are excellent (and we're huge believers in them), API-over-HTTP frequently means greater latency. As we're all likely painfully aware, unplanned or unpredictable latency can have some pretty large ramifications. This bit us a bit and required some logic changes across our app.

Next, we encountered operational challenges due to the fact that our database was no longer the source of truth for users. We keep a local cached copy, but it's just that--only a cache.