Breaking up the monolith: Magic identifiers

#architecture #audit #monolith #software

Magic identifiers are a common pattern in software engineering. They quickly represent a concept in an abstract way: rather than referring to some property bag of three fields, you might choose to wrap that data in a specific enum. Let’s look at an example:

A simple vehicle

Let’s say we have the properties of a vehicle: it might support land-based travel, have stored battery power, and support multiple passengers. That’s great; it also helps us talk about different kinds of vehicles and use them everywhere. Simple checks start to pop up in your code base, such as:

vehicle.supportedPassegersSize > 1

And then you move on. However, you might find that you are constantly checking all three:

vehicle.supportedPassegersSize > 1
  && vehicle.supportedPassegersSize < 4
  && vehicle.poweredBy == POWERED.BATTERY
  && vehicle.ridesOn.includes(SURFACES.LAND)

In one place that works well, but what if you have that same code in three places? The engineer in you wants to refactor that, and you do. The right name for this function is:

function vehicleIsSizeOneToFourPoweredByBatteryAndRidesOnLand(vehicle)

However, because naming is hard, we probably called this verifier:

isCar(vehicle)
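As a rough sketch, the refactoring above might look like the following. The POWERED and SURFACES values are assumptions made up for illustration (the snippets above reference them without defining them), and the field names follow the checks shown earlier.

```javascript
// Hypothetical enum values; only BATTERY and LAND appear in the article.
const POWERED = Object.freeze({ BATTERY: 'BATTERY', FUEL: 'FUEL' });
const SURFACES = Object.freeze({ LAND: 'LAND', AIR: 'AIR' });

// The verbose name keeps every condition visible in one place.
function vehicleIsSizeOneToFourPoweredByBatteryAndRidesOnLand(vehicle) {
  return vehicle.supportedPassegersSize > 1
    && vehicle.supportedPassegersSize < 4
    && vehicle.poweredBy === POWERED.BATTERY
    && vehicle.ridesOn.includes(SURFACES.LAND);
}

// The short name most code bases end up with.
const isCar = vehicleIsSizeOneToFourPoweredByBatteryAndRidesOnLand;

// Example vehicles, invented for this sketch.
const hatchback = { supportedPassegersSize: 3, poweredBy: POWERED.BATTERY, ridesOn: [SURFACES.LAND] };
const jet = { supportedPassegersSize: 3, poweredBy: POWERED.FUEL, ridesOn: [SURFACES.AIR] };

console.log(isCar(hatchback), isCar(jet)); // true false
```

The short name is convenient, but notice how much of the condition it hides from the caller.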

What’s in a car type

Having verifier methods is great, but not everyone gets to that point. Instead of having individual properties, we may have just defined VEHICLE_TYPES:

const VEHICLE_TYPES = Object.freeze({
  CAR: 'CAR', BUS: 'BUS', MOTORCYCLE: 'MOTORCYCLE',
  PLANE: 'PLANE', TRAIN: 'TRAIN', LEGS: 'LEGS'
});

Now, instead of actually having to pass around a whole object and reference a global verifier method, we can just pass around the type of the vehicle. Much easier and simpler.

This helps to reduce the complexity at every call site, almost…

The problem here is: who is to say what a car is?

A car that isn't a car — Courtesy of https://www.flickr.com/photos/48817379@N03/7473259286/

Very quickly, instead of checking the properties that we care about, we start checking the type of the vehicle, and that check spills everywhere. In code locations, classes, or services where you want to figure out how expensive something is, you might see code like this:

function isExpensive(vehicleType) {
  return vehicleType != VEHICLE_TYPES.LEGS;
}

And to check whether it supports passengers:

function supportsPassengers(vehicleType) {
  return vehicleType == VEHICLE_TYPES.CAR
    || vehicleType == VEHICLE_TYPES.PLANE;
}

Now it’s impossible to determine what business logic exists that has anything to do with a specific vehicle type. You can’t reasonably look at a service or a bit of code and know that, oh yes, of course calling supportsPassengers does the right thing, because this bit of code has made assumptions about what a plane is:

A plane that isn't a plane: model airplane courtesy of Alibaba (60726019789)

While clearly this neither flies nor supports passengers, it is a plane.
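To make the mismatch concrete, here is a small sketch comparing the type check from above with a check on the property itself. The modelPlane object and the supportsPassengersByProperty helper are hypothetical, invented for this example.

```javascript
const VEHICLE_TYPES = Object.freeze({ CAR: 'CAR', PLANE: 'PLANE', LEGS: 'LEGS' });

// Type-based check, as in the article: it assumes every plane carries people.
function supportsPassengers(vehicleType) {
  return vehicleType === VEHICLE_TYPES.CAR
    || vehicleType === VEHICLE_TYPES.PLANE;
}

// Property-based check (hypothetical): asks the question directly.
function supportsPassengersByProperty(vehicle) {
  return vehicle.supportedPassegersSize > 0;
}

// A model airplane: a plane by type, with room for nobody.
const modelPlane = { type: VEHICLE_TYPES.PLANE, supportedPassegersSize: 0 };

console.log(supportsPassengers(modelPlane.type));      // true
console.log(supportsPassengersByProperty(modelPlane)); // false
```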

The Admin Role

Another common case of this is in the authorization space. One pattern that is used far too often, and almost always incorrectly, is to verify a user’s authorization by comparing an attribute in their access token (JWT) against the required role attribute on a method:

[@Roles(User, Admin)]
public async GetUserProfile(userId) {...}

This also works in a small space, such as a single service or a monolith, where User and Admin might be well defined. As the architecture evolves and microservices are built, every service will start making its own decisions about what the Admin Role means and how it is utilized. In one service, the Admin Role might mean the software development team that built the service, while in another service it might mean the admin of the customer account.

Understanding what a user role actually grants access to is now impossibly difficult. In the authorization space, one solution is to instead check explicit permissions on the service side, such as users-service:read:profile, and to give this permission to the user explicitly. Granular access like this no longer fits in a JWT, which is why there are services like Authress to bridge this gap.
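A minimal sketch of an explicit permission check could look like this. The in-memory grantsByUser map and the hasPermission helper are assumptions standing in for a call to an authorization service; this is not the Authress API.

```javascript
// Hypothetical in-memory grants; a real service would ask an authorization service.
const grantsByUser = new Map([
  ['user-1', new Set(['users-service:read:profile'])],
  ['user-2', new Set()]
]);

// True only when the user was explicitly given this exact permission.
function hasPermission(userId, permission) {
  const grants = grantsByUser.get(userId);
  return grants !== undefined && grants.has(permission);
}

// The endpoint checks the permission it needs, not a role name.
function getUserProfile(callerId, userId) {
  if (!hasPermission(callerId, 'users-service:read:profile')) {
    throw new Error('Forbidden: missing users-service:read:profile');
  }
  return { userId };
}
```

The check reads the same in every service, because the permission names the exact action instead of relying on each team's idea of an Admin.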

Tell me everything about…

If you want to know everything about a Car or everything about the Admin Role, there is just no good way to get this data. You could go to every service and search for an enum type, and hope you did your search correctly. Even worse, when looking at one of these places you don’t know whether the check is correct. You just can’t reason about it.

One way to improve this is to model Vehicle or Role at the service level. Every service that cares about vehicles or roles could define Car or Admin itself, as a list of properties, and check those properties within the service. So instead of checking for the vehicle type everywhere, we are back to checking that a list of properties matches our vehicle instance. This looks perfect for microservice configuration, but it sucks for analytics and, more importantly, it leads to incorrect domain modeling for microservices.
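As a sketch of what a service-level definition might look like, a service could keep its own list of the properties it means by Car and match instances against it. CAR_DEFINITION and matchesDefinition are hypothetical names, and the property values are assumptions.

```javascript
// This service's own idea of a Car: only the properties it cares about.
const CAR_DEFINITION = Object.freeze({ poweredBy: 'BATTERY', ridesOn: 'LAND' });

// A vehicle matches when every property in the definition is satisfied.
// Array-valued properties match if they contain the expected value.
function matchesDefinition(vehicle, definition) {
  return Object.entries(definition).every(([key, expected]) => {
    const actual = vehicle[key];
    return Array.isArray(actual) ? actual.includes(expected) : actual === expected;
  });
}

const amphibious = { poweredBy: 'BATTERY', ridesOn: ['LAND', 'WATER'], color: 'red' };
const truck = { poweredBy: 'FUEL', ridesOn: ['LAND'] };

console.log(matchesDefinition(amphibious, CAR_DEFINITION)); // true
console.log(matchesDefinition(truck, CAR_DEFINITION));      // false
```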

Why is this a problem?

While we successfully decoupled the notion of what a Car is between services, as each service defines a Car itself, we actually haven’t decoupled the notion of what a vehicle is. A service that exists to keep track of sales of a product doesn’t need to care whether that product is a vehicle. Storing an enum called Vehicle with a value of Car does a disservice to that microservice. That service may have a first-class notion of a Car, but if it doesn’t, then don’t create one. Further, while every existing service may be happy with what a Car is, your new and shiny service is going to ask: what is a car anyway? The new team building out a new service in a different part of the company has no idea that you have opinions about what a Car is. Your definition is also probably coupled to your cultural context. (I’m not going to get into that here, but when you use enum types that grow in scope beyond one service, you require everyone not only to have an opinion about that object, but also to have the same opinion.)

Another way to look at this: the product pricing service doesn’t want to know when someone is asking about the price of a car. It wants to know when someone is asking about the price of a vehicle that fits 10 people, goes on land, and is battery powered (FYI, that is more like a Bus anyway). It should support taking in these first-class attributes and ignoring generic types that don’t have relevant meaning.
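Here is a rough sketch of a pricing function that takes first-class attributes rather than a type. The quotePrice name and all of the pricing numbers are made up purely for illustration.

```javascript
// Hypothetical pricing rules; the numbers are invented for this sketch.
// Note that no vehicle type is accepted, only the attributes pricing cares about.
function quotePrice({ passengers, surface, poweredBy }) {
  let price = 1000 * passengers;
  if (poweredBy === 'BATTERY') price += 5000;
  if (surface === 'AIR') price *= 10;
  return price;
}

// A vehicle that fits 10 people, goes on land, and is battery powered.
console.log(quotePrice({ passengers: 10, surface: 'LAND', poweredBy: 'BATTERY' })); // 15000
```

Whether the caller thinks of that vehicle as a Car or a Bus never reaches the pricing logic.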

So, where do I put the type definition?

The type definition is one of the first things to go when migrating from a monolith to microservices, since you can no longer keep track of the huge list of properties inferred from that identifier. Microservices by necessity want to have their own context.

So where does that leave us? In the type definition repository, of course. This may be your vehicle repository in the case of vehicles (someone probably cleverly called this the Garage) or your authorization service in the case of roles (we called this Authress), and nothing else should care about vehicles or roles. Great solution. When you want to ask a question about a vehicle, convert your context to a map of the properties that the specific service cares about and pass in that data. You get back an immutable object, because it is based on a list of properties and not a magic identifier, and you don’t even need to worry about an audit trail, because you didn’t use an abstracted type.
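Converting a context to a map of properties could be as simple as the sketch below. The toPropertyMap helper and the example keys are hypothetical; note that Object.freeze is shallow, so nested values would need their own treatment.

```javascript
// Keep only the properties the target service cares about, and freeze the result.
function toPropertyMap(context, keys) {
  const picked = {};
  for (const key of keys) {
    if (key in context) picked[key] = context[key];
  }
  return Object.freeze(picked);
}

// The local context may carry a type and other details; the target service never sees them.
const context = { vehicleType: 'BUS', passengers: 10, surface: 'LAND', color: 'red' };
const forPricing = toPropertyMap(context, ['passengers', 'surface']);

console.log(forPricing); // { passengers: 10, surface: 'LAND' }
```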

Rewinding back to the monolith

This is a frequent problem caused by monolith architectures. Even if you don’t have one, if there is a single team or department saying “we need you to tell me everything about customer X” or “we want all the data regarding t-shirt products”, they aren’t thinking in autonomous teams (spoiler alert: it is probably the marketing org). Conway’s Law makes it clear that if your teams aren’t autonomous, then they aren’t building microservices.

When you are building a monolith or a monoculture, enforcing a consistent opinion about the concrete concepts represented by your abstract ones is a reality you’ve not only become accustomed to, but are actively encouraging and requiring, because differences of opinion and diversity won’t work for you.

Further, since the notion of what a Car is can change over time, it becomes increasingly common to hear the question: yeah, but a Car at what time in history? The implication of being a Car actually changes with the business logic stored in different services. You can’t even audit this well, because it requires going back through the development history, in what is likely many, many repositories, to see what it was.

A call out on extremes

Note: It's worth a quick call out here that the other extreme is also a problem. Like most things, figuring out where to draw boundaries is important. Just as it's likely a problem to have one top level VehicleType enum that implies business logic everywhere, it's equally problematic to store the business logic of individual services and products in the Vehicle repository's entity object. You don't want to have properties in the repository entity that look like:

Car: {
  CanBeUsedToTransportPeopleToTheDepartmentStore: true,
  ShouldCostBetween10and100000Dollars: true,
  CrimeToBeUsedWhileIntoxicated: true
}

A clear separation between the properties of an entity and the business logic surrounding that entity, as it applies in different circumstances, is important. Specifically, high-level properties of the entity belong with the entity definition, but business logic and the use of that entity in your application belong in the relevant products and services.

A property bag solution

Storing definitions of Vehicles in a vehicle repository and then using hypermedia URLs to reference a vehicle is one way to avoid passing around a property bag of all the attributes of a vehicle. It’s easy and comes with history and auditing. While this isn’t super important for vehicles, for roles and authorization access it is critical and may even be tied to your regulatory compliance. While I don’t know of any vehicle garage auditing service, there is an AWS partner authorization audit and SIEM integration available through AWS EventBridge.
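To illustrate how a URL reference plus history answers the "at what time?" question, here is a minimal in-memory sketch. The garage.example.com URL, the putDefinition and definitionAt helpers, and the dates are all assumptions for illustration.

```javascript
// Hypothetical store: each vehicle URL maps to its versions, oldest first.
const history = new Map();

// Record a new version of the definition, effective from validFrom.
function putDefinition(url, properties, validFrom) {
  const versions = history.get(url) || [];
  versions.push({ validFrom, properties: Object.freeze({ ...properties }) });
  versions.sort((a, b) => a.validFrom - b.validFrom);
  history.set(url, versions);
}

// Return the definition that was in effect at the given moment, if any.
function definitionAt(url, at) {
  let match;
  for (const version of history.get(url) || []) {
    if (version.validFrom <= at) match = version;
  }
  return match ? match.properties : undefined;
}

const carUrl = 'https://garage.example.com/vehicles/car';
putDefinition(carUrl, { passengers: 4 }, new Date('2020-01-01'));
putDefinition(carUrl, { passengers: 5 }, new Date('2023-01-01'));

console.log(definitionAt(carUrl, new Date('2021-06-01'))); // { passengers: 4 }
```

Services pass around only the URL, and anyone auditing an old decision can ask what the definition was on that date.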

A Vehicle repository microservice would be a simple service with CRUD RESTful endpoints. Your basic GET, PUT, PATCH, POST, and DELETE, and anything else you need for collections (apply patches, get, search, and update), are simple pass-throughs to your database. You'll probably want some validation on the data via defined schemas, and you can cover additional needs via annotations, sidecars, or libraries, so I won't go further into that. If there are questions about what a microservice looks like, there are plenty of examples out there on the internet.
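For a feel of how thin such a service can be, here is a sketch of the storage layer with a basic schema check. The VehicleRepository class, the VEHICLE_SCHEMA shape, and the Map standing in for a database are all assumptions; the HTTP layer is left out.

```javascript
// Hypothetical schema: required property names and their expected typeof.
const VEHICLE_SCHEMA = Object.freeze({ passengers: 'number', poweredBy: 'string' });

// Reject anything that does not satisfy the schema.
function validate(vehicle) {
  for (const [key, type] of Object.entries(VEHICLE_SCHEMA)) {
    if (typeof vehicle[key] !== type) {
      throw new Error(`Invalid vehicle: ${key} must be a ${type}`);
    }
  }
}

class VehicleRepository {
  constructor() { this.store = new Map(); } // stands in for the database

  put(id, vehicle) { validate(vehicle); this.store.set(id, { ...vehicle }); }
  get(id) { return this.store.get(id); }
  patch(id, changes) {
    const existing = this.store.get(id);
    if (!existing) throw new Error(`Unknown vehicle: ${id}`);
    this.put(id, { ...existing, ...changes }); // re-validates the merged result
  }
  delete(id) { return this.store.delete(id); }
  search(predicate) { return [...this.store.values()].filter(predicate); }
}
```

Each method would map to one endpoint, which is the "simple pass-through" part; everything interesting lives in the schema.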
As for a detailed authorization service with full auditing, versioning, history, and as something that satisfies security compliance checks, well, you can check out Authress. It's got these and more, and does a much better job describing how it works through these challenges than I would be able to do here.
