Every mid-sized project has probably reached the point where its members had to decide whether or not to extract common code into its own library artifact. The main motivation is usually to avoid duplicated code and to stay DRY (don't repeat yourself). Sadly, I have seen this backfire so dramatically in some projects that duplicated code might have been the better solution. That is because:
DRY has one major tradeoff: it introduces dependencies.
Say you have identical code in Service A and B. These parts are completely independent: if the requirements for the implementation in Service A change, Service B can remain untouched. That is a pretty comfortable situation because it gives you great freedom in applying changes.
If you had moved the common code into its own library artifact and the requirements for Service A change, you have to:
- update the library code and release it as a new version
- update the library dependency in Service A
In my experience, what happens next is one of the following:
- you forget to update the library dependency in Service B
- you update the library dependency in Service B, only to find out that your changes broke the implementation in B
- you update the library dependency in Service B and find out that your latest changes work just fine in B. Unfortunately, you broke an unrelated part, because the latest version of your library also contains earlier changes for which you forgot to update the dependency
- you update the library dependency in Service B and everything works as expected (yeah, sure)
These problems obviously get worse with every additional service that uses your library. You now need to update every client of your library, even though the change only affected a single client. Or you choose not to actively update the dependencies in the other services and perform a large migration later on (or never).
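To make the third scenario concrete, here is a minimal Python sketch of how upgrades accumulate changes. The `Release` type, the `changes_picked_up` function and the version history are hypothetical, made up for illustration; the sketch simply assumes versions are compared as (major, minor, patch) tuples. Upgrading Service B from 1.2.0 to 1.4.0 for the one change it wants also pulls in the 1.3.0 change it never asked for:

```python
from dataclasses import dataclass

# Hypothetical version representation: (major, minor, patch).
Version = tuple[int, int, int]


@dataclass(frozen=True)
class Release:
    """One published version of a hypothetical shared library."""
    version: Version
    changes: tuple[str, ...]


def changes_picked_up(history: list[Release], pinned: Version, target: Version) -> list[str]:
    """Return every change a client receives when upgrading from `pinned` to `target`.

    All releases in between are included, not only the one that motivated the upgrade.
    """
    picked: list[str] = []
    for release in sorted(history, key=lambda r: r.version):
        if pinned < release.version <= target:
            picked.extend(release.changes)
    return picked


HISTORY = [
    Release((1, 2, 0), ("initial shared validation helpers",)),
    Release((1, 3, 0), ("stricter date parsing (needed by Service C)",)),
    Release((1, 4, 0), ("new field in order payload (needed by Service A)",)),
]

# Service B is still pinned to 1.2.0 and upgrades to get the 1.4.0 change.
upgrade = changes_picked_up(HISTORY, pinned=(1, 2, 0), target=(1, 4, 0))
for change in upgrade:
    print("-", change)
# - stricter date parsing (needed by Service C)
# - new field in order payload (needed by Service A)
```

The longer a service stays pinned, the longer that list gets, which is exactly why the "large migration later on" tends to hurt.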
Before carelessly extracting common code, give some thought to the following topics.
Selecting the scope
Before deciding to move common code into a library, you have to understand why you have the same code in different locations to begin with. It might be pure coincidence, caused by similar (but actually unrelated) business requirements in Service A and B. Business requirements are likely to change in unforeseen and incompatible ways. So it is probably not the best idea to try to unify your business logic in a library.
On the other hand, if you have code that serves purely technical purposes like managing database connections, logging, serving REST endpoints and the like, it can be really worthwhile to consider refactoring it into a reusable library. If such a technical aspect changes, the changes are likely to benefit the other users of this code as well.
When deciding to use a shared artifact for common code, do not make the mistake of mixing different concerns into it. Don't put your database connection management together with your logging code. There should only ever be a single reason to change your library. This also mitigates the third scenario above, because it reduces the probability of breaking unrelated parts.
Don't start your project by writing a library that might later be used by multiple clients. You don't know the actual requirements yet and are bound to focus on the wrong parts. Simply put:
Managing change
If you extracted parts of your code base into a library, you have to plan for upcoming changes and how you want to handle them.
- Design your library code to be extensible from the "outside"
- Have a migration strategy in case you need to change the library
- Have a deprecation strategy to be able to remove code from the library in the long term
- Have a notification strategy to announce new releases
Designing for extensibility can be exhausting. You have to think about valid extension points and predict future requirements. There will always be some obvious interfaces that can easily be made customizable, but there will also be interfaces you didn't know you needed until the actual requirement comes up. Chances are high that you invest a lot of time in planning ahead and over-engineering your code to be prepared for every possible change to come. Or you don't, and end up applying changes and releasing new library versions every other day.
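As an illustration of an "obvious" extension point, the following minimal sketch shows a hypothetical library helper, `call_with_retry`, that lets clients plug in their own backoff policy (and sleep function) instead of waiting for a new library release. The names, the default policy and the retry-on-`ConnectionError` behavior are assumptions for this example, not taken from any real library:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Extension point (hypothetical): maps the 0-based attempt number to a delay in seconds.
BackoffPolicy = Callable[[int], float]


def exponential_backoff(attempt: int) -> float:
    """Default policy shipped with the hypothetical library: 0.1s, 0.2s, 0.4s, ..."""
    return 0.1 * (2 ** attempt)


def call_with_retry(
    operation: Callable[[], T],
    attempts: int = 3,
    backoff: BackoffPolicy = exponential_backoff,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying on ConnectionError up to `attempts` times."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            sleep(backoff(attempt))
    raise AssertionError("unreachable")


# Client side: a service customizes the policy without a new library release.
outcomes = iter([ConnectionError("down"), ConnectionError("down"), "ok"])


def flaky() -> str:
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


delays: list[float] = []
result = call_with_retry(flaky, backoff=lambda attempt: 1.0, sleep=delays.append)
print(result, delays)  # ok [1.0, 1.0]
```

The default covers the common case, and a client with different needs passes a different policy without touching the library. Keep in mind that every such parameter is also a promise you have to keep in future versions.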
You have given up the freedom of applying changes to every part of your code base. Now, whenever you change even the smallest part of your library, you have to keep in mind that you might break code that depends on it. According to Hyrum's Law, you not only might break, but actually will break code once your library has enough users:
With a sufficient number of users of an API,
it does not matter what you promise in the contract:
all observable behaviors of your system
will be depended on by somebody.
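A small, contrived Python example of Hyrum's Law: the hypothetical library function below documents that it returns active user names "in no particular order", but version 1 happens to return them sorted. A client quietly relies on that, and a contract-compliant rewrite in version 2 breaks it. All names here are made up for illustration:

```python
def active_user_names_v1(users: dict[str, bool]) -> list[str]:
    """Contract: returns the names of active users, in no particular order."""
    # Implementation detail: sorted() happens to make the result alphabetical.
    return sorted(name for name, active in users.items() if active)


def active_user_names_v2(users: dict[str, bool]) -> list[str]:
    """Same contract, different implementation: keeps dict insertion order."""
    return [name for name, active in users.items() if active]


def first_reviewer(names: list[str]) -> str:
    # Client code that silently relies on the alphabetical order of v1.
    return names[0]


USERS = {"mallory": True, "alice": True, "bob": False}
print(first_reviewer(active_user_names_v1(USERS)))  # alice
print(first_reviewer(active_user_names_v2(USERS)))  # mallory
```

Both versions honor the documented contract, yet upgrading from v1 to v2 changes the client's behavior.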
Without a deprecation strategy, you end up carrying old code around for ages. If you have no way of informing clients about classes and methods that should no longer be used, you will either maintain this code for the lifetime of the library, or remove it without announcement and catch your clients unprepared when they migrate to the next version.
It now starts to feel more like building a product than doing the simple refactoring of moving code around. The more users your library has, the more weight you have to put on each of these topics. On the other hand, the fewer users your library has, the less effort you actually save. It is a constant tradeoff between maintaining duplicated code and managing changes to the library.
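One common way to implement a deprecation strategy in Python is to emit a `DeprecationWarning` from the old entry point for a while before removing it. The `deprecated` decorator, `open_connection`, `connect` and the removal version below are hypothetical names for this sketch; it only relies on the standard `warnings` and `functools` modules:

```python
import functools
import warnings
from typing import Any, Callable, TypeVar

F = TypeVar("F", bound=Callable[..., Any])


def deprecated(replacement: str, removal_version: str) -> Callable[[F], F]:
    """Mark a library function as deprecated without changing its behavior."""
    def decorator(func: F) -> F:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            warnings.warn(
                f"{func.__name__}() is deprecated and will be removed in "
                f"{removal_version}; use {replacement}() instead.",
                DeprecationWarning,
                stacklevel=2,  # point the warning at the caller, not at this wrapper
            )
            return func(*args, **kwargs)
        return wrapper  # type: ignore[return-value]
    return decorator


def connect(url: str, timeout: float = 5.0) -> str:
    """New entry point (hypothetical)."""
    return f"connected to {url} (timeout={timeout})"


@deprecated(replacement="connect", removal_version="2.0.0")
def open_connection(url: str) -> str:
    """Old entry point, kept working until the announced removal."""
    return connect(url)


# A client still using the old API keeps working but gets notified.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    print(open_connection("db://orders"))
print(caught[0].message)
```

Python ignores `DeprecationWarning` by default outside of `__main__` and test runners, so clients may need to enable it (for example with `-W default::DeprecationWarning`) to see the message. Announcing removals in your release notes is still worthwhile.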
Summary
The overhead of managing a library can be intimidating, especially if you need to do it in parallel with an actual project. On the other hand, if your overall goal is to write, ship and maintain a library, these topics give you a good start on what to take care of.
I tried to emphasize the tradeoffs you have to consider when introducing shared code to your project.
So, should you write a library?
As often in software development, there is no binary yes or no (which is kind of ironic, isn't it?) but only the good ol' it depends.
Top comments (2)
I just happened to read this post earlier today. It suggests WET (Write Everything Twice) as an alternative to DRY (Don't Repeat Yourself). One of the arguments for DRY is programmer laziness, but in fact, it is often much easier to copy and paste than to factor out code into a library.
I think you give very good advice to know when a library is appropriate. I try to keep code the way it is until the duplication becomes a pain point. When I update a piece of code, and then find I have to make the same update in three other places (or create a bug by forgetting to update one of the places), then it's probably time to factor it out into a library.
It is good advice to keep the business logic duplicated. But there is still the question of how to manage changes across duplicated code when appropriate.
I think this is where distributed version control can come in handy. It handles the copies and allows you to distribute changes. You still need to manage that distribution, though.