Darius Juodokas

Posted on Apr 20, 2021

Should I use a library for that?

#dependencies #design #hell #security

Libraries and frameworks are here to ease our lives. They bundle up tons of logic and present us a remote control with very few buttons and knobs. A button says "blowItUp" and it blows the whole thing up. A button says "get500BitCoin" and it gets you 500 bitcoins. Well not exactly like that, but the point still stands. Libraries and frameworks know how to do THIS or THAT and they do their jobs well. So you don't have to worry about all the nuts and bolts that you may need to get the job done. Libraries and frameworks have thought of all of that for you. Just use them!

Libraries and frameworks are great!

I must admit, the idea of libraries is brilliant: someone had a problem, figured out a way to solve it and shared the solution for others to reuse to solve the same (or similar) problems. Let's see what we love them for.

Quick & easy solution

If you have a problem you have to solve, it might take hours, days or even weeks to solve it yourself. Ever looked at the implementation of the TCP library? There's a lot going on. Imagine if you needed to implement it yourself. That would take several months and still the implementation would be buggy.

What you can do instead is look for a library that solves that particular problem and add it to your project in minutes. And that's it - your problem is solved. In minutes, rather than hours, days, weeks or months.

Now off to the other problem (i.e. library lookup)

Less boilerplate

... and less code to support. This means you are not responsible for the code in the libraries and you don't have to support it. If something doesn't work - just report it as a bug and let the maintainers worry about how to solve it. If you feel like it, you can probably propose your own solution and earn some karma points (and the great feeling of knowing that other projects will be running your solution). With libraries, you go fast. The TTM (time to market) is short. And you write fewer silly bugs. Not to mention you have to write less :)

Not reinventing the wheel

Most of the problems you're running into have already been solved by someone else. Probably many, many times. If these folks solved those problems, then why should you repeat what they did? You can simply take their solution and use it in your project. There's also a good chance that by reinventing the wheel you'll introduce bugs you haven't thought of, and your solution will be less efficient. Let's not waste our time and use what's already out there.

They are professionals - they can do it better than I can

Libraries and frameworks are often developed by experienced professionals. If the maintainer is a professional, we expect them to know a great deal about the topic and to use only the best practices in solving the problem. We want to learn from the best, we want to use what's best, so we simply like to use what these PROs decided is best. In many cases it's a no-brainer.

Standardized usage

This is one of the key aspects that people like libraries for. If I use a library for the job that is popular in the industry, whoever comes after me will most likely know what we're talking about. They will know how to use it, they will know the caveats and possible points for improvement. This is the benefit of using any popular tool, not just libraries, not just frameworks, not just... software. It's easier to maintain the project as a whole when features follow some well-known standards or are even publicly documented and used by others elsewhere.

However...

This blog post wouldn't exist if there wasn't the "however" part. While it seems that libraries solve all your problems and might even be solid building blocks to the complete application you're after (you only need to arrange them properly et voila - the project is done), there are things to consider before diving into the dependency world (or hell).

Foreign code

Any code you take in as a part of your project should be considered potentially harmful. It's very easy to modify your pom.xml or build.gradle and add that one line pulling in a library that solves the problem for you! However, your build tools are downloading that library from external sources which you have no control over. Who wrote that library? Did they inject a backdoor in it? Will the authors hack my application if I use their library? I don't know - the library code is far too complex for me to digest and catch all the possible trojans (if any). And it's a tremendous waste of time! It would take weeks or even months... Do I want to spend this much time on that?

Do you think I'm overly paranoid? Sure you do! Have you seen how many attempts there were to inject backdoors into the Linux kernel? :) If you had, you would be too. And the Linux kernel is no different than any other open-source project in this sense!

Even if the maintainers/authors of the library had no intention of harming you, they might be shipping a backdoor without knowing it. Have you heard of the very recent SolarWinds hack? Boy did that cause a mess... EVERYWHERE! Government, Homeland Security, the Treasury, the energy sector, corporations (think: Microsoft)... everywhere! This hack started a long time ago - someone injected a backdoor into the codebase - and all these companies used the affected SolarWinds tools. As a result, all these companies willingly (though unknowingly) installed a backdoor in their systems, granting an attacker a wide spectrum of access to huge amounts of services and data. Which could have been exploited for further hacking... SolarWinds is just an example, illustrating that things like this are happening at various scales. If Ubuntu maintainers managed to expose their repository passwords, who says your library maintainers haven't shared them somewhere by accident?

Even if the vendors manage to protect their codebase, they usually distribute their products as compiled bundles. For convenience. And that's all fine. What should trigger your red flags is that you almost never download those bundles from the vendor. You download them from another 3rd party, which specializes in storing those bundles, e.g. Maven repository, Artifactory, DockerHub, etc. That's another segment in the chain that can be potentially accessed illegally and tinkered with. Should any unauthorized party get access to such hubs, they could replace those bundles with their own versions of the library (most likely a modified original library with a backdoor injected). Checksums make that a tad more difficult, but it's not impossible to bypass them either. So that's one more attack vector where somebody could be spinning up a backdoor in your application without you, the vendor or the maintainers having a clue. If you once again think I'm too paranoid, you should know that these things happen. Repositories get hacked and binaries get replaced.
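To make the checksum point a bit more concrete, here is a minimal sketch of comparing a downloaded bundle against a SHA-256 value you pinned earlier. The class and method names (ArtifactChecksum, sha256Hex, matches) are made up for this illustration; only the JDK's MessageDigest is real. Keep in mind that this only helps if the expected checksum comes from somewhere the attacker of the repository cannot also modify.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

// Hypothetical helper: compares an artifact's SHA-256 digest with a pinned value.
final class ArtifactChecksum {

    private ArtifactChecksum() {
    }

    /** Returns the lowercase hex SHA-256 digest of the given bytes. */
    static String sha256Hex(byte[] content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(content);
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    /** True only if the artifact's digest equals the checksum pinned earlier. */
    static boolean matches(byte[] artifact, String expectedSha256Hex) {
        byte[] actual = sha256Hex(artifact).getBytes(StandardCharsets.US_ASCII);
        byte[] expected = expectedSha256Hex
                .toLowerCase(Locale.ROOT)
                .getBytes(StandardCharsets.US_ASCII);
        return MessageDigest.isEqual(actual, expected);
    }
}
```

In a real build you would normally let the build tool do this check rather than hand-roll it; the sketch only shows what "verifying a checksum" boils down to.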

I cannot find a reference now, but I recall reading in one blog about a guy who carried out an experiment: he added a potential backdoor (a stub of a backdoor - it will only call-home instead of connecting to CnC for further instructions) in his open-source node library. It sat there for months (or years?) and no one ever noticed it. He collected statistics after a while and summarized that he could have easily leveraged that backdoor for his own benefit to exploit millions (or was it billions?) of different projects. How many companies is that? :)

Whenever you are adding a 3rd-party library, consider it as a potential threat. It most likely is harmless, but there are ways it could become your and your employer's worst nightmare.

Version upgrade

You probably have plenty of libraries in your project. As time goes by (and your library versions don't change), it's very likely your project will accumulate more known library-related bugs:

security
functionality
performance

Naturally, you'd like to upgrade your library to a newer version, hoping those bugs have been resolved. It's easy - just update the version number in the pom.xml and reload the project! Or is it...

Most likely the library will have its contracts changed. Now your code doesn't compile, because some library classes got moved/renamed, others have methods with different signatures, some others got deprecated and removed or their fields/methods deprecated and removed! Not to mention cases where the general usage changes. It's hell! Now you have to scan your WHOLE code and look for spots where the usage of the library no longer compiles. Or even worse - it compiles, but it's used wrong in the context of this new version! You have to adapt your code to the new version of the library. So... you wanted to resolve a single issue (or at least see whether it was resolved in this new version), but now your code no longer compiles, not to mention whether it still works correctly. This is often the case with large frameworks, like Liferay or Spring or Hibernate (in Java terms). How long will it take you to test if that bug was fixed? If it wasn't, you'll have wasted all this time for nothing. Does that sound right to you?

Deprecated

If you think a version upgrade causes a mess, I've got a better treat for you! Suppose a zero-day security vulnerability is revealed in a library you are using, but that library has not been maintained for the last... 8 years. No one has forked it, no one has taken ownership of it - it's simply dead. Now you either have to live with that 0-day vulnerability (unacceptable), or patch the library code (have fun!), or replace the deprecated library with an alternative.

Patching is tedious, because it is not your code, you don't know it well and you are likely to introduce other bugs with your patch (assuming your patch fixed the 0-day properly in the first place!). Patching also means that you will keep on living with severely outdated and dead code that no one looks after any more - no one but you. So instead of developing one project you now have two. One patch after another and eventually you'll have rewritten large portions of the foreign code. You might even say you have recreated that library anew. Which is more expensive than creating a new library, because (a) you had to learn the foreign code and (b) you had to fix it iteratively, i.e. without damaging the rest of the code.

Replacing the library is also not the most tempting idea, because an alternative library will have a different contract, different classes, different methods, and, most likely, different behaviour. If you could use your old library by invoking a single static method, now you might have temporal coupling in place, requiring you to prep the library, initiate something, persist something and then call something on something. This, like a library version upgrade, means a massive scan of the code and lots of code replacements, sometimes even refactoring or a change of flows. It might also require new infrastructure units. Now, once again, you have to adapt your code to the library.

It no longer fits my needs perfectly

You have adopted some framework because it promised to tick all the checkboxes in the project requirements' sheet. And it offered an amazingly fast TTM (TimeToMarket) by covering most of the code you'd need to write! Amazing!

2 years later you find yourself in a position with dozens of framework entities/services extended and overridden, plenty of nasty hacks to keep all the overrides in order. Adding a new feature probably introduces yet another hack to adapt your code to the framework. The problem with hacks is that they have a tendency to introduce unexpected bugs, which are a pain to debug. Another problem is that the project becomes barely maintainable: estimates are loo-ooo-ooong (and yet many of them are too short), and more often than before you close the feature requests with "Won't do: not possible" closure code. You may even have introduced a dedicated Jira label: "Not Possible"!

A sane thing to do would be to eradicate the framework (or parts of the framework that are riddled with hacks), but that means you will have to invest lots of time to reinvent the wheel - it's still going to be round, but it will fit your carriage better than the wheel you have now. Who's going to pay for such a months-worth investment? Only once in a blue moon, a client cares for those details and understandingly agrees to pay for such maintenance.

On the other hand, would it perhaps be cheaper and faster to rewrite the whole project? Quite often people do choose this option. The framework is rooted SO deeply in the project that it seems faster to rewrite the whole thing than to sort out all the hacks and remove the framework. Boy is that expensive...

I need one more feature

You have adopted some library to do the job for you. And it works miracles! However, months later, another business request comes in - make a feature [that uses THAT library] also do THIS. Uh-oh... But that library doesn't do that. You browse all the docs, forums, blog posts looking for instructions that would tell you how to do something at least close to what you need -- nil. You get nothing. Now you either have to extend that library and implement the feature you want, or you have to find another tool for the job.

A good recent example I came across is caching. The project used EHCache for in-memory caching. The application needed plenty of data cached, but it only needed it for short periods of time - several seconds. After that, the data is no longer useful. Even worse - the expired data used up memory that was needed for other jobs. So you either boost your RAM by several gigs because you'll need them for cached objects for several seconds per hour (because expired TTL does not mean data is removed from RAM immediately), OR you limit your cache size risking many of the objects won't fit in it, causing slow response times. You would think EHCache has some way to enable a cleaning job that scans the cache periodically and evicts expired entries... But it doesn't (there are projects that extend EHCache and introduce that feature, but that's yet another library!).

What are your options now? Either augment the library or replace it with a more feature-rich library that covers your requirements... until next time. And switching libraries, as you have already read, is not always that easy!

There is a bug! But how do I smash it?

Suppose you have a large framework (e.g. Liferay) you are building your app on. It's great, it works as expected, or even better! Time for a security audit! Auditors scan your application and find severe security problems. You fix most of them, but you struggle with the rest because they are the framework's bugs. You fix whatever can be fixed by summoning the power of manuals, blogs and forums, or even support (if you have a subscription). But what about the rest - the ones where the support says "you'll need to upgrade the framework to a newer major version to have this fixed"? To those who have used Liferay, this option is a clear no-go, because it's easier to rewrite the whole thing than to upgrade Liferay's version. You're stuck. It's probably time to introduce some kind of reflection-based hack that patches the security bug, hoping it doesn't open another one.

And what if you report the bug to the vendor and the vendor says: "thank you, this will be fixed in the next release... in the next 8 months"? What do you do all those months? Sit there like a duck and hope no one finds that flaw in the exploit-db and exploits it in your system? Here come the hacky patches! And the poor maintainability that comes with them! Even if you managed to live long enough with your patch without getting hacked and the new version got released, now you have either forgotten that you wanted to upgrade (the client definitely has! He won't want to bring this back up as a possible expense), or you're now in the library upgrade hell.

Code I don't need

If you're into embedded or mobile development, you might be familiar with the problem of too many classes. You don't even need libraries to run into this problem - just use Dagger2 extensively, and it will generate loooots of classes for you - more than you'd like. Which causes compilation/packaging or deployment problems.

But even if you don't use Dagger2, or don't develop for Android, bear in mind that you usually invite a foreign library into your codebase to help you out with problem P, while the library is designed to solve problems E, R, G, Y, O, J, B, B1, B2, B6, M, etc. Naturally, you bring far more code into your project than you actually need. Alternatives don't solve your problem completely, so you settle for this 120MB library for a solution that actually needs no more than 5MB. This is a great way to blow up your PermGen (or Metaspace) with stuff you don't need and get more OutOfMemoryErrors. It's also a nice way to bloat your application with excessive dependencies and excessive code. And make your deployments (and applications) slower.

When it comes to security, the rule of thumb is: "don't have stuff you don't need". This situation clearly violates the rule. Now you have a lot more moving (and potentially harmful) parts in your code. It's even worse if you had to make changes to adapt your code to run well with library code you don't even use (might be the case with Spring's beans).

Indirect libraries

Even if you are using libraries from trusted sources and only libraries you truly need, bear in mind that these libraries/frameworks most likely depend on some libraries themselves. As a result, your innocent-looking library brings even more code into your codebase than you thought. This is definitely the case in npm-related development. The dependency graphs are enormous!

Not only are the indirect libraries a potential threat to your project, but they might also introduce compatibility issues: if the framework only works with an older version of some utility library AND you want to use a newer version of that utility library, in the end (at least with Java's JARs) only one version will be used. Which one? Whichever your build tool's conflict resolution or the classpath order happens to pick, and that is rarely obvious. Something is definitely going to break.

Should I use a library then?

KISS

The right answer is "it depends". Don't be a library whore (like here). Also, it doesn't pay enough to spend nights reinventing the wheel over and over. Find the middle ground. If possible, set some guidelines in the project: when are you going to introduce a library, and when are you going to implement the thing yourself. I hear you gasping at "implement yourself" :) But that is a legitimate approach to consider. If you need an in-memory cache with an active TTL, after which entries are removed from memory - it takes up to an hour to decorate a HashMap with synchronized put() methods and a thread that scans the map every minute and removes all the entries whose TTL has expired. You don't need a fancy library for that. And for this narrow job your custom implementation is no worse than any in-memory caching implementation out there. Remember: KISS!
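For what it's worth, the HashMap-based cache described above could look roughly like the sketch below. It's only an illustration under my own assumptions: the class name TtlCache, the constructor parameters and the evictExpired() method are invented here, and the scan interval is configurable instead of a fixed minute.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical home-grown cache: a HashMap guarded by synchronized methods,
// plus a background thread that periodically removes expired entries.
final class TtlCache<K, V> implements AutoCloseable {

    private static final class Entry<V> {
        final V value;
        final long expiresAtNanos;

        Entry(V value, long expiresAtNanos) {
            this.value = value;
            this.expiresAtNanos = expiresAtNanos;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final long ttlNanos;
    private final ScheduledExecutorService cleaner;

    /** scanInterval must be positive; ttl may be zero (entries expire at once). */
    TtlCache(long ttl, TimeUnit ttlUnit, long scanInterval, TimeUnit scanUnit) {
        this.ttlNanos = ttlUnit.toNanos(ttl);
        this.cleaner = Executors.newSingleThreadScheduledExecutor(runnable -> {
            Thread thread = new Thread(runnable, "ttl-cache-cleaner");
            thread.setDaemon(true); // do not keep the JVM alive just for cleanup
            return thread;
        });
        this.cleaner.scheduleAtFixedRate(this::evictExpired, scanInterval, scanInterval, scanUnit);
    }

    synchronized void put(K key, V value) {
        entries.put(key, new Entry<>(value, System.nanoTime() + ttlNanos));
    }

    /** Returns the cached value, or null if it is missing or already expired. */
    synchronized V get(K key) {
        Entry<V> entry = entries.get(key);
        if (entry == null) {
            return null;
        }
        if (isExpired(entry, System.nanoTime())) {
            entries.remove(key);
            return null;
        }
        return entry.value;
    }

    /** Called by the cleaner thread; frees memory held by expired entries. */
    synchronized void evictExpired() {
        long now = System.nanoTime();
        entries.values().removeIf(entry -> isExpired(entry, now));
    }

    synchronized int size() {
        return entries.size();
    }

    private static boolean isExpired(Entry<?> entry, long nowNanos) {
        // Subtraction keeps the comparison correct even if nanoTime wraps around.
        return nowNanos - entry.expiresAtNanos >= 0;
    }

    @Override
    public void close() {
        cleaner.shutdownNow();
    }
}
```

Something like `new TtlCache<String, Report>(5, TimeUnit.SECONDS, 1, TimeUnit.MINUTES)` (Report being whatever you cache) would keep entries for five seconds and sweep the map once a minute. No size limits, no statistics, no persistence: exactly the features the use case doesn't need.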

I like to live by this rule: "If it takes me up to 2 hours to implement the solution, I'll implement it myself rather than use a library". The reasoning is simple: if I have to adopt some library, it will take me far more than 2 hours to:

carry out the market analysis (what libraries are out there? Which one fits my case best?)
add the library to my code
configure and use it right (means reading the docs)
suffer from all of the points (and work around them) in the "However" section above.

If I eventually need the library, I can turn my solution into an adapter (see: Adapter pattern) for that library, without changing the signatures of my methods - a perfect decorator (see: Decorator pattern), which means I don't really need to change anything else in my code.
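A tiny sketch of what that could look like, under loud assumptions: the ID generator and both class names are invented for this illustration, and java.util.UUID merely stands in for "the library I eventually needed". In a real project it would be one class whose body changes; the two versions are shown side by side only for comparison.

```java
import java.util.UUID;
import java.util.concurrent.atomic.AtomicLong;

// Version 1 (hypothetical): the home-grown solution the rest of the code calls.
final class IdGeneratorV1 {
    private final AtomicLong counter = new AtomicLong();

    String nextId(String prefix) {
        return prefix + "-" + counter.incrementAndGet();
    }
}

// Version 2 (hypothetical): same method signature, but the class is now a thin
// adapter that delegates to a library class (java.util.UUID as a stand-in).
final class IdGeneratorV2 {

    String nextId(String prefix) {
        return prefix + "-" + UUID.randomUUID();
    }
}
```

Callers keep calling `nextId("order")` and never learn that a library moved in behind it.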

Native abstraction of foreign code

And the above brings me to the practice that I've come to like the most. This practice solves many of the problems, regardless of whether I write my own solution or use a library. And while it doesn't solve the rest of the problems, it makes their mitigation easier and very non-invasive to the project.

Whenever you are introducing a library to your code, write an abstraction for it. If you want to introduce iText (a PDF generation utility), write an interface that has a PdfFile toPdf(DocumentToConvert doc); method. Implement both data structures yourself and implement that method - make it use the iText library for the job. And your code should NOT be using iText directly. Another example is JSON serialization. There are 2 major players out there (with others not far behind): Jackson and GSON. Instead of calling them directly, hide them behind an interface (contract layer):

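public interface JSONSerializer {
    // SerializationException is assumed to be the project's own exception type,
    // so callers never have to catch library-specific exceptions.
    <T> T fromJson(String json, Class<T> type) throws SerializationException;
    String toJson(Object pojo) throws SerializationException;
}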

and write an implementation that uses either Jackson or GSON (or any other library).

This way you decouple foreign code from your product code. As a result, your code becomes

more testable
less dependent on the actual libraries
less fragile (doesn't use library features you can live without)
easier to maintain (extendable)
more up-to-date, because you can upgrade/swap any library without breaking a sweat (assuming you have created SOLID abstractions)

What I like most about it is that I can swap out libraries as often as I like, or even depend on multiple libraries (or custom implementations) of the feature, without the rest of the code ever needing to know about any of this.

This approach always pans out if you're unsure what library to choose for the job. You might need to try out multiple libraries before you find the one that works best for you. Or, perhaps, you might not be satisfied with any of them and write your own implementation. Or a hybrid implementation... doesn't really matter as long as the rest of the code doesn't need to know about any of this. Just swap the implementations and try them out easily. You won't need to adapt your application code for the change.
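Here is a rough sketch of the PDF example from above, showing how the application code stays ignorant of whatever sits behind the interface. Only PdfFile, DocumentToConvert and toPdf come from the text; PdfConverter, PlaceholderPdfConverter, InvoiceExporter and all the fields are assumptions made up for the illustration. The stand-in implementation does not produce a real PDF and deliberately shows no iText calls; that is where the library-specific code would go.

```java
import java.nio.charset.StandardCharsets;

// Data structures owned by the application, not by any PDF library.
final class DocumentToConvert {
    final String title;
    final String body;

    DocumentToConvert(String title, String body) {
        this.title = title;
        this.body = body;
    }
}

final class PdfFile {
    final String fileName;
    final byte[] content;

    PdfFile(String fileName, byte[] content) {
        this.fileName = fileName;
        this.content = content;
    }
}

// The contract the rest of the code depends on.
interface PdfConverter {
    PdfFile toPdf(DocumentToConvert doc);
}

// Stand-in implementation (NOT a real PDF). A library-backed implementation
// would live in a class like this one and be the only place importing the library.
final class PlaceholderPdfConverter implements PdfConverter {
    @Override
    public PdfFile toPdf(DocumentToConvert doc) {
        byte[] bytes = (doc.title + "\n" + doc.body).getBytes(StandardCharsets.UTF_8);
        return new PdfFile(doc.title + ".pdf", bytes);
    }
}

// Application code: knows the interface only, so implementations can be swapped.
final class InvoiceExporter {
    private final PdfConverter converter;

    InvoiceExporter(PdfConverter converter) {
        this.converter = converter;
    }

    PdfFile export(String customer, String amount) {
        DocumentToConvert doc =
                new DocumentToConvert("invoice-" + customer, "Amount due: " + amount);
        return converter.toPdf(doc);
    }
}
```

Trying out another library means writing one more PdfConverter and changing the single line that constructs InvoiceExporter. In tests, a one-line lambda can play the converter.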

This is more difficult when it comes to frameworks because they are more integrated into your code. However, you can achieve a good enough setup with such abstractions that even make frameworks look like one of the features your code has.

Notice how often this post says that you have to adapt your code for xxx. You should not adapt your code for libraries. If anything, you should write your code assuming there is a simple utility that does the job. This way you write a library to enrich your application code rather than writing your code to be able to use that library you want. Libraries are tools. They should serve YOU, not the other way around.

Divide et impera

Some libraries and frameworks are huge and do many things. Especially frameworks. They tend to cover lots of areas, solve lots of problems. Hiding a framework behind a single interface would most likely be silly. The interface would have dozens or hundreds of methods and maintaining such a contract would be cumbersome. Here comes the interface segregation principle (the I in SOLID). Although, instead of splitting that enormous interface, you might want to first split the framework logically. What domains does it cover? Suppose it's some e-commerce framework. Can I extract an interface dealing with carts? Can I extract one for orders? For product listings? For promotions? For anything else? The more fine-grained interfaces you extract, the easier it will be to maintain the abstraction. No one says you have to write different implementations for all the interfaces - in Java a class can implement multiple interfaces. You can also use the Singleton pattern to back all implementations by the same instance of your e-commerce framework.
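A minimal sketch of that idea, with everything invented for the illustration: ShopFramework is a toy stand-in for a big e-commerce framework (not any real product's API), and Carts, Orders and FrameworkBackedShop are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Narrow, domain-specific contracts that the application code depends on.
interface Carts {
    void addToCart(String cartId, String sku);

    List<String> itemsIn(String cartId);
}

interface Orders {
    String placeOrder(String cartId);
}

// Toy stand-in for a large framework with one wide entry point.
final class ShopFramework {
    private final Map<String, List<String>> carts = new HashMap<>();
    private int orderCounter = 0;

    void cartAdd(String cartId, String sku) {
        carts.computeIfAbsent(cartId, id -> new ArrayList<>()).add(sku);
    }

    List<String> cartItems(String cartId) {
        return new ArrayList<>(carts.getOrDefault(cartId, new ArrayList<>()));
    }

    String checkout(String cartId) {
        carts.remove(cartId);
        orderCounter++;
        return "order-" + orderCounter;
    }
}

// One class implements both narrow interfaces; a single shared instance
// (Singleton) wraps a single framework instance.
final class FrameworkBackedShop implements Carts, Orders {
    private static final FrameworkBackedShop INSTANCE =
            new FrameworkBackedShop(new ShopFramework());

    private final ShopFramework framework;

    private FrameworkBackedShop(ShopFramework framework) {
        this.framework = framework;
    }

    static FrameworkBackedShop getInstance() {
        return INSTANCE;
    }

    @Override
    public void addToCart(String cartId, String sku) {
        framework.cartAdd(cartId, sku);
    }

    @Override
    public List<String> itemsIn(String cartId) {
        return framework.cartItems(cartId);
    }

    @Override
    public String placeOrder(String cartId) {
        return framework.checkout(cartId);
    }
}
```

Code that deals with carts asks for a Carts, code that deals with orders asks for an Orders, and neither needs to know that both are the same object. Later, any one of the interfaces can get its own implementation without touching the other.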

This approach applies to any jack-of-all-trades library/framework. Divide its responsibilities into smaller sets of features and implement them using the same library if you want. Or your own implementation. Or whatever fits your bill. This segregation gives you the freedom you'll eventually want.

Don't think of a framework as an almighty know-it-all. Think of it as a collection of features bundled into one. And you can use those features separately if you like.

Cut the losses - let your profits run

If you are a long-time framework user in some particular project and you notice that you spend more time adapting your code for that framework than working out the actual solution, perhaps it's time to leave that framework behind? It's always a choice on the table. If you've introduced that framework in your codebase as suggested above (nicely segregated abstractions), you can get rid of this bottleneck of a framework in no time. Just write your own implementations of those interfaces, write them iteratively if you like to. And eventually, you'll have eradicated that framework completely. And, once done, you are relieved of your duty to keep on adapting your code for the framework. You're now free to use that framework for the interfaces that you wouldn't benefit from rewriting and use your own implementations where you used to experience most of the maintenance bottlenecks.

Summary

So should I use a library for that? The answer is: what do your project guidelines say about it? Do the pros of the newly introduced foreign code outweigh potential hazards and maintenance hell? Will it be easier to maintain the code with the library or with a custom implementation? Is it reasonable to write a custom implementation in the first place?

Define that in your project guidelines. If you want to, you can define your project to be one huge dependency graph with your code as the glue holding the parts together. However, such a project will most likely be unmaintainable (pretty good for a PoC though). If you like, you can write all the features yourself without any libraries. It will burn a lot of time, create a lot of bugs, and you will reinvent the wheel; but you'll have complete control over all aspects of the code. Or don't be a radical and choose the model that suits you best. If you asked me, I'd say use a library if it would take you >2 hours to write your own implementation to solve that problem; but regardless of whether you're using a library or a custom implementation, write an abstraction for it and only use the abstraction in your code.

I find frameworks very useful to start a project with - later on I might phase them out of the codebase. Libraries are great for PoC and similar code writeups requiring extremely short TTM - I tend to reevaluate a need for them soon after.

References

https://sandofsky.com/architecture/third-party-libraries/

