Tracy Gilmore

Posted on Jan 14, 2023

Poorly managed packages considered harmful

#gratitude #community

Introduction

Like so many articles entitled "... considered harmful", this one argues that undertaking a certain course of action, or employing a particular technique, might be detrimental to application development. The title pays homage to the seminal 1968 paper by Edsger W. Dijkstra called "Go To Statement Considered Harmful".

The paper was first published in Communications of the ACM (Association for Computing Machinery), Vol. 11, No. 3. It described the shortcomings of GoTo, the primary feature that programming languages of the era used for branching. Few nowadays would argue that seeking alternatives to GoTo, such as subroutines, procedures and functions, was a poor steer.

"On the perils of packages"

Free does not necessarily mean without cost. Many packages are provided to the community free of charge, which is a very compelling reason for using them (e.g. React JS). But there are a variety of ways in which adopting 3rd-party code can come with an unexpected price tag.

Why do we use them

A module-based architecture is a proven strategy for the construction of robust systems. An application comprising many decoupled components is easier to test, maintain and evolve. As a consequence many programming languages, including JavaScript, have syntax to define modules and/or components.

But modules are just the "tip of the proverbial iceberg"; sharing modules requires a little more. In order to publish and integrate 3rd-party modules they often need additional wrapping to enable features such as version control, dependency management and registration/discovery. Packages and modules have become essential building blocks in the construction of modern applications, almost irrespective of the programming language or technology stack. Their use can be considered an extreme form of the DRY (Don't Repeat Yourself) principle.

In Java and Python they are simply known as packages, but in .NET they are often called NuGet packages, in Rust they are "crates" and in JavaScript "Node/NPM packages" (as an extension of the language). But whatever the technology, unless care is taken, using any 3rd-party code can be hazardous, with or without there being any malice or ulterior motive.

Indeed, the idea of writing everything from scratch and not making use of the work and expertise of others is preposterous and in many cases foolhardy, except in very extreme circumstances.

What could possibly go wrong

I am sure other technology stacks have their own 'dirty laundry' and none is exempt, but JS and Node are the domain I know best. I have chosen the cases cited below not because they were particularly bad but because they were well publicised at the time and represent a variety of motivations and consequences. I will not go into the details but have provided links to articles that discuss each case more fully.

January 2022: Colors.js and faker.js by Marak Squires (bleepingcomputer #1, revenera #2): abrupt tool withdrawal and corruption of own repository.
November 2018: flatmap-stream in event-stream (#3): surreptitious harvesting of authentication information.
March 2016: leftpad by Azer Koçulu (#4): sudden withdrawal of a low-level package impacting frameworks.

In all three cases there was a significant but largely recoverable impact on the industry, but only in the second case was there a suggestion that the author deliberately set out to cause disruption for personal gain. In the other two cases the issue came about through the withdrawal of software by the package owners themselves. It can be argued that in the last example (leftpad) the industry response was ultimately to enhance the ECMAScript specification, which is a positive outcome.

What the examples demonstrate is that nothing really comes for free, even packages freely given to the community. Incorporating 3rd-party code will always come with a risk. In the best case the foreign package might introduce a weakness into the application's architecture or longevity. At worst it could expose the application to a vector for attack.

What are the problems

Interdependency

A common feature of many package ecosystems is their interdependency. Most packages utilise others, which are in turn built on others: "dependencies all the way down", you could say. The consequence is that incorporating a 3rd-party package is seldom the end of the story. You are also taking on packages that are not listed as your direct dependencies. These indirect dependencies can be hijacked, corrupted or, as highlighted above, removed from circulation.
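As a rough illustration of how far "dependencies all the way down" can reach, the sketch below walks a small, made-up dependency graph and lists the packages an application takes on indirectly. The package names and the `DependencyGraph` shape are hypothetical; in a real Node project a tool such as `npm ls --all` reports the resolved tree.

```typescript
// Hypothetical, simplified snapshot: package name -> names of its direct dependencies.
type DependencyGraph = Record<string, string[]>;

// Collect every package reachable from `root`, excluding `root` itself.
function transitiveDependencies(graph: DependencyGraph, root: string): Set<string> {
  const seen = new Set<string>();
  const stack: string[] = [...(graph[root] ?? [])];
  while (stack.length > 0) {
    const current = stack.pop()!;
    // Skip packages already visited so that cyclic graphs terminate.
    if (current === root || seen.has(current)) continue;
    seen.add(current);
    stack.push(...(graph[current] ?? []));
  }
  return seen;
}

// Made-up example data for illustration only.
const graph: DependencyGraph = {
  "my-app": ["web-framework"],
  "web-framework": ["string-utils", "http-client"],
  "string-utils": ["pad-helper"],
  "http-client": [],
  "pad-helper": [],
};

const direct = new Set(graph["my-app"]);
const indirect = Array.from(transitiveDependencies(graph, "my-app"))
  .filter((pkg) => !direct.has(pkg))
  .sort();

console.log(indirect); // [ 'http-client', 'pad-helper', 'string-utils' ]
```

Here the application declares a single dependency yet relies on three further packages it never chose directly, any of which could be withdrawn or compromised.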

Publication hijacking

In the case of the NPM registry (I am sure there are other examples), anyone can publish a package and get it listed in searches. What is worse is that a publisher's account can be hijacked unless the developer protects it, for example with two-factor authentication.

Unchecked adoption

Software Engineers are inherently lazy and that is a good thing. However, continually looking for a quicker/cheaper way to deliver features within tight deadlines means we can be too quick to adopt 3rd-party code. This can be hazardous if insufficient research is conducted, but research costs time and therefore money.

Growing dependency

Third-party packages make up an ever-increasing proportion of the application ecosystem, including software modules built into the source code, libraries and frameworks (dependencies), and tools and plugins (dev dependencies). As our reliance on such packages increases so does our risk, if it is not managed properly.

How can we protect ourselves

Before taking a package into a project, several options should be sought and each assessed by asking the following technical and legal/commercial questions.

12 Questions the technical leadership should ask

Does the package offer all the required functionality?
Does the original motivation for the package align with the project requirement?
Does the package meet the project's quality assurance needs (are there unit tests)?
Does the package support the project's accessibility/internationalisation needs?
Does the package respect SemVer (Semantic versioning) and is it mature?
Is the project an original source or is it a fork/clone?
What is the type of custodian of the project repo?
- Sole author - beware
- Company backed - beware but less risk
- Community - best option
Should the project take packages (and updates) directly from the public NPM registry or should a private intermediary be employed? Options include Verdaccio, Nexus, etc.
Is the project documentation well maintained and informative, and is there a development roadmap?
Is there a Contributor Code of Conduct and is it appropriate?
Are there long-outstanding issues and Pull Requests, or is the project actively maintained?
Is the project community proactive and supportive?
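The questions about SemVer and about taking updates straight from the public registry can be partly automated. The sketch below is one possible check, not a complete policy: it flags dependencies in a package.json-style manifest whose version specifier is a range (such as `^4.18.2`) and not an exact version, since a range allows a newer release to be picked up without review. The `Manifest` type, the regular expression and the package names are assumptions made for illustration.

```typescript
// Hypothetical subset of a package.json manifest.
type Manifest = {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
};

// Matches an exact version such as "1.4.2" or "2.0.0-beta.1".
// Anything else (^, ~, *, x-ranges, tags, URLs) is treated as floating.
const EXACT_VERSION = /^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?$/;

// Return the names of all dependencies that are not pinned to an exact version.
function floatingDependencies(manifest: Manifest): string[] {
  const specifiers: Record<string, string> = {
    ...manifest.dependencies,
    ...manifest.devDependencies,
  };
  return Object.keys(specifiers)
    .filter((pkg) => !EXACT_VERSION.test(specifiers[pkg]))
    .sort();
}

// Made-up manifest for illustration only.
const manifest: Manifest = {
  dependencies: { "web-framework": "^4.18.2", "string-utils": "1.3.0" },
  devDependencies: { "test-runner": "~29.5.0", "linter": "*" },
};

const floating = floatingDependencies(manifest);
console.log(floating); // [ 'linter', 'test-runner', 'web-framework' ]
```

Pinning alone does not cover indirect dependencies; in an NPM project a committed lockfile installed with `npm ci` fixes the versions of the whole tree.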

3 Questions the project leadership should ask

Is the type of licence the project employs clearly defined?
Are the terms and conditions of the licence compatible with the project, the company and the end-user/customer?
What obligations does the licence place on the project?
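Once the project leadership has decided which licences are acceptable, the first of these questions lends itself to a simple automated check. The sketch below assumes each package declares its licence as an SPDX identifier, as the `license` field of package.json usually does; the allow-list and the package data are invented for illustration and are not a recommendation.

```typescript
// Hypothetical summary of an installed package.
type PackageInfo = { name: string; license?: string };

// Assumed policy for illustration only; the real list is a legal/commercial decision.
const ALLOWED_LICENSES = new Set(["MIT", "ISC", "Apache-2.0", "BSD-3-Clause"]);

// Return packages whose licence is missing or not on the allow-list.
function licenceViolations(packages: PackageInfo[]): PackageInfo[] {
  return packages.filter(
    (pkg) => pkg.license === undefined || !ALLOWED_LICENSES.has(pkg.license)
  );
}

// Made-up package data for illustration only.
const installed: PackageInfo[] = [
  { name: "web-framework", license: "MIT" },
  { name: "string-utils", license: "GPL-3.0-only" },
  { name: "pad-helper" }, // no licence declared
];

const violations = licenceViolations(installed).map((pkg) => pkg.name);
console.log(violations); // [ 'string-utils', 'pad-helper' ]
```

A package flagged here is not necessarily unusable; it simply needs a human to answer the remaining two questions before it is adopted.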

I am sure there are more questions. If you think of any please let me know via the comments section below.

#1 bleepingcomputer - Dev corrupts NPM libs colors and faker breaking thousands of apps
#2 revenera - The story behind colors and faker JS
#3 NPM JS - Details about the event-stream incident
#4 The Register - NPM leftpad Chaos

Top comments (5)

Andre Du Plessis • Jan 16 '23

This is a very interesting and valid argument you are tackling here, Tracy.

I don't think we will ever walk away from being dependent or interdependent in any realm, forget the software arena.

You provide ample guidance on how we can get these "dependency relationships" to work reliably and consistently in favour of better, more secure software and specifically package management.

Yes, there are probably more things that could be added to the lists you gave, but there seems to be one major obstacle affecting most developers and even more so individual, small organizations and SMEs that are participating in the Open Source community.

The conformance to some-or-the-other-standard part is relatively easy; the seemingly hard part is monitoring this conformance. When done by humans, yes. We just don't have the capacity to quickly process all the variables involved in such a vast "network of interdependent" blocks.

There's a fairly recent article, "Open-source software vs. the proposed Cyber Resilience Act" (Nov 22), specifically focussing on SW development standards in the EU related to the EC's Cyber Resilience Act (CRA).

It is currently leaning towards all SW to be used or sold on the EU markets having to be "certified" with something like the CE Mark. This implies third-party audits, which points to budgets the small guys will never be able to afford.

I'm sure this has been discussed by most "sane-thinking" governments, especially in the wake and ongoing waves of CySec issues we all face in ever increasing frequency.

The question in the article is whether FOSS developer communities, large or small, would be able to comply should something like this come to pass. Nobody is saying not to get SW audited, but most are asking how on earth it is going to be done in an affordable way in the open-source community.

This affects about 70+% of all software used by humanity, including the big proprietary SW guys like MS, Apple, IBM, Google, etc. not to mention FOSS products from the EU like NextCloud and so on. Stats say they are major users of OSS. Yet they also make the most money from it.

But can it really be that hard to check, verify/audit and certify it automatically using current technologies?

In my mind we will need to find ways to use AI to do this and do it on the fly and free as the code is actively developed.

We already have initiatives like Snyk, Sentry and others that come to mind as examples of the type of things that are already done to scan code for vulnerable packages during the dev cycle and monitor the performance of live software in production.

Surely it shouldn't be that hard to process any repositories on version control platforms to have this type of Certified Audits and their full histories managed as well?

Do you know of any initiatives leaning in that direction?

Tracy Gilmore • Jan 16 '23

Hi Andre,
Thank you for reading my ramblings and for your considered response. The last two projects I worked on (both large systems) took very different approaches to the management of NPM packages.

The first project imposed a gateway and private NPM registry so we had no access to the public registry. This slowed down the adoption of new or updated packages but safeguarded the project from the incorporation of inappropriately licenced, poorly maintained and potentially hijacked 3rd-party packages.

The second project did quite the opposite, adopting updated packages as soon as they became available and in fact failing the build pipeline when outdated packages were detected. The justification for this approach was well founded: to ensure security updates were adopted as rapidly as possible to avoid vulnerabilities.

In all honesty I cannot say which is the better approach; they both have merits and drawbacks.

Thanks for the link I will investigate that further.

Regards, Tracy

Andre Du Plessis • Jan 17 '23

Thanks for your feedback on this. It makes the whole issue you personally experienced clearer to me via the comparison.

What if one could use both?
Use the "Private Registry" for the higher risk components and the "public registry" for all initially but in a sandbox deployment.

Check the higher risk components out first and if valid and passing Private Registry QA stipulations, pass them onto the "Private Registry".

Obviously the remainder of the updates get adopted into the final production code, permitting all "public-based" code that passes the company's public registry QA stipulations.

Or am I oversimplifying this to the point of "Not as simple as that!"?

Tracy Gilmore • Jan 17 '23

It is a good suggestion, but there are two critical differences between the two approaches, which I think will be difficult to reconcile:
1, The first approach takes more, and dedicated, human resources, which the other project could not afford.
2, The second project has a higher level of trust in the public registry than the first but does also have automated monitoring of CVEs announced by cve.mitre.org/.

Andre Du Plessis • Jan 17 '23

Thanks for the clarification as well as the pointer to Mitre. I'm going to have a look and add to my toolbox.