"Not all debt is bad, but unpaid debt compounds."
This is especially true in software development. Technical debt, in software engineering, is a framework for accounting for the long-term costs of moving fast. In this article, we will explore technical debt in the context of machine learning (ML) specifically.
ML has a special ability to accumulate technical debt on top of the debt already present in traditional software development. We will delve into three key sources of technical debt in machine learning, each explained with a possible solution.
1. Hidden feedback loops
Machine learning capabilities are often part of a larger piece of software. This means the ML system is integrated in a complex way: it takes some input from the main application, runs it through the model, and returns a prediction to the application.
For example, take the recommendation system on a popular e-commerce site (*cough* Amazon). Have you ever wondered why such sites recommend things you never even thought about purchasing? It could be an issue of technical debt. The machine learning team may have implemented a state-of-the-art recommendation system that recommends items strongly related to a customer's purchase history, as well as new items that may be of interest.
However, the company's business interests can hamper the effectiveness of the recommendation system. The frontend team could simply hide recommended items with less than a 50% predicted probability, reasoning that those items are of no interest to customers. But hidden items can never be clicked, so the model retrains on data in which they receive no engagement, and items that once scored slightly above 50% gradually slip below the threshold. This marks the start of a vicious cycle: the system ends up recommending only the same kinds of items the customer has already bought, which defeats the purpose of a recommendation system.
The company may enjoy healthy profits for a few months after this frontend tweak, but the skewed customer activity it encourages means the ML model is learning from distorted training data. This, in turn, can result in stagnating or even declining sales after a few months.
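To make the loop concrete, here is a minimal, hypothetical sketch (item names and scores are invented for illustration) of how a frontend threshold hides low-scoring items, so the next round of training data contains no signal for them at all:

```python
# Hypothetical sketch: a frontend threshold on predicted probabilities
# silently removes items from the data the recommender retrains on.
THRESHOLD = 0.5

def frontend_filter(recommendations):
    """The frontend only shows items scored at or above the threshold."""
    return [(item, p) for item, p in recommendations if p >= THRESHOLD]

# Model scores for five candidate items (illustrative numbers).
scores = {"item_a": 0.9, "item_b": 0.55, "item_c": 0.52,
          "item_d": 0.4, "item_e": 0.3}

shown = frontend_filter(scores.items())
hidden = [item for item in scores if item not in dict(shown)]

# Hidden items can never be clicked, so the next training set carries
# no engagement signal for them -- their scores can only drift down.
print("shown to users:", [item for item, _ in shown])
print("never observed:", hidden)
```

Run this over several simulated retraining rounds and items near the threshold keep dropping out, which is exactly the vicious cycle described above.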
A possible way to circumvent this problem is strong collaboration between the different dev teams working on the recommendation system. It is important that each team focuses not on pumping up its own performance metrics (e.g. the test accuracy score for the ML team) but on collaborating seamlessly to improve sales over the long run, which is the ultimate aim.
2. Unstable data dependencies
ML models often consume input data that has already been output by other ML models. For example, embeddings (e.g. Word2vec) of data points could be used as input into a downstream machine learning system. If the upstream data is of poor quality or the upstream model changes, the inputs to downstream models become unstable, and the behaviour of the data deteriorates as it progresses through the pipeline. The extent of this issue is analogous to the domino effect.
A possible solution is to maintain two copies of the data after each stage in the data pipeline. That way, we can leverage the redundancy to verify data quality and to pinpoint the stage at which the data's behaviour deteriorates.
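A minimal sketch of this idea, assuming a simple in-memory pipeline (the stage names and transforms are invented for illustration): each stage stores a redundant snapshot of its output, so a quality drop can later be traced to the stage that introduced it.

```python
import copy

# Hypothetical sketch: keep a snapshot of the data after every pipeline
# stage so a deterioration in behaviour can be traced stage by stage.
snapshots = {}

def run_stage(name, transform, data):
    out = transform(data)
    snapshots[name] = copy.deepcopy(out)  # redundant copy for auditing
    return out

# Two toy stages: normalise text, then drop empty records.
records = ["  Hello ", "", "World  "]
records = run_stage("normalise", lambda rs: [r.strip().lower() for r in rs], records)
records = run_stage("drop_empty", lambda rs: [r for r in rs if r], records)

# If a downstream model degrades, diff the snapshots to find the
# stage where the data first changed in an unexpected way.
print(snapshots["normalise"])  # ['hello', '', 'world']
print(records)                 # ['hello', 'world']
```

In a real system the snapshots would go to versioned storage rather than a dict, but the principle is the same: redundancy buys you the ability to interrogate each stage.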
3. Undeclared consumers in the pipeline
Last but not least, building further on the issue of data deterioration through the pipeline: pipelines built by different dev teams at different levels of the hierarchy can compound the mess. A pipeline is often hacked together to satisfy the requirements of the moment, with no long-term vision. Such complex pipelines give rise to undeclared consumers: systems that consume a model's output without any access control or means of interrogation. These consumers can create additional feedback loops that lead to completely useless predictions in the machine learning models further down the pipeline.
Giving the various software teams a clear picture of the entire pipeline can solve this problem to a large extent. It is important to regularly strengthen the pipeline and fix its loopholes to ensure effective and efficient operation.
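One hedged way to enforce this in code (a toy sketch, not a production access-control system; the class and consumer names are invented) is to make every consumer of a model's output declare itself before it can read predictions, so hidden feedback paths become visible:

```python
# Hypothetical sketch: a tiny registry that forces every consumer of a
# model's output to be declared up front, making dependencies explicit.
class ModelOutput:
    def __init__(self, name):
        self.name = name
        self._allowed = set()

    def declare_consumer(self, consumer):
        self._allowed.add(consumer)

    def read(self, consumer, prediction):
        if consumer not in self._allowed:
            raise PermissionError(f"undeclared consumer: {consumer}")
        return prediction

recs = ModelOutput("recommender")
recs.declare_consumer("frontend")

print(recs.read("frontend", 0.87))       # declared consumer: allowed
try:
    recs.read("pricing_model", 0.87)     # undeclared consumer: rejected
except PermissionError as e:
    print(e)
```

The point is not the mechanism itself but the discipline it encourages: if a new team wants a model's output, it has to show up in the registry, where its feedback effects can be reasoned about.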
As you may have noticed, a salient theme in ML technical debt is a lack of communication and understanding. Hacking things together to get through the next milestone meeting is not going to help pay off the technical debt. A few things that can be done to encourage a culture of paying off debt within your dev team are:
• Agreeing on a set of performance metrics for the system the team is building and ensuring it goes hand in hand with the business aims
• Ensuring each and every team member understands the entire pipeline, even if they are not working on it directly
• Encouraging constant communication between the various dev teams through regular in-person meetings
I would like to end with a quote by the popular software engineering speaker Jim McCarthy that I hope drives the point home:
"You can't have great software without a great team, and most software teams behave like dysfunctional families."
I would like to thank the following sources for inspiring me to write this article:
- D. Sculley's talk 'Machine Learning, Technical Debt, and You' at PAPIs.io: https://www.youtube.com/watch?v=V18AsBIHlWs
- And his paper 'Machine Learning: The High-Interest Credit Card of Technical Debt': https://storage.googleapis.com/pub-tools-public-publication-data/pdf/43146.pdf