DEV Community

Eric

What is technical debt and why does it cost $85 billion a year? How can CI/CD help reduce it?

TLDR

Technical debt (also known as design debt or code debt) is a concept in software development that reflects the implied cost of additional rework caused by choosing an easy (limited) solution now instead of using a better approach that would take longer. https://en.wikipedia.org/wiki/Technical_debt

Technical debt is difficult to deal with. It is costly and can hide in all facets of an organization. Having a plan to identify, log, prioritize, detect and account for tech debt is crucial to the smooth operation of the company and employee morale.

Take your time and do it right. You may be able to go back and refactor some of the code, but you will never be able to refactor the underlying patterns. Once it gets to pattern changes, a rewrite often occurs. Businesses don't like the word "rewrite". While it has a negative connotation, it often means that lessons have been learned and that better ways have been found to implement business requirements, ways that benefit not only the business but also developer productivity and morale.

What does Technical Debt mean to me?

Tech debt is the sum of all inefficiencies caused by past, present, and future decisions across the entire company. It affects both the technical and business side of the organization. It is always in a state of creation and/or destruction. Once it shows up, there will always be an effort needed to remove it.

How much does it cost?

It costs $85,000,000,000 a year.

Ideas on reducing tech debt

A plan needs to be implemented to eradicate it in your system. If an implementation/system is so poor that users lose trust, it may be worth considering an exit strategy on that software and looking into alternative solutions.

The decision to prioritize tech debt in the system should be based on the impact it is having on the organization.

What questions can you ask to uncover tech debt?

Business processes

  • Do business users trust their system(s)?
  • Are there any manual steps necessary to complete a business process?
  • Do you know the specifics of data leaving or entering your applications?

Infrastructure

  • How quickly can environments be stood up and torn down?
  • How easy is it to manage your infrastructure?
  • How is your logging?
  • How is your monitoring?
  • Do you have a disaster recovery plan? If so, how long would it take to get back to an operational state after a disaster?

Build/Release pipelines

  • How long does it take to get code from development to production?
  • Are there any manual steps in the deployment process?

Code

  • Do software architects have enough control over their business unit's systems to enforce standards and best practices across them?
  • Are best practices and standards being followed across the organization?
  • Do you have 100% unit test coverage?
  • Do you have integration test coverage on critical parts of the system?

How do you estimate technical debt?

To estimate developer time, if you are following agile-ish practices, ask during retrospective meetings: if the system(s) had met all expectations and were running optimally, how much time could have been saved? Log those estimates over time and extrapolate how much that is costing the organization.
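As a rough illustration of the log-and-extrapolate idea, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `RetroEstimate` record, the `annual_cost` helper, the 26 sprints per year, the 75 per hour rate, and the sample hours. Plug in your own numbers from your retrospectives.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RetroEstimate:
    """One retrospective's estimate (hypothetical record for this sketch)."""
    sprint: str
    hours_lost: float  # team's estimate of time lost to tech debt this sprint


def annual_cost(estimates, sprints_per_year=26, hourly_rate=75.0):
    """Extrapolate a yearly cost from per-sprint estimates.

    sprints_per_year and hourly_rate are illustrative defaults, not real data.
    """
    if not estimates:
        return 0.0
    avg_hours = mean(e.hours_lost for e in estimates)
    return avg_hours * sprints_per_year * hourly_rate


# Made-up sample log: three retrospectives.
log = [
    RetroEstimate("sprint-1", 12.0),
    RetroEstimate("sprint-2", 20.0),
    RetroEstimate("sprint-3", 16.0),
]

print(annual_cost(log))  # average of 16 hours * 26 sprints * 75 per hour
```

The more retrospectives you log, the less a single bad sprint skews the average, so the extrapolation gets more trustworthy over time.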

What is CI/CD?

CI/CD (continuous integration and continuous delivery/deployment) is a generic term that is open to interpretation. Ultimately, it comes down to how fast you can detect bad code, stabilize it, and safely and reliably deploy it to production.

Areas of Interest:

  • Pull Requests
    • Code review is crucial to delivering high quality software.
  • 100% unit test coverage
    • In the world of libraries and packages, standards may not be enforced. Popular libraries will typically have unit test coverage, but it is always a good idea to do a technical audit of any new and/or existing libraries you are referencing to make sure they meet your standards. It would also be a good idea to run a coverage tool on the libraries crucial to your business. (DotCover and WallabyJS are tools I have used.)
  • Integration test coverage
    • Coverage on critical parts of the system should be a requirement. Between the unit test coverage and critical path integration test coverage, developers can merge code to master with a high level of confidence.
  • Pipelines are built to generate builds, run tests, and release builds. The complexity of these pipelines will vary significantly by the type of resource. The time it takes to build these pipelines must be accounted for in estimates.
  • Feature toggles are implemented to deploy chunks of a system in an iterative fashion. Feature toggles are tracked and can be contextual to the user.
  • Strict data contracts must be enforced on critical parts of the system.
  • GitFlow
  • Monitoring
    • This is critical to detecting issues across the entire system. Azure has a tool called Smart Detection that will detect a number of anomalies and send out email notifications. For example, Azure will track dependency failures over time and alert when failure rates go above an abnormal threshold.
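To make the feature toggle item above more concrete, here is a minimal Python sketch of a toggle that is tracked in one place and can be contextual to the user. The `FeatureToggle` and `ToggleRegistry` names, the `new-checkout` toggle, and the user IDs are all hypothetical; a real system would more likely use an existing feature-flag service or library.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureToggle:
    """A single toggle (hypothetical shape for this sketch)."""
    name: str
    enabled: bool = False  # global switch for everyone
    allowed_users: set = field(default_factory=set)  # per-user overrides


class ToggleRegistry:
    """Keeps every toggle in one place so they can be tracked and audited."""

    def __init__(self):
        self._toggles = {}

    def register(self, toggle):
        self._toggles[toggle.name] = toggle

    def is_enabled(self, name, user_id=None):
        toggle = self._toggles.get(name)
        if toggle is None:
            return False  # unknown toggles default to off
        # On for everyone, or on only for the users explicitly allowed.
        return toggle.enabled or user_id in toggle.allowed_users

    def names(self):
        """List registered toggles, e.g. to review which ones can be retired."""
        return sorted(self._toggles)


registry = ToggleRegistry()
registry.register(FeatureToggle("new-checkout", allowed_users={"alice"}))

print(registry.is_enabled("new-checkout", "alice"))  # alice is in the allowed set
print(registry.is_enabled("new-checkout", "bob"))    # bob is not
```

Defaulting unknown toggles to off is a deliberate choice in this sketch: a typo in a toggle name hides the feature rather than releasing it by accident.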

Conclusion

Software is hard. This was a lot of topics to cover. I'm sure I missed things, so feel free to suggest any additions to this.
