The Costs of Technical Debt

Edit: Still an important concept to understand.

Originally posted December 8, 2015 on AIXchange

As often as I see it, it still surprises me when I encounter a company that depends on some application, but chooses to run it on unsupported hardware without maintenance agreements and/or vendor support. If anything goes sideways, who knows how they will stay in business.

Another situation that isn’t uncommon involves time-sensitive projects, new builds where settings or changes are identified and added to a change log. It’s supposed to get taken care of in a few days, but you know the drill. Somehow the changes aren’t made, and before you know it the machine is in production. The build process is over and users are on to testing or development.

Then there are the innumerable enterprises that continue to run old hardware, old software, old operating systems or old firmware. Why is this the case? Are business owners not funding needed updates and changes? Is it a vendor issue? Sometimes vendors go out of business or discontinue support of back versions of their solutions. In smaller shops, maybe one tech cares for the system, and no one else has any idea what’s being done to keep things running. This becomes a problem if that one tech leaves. Then there’s the all-purpose excuse: “If it isn’t broke, why fix it?”

There’s actually a name for this, technical debt:

Technical debt (also known as design debt or code debt) is a recent metaphor referring to the eventual consequences of any system design, software architecture or software development within a codebase. The debt can be thought of as work that needs to be done before a particular job can be considered complete or proper. If the debt is not repaid, then it will keep on accumulating interest, making it hard to implement changes later on. Unaddressed technical debt increases software entropy.

Analogous to monetary debt, technical debt is not necessarily a bad thing, and sometimes technical debt is required to move projects forward.

As a change is started on a codebase, there is often the need to make other coordinated changes at the same time in other parts of the codebase or documentation. The other required, but uncompleted changes, are considered debt that must be paid at some point in the future. Just like financial debt, these uncompleted changes incur interest on top of interest, making it cumbersome to build a project. Although the term is used in software development primarily, it can also be applied to other professions.

It’s hardly a new term, either. Although this piece, from 2003, focuses on the process of writing software, I think it’s applicable to other areas of IT as well.

Technical Debt is a wonderful metaphor developed by Ward Cunningham to help us think about this problem. In this metaphor, doing things the quick and dirty way sets us up with a technical debt, which is similar to a financial debt. Like a financial debt, the technical debt incurs interest payments, which come in the form of the extra effort that we have to do in future development because of the quick and dirty design choice. We can choose to continue paying the interest, or we can pay down the principal by refactoring the quick and dirty design into the better design. Although it costs to pay down the principal, we gain by reduced interest payments in the future.

The metaphor also explains why it may be sensible to do the quick and dirty approach. Just as a business incurs some debt to take advantage of a market opportunity developers may incur technical debt to hit an important deadline. The all too common problem is that development organizations let their debt get out of control and spend most of their future development effort paying crippling interest payments.

The tricky thing about technical debt, of course, is that unlike money it’s impossible to measure effectively.
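As the quote says, real technical debt resists measurement, so the following is only a toy sketch of the principal-versus-interest trade-off, not a way to size actual debt. Every number is invented for illustration, and the function name and parameters are hypothetical: each iteration of work costs a base effort plus some "interest" caused by the shortcut, and a one-time refactor (paying down the principal) stops the interest from that point on.

```python
def cumulative_effort(base_effort, interest, periods,
                      refactor_cost=0.0, refactor_at=None):
    """Total effort spent over `periods` iterations of work.

    Every iteration costs `base_effort`. While the shortcut is still in
    place, each iteration also costs `interest` in extra effort. If
    `refactor_at` is set, `refactor_cost` is paid once at that iteration
    and no interest is charged from then on.
    """
    total = 0.0
    for period in range(periods):
        if refactor_at is not None and period == refactor_at:
            total += refactor_cost  # pay down the principal once
        still_in_debt = refactor_at is None or period < refactor_at
        total += base_effort + (interest if still_in_debt else 0.0)
    return total


# Long-lived system: 12 iterations of work (all figures are made up).
keep_paying_long = cumulative_effort(base_effort=10, interest=3, periods=12)
pay_down_long = cumulative_effort(base_effort=10, interest=3, periods=12,
                                  refactor_cost=15, refactor_at=2)

# Short deadline: only 4 iterations before the work is done.
keep_paying_short = cumulative_effort(base_effort=10, interest=3, periods=4)
pay_down_short = cumulative_effort(base_effort=10, interest=3, periods=4,
                                   refactor_cost=15, refactor_at=2)

print(keep_paying_long, pay_down_long)    # 156.0 141.0
print(keep_paying_short, pay_down_short)  # 52.0 61.0
```

With these made-up figures, the refactor pays for itself after five interest-free iterations (15 / 3), so it wins over the long run but loses against a short deadline. That is the same trade-off the metaphor describes: sometimes carrying the debt is the sensible choice, as long as it eventually gets paid.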

The same article cites this 1992 report. (Funny how as quickly as business computers evolve, some of the underlying issues of using them remain with us.)

Shipping first time code is like going into debt. A little debt speeds development so long as it is paid back promptly with a rewrite…. The danger occurs when the debt is not repaid. Every minute spent on not-quite-right code counts as interest on that debt. Entire engineering organizations can be brought to a stand-still under the debt load of an unconsolidated implementation, object-oriented or otherwise.

Here’s more from the Wikipedia link:

“It is useful to differentiate between kinds of technical debt. Fowler differentiates “Reckless” vs. “Prudent” and “Deliberate” vs. “Inadvertent” in his discussion on Technical Debt quadrant.”

There’s also this:

The concept of technical debt is central to understanding the forces that weigh upon systems, for it often explains where, how, and why a system is stressed. In cities, repairs on infrastructure are often delayed and incremental changes are made rather than bold ones. So it is again in software-intensive systems. Users suffer the consequences of capricious complexity, delayed improvements, and insufficient incremental change; the developers who evolve such systems suffer the slings and arrows of never being able to write quality code because they are always trying to catch up.

Finally, this article argues that we aren’t making the leaps and bounds in computing we once did, in part because of technical debt.

A decade ago virtual reality pioneer Jaron Lanier noted the complexity of software seems to outpace improvements in hardware, giving us the sense that we’re running in place. Our computers, he argued, have become more complex and less reliable. We can see the truth of this everywhere: Networked systems provide massive capacities but introduce great vulnerabilities. Simple programs bloat with endless features. Things get worse, not better.

Anyone who’s built a career in IT understands this technical debt. Legacy systems persist for decades. Every major operating system — desktop and mobile — has bugs so persistent they seem more like permanent features than temporary mistakes. Yet we constantly build new things on top of these increasingly rickety scaffolds. We do more, so we crash more — our response to that has been to make crashes as nearly painless as possible. The hard lockups and BSODs of a few years ago have morphed into a momentary disappearance, as if nothing of real consequence has happened.

Worse still, we seem to regard every aspect of IT with a ridiculous and undeserved sense of permanence. We don’t want to throw away our old computers while they still work. We don’t want to abandon our old programs. Some of that is pure sentimentality — after all, why keep using something that’s slow and increasingly less useful? More of it reflects the investment of time and attention spent learning a sophisticated piece of software.

What are your thoughts? Is “good enough” actually good enough, or could we be doing more?