Thoughts on Tech Debt

May 16, 2024 Draft

What is tech debt?

My sense is that it can be any decision made in a system that makes it harder than necessary to evolve the system. This can be intentional. Time to market and all that. It can be accidental. It can be completely innocent - decisions that made sense yesterday may not make sense today due to changing circumstances.

There are always small imperfections scattered throughout a codebase that could be called tech debt. But it is more useful to focus on larger issues that seriously impede new work.

Some examples:

  • Lots of repeated logic that needs to be updated.
  • Partial refactorings.
  • Coding standards have evolved but evidence of the older ones remains.
  • Simple changes take weeks instead of days due to earlier assumptions.
  • A 3rd party library hits you with a nasty breaking update.
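To make the first example concrete, here is a tiny sketch of repeated logic. Everything in it is hypothetical: the function names, the 8% tax rate, and the choice to work in integer cents are assumptions for illustration only.

```python
# Hypothetical example: the same tax rule pasted into two places.
def invoice_total_cents(subtotal_cents: int) -> int:
    return subtotal_cents + subtotal_cents * 8 // 100  # 8% tax, hard-coded


def cart_preview_total_cents(subtotal_cents: int) -> int:
    return subtotal_cents + subtotal_cents * 8 // 100  # same rule, second copy


# One possible cleanup: a single shared definition that both callers use.
TAX_PERCENT = 8  # assumed rate, for illustration only


def total_with_tax_cents(subtotal_cents: int) -> int:
    return subtotal_cents + subtotal_cents * TAX_PERCENT // 100


if __name__ == "__main__":
    print(invoice_total_cents(10_000), total_with_tax_cents(10_000))
```

With the two copies, a rate change means finding and updating every paste. With the shared version, it is one line.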

How does it happen?

We can blame the nature of evolving systems. Every system is intended to solve one or more problems. Over time we will both evolve our understanding of those problems and add new problems to solve.

So our system began as a naive solution to problem ‘A’. We now understand that problem better so we want to build a more nuanced solution to ‘A’ while also adding a solution for problem ‘B’. Depending on the specifics, that could be easy or a complete nightmare.

What is it about software specifically?

It seems like this problem is larger in software than in other disciplines. Why don’t software folks just git good?

My theory is that there is no other discipline so amenable to change as software. The materials are free! If you can think it you can build it! Other fields generally don’t expect to be able to continually modify the design of the product. At some point you lock it in and deliver. But in software you can keep making changes, so you will. And this is often great!

But the main challenge of software is managing complexity. And every feature represents some amount of complexity. First, in the obvious sense that one feature + another feature = 2 features' worth of complexity.

But it’s more than that. A system will be designed to support solving an initial set of problems. This will lead to constraints that make certain changes hard. The next problem to solve may hit against those constraints. This means that solving the next problem in the context of your system will be EVEN HARDER than doing it from a blank slate. It can be incredibly challenging (and rewarding) at times just to solve a single isolated problem. But now you have to fight against the system you just built! This is the hidden extra complexity.
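Here is a back-of-the-envelope sketch of that hidden complexity. It assumes, as a simplification of my own, that any two features might interact with each other. Real systems won't be this uniform, and the function name is made up for illustration.

```python
def pairwise_interactions(feature_count: int) -> int:
    """Count the distinct pairs of features that could interact.

    Assumes every pair is a potential interaction: n * (n - 1) / 2.
    """
    return feature_count * (feature_count - 1) // 2


if __name__ == "__main__":
    for n in (2, 5, 10, 20):
        print(f"{n} features -> {pairwise_interactions(n)} possible pairs")
```

Under that assumption, going from 10 features to 20 doubles the feature count but roughly quadruples the pairs that could surprise you (45 to 190).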

So in my opinion it’s not that engineers are doing a poor job. It’s that on the surface, making changes to software appears incredibly easy. And doing so quickly is possible in the short term. But it often comes at incredible cost to the system’s maintainability in the long term.

What is the view from outside?

As engineering rolls out features, folks expect the system to remain in good health. This makes sense when compared to other things in life. In the real world, the connection between physical things (or lack thereof) is more obvious. When you get your roof replaced you don’t expect your microwave to explode. They’re clearly unrelated! But the connection between software features is often filled with subtlety and surprise.

Unless you’re up past your ankles in code, your view of the system will be the output: the running system. And each new feature may truly be great and meet all current requirements! But without active effort the system will not remain in great condition.

There can be indirect clues of what is happening. New features may start taking longer and longer, and with more and more bugs. And the engineers may have trouble articulating why.

So we have a quietly deteriorating system. And a business does not want to hear that new features are blocked until we spend considerable time cleaning up the system. And they will never automatically have that expectation because it requires intimate knowledge of how the system is built to understand what is wrong. A healthy business will be open to some amount of this kind of feedback.

What to do about it?

There are two general approaches. One is to pad estimates by default and use the padding for smaller cleanups. Ideally this would be normalized and expected behavior.

The other is to spend effort to expose the need for a larger cleanup effort. But for this to succeed you don’t just need to be right about the problem. Engineers must also be able to sell it or it won’t be greenlit.

Again, the business’s default expectation is that the system is always fine, and features can simply be added one after the other forever. When that becomes false, eng must make the case for fixing it.

Going too far

There is a pathology among engineers where it begins to feel like “everything is tech debt” and “everything must be rewritten (in Rust?)”. No code feels more ‘homey’ than code you wrote or reviewed recently. Engineers must be on guard for that attitude.

I think when tech debt gets bad enough, and we are forced to deal with it, we can develop a temporary sensitivity to it that can lead to the above. So in a sense, keeping it at a healthy level can reduce the urge to “rewrite everything”.