I was in the middle of my quest to interview as many top software engineering teams as was necessary to nail the perfect process to manage tech debt.
Needless to say, we blew right past the time we'd allocated for our introductory call as we were riffing about technical debt and how big a role it plays in the daily decisions of Agile software development teams.
Janna and the ProdPad team were incredibly kind to organise a webinar for me to talk about what I'd learned from the 200 top engineering teams I'd interviewed about tech debt.
In this talk, you will learn:
2:38 — What is tech debt?
3:31 — Why is tech debt a thing?
5:09 — Martin Fowler's Technical Debt Quadrant.
6:22 — Tech debt myths to debunk.
8:12 — Why bother managing technical debt properly?
10:20 — Creating your tech debt management strategy.
To get you started with this, you can take our 1-minute 'tech debt credit score' test to find out how much debt you can take on and why.
15:42 — The one cultural characteristic for a healthy codebase.
18:41 — How to create & think about your tech debt budget.
21:40 — How to deal with 'small' debt.
22:40 — How to deal with 'medium-sized' debt.
27:01 — How to deal with 'large' debt.
29:12 — High level takeaways.
30:02 — A SaaS product to help you manage tech debt.
30:53 — Q&A.
Over the last 6 months, I've interviewed over 200 top engineers in all positions and types of companies as part of the customer development work we've been doing for our latest product.
I've been working on products to help engineering teams ship better software faster for over 4 years now and raised millions of pounds to finance that.
We're lucky to work with and learn from some of the best software companies out there like Snyk or City Pantry, and I want to share some of these lessons with you today.
We'll talk about:
And then talk through the tactics, processes, and methods the best engineering teams use to manage tech debt, which includes:
I assume you all know the metaphor based on financial debt. There are many definitions of tech debt, most of which work well; this is my attempt at simplifying them.
'Technical debt is code written yesterday that's a burden today.'
It's all that extra unnecessary work you need to do to get your software out the door.
I love this comic by MonkeyUser which does a great job at illustrating it.
The team is digging that tunnel so fast! But forgot to get rid of all the rubble they dug out... A bug is causing water to leak into their tunnel. They're stuck in there because of technical debt, won't be able to do anything about the bug, and might even die in there. That's technical debt and what it can do to a business.
But let's talk about why tech debt is even a thing? Why can we never seem to avoid it, no matter what we try? Because
'software exists in a world of uncertainty'.
This quote is by Martin Fowler who, in my opinion, wrote the best blog posts about technical debt. I encourage you to read them.
Essentially, tech debt exists because the code that we write to solve a problem is based on our current understanding of that problem.
It sounds obvious, but let's unpack this a little. Even if the perfect engineers found the perfect solution to a problem and coded it perfectly, their understanding of the problem will evolve—and quickly.
But left unattended, their code won't.
This means that our code will soon no longer be appropriate. It happens all the time, and much faster than you might think, especially in a high-growth environment.
Something else to consider is that:
It's often the case that it can take a year of programming on a project before you understand what the best design approach should have been.
Another quote by Martin Fowler from his piece on the Technical debt quadrant.
However, I've learnt that the best in the business do know how to handle such high uncertainty. They use the right tools and processes, continuously refactor code that has accumulated too much cruft, and they won't accept messy code as technical debt.
I want to unpack this last bit a little bit too so that we're all on the same page. You may have seen this quadrant before:
When I say the best engineers in the business won't accept messy code as technical debt, I mean they aim for the top right corner of the quadrant—prudent and deliberate debt.
Technical debt is not an inherently evil and bad thing. Just like financial debt, it's a tool that we can use to gain leverage and test ideas faster. Just like financial debt, if we take it on without being prudent, deliberate, and managing it carefully, it will screw us over and we will go technically bankrupt.
We should never get sloppy and accept any kind of incompetence as acceptable technical debt. We should be aware of current best practices, and use them. We should write clean, readable code. And we should know that code, left unattended, will turn into technical debt.
The best engineering teams in the business have ways to handle the uncertainty inherent to building software and end up in the top right quadrant.
Now, let's debunk a few myths to summarise the key ideas here.
There's this myth that tech debt is bad and no one should ever take any on.
Wrong. If you're sending people to Mars, sure. If you're building software where the cost of failure isn't high, you can use tech debt as a way to gain extra leverage, just like financial debt. Just as we discussed, taking on tech debt prudently and deliberately is fine and if you have 0 tech debt, you should truly ask yourself why.
Also, it's not realistic to have 0 TD because of entropy in the codebase. That's how Ron Paridges, VP of Engineering at Carta, told me that he looks at technical debt: as entropy in the codebase. It never ends, and is a constant struggle. So adjust your expectations accordingly.
The next myth is about how tech debt is only engineering's problem.
Nope. We'll talk about how TD impacts the whole company, and about how company culture, particularly how PMs and engineers work together and how much leadership understands about TD, shapes a company's approach to managing it.
And finally, a lot of people wrongly believe that managing tech debt properly will slow us down.
Super duper wrong. If you're managing tech debt successfully, you'll see your number of bugs go down and velocity go up. I'll show you how it's done.
At this stage, you might be asking yourself 'but why even bother managing tech debt?'. I'll give you the macro numbers first.
By 2024, global technical debt that has not been remediated will double, totalling $4 trillion.
These numbers are so big that they're meaningless so I'll give you one that hits closer to home.
Through 2023, I&O leaders that actively manage and reduce technical debt will achieve at least 50% faster service delivery times to the business.
This is a long quote but it basically says that companies who have a strategy for TD will ship 50% faster.
And if you think Gartner don't know what they're talking about, consider this datapoint that Stripe, arguably the best software company out there, uncovered in their research:
Engineers spend ~33% of their time dealing with technical debt which crushes team morale and costs companies ~$85Bn/year.
How the hell can they say that? Let's unpack it.
Technical debt slows the entire engineering team down within days or weeks and has repercussions across the entire business. Check this out:
In software companies, too much tech debt means you'll get too many bugs, loads of performance issues, and too much downtime. That creates more work for QA and for the SRE team, and results in broken SLAs.
All that stuff tallies up to more customer complaints, which means more work for support, customer success, and account management. And it all adds up to unhappy customers.
I've heard some version of this many times: 'we'd be shipping twice as fast today if we'd handled TD carefully in the past'. I'm sure you've all seen this happen too: a feature you thought would be simple and take a sprint ends up taking the month. Now imagine this at a global scale.
So now hopefully you see why it's imperative to manage tech debt carefully.
To get you started on that journey, let's first figure out how much tech debt you can afford to take on as a company.
Unfortunately, no one's ever managed to come up with mathematical formulas to answer this question, but I'll give you a few pointers so you can begin to find an answer for yourself.
Is the software you're building critical to your business?
I assume most of us here work at software product companies so the answer is 'yes', but if you're building some system without which the company could function pretty well, you're unlikely to get the budget you need to pay back tech debt, so bear that in mind before you take it on.
Conversely, companies who live and die by their software product will manage tech debt very carefully.
Is the cost of failure in your software high or low?
If very high, you can't afford to take on tech debt. By very high, I mean life or death.
If low, you can afford to take on prudent and deliberate tech debt to gain extra leverage. By low, I mean users might find a few bugs but no one will get harmed or sued.
Ask yourself: how much does your team sweat when something breaks? Are you in a highly regulated industry? Highly regulated industries often have higher costs of failure.
What is your competitive advantage and business objective?
Are you in a highly competitive market where all software is beautiful and flawless and your UI/UX is your competitive advantage?
Do you win because of how fast you churn out features?
Are reliability and security the be-all and end-all?
Manage tech debt in a way that plays into that. It's no accident that one of Facebook's cultural tenets was 'move fast and break things'. Speed is paramount to how they win.
Are you pre or post P/M-fit?
If pre-P/M fit, knock yourself out with prudent and deliberate tech debt. A very small subset of what you end up building will truly matter, which means you can throw lots of code away and write off debt. Get it in front of users, learn, toss away what didn't work, and rewrite what did.
If post P/M-fit, you can't afford to throw as much code away and that's OK because you probably have more certainty about most of what you're shipping. You'll have to maintain the code you ship, so make sure you'll be happy doing so.
Can one human reasonably be expected to understand your entire system?
If yes, tech debt will be easier to manage, so feel free to take on more than you usually would.
If no, be careful. If you introduce tech debt in a part of your codebase where the bus factor is 1 and that engineer leaves the organisation, you're toast. Like we'll see later on, it's important that you document your debt to avoid this problem.
How many engineers work at the company?
Remember, tech debt is inevitable because of entropy. If you have lots of engineers shipping code and all introducing tech debt, you'll accumulate it much faster than if you had a small team. Small pieces of debt don't seem like much, but, in that context, they add up.
On top of that, engineers are expensive. If tech debt slows them down or creates more work for them, the company is racking up a serious bill. Not just in engineering cost, but also in opportunity cost: how much more revenue would the company generate if it had launched its key product a month earlier because it didn't have so much TD? This stuff compounds.
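To put a rough number on that bill, here is a minimal back-of-the-envelope sketch. The ~33% default comes from the Stripe figure quoted earlier; the team size and the cost per engineer are purely hypothetical placeholders, and the function name is my own, so swap in your own numbers.

```python
def annual_tech_debt_cost(
    engineers: int,
    cost_per_engineer: float,
    debt_fraction: float = 0.33,  # share of time spent on tech debt (Stripe's ~33%)
) -> float:
    """Estimate the yearly engineering spend absorbed by tech debt."""
    if engineers < 0 or cost_per_engineer < 0:
        raise ValueError("engineers and cost_per_engineer must be non-negative")
    if not 0 <= debt_fraction <= 1:
        raise ValueError("debt_fraction must be between 0 and 1")
    return engineers * cost_per_engineer * debt_fraction


# Hypothetical team: 20 engineers at a fully loaded cost of 100,000 each per year.
example_cost = annual_tech_debt_cost(20, 100_000)
print(f"Estimated yearly cost of tech debt: {example_cost:,.0f}")
```

This only covers direct engineering cost. The opportunity cost of shipping later is harder to model and is left out on purpose.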
Does company leadership understand tech debt?
Will they let you pay down the tech debt you're just about to take on?
If yes, you're safe to take on some debt. If no, ask yourself why. There may be a problem to solve here, and I'll tell you how when we talk about tech debt processes.
For each question, write down if the answer implies that you can afford to take on high levels of tech debt or low levels of tech debt. Then, you'll have a good idea of how to devise your tech debt management strategy.
Also note that you can adopt different tech debt management strategies on different parts of your codebase. For example, you might come to the conclusion that you can take on TD on your FE code because UX/UI bugs are tolerable, but that you can't afford to take on much debt on your BE systems because data security and resilience are key to your customers.
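The tally described above can be sketched in a few lines. There is no agreed formula for this, so the simple majority rule below is my own assumption, as are the question keys and the example answers. Each answer records whether that question suggests you can afford 'high' or 'low' levels of debt, and the tally is kept per part of the codebase.

```python
from collections import Counter


def debt_appetite(answers: dict[str, str]) -> str:
    """Tally 'high'/'low' answers and return 'high', 'low', or 'mixed'.

    Simple majority vote: an assumption, not an established formula.
    """
    counts = Counter(answers.values())
    unknown = set(counts) - {"high", "low"}
    if unknown:
        raise ValueError(f"answers must be 'high' or 'low', got: {sorted(unknown)}")
    if counts["high"] > counts["low"]:
        return "high"
    if counts["low"] > counts["high"]:
        return "low"
    return "mixed"


# Hypothetical answers for two parts of a codebase.
answers_by_area = {
    "frontend": {
        "cost_of_failure": "high",      # UI bugs are tolerable
        "one_person_can_grasp_it": "high",
        "leadership_understands_debt": "low",
    },
    "backend": {
        "cost_of_failure": "low",       # data security and resilience are key
        "one_person_can_grasp_it": "low",
        "leadership_understands_debt": "low",
    },
}

appetite_by_area = {area: debt_appetite(a) for area, a in answers_by_area.items()}
print(appetite_by_area)
```

A majority vote treats every question as equally important. If one answer is a hard constraint for you, such as a life-or-death cost of failure, you would want it to override the tally.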
Next, it is crucial to understand that technical debt isn't just technical. It's about people. It's deeply influenced by your company culture.
To give you an example, if your engineers are never recognised for paying down tech debt and it doesn't advance their career, do you think they're likely to volunteer to address debt? Or, if engineers get reprimanded at the slightest hiccup in the software by people who don't understand that tech debt can be used for extra leverage, do you think they'll take on any debt? Clearly, no, they won't.
As we all know, company culture is a huge topic, but I've gone deep into it and I'm about to share with you the one cultural characteristic you should focus on if you want a healthy codebase.
Ownership is a leading indicator of engineering health.
Gareth Visagie, Chief Architect at Snyk
Ownership. And I'm not just talking about a hazy concept here: Microsoft did some great research that we can use to quantify it.
If you analyse your git commit activity to see what percentage of modifications to each file in your codebase were made by the main author of the code, you'll see that the files with bugs are the files where the main contributor made less than 60% of the edits. In other words, code ownership is a leading indicator of codebase health. You can use it to predict where things will break and reverse the trend before they do.
I want to add a bit of nuance here so that we can draw the right conclusions. This research does not suggest that each file in your codebase should be owned by one and only one person and that they're the only person who can work on it. That would put your bus factor in the risky zone.
Ownership is a spectrum. At one end is orphaned code, which doesn't have a clear main contributor, so no one is implicitly responsible for its maintenance. This is a bad spot to be in. At the other end is absolute ownership, where only one person can modify the code in question. For each file in your codebase, you want to be in the collaborative ownership zone where the main contributor made more than 50% of all edits, but not all of them.
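Here is a minimal sketch of that measurement, not the method used in the Microsoft research. It assumes you have already extracted (file, author) pairs from your history, for example from the output of `git log --name-only`, and it counts each commit that touches a file as one edit. The function names, the zone labels, and the sample data are my own, and the 50% threshold is a parameter you can change.

```python
from collections import Counter, defaultdict
from typing import Iterable


def ownership_by_file(edits: Iterable[tuple[str, str]]) -> dict[str, tuple[str, float]]:
    """Map each file to (main contributor, share of edits made by them)."""
    per_file: dict[str, Counter] = defaultdict(Counter)
    for path, author in edits:
        per_file[path][author] += 1
    result = {}
    for path, authors in per_file.items():
        main_author, count = authors.most_common(1)[0]
        result[path] = (main_author, count / sum(authors.values()))
    return result


def ownership_zone(share: float, threshold: float = 0.5) -> str:
    """Place a file on the ownership spectrum (labels are illustrative)."""
    if share == 1.0:
        return "absolute"       # a single person made every edit
    if share > threshold:
        return "collaborative"  # clear main contributor, plus others
    return "weak"               # no clear owner, drifting towards orphaned


# Hypothetical edit history: one (file, author) pair per commit touching the file.
edits = [
    ("billing.py", "ana"), ("billing.py", "ana"), ("billing.py", "ana"), ("billing.py", "ben"),
    ("auth.py", "ana"), ("auth.py", "ben"), ("auth.py", "cy"),
    ("legacy.py", "dee"), ("legacy.py", "dee"),
]

for path, (author, share) in ownership_by_file(edits).items():
    print(f"{path}: main contributor {author}, {share:.0%}, {ownership_zone(share)}")
```

Files that land in the weak zone are the ones to look at first. Files in the absolute zone are worth a look too, since that is where your bus factor is 1.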
We won't get into it today because we don't have enough time, but think hard about how you can foster a culture of collaborative ownership in your engineering team. It's the best way to maintain a healthy codebase.
So, you now have a rough idea for your company's 'tech credit score' and the cultural drivers behind tech debt. Now let's talk about tech debt budgets.
You might've heard about this before: it's the idea that you should allocate a fixed proportion of your sprint capacity to paying back technical debt, say 10% to 20% of your time.
How the hell do you come up with the appropriate number?
Well, it should change every sprint and it turns out it's not that important to explicitly pick the right number. But I'll tell you how you can think about this.
At Stepsize, we like to think of tech debt budgets like SRE teams think about their site reliability goals.
Site Reliability is responsible for keeping software products up and running but interestingly, companies like Google don't aim for 100% uptime. That's because 99.99% uptime is enough for Google products to appear supremely reliable to real-world humans. That last 0.01% is exponentially more difficult to reach and it simply isn't worth fighting for.
Consequently, if this allows them 52 minutes of downtime per year, Google will want to get as close to that as possible. Anything less than 52 minutes of downtime is a missed opportunity for taking extra risks and delivering more ambitious features for their customers faster.
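The 52-minute figure follows directly from the availability target. Here is a small sketch of the arithmetic, assuming a 365-day year. The function names and the 30 minutes of downtime in the example are hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_budget_minutes(availability_target: float) -> float:
    """Minutes of downtime per year allowed by an availability target such as 0.9999."""
    if not 0 <= availability_target <= 1:
        raise ValueError("availability_target must be between 0 and 1")
    return MINUTES_PER_YEAR * (1 - availability_target)


def remaining_budget_minutes(availability_target: float, downtime_so_far: float) -> float:
    """Budget left this year; a negative value means the target has been missed."""
    return downtime_budget_minutes(availability_target) - downtime_so_far


budget = downtime_budget_minutes(0.9999)          # roughly 52.56 minutes per year
remaining = remaining_budget_minutes(0.9999, 30)  # hypothetical: 30 minutes already used
print(f"Yearly budget: {budget:.2f} min, remaining: {remaining:.2f} min")
```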
Think of your tech debt budget like your site reliability budget. Provided it's prudent technical debt you're taking on deliberately—and you remain below the maximum amount of tech debt you can tolerate before affecting your customers and business—you should feel free to take more risks, even if you increase the amount of tech debt, because that's how you'll beat your competitors.
This pseudo-graph summarises the idea. You want to hover around the maximum amount of tech debt you'll tolerate, and your tech debt budget can be in the red—you need to pay some back—or in the green—you can afford to take more on.
A simple way to define your tech debt budget is to identify the intersection of things you know you'll work on using your product roadmap, and the parts of your codebase that have tech debt. Then, you pay off the debt in that intersection, but not outside of it. Scope out the work, and you'll have your tech debt budget for your sprint, quarter, or year if your roadmap stretches that far into the future.
The key idea here is that you don't need to address all your tech debt right now in one go. Address the debt that's in the way of your key goals for the quarter or whatever period you select.
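As a rough sketch, that intersection is just a filter: keep the debt that sits in parts of the codebase your roadmap will touch, and add up the estimates. The area names and the day estimates below are hypothetical, and the function name is my own.

```python
def debt_budget(
    roadmap_areas: set[str],
    debt_estimates: dict[str, float],
) -> tuple[dict[str, float], float]:
    """Return the debt items that overlap the roadmap, plus their total estimate in days."""
    in_scope = {
        area: days for area, days in debt_estimates.items() if area in roadmap_areas
    }
    return in_scope, sum(in_scope.values())


# Hypothetical inputs: areas the roadmap touches this quarter, and known debt per area (days).
roadmap_areas = {"checkout", "search"}
debt_estimates = {"checkout": 3.0, "reporting": 8.0, "search": 1.5}

in_scope, budget_days = debt_budget(roadmap_areas, debt_estimates)
print(in_scope, budget_days)
```

The debt in `reporting` is real, but it stays off the budget because nothing on the roadmap depends on it this period.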
Now let's get practical and talk about how you can incorporate tech debt management into your day to day Agile development process.
The first question that should be asked of any tech debt is: is this a small, medium, or large piece of debt?
Small pieces of debt
Small debt is the tech debt that can be addressed then and there when the engineer spots it in the code and it is understood that it is part of the scope of the ticket they're working on. It could be refactoring a function or two or renaming some variables. The best way to think about it is to follow the Boy Scout rule: leave the code cleaner than you found it.
Small jobs like these don't require any kind of planning, and each engineer should feel empowered to fix this kind of debt without anyone's approval. You see, ownership is key again.
Medium-sized pieces of debt
Medium-sized debt is the tech debt that can be addressed within a sprint. It should go through the same sprint planning process as any feature work and be considered just as rigorously. That's where most engineering teams fail.
I spoke about this with James Rosen, Engineering Manager at Everlane, and he told me this:
Consider how much time PMs spend curating the set of features to work on. Now compare this to the amount of time and effort engineers dedicate to making the business case for tech debt. Is it that surprising that close to 0 engineering capacity gets allocated to tech debt?
Businesses rightly prioritise work that delivers value to the customer. And, at first glance, getting rid of tech debt won't do that. But, as we discussed earlier, tech debt does hinder your capacity to deliver value to the customer in many ways!
Companies need to identify key pieces of tech debt that get in the way of key goals, cost countless engineering hours in productivity losses, or are the root cause for bugs and other issues that impact the customer experience. But most companies remain blissfully unaware of that and bear the enormous costs of tech debt without even realising it.
In order to fix that, engineering teams need to carefully document their tech debt and the problems it's causing so as to quantify its cost to the business. Only then will they be able to make a proper business case for any given piece of debt, and prioritise things properly.
This is where tooling has failed us so far. As Jake, a Lead Software Engineer at Unqork, told me:
Jira is 'a great place to manage projects, but a terrible place to track and monitor tech debt'.
Just take a look at your 'tech debt' epic where the 7000 tickets your engineers diligently logged went to die, until everyone gave up on the idea. And code quality tools are helpful at surfacing one facet of tech debt, but won't catch most other types of tech debt.
Fortunately, that's exactly what our new product is for. Engineering teams have limited time to deal with tech debt and need to make it count. Stepsize helps them capture and track tech debt from their workflow so they can quantify its cost to the business and ultimately prioritise the most important tech debt.
A good way to make sure engineers pick up these new habits and that the process keeps running smoothly is for each engineering team to run a tech debt retro every couple of weeks where they review new tech debt that's been reported, document any missing pieces, and decide which tech debt is most important to bring up during the next sprint planning session. Remember, you want to create a culture of ownership, so make sure the team lead understands that they own this part of the process and that each engineer is responsible for reporting, documenting, and costing tech debt.
During sprint planning, for each feature, ask: is there an opportunity to address tech debt as part of this feature work? Or, which tech debt could we address to make delivery of this feature smoother and faster? Then actually scope out the work, create a ticket, and add it to your sprint.
You need to have conversations about tech debt in your usual sprint ceremonies if you want a chance at creating the right culture and managing it properly.
Large pieces of debt
Large debt is the tech debt that cannot be addressed right then and there or even in one sprint. The best companies I've interviewed have quarterly technical planning sessions in which all engineering leaders participate, even sometimes Product. Engineering managers are tasked with bringing up large pieces of tech debt that their team leads have reported to them, and to make the business case for the ones they judge to be the most important.
The business case includes explicitly stating:
Again, this process, which sounds laborious, becomes very easy for Stepsize users. Their individual contributors have been reporting debt from the frontlines regularly. This data is consistently reviewed and groomed by each team and their leaders, who relay large pieces of debt to their engineering managers, along with the data necessary to understand what they've cost the business. The EMs can then use their understanding of the company's broader priorities and vision to prioritise large pieces of debt accordingly.
Once engineering leadership has approved a large piece of debt, it can be scheduled onto the roadmap just like feature work would be. To make sure everything is going according to plan, engineering leadership can monitor progress on each tech debt project directly in Stepsize.
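To recap the triage from this section, the small, medium, or large question can be sketched as a tiny decision function. The names are my own and the 10 working days per sprint is an assumed default, so adjust it to your own sprint length.

```python
from enum import Enum


class DebtSize(Enum):
    SMALL = "fix it now, as part of the current ticket"
    MEDIUM = "take it through sprint planning"
    LARGE = "make the business case at quarterly technical planning"


def triage(estimate_days: float, fixable_in_current_ticket: bool, sprint_days: float = 10.0) -> DebtSize:
    """Classify a piece of debt by where it should be handled."""
    if estimate_days < 0 or sprint_days <= 0:
        raise ValueError("estimate_days must be >= 0 and sprint_days must be > 0")
    if fixable_in_current_ticket:
        return DebtSize.SMALL   # no planning or approval needed
    if estimate_days <= sprint_days:
        return DebtSize.MEDIUM  # fits within a single sprint
    return DebtSize.LARGE       # spans more than one sprint


# Hypothetical pieces of debt.
for name, days, in_ticket in [
    ("rename confusing variables", 0.2, True),
    ("replace hand-rolled retry logic", 4.0, False),
    ("split the monolith's billing module", 45.0, False),
]:
    size = triage(days, in_ticket)
    print(f"{name}: {size.name}, {size.value}")
```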
Let's go over the takeaways from this session. If you leave with anything, I want you to remember that tech debt is inevitable. It's not that anyone's doing a bad job; it's due to entropy in the codebase, which is a law of the universe.
But importantly, you can use tech debt to gain extra leverage and beat your competitors.
However, if you don't manage tech debt carefully, it'll come back to bite you.
And the best way to manage tech debt properly is to create an engineering culture of ownership, include tech debt in your agile processes, and use Stepsize.