Our entire business at Stepsize is to empower the best software engineering teams in the world to measure, prioritise, and address technical debt objectively and reliably. Today, we’re sharing with you some of the best, tried and tested, scientific methods to do this. You too will be able to get quantitative data to back up your intuitions about technical debt in your company’s codebase.
High growth software companies invariably take on technical debt. It’s an effective tool for moving fast and deepening your understanding of customers and the problems they face, so that you can develop the right solution: your software.
However, this inevitably comes at a cost. Once your company reaches ‘product-market fit’, technical debt needs to be managed carefully if you want your users to keep buying and using your software, and your company to grow. As Paul Graham explained in one of his many famous essays: ‘Startup = growth’. If you don’t manage technical debt, you’ll go technically bankrupt, and won’t be able to grow anymore. In the words of Peter Drucker, ‘What gets measured, gets managed’.
Long story short: measure technical debt, so you can manage it, and allow your company to keep growing.
Simple, right? Except that measuring technical debt has never been easy. We’re cracking this nut at Stepsize, and want to share the findings of the best researchers in the field with you so that you too can put them into practice at your company.
To effectively measure technical debt, you need to measure 3 main metrics: ownership, cohesion, and churn. Let's look at how to do that.
We wrote about the fuzzy cultural attribute of ‘ownership’, and how it can be measured from Git data, in ‘How to stop wasting engineering time on technical debt’. Bottom line: ownership is a leading indicator of engineering health.
The parts of the codebase receiving contributions from many people accumulate cruft over time, while those receiving contributions from fewer people tend to be in a better state. It's easier to maintain high standards in a tight group that is well-informed about their part of the codebase.
This provides you with some predictive power: weakly owned parts of the codebase are likely to accumulate debt over time and become increasingly hard to work with. In particular, it's likely for debt to be unintentionally taken on, simply as a side-effect of incomplete information and diluted ownership of the code's quality.
You can compute code ownership by looking at the Git blame data of the current revision, the historical activity of each file, and blending the two numbers while applying a time discounting factor to favour recent activity.
You can do so at the level of teams or individual contributors, and aggregate code ownership scores over any subset of your codebase.
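The blending described above can be sketched as follows. This is a minimal illustration, not the Stepsize implementation: the function name `ownership_scores`, the exponential half-life, and the 50/50 blend weight are all assumptions, and the inputs are expected to have been extracted from Git beforehand (for example from blame and log data).

```python
from collections import defaultdict


def ownership_scores(blame_lines, commits, half_life_days=90.0, blame_weight=0.5):
    """Blend blame share with time-discounted activity share per author.

    blame_lines: {author: lines attributed to them in the current revision}
    commits:     iterable of (author, age_in_days) for historical commits
    half_life_days and blame_weight are illustrative assumptions.
    """
    total_lines = sum(blame_lines.values())
    blame_share = (
        {a: n / total_lines for a, n in blame_lines.items()} if total_lines else {}
    )

    activity = defaultdict(float)
    for author, age_days in commits:
        # Exponential decay: a commit half_life_days old counts half as much.
        activity[author] += 0.5 ** (age_days / half_life_days)
    total_activity = sum(activity.values())
    activity_share = (
        {a: w / total_activity for a, w in activity.items()} if total_activity else {}
    )

    # Weighted average of the two shares for every author seen in either source.
    authors = set(blame_share) | set(activity_share)
    return {
        a: blame_weight * blame_share.get(a, 0.0)
        + (1 - blame_weight) * activity_share.get(a, 0.0)
        for a in authors
    }
```

Calling `ownership_scores({"alice": 80, "bob": 20}, [("alice", 0), ("bob", 90)])` gives alice the larger score, because she holds most of the blamed lines and her commit is more recent. To score a team rather than an individual, one option is to sum the scores of its members.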
We’ve found that a good sweet spot for high growth software companies is to call contributors with more than 5% of ownership ‘major contributors’ and all other contributors ‘minor contributors’.
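As a rough sketch of that split, assuming ownership scores are expressed as fractions between 0 and 1 (the function name `classify_contributors` is hypothetical):

```python
def classify_contributors(scores, threshold=0.05):
    """Split contributors into major (> threshold) and minor (<= threshold).

    scores: {contributor: ownership fraction between 0 and 1}
    """
    major = sorted(a for a, s in scores.items() if s > threshold)
    minor = sorted(a for a, s in scores.items() if s <= threshold)
    return major, minor
```

A contributor with exactly 5% ownership is treated as minor here, since the rule above says "more than 5%".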
You can read the full paper here: Don’t Touch My Code! Examining the Effects of Ownership on Software Quality (Microsoft Research), but we thought you’d appreciate a summary.
Microsoft explored two hypotheses on the Windows Vista & Windows 7 codebases:
Both hypotheses held for both codebases.
While both minor contributor numbers and ownership levels had statistically significant impacts, the number of minor contributors had the largest impact.
Based on our experience implementing this with some of the best engineering teams out there, we recommend making strong ownership the default and asking yourself where exceptions should be made.
In certain cases, it may be that your product is changing so much that shared ownership should be preferred to optimise for change and flexibly handling uncertainty. This may apply to parts of your codebase relating to experimental features you are testing.
In most cases, however, where uncertainty is not high and requirements don't change regularly, strong ownership is likely to yield higher quality software with low amounts of technical debt taken on unintentionally. This would apply to the core parts of your codebase that power your most successful product that took you this far.
Identify the minor contributors who aren't part of the group who should own that code and try to minimise their contributions. Discuss this data with the team: maybe their contributions are a symptom of bad architecture; or maybe they're a symptom of a communication breakdown.
Having major contributors review changes made by minor contributors will help you maintain code quality. Minor contributors often aren’t knowledgeable about the code they’re modifying and are therefore more likely to make mistakes. A major contributor reviewing their code will allow you to catch these mistakes before they make it to production.
Confirm whether your codebase domains are owned by the right people/teams and that the strength of ownership is satisfactory. Identify weakly owned domains, define who should own them, and plan how to increase their ownership level going forward (find out how to do this in our article on creating an engineering culture of ownership).
For example, you might not expect your platform team to be making minor contributions to code relating to payments, invoicing, and billing.
Cohesion is a trailing indicator of well-defined components. It will help you assess whether your current code architecture makes sense, and what to do about it if it doesn’t.
Cohesion and its counterpart, coupling, have long been recognised as important concepts to focus on when designing software.
Code is said to have high cohesion when most of its elements belong together. High cohesion is generally preferable because it's associated with maintainability, reusability, and robustness. High cohesion and loose coupling tend to go hand in hand.
Beyond being associated with more reusable and maintainable code, high cohesion also minimises the number of people who need to be involved in modifying a given part of the codebase, which increases productivity.
Measuring cohesion, as originally defined, is extremely challenging in polyglot codebases with different programming paradigms.
Instead, you can measure the cohesion of developer activity in Git—i.e., whether modifications to components are isolated or accompanied by changes to other components. Isolated activity indicates high cohesion and loose coupling and vice versa.
A given commit is defined as cohesive relative to a given path if all its modifications are to files "inside" this path, otherwise it's a non-cohesive commit. Cohesion for the given path is then derived as the ratio of cohesive commits to the total number of relevant commits.
You can read the full paper here: On the Relationship between Program Evolution and Fault-Proneness: An Empirical Study — Fehmi Jaafar et al.
This study examined the relationship between program evolution and the distribution of defects.
They looked at object-oriented codebases and separated classes into two groups: those that evolved independently, and those that co-evolved with others. This is very similar to our cohesion of activity metric.
They found that co-evolved classes (i.e. those with low cohesion of activity) are linked to significantly higher numbers of defects than classes that evolved independently.
High cohesion and loose coupling should be a goal for components of any well-architected system, and we recommend tracking it for every component that makes it past prototyping stages.
The degree to which you try to optimise cohesion is up to you. Some instances of coupling might be tolerable while others are suspect and likely the source of defects and team inefficiencies. But just being aware of coupling will empower you to make better planning decisions.
Increasing cohesion indicates that modifications to your component are increasingly isolated from changes to other components. That’s a good thing, and something to shoot for.
For a given component, understand which other components are coupled to it, how strongly, and find all the coupled activity to try to diagnose the root cause. For example, you might find out that a specific part of your notifications stack is coupled to a component of the message queue it shouldn’t know about. This will guide your refactoring efforts.
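One way to start that diagnosis is to count which other components change in the same commits as the one you care about. The sketch below assumes a component is identified by the first two segments of a file path (such as `src/notifications`). That convention, and the names `component_of` and `coupled_components`, are illustrative only.

```python
from collections import Counter


def component_of(file_path, depth=2):
    """Map a file path to a component name using its first `depth` segments."""
    return "/".join(file_path.split("/")[:depth])


def coupled_components(commits, component, depth=2):
    """Count how often other components change together with `component`.

    commits: iterable of lists of file paths, one list per commit.
    Returns (other_component, co_change_count) pairs, most coupled first.
    """
    counts = Counter()
    for files in commits:
        touched = {component_of(f, depth) for f in files}
        if component in touched:
            # Every other component in this commit co-changed with ours.
            counts.update(touched - {component})
    return counts.most_common()
```

The components at the top of the returned list are the strongest coupling candidates, and the commits behind those counts are the coupled activity to read through when looking for a root cause.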
Churn (repeated activity) helps identify and rank areas ripe for refactoring in a growing system.
As systems grow, it becomes harder for your engineers to understand your architecture. If engineers have to modify many parts of your codebase to deliver a new feature, it will be difficult for them to avoid introducing side-effects leading to bugs, and they will be less productive because they need to familiarise themselves with more elements and concepts.
This is why it's important to strive for single responsibility to create a more stable system and avoid unintended consequences. While some files are architectural hubs and remain active as new features are added, it's a good idea to write code in a way that brings closure to files, and rigorously review, test, and QA churning areas.
Churn surfaces these active files so you can decide whether they should be broken down to reduce the surface area of change in your codebase.
A path that hasn't been modified in the past month is considered "stable".
A path that has been modified at least twice in the past month is considered "active".
An active path that has also been active for the previous n months is considered "recurrently active" (n is configurable).
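These three definitions can be sketched as a small classifier. It assumes you already have the number of modifications per month for a path, ordered oldest to newest. The definitions above don't say how to label a path modified exactly once in the past month, so the sketch returns "unclassified" for that case, which is an assumption, as is the function name `classify_path`.

```python
def classify_path(monthly_changes, n=3):
    """Classify a path from its monthly change counts (most recent month last).

    Expects at least one month of data. n is the number of previous months
    that must also be active for the path to be "recurrently active".
    """
    current = monthly_changes[-1]
    if current == 0:
        return "stable"
    if current < 2:
        # Exactly one change: not covered by the definitions (assumption).
        return "unclassified"
    # The n months immediately before the current one.
    previous = monthly_changes[-(n + 1):-1]
    if len(previous) == n and all(count >= 2 for count in previous):
        return "recurrently active"
    return "active"
```

With `n=3`, a path with counts `[2, 3, 2, 4]` is recurrently active, while `[0, 3, 2, 4]` is only active because one of the three previous months had fewer than two changes.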
Your recurrently active files are likely to be responsible for a disproportionate share of the bugs in your system.
You can read the whole paper here: Active files as a measure of software maintainability — Microsoft Research. Or check out the summary article.
Microsoft studied six of its large software systems (ranging from products to services) and found that, while active files only make up 2-8% of each system, they are responsible for 60-90% of all defects.
This provides a very clear direction for QA, testing, and refactoring.
In a growing system, churn will help you identify the pieces of technical debt that are most pressing to pay off to preserve the team's productivity and avoid unintended consequences.
In a maturing or mature system, churn will help you identify the components to break down to minimise the surface area of change and optimise for stability.
In a legacy system, churn will help you identify the components that remain active so you can plan how to phase them out.
Examine the top churning files for your whole system as well as individual components to determine whether the activity is expected and desirable or whether it's a symptom of debt that should be paid back to preserve productivity and avoid unintended consequences.
Ensure all changes to churning files are reviewed since they are so strongly linked to defects.
If you're dealing with a growing system, look to keep this metric at an appropriate level for your various components to ensure a sane surface area of change. If you're dealing with a mature or legacy system, look to minimise this metric at the system level to trend towards stability.
Whether you wish to write your own code to calculate these metrics or use the Stepsize VSCode and JetBrains extensions, these 3 metrics will help back up your team’s intuition about where the most pressing tech debt lies and what to do about it.
Start prioritising tech debt today.