Over the years, I can remember working with exactly one organization that used my idea of an excellent approach to software engineering metrics. Their approach was based on several points:
- They treated metrics as first-order approximations, and recognized that they were fundamentally limited and fallible.
- They used the metrics for estimating, rather than for predicting. When their estimations didn’t work out, they didn’t use the discrepancy to punish people. They used it to try to understand what they hadn’t understood about the task in the first place.
- They used inquiry metrics, rather than control metrics. That is, they used the metrics to prompt questions about their assumptions, rather than to provide answers or drive their work.
- They used a large number of observational modes to manage their business and to evaluate (and solve) their problems. Most importantly, the managers observed people and what they did, rather than watching printed reports. They used close personal supervision, collaboration, and conversation as their primary approach to learning about what was happening on the project. They watched the game, rather than the box scores.
- They didn’t collect any metrics on things that weren’t interesting and important to them.
- They didn’t waste time collecting or managing the metrics.
- They had no interest in making the metrics look good. They were interested in optimizing the quality of the work, not in the appearance afforded by the metrics.
- They took a social sciences approach to measurement, as Cem Kaner describes here (in particular on page 3 of the slides). Rather than assuming that metrics gave them complete and accurate answers, they assumed that the metrics were giving them partial answers that might be useful.
In summary, they viewed metrics in the same kind of way as excellent testers view testing: with skepticism (that is, not rejecting belief but rejecting certainty), with open-mindedness, and with awareness of the capacity to be fooled. Their metrics were (are) heuristics, which they used in combination with dozens of other heuristics to help in observing and managing their projects.
The software development and testing business seems to have a very poor understanding of measurement theory and metrics-related pitfalls, so conversations about metrics are often frustrating for me. People assume that I don’t like measurement of any kind. Not true; the issue is that I don’t like bogus measurement, and there’s an overwhelming amount of it out there.
So, to move the conversation along, I’ll suggest that anyone who wants to have a reasonable discussion with me on metrics should read and reflect deeply upon
Software Engineering Metrics: What Do They Measure and How Do We Know? (Kaner and Bond)
and then explain how their metrics don’t run afoul of the problems very clearly identified in the paper. It’s not a long paper. It’s written by academics but, mirabile dictu, it’s as clear and readable as a newspaper article (for example, it doesn’t use pompous Latin expressions like mirabile dictu).
Here are some more important references:
- The Dark Side of Software Metrics (.pdf, Hoffman)
- Meaningful Metrics (.pdf, Allison)
- How to Lie With Statistics (book, Huff)
- Measuring and Managing Performance in Organizations (book, Austin)
- Quality Software Management, Vol. 2: First-Order Measurement (book, Weinberg)
- Why Does Software Cost So Much? (book, DeMarco)
Show me metrics that have been thoughtfully conceived, reliably obtained, carefully and critically reviewed, and that avoid the problems identified in these works, and I’ll buy into the metrics. Otherwise I’ll point out the risks, or recommend that they be trashed. As James Bach says, “Helping to mislead our clients is not a service that we offer.”
Update: I’ve just noticed that this blog post doesn’t refer to my own Better Software columns on metrics, which were published later in 2009.
Three Kinds of Measurement (And Two Ways to Use Them)
Better Software, Vol. 11, No. 5, July 2009
How do we know what’s going on? We measure. Are software development and testing sciences, subject to the same kind of quantitative measurement that we use in physics? If not, what kinds of measurements should we use? How could we think more usefully about measurement to get maximum value with a minimum of fuss? One thing is for sure: we waste time and effort when we try to obtain six-decimal-place answers to whole-number questions. Unquantifiable doesn’t mean unmeasurable. We measure constantly WITHOUT resorting to numbers. Goldilocks did it.
Issues About Metrics About Bugs
Better Software, Vol. 11, No. 4, May 2009
Managers often use metrics to help make decisions about the state of the product or the quality of the work done by the test group. Yet measurements derived from bug counts can be highly misleading because a “bug” isn’t a tangible, countable thing; it’s a label for some aspect of some relationship between some person and some product, and it’s influenced by when and how we count… and by who is doing the counting.
Thanks for a wonderful, thoughtful look at metrics. I wish more people in our industry had such great understanding.
While I agree with the main position here, I disagree about one perspective. There are no bogus measurements, only bogus interpretations of those measurements. We humans have a tendency to force meaning onto everything, including measurements, even where it does not exist. I do agree that if a metric does not reveal interesting information, it should be set aside. I also agree that we should not spend too much time gathering statistics. Measurements should be a way to point toward areas of interest and allow us to form ideas about our area of study.
My employer recently started a Quality program here. I will be sharing this article with many of my colleagues.
Great post about metrics. Everyone fears misapplied metrics, yet we learn that misapplying them is acceptable from examples all around us, from political polls to clinical trials. We should take a cue from these: metrics used this way are tools of persuasion, not information.
However, properly applied metrics (as you said, for inquiry) can be useful. The most important rule of using metrics is that the benefit gained be greater than the effort to obtain them. And that doesn't mean inflating the perceived importance of the metrics, as so often happens when they come with pretty charts and an animated PowerPoint presentation that, besides the presentation time, takes an hour away from each worker.
Useful metrics cannot intrude; otherwise there's the chance of an uncertainty principle at work, where the act of observation influences the outcome, whether through lost productivity, lowered morale, or gaming of the system.
Metrics are like guns in the hands of the untrained. I don't fear metrics alone anymore; I say teach people to use them safely.
Michael replies: A good idea. Be careful, though, that they don’t get left lying around where kids, crooks, the insane or the untrained can get at them. 🙂