Talking About Coverage

A while back I wrote a post on coverage. In fact, I’ve written a few posts about coverage, but I’m talking about this one. A question came up recently on LinkedIn that helped me to realize I had left out something important.

In that post, referring to coverage as “the proportion of the product that has been tested”, I said

A software product is not a static, tangible thing; it’s a set of relationships. What would 100% of a product, a set of relationships look like? That’s an important question, because unless we know what 100% looks like, the idea of “proportion” doesn’t carry much water.

And I also said

In the Rapid Software Testing namespace, when we’re talking about coverage generally,

Coverage is how thoroughly we have examined the product with respect to some model.

For some, this prompts the question “How do we measure coverage?” That amounts to the same thing as asking “How do we measure the thoroughness of our testing?”

The answer to that question starts simply: most of the time, we don’t. Most of the time we can’t measure coverage in an objective, valid, reliable way, because unlike length, or votes, or basketballs, coverage doesn’t come with a good unit of measurement. For instance:

  • Counting the number of test cases doesn’t tell us about the conditions or factors or observations those test cases targeted.
  • Counting the lines of product code that have been exercised by some automated checks doesn’t tell us what those checks evaluated.
  • Counting the number of elements on a risk list doesn’t take the relative significance of those risks into account.
  • Making sure that there’s “one positive test case” and “one negative test case” “per requirement” not only begs the question of what is a positive test case and what is a negative test case; it also begs the question of how to put a unit of measurement on a requirement.

Sometimes we can count things that might have some bearing on coverage. “There are eight user roles available on this system. For how many of these have we performed testing that evaluates whether the appropriate permissions and restrictions are in place?” If the answer is less than eight, we know something about the incompleteness of our coverage. The number of roles that we have covered suggests something, perhaps, but it doesn’t tell us how deeply or thoroughly we’ve examined them. It doesn’t tell us whether our testing has been simple or complex, sympathetic or harsh, trivial or important.

In particular, “percentage of test cases that have been automated” is a seriously empty kind of measurement. A “test case that has been automated” could refer to a single function call and a single assertion, or to dozens of function calls with thousands of assertions. An “automated test case” could refer to a simple unit check or a set of checks in a set of complex transactions representing some workflow. “83% of our examples are being checked by machinery; the rest are being done manually” begs all kinds of questions about what’s in the examples. It also ignores the fact that human interaction with and observation of the product is profoundly different from an automated check of some output.
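Here is a small Python sketch of how little that count can tell us. Everything in it is invented for illustration: `apply_discount` is a hypothetical product function, and the two checks are hypothetical entries in a test suite. Both would be tallied as one “automated test case”, yet one makes a single assertion and the other makes hundreds.

```python
# Hypothetical product function, invented for this illustration.
def apply_discount(price_cents: int, percent: int) -> int:
    """Return the price in cents after an integer percentage discount."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return price_cents - (price_cents * percent) // 100


def check_one_example() -> int:
    """One call, one assertion. Returns the number of assertions made."""
    assert apply_discount(1000, 10) == 900
    return 1


def check_many_examples() -> int:
    """Many calls, many assertions. Returns the number of assertions made."""
    assertions = 0
    for price in range(0, 1000, 50):       # 20 prices
        for percent in range(0, 101, 10):  # 11 percentages
            result = apply_discount(price, percent)
            assert 0 <= result <= price
            assertions += 1
    return assertions


# Each function counts as exactly one "automated test case".
checks = [check_one_example, check_many_examples]
counts = {check.__name__: check() for check in checks}

print(len(checks), "automated test cases")
print(counts)
```

Even the assertion count is not a unit of coverage: the single assertion checks an exact expected value, while each of the many assertions only checks that the result falls within a range.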

As James Bach points out, when someone says “I have 61 test cases”, they often mean that they have 61 entries in some test case management tool. That may have a certain kind of simplistic appeal to some managers because, as some testers say, management “wants to see numbers”.

I’m not so convinced that’s true. I believe it’s more likely that management wants to see. That is, they want to be able to observe and assess and reason. Helping them do that for things like coverage starts with summarizing our work, and our mental models and feelings about it, and making them legible.

Legibility is an important idea, with lots of associated bugs and features. For a bug-oriented perspective, have a look at Venkatesh Rao’s blog post on the subject, and James C. Scott’s mind-blowing book Seeing Like a State. Legibility is one reason that dopey test case management tools are so appealing to managers; the charts and graphs provide a reassuringly visible illusion of understanding. “You see?”

The trouble is that coverage, like quality, doesn’t yield well to the kind of measurement that Cem Kaner and Pat Bond talk about in this extremely important paper. “Measurement,” they say, “is the empirical, objective assignment of numbers, according to a rule derived from a model or theory, to attributes of objects or events with the intent of describing them.” That paper prompts key questions. For us here, these could include:

  • What is the model, the construct, that forms the basis of your measurement? (What is a test case, anyway? What might be more or less than one test case? What is a unit of coverage?)
  • What is the operation that you use to assign a number to the observation? (Do we count an assert statement in some test code, a line in an Excel sheet, or a paragraph in a requirements document and call it a day?)
  • How — and how well — does the number you’re using map the attribute you’re observing to the description you’re producing? (Is everything that we call a test case equivalent in terms of the thoroughness of our testing?)
  • What are the threats to the validity of that construct? (How might our ways of counting coverage not really represent the thoroughness of our testing, and thereby mislead us?)

Just as quality isn’t a property of your product, coverage isn’t a property of your testing. Quality and coverage are complex sets of relationships between products, people, models, observations, and evaluations. They’re subject to The Relative Rule. You can’t really measure these things in the same sense that you can measure distance, or time, or metres per second.

Yet a busy manager’s desire to model complex things in simpler terms isn’t exactly unreasonable. It would be well for us to help them do that in a way that is accurate and informative, rather than misleading and dangerous.

Like quality, you can’t measure coverage, but you can assess it and you can discuss it. How can you start that discussion in a reasonable and sensible way that avoids the pitfalls of construct validity?

Here’s one way: we might choose to model and represent coverage in ways that can be accurate without being precise. When we strive for accuracy and acknowledge imprecision, our story about accuracy might be challenged. That challenge prompts assessment, discussion, and evaluation, which is exactly what management needs to make informed decisions about risk.

So: consider framing coverage of given models and areas of the product in terms of a nominal scale that has some aspects of an ordinal scale:

Level 0

We don’t know much about this area. We’re aware that this area exists, but it’s mostly a black box to us, so far. Whatever testing that’s been done, we don’t really trust.

Level 1

We’re just getting to know this area. We’ve done basic reconnaissance and surveyed it; we’ve done some smoke and sanity testing. We may have some artifacts that represent our emerging models, which will help us to talk about them and go deeper. If the product were completely broken, we’d know.

Level 2

We’ve learned a good deal about this area. We’ve looked at the core and the critical aspects of it. We’re collecting and diversifying our ideas on how to cover it deeply. We’ve done some substantial testing focused on common usage patterns, the most significant suspected risks, and the most important quality criteria. Nonetheless, there are still some other ways we could look for trouble, or places where we haven’t looked yet.

Level 3

We have a comprehensive understanding of this area. We’ve looked deeply into it from a number of perspectives, and applied a lot of different test techniques. We’ve done harsh, complex, and challenging tests on a wide variety of quality criteria. If there were a problem or unrecognized feature in this area that we didn’t know about, it would be a big surprise.  Moreover, any problem that escapes would present an opportunity for us to learn something deep and important (rather than being evidence of not trying very hard).
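If you keep this kind of assessment in a script or a spreadsheet export, one way to respect the “nominal with some ordinal aspects” idea is to use labels that can be ranked but not added or averaged. What follows is only a sketch under assumptions: the class name `CoverageLevel` and the product areas are hypothetical, invented for illustration.

```python
from enum import Enum
from functools import total_ordering


@total_ordering
class CoverageLevel(Enum):
    """Labels that can be ranked, but not added or averaged."""
    LEVEL_0 = 0  # we don't know much about this area
    LEVEL_1 = 1  # we're just getting to know this area
    LEVEL_2 = 2  # we've learned a good deal about this area
    LEVEL_3 = 3  # we have a comprehensive understanding of this area

    def __lt__(self, other):
        if not isinstance(other, CoverageLevel):
            return NotImplemented
        return self.value < other.value


# Hypothetical product areas and assessments, invented for illustration.
assessment = {
    "login": CoverageLevel.LEVEL_3,
    "reporting": CoverageLevel.LEVEL_1,
    "billing": CoverageLevel.LEVEL_2,
}

# Ranking is allowed: which area do we know least about?
least_known = min(assessment, key=assessment.get)
print(f"We know least about: {least_known}")
print(assessment["login"] > assessment["reporting"])
```

Using a plain `Enum` rather than integers is a deliberate choice here: an expression like `CoverageLevel.LEVEL_1 + CoverageLevel.LEVEL_2` raises a `TypeError`, so nobody can quietly compute an “average coverage of 2.0” and mistake it for a measurement.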

With that, you might be able to say “we know very little about this area of the product; we know a lot about that area”, and you might be able to say “we’ve covered this part of the product more thoroughly than that part”. And then let the discussion begin.

Update, 2023/02/27: This post has been translated into Portuguese by Rafael Bandeira.
