In the last post in this series, I noted some potentially useful structural similarities between bug reports (whether oral or written) and newspaper reports. This time, I’ll delve into that a little more.
To our clients, investigative problem reports are usually the most important part of the product story. The most respected newspapers don’t earn their reputations by reprinting press releases; they earn their reputations through investigative journalism. As testers (or, heaven help us, quality assurance people), we tend to be chartered to look for problems, and to investigate them in ways that are most helpful to programmers, managers, and our other clients. A failing test on its own tells us little, and a failing check even less; as I pointed out here, a failing test is only an allegation of a problem. Investigation and study of a failing test is likely to inform us of something more useful: whether someone will perceive a problem that threatens the value of the product. I’ll talk more about the nature of problems in a later post, but for now, think of a product problem in terms of a perceived absence of or threat to some dimension of quality. (See the Heuristic Test Strategy Model for one list of quality criteria; see Software Quality Characteristics, by Rikard Edgren, Henrik Emilsson and Martin Jansson for another.) Since the manager’s goal is generally to release a product at her desired level of quality, problems that could threaten that goal are likely to be interesting and important. Or, as they say in the newspaper business, “if it bleeds, it leads”.
Potential showstoppers are usually the most important stories. In the 1990s, I was a technical support person, a tester, a program manager, and a programmer for a mass-market, commercial shrink-wrap software company. Since we had millions of customers, even minor problems could have a big impact on technical support and on the reputation of our products in the market. The market was enormous, hardware and software were even less standardized than they are now, and we worked under a great deal of time pressure. Classifying and prioritizing problems was contentious. One of the important classification questions was “What should we consider a showstopper?” One of the senior programmers came up with an answer that I’ve used ever since:
Showstopper (n.): Something that makes more sense to fix than to ship.
(I talked about showstoppers here.) In a development project, a showstopper—any threat to the timely release of the project—is a page-one, above-the-fold story, a story that you can see and begin to read without opening the newspaper or picking it up.
There’s always one story that leads. The most important threat to a timely, successful release may be a single problem, or it may be a collection of problems—what Ian Mitroff calls a mess. Do we have a problem, or a couple of problems, or a mess? No matter what the answer, there’s only so much space on the front page above the fold. Will you have one headline, or two, or three? What will that headline say? What will the lead paragraph of each story look like? Does the lead paragraph cover the five Ws—who, what, where, when, and why? If not, are those questions answered shortly afterwards? Might there be a legitimate reason not to answer them?
There’s only one front page, and there’s almost always more than one story on it. Our clients need to be able to absorb the lead story and the other front-page stories quickly, so we need to be able to provide headlines, lead paragraphs, and details in appropriate proportions. See an example front page here, with details that follow.
Very infrequently, serious newspapers give their entire front page to a story. In those cases, it’s usually an overwhelmingly important story, or one that threatens the newspaper or journalism itself.
The most compelling stories are those that have an impact on people. Although product problems are often technical in nature, the “making sense” part of the showstopper decision is focused on the business. Testers must be able to connect technical problems with business risk. Problems related to technical correctness are often easy to describe, but they might not be important. The skill of bug advocacy—making sure that the customer is aware of the best possible motivations for fixing the bug—depends on your ability to report the bug in terms of its most significant effect on the business. Ben Simo has a lovely way to sum this up. Early in his career, when Ben was trying to advocate a bug fix, his project manager said, “Revenue is king. Liability is queen. Tell me how this bug impacts them.”
The number of stories usually isn’t as important as the significance of the stories. This is another way in which test reports can be like newspapers. We don’t usually evaluate the quality of a newspaper by the number of stories in it. Instead, we look at the significance, relevance, and credibility of the stories.
It may take time to distinguish between a breaking story and a major story. Sometimes the news cycle doesn’t afford time for investigation, even though the story might be important. Information gets passed around the project at various moments during the test and development cycle. Sometimes a discovery happens just before a meeting. Smart reporters know to balance urgency and restraint when there’s a breaking story. When I worked in commercial mass-market software in the 1990s, we sometimes discovered a terrible-looking problem a couple of hours before release. Such discoveries would trigger arousal (no, not sexual friskiness, but arousal in the psychological sense of being suddenly snapped awake and alert to danger). All of a sudden, we’d be noticing all kinds of things that we hadn’t noticed before, and most of them were non-problems of one kind or another. We were biased by fear. We called it the “snakes on everything” moment. When reporting, testers need to take stock of the emotional factors surrounding them, and report cautiously and accurately. An hour from now, an allegation or a rumour might be an important story—or it might be nothing.
Non-problems aren’t news. The stories in the first section of the newspaper are mostly stories about problems, and there’s a reason for that: problems compel attention. Our emotional systems evolved to help keep us out of trouble. Problems or threats trigger arousal. Things that are going well are nice to hear about, but they don’t engage our emotions in the same way that problems do. In a software development project, non-problems have relatively little significance for project managers. Routine daily successes don’t threaten the project, and therefore need less attention.
Numbers, like pictures, are illustrations, not the whole story. A qualitative report is not quantity-free; after all, identifying the presence or absence of something involves counting to one, and the degree of some attribute of interest can be illustrated by a number. But just as a pictorial illustration isn’t the item it depicts, a numerical illustration isn’t the story it might help to describe. A picture shows part of a scene through a particular lens; a number focuses on one attribute using a particular metric. Each one may emphasize some observations at the expense of other observations. Each one may crop out detail. Each one may magnify or distort.
Since the product and testing stories are multi-dimensional, be prepared to show the dimensions. Newspaper reports always have a bias, but reporters and editors often attempt to manage the bias by providing alternative sources of information, and alternative interpretations. A story of any length often includes multiple stories, or multiple threads of the main story. When tables of data are appropriate, newspapers print tables (think stock quotes in the business section, or box scores or line scores in sports). Products, coverage, quality, and problems are all multi-dimensional, multi-variate, and qualitative. Where there’s a mass of data, consider using tables such as dashboards or coverage tables. Pin numbers to reliable measurements (see the slip charts, the detailed impact case methods, and the subjective impact methods in Weinberg’s Quality Software Management, Volume 2: First Order Measurement; and pay attention to validity—see Kirk and Miller’s Reliability and Validity in Qualitative Research and Shadish, Cook, and Campbell’s Experimental and Quasi-Experimental Designs for Generalized Causal Inference).
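A coverage table of this kind need not be elaborate. Here’s a minimal sketch of one way to render one; the product areas, the 0–3 ordinal ratings, and the render_coverage_table helper are hypothetical illustrations, not features of any particular tool.

```python
# Minimal sketch of a coverage table for a test report.
# The product areas and the 0-3 ordinal ratings are hypothetical.

LABELS = {
    0: "not yet examined",
    1: "smoke/sanity testing only",
    2: "common, core, critical cases covered",
    3: "harsh, complex, extreme cases covered",
}

def render_coverage_table(coverage):
    """Render a {area: level} mapping as a plain-text table, deepest coverage first."""
    lines = [f"{'Area':<14}{'Level':>5}  Meaning"]
    for area, level in sorted(coverage.items(), key=lambda kv: (-kv[1], kv[0])):
        lines.append(f"{area:<14}{level:>5}  {LABELS[level]}")
    return "\n".join(lines)

coverage = {
    "Installation": 3,
    "Reporting": 2,
    "Import/Export": 1,
    "Localization": 0,
}
print(render_coverage_table(coverage))
```

The point isn’t the code; it’s that a handful of ordinal ratings, laid out as a table, can convey the shape of the testing story at a glance, in the way a box score summarizes a ball game.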
Describe your coverage. Boris Beizer described coverage as “any metric of test completeness with respect to a test selection criterion”. That suggests that it is possible to quantify coverage if you have a quantifiable test selection criterion. For example, if a single-digit field accepts any digit from 0 to 9, you could select ten tests and claim complete coverage with respect to that criterion. Mind you, that data coverage doesn’t account for flow or sequence coverage; suppose that a bug were triggered only when a 7 replaced a 3 in that field. Since the overall number of possible tests is infinite, test selection criteria are based on models. In practical terms, this means that overall test coverage is some finite number divided by an infinite one; report that accurately, and you’re stuck with a number that remains asymptotically close to zero. Instead, focus on the qualitative, and describe your coverage on an ordinal scale. Level 0 means “We know nothing about this area of the product.” Use Level 1 to say “We have done smoke or sanity testing; at this point, we’ve determined whether the product is even stable enough for serious testing.” Level 2 means “We’ve tested the common, the core, the critical, the happy path; our testing has been focused on can it work.” Level 3 means “We’ve tested the harsh, the complex, the challenging, the extreme, the exceptional; if there were a serious problem in this area, we’d probably know about it by now.” In this system, the numbers are barely more than labels for a qualitative evaluation, so don’t be tempted to do serious math with them.
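The gap between data coverage and sequence coverage can be sketched in a few lines of code. The single-digit field and the bug that fires only when a 7 replaces a 3 are hypothetical, as above:

```python
# Sketch: "complete" data coverage can still miss a sequence-dependent bug.
# The field and the 7-replaces-3 failure are hypothetical illustrations.

def set_field(state, digit):
    """Set a single-digit field; hypothetical bug when 7 overwrites 3."""
    if not 0 <= digit <= 9:
        raise ValueError("single digit 0-9 required")
    if state.get("value") == 3 and digit == 7:
        raise RuntimeError("hypothetical bug: 7 replacing 3")
    state["value"] = digit

# Ten tests, one per digit, each on a fresh field: "complete" coverage
# with respect to the criterion "every accepted value, once".
for digit in range(10):
    set_field({}, digit)  # all ten pass

# Yet a particular *sequence* of inputs still exposes the bug.
state = {}
set_field(state, 3)
try:
    set_field(state, 7)
except RuntimeError as problem:
    print("found only by sequence coverage:", problem)
```

Ten passing tests give a truthful-sounding “100% coverage” claim against one criterion, while a bug sits one input-pair away; the number is only as good as the model behind the selection criterion.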