Another day has dawned on Planet Earth, so another tester has used LinkedIn to ask about the difference between severity and priority.
The reason the tester is asking is, probably, that there’s a development project, and there’s probably a bug tracking system, and it probably contains fields for both severity and priority (and probably as numbers). The tester has probably been told to fill in each field as part of his bug report; and the tester probably hasn’t been told specifically what the fields mean—or the tester is probably uncertain about how the numbers map to reality.
“Severity” is the noun associated with the adjective, “severe”. In my Concise Oxford Dictionary, “severe” has six listed meanings. The most relevant one for this context is “serious, critical”. Severity, with respect to a problem, is basically how big a problem is; how much trouble it’s going to cause. If it’s a big problem, it gets marked as high severity (oddly, that’s typically a low number), and if it’s not a big deal, it gets marked as low severity, typically with a higher number. So, severity is a simple concept. Except…
When we’re testing, and we think we see a problem, we don’t see everything about that problem. We see what some people call a failure, a symptom. The symptom we observe may be a manifestation of a coding error, or of a design issue, or of a misunderstood or mis-specified requirement. We see a symptom; we don’t see the cause or the underlying fault, as the IEEE and others might call it.
Whatever we’re observing may be a terrible problem for some user or some customer somewhere—or the customer might not notice or care. Here’s an example: in Microsoft Word 2010’s Insert Page Number feature, choose small Roman numerals as your format, and use the value 32768 (rendered in Roman numerals). Word hangs on my machine, and on every machine I’ve tried this trick on (you can try it too). Now: is this a Severity 1 bug? It certainly appears to be severe, considering the symptom. A hang is a severe problem, in terms of reliability.
But wait… considering that vanishingly few people use lower-case Roman numeral page numbers larger than, say, a few hundred, is the problem really that severe? In terms of capability, it’s probably not a big deal; there’s a very low probability that any normal user would need to use that feature and would encounter the problem.
Except… considering the fact that a problem like this could—at least in theory—present an opportunity for a hacker to bring down an application or, worse, take control of a system, maybe this is a devastatingly severe problem.
There’s yet another factor to consider here. We all suffer to some degree from a bias that can play out in testing. This might be a form of representativeness bias, or of assimilation bias, or of correspondence bias, but none of these seems to be a perfect fit. I think of it as the Heartburn Heuristic, in honour of my dad: for a year or more, he perceived minor heartburn—a seemingly trivial symptom of a seemingly minor gastric reflux problem. What my (late) dad didn’t count on was that, from the symptoms, it’s hard to tell the difference between gastric reflux and esophageal cancer.
The Heartburn Heuristic is a reminder that it’s easy to believe—falsely—that a minor symptom is naturally associated with a minor problem. It’s similarly easy to believe that a serious problem will always be immediately and dramatically obvious. It’s also easy to believe that a problem that looks like big trouble is big trouble, even when a fast one-byte fix will make the problem go away forever.
We also become easily confused about the relationships among the prominence of the symptom, the impact on the customer, the difficulty associated with fixing the problem, and the urgency of the fix relative to the urgency of releasing the product. (Look at the Challenger and Columbia incidents as canonical examples of how this plays out in engineering, emotions, and politics.) In reality, there’s no reason to believe in a strong correlation between the prominence of a problem and its severity, or between the potential impact of a problem and the difficulty of a fix. A missing character in some visible field may be a design limitation or a display formatting bug, or it may be a sign of corruption in the database.
Of course, since we’re fallible human beings, looking for unknown problems in an infinite space with finite time to do it, the most severe problems in a product can escape our notice entirely. So based on the symptom alone, at best we can only guess at the severity of the problem. That’s bad enough, but the problem of classifying severity gets even worse.
Just as we have biases and cognitive shortcomings, other people on the project team will tend to have them too. The tester’s credibility may be called into question if she assigns a high severity to what others consider to be a low-severity problem. Severity, after all, is subject to the Relative Rule: severity is not an attribute of the problem, but a relationship between the problem and some person at some time.
To the end user who never uses the feature, the Roman numeral hang is not a big deal. To the end user who actually experiences a hang and possible loss of time or data, this could be a deeply annoying problem. To a programmer who takes great pride in his craft, a hang is a severe problem. To a programmer who is being evaluated on the number of Severity 1 problems in the product (a highly dubious way to measure the quality of a programmer’s work, but it happens), there is a strong motivation to make sure that the Roman numeral hang is classified as something other than a Severity 1 problem. To a program manager who has a few months of development time available before release, our Roman numeral problem might be a problem worth fixing. To a program manager who is facing a one-week deadline before the product has to ship (thanks to retail and stock market pressure), this is a trivial bug. (Trust me on that; I’ve been a program manager.)
In light of all this, what is a tester to do? My personal preference (based on experience as a tester, as a programmer, and as a program manager) is to encourage testers to stay out of the severity business if possible. By all means, I provide the project team with a clear description of the symptom, the quality criteria that could be threatened by it, and ideas on how the problem could have an effect on people who matter. I might provide a guess, based on inference, as to the underlying cause. I’ll be careful to frame it as a guess, unless I’ve seen the source code and understand the problem clearly.
My default assumption is that I can’t go by appearances, and that every symptom has an unknown cause with potentially harsh consequences. I assume that every problem is guilty until proven innocent—that it’s a potentially severe problem until the code has been examined, the risk models revisited, and the team consulted.
I’m especially wary of assigning a low severity on a bug report based on an apparently trivial symptom. If I haven’t seen the code, I try to avoid saying that something is a trivial problem; if pressed, I’ll say it looks like a trivial problem.
If I’m forced to enter a number into a bug reporting form, I’ll set the severity of a problem at its highest level unless I have substantial understanding and reason to see the problem as being insignificant. In order to avoid the political cost of seeming like a Cassandra, I’ll make sure my clients are aware of my fundamental uncertainty about severity: the best I can provide is a guess, and if I’m going to err, I’d rather err on the side of overestimating severity than underestimate it and thereby downplay an important problem. As a solution that feels better to me, I might also request an “unclassified” option in the Severity field, so that I can move on quickly and leave the classification to the team, to the programmers and to the program managers.
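For readers who think in code, here is a minimal sketch of that default, written in Python. Everything in it is hypothetical and for illustration only: the `Severity` scale, the `BugReport` fields, and the `default_severity` helper are names I've made up, not the schema of any real bug tracker. It assumes the common convention that a lower number means a more severe problem.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Severity(Enum):
    """Hypothetical scale: a lower number means a more severe problem."""
    UNCLASSIFIED = 0  # assumed extra option, requested from whoever runs the tracker
    HIGHEST = 1
    MEDIUM = 2
    LOWEST = 3


@dataclass
class BugReport:
    """What the tester supplies: observations and ideas, not a verdict."""
    symptom: str
    threatened_criteria: List[str] = field(default_factory=list)
    possible_impact: str = ""
    suspected_cause: Optional[str] = None  # a guess, unless the code has been read
    severity: Severity = Severity.UNCLASSIFIED


def default_severity(allows_unclassified: bool,
                     understood_as_insignificant: bool = False) -> Severity:
    """Pick a severity when the form demands one."""
    if allows_unclassified:
        # Preferred: leave classification to the team.
        return Severity.UNCLASSIFIED
    if understood_as_insignificant:
        # Only with substantial understanding of the underlying problem.
        return Severity.LOWEST
    # Otherwise, guilty until proven innocent.
    return Severity.HIGHEST


report = BugReport(
    symptom="Word hangs when page number 32768 is formatted in small Roman numerals",
    threatened_criteria=["reliability"],
    possible_impact="possible loss of time or data for a user who hits the hang",
)
# This tracker (by assumption) has no "unclassified" option.
report.severity = default_severity(allows_unclassified=False)
print(report.severity.name)
```

The point of the sketch is the order of the checks: "unclassified" wins when it is available, and a low severity has to be earned by understanding rather than inferred from a mild-looking symptom.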
As for priority: priority is the order in which someone wants things to be done. Perhaps some people use the priority field to rank the order in which particular problems should be discussed, but my experience is that, usually, “priority” is a tester’s assessment of how important it is to fix the problem—a kind of ranking of what should be fixed first.
Again based on my experience as a tester, programmer, and program manager, I don’t see this as being a tester’s business at all. Deciding what should be done on a programming or business level is the job of the person with authority and responsibility over the work, in collaboration with the people who are actually doing the work. When I’m a tester, there is one exception: if I see a problem that is preventing me from doing further testing, I will request that the fix for that problem be fast-tracked (and I’ll outline the risks of not being able to test that area of the product). As a tester, one of the most important aspects of my report is the set of things that make testing harder or slower, the things that give bugs more time and more opportunity to hide. Nonetheless, deciding what gets fixed first is for those who do the managing and the fixing.
In the end, I believe that decisions about severity and priority are business and management decisions. As testers, our role is to provide useful information to the decision-makers, but I believe we should let development managers manage development.