All Testing is (not) Confirmatory

In a recent blog post, Rahul Verma suggests that all testing is confirmatory.

First, I applaud his writing of an exploratory essay. I also welcome and appreciate critique of the testing vs. checking idea. I don’t agree with his conclusions, but maybe in the long run we can work something out.

In mythology, there was a fellow called Procrustes, an ironmonger. He had an iron bed which he claimed fit anyone perfectly. He accomplished a perfect fit by violently lengthening or shortening the guest. I think that, to some degree, Rahul is putting the idea of confirmation into Procrustes’ bed.

He cites the Oxford Online Dictionary definition of confirm: (verb) establish the truth or correctness of (something previously believed or suspected to be the case). (Rahul doesn’t cite the usage notes, which show some different senses of the word.)

When I describe a certain approach to testing as “confirmatory” in my discussion of testing vs. checking, I’m not trying to introduce another term. Instead, I’m using an ordinary English adjective to identify an approach or a mindset to testing. My emphasis is twofold: 1) not on the role of confirmation in test results, but rather on the role of confirmation in test design; and 2) on a key word in the definition Rahul cites, “previously”.

A confirmatory mindset would steer the tester towards designing a test based on a particular and specific hypothesis. A tester working in a confirmatory way would be oriented towards saying, “Someone or something has told me that the product should be able to do X. My test will demonstrate that it can do X.” Upon the execution of the (passing) test, the tester would say “See? The product can do X.” Such tests are aimed in the direction of showing that the product can work.

Someone working from an exploratory or investigative mindset would have a different, broader, more open-ended mission. “Someone or something has told me that the product does X. What are the extents and limitations of what we think of as X? What are the implications of doing X? What essential component of X might we have missed in our thinking about previous tests? What else happens when I ask the product to do X? Can I make the product do P, by asking it to do X in a slightly different way? What haven’t I noticed? What could I learn from the test that I’ve just executed?”

Upon performing the test, the tester would report on whatever interesting information she might have discovered, which might include a pass or fail component, but might not. Exploratory tests are aimed at learning something about the product, how it can work, how it might work, and how it might not work; or if you like, on “if it will work”, rather than “that it can work”.

To those who would reasonably object: yes, yes, no test ever shows that a product will work in all circumstances. But the focus here is on learning something novel, often focusing on robustness and adaptability. In this mindset, we’re typically seeking to find out how the program deals with whatever we throw at it, rather than on demonstrating that it can hit a pitch in the centre of the strike zone.

I believe that, in his post, Rahul is focused on the evaluation of the test, rather than on test design. That’s different from what I’m on about. He puts confirmation squarely into result interpretation, defining the confirmation step as “a decision (on) whether the test passed or failed or needs further investigation, based on observations made on the system as a result of the interaction. The observations are compared against the assumption(s).”

I don’t think of that as confirmation (“establishing the truth or correctness of something previously believed or suspected to be the case”). I think of that as application of an oracle; as a comparison of the observed behaviour with a principle or mechanism that would allow us to recognize a problem. In the absence of any countervailing reason for it to be otherwise, we expect a product to be consistent with its history; with an image that someone wants to project; with comparable products; with specific claims; with reasonable user expectations; with the explicit or implicit purpose of the product; with itself in any set of observable aspects; and with relevant standards, statutes, regulations, or laws.

(These heuristics, with an example of how they can be applied in an exploratory way, are listed as the HICCUPP heuristics here. It’s now “HICCUPPS”; we recognized the “Standards and Statutes” oracle after the article was written.) (Update, 2015-11-17: For a while now, it’s been FEW HICCUPPS.)

At best, your starting hypothesis determines whether applying an oracle suggests confirmation. If your hypothesis is that the product works—that is, that the product behaves in a manner consistent with the oracle heuristics—then your approach might be described as confirmatory.

Yet the confirmatory mindset has been identified in both general psychological literature and testing literature as highly problematic. Klayman and Ha point out in their 1987 paper Confirmation, Disconfirmation, and Information in Hypothesis Testing that “In rule discovery, the positive test strategy leads to the predominant use of positive hypothesis tests, in other words, a tendency to test cases you think will have the target property.” For software testing, this tendency (a form of confirmation bias) is dangerous because of the influence it has on your selection of tests.

If you want to find problems, it’s important to take a disconfirmatory strategy: one that includes tests of conditions outside the space of the hypothesis that the program works. “For example, when dealing with a major communicable disease (or software bugs - MB), it is more serious to allow a true case to go undiagnosed and untreated than it is to mistakenly treat someone.” Here, Klayman and Ha point out, if we want to prevent disease, the emphasis should be on tests that are outside of those that would exemplify a desired attribute (like good health). In the medical case, they say that would involve “examining people who test negative for the disease, to find any missed cases, because they reveal potential false negatives.”
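To make the point concrete, here is a small Python sketch loosely modelled on the kind of rule-discovery task that Klayman and Ha discuss. The hidden rule, the tester's hypothesis, and the triples are all invented for illustration; they are not taken from the paper. The idea is simply that cases chosen because they fit the hypothesis cannot expose where the hypothesis is too narrow, while cases from outside it can.

```python
def true_rule(triple):
    """The hidden rule (invented here): any strictly ascending triple."""
    a, b, c = triple
    return a < b < c


def hypothesis(triple):
    """The tester's narrower guess: each number goes up by exactly two."""
    a, b, c = triple
    return b - a == 2 and c - b == 2


def surprises(cases):
    """Return the cases where the hypothesis and the real rule disagree."""
    return [t for t in cases if hypothesis(t) != true_rule(t)]


# Positive test strategy: only try cases we expect to have the property.
positive_cases = [(2, 4, 6), (10, 12, 14), (1, 3, 5)]

# Disconfirmatory strategy: try cases outside the space of the hypothesis.
outside_cases = [(1, 2, 3), (5, 10, 20), (6, 4, 2)]

print(surprises(positive_cases))  # []
print(surprises(outside_cases))   # [(1, 2, 3), (5, 10, 20)]
```

Every positive case "passes", and we learn nothing new. Only the cases from outside the hypothesis reveal that our idea of the rule was wrong.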

In software testing, the object would be to run tests that challenge the idea that the test should pass. This is consistent with Myers’ analysis in The Art of Software Testing (which, interestingly, as it was written in 1979, predates Klayman and Ha’s paper).

As I see it, if we’re testing the product (rather than, say, demonstrating it), we’re not looking for confirmation of the idea that it works; we’re seeking to disconfirm the idea that it works. Or, as James Bach might put it, we’re in the illusion demolition business.

One other point: Rahul suggests “Testing should be considered complete for a given interaction only when the result of confirmation in terms of pass or fail is available.” To me, that’s checking. A test should reveal information, but it does not have to pass or fail. For example, I might test a competitive product to discover the features that it offers; such tests don’t have a pass or fail component to them.

A tester might be asked to compare a current product with a past version to look for differences between the two. A tester might be asked to use a product and describe her experience with it, without an evaluation against explicit, atomic pass or fail criteria. “Pass and fail” are highly limiting in terms of our view of the product: I’m sure that the arrival of yet another damned security message on Windows Vista was deemed as a pass in the suite of automated checks that got run on the system every night. But in terms of my happiness with the product, it’s a grinding and repeated failure.

I think Rahul’s notion that a test must pass or fail is confused with the idea that a test should involve the application of a stopping heuristic. For a check, “pass or fail” is essential, since a check relies on the non-sapient application of a decision rule. For a test, pass-vs.-fail might be an example of the “mission accomplished” stopping heuristic, but there are plenty of other conditions that we might use to trigger the end of a test.

Since Rahul appears to be a performance tester, perhaps he’ll relate to this example (the framing of which I owe to the work of Cem Kaner). Imagine a system that has an explicit requirement to handle 100,000 transactions per minute. We have two performance testing questions that we’d probably like to address.

One is the load testing question: “Can this system in fact handle 100,000 transactions per minute?” To me, that kind of question often gets addressed with a confirmatory mindset. The tester forms a hypothesis that the system does handle 100,000 transactions per minute; he sets up some automation to pump 100,000 transactions per minute through the system; and if the system stays up and exhibits no other problems, he asserts that the test passes.

The other performance question is a stress testing question: “In what circumstances will the system be unable to handle a given load, and fail?” For that we design a different kind of experiment. We have a hypothesis that the system will fail eventually as we ramp up the number of transactions. But we don’t know how many transactions will trigger the failure, nor do we know the part of the system in which the failure will occur, nor do we know the way in which the failure will manifest itself. We want to know those things, so we have a different information objective here than for the load test, and we have a mission that can’t be handled by a check.
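The difference between the two questions can be sketched in a few lines of Python. Everything here is hypothetical: `simulated_system` is a stand-in for a real system under test, and its thresholds and failure messages are invented so that the example runs on its own. The point is the shape of the output. The load check returns one bit; the stress ramp returns a record of what happened along the way, which a person then has to interpret.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    rate: int       # transactions per minute offered to the system
    completed: int  # transactions the system actually finished
    note: str       # what was noticed at this rate


def simulated_system(rate):
    """Hypothetical stand-in for the system under test.

    The thresholds and messages are invented: clean up to 130,000 tx/min,
    dropping transactions up to 160,000, and crashing above that.
    """
    if rate <= 130_000:
        return rate, "ok"
    if rate <= 160_000:
        return 130_000, "queue backing up, transactions dropped"
    return 0, "crash: connection pool exhausted"


def load_check(required=100_000):
    """Confirmatory: apply one decision rule and report pass or fail."""
    completed, _ = simulated_system(required)
    return completed >= required


def stress_ramp(start=100_000, step=20_000, limit=200_000):
    """Investigative: ramp up the load and record what happens at each step."""
    observations = []
    for rate in range(start, limit + 1, step):
        completed, note = simulated_system(rate)
        observations.append(Observation(rate, completed, note))
        if completed == 0:
            break  # one possible stopping heuristic: the system fell over
    return observations


print(load_check())  # True
for obs in stress_ramp():
    print(obs.rate, obs.completed, obs.note)
```

Note that the load check passes without ever revealing that the simulated system starts dropping transactions somewhere above the required rate. That information only shows up in the ramp.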

In the latter test, there is a confirmatory dimension if you’re willing to look hard enough for it. We “confirm” our hypothesis that, given heavy enough stress, the system will exhibit some problem. When we apply an oracle that exposes a failure like a crash, maybe one could say that we “confirm” that the crash is a problem, or that behaviour we consider to be bad is bad. Even in the former test, we could flip the hypothesis, and suggest that we’re seeking to confirm the hypothesis that the program doesn’t support a load of 100,000 transactions per minute. If Rahul wants to do that, he’s welcome to do so. To me, though, labelling all that stuff as “confirmatory” testing reminds me of Procrustes.

6 replies to “All Testing is (not) Confirmatory”

  1. Isn’t it strange how a post can have completely different results to the ones you expect. After reading this post, and the original article by Rahul, I have probably come to some conclusions about confirmatory testing. But more importantly, I think I can finally see the key distinctions between testing and checking you were explaining before. I simply couldn’t seem to get it with the original post – even though I thought I had.

    Thanks, Callum. I’m delighted that you’re seeing the distinction.

    You raise a particularly interesting point: lots of stuff doesn’t make sense for people on the first go-round. Or the second, or the third: there were lots of comments and follow-ups on the original post (it’s been almost a year to the day since I published the first blog post on testing vs. checking, and I’m still getting comments). How is it that people do come to understand? One important route is via the dialectic, a new form of argumentation and reasoning developed through conversation (well, new since at least, uh, Plato). Someone says something; someone else disagrees; and by rational argument they work things out. Even if the parties don’t come to an agreement, the conversation is often helpful to those watching or listening. The dialogue also helps the contending parties to sharpen their points of view and their explanations. That’s why it’s important for as many of us as possible to join The Great Conversation of Testing, and that’s why I value responses even from people with whom I disagree. So, once again, thank you, Rahul!

  2. @Michael, you’ve used a sort of verity to demonstrate the distinction between Testing and Checking appositely, and also used interesting and relevant examples. Thanks for your great effort in replying to Rahul’s critique, which certainly will be more comprehensible to many testers; thanks also to Rahul for his heedful post, which gives us the opportunity to learn more and get clarification on the subject matter.

    – Selim

