Pass vs. Fail vs. Is There a Problem Here?

A test, for the purposes of this discussion, is at its core a process of exploration. Initially, our community described exploratory testing as “simultaneous test design, test execution, and learning.” Later descriptions included “simultaneous test design, test execution, and learning, with an emphasis on learning” and “a parallel process of test design, test execution, test result interpretation, and learning, with an emphasis on learning”. At the Workshop on Heuristic and Exploratory Techniques in Palm Bay, Florida, 2006, we had a bunch of conversations (one might call them transpection) that ended up with Cem Kaner synthesizing this lengthy but very thorough description:

“Exploratory testing is a style of testing that emphasizes the personal freedom and responsibility of the individual tester to optimize the quality of his or her work by treating test design, test execution, test result interpretation, and learning as mutually supporting activities that continue in parallel throughout the project.”

A little unwieldy for conversation over drinks around the pool, but when I want to be really explicit about exploratory testing, that’s the description I turn to. When I want to be quick, I’ll say “parallel test design, test execution, result interpretation, and learning”.

Why mention this? Because I would argue that, at its core, all testing is exploratory in nature (apparently Boris Beizer would agree, according to an earlier comment), but checking (again, at least for the purposes of this discussion) isn’t. As James pointed out in our conversation, checking does require some element of testing; to create a check requires an act of test design, and to act upon the result of a check requires test result interpretation and learning.

This helps, I think, to distinguish testing and checking. Some person (a programmer, a tester, or anyone else for that matter) is required to create a check. Creating a check is a sapient process; it is an act of test design, and it requires a human to do it.

After someone designs and instantiates a check, someone or something else may perform the check, making an observation and applying a decision rule. If it’s a check, the rule has a binary answer that is closed and machine-decidable. The decision may or may not be made by a machine, but the important thing is that it could be; by definition, a check is a non-sapient process, and a non-sapient process, also by definition, is one that doesn’t require a human.
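To make the shape of a check concrete, here is a minimal sketch in Python. The `Check` class and the `add` function are hypothetical names invented for this illustration; they don't come from any particular tool. The point is only that a check pairs an observation with a decision rule whose answer is a closed, machine-decidable pass or fail.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Check:
    """A check: one observation plus one binary decision rule (hypothetical helper)."""
    name: str
    observe: Callable[[], object]       # makes the observation
    decide: Callable[[object], bool]    # applies the decision rule

    def run(self) -> bool:
        # No interpretation happens here; the result is simply True or False.
        return bool(self.decide(self.observe()))


def add(a, b):
    # Hypothetical stand-in for the product behaviour being checked.
    return a + b


sum_check = Check(
    name="2 + 2 equals 4",
    observe=lambda: add(2, 2),
    decide=lambda observed: observed == 4,
)

result = sum_check.run()
print(sum_check.name, "->", "pass" if result else "fail")
```

Someone had to decide that this observation and this rule were worth encoding; that part is test design. Running it requires no such judgment, which is why a machine can do it.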

So the outcome of a check is simply a pass or fail result; the outcome doesn’t require human interpretation. I’ve seen organizations in which some programmer, business analyst, or test designer creates a list of checks for people to execute. The list often takes the form of a suite of assertions in a test tool or a spreadsheet, and there’s a checkbox in a form or a column in the spreadsheet headed “pass or fail”. And I’ve seen this many times, most commonly in certain kinds of outsourced testing. Taken to the extreme (and I have seen it taken to the extreme), it’s an immense waste of human capabilities. Humans are capable of providing vastly more information than a handful of check results can provide. In particular, they can see things around the check that a non-sapient agent would miss.

What happens after a check? If we want to interpret the result—that is, to determine its meaning and significance—a human is required again. That too is an act of testing, but note that it’s possible to perform a check without interpreting the result. This happens in at least two cases: 1) When we give a script to a machine to execute, or 2) when we give a script to a human to execute and when our requirements for the performance of the script emphasize pass-or-fail decisions, rather than the question, “Is there a problem here?”

If there is a problem, it takes sapience to recognize it and sapient acts to address it.

Some organizations emphasize checking vastly more than they emphasize testing. IMVU, at least at the time James Bach and I poked around it, seemed to be that kind of organization (or at least Timothy Fitz’s description of what he found cool about IMVU emphasized that aspect of it). Note that lots of checking is usually a good thing; it’s handy for finding regression problems, for example. Even exclusively checking is not intrinsically a bad thing to do. Right or wrong, good or bad, sufficient or insufficient are value judgements, and they’re up to the person making the decision. From my perspective, though, the problem with a check-focused approach to testing is that it becomes very easy to be satisfied by checks that pass. In the past, I’ve referred to this as the narcotic comfort of the green bar. In other forms, it’s sometimes also called the pesticide paradox, which Beizer named.

For many kinds of checks, a machine might be a much, much better choice than a human. A complex arithmetic calculation is a case in point; give me an algorithm and a machine and a conditional statement for that. Many test ideas can be designed and expressed as a set of machine-decidable checks, and that may be just the ticket for exercising those ideas. High-volume, randomized testing is an exploratory practice enabled by squillions of actions that may include many simple checks; we’re seeking problems that aren’t in our current risk models. Stress testing happens when we throw a squillion actions (which may include many checks) against a system, supposing ahead of time that it will break but not being sure where. Some kinds of performance testing (which may include many checks) may be checking (when we’re seeking to confirm that a system behaves within expected parameters), or it may be testing (when our objective is to vary things and exercise new ideas).
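As a rough illustration of many simple checks run at high volume, the Python sketch below throws randomized inputs at an arithmetic function and applies the same machine-decidable rule to each result. Everything in it is assumed for the example: `product_under_test` stands in for whatever is being exercised, and `oracle` is a deliberately slow but independent reference.

```python
import random


def product_under_test(a, b):
    # Hypothetical stand-in for the calculation being checked.
    return a * b


def oracle(a, b):
    # Independent reference: multiplication by repeated addition.
    total = 0
    for _ in range(abs(b)):
        total += a
    return total if b >= 0 else -total


def run_random_checks(count, seed=0):
    """Run `count` randomized checks; return the inputs whose check failed."""
    rng = random.Random(seed)  # seeded so that a failing run can be repeated
    failures = []
    for _ in range(count):
        a = rng.randint(-1000, 1000)
        b = rng.randint(-50, 50)
        if product_under_test(a, b) != oracle(a, b):
            failures.append((a, b))
    return failures


failures = run_random_checks(10_000)
print(len(failures), "failing checks out of 10000")
```

The machine can grind through the checks, but choosing the input ranges and the oracle, and deciding what an empty or non-empty failure list means, remain testing work.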

But in other contexts, things get dicier. For example, a set of checks that compare strings of text with a table of known words is faster than having a human read through a long document. Yet if a peace of text or a hole duck you meant whirr interpreted by dicked Asian soft wear, you’d probably want to test it, in addition to checking it, and if you wanted to evaluate the quality of the arguments in the document, checks wouldn’t really do the trick. More formally, checks may be adequate for syntactic evaluation, but semantic evaluation still needs humans to test. Notice that they’re called spelling checkers and grammar checkers, rather than spelling or grammar testers.
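The spelling example can be sketched in a few lines of Python. The word table and the `spelling_check` function are made up for the illustration and are far smaller than any real dictionary; the garbled phrase is taken from the paragraph above.

```python
import re

# Tiny, hypothetical table of known words; a real checker would use a full dictionary.
KNOWN_WORDS = {
    "a", "piece", "peace", "of", "text", "or", "whole", "hole",
    "document", "duck", "you", "meant",
}


def spelling_check(text, known_words):
    """Pass only if every word in the text appears in the table of known words."""
    words = re.findall(r"[a-z]+", text.lower())
    return all(word in known_words for word in words)


intended = "a piece of text or a whole document"
garbled = "a peace of text or a hole duck you meant"
misspelled = "a peice of text"

for sample in (intended, garbled, misspelled):
    verdict = "pass" if spelling_check(sample, KNOWN_WORDS) else "fail"
    print(verdict, "-", sample)
```

The garbled phrase passes because every word in it is a known word. The check answers the syntactic question it was given, and nothing in it can notice that the sentence no longer means what the writer intended.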

A development strategy that emphasizes checking at the expense of testing is one in which we’ll emphasize confirmation of existing knowledge over discovery of new knowledge. That might be okay. A few checks might constitute sufficient testing for some purpose; no testing and no checking at all might even be sufficient for some purposes. But the more we emphasize checking over testing, and the less testing we do generally, the more we leave ourselves vulnerable to the Black Swan.

See more on testing vs. checking.

6 replies to “Pass vs. Fail vs. Is There a Problem Here?”

Michael,
This seems to be a little bit of a re-hash of your Check vs Test blog a bit back and not what I expected from the title.
On the other hand I do find it a good continuation of your earlier post.

@Kevin

Glad you liked it and sorry you didn't like it.

—Michael B.

[…] http://www.developsense.com/blog/2009/08/testing-vs-checking/ http://www.developsense.com/blog/2009/09/transpection-and-three-elements-of/ http://www.developsense.com/blog/2009/09/pass-vs-fail-vs-is-there-problem-here/ […]

[…] together in a sentence. If you want to read further, Michael Bolton does a great job here and here. I make an attempt […]

[…] Pass vs. Fail vs. Is There a Problem Here – Interesting blog article on testing vs checking and the issues in ‘pass/fail’ type testing and reporting, from 2009 in Michael Bolton’s DevelopSense blog. […]

Hi Michael,

When we are automating some of the repetitive checks from the test design, the picking of these checks is a sapient process; but the automation itself is non-sapient. So, should we call it “Check Automation” instead of “Test Automation”?

Michael replies: We call it “automated checking”. You can read about that here. However, our colleague Pradeep Soundararajan offers another interesting alternative: “programmed checking”.

We’ve debugged our use of “sapience”, since it had bugs in it, and because it was bugging people. We still might talk about sapience, referring to human judgment and wisdom. But although checking is non-sapient, we don’t talk about “sapient testing” and “non-sapient testing” any more. For one thing, because it involves learning, assessing value, study, and a whole bunch of other stuff, testing can’t be non-sapient, even though checking is. Developing and interpreting checks is a sapient process, a testing process. We straightened that out here and here—we hope.
