
“Merely” Checking or “Merely” Testing

The distinction between testing and checking got a big boost recently from James Bach at the Øredev conference in Malmö, Sweden. But a recent tweet by Brian Marick and a recent conversation with a colleague have highlighted an issue that I should probably address.

My colleague suggested that I may have somehow underplayed the significance, importance, or worth of checking. Brian’s tweet said,

“I think the trendy distinction between “testing” and “checking” is a power play: which would you preface with “mere”? http://bit.ly/2Cuyj”

As a consequence, I worried that I might at some point have said “mere checking” or “merely checking” in one of my blog posts or on Twitter, so I researched it. Apparently I had not; that was a relief. However, the fact that I was suspicious even of myself suggests that maybe I need to clarify something.

The distinction between testing and checking is a power play, but it’s not a power play between (say) testers and programmers. It’s a power play between the glorification of mechanizable assertions and the appreciation of human intelligence. It’s a power play between sapient and non-sapient actions.

Recall that the action of a check has three parts to it. Part one is an observation of a product. Part two is a decision rule, by which we can compare that empirical observation of the product with an idea that someone had about it. Part three is the setting of a bit (pass or fail, yes or no, true or false) that represents the non-sapient application of both the observation and the decision rule. Note, too, that this means that a check can be performed by one of two agencies: 1) a machine, or 2) a sufficiently disengaged human; that is, a human who has been scripted to behave like a machine, and who has for whatever reason accepted that assignment.
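
To make those three parts concrete, here is a minimal sketch in Python; the product under check and all of the names are hypothetical, invented only for illustration:

```python
# A minimal sketch of a check's three parts, using a trivial
# stand-in for the product under check. All names here are
# hypothetical, for illustration only.

def product_total(items):
    """Stand-in for the product: sums (name, price) pairs."""
    return sum(price for _name, price in items)

def check_total():
    # Part one: an observation of the product.
    observed = product_total([("apple", 1.25), ("pear", 2.25)])
    # Part two: a decision rule, comparing the observation with
    # an idea that someone had about the product.
    expected = 3.50
    # Part three: the setting of a bit. No human judgment happens here.
    return observed == expected

print("pass" if check_total() else "fail")
```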

So checks can be hugely important. Checks are a means by which a programmer, engaged in test-driven development, checks his idea. Creating the check and analyzing its result are both testing activities. Checks are a valuable product (a by-product, some would say) of test-driven development. Checks are change detectors, tools that allow programmers to refactor with confidence. Checks built into continuous integration are mechanisms to make sure that our builds can work well enough to be tested—or, if we’re confident enough in the prior quality of our testing, can work well enough to be deployed. Checks tend to shorten the loop between the implementation of an idea and the discovery of a problem that the checks can detect, since the checks are typically designed and run (a lot, iteratively) by the person doing the implementation. Checks tend to speed up certain aspects of the post-programmer testing of the product, since good checks will find the kind of dopey, embarrassing errors that even the best programmers can make from time to time. The need for checks sometimes (alas, not always) prompts us to create interfaces that can be used by programmers or testers to aid in later exploration.

Checking represents the rediscovery of techniques that were around at least as far back as 1957: “The first attack on the checkout problem may be made before coding has begun.” (D. D. McCracken, Digital Computer Programming, 1957. Thanks to Ben Simo for inspiring me to purchase a copy of this book.) In 2007, I had dinner with Jerry Weinberg and Josh Kerievsky. Josh asked Jerry if he did a lot of unit testing back in the day. Jerry practically did a spit-take, saying “Yes, of course. Computer time was hugely expensive, but we programmers were cheap. Getting the program right was really important, so we had to test a lot.” Then he added something that hadn’t occurred to me. “There was another reason, too. Apart from everything else, we tested because the machinery was so unreliable. We’d run a test program, then run the program we wrote, then run the test program again to make sure that we got the same result the second time. We had to make sure that no tubes had blown out.”

So, in those senses, checking rocks. Checking has always rocked. It seems that in some places people forgot how much it rocks, and that the Agilists have rediscovered it.

Yet it’s important to note that checks on their own don’t deliver value unless there’s sapient engagement with them. What do I mean by that?

As James Bach says here, “A sapient process is any process that relies on skilled humans.” Sapience is the capacity to act with human intelligence, human judgment, and some degree of human wisdom.

It takes sapience to recognize the need for a check—a risk, or a potential vulnerability. It takes sapience—testing skill—to express that need in terms of a test idea. It takes sapience—more test design skill—to express that test idea in terms of a question that we could ask about the program. Sapience—in terms of testing skill, and probably some programming skill—is needed to frame that question as a yes-or-no, true-or-false, pass-or-fail question. Sapience, in the form of programming skill, is required to turn that question into executable code that can implement the check (or, far more expensively and with less value, into a test script for execution by a non-sapient human). We need sapience—testing skill again—to identify an event or condition that would trigger some agency to perform the check. We need sapience—programming skill again—to encode that trigger into executable code so that the process can be automated.
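
As a hedged illustration of that chain, here is a sketch in which every specific (the risk, the product, the names) is invented: the sapient work lives in the comments and the design decisions, and only the last step runs without us.

```python
import unittest

# The sapient work, captured as comments and design decisions:
#   Risk:      a discount might push an order total below zero.
#   Test idea: apply a discount larger than the order total.
#   Question:  can the discounted total ever be negative?
#   Framing:   the yes-or-no assertion in the test method below.
#   Trigger:   a human decided that this runs on every build
#              (e.g., the build script calls `python -m unittest`).

def discounted_total(total, discount):
    """Stand-in for the product under check (hypothetical)."""
    return max(total - discount, 0.0)

class DiscountCheck(unittest.TestCase):
    def test_discount_never_goes_negative(self):
        # From here on, the check is non-sapient: observe, apply
        # the decision rule, set the bit.
        self.assertGreaterEqual(discounted_total(10.0, 25.0), 0.0)

if __name__ == "__main__":
    unittest.main()
```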

Sapience disappears while the check is being performed. By definition, the observation, the decision rule, and the setting of the bit all happen without the cognitive engagement of a skilled human.

Once the check has been performed, though, skill comes back into the picture for reporting. Checks are rarely done on their own, so they must be aggregated. The aggregation is typically handled by the application of programming skill. To make the outcome of the check observable, the aggregated results must be turned into a human-readable report of some kind, which requires both testing and programming skill. Human observation of the report (intake) is by definition a sapient process. Then comes interpretation. The human ascribes meaning to the various parts of the report, which requires skills of testing and of critical thinking. The human ascribes significance to the meaning, which again takes testing and critical thinking skill. Sapient activity by someone—a tester, a programmer, or a product owner—is needed to determine the response. Upon deciding on significance, more sapient action is required—fixing the application being checked (by the production programmer); fixing or updating the check (by the person who designed or programmed the check); adding a new check (by whoever might want to do so); or getting rid of the check (by one or more people who matter, and who have decided that the check is no longer relevant).
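
Here is a rough sketch of that aggregation-and-reporting step, with invented checks and an invented run_checks helper: the machinery can tally the bits and print them, but intake, interpretation, significance, and response all remain human work downstream of the printout.

```python
# Hypothetical sketch: aggregate many pass/fail bits into a
# human-readable report. The checks themselves are invented stubs.

def run_checks(checks):
    results = {name: check() for name, check in checks.items()}
    passed = sum(results.values())
    # The report is where the machinery stops; reading it and
    # deciding what it means are sapient steps.
    print(f"{passed}/{len(results)} checks passed")
    for name, ok in sorted(results.items()):
        print(f"  {'PASS' if ok else 'FAIL'}  {name}")
    return results

run_checks({
    "total_is_summed": lambda: 2 + 2 == 4,
    "discount_floors_at_zero": lambda: max(10 - 25, 0) == 0,
})
```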

So: the check in and of itself is relatively trivial. It’s all that stuff around the check—the testing and programming and analysis activity—that’s important, supremely important. And as is usual with important stuff, there are potential traps.

The first trap is that it might be easy to do any of the sapient aspects of checking badly. Since checks are, at their core, software, there might be problems in requirements, design, coding, or interpretation, just as there might be with any software.

The second trap is that it can be easy to fall asleep somewhere between the report and interpretation stages of the checking process. The green bar tells us that All Is Well, but we must be careful about that. “All is well with respect to the checks that we’ve programmed” is a very different statement. Red tends to get our attention, but green is an addictive and narcotic colour. A passing test is another White Swan, confirmation of our existing beliefs, proof by induction. Now, we can’t live without proof by induction, but induction can’t alert us to new problems. Millions of tests, repeated thousands of times, don’t tell us about the bugs that elude them. We only need to bump into one Black Swan for a devastating effect.

The third trap is that we might believe that checking a program is all there is to testing it. Checking done well incorporates an enormous amount of testing and programming skill, but some quality attributes of a program are not machine-decidable. Checks are the kinds of tests that aren’t vulnerable to the halting problem. Someone on a mailing list once said, “Once all the (automated) acceptance tests pass (that is, all the checks), we know we’re done.” I liked Joe Rainsberger‘s reply: “No, you’re not done; you’re ready to give it to a real tester to kick the snot out of it.” That kicking is usually expressed with greater emphasis on exploration, discovery, and investigation, and rather less on confirmation, verification, and validation.

The fourth trap is a close cousin of the third: at certain points, we might pay undue attention to the value of checking with respect to its cost. Cost vs. value is a dominating problem with any kind of testing, of course. One of the reasons that the Agile emphasis on testing remains exciting is that excellent checking lowers the cost of testing, and both help to defend the value of the program. Yet checks may not be Just The Thing for some purposes. Joe has expressed concerns in his series Integrated Tests are a Scam, and Brian Marick did too, a while ago, in An Alternative to Business-Facing TDD. I think they’re both making important points here, thinking of checks as a means to an end, rather than as a fetish.

Fifth: upon noting the previous four traps (and others), we might be tempted to diminish the value of checking. That would be a mistake. Pretty much any program is made more testable by someone removing problems before someone else sees them. Every bug or issue that we find could trigger investigation, reporting, fixing, and retesting, and that gives other (and potentially more serious) problems time to hide. Checking helps to prevent those unhappy discoveries. Excellent checking (which incorporates excellent testing) will tend to reduce the number of problems in the product at any given time, and thereby results in a more testable program. James Bach points out that a good manual test could never be automated (he’d say “sapient” now, I believe). But note that in that same post he says that “if you can truly automate a manual test, it couldn’t have been a good manual test”, and “if you have a great automated test, it’s not the same as the manual test that you believe you were automating”. The point is that there are such things as great automated tests, and some of them might be checks.

So the power play is over which we’re going to value: the checks (“we have 50,000 automated tests”) or the checking. Mere checks aren’t important, but checking—the activity required to build, maintain, and analyze the checks—is. To paraphrase Eisenhower, with respect to checking, the checks are nothing; the checking is everything. Yet the checking isn’t everything; neither is the testing. They’re both important, and to me, neither can be appropriately preceded with “mere” or “merely”.

There’s one exception, though: if you’re only doing one or the other, it might be important to say, “You’ve merely been testing the program; wouldn’t you be better off checking it, too?” or “That program hasn’t been tested; it’s merely been checked.”

See more on testing vs. checking.

9 replies to ““Merely” Checking or “Merely” Testing”

  1. Great post!

This has all the elements I've been looking for in this discussion: the point that the activity of checking is surrounded (or should be, if done well) by so many more skills and activities, from selection, result gathering, and analysis to determination of significance.

This is a feedback loop implying that not all checks are equal and some go past their "best before" date – i.e., the active selection and analysis is important.

I'm missing a couple of sentences prompted by the following:

    "Note, too, that this means that a check can be performed by one of two agencies: 1) a machine. 2) A sufficiently disengaged human; that is, a human who has been scripted to behave like a machine, and who has for whatever reason accepted that assignment.
    "

Suppose a check is performed by an observant and alert checker (tester) – I guess this is the point at which it ceases to be a check. Or an alert & observant checker could execute a check (even though he might spot anomalies, he ignores them if they're not on the check-list), and anyone else spotting the anomaly and reacting to it is more of a tester than a checker…

    I enjoyed the read, thanks!

  2. Excellent post indeed.

One thing that I would add to the post is that everything you write above can also apply to scripted manual test cases; you know, the kind that specify exact steps to take, inputs to enter, results to expect, and suppress any and all (sapient) exploration while at it. The ones that are typically handed over to the unfortunate tester who is expected to be that sufficiently disengaged human you write about.

  3. Hi Michael,

    Great post indeed! I printed it out and taped it on my manager’s monitor. I appreciate the application of sapience to ‘checking’. I was curious to read Joe Rainsberger’s blog on Integration Testing is a Scam but your link is broken. This one works: http://blog.thecodewhisperer.com/2010/10/16/integrated-tests-are-a-scam.

    Michael replies: He changed it! I fixed it! Couldn’t have done it without your help, though; thank you for that.

    Sorry I missed your latest visit to Calgary.

    Me too.

    Cheers,
    Brian

A very well-composed essay. I would like to add to the point that automation cannot replace sapience or pretend to be the artificial intelligence that manual testing requires.

I would add that any manual test converted to an automated test still has the same ‘value’ of testing. The automated test may now be in ‘checking’ mode but, when coded correctly, the test is still ‘testing’ the application. The downfall for automation is the surrounding factors: the time or skill required to program the test. Even exploratory testing can be automated, but the question is: do we take the time?

    Michael replies: I disagree that any manual test converted to an automated test still has the same ‘value’ of testing. This post addresses that point. Also addressed in that post is the fact that exploratory testing cannot be automated. In addition to all that, have a look at Tacit and Explicit Knowledge, by Harry Collins.

The simple goal that automation should serve is that it gives the tester more time to do other testing. If the automation framework is good enough so that the user can automate new functional tests used for every code drop and then move on to more manual tests, then hasn’t automation done its job? Checking or testing is ‘merely’ the difference between the first test execution and every subsequent test execution.

    Again, that’s not the way I think of it.

I don’t see where artificial-intelligence-driven automation could ever replace manual testing. I do see where automation can be and is used to execute any and every test a manual tester could create. It’s another question altogether whether it’s worth automating every test.

    In the end, any well designed automated test with proper checks has all the sapience it needs from the second execution going forward. Those automated tests are very important. Even today, we still do what Jerry Weinberg did in the fifties and that is “check if the tubes are blown”.

Automation cannot and is not used to execute any and every test a manual tester could create. A well-designed automated test has no sapience in it, just as a machine that puts paint on a chair has no knowledge of chair painting. The machine reproduces only one mechanical action in a much larger social context of people who painted chairs, who thought about cost vs. value, who made the determination of what constitutes a well-painted chair, and so forth. (Thanks to Harry Collins for that metaphor.) And if you actually talk to Jerry, he’ll tell you that they did far, far more than “check if the tubes were blown”. That was a big issue then, because the machinery was expensive and unreliable; but even then, they tested for far more than that. My bet is that he’ll still say what he said in 1961: see the last paragraph in this post.

