A heuristic is a fallible method for solving a problem or making a decision. “Heuristic” as an adjective means “something that helps us to learn”. In testing, an oracle is a heuristic principle or mechanism by which we recognize a problem.
Some years ago, during a lunch break from the Rapid Software Testing class, a tester remarked that he was having a good time, but that he wanted to know how to get over the boredom that he experienced whenever he was testing. I suggested to him that if he found testing boring, something was wrong, and that he could consider the boredom as something we call a trigger heuristic. A trigger heuristic is like an alarm clock for a slumbering mind. Emotions are natural trigger heuristics, nature’s way of stirring us to wake up and pay attention. What was the boredom signalling? Maybe he was covering ground that he had covered before. Maybe the risks that he had in mind weren’t terribly significant, and other, more important risks were looming. Maybe the work he was doing was repetitive and mechanical, better left to a machine.
Somewhat later, I realized that every time I had seen a bug in a piece of software, an emotion had been involved in the discovery. Surprise naturally suggested some kind of unexpected outcome. Amusement followed an observation of something that looked silly and that posed a threat to someone’s image. Frustration typically meant that I had been stymied in something that I wanted to accomplish.
There is a catch with emotions, though: they don’t tell you explicitly what they’re about. In that, they’re like this device we have in our home. It’s mounted in a hallway, and it’s designed to alert us to danger. It does that heuristically: it emits a terrible, piercing noise whenever I’m baking bread or broiling a steak. And that’s why, in our house, we call it the cooking detector. The cooking detector, as you may have guessed, came in a clear plastic package labelled “smoke detector”.
When the cooking detector goes off, it startles us and definitely gets our attention. When that happens, we make more careful observations (look around; look at the oven; check for a fire; observe the air around us). We determine the meaning of our observations (typically “there’s enough smoke to set off the cooking detector, and it’s because we’re cooking”); and we evaluate the significance of them (typically, “no big deal, but the noise is enough to make us want to do something”). Whereupon we perform some appropriate control action: turn on the fan over the stove, open a door or a window, turn down the oven temperature, mop up any oil that has spilled inside the oven, check to make sure that the steak hasn’t caught fire. Oh, and reset the damned cooking detector.
Notice that the package says “smoke detector”, not “fire detector”. The cooking detector apparently can’t detect fires. Indeed, on the two occasions that we’ve had an actual fire in the kitchen (once in the oven and once in the toaster oven), the cooking detector remained resolutely and ironically silent. We were already in the kitchen, and noticed the fires and put them out before the cooking detector detected the smoke. Had one of the fires got bad enough, I’m reasonably certain the cooking detector would have squawked eventually. That’s a good thing. Even though our wiring is in good shape, we don’t smoke, and the kids are fire-aware, one never knows what could happen. The alarm could give us a chance to extinguish a fire early, to help to reduce damage, or to escape life-threatening danger.
The cooking detector is like a programmer’s unit test—an automated check. It makes a low-level, one-dimensional, one-bit observation: smoke, or no smoke. It’s oblivious to any threat that doesn’t manifest itself as smoke, such as the arrival of a burglar or a structural weakness in the building. The maximum value of the cooking detector is unlikely to be realized. It occasionally raises a ruckus, and when it does, it doesn’t tell us what the ruckus is about. Usually it’s for something that we can understand, explain, and deal with quickly and easily. Smoke doesn’t automatically mean fire. The cooking detector is an oracle, a device that provides a heuristic trigger, and heuristic devices are fallible. The cooking detector doesn’t tell us that there is a problem; only that there might be a problem. We have to figure out whether there’s a problem, and if so, what the problem is.
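To put the analogy in a programmer’s terms, here is a minimal sketch of such a one-bit check, in Python. Everything in it is hypothetical and invented for illustration; the sensor, the threshold, and the function names don’t come from any real detector or test framework.

```python
# A minimal sketch of a one-bit check, in the spirit of the cooking detector.
# Everything here is hypothetical: the sensor, the threshold, and the
# function names are invented for illustration, not taken from a real system.

def read_sensor():
    """Stand-in for whatever low-level observation the check can make."""
    return {"smoke_ppm": 12}  # a made-up reading


def check_no_smoke(threshold_ppm=50):
    """One-dimensional, one-bit observation: smoke, or no smoke.

    A failure here doesn't say what is wrong. Burning toast, a house
    fire, and a dusty sensor all look identical to this check; it can
    only say "pay attention and investigate".
    """
    reading = read_sensor()
    assert reading["smoke_ppm"] < threshold_ppm, (
        f"smoke level {reading['smoke_ppm']} ppm is at or above "
        f"{threshold_ppm} ppm: maybe fire, maybe steak"
    )


if __name__ == "__main__":
    check_no_smoke()
    print("No alarm. (Which is not the same thing as 'no problem'.)")
```

Even when this check passes, it tells us only that this one bit was quiet on this run; it says nothing about burglars, structural weaknesses, or anything else it wasn’t designed to observe.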
Postscript: There’s another thing. As the late Jerry Weinberg put it, “…when managers and developers see the product work, telling them it works seems to provide no information; that is, ‘no information’ seems to equal ‘no value’. It’s the same reason why people don’t replace the batteries in their smoke alarms—most of the time a non-functioning smoke alarm is behaviourally indistinguishable from one that works. Sadly, the most common reminder to replace the batteries is a fire.” (Perfect Software and Other Illusions about Testing, p. 71.)
Yet the cooking detector comes at low cost. It didn’t cost much to buy, it takes one battery a year, and it’s easy to reset. More importantly, the problem to which it alerts us is a potentially terrible one. Although the cooking detector doesn’t tell us what the problem is, it tells us to pay attention so that we can investigate and decide what to do, before a problem gets serious without our noticing. Smoke doesn’t automatically mean fire, but it does mean smoke. Where there’s smoke, maybe there’s fire, or maybe there’s something else that’s unpleasant or dangerous. The cooking detector reminds us to check the steak, open the windows, clean the oven every once in a while, evaluate what’s going on. I don’t believe that the cooking detector will ever detect a real, serious problem that we don’t know about already—but I’m not prepared to bet my family’s life on that.
Michael,
I like your analogy here; it is an excellent point that a change in behaviour of an automated test (check) does not necessarily indicate a problem, but should be viewed as a trigger for re-examination of the areas that affect the behaviour in question. The automation is an indicator of change rather than necessarily a test of correctness, and keeping this principle in mind helps to ensure that we don’t fall into the trap of gaming ourselves by focussing on passing tests rather than working features. I believe that understanding this principle also helps to drive our automation in the right direction, ensuring that, in addition to binary checks, we also use the automation to gather logs and evidence to help the tester to understand the behaviour when performing that re-assessment (at the risk of gratuitous self-promotion, I wrote about this here last year).
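A rough sketch of that idea might look like the following, under stated assumptions: the scenario is hypothetical, and every name in it (fetch_order_total, the order data, the log file) is invented to illustrate recording evidence alongside the pass/fail bit.

```python
# A rough sketch of a check that gathers evidence, not just a verdict.
# The scenario is hypothetical: fetch_order_total, the order data, and
# the log file name are all invented for illustration.

import json
import logging

logging.basicConfig(filename="check_evidence.log", level=logging.INFO)


def fetch_order_total(order_id):
    """Stand-in for the behaviour under check."""
    return {"order_id": order_id, "total": 107.50, "currency": "CAD"}


def check_order_total(order_id, expected_total):
    observed = fetch_order_total(order_id)
    # Record the whole observation, not just pass/fail. When the one-bit
    # answer changes, this evidence is what a tester re-examines.
    logging.info("observation: %s", json.dumps(observed))
    if observed["total"] != expected_total:
        logging.warning(
            "order %s: expected %s, observed %s; behaviour changed, investigate",
            order_id, expected_total, observed["total"],
        )
        return False
    return True


if __name__ == "__main__":
    print("check passed:", check_order_total("A-123", 107.50))
```

The logging lines are the point: the check’s value is not only its verdict, but the recorded observations that give a human investigator somewhere to start when the one-bit answer changes.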
I’m glad that you raise the issue of cost in the last paragraph. Just as with the smoke detector, automated testing is a cost-based compromise, since manually monitoring to the same extent on a continuous basis may be prohibitively expensive. Ideally we’d all have a firefighter in the kitchen monitoring for hazards, but this is not cost-effective. Being mindful of this compromise, and aware of the inherent risks, will help to avoid an over-reliance on the automation at the expense of more critical thinking.
Thanks for the excellent insights.
Adam.
So let’s see…
1) Regular false positives, that require additional investment to resolve
2) Two false negatives, averted by other means
3) Countless true negatives, asserting the status quo (“yup, gravity still works”)
4) No true positives to justify the initial investment and subsequent maintenance costs
Damn, that is a good analogy.
Michael replies: Agreed. But don’t forget:
5) A true positive would be an indication of a serious problem, therefore…
6) It’s important to take lots of other steps to prevent true positives from ever coming up.
In reply to (6): yes, it’s important to take lots of other steps to prevent true positives. One of these steps might be to evaluate whether it’s necessary to have a smoke detector at all.
And just wondering… will you rename your detector to something else when it goes off while you’re not cooking? 🙂