The Motive for Metaphor

There’s a mildly rollicking little discussion going on in the Software Testing Club at the moment, in which Rob Lambert observes, “I’ve seen a couple of conversations recently where people are talking about red, green and yellow box testing.” Rob then asks “There’s the obvious black and white. How many more are there?”

(For what it’s worth, I’ve already made some comments about a related question here.)

At one point a little later in the conversation, Jaffamonkey (I hope that’s a pseudonym) replies,

If applied in modern context Black Box is essentially pure functional testing (or unit testing) whereas White Box testing is more of what testers are required to do, which is more about testing user journeys, and testing workflows, usability etc.

Of course, that’s not what I understand the classical distinction to be.

The classical distinction started with the notion of “black box” testing. You can’t see what’s inside the box, and so you can’t see how it’s working internally. But that may not be so important to you for a particular testing mission; instead, you care about inputs and outputs, and the internal implementation isn’t such a big deal.

You’d probably take a black box approach when a) you don’t have source code; or b) you’re intentionally seeking problems that you might not notice so quickly by inspection, but that you might notice by empirical experiments and observation; or maybe c) you may believe that the internal implementation is going to be varied or variable, so no point in taking it into account with respect to the current focus of your attention. I’m sure you can come up with more reasons.

This “black box” idea suggests a contrast: “glass box” testing. Since glass is transparent, you can see the inner workings, and the insight into what is happening internally gives you a different perspective for risks and test ideas.

Glass box testing might be especially important when a) your mission involves testing what’s happening inside the box (programmers take this perspective more often than not); or b) your overall mission will be simpler, in some dimension, because of your understanding of the internals; or maybe c) you want to learn something about how someone has solved a particular problem. Again, I’m sure you can come up with lots more reasons; these are examples, not definitive lists.
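To make the contrast concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: `shipping_cost` is a hypothetical function under test, and the two helper functions are my own names, not anything from the discussion. The first set of checks is chosen purely from the described behaviour (inputs and outputs); the second is chosen after looking inside the box and noticing a branch at 5 kg.

```python
def shipping_cost(weight_kg: float) -> float:
    """Hypothetical function under test: flat rate up to 5 kg, surcharge per kg above."""
    if weight_kg <= 0:
        raise ValueError("weight must be positive")
    if weight_kg <= 5:
        return 4.0
    return 4.0 + 1.5 * (weight_kg - 5)


def black_box_checks() -> list[bool]:
    # Black box: we only know "light parcels cost 4.0, heavier ones cost more",
    # so we pick representative inputs and observe the outputs.
    return [
        shipping_cost(1) == 4.0,
        shipping_cost(10) == 11.5,
    ]


def glass_box_checks() -> list[bool]:
    # Glass box: having read the source, we can see the `<= 5` branch,
    # which suggests probing right at and just past that boundary.
    return [
        shipping_cost(5) == 4.0,
        shipping_cost(5.5) == 4.75,
    ]


if __name__ == "__main__":
    print("black box:", all(black_box_checks()))
    print("glass box:", all(glass_box_checks()))
```

The function is the same in both cases; what changes is the information that informs the choice of test ideas, which is what the box metaphor is meant to convey.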

Unhelpfully (to me), someone somewhere along the way decided that the opposite of “black” must be “white”; that black box testing was the kind where you can’t see inside the box; and that therefore white (rather than glass) box testing must be the name for the other stuff. At this point, the words and the model began to part company.

Even less helpfully, people stopped thinking in terms of a metaphor and started thinking in terms of labels dissociated from the metaphor. The result is an interpretation like Jaffa’s above, where he (she?) seems to have inverted the earlier interpretations, for reasons I know not. Who knows? Maybe it’s just a typo.

More unhelpfully still (to me), someone has (or several someones have) apparently come along with color-coding systems for other kinds of testing. Bill Matthews reports that he’s found

Red Box = “Acceptance testing” or “Error message testing” or “networking, peripherals testing and protocol testing”
Yellow Box = “testing warning messages” or “integration testing”
Green Box = “co-existence testing” or “success message testing”


For me, there are at least four big problems here.

First, there is already disagreement on which colours map to which concepts. Second, there is no compelling reason that I can see to associate a given colour with any of the given ideas. Third, the box metaphor doesn’t have a clear relationship to what’s going on in the mind or the practice of a tester. The colour is an arbitrary label on an unconstrained container. Fourth, since the definitions appear on interview sites and the sites disagree, there’s a risk that some benighted hiring manager will assume that there is only one interpretation, and will deprive himself of an otherwise skilled tester who read a different site.

(To defend yourself against this fourth problem in interviews, use safety language: “Here’s what I understand by ‘chartreuse-box testing’. This is the interpretation given by this person or group, but I’m aware there may be other interpretations in your context.” For extra points, try saying something like, “Is that consistent with your interpretation? If not, I’d be happy to adopt the term the way you use it around here.” And meaning it. If they refuse to hire you because of that answer, it’s unlikely that working there would have been much fun.)

All of this paintbox of terms is unhelpful (to me) because it means another 30,000 messages on LinkedIn and QAForums, wherein enormous numbers of testers weigh in with their (mis)understandings of some other author’s terms and intentions—and largely with the intention of asking or answering homework questions, so it seems.

The next step is that, at some point, some standards-and-certification body will have to come along and lay down the law about what colour testing you would have to do to find out how many angels can dance on the head of a pin, what colour the pin is, and whether the angels are riding unicorns. And then another, competing standards-and-certification body will object, saying that it’s not angels, it’s fairies, and it’s not unicorns, it’s centaurs, and they’re not dancing, they’re doing gymnastics. And don’t even get us started on the pin!

Courses and certifications on colour-mapping to mythological figures will be available (at a fee) to check (not test!) your ability to memorize a proprietary table of relationships.

Meanwhile, most of the people involved in the discussion will have forgotten (in the unlikely event that they ever knew) that the point of the original black-and-glass exercise was to make things more usefully understandable. Verification vs. validation, anyone? One is building the right thing; the other is building the thing right. Now, quick: which is which? Did you have to pause to think about it? And if you find a problem wherein the thing was built wrong, or that the wrong thing was built, does anyone really care whether you were doing validation testing or verification testing at the time?

Well… maybe they do. So, all that said, remember this: no one outside your context can tell you what words you can or can’t use. And remember this too: no one outside your context can tell you what you can or can’t find useful. Some person, somewhere, might find it handy to refer to a certain kind of testing as “sky testing” and another kind of testing as “ground testing”, and still another as “water testing”. (No, I can’t figure it out either.) If people find those labels helpful, there’s nothing to stop them, and more power to them. But if the labels are unhelpful to you and only make your brain hurt, it’s probably not worth a lot of cycles to try to make them fit for you.

So here are some tests that you can apply to a term or metaphor, whether you produce it yourself or someone else produced it:

  • Is it vivid? That is (for a testing metaphor), does it allow you to see easily in your mind’s eye (hear in your mind’s ear, etc.) something in the realm of common experience but outside the world of testing?
  • Is it clear? That is, does it allow you to make a connection between that external reference and something internal to testing? Do people tend to get it the first time they hear it, or with only a modicum of explanation? Do people retain the connection easily, such that you don’t have to explain it over and over to the same people? Do people in a common context agree easily, without arguments or nit-picking?
  • Is it sticky? Is it easy to remember without having to consult a table, a cheat sheet, or a syllabus? Do people adopt the expression naturally and easily, and do they use it?

If the answer to these questions is Yes across the board, it might be worthwhile to spread the idea. If you’re in doubt, field-test the idea. Ask for (or offer) explanations, and see if understanding is easy to obtain. Meanwhile, if people don’t adopt the idea outside of a particular context, do everyone a favour: ditch it, or ignore it, or keep it within a much closer community.

In his book The Educated Imagination (based on the Massey Lectures, a set of broadcasts he did for the Canadian Broadcasting Corporation in 1963), Northrop Frye said,

“Outside literature, the main motive for writing is to describe this world. But literature itself uses language in a way which associates our minds with it. As soon as you use associative language, you begin using figures of speech. If you say, ‘this talk is dry and dull’, you’re using figures associating it with bread and breadknives. There are two main kinds of association, analogy and identity, two things that are like each other and two things that are each other (my emphasis –MB). One produces a figure of speech called the simile. The other produces a figure called metaphor.”

When we’re trying to describe our work in testing, I think most people would agree that we’re outside the world of literature. Yet we often learn most easily and most powerfully by association—by relating things that we don’t understand well to things that we understand a little better in some specific dimension. In reporting on our testing, we’re often dealing with things that are new to us, and telling stories to describe them. The same is true in learning about testing. Dealing with the new and telling stories leads us naturally to use associative language.

Frye explains why we have to be cautious:

“In descriptive writing, you have to be careful of associative language. You’ll find that analogy, or likeness to something else, is very tricky to handle in description, because the differences are as important as the resemblances. As for metaphor, where you’re really saying “this is that,” you’re turning your back on logic and reason completely because logically two things can never be the same thing and still remain two things.”

Having given that caution, Frye goes on to explain why we use metaphor, and does so in a way that I think might be helpful for our work:

“The poet, however, uses these two crude, primitive, archaic forms of thought in the most uninhibited way, because his job is not to describe nature but to show you a world completely absorbed and possessed by the human mind…The motive for metaphor, according to Wallace Stevens, is a desire to associate, and finally to identify, the human mind with what goes on outside it, because the only genuine joy you can have is in those rare moments when you feel that although we may know in part, as Paul says, we are also a part of what we know.”

So the final test of a term or a metaphor or a heuristic, for me, is this:

  • Is it useful? That is, does it help you make sense of the world to the degree that you can identify an idea with something deeper and more resonant than a mere label? Does it help you to own your ideas?
Postscript, 2013/12/10:

    “A study published in January in PLOS ONE examined how reading different metaphors—’crime is a virus’ and ‘crime is a beast’—affected participants’ reasoning when choosing solutions to a city’s crime problem…. (Researcher Paul) Thibodeau recommends giving more thought to the metaphors you use and hear, especially when the stakes are high. ‘Ask in what ways does this metaphor seem apt and in what ways does this metaphor mislead,’ he says. Our decisions may become sounder as a result.”

    Excerpted from Salon.

    Further reading: Round Earth Test Strategy (James Bach)

15 replies to “The Motive for Metaphor”

  1. I think it should be called “open box” testing.

    Among the popular phrases, I prefer white box, because it feels abstract, whereas whenever I hear “glass” box I think of shards of brittle laceration.

    — James

  2. I liked this article very much.

    We are in the middle of a debate on creation of new terms. After reading this blog post, I’m convinced that our thoughts on creation of new terms are not very different from each other.

    I understand that our current debate is in the context of a couple of terms & the related concepts that *you* find useful and I don’t.

    Michael replies: For our readers, Rahul is referring to our conversations here, here, and here.

    As I’ve stated before: I stand behind “checking” (and the distinction between testing, which requires sapience, and checking, which doesn’t). It seems to pass my tests above. If it doesn’t work for you, that’s fine. But please note that I’m NOT using “confirmatory testing” in the sense you’ve taken it in your blog posts on the subject. That is, I’m not using “confirmatory testing” as a term. Instead, I’m using “confirmatory” as an ordinary adjective to modify “testing”, just as I’m using “ordinary” to modify “adjective”. So please note that your characterization of “confirmatory testing”, whatever you mean by it, is what you mean, and not what I mean.

  3. Almost excellent blog post! You perfectly wrote down what i was thinking, but couldn’t pinpoint it yet. Maybe i was waiting too long before applying my “simplify-before-clarify” heuristic. (Was thinking in english mostly, which obstructs me from clear thinking. As you know, English isn’t my native language). But you definitely hit the nail on the head.

    A small footnote. You lost me at the “In his book…” paragraph, tho. So i’m going to re-read that part.

  4. Very nice article, Michael.

    Since reading it, I find that I no longer feel blue, and I’m starting to think outside the box.

    “When I use a word,” Humpty Dumpty said in rather a scornful tone, “it means just what I choose it to mean — neither more nor less.”

  5. Yet again, a very nice and simple way to define it.
    I too was looking at a lot of discussions that were going on around on different forums and the people running behind the definitions. The way it has been defined in this blog should definitely help to curb down those discussions and keep the focus on what is important.

  6. Thanks Michael!

    Trackback made to CSC internal test blog, where I previously wrote on agile/waterfall, Bret Pettichord’s 4 schools of testing and Vancouver Quadrants to illustrate that testing is many colors, situations, contexts, schools, campfires and soapboxes! 🙂

  7. Hi Michael,

    Great article and one I was hoping someone would conclude that discussion with. It seems insane to keep coming up with new ways to describe a testing term/concept/idea and even more insane to try and make it a globally recognised best practice term. uurgh. I shudder to think about that section of the certification exam.


    Michael’s reply: Thanks, Rob. Yet I’m puzzled by something—are you referring to the pink section of the certification exam, or the yellow section of the certification exam?


  8. it must be the blue section!!

    Great post! Great rules!
    A problem I sometimes have with (people using) metaphors is that they do not always clarify something but mystify it.
    Probably another sensible heuristic: if someone uses a metaphor do we ‘see’ the same thing?


  9. I’ve never liked Glass Box testing, personal preference but it’s Black and White for me.

    We really don’t need a multi coloured rainbow of testing labels, another pseudo-clever idea to confuse things. It’ll be on the ISTQB exam before you know it!


  10. Thanks Michael, I agree with the rules you have named. (Though I am already used to “white” box, I am slowly getting used to “Glass” box for the same meaning – it seems like not having English as a mother tongue makes it a bit harder for me 🙂 )

    Michael replies: Thanks for writing. I’d believe that having a different native tongue might be an issue, but I’ve seen many native English speakers that have the same kind of problem as you describe.

    I would like to say, that while I like the use of metaphors, I also find the “Tours” namings far from making much sense to me – But I hope we would be able to get some short metaphor for these, which will make sense to larger portion of the testers community world-wide.

    If you’re referring to James Whittaker’s named tours, I agree. I find his names for them quite unhelpful, because the association between the mental image and the tester’s activity is not at all clear for me, and that makes them less sticky, less memorable. It is of the nature of metaphors that they inspire different things in different people, which is a weakness but also a strength. Most of the weakness can be managed by ongoing conversation.

    To describe tours, I like the more literal descriptions of tours in the Rapid Testing course (sample data, variables, files, complexity, menus & windows, keyboard & mouse), or Michael Kelly’s FCC CUTS VIDS mnemonic list of tours (Feature, Complexity, Claims, Configuration, User, Testability, Scenario, Variability, Interoperability, Data, Structure). But everyone’s mileage may vary. (Both of these lists predate Whittaker’s work on the subject, by the way.)

    To me, the Iceberg Heuristic suggests something straightforwardly: when you see evidence of trouble, assume that there is bigger trouble below the surface. The Rumble Strip Heuristic suggests something: when you hear something under your tires (metaphorically speaking), take stock of your position on the road and what’s coming up. But again, your mileage may vary.

  11. yet another wonderful article since i have started reading on software testing.

    Pretty simple yet convincing. Perfect inspiration to encourage out of the box thinking.

