
What Do You Mean By “Arguing Over Semantics”? (Part 2)

Continuing from yesterday

As you may recall, my correspondent remarked:

“To be honest, I don’t care what these types of verification are called be it automated checking or manual testing or ministry of John Cleese walks. What I would like to see is investment and respect being paid to testing as a profession rather than arguing with ourselves over semantics.”

Here’s an example of the importance of semantics in testing. When someone says, “it works”, what do they mean? In Rapid Software Testing, we say that “it works” really means “it appears to meet some requirement to some degree”.

That expanded statement should raise a bunch of questions for a tester, a test manager, or a client: What observations gave rise to that appearance? Observations by whom? When did that person observe? Under what conditions? Did he induce any variation into those conditions? Did he observe repeatedly, or just the once? For a long time, or just a glance? If the observation was made somewhere, what might be different somewhere else? Has anything changed since the last observation? Which requirements were considered? Which requirement specifically seems to have been met? Only that requirement? Are there other explicit or implicit requirements that might not have been met? To some degree: to what degree? Have the right people been consulted on the degree to which the requirement seems to have been met? Do they agree? Are the requirement and the evidence that “it works” well understood by and acceptable to other people who might matter? Yes, that’s a lot of questions, and testers who ask them find nests of bugs living underneath the questions, and in the cracks between the answers.

Semantics has its parallel in science and measurement, in the form of a concept called “construct validity”. In measurement, construct validity centres on the question of what counts as an instance of the thing you’re measuring, and what doesn’t count as such an instance. In their book Experimental and Quasi-Experimental Designs for Generalized Causal Inference (I have no idea why a book with such a catchy title isn’t flying off the shelves), Shadish, Cook, and Campbell say,

“The naming of things is a key problem in all sciences, for names reflect category memberships that themselves have implications about relationships to other concepts, theories, and uses. This is true even for seemingly simple labeling problems. For example, a recent newspaper article reported a debate among astronomers over what to call 18 newly discovered celestial objects… The Spanish astronomers who discovered the bodies called them planets, a choice immediately criticized by some other astronomers… At issue was the lack of a match between some characteristics of the 18 objects (they are drifting freely through space and are only about 5 million years old) and some characteristics that are prototypical of planets (they orbit a star and require tens of millions of years to form). Critics said these objects were more reasonably called brown dwarfs, objects that are too massive to be planets but not massive enough to sustain the thermonuclear processes in a star. Brown dwarfs would drift freely and be young, like these 18 objects. The Spanish astronomers responded that these objects are too small to be brown dwarfs and are so cool that they could not be that young.” (p. 66)

Well, tomayto-tomahto, right? There are these objects out there, and they’re out there no matter what we call them. Brown dwarfs, planets… why bother quibbling? Shadish, Cook, and Campbell answer: “All this is more than just a quibble: if these objects really are planets, then current theories of how planets form by condensing around a star are wrong!” (my emphasis)

They end the passage by noting that construct validity is a much more difficult problem in social science field experiments—and I would argue that most of software testing is far closer to the field sciences than to astrophysics. More on that in future blog posts.

(Shadish, Cook, and Campbell cite the newspaper article as “Scientists quibble on calling discovery ‘planets'” (2000, October 6), The Memphis Commercial Appeal, p. A5. I did an online search through all of the Commercial Appeal’s articles for that day, and a more global search for newspaper articles with that title, but I was unable to find it. However, the controversy over what constitutes a planet continues: http://en.wikipedia.org/wiki/IAU_definition_of_planet.)

Labels are intertwined with our ontologies (our ways of categorizing the world) and our theories. Combustion used to be explained by an element called “phlogiston” that escaped when something was burned, and that had negative mass. When problems with that theory arose, phlogiston evolved from an element into a principle. The phlogiston theory had considerable explanatory power, and drove a good deal of invention and research. Eventually, though, people recognized enough problems in the phlogiston theory that Joseph Priestley’s discovery, “dephlogisticated air”, came to be called by Lavoisier’s name, the name we still use today: oxygen. Interestingly, oxidation began as a principle, before the element was identified. So theories of combustion went from element to principle to inverted principle and back to element. (See Steven Johnson’s The Invention of Air.)

In the old days, people simply got sick. Some treatments worked; others didn’t work so well. Modern doctors don’t casually confuse bacteria and viruses. They prescribe antibiotics for bacterial infections, and antiviral drugs for viral ones. Labels not only represent underlying meanings; they often incorporate those meanings.

If we’re pleading for professional respect for testing, it’s worth asking ourselves what we think is worthy of respect. I believe that what’s most respectable is our special interest in dispelling illusions, demolishing unwarranted confidence, recognizing unnoticed complexity, and revealing underlying truths in a rapidly changing and developing world. If you agree, I think you’ll see a professional problem in promoting ways of speaking that create or sustain illusions, instead of telling plain truths about our work. I think you’ll see a professional problem with oversimplified models of complex cognitive processes, and I think you’ll see a problem with keeping our vocabulary static.

To my correspondent way above: I understand that dealing with all this discussion is effortful. Taking on new ideas is a pain, and so is defending old ones that are being challenged. Revising some simple, seemingly stable beliefs is hard work. Recognizing that two labels might point to the same thing might feel like a distraction, and choosing whose vocabulary to use might be laden with politics and emotion. You’re almost certainly busy with getting stuff done, and it takes time to keep up with the conversation—never mind the racket. But be sure of this: investment and respect are paid to astronomy, chemistry, and medicine as professions precisely because it is the nature of a profession to question and redefine its ideas about itself. Studying a profession involves developing distinctions and definitions to aid in the study. If we’re going to talk seriously and credibly about developing skill in testing, it’s important for us to develop clarity on the activities we’re talking about.

I thank James Bach for his contributions to this essay.

3 replies to “What Do You Mean By “Arguing Over Semantics”? (Part 2)”

  1. One thing people *might* mean by “arguing over semantics” is “arguing over definitions”, and they have a point – arguing over definitions is often unsound; see http://lesswrong.com/lw/np/disputing_definitions/ for instance.

    One way around this is to introduce what I (and maybe others) call “Humpty Dumpty privileges”: when discussion seems to hit a bump in the road, and that bump is one particular term, then each party gets to state what their working definition of that word is *for that conversation*, and the discussion moves on.

    Michael replies: Yes. I gave earlier the example of Mike Hill, who uses “microtests” to describe one kind of human/machine checking that he performs while doing TDD. In a conversation with Mike, I’d be happy to do that. J.B. Rainsberger seemed to agree with the testing/checking distinction at the Agile conference, even before any of this was written down. He still calls his unit checks “unit tests”. No problem there, either.

    One caveat to keep in mind is overloaded terms. Say someone is claiming that “automated tests build confidence”. You ask, “what do you call automated tests?” They say, “tiny programs that call a larger program and compare its output against a preset expectation”.

    Now you can ask, “so how does that build confidence exactly?” If they answer “because each of these programs tests a particular aspect of my code”, notice that they’re perhaps using a different sense of the term now. “What do you mean by the verb test in that previous sentence?” Maybe the answer is “compare an output against a preset expectation”. Now you can say, “I don’t see how that can build confidence; I agree that ‘testing’ builds confidence, but I notice I mean something different by that word, namely ‘evaluating a product by learning about it through experimentation’.”

    The next reply might be “comparing outputs against preset expectations builds confidence when I have many such comparisons, because it’s unlikely that the code change I just made would have undesirable side effects without that being detected through one of these comparisons”.

    Now you can argue about many things: whether “code changes” are the only things with undesirable side effects, whether “side effects” are the only thing that undermines confidence, or how unlikely is “unlikely”.

    But notice that at that point you’re no longer “arguing over definitions”. You’re arguing over *substance*.
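
    For concreteness, here’s a minimal sketch (in Python) of a “check” in that narrow sense: a tiny program that invokes a larger program and compares its output against a preset expectation. The program under check (“wordcount”), its command-line flag, and the expected output are all hypothetical.

        import subprocess

        def check_wordcount():
            # Invoke the (hypothetical) program under check with a known input.
            result = subprocess.run(
                ["wordcount", "--text", "the quick brown fox"],
                capture_output=True,
                text=True,
            )
            # Compare the observed output against the preset expectation.
            expected = "4"
            observed = result.stdout.strip()
            assert observed == expected, f"expected {expected!r}, got {observed!r}"

        if __name__ == "__main__":
            check_wordcount()
            print("check passed")

    Note that the check encodes exactly one preset expectation; all the other questions raised earlier in the post, about conditions, variation, and other requirements, remain unasked by it.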

    Michael replies: I’m not convinced that things cut as cleanly as all that. For example, notice that your first couple of questions there were “whaddya mean?” questions. In my experience, a common response to “I notice I mean something different by that word” is “See? You’re talking about semantics again” or “when I run my tests, I AM evaluating it through experimentation”. Plus I don’t agree that testing builds confidence, at which point we may well be back to a “whaddya mean?” conversation.

    I do agree that the approach is worth a try, and that it’s sometimes successful. A more common experience for me, though, is the paradigm problem: we’re not just talking about different words, but different worlds, different world views. My experience in general is that those people interested in setting up a trading zone don’t play the “semantics” card.

    I also agree that examples and specifics would help in this particular conversation. And you’ve reminded me that James and I still have some work to do on that, so thank you.

    (As for professions, things may be more complex than you make them out to be. I rather firmly disagree with the “because” connection in your statement about astronomy, etc.)

    Michael replies: You’re right about that too. Would changing “precisely” to “at least in part” be the start of a fix to the offending paragraph?

  2. In my projects, we try to define what is acceptable to the customer rather than just saying ‘it works’. We define, up front, what each severity level means to the customer and what the customer would find acceptable. This reduces some of the subjectivity about how we should classify bugs and about whether the release was acceptable.

    Michael replies: I’m glad you said “some”.

