
Talking About Testing

Frequently, both online and in face-to-face conversations, testers express reservations to me about making a clear distinction between testing and checking when talking to others.

It’s true: “test” is an overloaded word. In some contexts, it refers to a heuristic process: evaluating a product by learning about it through experiencing, exploring and experimenting; that’s what testers refer to when they’re talking about testing, and that’s how we describe it in the Rapid Software Testing namespace. In other contexts, “test” refers to an algorithmic process: applying decision rules to the output from some function or mechanism.

It’s unfortunate that at some point, “test” in computer programming was drained of its much richer historic meaning (“that by which the existence, quality, or genuineness of anything is or may be determined”, per the Oxford English Dictionary). Alan Turing didn’t make that mistake; he wrote the paper “On Checking a Large Routine”. The distinction was still pretty clear in 1972 in Program Test Methods, the first book dedicated to software testing, but it began to dissipate after that.

Yet the distinction remains hugely important! Checking functional output is an important part of disciplined software development work, but it’s no substitute for obtaining experience with the product and performing experiments on it. We can run thousands of output checks on a program, and the product can still have terrible problems that the checks won’t reveal. “Tests” that developers talk about aren’t “tests” that testers talk about.
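To make that claim concrete, here is a minimal sketch in Python. The function, its values, and its bug are invented for illustration; the point is only that an output check can pass while the product still has problems the check cannot reveal.

```python
def apply_discount(price, percent):
    """Return the price after applying a percentage discount."""
    return price - price * percent / 100

# A check: a decision rule applied algorithmically to the output.
assert apply_discount(100, 10) == 90.0  # passes; the check "glows green"

# Yet the product still has problems the check won't signal: a
# "discount" of -50 raises the price, and a discount of 150 yields a
# negative price. Recognizing these as problems takes human judgment
# about what the product should do; that judgment is testing, not checking.
print(apply_discount(100, -50))   # 150.0: the price goes up
print(apply_discount(100, 150))   # -50.0: the store pays the customer
```

The check verifies one anticipated output; it says nothing about the inputs that nobody thought to encode into a decision rule.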

A tester recently remarked, “Those ‘tests’ are not the same things, and the fact that they are called the same is very unhelpful. But I don’t believe I will be able to change that, or that it will be the best use of my time to try to change that.”

Well… it used to be the case that people didn’t make a distinction between viral and bacterial infections. That distinction matters, of course; not least because antibiotics don’t work against viral infections, and over-prescribing antibiotics weakens their effectiveness. These days, it’s not only doctors and scientists who distinguish these two kinds of infection; the general public has at least some notion that there are meaningful differences.

How did this happen? It wasn’t because scientists, experts, who actually knew something about these things said “Well… I can’t change the language; we’ll just keep calling them all infections.” It’s because they bothered to make the distinction, and to apply that distinction in their discourse.

In my experience, it’s been pretty easy to get developers to accept the distinction between testing and checking. They’re willing to acknowledge that there’s a difference, just as there’s a difference between programming and compiling. In fact, it would be unreasonable to deny that there’s a difference, and most developers don’t want to feel or appear unreasonable.

Yet it seems to me that some testers struggle with the idea of shifting the language in ways helpful to better testing. I don’t. That’s partly because I’m old, and bearded, and experienced, and confident. I have experience as a developer, and as a program manager. I talk like someone who knows about testing, and who knows that testing matters. Some people aren’t so old, don’t have beards, don’t have the same experience as I do, and aren’t so confident.

Well, fear not, Dear Tester. In a development group, you are (presumably) the testing expert, so you get to be a leader in the way people talk about testing.

On the other hand, nit-picking people’s discourse tends to be a social pitfall. So here’s what I do, and what I offer as a way to help move the language and the crafts of development and testing forward.

I apply this approach when I’m working with a team; when I’m teaching a group of testing students; when I’m helping a manager to solve testing-related problems; or when I’m conversing with someone in social media.

When someone refers to “automated testing”, or calls a unit check a “test”, I wait (albeit not for very long) for some reasonable moment. At that point, I explain the distinction that we make between testing and checking, and I explain why it’s deeply important to me, as a responsible and professional tester.

I note that I use the distinction to explain why testing can’t be automated, just as programming, design, management, investigation, research, or journalism can’t be automated.

I’m happy to answer any questions on the subject. I’m also happy to acknowledge the importance of output checking to efficient testing. Diligent checking will help to reduce easily avoidable and detectable errors in the code. Investigating and reporting those problems interrupts and slows down deeper testing work, so I express gratitude to developers who apply checking in their work.

All this is especially important for students in Rapid Software Testing classes. The ability to talk clearly about testing is essential for developing confidence in the tester, and in attracting respect from others.

When I’m sure that we’ve achieved an understanding, life proceeds as normal. Sometimes (more often than you might think), people adopt the distinction in their speech right away. Sometimes they don’t, and they continue to talk about “unit tests” or “writing tests”. And that’s okay; after that initial conversation, I won’t try to correct them. To do so would be socially awkward.

But I will be absolutely rigorous in my own talk. When someone says “we don’t have enough unit tests in this area,” I’ll reply “I agree; more unit checks would be a good idea.” When someone talks about “flaky tests”, I might answer “yes; we should look into why these checks seem to produce inconsistent results”. When a manager says “We need to automate more of the testing”, I’ll suggest “more automated checks might help, but let’s look at how problems might get past them.” I’ll do this as naturally as I can, without drawing attention to it. (All this applies to “manual testing” and “automated testing” as well.)

And after a while, since it’s a natural thing for people to want to fit in, they start talking about testing and checking in this more nuanced, thoughtful, and powerful way.

And you, Dear Tester, can do that too. And together we can change the world.

9 replies to “Talking About Testing”

  1. I feel that you are mixing two things here:
    1) It is important to make a distinction between different types of interactions/activities that provide information.
    2) It is important that certain types are called “testing” and other types are called “checking”.

    I agree with point 1, but I disagree with point 2. And I choose to use my time and energy on getting people to realize point 1 without the help of point 2. I even feel that point 2 is counterproductive (and wrong according to my definition of testing).

    In a sense you made that argument yourself in the article:
“[…] it used to be the case that people didn’t make a distinction between viral and bacterial infections.”
    Yes, very important distinction. But both are still referred to as “infections”.

    • I agree with point 1, but I disagree with point 2. And I choose to use my time and energy on getting people to realize point 1 without the help of point 2. I even feel that point 2 is counterproductive (and wrong according to my definition of testing).

You don’t believe that there’s a difference between strictly algorithmic evaluations and heuristic evaluations? That algorithmic evaluations are equivalent to heuristic evaluations? That algorithmic evaluations are not a subset of heuristic evaluations? That it’s unimportant for experts to recognize that machinery cannot design, encode, interpret, or analyze a test result? That it would be okay to do so without any social competence? That machinery is not an extension of human capabilities, but a viable replacement for them? A Yes answer to any question in that list would surprise me.

(quoting me) “It used to be the case that people didn’t make a distinction between viral and bacterial infections.”

      Yes, very important distinction. But both are still referred to as “infections”.

      Well, wait, now I’m confused, because what you just said above would suggest that such distinctions wouldn’t be important.

Yes; they’re both referred to as “infections”. But the distinction is important when the infections are of a different nature, and require radically different treatments.

      Now, of course, to civilians, the treatment is “pills”. To doctors, the treatment involves different kinds of pills, with different dosages, different side effects, different prospects for outcomes, and so forth. Because the nature of disease varies, the nature of the treatment must vary as well. That doesn’t matter if people are okay with being fooled. As Wendell Johnson said, “To a mouse, cheese is ‘just cheese’; that’s why mousetraps work.”

      To serious testers, the difference between a test and a part of a test that can be automated must be important. To amateurs, perhaps not so much.

      • If you got confused, it was because I was not being clear enough. I will try to clarify.

        1) Yes, it is important to make the distinction between different types of testing.
        2) No, it is not important to use specific terminology approved by an authority (e.g. “testing/checking” or the ISTQB glossary) to make this distinction. Instead it is fine to use terms like “X testing” and “Y testing”, just as we do with “X infection” and “Y infection”.

        Arguing about the term used (2) is not important for me, on the contrary I find it gets in the way of discussing what is important, namely the difference (1).

        • Hi Baldvin,

I get what you are saying – that it can seem like an unimportant distinction, one that takes energy away from more useful discussions. I used to think this as well.

          But over the last couple of years my opinion has shifted for a couple of reasons:

          – Making this distinction can lead to some really deep and interesting discussions with stakeholders about testing
– A couple of times in the past few years I have dealt with stakeholders who believe automating everything is the correct course, and this has led them to overvalue certain testers and testing skill sets at the expense of a more holistic approach. Having a conversation with these stakeholders about testing vs checking sometimes works as a way to blunt some of the negative impacts of this mindset.

In summary, over the past few years I have become more convinced that this is an important distinction to make – especially if part of your role is test leadership or test advocacy.

      • Terminology should constantly adapt to match all the existing information, not just a piece of information from the past. Otherwise, why are we even bothering to improve what we have, if what was said at first is the only thing that matters? Alan Turing did the best that he could with the information and technology that he had at that time.

Are not all quick checks, regression checks (and all other non-exploratory checks) tests until they’ve been executed once? After you execute the test once you can already (generally) define what will be the steps required to perform that check again. All tests check something: they check that the app works as intended or that it looks nice or whatever check we’re making. When you ‘test’ (or experiment as you like) with an app, you learn about all the checks that you can make, but you can also learn about some of those checks by reading the documentation. I can’t figure out a scenario when someone tells me just: ‘Hey, can you test this app?’ – what do you mean, test? Test if it matches my needs? Test if I like playing with it?

You can say: ‘Sure, testing involves checking but checking is only a part of testing because you can always explore the software to find new, undetected issues’. But even then, an issue is an issue because some rule says it’s an issue. When I was little, my computer used to crash and I thought that’s just the way things are, until I found out that it’s not OK to crash. A rule told me so. Eventually I could reproduce the crash with some exact steps. And I could check if it’s still crashing or not after an update.

To support your point, while I was testing what my computer can do or what I can do with it, I wasn’t reading the documentation to perform some checks and confirm that it’s working as intended, I was just exploring to see if it sparks some interest or it’s something I can throw away in a corner. That is a unique moment in time because when people asked me questions about what it felt like using a computer, they were receiving an innocent, unbiased reaction. The moment they asked what I felt about the UI, about the keyboard, mouse, floppy disk, performance and so on, next time I would be on a computer I would remember those questions and check to see what the answers are. I became aware of other characteristics. And I would check if they exist or not, if they are improved, if I like them. The same goes for every piece of software. The moment you know about a characteristic, you start checking it. You no longer test it because you just can’t. As such, the only person that can truly test an application by also respecting the historical terminology of ‘testing’ is a person that is not a software tester.
Instead of thinking that checking is to testing what compiling is to programming, I would rather say that it is what writing is to programming: you can’t have programming without writing something. But we’re not generally saying ‘I’m writing a computer program by writing code that will compile and turn into machine code that the computer will process and output its result’, we’re saying that we’re just programming an app and everybody understands what we are doing. Just the same way, when we (as software testers) say we’re testing something, we’re actually saying that we’re checking some characteristics.

“Tests” that developers talk about aren’t “tests” that testers talk about. – sure, agreed. Just the same way that tests that testers talk about are not the same as the tests the product owners talk about, or the end users. Why does that matter? You can add precise definitions to all terminology and make everyone follow fixed rules, but you’d still end up with the same result. Clear expectations, communication, and trust solve these kinds of issues with far less effort.
The distinction is already there; we all know that unit tests are not the same as integration tests, API tests, end-to-end tests, performance tests, exploratory testing, and so on. Why is there a need to add an extra distinction? What will that bring to the table? What is the problem that this distinction is trying to solve?

Should we replace the test pyramid with a checks pyramid, one that has ‘check types’, where the people performing those checks are checksters? The entire terminology that a good majority is used to and understands should be adapted because some just prefer correcting others with ‘unit checks’?

        • Are not all quick checks, regression checks (and all other non-exploratory checks) tests until they’ve been executed once?

          No. A check is a test neither before nor after it is automated. Just as compiling is not programming, but part of programming; just as spell-checking is not editing, but part of editing; just as printing a drawing is not design, but part of a design process.

          A check is a part of a test, such that the check can at least in theory be automated. I remain baffled as to why this might be misunderstood for very long, or controversial once understood.

          After you execute the test once you can already (generally) define what will be the steps required to perform that check again.

          Indeed, you can often define the steps required to perform a check before you’ve executed the test.

          All tests check something: they check that the app works as intended or that it looks nice or whatever check we’re making.

          All tests evaluate something. You can say “check” if you like, and that’s fine in the dictionary sense; no one ought to dispute that. In the Rapid Software Testing namespace, “check” has that special meaning: a process by which we operate, observe, and evaluate some aspect of the product’s behaviour algorithmically.

          No check, and no test, can determine that the app works as intended. When a test suggests that there’s a problem, or a check signals a problem, what happened during the check is part of the assessment as to whether there’s a problem there or not. When a test does not reveal a problem, or when a check doesn’t signal a problem, that doesn’t mean that the product is fine. It means only that we haven’t observed a problem, or that the check hasn’t signalled one. Or, to put it another way, “none of the oracles that I applied prompted me to recognize a problem.”

          When you ‘test’ (or experiment as you like) with an app, you learn about all the checks that you can make, but you can also learn about some of those checks by reading the documentation. I can’t figure out a scenario when someone tells me just: ‘Hey, can you test this app?’ – what do you mean test? Test if it matches my needs? Test if I like playing with it?

          That seems like a highly resolvable problem.

          You can say: ‘Sure, testing involves checking but checking is only a part of testing because you can always explore the software to find new, undetected issues’.

          Even a single check is only part of a test. Whether a check glows green or red, for something to be a test some tester, some human, must analyze the outcome, make sense of it, and interpret whether it represents a product problem or not. No responsible tester will see a check run red and go to the developer saying “there’s a bug here!” without that analysis, sensemaking, and interpretation.

But even then, an issue is an issue because some rule says it’s an issue. When I was little, my computer used to crash and I thought that’s just the way things are, until I found out that it’s not OK to crash. A rule told me so. Eventually I could reproduce the crash with some exact steps. And I could check if it’s still crashing or not after an update.

I would suggest replacing “rule” with “heuristic”. I once wrote a program that was explicitly designed to crash the machine (so that I could examine the message displayed by a memory manager’s crash handler).

To support your point, while I was testing what my computer can do or what I can do with it, I wasn’t reading the documentation to perform some checks and confirm that it’s working as intended, I was just exploring to see if it sparks some interest or it’s something I can throw away in a corner. That is a unique moment in time because when people asked me questions about what it felt like using a computer, they were receiving an innocent, unbiased reaction. The moment they asked what I felt about the UI, about the keyboard, mouse, floppy disk, performance and so on, next time I would be on a computer I would remember those questions and check to see what the answers are. I became aware of other characteristics. And I would check if they exist or not, if they are improved, if I like them. The same goes for every piece of software. The moment you know about a characteristic, you start checking it.

          Once again, your use of checking is normal and fine in the dictionary sense. No problem there. “Checking” means something different in the RST namespace.

          You no longer test it because you just can’t. As such, the only person that can truly test an application by also respecting the historical terminology of ‘testing’ is a person that is not a software tester.

That seems a very peculiar conclusion to arrive at, even in the everyday senses of testing and checking. Testing something doesn’t close the door on testing it again, nor on applying the same oracles, nor on following the same procedure.

          Instead of thinking that checking is to testing what compiling is to programming I would rather say that it is what writing is to programming, you can’t have programming without writing something.

          I’m not sure that’s so. At least some context and some caution might be warranted. For instance, machine learning models are programs in most usual senses of the word; they accept input, process data, and produce output. Certainly humans are involved in the creation of machine learning models, but they’re not “written” in the usual way we talk about writing programs, as deliberate expressed instructions for a machine to follow.

But we’re not generally saying ‘I’m writing a computer program by writing code that will compile and turn into machine code that the computer will process and output its result’, we’re saying that we’re just programming an app and everybody understands what we are doing. Just the same way, when we (as software testers) say we’re testing something, we’re actually saying that we’re checking some characteristics.

          “Everybody understands what we’re doing” is another pretty risky assumption.

“Tests” that developers talk about aren’t “tests” that testers talk about. – sure, agreed. Just the same way that tests that testers talk about are not the same as the tests the product owners talk about, or the end users. Why does that matter? You can add precise definitions to all terminology and make everyone follow fixed rules, but you’d still end up with the same result. Clear expectations, communication, and trust solve these kinds of issues with far less effort.

          Would you agree that a key element of clear communication involves coming to certain agreements on what we’re talking about? Would you agree that some of those agreements are tacit, and not explicit? Would you agree that there’s a risk of tacit agreement being shallow and therefore misleading? That a big part of “clear communication” is not just assuming that we’re clear, but doubting or challenging that assumption from time to time?

The distinction is already there; we all know that unit tests are not the same as integration tests, API tests, end-to-end tests, performance tests, exploratory testing, and so on. Why is there a need to add an extra distinction? What will that bring to the table? What is the problem that this distinction is trying to solve?

          Again, I have no idea what you mean by something as apparently simple as “we all know”. Who is we? All of whom, specifically? “We” testers? “We” involved in software development? Does that include business analysts and managers? “Know” on what level; to what degree? I think what you’re trying to say here is “I have an idea of what I mean, and I assume that other people share that idea.”

          Here is one answer to your question, though. And here is another.

Should we replace the test pyramid with a checks pyramid, one that has ‘check types’, where the people performing those checks are checksters?

You do whatever you like, of course. But it is significant to recognize the difference between what Mike Cohn was talking about in his original post (developer- and code-focused automated output checking) and what we would consider a model for testing.

The entire terminology that a good majority is used to and understands should be adapted because some just prefer correcting others with ‘unit checks’?

I notice that “we all” has suddenly changed to “a good majority”. But apropos of “just prefer correcting”… what do you believe someone’s motivation might be for doing that? Might they want to help avoid trouble, or conflict, or misunderstanding, or mischaracterization of testing work? By advocating not to make this distinction in our language, is it possible that you “just prefer correcting” us?

          If someone wants to say “unit tests” after we’ve had a conversation about it, I’m okay with that. Really. Old habits die hard sometimes; that’s all right. But I want to be mighty clear on what I mean, and what they mean, and how differences between those things can mess us up. Hence this post.

