
Why Test Automation Projects Fail (and How We Might Succeed)

Every fifteen minutes or so, in testing blogs, on LinkedIn, at conferences, or on the job, someone raises the question of “why test automation projects fail”.

Perhaps the question keeps coming up because there are so many failure modes, and so many possible answers.

Here’s one answer, though: “test automation projects” are software development projects, and all software development projects are vulnerable to failure. One of the bigger and more common reasons for failure is that people — designers, developers, managers, and clients of the project — don’t quite comprehend the scope of the task in question. When people don’t understand the role of humans in a task, they miscalibrate how completely and how well that task can be performed by software or machinery.

What, specifically, is being automated?

For instance: people might speak of an “automated test”. What specifically is being automated? In the cases that seem to dominate people’s thinking, three things at most (see the sketch after this list):

  1. Data entry. We can write code to provide input to some function via some interface, and then trigger that function. This is usually a matter of pressing some virtual keys, or moving a virtual mouse, and then pressing a virtual button on a virtual screen.
  2. Obtaining a comparable result. We might write code to calculate a specific value for comparison with product output in part (3) below. (In Rapid Software Testing, we often call that a parallel oracle, an instance of the Comparable Product heuristic.) Alternatively, we might write code to obtain that comparable value from an already-calculated table of appropriate results. Finally (and the way many people do it, so it seems), we can simply provide (3) with a hard-coded value. (In the case of hard-coding, it’s a bit of a stretch to say that we’re automating this step separately from (1) above.)
  3. Making a comparison. We can write code to compare the calculated, looked-up, or hard-coded value in (2) to some kind of output from the product or a process used to build it. If that output matches the comparable result, our code reports “pass”, or fills in a value on a table and colours that field green, or something similar.
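
To make concrete just how little the word “automated” covers here, consider a minimal sketch in Python. The names are hypothetical stand-ins: product_total for whatever function or interface the real product exposes, and parallel_oracle for an independent calculation of the comparable result. The whole thing is illustrative rather than a recommended implementation.

    def product_total(prices):
        """A stand-in for the product function whose output we want to check."""
        return sum(prices)

    def parallel_oracle(prices):
        """An independent calculation of the comparable result (a parallel oracle).
        In other checks this value might come from a lookup table, or be hard-coded."""
        total = 0.0
        for price in prices:
            total += price
        return total

    def check(prices):
        # (1) Data entry: provide input to the function via some interface.
        actual = product_total(prices)
        # (2) Obtain a comparable result, here from the parallel oracle.
        expected = parallel_oracle(prices)
        # (3) Make a comparison and report "pass" or "fail".
        return "pass" if actual == expected else f"fail: expected {expected}, got {actual}"

    if __name__ == "__main__":
        print(check([10.00, 19.99, 0.05]))

Notice what the sketch does not contain: nothing in it chooses the inputs, decides which risks matter, or evaluates whether a “pass” means the product is actually okay.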

Is that procedure a test? In the Rapid Software Testing namespace, we would say No. The procedure — which we would call a check — is, at best, part of a test.

Data entry and checking output is not a test.

For something to be a test, it must include human intention to learn something about a product, and potential problems in or around it. Human intention and risk analysis can’t be automated.

A test must include design of a means of probing or challenging the product somehow to reveal a problem. Design, as an intentional, human, social process, can’t be automated.

If the design involves an explicit procedure to be run by a machine, that procedure must be encoded so that a machine can enact the intention. That is, someone must write some code. Despite people’s cheery optimism about ChatGPT, that encoding process can’t be automated reliably.

When the intention and the design have been encoded, the automated data entry and checking of the output that I described above can proceed. When the check is done, is the test done? No.

A test requires human evaluation.

After an automated checking procedure, a test must include a judgment by a socially competent human to evaluate the outcome of the procedure, and to decide whether there is a problem somewhere.

After a “red” or “failing” outcome, a competent tester will not report a bug; not before questioning and investigating what has happened. Is the result red because of a problem in the product? A problem in the check? A platform problem? A synchronization problem? Maybe the data in the lookup table is corrupt, or out of date. How might the red result be fooling us? Critical thinking of this nature can’t be automated.

A red output from a check is not the end of the test; it’s the beginning of the investigation.
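
Tools can support that investigation, though, even if they can’t perform it. Here is a small, hedged sketch (Python again, with hypothetical names) of a check runner that, on a red outcome, saves the context a human investigator will want (the inputs, the expected and actual values, and some platform details) instead of merely reporting “fail”.

    import datetime
    import json
    import platform

    def run_check(check_fn, inputs, expected, evidence_path="check_evidence.json"):
        """Run one check; if it goes red, preserve evidence for a human investigator."""
        actual = check_fn(*inputs)
        if actual == expected:
            return "pass"
        # Red outcome: record context so a person can ask whether this is a
        # product problem, a check problem, a platform problem, or stale data.
        evidence = {
            "timestamp": datetime.datetime.now().isoformat(),
            "platform": platform.platform(),
            "python_version": platform.python_version(),
            "inputs": repr(inputs),
            "expected": repr(expected),
            "actual": repr(actual),
        }
        with open(evidence_path, "w") as f:
            json.dump(evidence, f, indent=2)
        return f"fail: evidence saved to {evidence_path}"

The questions above (product bug? check bug? platform? stale lookup data?) still have to be asked and answered by a person; the tool only makes the evidence easier to find.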

The tester must evaluate the procedure and the outcome and make sense of the result. The test isn’t really done until that evaluation has been completed. That process of evaluation, investigation, and sensemaking can’t be automated.

(Note that a “green” outcome isn’t really the end of a test either. A “green” outcome does not mean all is well with the product; simply that the procedure and evaluation haven’t signalled a problem. A product can return a correct result from a function and still have terrible problems that would be perceptible to a human observer—if only one were observing.)

After the investigation has been completed, the tester has learned something to be reported back to the project team. A light may glow red or green, but framing and contextualizing the report can’t be automated.

Maybe the test hasn’t revealed a problem. In that case, the message is “my testing hasn’t revealed a problem… yet.” (Note the difference between that message and “there are no problems in the product”. A good tester will never say that there are no problems, since that’s not provable.)

Perhaps the test has revealed a problem in the product; a bug. But bug or no bug, there is always the opportunity to learn something from the test. That learning can be fed back into a new risk idea, a new design, an improved check, or better post-check analysis.

(Truth be told, there are, alas, some testers who don’t learn from the experience of designing and performing the test, or who don’t apply or share that learning. The most egregious instance of this is when testers dismiss a confusing outcome as a “flaky test”. “That’s a flaky test” really means “that’s something I don’t understand and therefore probably should investigate”. A “flaky test” is a test that hasn’t finished!)

In any case, the machinery itself learns precisely none of this. Learning — the kind that can be abstracted and generalized and applied from experience — can’t be automated.

(“But Michael! What about machine learning?” “Machine learning” is a metaphor, not learning in any kind of rich, human sense. Think of “machine learning” as “algorithm refinement”.)

Testing is evaluating a product by learning about it through experiencing, exploring, and experimenting, which includes to some degree: questioning, study, modeling, observation, inference, etc. Testing includes design of experiments, human experience, critical thinking, focus on problems and risks, analysis and interpretation.

We’ve said that before. I’m saying it here again, with a special focus on the idea that precisely none of that stuff in the last paragraph can be automated. Thus any attempt to “automate testing” is doomed to failure from the beginning.

Heuristics for Success

Is there a way we might think about all this more optimistically and more productively? Yes! Much of what we do in testing can be augmented by tools, and much of the trouble we encounter in applying tools (including automated checking) can be reduced, eliminated, or avoided from the start.

Here are a few general heuristics for that:

  • Put the tester, and the tester’s mindset and skill set, at the centre of testing work. Not the process model, not documents, not artifacts, not code, not machinery.
  • Consider the problems we’re trying to solve and the risks we’re trying to address. Consider the factors that drive those risks, too. (For instance, if someone says “we have to automate, because the developers have no idea what might have broken”, it might be time to pause and analyze the structures and interfaces of the product. Trying to “automate everything” isn’t a risk-based strategy; it’s more likely a sign of confusion and panic.)
  • Test cases are not testing; therefore, automating test cases is not automating the testing.
  • Consider replacing a strategy like “perform simple and shallow output checks to detect regressions” with “find problems that matter, whether they’re regressions or new problems”.
  • When you’re tempted to automate something to make it faster, pause and ask whether it’s worth doing at all. If it isn’t, don’t automate it; don’t do it at all. If it is, proceed to the next point.
  • When comparing approaches to testing something, consider more than reduced costs in execution time. Don’t forget other costs: design, development, encoding, testing, debugging, equipment, maintenance, interpretation, and analysis costs.
  • After you’re through that list of costs, consider two more: transfer cost (the cost of passing on responsibility to someone who was not originally responsible for the check); and opportunity cost (the degree to which doing this displaces the time or money with which we could be doing other, possibly more valuable things).
  • Using tools can be tricky. Writing code can be trickier. Yet if we apply them wisely, they can aid us in lots of aspects of testing. Use tools to extend, enhance, accelerate, intensify, or enable any testing task that we might need to do. Don’t over-focus on automated checking; consider automating anything that could be done efficiently, effectively, and powerfully by a machine. (One small example follows this list.)
  • Remember that tools may support testing very powerfully, but tools don’t do testing.
  • Learn from every problem in the product, and every problem we experience in the work that we do.
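
As one small illustration of the point above about tools beyond checking: here is a sketch (Python, hypothetical names, and only an assumption about how you might apply it) that uses code not to issue a pass/fail verdict, but to generate varied inputs and capture the product’s raw output in a file, so that a human can scan it for problems that no assertion anticipated.

    import csv
    import random

    def generate_inputs(count=50, seed=1):
        """Generate varied inputs for the hypothetical product function.
        The random generator is seeded so a run can be reproduced later."""
        rng = random.Random(seed)
        return [[round(rng.uniform(-100, 1000), 2) for _ in range(rng.randint(0, 5))]
                for _ in range(count)]

    def capture_outputs(product_fn, out_path="survey.csv"):
        """Run the product function over varied inputs and record the raw outputs
        for a human reviewer. No pass/fail verdict is rendered here."""
        with open(out_path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["inputs", "output_or_error"])
            for inputs in generate_inputs():
                try:
                    result = product_fn(inputs)
                except Exception as exc:  # record surprises instead of halting the run
                    result = f"ERROR: {exc!r}"
                writer.writerow([inputs, result])

    if __name__ == "__main__":
        capture_outputs(sum)  # "sum" stands in for the real product function

The tool extends and accelerates the survey of the product’s behaviour; noticing problems in that survey is still a human act.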

There will doubtless be lots more about these points in future posts.

To sum up: one could say that “automating testing” fails at least in part because some people believe that testing consists only of data entry, providing a comparable result, and then performing a comparison. They confuse testing with output checking. This is an impoverished view of testing, leaving out all the stuff that happens before and after the check.

With that blinkered perspective, it’s easy to believe that we could write programs to “automate the test cases”. Many such programs perform finicky, non-sentient, mechanistic data entry at interfaces designed for flexible, adaptable, sensemaking humans. This is typically difficult and time-consuming. Amazingly, though, the claim is that this “saves time for ‘manual testing’”. That claim can only be evaluated fairly if we also account for the cost of developing and maintaining the checks, and for the cost of limiting the data entry to stable, simple tasks that miss lots of bugs.

But perhaps there’s another reason that “automating testing” fails: people often haven’t even considered what success would look like.

To me, successful testing means this: people who matter are well-informed about the actual status of the product and the problems in it, based on interaction, experience, exploration, and experiment with the actual product.

To be successful, consider reframing and broadening the mission of testing beyond checking the output. Remember that the mission is not to “automate the testing!” (which really means “automate some output checks!”). The mission is to find problems that threaten the value of the product, and the on-time successful completion of our work.

7 replies to “Why Test Automation Projects Fail (and How We Might Succeed)”

  1. This is a great post, Michael.

    One thing struck me worth clarifying in the following:

    “What specifically is being automated? Three things, at most:

    1. Data entry
    2. Obtaining a comparable result.
    3. Making a comparison”

    Your point 2 heading implies measuring or capturing a value from a function via an interface that can be automated. However, the description you give appears to focus exclusively on dealing with parallel oracles and determining what a “true” or good value should be.

    I think there’s a fourth useful distinction:

    1. Data entry
    2. Measuring or capturing raw output.
    3. Determining a comparable result.
    4. Making a comparison

    I can recall previous instances where 1 & 2 (often 3, but not always) were automated and 4 and sometimes 3 were not: driving input and capturing raw output, then using a subject matter expert (a human oracle) to scan the output (making a comparison) looking for problems.

    Paul

  2. Thank you so much for your post. With this, I understand more deeply that automation checks the output (it is not automated testing). Tools can help us verify the output against our expected criteria, but only a human with a brain and eyes can really do the test. But in your opinion, should we really apply automation? Is automation really worth applying as a long-term investment? Thank you. Hope you can spend time to answer.
