
Making Progress on Regression Testing

This post picks up on a small LinkedIn essay from a few months back. There’s a fair amount of preamble here before I talk about regression testing as such. Be careful; you might have heard about testing and checking from people who don’t talk about it the ways we do in Rapid Software Testing (RST). If you’re familiar with RST, maybe you’re fine jumping here. If you’re not so familiar with RST, it might be better to start from the beginning — or maybe you can take the direct jump down, and then come back. Or maybe you want to look at the premises of RST, or check out this diagram, or find out how RST is different from factory-style testing.

You’re the one who’s reading! You get to decide!

Go out and look on the Web today, and you’ll likely find Yet Another Post About Regression Testing, asserting that regression testing should be automated, because regression testing is mechanical and repetitive.

It’s true that checking the output of functions mechanistically and repetitively is mechanical and repetitive, but that’s a tautology—one that oversimplifies what really happens in regression testing. Worse, it might also lead us to oversimplify our notions of what might need to happen, and what skills we might need to apply. Let’s see if we can improve on those notions.

Testing and Checking

In the Rapid Software Testing Namespace, testing is the process of evaluating a product by learning about it through experiencing, exploring and experimenting, which includes to some degree questioning, studying, modeling, observation, inference, etc. A test is an instance of testing.

There’s this part of a test that we call checking, short for “output checking”. Checking is the process of operating and observing a product; applying decision rules to those observations; and then reporting on the outcome of those rules; all algorithmically. That is, a check can be turned into a scripted process that can be performed by a human or by a machine.

A test may include one or more explicit checks. A test tends to include a whole bunch of implicit and tacit checks too.

So what’s the big deal about that?

Suppose our Web app presents the total dollar value of all the user’s purchases for the past year, and we want to alert ourselves to a problem with it. We can use a framework to help us make the product do something like submitting the year to a field on the appropriate page, and then clicking on a submit button. Then we can look in the browser’s Document Object Model, find the appropriate element on the page, and check to see that it contains the right result. If we don’t find it, we have reason to suspect a problem somewhere.
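
Here’s a minimal sketch of what such a check might look like, driving the browser with Selenium from Python. The URL, the element IDs, and the expected total are all invented for illustration; your product and framework will differ.

    # A sketch of the check described above, driven by Selenium (Python).
    # The URL, element IDs, and expected value are invented for illustration.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    driver = webdriver.Chrome()
    try:
        driver.get("https://example.com/purchases")            # hypothetical page
        driver.find_element(By.ID, "year").send_keys("2024")   # submit the year...
        driver.find_element(By.ID, "submit").click()           # ...and click Submit

        # Find the element in the DOM, then apply the decision rule.
        total_element = WebDriverWait(driver, 10).until(
            EC.visibility_of_element_located((By.ID, "total-purchases"))
        )
        assert total_element.text == "$1,234.56", f"unexpected total: {total_element.text!r}"
        print("green")
    finally:
        driver.quit()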

If we do find the right result, that suggests that the communication between the browser and the Web server went right; a modestly sophisticated database lookup went right; a function that summed the value of the purchases went right;… And if you use code and tools to check for the right output, similar functions in your check had to go right as well. All good!

When a check presents a happy outcome, it affords the belief that everything went right. However, it’s not necessarily the case that everything went just right everywhere.

  • Maybe there’s an error in the database lookup. The sum that got returned would have included purchases from other years too, but the test database only included one year’s worth of transactions.
  • Maybe the check code works this time, but contains a bug such that it only ever returns data for the year we happen to be checking for, and misses problems related to other years.
  • Maybe performance was intolerably slow, but the “self-healing” feature of the framework “corrected” the problem without drawing it to our attention.
  • Maybe the user’s query wasn’t logged as it should have been.
  • Maybe the browser element wasn’t properly set up with appropriate ARIA attributes for people with disabilities.
  • Maybe the gods of CSS determined that the output would be invisible, or hard to see, or covered up by something else, or off screen…

And maybe, since the last build, something subtle changed in the code or the data in such a way that it’s invisible to the existing suite of checks. We’ll return to that point.
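
To make the first of those bullets concrete, here’s a contrived sketch (the schema, the data, and the function are invented) in which the lookup has a bug, yet the check runs green because the test database contains only one year’s worth of transactions:

    # A contrived sketch: a buggy lookup passes the check because the test
    # database contains only one year's worth of transactions.
    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE purchases (year INTEGER, amount REAL)")
    db.executemany("INSERT INTO purchases VALUES (?, ?)",
                   [(2024, 10.00), (2024, 20.00)])   # only 2024 in the test data

    def total_for_year(year):
        # Bug: the WHERE clause on year is missing, so purchases from *every*
        # year get summed. With single-year test data, nobody notices.
        return db.execute("SELECT SUM(amount) FROM purchases").fetchone()[0]

    # The check: green -- and yet the lookup is wrong.
    assert total_for_year(2024) == 30.00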

In any case, every test that includes even a single explicit check needs a bunch of stuff to be done before a check can be run; and a bunch of stuff afterwards, too.

A responsible tester creating the check needs to

  • learn about the product
  • consider risks associated with it
  • identify some aspect of the product that’s relevant to that risk, and therefore worth checking
  • identify how to check it
  • design a check
  • encode the check (whether as program code or as some kind of explicit procedure for a human to follow)
  • make decisions about when and how the check will be run

The check (or suite of checks) gets run completely algorithmically. It’s a mechanical process. The tester might observe it directly, but more often than not, checks tend to run unattended, and aggregate a bunch of results.

After the check ends, though, a responsible tester has work to do to finish the test. The tester first has to observe the output of the check. If the output from the check is “red”, suggesting a problem, a responsible tester doesn’t immediately run to the developer or file a bug report. The tester must typically

  • investigate the situation
  • reproduce and pinpoint the problem
  • perform some analysis to try to determine the reason for the problem
  • decide whether there’s a problem in the production code, or in the check code, or in the environment, or in the relationships between any and all of them
  • if it’s a problem, consider and decide whether it’s a problem that matters
  • if it is a problem that matters, consider and decide how to report it, and to whom
  • gather evidence required to support the belief that it’s a problem, and that it matters

Whether the check ran red or green, a responsible tester will go through this process too — sometimes publicly and explicitly; more often privately and tacitly, and not always entirely consciously. In other words, a responsible tester will maintain a degree of professional and pragmatic concern about the possibility of being fooled by green checks.

Problems? Where? What Kind? How to Find Them?

Where, or how, might we look to see a problem about the product? We would do well to look at the product from varying perspectives; to model the product based on those perspectives, and then cover the models with testing. In Rapid Software Testing, we model the product in terms of product elements or product factors:

  • structural elements of the product
  • functions that make things happen or change in the product
  • data that the product takes in, processes, stores, puts out, and deletes
  • interfaces that afford the opportunity to control the product, or to get access to its data or state
  • platforms upon which the product depends
  • operations that users and administrators perform with the product — and the conditions under which those operations are performed
  • time, and its relationship to the product

What quality criteria important to users might be threatened by some problem about the product?

  • capability
  • reliability
  • usability
  • charisma
  • security
  • scalability
  • compatibility
  • performance
  • installability

Some product problems may matter only indirectly to our customers, and might matter more directly to the business and the development group:

  • supportability
  • testability
  • maintainability
  • portability
  • localizability

How might we go about looking for those problems? Test techniques!

  • function testing — test everything the product does
  • domain testing — divide and conquer the data
  • stress testing — systematically overwhelm the product, or starve it, or undermine it
  • flow testing — do lots of things, one after the other, without resetting the system
  • scenario testing — develop compelling stories about how people might use or misuse the product, or about other things affected by the product
  • claims testing — compare the product to the things that important people say about it
  • user testing — systematically engage with or simulate users of the product
  • risk-based testing — identify something that could go wrong, and then perform tests to reveal how the Bad Thing could happen, or the Good Thing could fail to happen
  • automated checking — use machinery to operate the product, collect data, apply decision rules, and report the outcome, all algorithmically

Product Elements, Quality Criteria, and Test Techniques are all elements of the Heuristic Test Strategy Model, which we use to help develop checklists of ideas that guide our choices in testing. You can find it here.

Four Testing Activities

Testing, in general, addresses the risk of oblivion to problems that the development group and the business would probably prefer to know about before it’s too late.

“Product” is easy to conceive of in terms of a built, running piece of software — a Big-P Product — but we can think more generally: a product is anything that anyone has produced. On the way to developing the Product, we’re constantly developing small-p products that inform it or become part of it: components, functions, features, designs, specifications, one-line changes, ideas…

Testing activity can be applied to any of these products. We might call some forms of testing “review”, and the experiments might take the form of thought experiments, and the experiencing may take place in our imaginations or in our reasoning. Nonetheless, we’re exploring ideas or artifacts to gather information, in order to learn and inform our decisions.

Generally speaking, there are at least four families of testing activity that we engage in. As you’ll see, only one of the four is mechanical and repetitive.

New testing

New testing is experiencing, exploring, and experimenting with something for which we have not yet developed a working mental model.

The point of new testing is, obviously, to find bugs. Less obviously, the point of new testing is to learn about the product and how to test it; to identify risks; to develop models of the product; to develop ideas about how to test the system more deeply and more efficiently. Sometimes, new testing can persuade us that there’s nothing too risky or dangerous.

New testing tends to be interactive, requiring the tester’s presence for the most part. This in no way precludes applying and developing tools as we go; for testers, using tools is normal and natural.

New testing is a heuristic process of mental evolution. It changes us and our minds. It’s transformative, rather than transactional. It’s not mechanistic. It’s not repetitive.

Retesting

Retesting is experiencing, exploring, and experimenting with something for which we have already developed a working mental model — by testing it. We often perform retesting in response to some change in the product — but not necessarily. We might do retesting when we’re investigating a bug; when we want to revisit or refine our mental models; when we recognise new risks; when we worry about having missed something; or when we want to broaden or deepen our coverage.

There’s a bit of a paradox here. Our working mental models are never complete, and they tend to develop in parallel with the product, but not entirely simultaneously. Retesting might entail some new testing, and vice versa.

As with all forms of testing, retesting can be aided by tools. For instance, I’ve been having fun lately with jq, a tool for probing the contents of JSON files and data structures. Results from retesting may also suggest new ways to use tools, to develop new tools, or to go deeper using the tools we’ve got. As I probe more deeply, I learn about how to model the product, and how to cover my models with testing. You might want to see some examples of coverage models; look here.
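
As an example of the kind of probing I mean (with an invented file name, invented field names, and jq assumed to be installed and on the PATH), a tester might slice a saved JSON response to look for records that shouldn’t be there:

    # A sketch of probing JSON output with jq, invoked here from Python.
    # The file name, fields, and filter are invented for illustration;
    # jq must be installed and on the PATH.
    import subprocess

    # List any purchase records whose year isn't the one we asked for.
    jq_filter = '.purchases[] | select(.year != 2024)'
    result = subprocess.run(
        ["jq", jq_filter, "purchases_response.json"],
        capture_output=True, text=True, check=True,
    )
    print(result.stdout or "no stray records found")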

Retesting is about applying our working mental models of the product. Again, retesting is typically done in an interactive and exploratory way—a heuristic process of mental application; neither mechanistic nor repetitive.

Hold on, though… where’s scripted checking? Isn’t that part of retesting? Hang on for a moment. Before we can do scripted checking, we’ve got to get the checks developed.

Check Development

Check development means designing, coding, or refactoring algorithmic processes to check output from functions in the product. Check development is intrinsic to the process of test-driven development (TDD), which includes the production programmers developing low-level output checks as a means of aiding design. Check development is also an activity inside behaviour-driven development (BDD), which focuses on producing checkable examples of (typically happy-path) product output.

At first, the checks in both TDD and BDD are focused on confirming that the code is reasonably close to what the developers intend it to be. Later, the checks help to maintain development discipline, alerting developers to easy-to-find, avoidable errors as the code is being changed. If there’s a regression covered by the checks, the developers find out about it right away.
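
A low-level output check of the kind that TDD produces might look something like this sketch; the function under test and its values are hypothetical:

    # A sketch of a TDD-style low-level output check (pytest would collect
    # and run these). The function under test is hypothetical.
    def sum_purchases(amounts):
        return round(sum(amounts), 2)

    def test_sum_purchases_totals_the_year():
        assert sum_purchases([12.50, 7.25, 10.25]) == 30.00

    def test_sum_purchases_handles_an_empty_history():
        assert sum_purchases([]) == 0.00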

There’s another kind of check code, too: formalized, procedurally structured, deterministic, algorithmic test cases; “Do this; do that; do this other thing; then observe this result. Then compare the result to some presumably desirable value that has been specified, looked up, or computed.” This kind of check is not really oriented toward the design of the product, but toward confirming and demonstrating that the product can work.

In the Bad Old Days (and alas still in many places today), the “code” would take the form of a written script of behaviours — typically intended to simulate something that an end-user might do — for the tester to execute. The approach here is to make testers ready to apply those behaviours to the realized product as soon as it’s available.

Somewhere along the line, someone realized that people aren’t very good at acting like machines, and they don’t like it very much either. If it’s mechanistic behaviour we seek — pressing buttons, making specific observations, comparing output — why not get a machine to do it? So the “code” of the scripted check turns into real code, to be executed by machinery.

On a typical development team, there tend to be at least a few people who can write code, and there are sometimes people in a dedicated testing role. Really good check development requires two sets of expertise: the specialist, technical skills of analysing and investigating risk from a critical perspective, and the specialist, technical skills of writing code. It’s rare for someone to be really great at both. Fortunately, those two skill sets don’t have to be embodied in the same person.

In Rapid Software Testing, we speak of a role called toolsmith. A toolsmith is a developer — someone inclined towards the builder or maker role, whose statement of purpose goes something like “I will make things that end your troubles!” A toolsmith’s job includes writing and maintaining code and tools to help testers — those people inclined to the critic, investigator role, whose statement of purpose is “I will find problems wherever I look — and I will look hard for them.”

Toolsmiths can develop all kinds of stuff to aid with new testing and retesting. For instance, a toolsmith could develop scripts to support automated flagging — identifying data or pages that warrant interactive observation by a tester. (There’s a whole video about that here.) Toolsmiths can write code to generate data, or to visualize patterns in the output, or to sort and search and sift log files.
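
As a tiny example of flagging, a toolsmith might write something like this sketch (the log format and the threshold are invented) to sift a log and list the requests that warrant a tester’s interactive attention:

    # A sketch of a simple flagging tool: sift a log and flag slow requests
    # for a tester to look at interactively. The log format ("<request> <ms>")
    # and the 2000 ms threshold are invented for illustration.
    import sys

    THRESHOLD_MS = 2000

    def flag_slow_requests(log_path):
        with open(log_path) as log:
            for line in log:
                request, elapsed_ms = line.rsplit(maxsplit=1)
                if int(elapsed_ms) > THRESHOLD_MS:
                    print(f"worth a look: {request} took {elapsed_ms} ms")

    if __name__ == "__main__":
        flag_slow_requests(sys.argv[1])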

One kind of code that the toolsmith can write is, of course, check code. When we put a skilled toolsmith in partnership with one or more skilled testers, we can get around the problem that not everyone is great at everything. Better yet, toolsmiths and testers can collaborate, building on each other’s expertise, knowledge, and experience to extend their own.

If there is no toolsmith available, testers can certainly write check code for themselves. But be careful! I can tell you from experience, writing code can be super powerful and satisfying. It can also be highly engaging, dazzling and hypnotic, preying on our attention and our ambitions, with little bursts of accomplishment providing little bursts of endorphins. The happy hours flying by writing code can easily displace the goal of studying and interacting with the product — and the less happy moments trying to get the danged check code running can do so even more easily.

Like all software development, check development is an evolving, heuristic process; neither mechanistic nor repetitive.

Scripted Checking

After new check development is done — after the checks have been designed, coded, tried, troubleshot, debugged, and fixed — the checks can be run over and over as the product is changed for new features and bug fixes.

You might choose to think of scripted checking as the same thing as retesting. It isn’t, though. You might say it’s a specific activity that can happen as part of retesting, inside of retesting, and I might agree. Scripted checking doesn’t happen in all retesting, though — and since it’s strictly algorithmic, it doesn’t involve either mental evolution or mental application. The stuff before and after the scripted check runs — that requires human mental evolution and mental application, social competence, actual consciousness.

Do scripted checks help to find bugs? Sure; sometimes — the kinds of bugs that are covered by the checks. And that can be valuable. As noted above, though, scripted checks can’t tell us that everything went just right everywhere. Scripted checks don’t notice unanticipated problems — and that’s okay, as long as we have other means of identifying them. Scripted checking is, simply, applying existing algorithms to check output — which is very handy for catching easy bugs relatively near to the time and place of their creation.

When people say that regression testing is mechanistic and repetitive, this is what they’re referring to, and it’s all that they’re referring to. They’re talking about the act of pressing (virtual) buttons on (virtual) keyboards, and doing some computations to make a (virtual) light turn green or red. They can’t be talking about new testing; nor about retesting; nor check development. Those things aren’t mechanistic and repetitive.

Regression Testing

In the Rapid Software Testing namespace, regression testing is testing motivated by change to a previously tested product — or, if you like, testing focused on regression-related risk.

What kind of problems might be regression problems? The product might slide backwards from the perspective of any of the product elements, and any of the quality criteria mentioned above. We might discover a regression problem using any kind of test technique.

What does this mean? It means that scripted checking is not the only form of regression testing.

Literally any test, in any form, whether it’s new testing, retesting, or check development, can be a regression test when it’s motivated by the risk of regression. Any test activity that we perform, even when it’s not motivated by or focused on regression risk, has the chance of exposing a regression-related problem.

Scripted Checking in Regression Testing

When people think about regression testing, they tend to leap towards thinking of running existing, already-written checks of one kind or another.

Regression testing might start this way, with the scripted checking activity mentioned just above. Scripted checking by definition is an algorithmic, mechanistic, repetitive process that can be automated, whether in the course of simple and quick developer checks, or integrated checks in a build pipeline, or “end-to-end” checks that may get run from the GUI.

The big deal in people’s minds, so it seems, is to demonstrate that having produced this output before, the product will produce consistent output after the change, too.

That’s a fine thing, if everything runs green. We might treat green checks as a quick indication that all is well, and that there hasn’t been a regression. A responsible tester, though, worries about the possibility that our mechanistic, repetitive checks could be fooling us. If we want to do regression testing responsibly, we must at least consider the other three testing activities as well, none of them mechanistic and repetitive.

Retesting in Regression Testing

When confronted with a check that’s reporting an inconsistency, we need to do some degree of retesting to investigate.

Sometimes that starts simply, with running the check again. If the check runs green, a not-very-responsible tester might shrug, and dismiss the anomaly as a “flaky test”. A more responsible tester will apply her existing mental models and investigate, trying to reproduce and pinpoint the problem. That might involve performing similar tests; varying the data; considering different platforms; assessing what factors might or might not matter.
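
As one small illustration of varying the data, a tester or toolsmith might quickly parametrize checks around the anomaly; this is only a sketch, and the function, the years, and the expected totals are all hypothetical:

    # A sketch of varying the data around an anomaly with pytest's parametrize.
    # The function under investigation and the values are hypothetical.
    import pytest

    def total_for_year(year):
        # Stand-in for the real function under investigation.
        return {2022: 0.00, 2023: 99.99, 2024: 30.00}.get(year, 0.00)

    @pytest.mark.parametrize("year, expected", [
        (2022, 0.00),    # a year with no purchases
        (2023, 99.99),   # a year with a single purchase
        (2024, 30.00),   # the year where the anomaly first showed up
    ])
    def test_total_for_year(year, expected):
        assert total_for_year(year) == expected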

All this happens in an interactive and exploratory way. Even though you can apply your working mental models of the product, there’s no pre-defined algorithmic process for analyzing a potential problem that you’ve never encountered before. Retesting is a heuristic process, with degrees of mental application and mental evolution. When we’re looking for regression risk, retesting need be neither mechanistic nor repetitive.

New Testing in Regression Testing

When we’ve been notified of some change in an existing product, we may focus some new testing on learning about implications of the change — including the need for new checks. Moreover, discoveries that we make during retesting might help us to recognize the limitations in our working mental models. Those discoveries may prompt us to do new testing. That is: since, after a change, we’re applying our existing mental models to perform new testing, we are to some degree retesting! We often go back and forth between the two.

New testing in the context of regression testing tends to be interactive; a heuristic process, leading to mental evolution; not mechanistic; not repetitive.

Check Development in Regression Testing

After a change in the product, we might choose to develop some new checks to address the parts of the change we’ve been told about. We might also develop new checks in response to product failures, or to ideas that occur to us while retesting, or while doing new testing. Upon discovering that we’ve got bad or unhelpful checks, we adapt and refine some of the checks we’ve got, or expand their scope. We might even choose to retire checks that aren’t relevant, or that don’t address important risks.

Again, check development, like any kind of development, is an evolving, heuristic process; not mechanistic; not repetitive.

Wrapping Up

Many of testing’s clients — managers, executives, developers — and even many testers have a legitimate goal: they’d like to defend against the risk of regression. Automated regression checks can be a good way to detect the easy regression problems.

The trouble is, not all regression problems are easy to notice. Like any other problem that we might encounter anywhere along the way, some regression bugs are deeply hidden, rare, subtle, intermittent, condition-dependent, platform-dependent, timing-dependent. And many bugs — including many regression bugs — are emergent. The problem is not in any of the product elements per se, but in their interactions with each other, and in their encounters with the world outside the development shop.

Such problems can be elusive, getting past our checks. Just as burglars can enter a building without triggering the alarm system, bugs can evade our automated means of detecting them. That’s why secure installations don’t rely on instrumentation alone; they engage people to walk around and look for trouble.

Automated checks can be useful, but let’s not ignore their limitations and what it takes to develop and apply them skillfully. Let’s pay attention to the mental application and evolution in regression testing. Let’s notice the check development, retesting, and new testing that goes on while we’re focused on regression risk.

2 replies to “Making Progress on Regression Testing”

  1. Great post 🙂

    A couple of things resonated with me. The first one was making decisions about when and how the check will be run. I am working through a service transition process with my team at the moment, and trying to have some grown up conversations about business objectives and how we support those with regression testing on an ongoing basis. Just looked at a reasonably tight test design that was a good mix of specifying coverage – while leaving the tester free to explore.

    The other thing that resonated was the danger of the false green tick. A company I used to work for started treating its automated regression checks as an all knowing oracle that did not need to be scrutinized on an ongoing basis. They let a lot of their testers go as a result. At one point it took them six months to realise that their deployment process to the test hardware had broken, and that they were just constantly testing the same build, over and over.

