Regression Testing and Discipline

Another tester on an “Agile” team complains of being overwhelmed by the volume of regression testing he says he must do at the end of each sprint.

Why are some development organizations fixated on regression testing? Not why do they do it (that can be quite reasonable), but why are they fixated on it? I have a theory.

It goes without saying that every change to the product or system holds the risk of problems that could cause quality to backslide in some sense. That’s regression, slipping backwards to some presumably less advanced state. Regress is the opposite of progress.

With change, there’s a risk of regression, so it seems sensible to focus some testing on that risk. But is testing a sure-fire, reliable way to deal with the risk of regression?

Sure-fire? No. Testing can certainly help to find bugs, so that bugs can be recognized and dealt with. But no matter how thorough testing is, or how early it starts, testing can miss bugs too. So let’s remember that the easiest bug to deal with is the one that is never hatched in the first place; the next easiest is the one that gets squashed before it can bury itself in a mass of code.

No matter how skillful or powerful the testing, to some degree, finding a bug remains a matter of luck. In the face of regression risk, we’d prefer not to leave things at that; better to start with fewer bugs to reduce our dependence on luck. Thus, it would seem like a good idea for the people making the changes to avoid bugs by working in a careful and disciplined way.

Discipline, says Chambers, is “1. training designed to engender self-control and an ordered way of life; 2. The state of self-control achieved by such training.” The idea of self-control suggests the idea of agency, which is essential to exploratory work, which is in turn essential to engineering work.

Depending on the product, the project, and the preferences of the individual programmer and the programming team, what might we see and hear as they did disciplined work? Try pausing for a moment to remember the scene when you noticed people doing work you considered “disciplined”.

How’s your list? Here are a few things I’ve seen and heard from time to time in work I’d call “disciplined”:

When a change or a new feature was on the table, groups of people reviewed and discussed ideas to understand the change and the motivation for it. Talk was focused on making the system better, and on the problems that the changes were intended to solve. But that focus softened and sharpened, zoomed in and zoomed out, and moved around to help people see everything they could see—including problems. People often disagreed, but they were willing to try little experiments to sort out the disagreements.
I’ve seen people consulting with colleagues and with users to get a variety of ideas about design, implementation, and risk. Conversations happen at desks and in conference rooms, but also outside the office, in restaurants, eating, drinking, joking, walking, playing games, shopping… Discipline gets relaxed sometimes. Social life can foster trust and responsibility that helps people aspire to discipline.
I’ve seen people using talk, text, tables, sketches, diagrams, stories, mind maps, toys, and props to help describe things in lots of different ways for analysis and for memory. Disciplined work often seems associated with careful note-taking, too.
In disciplined shops, order doesn’t necessarily come right away; sometimes it has to be bootstrapped. Stuff tends to start messy and get more tidy if it needs to; when things get too formal too soon, ideas get lost. Development work is one way of life, and a self-controlled, ordered way of life often starts with being uncontrolled and disordered when we’re starting to build something new. Order emerges.
Some disciplined places were quiet and focused, but in others I heard lots of regular background chatter, too. Highlights were stories about how people solved problems—and created new ones on the way. Storytelling of this kind helped people to think about risk in a vivid way, which prompted thinking about discipline.
I’ve heard open and honest disagreement when there were things worth disagreeing about. I’ve seen people getting upset… and taking responsibility for working things out. Discipline isn’t always smooth.
I saw builders paying attention to testability—which includes simplicity, cleanliness of code, modularity, visibility, and controllability—to make it easy to do less expensive deeper testing later on.
In the disciplined shops, the developers were resolved not to take on too much change all at once. They would make patient, careful, reflective, unhurried changes, and try them out themselves. When they felt the work was ready for other people, they’d make it easily accessible, asking for and getting feedback right away.
While designing, building and trying things, developers would try to anticipate potential exceptions and error conditions, and they’d generally be quite successful. Then they would give the product to someone else to test, whereupon they would learn something about what they had missed.
Developers who were really good at debugging carefully tried out specific little changes as they worked on solving a problem.
The disciplined builders would tend to have a sober preference for reliable, widely-used, field-tested components over a mad rush to implement new stuff developed from scratch. As a consequence, there tended to be fewer surprising bugs.
I’ve seen programmers whose style was test-first or test-driven development—and who were given the time to apply it. And I’ve worked with disciplined programmers who don’t bother with TDD, exercising discipline in other ways.
I’ve seen code that contained inline assertions in debug builds. I’ve seen exception handling built into the product and logs to report on its status. (Every now and again, I see well-thought-out, helpful error messages.)
I’ve seen developers checking their own work with configuration checks, unwanted-change detectors, and unit testing, including programmed output checks.
I’ve watched people spending hours and days in each other’s offices or cubicles, doing pair programming for immediate, real-time review.
I’ve seen formalized review sessions throughout—wherein new developers learned from more senior developers and, interestingly, vice-versa.
I’ve seen developers using lots of appropriate tools to see hidden things, or to see unhidden things in different ways (e.g. IDE syntax checking while writing code; attention to compiler warnings; database schema diagramming; dependency checking; profiling for performance; etc.).
I’ve seen consistent refactoring for readability, maintainability, and portability; paying down technical debt, as they say.
I’ve listened in on discussions about the development of shared coding styles, which also helped with readability.
I’ve observed developers keeping careful notes about setup procedures and configuration settings.
I’ve watched the entire team working collaboratively throughout so that there are lots of eyes and minds to notice things that could go wrong.
I’ve seen teams cultivate good relations with technical support.
I’ve noticed disciplined people who went home consistently on time. Also, disciplined people who stayed late from time to time.
In disciplined shops, I’ve seen shared skepticism about the completeness, accuracy, or relevance of requirement statements, acceptance criteria, or a “definition of done”. Amidst optimism, I’ve noticed a suspended certainty about whether things were really done.
Disciplined shops often do frequent bursts of shallow, non-invasive interactive testing near the coal face, to help confirm that what the programmers were doing is reasonably close to what they intended to do.
I’ve seen project managers provide support staff, including people to set up test systems, to help keep track of the backlog, and a group administrator to help the manager in acquiring resources.
I’ve seen frequent building, to make builds for deep testing and bug fixes available at the drop of a hat. But I’ve also seen relatively infrequent yet still reliable building, too.
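
A couple of the items above (inline assertions, unwanted-change detectors, programmed output checks) are easy to picture in code. Here is a minimal sketch in Python, not a recommendation of any particular tool. Everything named in it is hypothetical and invented for illustration: the `normalize_order` function stands in for "the code being changed", and the sample inputs stand in for cases someone has already looked at and accepted.

```python
import hashlib
import json


def normalize_order(order: dict) -> dict:
    """Hypothetical function under change: tidy up an order record."""
    # Inline assertion: checked in ordinary runs, skipped under `python -O`,
    # loosely analogous to an assertion compiled only into debug builds.
    assert order["quantity"] > 0, "quantity must be positive"
    return {
        "sku": order["sku"].strip().upper(),
        "quantity": order["quantity"],
        "total_cents": order["quantity"] * order["unit_cents"],
    }


def fingerprint(output: dict) -> str:
    """Stable digest of an output, so any difference shows up as a mismatch."""
    canonical = json.dumps(output, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def detect_unwanted_change(baseline: dict, inputs: dict) -> list:
    """Return the names of cases whose output no longer matches the baseline."""
    return [
        name
        for name, order in inputs.items()
        if fingerprint(normalize_order(order)) != baseline.get(name)
    ]


# Hypothetical sample cases, chosen only for illustration.
SAMPLE_INPUTS = {
    "simple": {"sku": " ab-1 ", "quantity": 2, "unit_cents": 150},
    "single": {"sku": "zz-9", "quantity": 1, "unit_cents": 999},
}

# Record a baseline once, from behavior someone has examined and accepted.
BASELINE = {
    name: fingerprint(normalize_order(order))
    for name, order in SAMPLE_INPUTS.items()
}

# After a later change to normalize_order, rerun the detector.
changed = detect_unwanted_change(BASELINE, SAMPLE_INPUTS)
print(changed)  # an empty list means no recorded output has changed
```

A check like this can only report that some recorded output is different from before. Deciding whether the difference is a problem, and noticing problems the recorded cases never touch, still takes a person.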

These are ideas and practices I’ve seen people applying to help them keep on track while building products. Most or all of these things would be done by the developers in collaboration with people working reasonably close to them (some of those people might be testers, and others might not be).

Each item on the list lends a kind of discipline to a development process. Each one represents something people might mean when they murmur something vague about “building quality in”. They’re heuristics, not rules. No one did all of them. I’ll bet you’ve got a ton of stuff on your list that’s missing from this list. Notice, too, how each item above could represent disciplined action in one context and a lapse of discipline in some other context.

Discipline doesn’t have to be burdensome, bureaucratic, or otherwise slow. Informal actions can support discipline, and help people find out where they might need to apply discipline. Remember, according to Chambers, discipline is about self-control and an ordered way of life; the self-control part suggests that discipline comes from within, rather than being imposed from outside.

Some forms of discipline might feel slow to some, at first, but prudent driving feels slow to people who are used to driving recklessly. When we’re driving, we almost always drive more slowly than we could possibly drive. Driving faster than that increases the risk that we’ll arrive late—or not at all.

Some of the discipline-related activities above represent some form of testing; others don’t. However, the processes of building a product are very different from the processes of experiencing a product. Bugs, especially of the latter kind, can elude even a disciplined development process. Accordingly, it makes sense for there to be different kinds of testing: testing for examining a product as it’s being built; and testing for obtaining experience with the built product.

So when builds are available, it’s probably wise to do some periodic deeper testing, some of it focused on potential, reasonably foreseeable, undesirable effects and side effects of a change—the risk of regression. That regression testing can be far better targeted when the product has been carefully built and already tested to some degree.

Deep testing doesn’t have to happen on every build; indeed, it probably shouldn’t. In lots of places, it can’t. Testing for hidden, rare, subtle, intermittent, emergent bugs tends to take time—the kind of time that can interrupt or slow down development. It can take a while to set up data and tools for deep testing. When systems have complex interactions, problems emerge at the interfaces between things that worked fine on their own. Working out those interactions and studying them in a search for problems can take time. That time might be worthwhile when safety or health or money are on the line. If there’s discipline in the building, the rewards of testing a build deeply tend to dominate the risk of skipping a few well-controlled builds.

It can help for deep testing to be done by people at some critical and even social distance from the people who are changing the product. Risk is a big deciding factor on that score, including the risk of regression.

And there’s the rub. In many organizations, people don’t mandate, or foster, or do well-disciplined work; or they exercise discipline in a very shallow way, cherry-picking one or two items from the list above, and ignoring the others. In such organizations, it seems as though the object is for the developers to write code, rather than to write code that works.

But perhaps, triggered by subconscious recognition of the risk of regression, managers (and, often, testers) feel compelled to do an overwhelming amount of expensive work: sitting at the keyboard and repeating every scripted test procedure that has been performed before, as quickly as possible. When you ask them why, they often reply, “because the developers have no idea of what might be affected by this change.” Then some of them proceed to convert those scripted procedures into automated scripted procedures, whereupon they gain a second undisciplined development project and a new maintenance nightmare. And they feel even more overwhelmed.

If someone feels overwhelmed, that’s a sign that there’s probably something overwhelming going on.

If the developers really do have no idea about what might be affected by change, then that’s a problem—one that the organization should definitely address. It’s like the principle that you shouldn’t try to automate a process that you don’t understand; when you’re working with something important, you shouldn’t rush to change it unless and until you’ve got a reasonably good idea of the extents and effects and risks of the change, and how to manage them.

Now: there’s a problem here for testers. Testers don’t design, write, or fix the code. Many testers don’t have significant programming experience; of the few who do, few have experience with writing production code. Testers don’t manage the project, and very few testers indeed have been project managers. Testers don’t manage the developers. In light of that, it’s inappropriate, in my view, for testers to tell programmers and managers how to do their jobs. Testers cannot and should not try to force, or enforce, discipline.

It’s quite reasonable, though, for testers to report on problems with the product. It’s reasonable for testers to identify patterns of problems related to particular coverage areas or quality criteria. It’s reasonable for testers to report on patterns of regression-related problems.

It’s also reasonable for testers to report on where testing time is going. If investigating and reporting shallow bugs is dominating testing work, testers will obtain less thorough coverage of the product. Developers and managers need to be aware of that. If troubleshooting and maintenance of automated checks is swamping the testers’ ability to gain critical experience with the product, that’s noteworthy; that work will displace the testers’ opportunities to learn about the product deeply, and perform new experiments on it. Things that slow down testing and make it harder allow deeper and possibly more dangerous bugs to hide and survive.

That’s why it’s important for testers to learn the skills of analyzing and describing the state of the product, the state of the testing, and the quality of the testing, including problems that threaten any of these things. It seems that managers and developers are often unaware of problems of lapsed discipline. Testers shouldn’t be trying to manage the project, but they can shine light on the problems.

Obsession with regression testing is a hint that something else might be amiss in the process that leads to it. Sure, it’s a good idea to do some testing after a change. But it’s a lot less expensive to test after a change when people have been testing during the change.

Discipline is a heuristic for reducing the risk of regression and the need for regression testing. When people apply discipline, the effects of change tend to be better known, the code tends to be cleaner, the feedback loops get faster, and the risks tend to be lower—and deep testing can become targeted on the risk, faster, cheaper, and deeper—helping to find hidden problems that matter.


1 reply to “Regression Testing and Discipline”

Testing Bits: 361 – October 4th – October 10th, 2020 | Testing Curator Blog

October 11, 2020 at 10:08 am

[…] Regression Testing and Discipline – Michael Bolton – https://www.developsense.com/blog/2020/10/regression-testing-and-discipline/ […]
