
AI and Rapid Software Testing

In our forthcoming book, Taking Testing Seriously: The Rapid Software Testing Approach, James Bach and I have included a chapter on AI. AI is fraught with risk, but writing about it is too.

All through its history (since the 1950s, NOT just since 2022 or 2012), “AI” has not been an engineering term, but a marketing term, without clear notions of what “artificial intelligence” really means. And all along, the field has been subject to spasms of economic and technological climate change — sudden changes of season, with increasingly hotter summers and colder winters. (It’s still blazing hot out there at the moment, but there are signs of a fall coming for those who have experience in watching the weather.)

How do we talk about AI for a book that will be in print (we hope!) for years? One way is to talk about magic.

Let’s assume someone develops a technology in the form of a magical tool that will do anything we ask it to. When we look through history and literature, we find stories about magical forces and the genies that wield them — and we find cautionary stories about the pitfalls. Those stories consistently remind us that great power to do good things comes with great power to do harm to ourselves or to others.

A magical system is one whose behaviour is neither controlled nor understood, nor otherwise known to be safe. Given that, a few heuristics:

  • If something matters very much, it’s probably not a good idea to rely on magic to get it.
  • If something matters very much, and you have no other way to get it, you might as well try magic.
  • If something matters very much, and you are trying to use a magic box to get it, you cannot ignore this question: is the magic box doing a good job?

A responsible approach to developing or using AI-based products and tools requires knowing enough about them to be aware of the risks and avoid the traps — which means you must learn how to test them, to look at them critically, and to experiment and obtain experience with them. That’s especially necessary when a culture becomes dominated by wishful thinking and hype.

It takes only a little testing to discover that the more lavish claims currently being made for generative AI aren’t supportable. Because of that, in the end, the hype will recede. As it does so, we believe it will be very interesting to observe some specific phenomena associated with historical patterns of magical thinking.

  • People making claims about saved time based on one part of the process, without actually measuring the amount of time that the whole process took. It takes no time for the magic box to type code. It takes a lot more time to review, evaluate, test, debug, fix, and revise the code that the magic box generated — if you review, evaluate, and test it seriously and responsibly. (There’s a small worked example after this list.)

    As experienced programmers have been saying forever, the analytical, social, and problem-solving parts of the job are the important ones. The typing part of programming time is, relatively speaking, trivial. It’s even possible that typing provides a bit of useful friction as programmers wrap their minds around problems and how to solve them. Typing the code certainly affords familiarity with the code that’s being produced—which is skipped when the code is being extruded from GenAI.

    Meanwhile, from a testing perspective, going quickly but not too quickly to observe and evaluate is crucial. One of our most important testing tools is the pause.
  • People basing their assessments of “saved time” on feelings, rather than actual measurement of time. I know that when I’m in The Zone, developing something often feels like it’s taken practically no time. When I look at the clock, though, it’s a different matter. (And here’s a study that bears that out.)
  • The endowment effect, in which we tend to ascribe value to something that we’ve got — which in turn leads to sunk cost bias. For less experienced programmers, GenAI affords an IKEA effect for vibe-coded software (“I didn’t make it, but I assembled it, which makes me feel like I made it”, which bestows the endowment effect).
  • People insisting that the magic box did a good job and saved them time — even if it didn’t do a good job or didn’t save time for those other shlubs who “don’t know how to use it”.
  • Obliviousness to context factors that frame safe and unsafe applications of the things that come out of the magic box. For instance, failing to distinguish between programs generated to help a non-programmer organize a recipe book and programs that help to run businesses or governments.
  • Gradually increasing awareness of the Large Language Mentalist Effect, wonderfully named and described here.
  • As people are affected by hazards and problems, growing recognition that as work is delegated to the magic box, the need for skepticism and scrutiny of its output goes up. One component of this is that assessment of the magic box’s behaviour requires expertise in the domain into which the magic box is being placed. Harry Collins’ fancy term for this is the Legitimate Locus of Interpretation. Claims that a particular job will be made obsolete are ridiculous when they’re being made by people who don’t understand that job. A long time ago, Jerry Weinberg pointedly warned “Never automate a process you don’t understand.”
  • Gradually increasing awareness of patterns of failure in the magic boxes — patterns that can help us to anticipate, test for, recognize, and mitigate the possibility of trouble.
  • Accommodation of the magic boxes in a more sober, pragmatic, limited, and ultimately safer way. That is: understanding of when it might be okay to use magic, and when to avoid it.
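To make the first point above concrete, here’s a minimal sketch in Python, with entirely hypothetical numbers, of what measuring the whole process (rather than just the typing) might look like. Even when the magic box reduces typing time to nearly nothing, the end-to-end time can grow once review, evaluation, testing, debugging, and revision are counted.

    # Hypothetical timings, in minutes, for one small coding task.
    # These numbers are illustrative assumptions, not measurements.
    manual = {"analyze": 60, "type": 45, "review": 15, "test": 30, "debug": 20}
    genai = {"analyze": 60, "type": 2, "review": 45, "test": 45, "debug": 40}

    def total_minutes(phases):
        # The whole process is the sum of every phase, not just the typing.
        return sum(phases.values())

    print("manual, whole process:", total_minutes(manual), "min")  # 170 min
    print("genai, whole process: ", total_minutes(genai), "min")   # 192 min
    print("typing time 'saved':  ", manual["type"] - genai["type"], "min")  # 43 min

Measured over the typing phase alone, the magic box looks like a big win; measured over the whole process, it may be a net loss. The only way to know is to measure the whole thing.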

The hype will go away; hype always does. But the technology—in various forms—will not. Responsible people will need to know how to use the technology wisely. That fact and our work on the book have prompted (no pun intended) us to create new material for our classes, including RST: Testing, Automation, and AI (description here) and Rapid Software Testing Focused: AI (here).

All along, our goals have been to help people to learn strategies for testing any kind of software or technological product, and for using tools powerfully and responsibly. That now includes testing products that incorporate AI, and using AI-based tools for testing, too.

We’re not new to this. In 2018, I was working on a project that incorporated extensive machine learning algorithms. James first used an AI-based product as a target for our Rapid Software Testing Applied classes in 2020. We convened three sessions of Workshops on AI and Testing (WAIT I, II, and III), exchanging and developing ideas and experiences with developers and other testers.

And all the way along, we’ve been testing! We’ve also been incorporating reports on our findings and our approaches in classes, blog posts, webinars, and conference presentations.

Expect that to continue.
