Testing, Now More Than Ever

To all managers and executives: fashionable though it may be these days, this is not a good time to be laying off testers, or to be leaving them unprepared and untrained.

Software can be wonderful. It can help us with all kinds of stuff, unimaginably quickly and at enormous scale. This sounds very appealing. Skilled testers, at least, have always known that we must treat output from machinery with appropriate skepticism — because machinery and software are created by people, and people are fallible. The consequence of fallibility is that bad stuff and harm can arise at the same speed and scale as the good stuff.

When someone has written a computer program or an algorithm intentionally, we must assume that there is a possibility of problems that are hidden, subtle, intermittent, and emergent. Those problems can exist even when the programmer has been diligent about checking output from the functions in the code.

Testing — evaluating a product by learning about it through experiencing, exploring and experimenting — can help reveal such problems. The product can be like a black box to the testers, even when its internals are legible to the programmers. When problems are found, the authors and maintainers can intentionally change the code and the checks. So far so good.

Then came machine learning approaches, in which no human writes an algorithm intentionally. Instead, the training process creates zillions of algorithms, and then selects one that best fits the training data. Machine learning models are vulnerable to errors because of missing, biased, or bad data, or because of overfitting, among other things.

When no one has written a computer program or an algorithm intentionally, the risk of problems that are hidden, subtle, intermittent, and emergent is even greater.

We can always demonstrate that the chosen model can work; a single demo is enough for that. In order to find instances where the model might not work, or where it doesn’t work, we need empirical, experiential, experimental testing — and probably a lot more of it than before, because we can’t be sure of the factors that led to a particular model being chosen. Such programs are black boxes to everyone. The “developer” doesn’t have human intention, and the code isn’t amenable to review.

With feedback from humans who can identify problems, we might be able to tune the model, or put guardrails around it. If we wish to attend to risk, those modifications require still more testing.

With Large Language Models, anything — not just data, not just code, but anything, including “scientific” papers — can be output from an opaque process. This chilling set of examples collected by Gary Marcus should prompt us to worry about science itself being undermined; “flooding the zone with shit”, as certain sociopathically inclined people describe it.

Creating limitless amounts of output by machine might seem wonderful to some, but the machinery is agnostic as to whether the output is bad or good. Even with aid from sophisticated tools, it can take orders of magnitude more time to evaluate output critically than to produce it uncritically. That’s a polite way of expressing Brandolini’s Law.

Any complex, responsibly developed sociotechnical system that entails risk needs testing and critical evaluation, from conception to final output. Today’s technology is capable of producing output at a scale sufficient to undermine institutions like science and journalism. Although imperfect, these institutions have helped us to sort out who and what to trust.

So again, to the managers: now is a bad time to be laying off skilled testers, or to be leaving testers unprepared, untrained, or uncritical. It is dangerous and irresponsible to believe that algorithmic output checking is enough to address business and social risk. It’s reckless to believe that the solution to unreliable software is even more unreliable software. We need skilled testing, critical thinking, and strong ethics more than ever.

1 reply to “Testing, Now More Than Ever”

  1. Hi Michael

    Reading this led me to reflect on the fact that so many in our profession have devoted so much effort to presenting (and misrepresenting) testing as a deterministic and mechanistic process. “Crank handle A according to Process B with inputs from source C and behold the wonder of my metrics”…

    I have spent my entire career dealing with the mindset that this sort of thinking has created in the organisations I have worked for. Sometimes successfully; other times, not so much.

    I do think that as a profession we are so much more vulnerable to the challenge posed by the uncritical promotion of LLMs and their supposed testing benefits, simply because of the persistent prevalence of this mechanistic thinking.

    I know that this is not a new insight by any means, but it is what leaped to mind when I read your post.
