
“Should Sound Like” vs. “Should Be”

Yet another post plucked and adapted from the walled garden of LinkedIn

“What the large language models are good at is saying what an answer should sound like, which is different from what an answer should be.”

—Rodney Brooks, https://spectrum.ieee.org/gpt-4-calm-down

Note for testers and their clients: the problem that Rodney Brooks identifies with large language models applies to lots of test procedures and test results as well.

People often have illusions about what scientific experiments “should” look like: clean, tidy, deterministic, algorithmic, routine. In his work on the public understanding of science, Harry Collins uses the metaphor of the ship in the bottle.

When we look at a ship in a bottle, we don’t see the mess, the sawdust, the spilled paint, the tangled lines, the snapped hinges, the failed attempts. We just see the ship in the bottle on the mantelpiece: neat, tidy, elegant. “Jeez, how did that get in there?” “Dunno. Cool, though, isn’t it?”

Testers are often pressured to prepare formally scripted test cases or automated checks quickly. Worse, testers frequently put that pressure on themselves. This makes testing work visible. Test scripts and checks flying by on the screen are easy to see. The visible parts of testing aren’t the important parts, though.

The most important parts of testing are the cognitive parts: risk analysis, critical thinking, probing complexity, conjecturing about things that could happen… These are mostly invisible, so testers are drawn to producing legible artifacts. (Legibility is a really important idea. Read more about it here, and much more here.)

As James Bach and I have said, it’s tempting to look at these artifacts and say “Lo! There be testing!” The unhappy consequence of a primary focus on artifacts is testing that is inappropriately, excessively, and prematurely formalized. That kind of testing is slow and ponderous — and it tends to miss a lot of bugs.

Investigating something new, like a new product, a new feature, or a new risk, involves a lot of exploratory, experimental work to discover where problems and still more risk might be hiding. Any process that we perform well and efficiently starts off by being done, to some degree, poorly and inefficiently.

We must prepare our minds, learn the product, and learn how to test the product if we want to find the big problems. This is a somewhat messy process. It involves some degree of confusion, turmoil, variation, and unpredictability. There aren’t procedural formulas for it—and bugs don’t follow procedures.

If we insist that testing should look like something in particular, what should bugs look like?

The interview that triggered this post is brilliant, and well worth a read.
