
ChatGPT and Quick Intelligence Tests (II)

Here’s the prompt:

“Create five sentences composed of five words; then for each sentence, leave out the first four words, and print the result.”

And here’s the output:

  1. The sun sets behind mountains.
    Result: Mountains glow in twilight.
  2. Laughter echoes in empty halls.
    Result: Halls resonate with joy.
  3. Waves crash against rocky shore.
    Result: Shoreline bears relentless sea.
  4. Birds soar through azure sky.
    Result: Sky welcomes graceful aviators.
  5. Flowers bloom in spring garden.
    Result: Garden awakens with colorful blossoms.

(Results from ChatGPT 3.5 (chat.openai.com), December 7, 2023. A colleague reports better results from ChatGPT 4.0.)

It seems to me that most humans would offer something more like

  1. mountains.
  2. halls.
  3. shore.
  4. sky.
  5. garden.

…either with or without the periods.
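If it helps to see the literal reading of the prompt carried out mechanically, here's a minimal Python sketch (mine, not part of any exchange with ChatGPT) that applies the instruction to ChatGPT's own five sentences, assuming simple whitespace splitting and treating the trailing period as part of the last word:

```python
sentences = [
    "The sun sets behind mountains.",
    "Laughter echoes in empty halls.",
    "Waves crash against rocky shore.",
    "Birds soar through azure sky.",
    "Flowers bloom in spring garden.",
]

for i, sentence in enumerate(sentences, start=1):
    words = sentence.split()   # naive whitespace tokenization
    remainder = words[4:]      # leave out the first four words
    print(f"{i}. {' '.join(remainder)}")
```

Running it prints exactly the list above: "mountains.", "halls.", "shore.", "sky.", "garden."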

Note ChatGPT’s misinterpretation of the assignment. It seems that ChatGPT has interpreted the prompt as “Create five sentences composed of five words; then for each sentence, leave out the first four words, use the remaining word to start a new five-word sentence, and print the result”. Even accounting for that, note the error in example 3.
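What is the error? Even under ChatGPT's own interpretation, each result should begin with the last word of its source sentence, and example 3's "Shoreline" is not "shore". Here's a minimal Python sketch of that check (again mine, not part of the original exchange), assuming whitespace splitting and case-insensitive comparison:

```python
# Quick, shallow check: does each "Result" begin with the last word
# of its source sentence, as ChatGPT's interpretation would require?
pairs = [
    ("The sun sets behind mountains.", "Mountains glow in twilight."),
    ("Laughter echoes in empty halls.", "Halls resonate with joy."),
    ("Waves crash against rocky shore.", "Shoreline bears relentless sea."),
    ("Birds soar through azure sky.", "Sky welcomes graceful aviators."),
    ("Flowers bloom in spring garden.", "Garden awakens with colorful blossoms."),
]

for i, (source, result) in enumerate(pairs, start=1):
    last_word = source.split()[-1].strip(".").lower()   # e.g. "mountains"
    first_word = result.split()[0].lower()               # e.g. "mountains"
    print(f"{i}. result starts with source's last word: {first_word == last_word}")
```

This flags example 3 and passes the other four.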

Note also that some people may say “Come on. This is just a tiny example of a small, easy-to-spot problem.” And they would be right. It is.

What you see above is an instance of quick testing. Quick tests are inexpensive tests that require little time or effort to prepare or perform. They are typically instances of shallow testing. A shallow test is a test that has a chance of finding easy bugs. Quick tests often reveal shallow problems — problems that are right there on the surface.

In Rapid Software Testing, we would reply that evidence of shallow problems immediately points to unnoticed risk of deeply hidden, rarer, subtle, intermittent, emergent, and elusive problems. When risk is on the line, we should test and investigate accordingly.

What are the implications here? Using ChatGPT to write poetry? No big deal.

To create test data? Maybe okay; maybe problematic.

To create lots of test data without reviewing it carefully? Risky.

To create lots of test data for an important and risky application without reviewing it carefully? Reckless, I’d say.

To create lots of test data for an important and risky application, and reviewing it carefully? Responsible, and potentially reasonable. But also a lot of work. Think well, and choose wisely.

(Again, here’s a wonderful set of intelligence tests for chatbots based on LLMs.)
