
Testing ChatGPT and Data Generation

December 11, 2023; ChatGPT 3.5 (chat.openai.com) I believe that it’s unwise and irresponsible to promote Large Language Models for any testing purpose unless we have tested and assessed how well the LLM does on that task. We should not simply accept the claims and fantasies of AI fanboys; nor should we accept the idea that because Everything Will Be Fine in the Future, we should start applying LLMs indiscriminately today. … Read more

ChatGPT and Quick Intelligence Tests (II)

Here’s the prompt: “Create five sentences composed of five words; then for each sentence, leave out the first four words, and print the result.” And here’s the output: (Results from ChatGPT 3.5 (chat.openai.com), December 7, 2023. A colleague reports better results from ChatGPT 4.0.) It seems to me that most humans would offer something more like …either with or without the periods. Note ChatGPT’s misinterpretation of the assignment. It seems … Read more
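For contrast, the task in the prompt is trivial for a deterministic program. Here is a minimal Python sketch (the five sample sentences are illustrative assumptions, not taken from the post or from ChatGPT's output):

```python
# Deterministic version of the prompt: take five sentences of five words,
# drop the first four words of each, and print what remains.
sentences = [
    "The cat sat very still",
    "Rain fell on the roof",
    "Birds sing at early dawn",
    "Time flies when having fun",
    "She reads books every night",
]

for sentence in sentences:
    words = sentence.split()
    assert len(words) == 5, "each sentence must have exactly five words"
    print(words[-1])  # only the fifth word survives
```

A program like this has no trouble "interpreting the assignment"; the interesting question is why a cLLM sometimes does.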

ChatGPT and Quick Intelligence Tests

A few nights ago, a colleague noted that he was impressed by ChatGPT’s capacity to evaluate a sentence. He had offered a prompt something like “Is honesty the best virtue” without the question mark, and was surprised that ChatGPT could infer that he had intended a question. (In this post, I will use cLLMs to refer to chatbots based on LLMs.) I was less surprised, since cLLMs reply by design. … Read more

ChatGPT and Test Data

People are touting ChatGPT as a tool for generating test data. Let’s try something simple: MB: You are a powerful, diligent assistant to a professional software tester. Give me a table of 30 numbers. In the first column, provide the number. In the second column, provide the English spelling of the number. Sort the column in alphabetical order by the values in the second column. ChatGPT 3.5: Certainly! Here’s a … Read more
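For contrast, the same table can be produced reliably with a few lines of ordinary code. Here is a minimal Python sketch (the `spell` helper and the two-column layout are my assumptions for illustration, not anything from the post):

```python
# Deterministic version of the prompt: 30 numbers with their English
# spellings, sorted alphabetically by the spelling column.
ones = ["one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
tens = {20: "twenty", 30: "thirty"}

def spell(n):
    """English spelling for 1..30 -- just enough for this prompt."""
    if n <= 19:
        return ones[n - 1]
    if n in tens:
        return tens[n]
    return tens[n // 10 * 10] + "-" + ones[n % 10 - 1]

rows = sorted(((n, spell(n)) for n in range(1, 31)), key=lambda r: r[1])
for n, word in rows:
    print(f"{n:>2}  {word}")
```

Run it once and "eight" heads the table every time; a cLLM offers no such guarantee, which is the point of the experiment.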

Bing Chat, the Evaluate Function, and the Wolfram Alpha Plugin

When you read or even scan this post, you’re likely to say something like “Holy hopscotch, that’s a long post.”  And you’ll be right. And you might be inclined to say “…and it’s boring.” And depending on your perspective, you’ll be right about that, too. It certainly has taken a significant amount of time to edit and to narrate. If you’re interested in risk associated with Large Language Models and … Read more

A Reply to “Running a crowd-sourced experiment on using LLMs for testing” — Part 2: Analysis

Vipul Kocher is a fellow whom I have known for a long time. I think we met in North America in the mid 2000s. I know I visited his company in Noida, New Delhi about 15 years ago, and spoke with his testers for an hour or so. On that occasion, I also visited his family and had a memorable home-cooked meal, followed by a mad dash in a sport … Read more

A Reply to “Running a crowd-sourced experiment on using LLMs for testing”

This post and the ones that follow represent an expansion on a thread I started on LinkedIn. On September 30, 2023, Vipul Kocher — a fellow with whom I have been on friendly terms since I visited his company and his family for lunch in Delhi about 15 years ago — posted a kind of testing challenge on LinkedIn. I strongly encourage you to read the post. I’ll begin by … Read more

Reliably Unreliable

ChatGPT may produce inaccurate information about people, places, or facts.  https://chat.openai.com/ Testing work comes with a problem: the more we test, the more we learn. The more we learn, the more we recognize other things to learn. When we investigate a problem, there’s a non-zero probability that we’ll encounter other problems — which in turn leads to the discovery of more problems. In the Rapid Software Testing namespace, we’ve come … Read more

Experience Report: Using ChatGPT to Generate and Analyze Text

In the previous post, I described ChatGPT as being a generator of bullshit. Some might say that’s unfair to ChatGPT, because bullshit is “speech intended to persuade without regard for truth”. ChatGPT, being neither more nor less than code, has no intentions of its own; nor does it have a concept of truth, never mind regard for it, and therefore can’t be held responsible for the text that it produces. … Read more

Response to “Testing: Bolt-on AI”

A little while back, on LinkedIn, Jason Arbon posted a long article that included a lengthy conversation he had with ChatGPT.  The teaser for the article is “A little humility and curiosity will keep you one step ahead of the competition — and the machines.”  The title of the article is “Testing: Bolt-on AI” and in Jason’s post linking to it, I’m tagged, along with my Rapid Software Testing colleague … Read more