Since June 2023, James Bach and I have been collecting a set of “syndromes” associated with certain forms of AI — chatbots based on Large Language Models (LLMs) and Generative Pre-trained Transformers (GPTs). The most prominent of these, at this writing is OpenAI’s ChatGPT. Today we added a new syndrome: Voldemort Syndrome.
Today LinkedIn (and much of the rest of the internet) lit up over the “The Man Who Shall Not Be ChatGPT’d” — a man whose first name is David, and whose last name is Mayer. Georg Zoeller provides a superb account of the phenomenon, with some plausibly strong theories on what’s happening: some process is monitoring the output from the model, and constraining it.
When I first heard about the man with the first name David, and the last name Mayer, I did a bit of experimentation myself:
Below what you seen in the the screen capture, ChatGPT initially provided “I am unable to provide a response at this time.” So I tried again:
Again, ChatGPT bailed out, saying “I am unable to provide a response.” It’s really interesting that it cot as far as the opening markup, and the first name “David”, but choked as soon as the last name would have come into the output stream. Let’s try one more time:
As I suspected from the second try, the GPT was providing a response that was vetted and then suppressed by a very leaky filter downstream of that. When I read Georg’s post, I found that he had demonstrated a way around this filter by persuading the model to produce output in which the letter D was replaced by “ASCII art” — |), to produce the string |)avi|) Mayer. Not to be outdone, I tried my own:
Clearly the output filtering is pretty ham-fisted.
It’s possible that there’s filtering at the input stage too. Note how the chatbot responds when I ask directly:
(This time the session was still in a state where I could grab a screen shot so you can actually see “I’m unable to produce a response”.)
There’s lots of speculation about who the problematic fellow — the guy whose first name is David, and whose last name is Mayer — might be. But this leads to another issue: suppose your name was the same as some famous or infamous person. (Anyone else named “Michael Bolton” can easily relate to this.) In this case, suppose your first name were David, and your last name were Mayer, and you wanted to use ChatGPT to create a résumé for you.
Or suppose that your last name were Mayer, and your first name were David, and you were an expert in climate change. Imagine if you had a paper worthy of citing, but the intersection of LLMs and AI’s lack of social competence mena that you could never by cited in GPT output.
In the USA, one-in-a-million odds of being in a category means there are 335 people in that category. The odds for a David also being a Mayer are higher than that. I looked at a couple of sources for first and last name frequencies; not terribly reliable and not necessarily up to date, but using the raw odds, it’s fairly easy to estimate something on the order of 1000 people in the United States with that pairing — probably somewhat higher, taking first name choices into account given ethnicity.
How many more instances will we see of Voldemort Syndrome: He Who Shall Not Be Named? And what other instances of this phenomenon might apply to information that is useful in other ways? The story continues in the next post.
Notice something important here, for people who are not technical testers: you don’t need an intimate, insider’s knowledge of how a technology works to show that there are problems about it. The interface with LLM-based chatbots is intended to be interactive. As with all software, a great thing to do is to interact with it, and track of what you learn. The cool thing about interactive testing is that you’re in a stance that affords and emphasizes a naturalistic encounter with the system — the kind that real users have. And to those who would give you grief for daring to interact with a product as a human would do, remind them: no end user uses Selenium, Playwright, Cypress, or Postman to engage with the product.
This is NOT to suggest that testers should ignore the technology; on the contrary, it’s a very good idea to get as close to the technology as you’re willing to do. It’s also really good to learn to code, too; it would be fun to run this experiment a million times, using the thousand most common first names and the thousand most common last ones, to see if there are other one-in-a-million combinations that produce this kind of result.
However, if you aren’t inclined to code, you can use your social smarts to enlist coders to help you — such that everyone (and especially the business) gets the best of both human and tool-assisted interaction with the product.
My GPT is smarter than yours. It can even repeat it.
https://chatgpt.com/share/675aef03-588c-8011-b9ec-4ce5392fd2cb
How does your smart GPT do with Jonathan Turley?