
A Super-Quick Guide to Evaluating “AI” Claims

The producer of practically every product or service on the market seems desperate to surf the AI hype wave these days. The big thing, it seems, is to claim that the product is “AI-enabled” or has “AI features”.

Here’s a quick and (mostly) easy way to evaluate claims about AI products. (I’ll say “product” to save saying “product or service” every time.)

  1. Try replacing “AI” with “software”, and see if there’s any material difference in what’s being said, offered, claimed, or advised.
  2. Now try replacing “software” from Step 1 with “software that was written without a skilled human designing, reviewing, or even understanding the resulting algorithm”, and ask yourself if there’s anything you might be concerned about.
  3. Finally, take the result from Step 2 and add “and without monitoring or reviewing the output from the algorithm”.
  4. Take the results from the first three steps and ask “What could go wrong, and how would we find out about it?”
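
Here’s the whole exercise as a toy script, purely for illustration; the claim is invented, and mechanical substitution is no substitute for actually thinking it through:

```python
# Purely illustrative: the three substitutions above, applied mechanically
# to an invented vendor claim.
claim = "This product uses AI to prioritize your test cases."

# Step 2's replacement text, from the checklist above.
unskilled = ("software that was written without a skilled human designing, "
             "reviewing, or even understanding the resulting algorithm")

step1 = claim.replace("AI", "software")
step2 = claim.replace("AI", unskilled)
step3 = step2.rstrip(".") + (", and without monitoring or reviewing "
                             "the output from the algorithm.")

for label, text in (("Step 1", step1), ("Step 2", step2), ("Step 3", step3)):
    print(f"{label}: {text}")
```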

If the answer to (4) is “nothing that causes harm, damage, or diminished value; nothing that involves loss of life, health, safety, money, data, reputation, freedom, opportunity, rights, or brings significant risk related to any of those; and if there were such problems, we’d know right away, before it’s too late” then we might be okay, and not on dangerous ground.

Finally, Step 5.

  5. To what degree is the answer from (4) supported by experiment, experience, and exploration from a critical stance? (That is: has the claim been tested?)

If you’d like to probe a little deeper, you can ask other questions too:

  • What parts of the product actually involve any kind of “AI” or machine learning algorithm, and what parts don’t? Is “AI” being used as a marketing term for “reasonably sophisticated software as usual”?
  • What flavour(s) of machine learning were applied? Evolutionary algorithms? Bayesian analysis? Connectionism? Symbolic manipulation?
  • What was the training data and where did it come from? What biases exist in it? (Don’t bother asking “Is the training data biased?” It’s all biased in some sense. For instance: everything in the training data has been converted into machine-readable strings of bits. A machine learning model designed to select images of dogs from a large set of images has no experience of dogs in the world. It only processes bit patterns that it has been fed, and ultimately that’s based on some human’s choices. The model only “recognizes” images that are consistent with those that some human has evaluated as being consistent with dogs, or with models of dogs. The sketch after this list illustrates the point.)
  • To what degree is training data from the past relevant to what happens, or to what might happen, in the future? (All of the model’s training data comes from the past. How far in the past?)
  • Who will be affected by this product? Who stands to gain, and who might lose?
  • How might this product help to enhance expertise? To erode it?
  • To what degree might this pleasing demonstration be insufficient evidence of reliable behaviour?
  • Do we have to give the product the correct answer beforehand? To what degree? How much? How correct?
  • How much damage might we have to repair?
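
On the bit-patterns point above: here’s a minimal sketch of what a model actually receives. (The file name dog.jpg is hypothetical; any image will do.)

```python
# Minimal sketch: by the time a "dog recognizer" sees a photo, the photo
# is nothing but numbers. The file name here is hypothetical.
from pathlib import Path

raw = Path("dog.jpg").read_bytes()

print(raw[:16])        # e.g. b'\xff\xd8\xff\xe0...' -- JPEG header bytes
print(list(raw[:16]))  # the same bytes as plain integers, 0 to 255

# Everything the model "knows" about dogs is a function of patterns in
# numbers like these, gathered and labelled according to human choices.
```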

You may not get to those questions — or need to — before the first five steps, though.
