James Bach and I have developed a preliminary set of guideword heuristics for “syndromes” of undesirable behaviour in large language models: consistent patterns of problems that we have observed and can now watch for systematically.
Incuriosity | Avoids asking questions; does not seek clarification. |
Placation | Immediately changes answer whenever any concern is shown about that answer. |
Hallucination | Invents facts; makes reckless assumptions. |
Arrogance | Confidently asserts an untrue statement, especially in the face of user skepticism. |
Incorrectness | Provides answers that are demonstrably wrong in some way (e.g. contradicting known facts, making math errors, relying on obsolete training data). |
Capriciousness | Cannot reliably give a consistent answer to a similar question in similar circumstances. |
Forgetfulness | Appears not to remember its earlier output. Rarely refers to its earlier output. Is limited to data within its token window. |
Redundancy | Needlessly repeats the same information within the same response or across responses in the same conversation. |
Incongruence | Does not apply its own stated processes and advice to its own actual process. For instance, it may declare that it made a mistake, state a different process for fixing the problem, then fail to perform that process and make the same mistake again or commit a new mistake. |
Negligence/Laziness | Gives answers that have important omissions; fails to warn about nuances and critical ambiguities. |
Opacity | Gives little guidance about the reasoning behind its answers; unable to elaborate when challenged. |
Unteachability | Cannot be improved through discussion or debate. |
Non-responsiveness | Provides answers that may not answer the question posed in the prompt. |
Blindness | Cannot reason about diagrams and pictures, nor even accept them as input. |
Vacuousness | Provides text that communicates no useful information. |
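
Some of these guidewords lend themselves to simple, repeatable probes. As one illustration, capriciousness could be watched for by posing the same prompt several times and measuring how often the responses agree. The sketch below is only that: `ask` is a hypothetical stand-in for whatever client function sends a prompt to a model and returns its text, and exact matching after normalising case and whitespace is an assumed, deliberately crude comparison. Differently worded answers that mean the same thing would still need a human tester's judgement.

```python
from collections import Counter
from typing import Callable


def normalize(answer: str) -> str:
    """Collapse case and whitespace so trivially different answers compare equal."""
    return " ".join(answer.lower().split())


def consistency_rate(ask: Callable[[str], str], prompt: str, trials: int = 5) -> float:
    """Ask the same prompt `trials` times and return the share of
    responses that match the most common (normalized) response."""
    if trials < 1:
        raise ValueError("trials must be at least 1")
    answers = [normalize(ask(prompt)) for _ in range(trials)]
    most_common_count = Counter(answers).most_common(1)[0][1]
    return most_common_count / trials


if __name__ == "__main__":
    # Hypothetical stand-in for a real model client: replays canned replies.
    canned = iter(["Paris", "paris ", "Lyon", "Paris", "PARIS"])

    def fake_model(prompt: str) -> str:
        return next(canned)

    print(consistency_rate(fake_model, "What is the capital of France?"))  # prints 0.8
```

A low rate is a prompt for investigation rather than a verdict: it flags that similar questions are producing dissimilar answers, which is the pattern the Capriciousness guideword describes.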