Error messages are displayed by programs in response to unusual or exceptional conditions that can’t be rectified by the program itself. A well-written program should post very few error messages indeed; instead, absolutely whenever possible, the program should cope with the problem gracefully and continue without bothering the customer. By this yardstick, of course, most programs are poorly written.
For the purposes of this discussion, there are two classes of poorly written programs. First, there is the program that can’t remedy things on its own, or that needs so much hand-holding that it bothers its customers unnecessarily. Second, and the focus of this discussion, is the kind of program that encounters some real problem, but confuses or offends the customer by providing an inadequate error message.
Of course, the best error message is no error message at all. In the case where something has gone awry, a program should do everything within its power to remedy the situation at hand. For example, a program should never post a dialog saying that a file cannot be found unless the program has actually bothered to look for it. At a minimum, a program (that is to say, a programmer) should search all local hard drives for the missing file. If the program finds the file in an inappropriate place, the program should either update its own records to point to the file, or make a copy of the file in an appropriate place. There should be no need to disturb the customer in either case.
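The "look before you complain" rule can be sketched in a few lines. This is a minimal illustration, not a definitive implementation: the function name `locate_missing_file` and the idea of passing in a list of search roots are assumptions made for the example, and a real program would likely limit or prioritize the search so that it stays fast.

```python
import os
from pathlib import Path
from typing import Iterable, Optional


def locate_missing_file(expected: Path, search_roots: Iterable[Path]) -> Optional[Path]:
    """Return a usable path to the file, searching elsewhere if it has moved.

    Hypothetical helper: `search_roots` stands in for "all local hard drives".
    """
    if expected.is_file():
        return expected
    for root in search_roots:
        # os.walk ignores directories it cannot read, so the search keeps going.
        for dirpath, _dirnames, filenames in os.walk(root):
            if expected.name in filenames:
                return Path(dirpath) / expected.name
    # Only after a real search has failed is an error message justified.
    return None
```

If the function returns a path other than `expected`, the caller can quietly update its own records or copy the file into place; the customer needs to hear about it only when the result is `None`.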
If your program has to post an error message, don’t waste the customer’s time either before or after the error condition is detected. For example, an installation program should not begin copying files unless it is certain that the files will fit onto the destination disk. A simple set of calculations can determine whether there is adequate disk space, but most programs don’t even bother with this basic check. Just as bad, installation programs frequently refuse to proceed for lack of space, even when the files they are about to copy will simply overwrite existing ones.
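The "simple set of calculations" might look something like the sketch below. It uses Python's standard `shutil.disk_usage`; the function names are invented for the example. It assumes files are overwritten in place and ignores file-system allocation overhead, so a careful installer would add a safety margin.

```python
import shutil
from pathlib import Path
from typing import Iterable


def net_bytes_needed(sources: Iterable[Path], dest_dir: Path) -> int:
    """Extra bytes required to copy `sources` into `dest_dir`."""
    needed = 0
    for src in sources:
        needed += src.stat().st_size
        existing = dest_dir / src.name
        if existing.is_file():
            # Overwriting a file gives back the space the old copy occupied.
            needed -= existing.stat().st_size
    return max(needed, 0)


def has_room(sources: Iterable[Path], dest_dir: Path) -> bool:
    """Check free space on the destination before a single byte is copied."""
    return net_bytes_needed(sources, dest_dir) <= shutil.disk_usage(dest_dir).free
```

Crediting the space held by files that are about to be replaced avoids the second failure described above: refusing to proceed when the copy would in fact fit.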
Don’t depend on the operating system to handle things properly. Amazingly, after almost twenty years in the field, the DOS COPY and XCOPY commands don’t bother to check for disk space before the copy starts; instead, they begin copying blindly and hope that the destination disk doesn’t fill up before the operation is complete. Windows is no better; like DOS, it stupidly fails to check for sufficient disk space before performing a file copy. Worse, if you are copying a set of files, Windows will stop the process on the first error, will refuse to continue, and will forget your selection.
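By way of contrast, here is a rough sketch of a batch copy that does not give up on the first error and does not forget what it was asked to do. The function name `copy_all` and its return shape are assumptions for illustration only.

```python
import shutil
from pathlib import Path
from typing import Iterable, List, Tuple


def copy_all(sources: Iterable[Path], dest_dir: Path) -> Tuple[List[Path], List[Tuple[Path, str]]]:
    """Copy every file that can be copied; remember the ones that could not."""
    copied: List[Path] = []
    failed: List[Tuple[Path, str]] = []
    for src in sources:
        try:
            shutil.copy2(src, dest_dir / src.name)
            copied.append(src)
        except OSError as exc:
            # Keep the file and the reason, then carry on with the rest.
            failed.append((src, str(exc)))
    return copied, failed
```

Because the failures come back as a list, the program can report them once, together, and offer to retry just those files instead of making the customer rebuild the selection.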
When you write code, anticipate the error conditions and code around them. Try to fulfill the user's goal to the greatest degree possible, and don't view error conditions as catastrophic unless they are. Remember the program's state at the time that the error occurred, and permit the user to restore that state easily. Always write functions that return status codes, and return a unique error code for each error condition. At the point the status code is returned, there is typically quite a bit of information available that you can relay to people who are going to need to identify and fix the problem. On the other hand, remember that your program's internal errors are not the customer's concern, so don't overload or intimidate the customer. Make it clear that some information is for the customer to act upon, but that other information is there only to help the person that is helping her.
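One way to follow this advice is sketched below, under stated assumptions: the `Status` values, the `Result` type, and the `load_notes` function are all invented for the example. Each failure gets its own code, and the technical detail available at the point of failure is kept apart from anything the customer would be shown.

```python
from dataclasses import dataclass
from enum import IntEnum
from pathlib import Path


class Status(IntEnum):
    OK = 0
    FILE_NOT_FOUND = 101
    PERMISSION_DENIED = 102
    NOT_TEXT = 103
    OTHER_IO_ERROR = 199


@dataclass
class Result:
    status: Status
    text: str = ""
    detail: str = ""  # for support staff and maintainers, not for the customer


def load_notes(path: Path) -> Result:
    """Read a text file, returning a distinct status code for each failure."""
    try:
        return Result(Status.OK, text=path.read_text(encoding="utf-8"))
    except FileNotFoundError as exc:
        return Result(Status.FILE_NOT_FOUND, detail=repr(exc))
    except PermissionError as exc:
        return Result(Status.PERMISSION_DENIED, detail=repr(exc))
    except UnicodeDecodeError as exc:
        return Result(Status.NOT_TEXT, detail=repr(exc))
    except OSError as exc:
        return Result(Status.OTHER_IO_ERROR, detail=repr(exc))
```

The caller decides what the customer sees based on `status`, while `detail` can go to a log file or to a "details for support" section of the message.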
A well-constructed error message
One of the best error messages I have ever seen went something like this:
This was an error message from an applicant tracking system (called "Applicant Tracking System") that was designed for a personnel agency by an independent consultant in 1988. The message looked almost, but not quite exactly, as I’ve rendered it above. A significant difference is that the original message did not have a Windows look and feel, because this message came from a DOS program. I mention this because the author provided this detailed message even in the days of the 640K memory limit. The customers of this system were not experienced with computers, but even if they had been experts, the message would have been helpful.
Let’s look at this error message and compare it with the list of requirements above:
Now, by contrast, here are some examples of the very worst kinds of error messages. You’ll see that my examples are all from Microsoft software. Microsoft is not the only company that releases software with lousy user interfaces, but it certainly seems to have perfected the art of the irritating error dialog.
Duh. This message states something that is entirely obvious, and fails to state anything at all that is helpful. There is nothing here to remedy the customer’s problem or to help him through it. There is no information that would help even an imaginative tech support person to work through some possible solutions with the customer. The developer responsible for maintaining this code--typically not the person who wrote the original program--is not offered even a hint of what the problem is, or the error code returned by the called function. If more than one error condition posts this dialog, there's no way to tell which one caused the problem.
I have no comment on this message.
I have no comment on this message either. Although somehow this looks a little less severe than the last one.
You know more than you’re saying, don’t you? And by the way--restarting Outlook will help how, exactly?
Which applications? How will it be incompatible? Why didn’t you fix the problem? Thank God it doesn’t seem to be incompatible with non-existent applications.
"May" again. Is a component busy or missing, or is it neither? If a component is involved, which component? Is it busy? Or is it missing? And what is a component anyway? A file? If so, could we have the file name please?
Really? Really? Which action? Which action? What should I do to fix the problem? What should I do to fix the problem?
Nope, I don’t. I want you to find it.
Still won’t look for it, eh? In fact, I’ve forgotten the context in which I got this message, and so I’ve forgotten which application is involved. However, I do remember that it was unclear to me even at the time which application needed to be reinstalled.
Our systems for teaching programming almost never discuss error messages, or even error handling. How many programming books emphasize the importance of checking return codes from operating system or library functions, and handling errors gracefully? How many source code examples show even minimal error checking or commenting? How many programming books discuss even the most basic user interface issues, such as how to construct a useful error message?
Let's start with what is displayed to the world outside your program. Error messages are often less than helpful or useful because they’re written by people who have an intimate knowledge of the program. Those people often fail to recognize that the program will be run by other people, who don’t have that knowledge. Thus it is important that you consider the customer’s plight carefully when writing error-handling routines; that you involve someone other than yourself with the design and testing of the program; and that you provide each and every error message to someone else for review. The reviewer should not be an expert in the program. Your messages should be detailed and polite. They should not offend or frustrate the customer.
Write and test your program so that it will have to display as few error messages as possible. If your programming language provides debug-build validity checking like the C assert macro, use it; if you have to hand-roll validity checking yourself, do it. Walk through code in the debugger. Include features in the release version of the program, such as log files or verbose modes, to help with troubleshooting. Each condition in the program that has a chance of failure should return a distinct error code, and should display this code as part of the error message. The error code will not only help to narrow a problem down, but is also a good internationalization strategy; error codes will form a useful cross-check when the program is translated. Comment each status code as thoroughly as you can to make life easier for the maintenance programmer and for documenters, and use the header to help define a table of error status codes for technical support. Make sure that there is a mechanism to identify missing files, registry entries, and the like. Create error handling classes and functions to supply consistent, well-formatted error messages--and reuse them consistently. Use code review and walkthroughs with other developers and quality assurance to make sure that your program is readable, consistent, maintainable, and free of defects. Provide testers with tools or a test program that will allow them to view all of the error messages displayed by your program.
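Several of these recommendations (a commented table of codes, one shared formatting function, and a way for testers to see every message) can live in a single small module. The following is only a sketch: the codes, the wording of the messages, and the names `ErrorEntry`, `CATALOGUE`, `format_error`, and `dump_catalogue` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ErrorEntry:
    code: int
    customer_text: str  # what happened, in the customer's terms; may hold {placeholders}
    support_note: str   # background for tech support and maintenance programmers


# One table that also serves as the reference list for support and documentation.
CATALOGUE: Dict[int, ErrorEntry] = {
    entry.code: entry
    for entry in (
        ErrorEntry(2001, "The report file {name} could not be opened.",
                   "Opening the file failed; check the path and its permissions."),
        ErrorEntry(2002, "There is not enough room on {drive} to save your work.",
                   "The disk space check before saving failed."),
    )
}


def format_error(code: int, **details: str) -> str:
    """Build every message the same way, always including the error code."""
    entry = CATALOGUE[code]
    message = entry.customer_text.format(**details)
    return f"{message}\n(Error code {entry.code}. Please quote this code if you contact support.)"


def dump_catalogue() -> List[str]:
    """Let testers and translators review every message without triggering it."""
    entries = sorted(CATALOGUE.values(), key=lambda e: e.code)
    return [f"{e.code}: {e.customer_text} | {e.support_note}" for e in entries]
```

Because the code appears in every message, a translated message can still be matched against the original table, which is the cross-check mentioned above.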
Façade programming is a useful construction strategy. As the program is being constructed, write skeletons of each function. Until you have the internals of the function coded, simply have the function do nothing and return a positive return code. Define the return codes as symbols--constants or enumerated values. Later, when you begin to flesh out the function (and as you check return values at each stage), define distinct symbolic codes for each type of error.
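Here is a small sketch of what that can look like partway through construction. The order-processing functions and the `Status` names are invented for the example: `save_order` is still a skeleton that reports success, while `validate_order` has been fleshed out and has gained its own symbolic codes.

```python
from enum import Enum


class Status(Enum):
    OK = "ok"
    # Added when validate_order was fleshed out:
    EMPTY_ORDER = "empty order"
    BAD_QUANTITY = "bad quantity"


def save_order(order: dict) -> Status:
    """Skeleton: internals not written yet, so do nothing and report success."""
    return Status.OK


def validate_order(order: dict) -> Status:
    """Fleshed out: each kind of failure has its own symbolic code."""
    if not order.get("items"):
        return Status.EMPTY_ORDER
    if any(quantity <= 0 for quantity in order["items"].values()):
        return Status.BAD_QUANTITY
    return Status.OK


def process_order(order: dict) -> Status:
    """The caller checks every return value, even those of skeletons."""
    status = validate_order(order)
    if status is not Status.OK:
        return status
    return save_order(order)
```

Because `process_order` already checks what `save_order` returns, nothing in the caller has to change when the skeleton is replaced by real code that can fail.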
Programming is, of course, more complicated than ever. There are more technologies, more languages, and more different disciplines to master this week than there were last week. Developers are pressured to design too little, and to code too quickly. Each step of the development process is squeezed so that products can be released as quickly as possible. However, neither programmers nor managers should kid themselves; other parts of the company are not likely to take responsibility for a program that is sent to testing (or worse, to customers) laden with obvious defects and opaque error messages. Developers and development managers must therefore learn to include design and debugging time in planning estimates, and must argue effectively for more time and more help, especially in areas that don't require coding, such as user interaction design.
It's rational to assume that help won't arrive immediately, so walk a mile in the customer's shoes and program defensively. When you’re constructing an error message, the important thing to remember is that your message must convey useful information. Useful information saves time. Remember that the message will be read not only by the customer. The message must also be interpreted by the tech support person who handles the call, the quality assurance analyst who helps to track down the problem, and the maintenance programmer who is charged with fixing the problem in the code. Each person in this process represents a cost to your company. What’s more, while the error-handling routine need be written only once, the support path is typically followed many times--tens, or hundreds, or thousands of times. Form alliances with technical support, testing, and documentation; ask questions, do the math, and put dollar amounts on what it costs to solve (or sandbag) a problem after the product has been released. Don't forget future lost sales in your calculations. If senior management at your company wants to rush the product to market without leaving you time to code proper error handling, remind management politely of the cost of such a policy.
If you’d like to read more on this subject, have a look at Alan Cooper’s books About Face: The Essentials of User Interface Design and The Inmates are Running the Asylum: Why High Tech Products Drive Us Crazy and How To Restore The Sanity, both just a click away on Amazon. Mr. Cooper's primary thesis is that software confuses customers and makes them feel stupid, and that as software professionals, we are obliged to serve them better than we do. This is quite true. I respectfully submit that Mr. Cooper makes several errors in his books--particularly when he argues that the hierarchical file structure mirrors the computer’s view of the file system, which is simply untrue--but there is much of value in his writing. Even when he’s wrong, his views are interesting and worth considering, and several of the ideas in this essay are inspired and informed by his work.
Steve McConnell’s definitive book on good software construction practices, Code Complete: A Practical Handbook of Software Construction, has some excellent material on defensive programming and error handling. Few other books--particularly introductory programming texts--pay the topic any more than lip service, which is a disgrace. New programmers should read this book in parallel with an introductory text on their language of choice.
This essay is copyright ©2003 Michael Bolton. If you liked it, please let me know. If you didn't, please let me know how I might improve upon it. If you'd like to be notified when I've posted another essay that you might enjoy, please click here to be put on my mailing list. If your reply-to address isn't altered to prevent spam, you don't have to put anything in the body of the message.
You are welcome to reprint this article for your own use or for your company's use, as long as you follow the instructions here (https://www.developsense.com/reprints.html).
Best of all, if you (or your company, or your manager, or your employee) need counselling or instruction in this area, I can help with engaging and informative courses on quality assurance and software testing in plain English that can save your company lots of time and money. Contact me for details. Thanks!