Dave Nicollette responds to my post on the Ellis Island bug. I appreciate his continuing the conversation that started in the comments to my post.
Dave says, “In describing a ‘new’ category of software defect he calls Ellis Island bugs…”.
I want to make it clear: there is nothing new about Ellis Island bugs, except the name. They’ve been with us forever, since before there were computers, even.
He goes on to say “Using the typical behavior-driven approach that is popular today, one of the very first things I would think to write (thinking as a developer, not as a tester) is an example that expresses the desired behavior of the code when the input values are illogical. Protection against Ellis Island bugs is baked in to contemporary software development technique.”
I’m glad Dave does that. I’m glad his team does that. I’m glad that it’s baked in to contemporary software development technique. That’s a good thing.
But baking it in doesn’t dispose of the problem, for two reasons. First, there’s no evidence to suggest that excellent coding practices are universal, and plenty of evidence to suggest that they aren’t. Second, the Ellis Island problem is not a problem that you introduce in your own code. It’s a class of problem that you have to discover. As Dave rightly points out,
“…only way to catch this type of defect is by exploring the behavior of the code after the fact. Typical boundary-condition testing will miss some Ellis Island situations because developers will not understand what the boundaries are supposed to be.”
The issue is not that “developers” will not understand what the boundaries are supposed to be. (I think Dave means “programmers” here, but that’s okay, because other developers, including testers, won’t understand what the boundaries are supposed to be either.) People in general will not understand what the boundaries are supposed to be without testing and interacting with the built product. And even then, people will understand only to the extent that they have the time and resources to test.
Dave seems to have locked onto the triangle program as an example of a “badly developed program”. Sure, it’s a badly developed program. I could do better than that, and so could Dave. Part of the point of our exercise is that if the testers looked at the source code (which we supply, quietly, along with the program), they’d be more likely to find that kind of bug. Indeed, when programmers are in the class and have the initiative to look at the source, they often spot that problem, and that provides an important lesson for the testers: it might be a really good idea to learn to read code.
Yet testing isn’t just about questioning and evaluating the code that we write, because the code that we write is Well Tested and Good and Pure. We don’t write badly developed programs. That’s a thing of the past. Modern development methods make sure that problem never happens. The trouble is that APIs and libraries and operating systems and hardware ROMs weren’t written by our ideal team. They were written by other teams, whose minds and development practices and testing processes we do not, cannot, know. How do we know that the code that we’re calling isn’t badly developed code? We don’t know, and so we have to test.
I think we’d agree that Ruby, in general, is much better developed software than the triangle program, so let’s look at that instead.
The Pickaxe says of the String#to_i method: “If there is not a valid number at the start of str, 0 is returned. The method never raises an exception.” That’s cool. Except that I see two things that are surprising.
The first is that to_i returns zero instead of raising an exception. That is, it returns a value (quite probably the wrong value) in exactly the same data type as the calling function would expect. That leaves the door wide open for misinterpretation by someone who hasn’t tested the function with that kind of problem in mind. We thought we had done that, and we were mistaken. Our tests were revealing accurately that invalid data of a certain kind was being rejected appropriately, but we weren’t yet sensitized to a problem that was revealed only by later tests.
The second surprising thing is that the documentation is flatly wrong: to_i absolutely does throw an exception when you hand it a base parameter outside the range 2 through 36. We discovered that through testing too. That’s interesting. I’d far rather it threw an exception on a number that it can’t parse properly, so that I could detect that situation more easily and handle it in the way that I’d like.
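To make the two surprises concrete, here’s roughly how they look in an irb session (a sketch of my own, not the course materials; exact behaviour and messages may vary a little by Ruby version):

    "44".to_i         # => 44
    "abc".to_i        # => 0    (no valid number at the start, but you still get an Integer back)
    "300grams".to_i   # => 300  (leading digits parsed, the rest silently ignored)
    "44".to_i(16)     # => 68   (optional base argument, documented as 2 through 36)
    "44".to_i(1)      # raises ArgumentError, despite "the method never raises an exception"
    "44".to_i(37)     # raises ArgumentError again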
Well, after a bunch of testing by students and experts alike, we finally surprised ourselves with some data and a condition that revealed the problem. We thought that we had tested really well, and we found out that we hadn’t caught everything. So now I have to write some code that checks the string and the return value more carefully than Ruby itself does. That’s okay. No problem. Now… that’s one method in one class of all of Ruby. What other surprises lurk?
(Here’s one. When I copied the Pickaxe passage quoted above from my PDF copy of the book, I got more than I bargained for: in addition to the text that I copied, I got this: “Report erratum Prepared exclusively for Michael Bolton”. Should I have been surprised by that or not?)
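Back to the code that checks the string and the return value more carefully: here’s a minimal sketch of the kind of thing I have in mind (a hypothetical helper, not the actual course code), which validates the string before trusting to_i rather than letting junk quietly become 0:

    # Hypothetical stricter conversion: raise on anything that isn't a plain decimal integer.
    def strict_to_i(str)
      raise ArgumentError, "not a valid integer: #{str.inspect}" unless str =~ /\A-?\d+\z/
      str.to_i
    end

    strict_to_i("44")     # => 44
    strict_to_i("44abc")  # raises ArgumentError, where "44abc".to_i would return 44
    strict_to_i("abc")    # raises ArgumentError, where "abc".to_i would return 0

(Ruby’s built-in Integer() method does a similarly strict conversion out of the box, with its own rules about whitespace and radix prefixes. Either way, the check exists only because a test surprised us first.)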
Whatever problem we anticipate, we can insert code to check for that problem. Good. Whatever problem we discover, we can insert code to check for that problem too. That’s great. In fact, we check for all the problems that our code could possibly run into. Or rather we think we do, and we don’t know when we’re not doing it. To address that problem, we’ve got a team around us who provides us with lots of test ideas, and pairs and reviews and exercises the code that we write, and we all do that stuff really well.
The problem comes with the fact that when we’re writing software, we’re dealing with far more than just the software we write. That other software is typically a black box to us. It often comes to us documented poorly and tested worse. It does things that we don’t know about, that we can’t know about. It may do things that its developers considered reasonable but that we would consider surprising. Having been surprised, we might also consider it reasonable… but we’d consider it surprising first.
Let me give you two more Ellis Island examples. Many years ago, I was involved with supporting (and later program managing and maintaining) a product called DESQview. Once we had a fascinating problem that we heard about from customers. On a particular brand of video card (from a company called “Ahead”), typing DV wouldn’t start DESQview and give you all that multitasking goodness. Instead, it would cause the letters VD to appear in the upper left corner of the display, and then hang the system. We called the manufacturer of that card, headquartered in Germany, and got one in. We tested it, and couldn’t reproduce the problem. Yet customers kept calling in with the problem. At one point, I got a call from a customer who happened to be a systems integrator, and he had a card to spare. He shipped it to us.
The first Ellis Island surprise was that this card, also called “Ahead”, was from a Taiwanese company, not a German one. The second surprise was that, at the beginning of a particular INT 10h call, the card saved the contents of the CPU registers and restored them at the end of that call. The Ellis Island issue here was that the BX register was not returned in its original state, but was set to 0 instead. After the fact, after the discovery, the programmer developed a terminate-and-stay-resident program to save and restore the registers, and later folded that code into DESQview itself to special-case that card.
Now: our programmers were fantastic. They did a lot of the Agile stuff before Agile was named; they paired, they tested, they reviewed, they investigated. This problem had nothing to do with the quality of the code that they had written. It had everything to do with the fact that you’d expect someone using the processor not to muck with what was already there, combined with the fact that in our test lab we didn’t have every video card on the planet.
The oddest thing about Dave’s post is that he interprets my description of the Ellis Island problem as an argument “to support status quo role segregation.” Whaa…? This has nothing to do with role segregation. Nothing. At one point, I say “the programmer’s knowledge is, at best, a different set compared to what empirical testing can reveal.” That’s true in any situation, be it a solo shop, a traditional shop, or an Agile shop. It’s true of anyone’s understanding of any situation. There’s always more to know than we think there is, and there’s always another interpretation that one could take, rightly or wrongly. Let me give you an example of that:
When I say “the programmer’s knowledge is, at best, a different set compared to what empirical testing can reveal,” there is nothing in that sentence, nor in the rest of the post, to suggest that the programmers shouldn’t explore, or that testers should be the only ones to explore. Dave simply made that part up. My post makes one point, mostly about epistemology: that we don’t know what we don’t know. From my post, Dave takes another interpretation, about organizational dynamics, that is completely orthogonal to my point. Which, in fact, is an Ellis Island kind of problem on its own.
Perhaps I’m missing something here (please enlighten me if I am), but using your Ruby-based triangle program example, what is “illogical”, from a user’s standpoint, about the input values of “300, 300, 44”?
Nothing is illogical about the input values 300, 300, 44. What is surprising to most users is the answer that you get: “equilateral”, instead of “isoceles”. (Note also that the triangle program in question is Delphi-based, not Ruby-based.)
What’s surprising to me is that this example, at the top of your original post, is prima facie proof that Dave’s claim that “Protection against Ellis Island bugs is baked in to contemporary software development technique” is false.
If the input is not illogical, but the result is an Ellis Island bug, then protection is not “baked in”.
I still feel like I’m missing something, because this strikes me as blindingly obvious, but Dave seemed to blow right past that example, and then you and Dave spent an awful lot of words arguing this issue. And I’m not sure he’s willing to concede the point, yet.
You’d have to take that up with Dave.
One thing that I’ve noticed over the years is that “obvious” is a one-word tautology. Something is obvious only to someone to whom it is obvious, but it’s not at all obvious to someone to whom it is not obvious.
(Thanks for the Delphi versus Ruby correction. Sloppy of me! In my defense, I’m one of those testers who hasn’t yet learned to code.)
“…’obvious’ is a one-word tautology…”
Thanks for pointing out that I wasn’t being a very good critical thinker, there, Michael!