Hopeless. Absolutely hopeless. Lots of important work to do, and this testing challenge steals an hour from me. Matt Heusser posted it on his blog.
When James Bach and I pose testing problems like this to one another, we offer the opportunity to provide a quick, a practical, or a deep answer. Here’s mine. It ain’t quick. It’s fairly deep, but I hope it’s also reasonably practical.
To start with, I think that in this case there’s a risk that Matthew is conflating two things in his ideas about acceptance tests—the idea of acceptability to a given customer, and the idea of acceptability to a given database or application. The important questions to ask here are “who is doing the accepting” and “what are their criteria for acceptance?” After all, a database might reject the number +1 (416) 656-5160 because it expects the data in the format 416-656-5160, but a human could easily deal with the discrepancy. Conversely, a database might happily accept a credit card number composed of 16 digits, since that number meets its acceptance criterion. But if that credit card number is not associated with a customer record (where the database has no way of knowing that), the number is invalid. Thus, I question Matt’s suggestion…
Think about it – the requirement is to take one set of black-box data, and import it into another black box. We can test the data file that is created, but the real proof is what the second system accepts — or rejects.
…because I wouldn’t characterize acceptance by the second system as “the real proof”. It may be a real proof—but the system could just as easily fail by accepting something that it shouldn’t.
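To make that distinction concrete, here’s a minimal sketch (in Python; the data, names, and rules are invented for illustration, and not taken from Matt’s system) of the gap between what a database’s format rules will accept and what a customer would accept:

```python
import re

# Invented illustration: the business only knows about this one card number.
known_customer_cards = {"4111111111111111"}

def format_acceptable(card_number):
    """The kind of check a database constraint might make: exactly sixteen digits."""
    return bool(re.fullmatch(r"\d{16}", card_number))

def business_acceptable(card_number):
    """The kind of check the customer cares about: a well-formed number tied to a known customer."""
    return format_acceptable(card_number) and card_number in known_customer_cards

candidate = "4111111111111112"          # sixteen digits, but nobody's card
print(format_acceptable(candidate))     # True  -- the database happily accepts it
print(business_acceptable(candidate))   # False -- the acceptance that matters fails
```

The first check can pass while the second fails; “acceptable to the database” and “acceptable to the customer” are different questions, answered by different oracles.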
Matt goes on…
First of all, the test databases used are refreshed every three months from production. That means that you either have to find test scenarios from live data (and hope they don’t change next time you refresh) or re-enter every scenario in test every three months.
Without knowing more about the story, I also question this premise. This seems to presume that we don’t have any opportunity to test the conversion on anything other than a live platform. Is this really the case? Live data is wonderful (as Jonathan Kohl musically asserts, “Ain’t Nothing Like the Real Thing, Baby”), but it’s not the only kind of data that we can use for testing.
Could we use sample data that is not taken from live data? Could we use test environments with sandboxed copies of the program under test and the data? Sometimes the constraints that we’re dealing with are real, but sometimes they’re artificial, arbitrary, or assumed. I try to question those constraints. A constraint that slows me down, reduces coverage, or makes it hard to determine success or failure could weaken the quality of the testing, and that’s a potential project risk. If the problem is serious enough that the testing strategy won’t work unless I get some help, I raise it as an issue, and negotiate with the project owner to change the context or the strategy, or to recognize the risks in not changing things. Change might be easy or might be hard, but in the end, empirical experience will help us to make decisions.
Now, take the trading partner example. The best you can do within your organization is to test the file. The interface might take three hours to run, then you GREP the file for results and examine.
Is that really the best we can do within our organization? Best in what sense? Fastest? Most convenient, given the tools you have? Highest informational bang for the buck? Easiest to do, given the people you have and their skills? Most likely to reveal a problem? Most likely to reveal an important problem? Would (for example) Excel be a better tool than GREP, allowing you to sort and view the data from more angles? Would an inelegant script that covers several risks reasonably well be better than a polished or comprehensive one? Would something quick help us figure out what we’re really looking for? What are we looking for? Whose values are we trying to serve? What are the risks we’re facing? Given all this stuff, might investing in testing environments and simulators be a pragmatic idea?
The same ideas of quality criteria for a product can be used for asking questions about the testing effort, and for each product there will be different answers.
You’ll have to write custom fixtures to do this, and your programming language isn’t supported by FitNesse. Or you could write a fixture that takes a SELECT statement to count the number of rows that are generated by the file, run the interface, and compare.
That’s one heuristic: the number of rows should be consistent between the source and the destination, and that ought to be a pretty fast test for many databases. With enough programming and database savvy, we could probably come up with tons more criteria too, and different ways of testing them. I’m presuming an SQL database, viewable with Toad, here; a rough sketch of a few of these checks appears after the list below.
– we could code the interface with lots of its own unit tests
– we could have low-level validity tests of the data within the interface’s code
– we could write functions that checksum the data values for some row or some column, or each row and each column, on in-memory objects, test databases, or (shudder) the real thing;
– we could generate lots of small data files, each focused on triggering a particular problem;
– we could generate a file (constructed from valid but randomly scrambled live data) that is many times larger than the typical file, looking for performance or stress-related problems in the interface;
– we could port the imported data back to a new table, reversing any conversion algorithms, and see if we get a table that matches the source;
– we could do that, then feed those results back to the destination again, and see if the second pass gives us a table identical to the first;
– we could randomly select 10 records and eyeball them with Toad;
– we could do a port of a table using marker values or metadata—data that refers to itself (e.g. Record1Field1, Record1Field2)—to make certain kinds of consistency problems relatively easy for a human or script to evaluate (see the second sketch after this list);
– we could create a table of super-funky data that we believe is certain to trigger validation error handling; if it makes it through the conversion without triggering those handlers, that’s bad; drive it with single records at first, then lots as we start to script it more heavily;
– we could try porting certain tables separately from others, and using incremental tests as each table is updated or created; script that or not;
– as you suggest below, we could run rough tests that give us the confidence that something is reasonable (if not perfect), when reasonable (and not perfect) is okay; script those checks or not;
– yes, we could count rows; script it or ask Toad;
– we could look for the maximum and minimum values in each column for a given table, and compare those; script that or ask Toad;
– we could count the number of times that a given value appears in a given column for a given table, and compare the counts between the source and destination databases; script it;
– we could set up a sample database or mock objects whose values have been chosen to represent some kind of risk—lots of null values in fields where data is expected; repeated values in fields where a unique key is expected; out-of-range values in fields where in-range values are expected; values that contain characters that are “special” by some criterion (making sure that we test against plenty of criteria); over-the-top outrageous values; etc., etc., etc.; move it over once and scan things with Toad and eyeballs, or script some checks, or both;
– we could set up a benchmark trial validation process on a simulated environment, and use that for really harsh or risky tests; script it or not as appropriate;
– we could have validation-oriented stored procedures in the destination database that run immediately after a conversion; those are kinda scripted by nature;
– we could recognize that the easiest and fastest test is the one we never have to run. Some mismatches between one database and the next don’t represent a problem, and in such cases we could choose to ignore them. The source file has a field for “Birthday greeting”; we’re not going to send birthday cards from the destination file, so a mismatch here might be irrelevant.
– we could vary our testing strategy over time to try to identify and trap new risks. Problems that we discover, near-misses, greater familiarity with the program space and the test space, and new people on the evolving team will all lead to new ideas.
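Here’s the rough sketch promised above: a few of the count, min/max, and frequency comparisons, expressed as a small script. I’m using Python’s sqlite3 module as a stand-in for whatever driver the real databases would need, and the table and column names are invented for illustration:

```python
import sqlite3

# sqlite3 stands in for whatever database driver the real project would use;
# the table and column names below are invented for illustration.
source = sqlite3.connect("source.db")
destination = sqlite3.connect("destination.db")

def one_value(connection, query):
    """Run a query that returns a single value, and return that value."""
    return connection.execute(query).fetchone()[0]

def compare(description, query):
    """Run the same query against both databases and report any mismatch."""
    src = one_value(source, query)
    dst = one_value(destination, query)
    status = "OK" if src == dst else "MISMATCH"
    print(f"{status:8} {description}: source={src!r} destination={dst!r}")

# Row counts should survive the conversion...
compare("row count", "SELECT COUNT(*) FROM customers")

# ...so should the extremes of each column we care about...
compare("min balance", "SELECT MIN(balance) FROM customers")
compare("max balance", "SELECT MAX(balance) FROM customers")

# ...and so should the frequency of any particular value of interest.
compare("NULL phone numbers", "SELECT COUNT(*) FROM customers WHERE phone IS NULL")
```

None of these comparisons proves that the conversion is right; a mismatch is a prompt to investigate, and a match only tells us that we haven’t found a problem of that particular kind.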
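And here’s a sketch of the marker-value idea from the list: generate rows whose fields describe their own positions, run them through the conversion, and let a trivial check complain about anything dropped, duplicated, or shifted into the wrong column. Again, the layout and names are invented, and the “conversion” step below is only a placeholder:

```python
def make_marker_rows(row_count, field_count):
    """Build self-describing rows like ('Record1Field1', 'Record1Field2', ...)."""
    return [
        tuple(f"Record{r}Field{f}" for f in range(1, field_count + 1))
        for r in range(1, row_count + 1)
    ]

def verify_marker_rows(rows, field_count):
    """Yield a complaint for every cell that isn't where its own name says it belongs."""
    for r, row in enumerate(rows, start=1):
        if len(row) != field_count:
            yield f"row {r}: expected {field_count} fields, got {len(row)}"
        for f, value in enumerate(row, start=1):
            expected = f"Record{r}Field{f}"
            if value != expected:
                yield f"row {r}, field {f}: expected {expected!r}, got {value!r}"

source_rows = make_marker_rows(row_count=3, field_count=2)
# ...feed source_rows through the interface, read the result back, and then:
converted_rows = source_rows  # placeholder for whatever the conversion actually produced
for complaint in verify_marker_rows(converted_rows, field_count=2):
    print(complaint)
```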
Of course, a programmer is going to have to write the SELECT statement. Is it a valid acceptance test?
I don’t see anything inherently invalid about it. The question is who is doing the accepting, and what they are willing to take as evidence of acceptability.
Or you could have the number of rows fixture be approximate – “Between 10,000 and 15,000” – customers could write this, and it guarantees that you didn’t blow a join, but not much else.
You could write code that accesses the deep guts of the application, turning it sideways to generate a single member at a time, thus speeding up the acceptance test runs to a few seconds. That’s great for the feedback loop, but it’s more of a unit test than an acceptance test.
“Unit” and “acceptance” are orthogonal categories to me. “Unit” is about the level of code that we’re trying to test; “acceptance” is about who’s accepting it and what they value. But maybe we can reframe; maybe the difference between “unit” and “acceptance” is relevant only when the test is passing—whereas if it fails, either way it’s a rejection test.
You could suggest I re-write the whole thing to use web services, but that introduces testing challenges of an entirely different kind. To be frank, when I have a problem and people suggest that I re-write the whole thing without recognizing that it would present an entirely different set of challenges, it’s a sign to me of naiveté.
There’s no question in my mind that changing the context changes the challenges. On the other hand, context-driven thinking as I see it requires us to recognize the possibility of changing the context, too.
I submit that all of these would be a significant investment in time and effort for not a whole lot of value generated.
I submit that all of these could be a significant investment in time and effort, but I can’t make any assumptions about the value without a more specific context.
Over the last few months, James Bach and I have been working on exercises and lessons for our Rapid Software Testing class, so that people in real-world testing situations have a general and generative framework for handling the current testing mission. We started by defining the Universal Test Procedure, Version 1.0—a naïve description of testing: “Try it and see if it works.” That’s a little vague. It’s relatively easy to demonstrate at least once that an application can work, but the definition doesn’t address the issue of how the product might fail—and failure is where the risk lives. So Version 1.5 goes like this: “Try it to learn sufficiently about how the product can work and how it might fail.” “Sufficiently” works double duty—“try it sufficiently”, and “to learn sufficiently”. Since we can’t test anything completely, sufficiency is the best we can hope to achieve.