Ron Jeffries and I, among others, have been discussing our contrasting approaches to testing in the Agile Testing mailing list. We’ve been doing this for about five years now, and we’re not getting very far.
Recently a colleague suggested that most arguments seem to be about conclusions, when in fact they’re about premises. I believe that Ron and I are working from dramatically different premises about what a test is, and what its purpose is. For Ron, a test is a single assertion about the output of a function, in a highly controlled environment such as a single developer’s machine. For me, a test is any observation and evaluation that anyone can make about the product operating within its context. For Ron, tests are lines of code. For me, tests are ideas. Under Ron’s notion of a test, tests can be counted easily. Under mine, they can’t.
Some of my test ideas can be automated easily, and I think they should be automated.
Ron asks: Suppose there is a team implementing software in an Agile fashion, by which I mean that every week they add, say, 10 new features, and every week they improve the overall design of the software from the simple design they perforce started with, to support all the features now in place.
In such a situation, any feature can potentially be broken at any time, because the overall design is being modified all the time.
In such a situation, the number of features increases essentially linearly as a function of time. In week N, there are N times as many features as in week 1. It may be appropriate to consider that there are O(N^2) potential two-way interactions between features.
I would note that there are O(N^3) potential three-way interactions between features. There are also O(M·N^2) potential interactions between M individual customers and any pair of features.
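Assuming Ron's rate of 10 features a week, the growth of these interaction counts can be sketched in a few lines of Python. The function name and the customer count here are illustrative, not anything from the discussion:

```python
import math

def interaction_counts(n_features, n_customers):
    """Count potential interactions among features (and customers).

    Pairwise feature interactions grow as O(N^2), three-way interactions
    as O(N^3), and customer-by-feature-pair interactions as O(M * N^2).
    """
    pairs = math.comb(n_features, 2)      # two-way feature interactions
    triples = math.comb(n_features, 3)    # three-way feature interactions
    customer_pairs = n_customers * pairs  # each customer crossed with each pair
    return pairs, triples, customer_pairs

# After 10 weeks at 10 features/week, with (say) 1000 customers:
print(interaction_counts(100, 1000))  # (4950, 161700, 4950000)
```

Even this toy calculation makes the point: the number of things one *could* test grows far faster than the number of features.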
Note that there is a difficulty here. When we count features, we’re reifying them; that is, we’re giving concrete attributes to things that are really only constructs. This is the fundamental problem of software metrics–that we attempt to quantify things that are not very quantifiable. (Cem Kaner and Pat Bond have a wonderful paper on this: Software Engineering Metrics: What Do They Measure and How Do We Know?) Are “check that the customer has a positive balance” and “check that the customer has not exceeded daily withdrawal limits” features? How do we evaluate those features with respect to one feature called “customer withdraws money from chequing account”, which may incorporate several or hundreds of such smaller features?
Ron continues: Agile projects are supposed to be ready to ship at the end of any iteration, that is, at the end of any week. This means that the program needs to be fully tested by the end of the week in which the most recent features are implemented. Our objective is to have high confidence at the end of each week that the program has been appropriately tested, and high confidence that it is ready to ship.
“Fully tested” is a similarly unquantifiable measure–unless we choose to evaluate it via some qualitative notion like “high confidence”. “High confidence”, for me, arrives when the team has
- figured out all the questions that we’d like to ask about the product, and the risks associated with it;
- asked those questions; and
- answered them.
Some risks, many of them, will be addressed by saying “we think the risk is sufficiently low that we’re not even going to run any kind of test that would address the question.”
The kinds of risks that are worth considering fall into several groups. I’ll list only a few:
- We want to be able to ensure that we’ve implemented some mechanically decidable aspect of some requirement, and we have a straightforward principle or mechanism by which we can evaluate that. For this risk, an automated programmer test is great, in most cases that I can think of. Create an assertion, put it into a test framework, watch it fail, write code to make it pass, and move on to the next. It’s quick to do, feedback is immediate, and it helps to guide design. This is largely a confirmatory function of testing. Let’s do absolutely as much of that as we can–several tests per function. Let’s add tests for that when we discover problems or new risks in one of the approaches that follow. Let’s get the developers to do this work.
- We want to ensure that, in adding or refining some feature, we at least know everything about it that we used to know. See (1), but instead of focusing on the current test, run all of them after each revision of the code. It’s pretty quick to do, feedback is only slightly less immediate than in (1), and the value is quite high. This is also a confirmatory function of testing. Let’s do lots of that too, as much as we can stand. Developers will do most of this.
- We want to ensure that several features, strung together in some order that we’ve anticipated, interact reasonably with each other. See (2), but provide lots of variations. In this circumstance, it takes somewhat longer than the first two to prepare the variations, but since they’re built using the building blocks established in (1) and (2), things aren’t too bad. Feedback is a little longer, but the value is still pretty high. This is largely a confirmatory function, but there is an investigative aspect to it as well. When the investigation discovers something new, we’ll strongly consider automating a test for it–in which case that test becomes confirmatory again. Let’s do plenty of that. Testers will tend to do most of this work, and developers will help.
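The confirmatory cycle in (1) and (2) can be sketched with a toy example. The withdrawal-limit feature, its names, and the limit itself are hypothetical, borrowed loosely from the banking example earlier in this post:

```python
import unittest

# Hypothetical feature under test: a daily withdrawal limit check.
# The limit and function are illustrative, not from the original discussion.
DAILY_LIMIT = 500

def within_daily_limit(already_withdrawn, amount):
    """Return True if this withdrawal keeps the customer within the daily limit."""
    return already_withdrawn + amount <= DAILY_LIMIT

class WithdrawalLimitTest(unittest.TestCase):
    # Each assertion is written first, watched to fail, then made to pass.
    def test_under_limit(self):
        self.assertTrue(within_daily_limit(100, 200))

    def test_exactly_at_limit(self):
        self.assertTrue(within_daily_limit(300, 200))

    def test_over_limit(self):
        self.assertFalse(within_daily_limit(400, 200))
```

Running the whole suite (for example, with `python -m unittest`) after every revision of the code is what turns item (1) into item (2).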
I think these three categories cover “all tests” in Ron’s senses of the words when he says “automate all tests”.
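As a sketch of the variations in (3), here is one way to string toy features together and run several anticipated sequences through a single parameterized test. The Account class and the scenarios are mine, purely illustrative:

```python
import unittest

# A toy stand-in for a few "features" strung together: open an account,
# deposit, withdraw. Not anything from the original discussion.
class Account:
    def __init__(self):
        self.balance = 0

    def deposit(self, amount):
        self.balance += amount

    def withdraw(self, amount):
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount

class AccountScenarioTest(unittest.TestCase):
    def test_deposit_then_withdraw_variations(self):
        # Each tuple: (deposits, withdrawals, expected final balance).
        scenarios = [
            ([100], [50], 50),
            ([100, 200], [150], 150),
            ([500], [500], 0),
        ]
        for deposits, withdrawals, expected in scenarios:
            with self.subTest(deposits=deposits, withdrawals=withdrawals):
                account = Account()
                for d in deposits:
                    account.deposit(d)
                for w in withdrawals:
                    account.withdraw(w)
                self.assertEqual(account.balance, expected)
```

Because the variations reuse the building blocks from (1) and (2), adding another anticipated sequence is cheap: one more tuple in the list.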
- We want to investigate user-, platform-, or business-oriented risks not yet discovered by (1), (2), and (3). This is a problem for automation. First, real users aren’t so deterministic. They’re busy, distractable, impatient, unreasonable, experienced, novice, near-sighted, colour-blind, non-programmers, domain experts, unobservant, creative, non-English-speakers, untrained, plodding, supervisory, incautious, or deaf (among about a million other attributes), and they’re working with older machines, newer machines, different browsers, different operating systems, different video cards, and different printers (with different physical interfaces, buffer capacities, colour options, page-description languages, “out of paper” messages, and so on). In this case, automation can be very helpful, since we can often alter the environment in which the automation is run at very low cost, and run it again. But automation doesn’t interact with the application in the same way that these users do.
Second, and more importantly, automation doesn’t speculate, change its mind, empathize, judge, get annoyed, project, appreciate, imagine, pause or vary (except to the extent we ask it to); it doesn’t have an esthetic sense; and in particular it doesn’t recognize or identify new risks. (Note that people tend not to do that either if we treat them like machines, which is why we think it’s usually a dumb idea to give people scripts to follow.) The kind of testing that we’d need to do here is highly investigative. Let’s do lots of that, too.
This (4) is, to me, a subset of “all tests”, and I don’t think it should be automated. Automation of some kinds can help us to do it faster. First, we’ll be able to do it better, more rapidly, if we don’t have to find all of the bugs that (1), (2), and (3) above can find for it. We’ll be able to do it better if we use tools or probes that could show us things that are otherwise invisible. We’ll be able to get to the point faster if an automated script can deliver us to a stage in the program–some dialog or some decision point that would be tedious to get to by pressing keys or clicking mice. However, this last kind of automation takes time to develop, so we’ll want powerful tools, and/or we’ll want to do it judiciously.
Anyone in the project can do the idea part, but testers are expected to be very good at it, in my world. Almost anyone on the project can do the operate-the-program-to-evaluate-it part, but testers will be expert at getting to the heart of the idea, executing it quickly, and observing it diligently with all kinds of different values and from lots of different user perspectives. Much of this work will be done manually, because we don’t know what we’ll discover as we investigate. Some testers will write scripts or automated tools where they’re useful. Some developers will write some more elaborate tools. At some point, though, we’ll have to make decisions about whether the developers will write test rigs or new features–won’t we?
My question is, to what extent can manual testing play a role in such a project, and to what extent is automated testing necessary in order to accomplish the goals described above? I’m interested in the percentage of tests of each kind, and the percentage of time spent by testers in test automation, manual testing, and any other important usages of time your model may require.
Clearly manual testing has very limited but increasing roles in (1), (2), and (3) respectively, and a dominant role in (4). For the purposes of this discussion, Ron has identified that software development is happening in an Agile context. Rapid feedback is going to be necessary and valuable, and the developers will drive a lot of it. So I presume, in the context that he proposes, that there will be a lot of tests from categories (1), (2), and (3) above. The missing ingredient in this discussion (as in most of the discussions on the Agile Testing list, alas) is all of the rest of the context.
Any kind of software can be produced in an Agile context. That means that the project could be a word processor, a shoot-em-up game, the fuel injector computer on a car, medical record-keeping, cable TV billing, an online dating service, automation for convenience-store distribution and logistics, rendering software for animated films, air traffic control systems, a class marking system for teachers, editing and timing for TV news readers, control software for missiles… any kind of software, fostering any kind of task that humans want to perform. The software might be developed in a reflective or relaxed timeframe, or under extreme time pressure. The quality of the automated tests might be extremely high, and the skills of the testers very low; the testers might be investigating subtle or unanticipated risks, where the developers have bags of assumptions. The highest risks might be oriented towards performance in an internetworked environment, or they might involve following the user’s task flow under unpredictable weather conditions.
Ron notes: I’m interested in at least rough numbers regarding the time needed to test the features, and the underlying software architecture, using manual and automated techniques, plus the estimated time required to automate tests, to run automated tests, and whatever other rough numbers you may need to tell us about.
Before I could answer that, I’d need some rough numbers regarding the number of developers on the project, the number of lines of code, the number of function points in the product, the number of risks that you want to address, and the number of features that you want in the product. Can you tell me how many risks exist at the beginning of the project, and how many we’ll discover along the way? None of these are meaningful questions, not even in a thought experiment. Any answer would involve reification, and would easily change the moment we learned almost anything new about the project–which we’ll tend to learn every minute of every day, if we’re working well.
Bottom line, to sum up: what percentage of tests will be automated in week N, and what percentage manual? What percentage of time will be spent automating, and what percentage spent doing manual tests?
So my answer to each of these two questions is 50% of each, plus or minus 50%. Here we return to reification of tests. Think of tests as “critical thoughts about this post”; how many tests have you had so far? Are they of equal weight or equal value? How do you count an idea?
Ron ends his original message by asking: Choose your detailed context and support your answer. Until then, automate more tests.
Well, okay… but more than what?