Yesterday I set up a thought experiment in which we divided our day of testing into three 90-minute sessions. I also made a simplifying assumption that bursts of testing activity representing some equivalent amount of test coverage (I called it a micro-session, or just a “test”) take two minutes. Investigating and reporting a bug that we find costs an additional eight minutes, so a test on its own would take two minutes, and a test that found a problem would take ten.
Yesterday we tested three modules. We found some problems. Today the fixes showed up, so we’ll have to verify them.
Let’s assume that a fix verification takes six minutes. (That’s yet another gross oversimplification, but it sets things up for our little thought experiment.) We don’t just perform the original micro-session again; we have to do more than that. We want to make sure that the problem is fixed, but we also want to do a little exploration around the specific case and make sure that the general case is fixed too.
Well, at least we’ll have to do that for Modules B and C. Module A didn’t have any fixes, since nothing was broken. And Team A is up to its usual stellar work, so today we can keep testing Team A’s module, uninterrupted by either fix verifications or by bugs. We get 45 more micro-sessions in today, for a two-day total of 90.
(As in the previous post, if you’re viewing this under at least some versions of IE 7, you’ll see a cool bug in its handling of the text flow around the table. You’ve been warned!)
| Module | Fix Verifications | Bug Investigation and Reporting (time spent on tests that find bugs) | Test Design and Execution (time spent on tests that don’t find bugs) | New Tests Today | Two-Day Total |
|--------|-------------------|-----------------------------------------------------------------------|------------------------------------------------------------------------|-----------------|---------------|
| A | 0 minutes (no bugs yesterday) | 0 minutes (no bugs found) | 90 minutes (45 tests) | 45 | 90 |
Team B stayed an hour or so after work yesterday. They fixed the bug that we found, tested the fix, and checked it in. They asked us to verify the fix this afternoon. That costs us six minutes off the top of the session, leaving us 84 more minutes. Yesterday’s trends continue; although Team B is very good, they’re human, and we find another bug today. The test costs two minutes, and bug investigation and reporting costs eight more, for a total of ten. In the remaining 74 minutes, we have time for 37 micro-sessions. That means a total of 38 new tests today—one that found a problem, and 37 that didn’t. Our two-day total for Module B is 79 micro-sessions.
| Module | Fix Verifications | Bug Investigation and Reporting (time spent on tests that find bugs) | Test Design and Execution (time spent on tests that don’t find bugs) | New Tests Today | Two-Day Total |
|--------|-------------------|-----------------------------------------------------------------------|------------------------------------------------------------------------|-----------------|---------------|
| A | 0 minutes (no bugs yesterday) | 0 minutes (no bugs found) | 90 minutes (45 tests) | 45 | 90 |
| B | 6 minutes (1 bug yesterday) | 10 minutes (1 test, 1 bug) | 74 minutes (37 tests) | 38 | 79 |
Team C stayed late last night. Very late. They felt they had to. Yesterday we found eight bugs, and they decided to stay at work and fix them. (Perhaps this is why their code has so many problems; they don’t get enough sleep, and produce more bugs, which means they have to stay late again, which means even less sleep…) In any case, they’ve delivered us all eight fixes, and we start our session this afternoon by verifying them. Eight fix verifications at six minutes each amounts to 48 minutes. So far as obtaining new coverage goes, today’s 90-minute session with Module C is pretty much hosed before it even starts; 48 minutes—more than half of the session—is taken up by fix verifications, right from the get-go. We have 42 minutes left in which to run new micro-sessions, those little two-minute slabs of test time that give us some equivalent measure of coverage. Yesterday’s trends continue for Team C too, and we discover four problems that require investigation and reporting. That takes 40 of the remaining 42 minutes. Somewhere in there, we spend two minutes of testing that doesn’t find a bug. So today’s results look like this:
| Module | Fix Verifications | Bug Investigation and Reporting (time spent on tests that find bugs) | Test Design and Execution (time spent on tests that don’t find bugs) | New Tests Today | Two-Day Total |
|--------|-------------------|-----------------------------------------------------------------------|------------------------------------------------------------------------|-----------------|---------------|
| A | 0 minutes (no bugs yesterday) | 0 minutes (no bugs found) | 90 minutes (45 tests) | 45 | 90 |
| B | 6 minutes (1 bug yesterday) | 10 minutes (1 test, 1 bug) | 74 minutes (37 tests) | 38 | 79 |
| C | 48 minutes (8 bugs yesterday) | 40 minutes (4 tests, 4 bugs) | 2 minutes (1 test) | 5 | 18 |
Over two days, we’ve been able to obtain only 20% of the test coverage for Module C that we’ve been able to obtain for Module A. We’re still at less than 1/4 of the coverage that we’ve been able to obtain for Module B.
Yesterday, we learned one lesson:
Lots of bugs means reduced coverage, or slower testing, or both.
From today’s results, here’s a second:
Finding bugs today means verifying fixes later, which means even less coverage or even slower testing, or both.
So why is testing taking so long? One of the biggest reasons might be this:
Testing is taking longer than we might have expected or hoped because, although we’ve budgeted time for testing, we lumped into it the time for investigating and reporting problems that we didn’t expect to find.
Or, more generally,
Testing is taking longer than we might have expected or hoped because we have a faulty model of what testing is and how it proceeds.
For managers who ask “Why is testing taking so long?”, it’s often the case that their model of testing doesn’t incorporate the influence of things outside the testers’ control. Over two days of testing, the difference between the quality of Team A’s code and Team C’s code has a profound impact on the amount of uninterrupted test design and execution work we’re able to do. The bugs in Module C present interruptions to coverage, such that (in this very simplified model) we’re able to spend only one-fifth of our test time designing and executing tests. After the first day, we were already way behind; after two days, we’re even further behind. And even here, we’re being optimistic. With a team like Team C, how many of those fixes will be perfect, revealing no further problems and taking no further investigation and reporting time?
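To get a feel for how the gap compounds, the model can be replayed and extended in a short sketch. The constants are the thought experiment’s; the function, and the optimistic assumptions that every fix comes back clean and that each team’s bug rate holds steady, are mine:

```python
# Replaying the two-day story and projecting it forward. Each day's bugs
# come back as fix verifications the next day; we assume (optimistically)
# that every fix is perfect. Constants are from the thought experiment.

SESSION, TEST, BUG_EXTRA, VERIFY = 90, 2, 8, 6

def coverage(daily_bugs):
    """Total micro-sessions achieved over a run of days, given bugs found each day."""
    total, pending_fixes = 0, 0
    for bugs in daily_bugs:
        minutes = SESSION - pending_fixes * VERIFY - bugs * (TEST + BUG_EXTRA)
        total += bugs + max(minutes, 0) // TEST
        pending_fixes = bugs          # today's bugs return as fixes tomorrow
    return total

print(coverage([0, 0]))               # Module A's two days: 90
print(coverage([1, 1]))               # Module B: 79
print(coverage([8, 4]))               # Module C: 18
print(coverage([8, 4, 4, 4, 4]))      # Module C after five such days: 69
```

Five days in, Module A would be at 225 micro-sessions while Module C, under these generous assumptions, would reach only 69.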
And again, those faulty management models will lead to distortion or dysfunction. If the quality of testing is measured by bugs found, then anyone testing Module C will look great, and people testing Module A will look terrible. But if the quality of testing is evaluated by coverage, then the Module A people will look sensational and the Module C people will be on the firing line. But remember, the differences in results here have nothing to do with the quality of the testing, and everything to do with the quality of what is being tested.
There’s a psychological factor at work, too. If our approach to testing is confirmatory, with steps to follow and expected, predicted results, we’ll design our testing around the idea that the product should do this, and that it should behave thus and so, and that testing will proceed in a predictable fashion. If that’s the case, it’s possible—probable, in my view—that we will bias ourselves towards the expected and away from the unexpected. If our approach to testing is exploratory, perhaps we’ll start from the presumption that, to a great degree, we don’t know what we’re going to find. As much as managers, hack statisticians, and process enthusiasts would like to make testing and bug-finding predictable, people don’t know how to do that such that the predictions stand up to human variability and the complexity of the world we live in. Plus, if you can predict a problem, why wait for testing to find it? If you can really predict it, do something about it now. If you don’t have the ability to do that, you’re just playing with numbers.
Now: note again that this has been a thought experiment. For simplicity’s sake, I’ve made some significant distortions and left out an enormous amount of what testing is really like in practice.
- I’ve treated testing activities as compartmentalized chunks of two minutes apiece, treading dangerously close to the unhelpful and misleading model of testing as development and execution of test cases.
- I haven’t looked at the role of setup time and its impact on test design and execution.
- I haven’t looked at the messy reality of having to wait for a product that isn’t building properly.
- I haven’t included the time that testers spend waiting for fixes.
- I haven’t included the delays associated with bugs that block our ability to test and obtain coverage of the code behind them.
- I’ve deliberately ignored the complexity of the code.
- I’ve left out difficulties in learning about the business domain.
- I’ve made highly simplistic assumptions about the quality and relevance of the testing, the quality and relevance of the bug reports, the skill of the testers in finding and reporting bugs, and so forth.
- And I’ve left out the fact that, as important as skill is, luck always plays a role in finding problems.
My goal was simply to show this:
Problems in a product have a huge impact on our ability to obtain test coverage of that product.
The trouble is that even this fairly simple observation is below the level of visibility of many managers. Why is it that so many managers fail to notice it?
One reason, I think, is that they’re used to seeing linear processes instead of organic ones, a problem that Jerry Weinberg describes in Becoming a Technical Leader. Linear models “assume that observers have a perfect understanding of the task,” as Jerry says. But software development isn’t like that at all, and it can’t be. By its nature, software development is about dealing with things that we haven’t dealt with before (otherwise there would be no need to develop a new product; we’d just reuse the one we had). We’re always dealing with the novel, the uncertain, the untried, and the untested, so our observation is bound to be imperfect. If we fail to recognize that, we won’t be able to improve the quality and value of our work.
What’s worse about managers with a linear model of development and testing is that “they filter out innovations that the observer hasn’t seen before or doesn’t understand” (again, from Becoming a Technical Leader). As an antidote for such managers, I’d recommend Perfect Software, and Other Illusions About Testing and Lessons Learned in Software Testing as primers. But mostly I’d suggest that they observe the work of testing. In order to do that well, they may need some help from us, and that means that we need to observe the work of testing too. So over the next little while, I’ll be talking more than usual about Session-Based Test Management, developed initially by James and Jon Bach, which is a powerful set of ideas, tools and processes that aid in observing and managing testing.
Did you intend to change the title from 'why' to 'what' with part 2?
I have followed both of the posts, and this simple explanation helped me a lot in understanding why the delay happens.
I don't know which team I belong to, as I can see myself working with all three teams.
But I really appreciated your efforts for such a simple explanation for us!
Best Regards,
Amit
testing is my passion!!!
http://bugteaser.blogspot.com/
Thanks for the posting. I saw these tables last November and have been looking for them, and for references to them, ever since.
I would like to add a few more reasons why testing takes so long. As you say: "All models are wrong. Some are useful." Any plan is a model, so it doesn't cover every possible combination, and thus it is wrong. This is an especially big problem for managers who consider plans more or less accurate. That usually comes from their experience with budgeting. In a budget, the plan can't vary much, so they tend to expect all other plans to work the same way. But testing involves many more unknowns (I think it mostly consists of unknowns), so those plans fail. If you have a big unknown, you can't plan around it. It is like my throwing a handful of needles into a haystack of unknown size, giving you limited time, not telling you how many needles I threw in, and asking you to find all of them in that time. You start looking, but whenever you find one, you don't know whether it is the last one. If you don't know the size of the haystack or the number of needles, you can't guess how long it might take to find them all.
Another thing tends to affect the time needed for testing: as new development goes on and new features are added, the amount of testing to be done grows explosively. If we have to test how many ways there are to place the numbers 1, 2, 3, 4 in sequence, there would be 24 possibilities. If we add 5, there would be 120 possibilities; by adding one number, the amount of testing increases to 500% of what it was. Of course, this is again just a model with very big simplifications.
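This growth is factorial, and quick to check (a minimal illustrative sketch):

```python
# The number of orderings of n items is n!, so each added item
# multiplies the count of sequences to consider.
from math import factorial

print(factorial(4))                   # 24 orderings of 1, 2, 3, 4
print(factorial(5))                   # 120 orderings of 1, 2, 3, 4, 5
print(factorial(5) // factorial(4))   # 5 — one more item, five times the work
```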
Will be waiting for your next posting
Oliver Vilson
Interesting post, it made me think on how I evaluate my own work (in particular my own progress) as a tester.
It also helps me understand why it is so hard to estimate how long it will take to test something.
I like how you’ve used numbers(!) to explain what not to do with numbers, it’s a clever way to deal with people who focus on numbers too much.