Wednesday, January 27, 2010
Exploratory Testing IS Accountable
In this blog post, my colleague James Bach talks about logging and its importance in support of exploratory testing. Logging takes care of one part of the accountability angle, and in an approach like session-based test management (developed by James and his brother Jon), the test notes and the debrief take care of another part of it.
Logging records what happened from the perspective of the test system. Good logging relieves the tester from having to record specific actions in detail; the machine does that. The tester is thereby free to record test notes—a running account of the tester's ideas, questions, and results as he tested, or what happened from the perspective of the tester. Those notes form the meat of the session sheet, which also includes
- coverage data
- who did the testing
- when they started
- how long it took
- the proportion of time spent on test design and execution, bug investigation and reporting, and setup
- the proportion of the time spent on on-charter work vs. opportunity work
- references to log files, data files, and related material such as scenarios, help files, specifications, standards, and so forth
- and, of course, bugs discovered and issues identified.
The session sheet is structured in such a way that it can be scanned by a text-parsing tool written in Perl. The measurements (in particular the coverage measurements) are collected and collated automatically into reports in the form of sortable HTML tables. Session sheets are kept for later review, if they're needed.
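To make the idea of a machine-scannable session sheet concrete, here is a minimal sketch in Python (the tool mentioned above is written in Perl and is not reproduced here). The field names, the `KEY: value` layout, and the sample data are all assumptions invented for illustration; they are not the actual session sheet format.

```python
import re
from collections import defaultdict

# Hypothetical, simplified session sheet. The field names and layout are
# made up for this sketch and are NOT the real session sheet format.
SAMPLE_SHEET = """\
TESTER: Alice
START: 2010-01-27 09:30
DURATION_MINUTES: 90
TEST_DESIGN_AND_EXECUTION: 60
BUG_INVESTIGATION_AND_REPORTING: 25
SETUP: 15
AREA: Import | CSV files
AREA: Reports | Sorting
BUG: Sorting by date ignores the year
ISSUE: No log file is produced on import failure
"""

# A line counts as a field if it starts with an upper-case key and a colon.
FIELD = re.compile(r"^([A-Z_]+):\s*(.*)$")


def parse_session_sheet(text):
    """Collect every KEY: value line into a dict of lists, keyed by KEY."""
    fields = defaultdict(list)
    for line in text.splitlines():
        match = FIELD.match(line.strip())
        if match:
            fields[match.group(1)].append(match.group(2).strip())
    return dict(fields)


def coverage_report(sheets):
    """Count how many sessions touched each coverage area, sorted by name."""
    counts = defaultdict(int)
    for sheet in sheets:
        for area in sheet.get("AREA", []):
            counts[area] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    sheet = parse_session_sheet(SAMPLE_SHEET)
    for area, sessions in coverage_report([sheet]).items():
        print(f"{area}: {sessions} session(s)")
```

A real tool would go on to render the collated counts as sortable HTML tables; the point of the sketch is only that a lightly structured text file is enough for automatic collation.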
If logging in the program isn't available right away, screen recording tools (like BB Test Assistant, Camtasia, Spector, ...) can provide a retrospective account of what happened. (An over-the-shoulder video camera works too.) Note that these tools simply record video (and, optionally, sound—which is good for narration). Programmatic repetition of the session isn't the point. Nor is the point to have a supervisor review the screen capture obsessively; that wastes time, and besides, nobody likes working for Big Brother. The idea is to use the video only when necessary—to aid in recollection where it's needed, and to help in troubleshooting hard-to-reproduce bugs.
We suggest, where it doesn't get in the way, taking the test notes on the same machine as the application under test, and using the text editor window popping up as a way to link the execution of the application with bugs, test ideas or questions. For bugs that don't appear to be state-critical you can also take very brief notes for later followup. Include a time stamp, where the time stamp is an index into the recording; then revisit the recording later if more detail is called for. (In Notepad, you can press F5; in TextPad, Edit/Insert/Time, and it's macroable; other text editors almost certainly have a similar feature.)
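If your editor has no time-stamp shortcut, a few lines of script can do the same job. The following is a rough Python sketch; the function names and the note format are hypothetical, and it assumes you know the time at which the recording started.

```python
from datetime import datetime


def stamp_note(text, now=None):
    """Prefix a brief note with the current wall-clock time (HH:MM:SS)."""
    now = now or datetime.now()
    return f"{now:%H:%M:%S} {text}"


def seconds_into_recording(note_time, recording_start):
    """Turn a note's time stamp into an offset (in seconds) into the video."""
    return int((note_time - recording_start).total_seconds())


if __name__ == "__main__":
    # Made-up times, for illustration only.
    started = datetime(2010, 1, 27, 10, 0, 0)
    noted = datetime(2010, 1, 27, 10, 15, 30)
    print(stamp_note("odd flicker after Save; follow up later", noted))
    print(seconds_into_recording(noted, started), "seconds into the recording")
```

The offset tells you roughly where to seek in the recording when a brief note turns out to need more detail.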
Between a charter, the session sheet, the oral report, data files, and the logs and the debrief, it's hard for me to imagine a more accountable way of working. Each aspect of the reporting structure reinforces the others. This is why I get confused when test managers talk about exploratory testing being "unaccountable" or "unmanageable" or "unstructured": when I ask them what accountability and management mean to them, they point lamely to a pile of scripts or spreadsheets full of overspecified actions that were written weeks or months before the software was built, or they mumble something about not knowing what goes on in a tester's head.
Any testing approach is manageable when you choose to manage it. If you want structure, think about what you mean (maybe this guide to the structures of exploratory testing will help), identify the structures that are important to you, and develop those structures in your testers, in your team, and in your approaches. If you want accountability, provide structures for it (like session-based test management), and then require accountability. If you find that your testers aren't sufficiently skilled, train them and mentor them. (And if you don't know how to do that rapidly and effectively, we can help you.)
If there's something you don't like about the results you're getting, manage: observe what's going on in your system of testing, and put in a control action where you want to change something. If you want to know what's going on in a tester's head, observe her directly and interview her as she's testing; have her pair with another tester or a test lead; critique her notes; debrief her and coach her, until you get the results that you seek. If you want to supercharge the efficiency of your testers, work with the programmers and their managers to focus on testability, with special attention paid to scriptable interfaces, logging, and at least some programmer testing. (It might help to identify the information-hiding and feedback-loop-lengthening costs of the absence of testability.) If you find individual debriefs taking too long, or if you want to share information more broadly within the test team, try group debriefs at the end of one day or the beginning of the next. If you want to add features to the reporting protocol, add them; if you want to drop them, drop them. Experiment, re-evaluate, and tune your testing as you see fit.
And if you have a more manageable and accountable approach than this for fostering the discovery of important problems in the product, please let us know (me, or James, or Jon). We'd really like to hear about it.
Sunday, January 17, 2010
Disposable Time
In our Rapid Testing class, James Bach and I like to talk about an underappreciated tester resource: disposable time. Disposable time is the time that you can afford to waste without getting into trouble.
Now, we want to be careful about what we mean by "waste", here. It's not that you want to waste the time. You probably want to spend it wisely. It's just that you won't suffer harm if you do happen to waste it. Disposable time is to your working hours what disposable income is to your total personal income. (In fact, even that's not quite correct, strictly speaking; we actually mean discretionary income: the money that's left over after you've paid for all of the things that you must pay for—food, shelter, basic clothing, medical, and tax expenses. The money that people call disposable income is more properly called discretionary income; as Wikipedia says, "the amount of 'play money' left to spend or save." Oh well. We'll go with the incorrect but popular interpretation of "disposable" here.)
You're never being scrutinized every minute of every day. Practically everyone has a few moments when no one important is watching. In that time, you might
- try a tiny test that hasn't been prescribed.
- try putting in a risky value instead of a safe value.
- pretend to change your mind, or to make a mistake, and go back a step or two; users make mistakes, and error handling and recovery are often the most vulnerable parts of the program.
- take a couple of moments to glance at some background information relevant to the work that you're doing.
- write in your journal.
- see if any of your colleagues in technical support have a hot issue that can inform some test ideas.
- steal a couple of moments to write a tiny, simple program that will save you some time; use the saved time and the learning to extend your programming skills so that you can solve increasingly complex programming problems.
- spend an extra couple of minutes at the end of a coffee break befriending the network support people.
- sketch a workflow diagram for your product, and at some point show it to an expert, and ask if you've got it right.
- snoop around in the support logs for the product.
- add a few more lines to a spreadsheet of data values.
- help someone else solve a problem that they're having.
- chat with a programmer about some aspect of the technology.
- even if you do nothing else, at least pause and look around the screen as you're testing. Take a moment or two to recognize a new risk and write down a new question or a new test idea. Report on that idea later on; ask your test lead, your manager, a programmer, or a product owner if it's a risk worth investigating. Hang on to your notes. When someone asks "Why didn't you find that bug," you may have an answer for them.
If it turns out that you've made a bad investment, oh well. By definition, however large or small the period, disposable time is time that you can afford to blow without suffering consequences.
On the other hand, you may have made a good investment. You may have found a bug, or recognized a new risk, or learned something important, or helped someone out of a jam, or built on a professional relationship, or surprised and impressed your manager. You may have done all of these things at once. Even if you feel like you've wasted your time, you've probably learned enough to insulate yourself from wasting more time in the same way. When you discover that an alley is blind, you're unlikely to return there when there are other things to explore.
In The Black Swan, Nassim Nicholas Taleb proposes an investment strategy wherein you put the vast bulk of your money, your nest egg, in very safe securities. You then invest a small amount—an amount that you can afford to lose—in very speculative bets that have a chance of providing a spectacular return. He calls that very improbable high-return event a positive Black Swan. Your nest egg is like the part of your job that you must accomplish. Disposable time is like your Black Swan fund; you may lose it all, but you have a shot at a big payoff. But there's an important difference, too: since learning is an almost inevitable product of using your disposable time, there's almost always some modest positive outcome.
We encourage test managers to allow disposable time explicitly for their testers. As an example, Google provides its staff with Innovation Time Off. Engineers are encouraged to spend 20% of their time pursuing projects that interest them. That sounds like a waste, until one learns that Google projects like Gmail, Google News, Orkut, and AdSense came of these investments.
What Google may not know is that even within the other 80% of the time that's ostensibly on mission, people still have, and are still using, non-explicit disposable time. People have that almost everywhere, whether they have explicit disposable time or not.
If you're working in an environment where you're being watched so closely that none of this is possible, and where you're punished for learning or seeking problems, my advice is to make sure that slavery has been abolished in your jurisdiction. Then find a job where your testing skills are valued and your managers aren't wasting their time by watching your work instead of doing theirs. But when you've got a few moments to fill, fill them and learn something!
Thursday, January 07, 2010
Defect Detection Efficiency: An Evaluation of a Research Study
Over the last several months, B.J. Rollison has been delivering presentations and writing articles and blog posts in which he cites a paper Defect Detection Efficiency: Test Case Based vs. Exploratory Testing, [DDE2007] by Juha Itkonen, Mika V. Mäntylä and Casper Lassenius (First International Symposium on Empirical Software Engineering and Measurement, pp. 61-70; the paper can be found here).
I appreciate the authors’ intentions in examining the efficiency of exploratory testing. That said, the study and the paper that describes it have some pretty serious problems.
Finding bugs is important, finding many bugs is important, and finding important bugs is especially important. Yet bugs and bug reports are by no means the only products of testing. The study largely ignores the other forms of information that testing may provide.
However, even if we decide that bug-finding is the only worthwhile effect of a test, two equally effective tests might not be equally efficient. I would argue that efficiency is a relationship between effectiveness and cost. An activity is more efficient if it has the same effectiveness at lower cost in terms of time, money, or resources. This leads to what is by far the most serious problem in the paper...
“First, we identify a lack of research on manual test execution from other than the test case design point of view. It is obvious that focusing only on test case design techniques does not cover many important aspects that affect manual testing. Second, our data showed no benefit in terms of defect detection efficiency of using predesigned test cases in comparison to an exploratory testing approach. Third, there appears to be no big differences in the detected defect types, severities, and in detection difficulty. Fourth, our data indicates that test case based testing produces more false defect reports.”
I would offer to add a few other conclusions. The first is from the authors themselves, but is buried on page 68: “Based on the results of this study, we can conclude that an exploratory approach could be efficient, especially considering the average 7 hours of effort the subjects used for test case design activities.” Or, put another way,
The authors say, “We could not analyze how good test case designers our subjects were and how much the quality of the test cases affected the results and how much the actual test execution aproach.” Actually, they could have analyzed that. It’s just that they didn’t. Pity.
Some Background on Exploratory Testing
It is common for people writing about exploratory testing to consider it a technique, rather than an approach. “Exploratory” and “scripted” are opposite poles on a continuum. At one pole, exploratory testing integrates test design, test execution, result interpretation, and learning into a single person at the same time. At the other, scripted testing separates test design and test execution by time, and typically (although not always) by tester, and mediates information about the designer’s intentions by way of a document or a program.
As James Bach has recently pointed out, the exploratory and scripted poles are like “hot” and “cold”. Just as there can be warmer or cooler water, there are intermediate gradations to testing approaches. The extent to which an approach is exploratory is the extent to which the tester, rather than the script, is in immediate control of the activity. A strongly scripted approach is one in which ideas from someone else, or ideas from some point in the past, govern the tester’s actions. Test execution can be very scripted, as when the tester is given an explicit set of steps to follow and observations to make; somewhat scripted, as when the tester is given explicit instruction but is welcome or encouraged to deviate from it; or very exploratory, in which the tester is given a mission or charter, and is mandated to use whatever information and ideas are available, even those that have been discovered in the present moment.
Yet the approaches can be blended. James points out that the distinguishing attribute in exploratory and scripted approaches is the presence or absence of loops. The most extreme scripted testing would follow a strictly linear approach; design would be done at the beginning of the project; design would be followed by execution; tests would be performed in a prescribed order; later cycles of testing would use exactly the same tests for regression.
Let's get more realistic, though. Consider a tester with a list of tests to perform, each using a data-focused automated script to address a particular test idea. A tester using a highly scripted approach would run that script, observe and record the result, and move on to the next test. A tester using a more exploratory approach would use the list as a point of departure, but upon observing an interesting result might choose to perform a different test from the next one on the list; to alter the data and re-run the test; to modify the automated script; or to abandon that list of tests in favour of another one. That is, the tester's actions in the moment would not be directed by earlier ideas, but would be informed by them. Scripted approaches set out the ideas in advance, and when new information arrives, there's a longer loop between discovery and the incorporation of that new information into the testing cycle. The more exploratory the approach, the shorter the loop. Exploratory approaches do not preclude the use of prepared test ideas, although both James and I would argue that our craft, in general, places excessive emphasis on test cases and focusing techniques at the expense of more general heuristics and defocusing techniques.
The point of all this is that neither exploratory testing nor scripted approaches are testing techniques, nor bodies of testing techniques. They're approaches that can be applied to any testing technique.
To be fair to the authors of [DDE2007], since publication of their paper there has been ongoing progress in the way that many people—in particular Cem Kaner, James Bach, and I—articulate these ideas, but the fundamental notions haven’t changed significantly.
Literature Review
While the authors do cite several papers on testing and test design techniques, they do not cite some of the more important and relevant publications on the exploratory side. Examples of such literature include “Measuring the Effectiveness of Software Testers” (Kaner, 2003; slightly updated in 2006); “Software engineering metrics: What do they measure and how do we know?” (Kaner & Bond, 2004); “Inefficiency and Ineffectiveness of Software Testing: A Key Problem in Software Engineering” (Kaner 2006; to be fair to the authors, this paper may have been published too late to inform [DDE2007]); General Functionality and Stability Test Procedure (for Microsoft Windows 2000 Application Certification) (Bach, 2000); Satisfice Heuristic Test Strategy Model (Bach, 2000); and How To Break Software (Whittaker, 2002).
The authors of [DDE2007] appear also to have omitted literature on the subject of exploration and its role in learning. Yet there is significant material on the subject, in both popular and more academic literature. Examples here include Collaborative Discovery in a Scientific Domain (Okada and Simon; note that the subjects are testing software); Exploring Science: The Cognition and Development of Discovery Processes (David Klahr and Herbert Simon); Plans and Situated Actions (Lucy Suchman); Play as Exploratory Learning (Mary Reilly); How to Solve It (George Polya); Simple Heuristics That Make Us Smart (Gerd Gigerenzer); Sensemaking in Organizations (Karl Weick); Cognition in the Wild (Edward Hutchins); The Social Life of Information (Paul Duguid and John Seely Brown); Sciences of the Artificial (Herbert Simon); all the way back to A System of Logic, Ratiocinative and Inductive (John Stuart Mill, 1843).
These omissions are reflected in the study and the analysis of the experiment, and that leads to a common problem in such studies: heuristics and other important cognitive structures in exploration are treated as mysterious and unknowable. For example, the authors say, “For the exploratory testing sessions we cannot determine if the subjects used the same testing principles that they used for designing the documented test cases or if they explored the functionality in pure ad-hoc manner. For this reason it is safer to assume the ad-hoc manner to hold true.” [DDE2007, p. 69] Why assume? At the very least, one could observe the subjects and debrief them, asking about their approaches. In fact, this is exactly the role that the test lead fulfills in the practice of skilled exploratory testing. And why describe the principles only as "ad-hoc"? It's not like the principles can't be articulated. I talk about oracle heuristics in this article, and talk about stopping heuristics here; Kaner's Black Box Software Testing course talks about test design heuristics; James Bach's work talks about test strategy heuristics (especially here); James Whittaker's books talk about heuristics for finding vulnerabilities...
Tester Experience
The study was performed using testers who were, in the main, novices. “27 subjects had no previous experience in software engineering and 63 had no previous experience in testing. 8 subjects had one year and 4 subjects had two years testing experience. Only four subjects reported having some sort of training in software testing prior to taking the course.” ([DDE2007], p. 65, my emphasis) Testing—especially testing using an exploratory approach—is a complex cognitive activity. If one were to perform a study on novice jugglers, one would likely find that they drop an approximately equal number of objects, whether they were juggling balls or knives.
Tester Training
The paper notes that “subjects were trained to use the test case design techniques before the experiment.” However, the paper does not make note of any specific training in heuristics or exploratory approaches. That might not be surprising in light of the weaknesses on the exploratory side of the literature review. My experience, that of James Bach, and anecdotal reports from our clients suggest that even a brief training session can greatly increase the effectiveness of an exploratory approach.
Cycles of Testing
Testing happens in cycles. In strongly scripted testing, the process tends toward the linear. All tests are designed up front; then those tests are executed; then testing for that area is deemed to be done. In subsequent cycles, the intention is to repeat the original tests to make sure that bugs are fixed and to check for regression. By contrast, exploratory testing is an organic and iterative process. In an exploratory approach, the same area might be visited several times, such that learning from early “reconnaissance” sessions informs further exploration in subsequent “deep coverage” sessions. The learning from those (and from ideas about bugs that have been found and fixed) informs “wrap-up sessions”, in which tests may be repeated, varied, or cut from new cloth. The study makes no allowance for information and learning obtained during one round of testing to inform later rounds. Yet such information and learning is typically of great value.
Quantitative vs. Qualitative Analysis
In the study, there is a great deal of emphasis placed on quantifying results, on experimental and on mathematical rigour. However, such rigour may be misplaced when the products of testing are qualitative, rather than quantitative.
Finding bugs is important, finding many bugs is important, and finding important bugs is especially important. Yet bugs and bug reports are by no means the only products of testing. The study largely ignores the other forms of information that testing may provide.
- The tester might learn something about test design, and feed that learning into her approach toward test execution, or vice versa. The value of that learning might be realized immediately (as in an exploratory approach) or over time (as in a scripted approach).
- The tester, upon executing a test, might recognize a new risk or missing coverage. That recognition might inform ideas about the design and choices of subsequent tests. In a scripted approach, that’s a relatively long loop. In an exploratory approach, upon noticing a new risk, the tester might choose to note findings for later on. On the other hand, the discovery could be cashed immediately: she might choose to repeat the test, she might perform a variation on the same test, or might alter her strategy to follow a different line of investigation. Compared to a scripted approach, the feedback loop between discovery and subsequent action is far shorter. The study ignores the length of the feedback loops.
- In addition to discovering bugs that threaten the value of the product, the tester might discover issues—problems that threaten the value of the testing effort or the development project overall.
- The tester who takes an exploratory approach may choose to investigate a bug or an issue that she has found. This may reduce the total bug count, but in some contexts may be very important to the tester’s client. In such cases, the quality of the investigation, rather than the number of bugs found, would be important.
“Efficiency” vs. “Effectiveness”
The study takes a very parsimonious view of “efficiency”, and further confuses “efficiency” with “effectiveness”. Two tests are equally effective if they produce the same effects. The discovery of a bug is certainly an important effect of a test. Yet there are other important effects too, as noted above, but they're not considered in the study.
However, even if we decide that bug-finding is the only worthwhile effect of a test, two equally effective tests might not be equally efficient. I would argue that efficiency is a relationship between effectiveness and cost. An activity is more efficient if it has the same effectiveness at lower cost in terms of time, money, or resources. This leads to what is by far the most serious problem in the paper...
Script Preparation Time Is Ignored
The authors’ evaluation of “efficiency” leaves out the preparation time for the scripted tests! The paper says that the exploratory testing sessions took 90 minutes for design, preparation, and execution. The preparation for the scripted tests took seven hours, where the scripted test execution sessions took 90 minutes, for a total of 8.5 hours. This fact is not highlighted; indeed, it is not mentioned until the eighth of ten pages (page 68). In journalism, that would be called burying the lead. In terms of bug-finding alone, the authors suggest that the results were of equivalent effectiveness, yet the scripted approach took, in total, about 5.7 times as long as the exploratory approach. What other problems could the exploratory testing approaches find given seven additional hours?
Conclusions
The authors offer these four conclusions at the end of the paper:
“First, we identify a lack of research on manual test execution from other than the test case design point of view. It is obvious that focusing only on test case design techniques does not cover many important aspects that affect manual testing. Second, our data showed no benefit in terms of defect detection efficiency of using predesigned test cases in comparison to an exploratory testing approach. Third, there appears to be no big differences in the detected defect types, severities, and in detection difficulty. Fourth, our data indicates that test case based testing produces more false defect reports.”
I would add a few other conclusions. The first is from the authors themselves, but is buried on page 68: “Based on the results of this study, we can conclude that an exploratory approach could be efficient, especially considering the average 7 hours of effort the subjects used for test case design activities.” Or, put another way,
- During test execution
- unskilled testers found the same number of problems, irrespective of the approach that they took, but
- preparation of scripted tests increased testing time approximately by a factor of five
- and appeared to add no significant value.
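The "factor of five" in the summary above can be checked from the figures the paper reports: 90 minutes for an exploratory session, and seven hours of preparation plus 90 minutes of execution for the scripted approach. Here is a minimal sketch of that arithmetic in Python; the variable names are mine, not the paper's.

```python
# Figures as reported in the paper, expressed in minutes.
exploratory_minutes = 90            # design, preparation, and execution together
scripted_prep_minutes = 7 * 60      # average test case design effort
scripted_execution_minutes = 90     # scripted test execution session

# Total cost of the scripted approach, and its ratio to the exploratory cost.
scripted_total_minutes = scripted_prep_minutes + scripted_execution_minutes
ratio = scripted_total_minutes / exploratory_minutes

print(f"Scripted total: {scripted_total_minutes / 60:.1f} hours")
print(f"Scripted / exploratory: {ratio:.2f}")
```

With equal bug counts on both sides, the ratio of total time is the whole of the efficiency difference, which is why leaving preparation time out of the comparison matters so much.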
The authors say, “We could not analyze how good test case designers our subjects were and how much the quality of the test cases affected the results and how much the actual test execution aproach.” Actually, they could have analyzed that. It’s just that they didn’t. Pity.
Saturday, December 26, 2009
Handling an Overstructured Mission
Excellent testers recognize that excellent testing is not merely a process of confirmation, verification, and validation. Excellent testing is a process of exploration, discovery, investigation, and learning.
A correspondent that I consider to be an excellent tester (let's call him Al) works in an environment where he is obliged by his managers to execute overly structured, highly confirmatory scripted tests. Al wrote to me recently, saying that he now realizes why that's frustrating for him: every time he runs through a scripted test, he gets five new ideas that he wants to act upon. I think that's a wonderful thing, but when he acts on those ideas and fulfills his implicit mission (finding important problems in the product), it diverts him from his explicit mission (to complete some number of scripted tests per day), and he gets heat from his manager about that. At the end of a couple of days, the manager wants to know why Al is behind schedule—even if Al has revealed important problems along the way—because the manager is focused on test effort in terms of test cases completed, rather than test ideas explored.
I suggested to Al (as I suggest to you, if you're in that kind of situation) a workaround: don't act on the new test ideas; but do note them. Jot them down in handwritten notes or a text file, and especially note your motivation for them—ideas about risk, coverage, oracles, strategies, and the like. Tell your test manager or test lead that you didn't run tests associated with those ideas, and then ask, "Are you okay with us NOT running them?"
In addition, check in with your manager more often than once every two days. Deliver a report, including new ideas, at one- to two-hour intervals. If direct personal contact isn't available, try instant messages or email. If those don't work, batch them, but note the time at which you started and/or stopped a burst of testing activity.
Al was excited about that. "Wow!" he said. "That also means defects arising from the new ideas are noted down. Currently, my management is under the impression that test cases are the things that reveal problems, but it's my acting on my test ideas that really reveals the problems." He also noted, "There's another bad that comes from that. If the test cases don't reveal problems, we take the problems that we've found and create a test case for them so that those problems aren't missed next time." I've seen that happen a lot, too. On the face of it, it doesn't sound like a bad idea—except that specific problems that are fixed and verified tend to remain fixed. Repeating those tests is an opportunity cost against new tests that would reveal previously undiscovered problems.
So: the idea here is to make certain aspects of our work visible. Scripted test cases often reveal problems as those cases are developed. When those problems get fixed, the script loses its power. Thus variation on the script, rather than following the script rigorously, tends to reveal the actual problem. However, unless we're clear that this is happening, managers will mistakenly give credit to the wrong thing—namely, the script—rather than to the mindset and the skill set of the tester.
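If you keep your notes in a text file, the workaround described earlier in this post (note the idea, note the motivation, note the time, and report in batches) needs very little tooling. What follows is only a sketch under my own assumptions: the function names and the line format are hypothetical, and a paper notebook does the same job.

```python
from datetime import datetime


def format_idea(idea, motivation, when=None):
    """Return one time-stamped note line for a test idea that was NOT run."""
    when = when or datetime.now()
    return f"{when:%Y-%m-%d %H:%M} | IDEA: {idea} | WHY: {motivation}"


def batch_report(notes):
    """Collect note lines into a short report for the test lead or manager."""
    header = "Test ideas noted but NOT run. Are you okay with us not running them?"
    return "\n".join([header] + [f"- {note}" for note in notes])


notes = [
    format_idea("Paste a very long string into the search field",
                "risk: unchecked input length"),
    format_idea("Repeat the scripted step with the clock set back",
                "coverage: time-related behaviour"),
]
print(batch_report(notes))
```

The point of the time stamp is the same as in any session notes: it lets you say later when a burst of testing started or stopped, even if the report itself was delivered in a batch.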
Friday, December 18, 2009
Selena Delesie on Exploratory Test Chartering
A little while ago, I mentioned that I'd be writing more about session-based test management (SBTM). For me, one thing that's great about having a community of students and colleagues is that they can save me lots of time and work.
Selena Delesie took the Rapid Software Testing course from me a few years back (that is, she was a student). Since then, she has taken Rapid Testing and its practices, including SBTM, and made them her own. This is exactly what James Bach and I aim for. We want to help testers, test leads, and managers realize that the most important factor in excellent testing, bar none, is the mindset and the skill set of the individual tester. This means taking the ideas in the course and internalizing them, adopting them, developing them, experimenting with them, altering them to fit your context. We get people started by making them feel powerful, mostly by helping them to recognize the power and skills that they already have. Then, after the class, they can feel confident in doing the heavy lifting on their own. Selena is by no means our only student who has done that, but she's a paradigmatic example of what's possible.
This post from her blog is a nice account of her appreciation of exploratory testing and of her career growth. That on its own would be good enough, but she's now blogged a post on chartering sessions, and it's excellent. It identifies some of the common traps and misconceptions about chartering, and provides some sharp advice on how to avoid them. It talks not merely about how to charter, but how to do it in a way that affords the tester the freedom and responsibility to do his or her best work. Highest recommendation.
Monday, December 14, 2009
Structures of Exploratory Testing: Resources
In a Webinar that he did for uTest on December 10, James Whittaker mused aloud about what a great idea it would be to structure exploratory testing and capture ideas about it in a repository for sharing with others. It seems to me that one ideal version of that would take the form of a bibliography in a book about exploratory testing, but apparently that's not available. Yet I digress.
The fact is, people have been doing exactly that for years. And I do like the idea of having a repository and sharing, so here's a survey of some exploratory testing structures and some writing about them that I hope people will find helpful. There are some excellent books out there, but for now, these ones are all online and free. Expect updates.
- Evolving Work Products, Skills and Tactics, ET Polarities, and Test Strategy. James Bach, Jon Bach, and I authored the latest version of the Exploratory Skills and Dynamics list. This is a kind of evolving master list of exploratory testing structures. James describes it here.
- Oracles. The HICCUPPS consistency heuristics, which James Bach initiated and which I wrote about in this article for Better Software in 2005. (Actually, at the time it was only HICCUPP—History, Image, Comparable Products, Claims, User Expectations, Purpose, Product—but since then we've also added S, for Standards and Statutes.) Mike Kelly also talks about HICCUPP here.
- Test Strategy. James Bach's Heuristic Test Strategy Model isn't restricted to exploratory approaches, but certainly helps to guide and structure them.
- Data Type Attacks, Web Tests, Testing Wisdom, Heuristics, and Frameworks. Elisabeth Hendrickson's Test Heuristics Cheat Sheet is a rich set of guideword heuristics and helpful reference information.
- Context Factors, Information Objectives. Cem Kaner most recently delivered his Tutorial on Exploratory Testing for the QAI Quest Conference in Chicago, 2008. There's a similar, but not identical talk here.
- Quick Tests. In our Rapid Software Testing course, James Bach and I talk about quick tests. The course notes are available for free. Fire up Acrobat and search for "Quick Tests".
- Coverage (specific). Michael Hunter's You Are Not Done Yet is a detailed set of coverage ideas to help prompt further exploration when you think you're done.
- Coverage (general). James Bach wrote this article in 2001, in which he summarizes test coverage ideas under the mnemonic "San Francisco Depot" (SFDPO): Structure, Function, Data, Platform, and Operations. Several years later, I convinced him to add an element to the list, so now it's SFDPOT, still read as "San Francisco Depot". The last T is for...
- Time. I realized a few years ago that some guideword heuristics might help us to pay attention to the ways in which products relate to time, and vice versa. That turned into a Better Software article called "Time for New Test Ideas".
- Tours. Mike Kelly's FCC CUTS VIDS Touring Heuristics (note the date) provides a set of structured approaches for touring the application.
- Stopping Heuristics. There are structures to deciding when to stop a given test, a line of investigation, or a test cycle. I catalogued them here, and Cem Kaner made a vital addition here.
- Accountability, Reporting Progress. James and Jon Bach's description of Session-Based Test Management is a set of structures for making exploratory testing sessions more accountable.
- Procedure. The General Functionality and Stability Test Procedure. It was designed for Microsoft in the late 1990s by James Bach, and may be the first documented procedure to guide exploratory test execution and investigation.
- Emotions. I gave a talk on emotions as powerful pointers to test oracles at STAR West in 2007. That helped to inspire some ideas about...
- Noticing, Observation. At STAR East 2009, I did a keynote talk on noticing, which can be important for exploratory test execution. The talk introduces a number of areas in which we might notice, and some patterns to sharpen noticing.
- Leadership. For the 2009 QAI Conference in Bangalore, India, I did a plenary talk in which I noted several important structural similarities between exploratory testing and leadership.
Wednesday, December 09, 2009
Best Bug... or Bugs?
And now for the immodest part of the EuroSTAR 2009 Test Lab report: I won the Best Bug award, although it's not clear to me which bug got the nod, since I reported several fairly major problems.
I tested OpenEMR. For me, one candidate for the most serious problem would have been a consistent pattern of inconsistency in input handling and error checking. I observed over a dozen instances of some kind of sloppiness.
This reminded me of a problem that we testers often see in project work, the problem of measuring by counting things—counting bugs, counting bug reports, counting requirements. When the requirement is to defend the application against overflowing text fields and vulnerability to input constraint attacks by hackers, how should we count? How many mentions of that should there be? One, in a statement of general principles at the beginning of a requirements document? Hundreds, in a statement of specific purpose for each input field in a functional specification? How many requirements are there to make sure that fields don't overflow? How many requirements that they support only the characters, numbers, or date ranges that they're supposed to? What about traceability? If this is a genuine problem, and the requirements documents don't mention a particular requirement explicitly, should we refrain from reporting on a problem with that implicit requirement?
When I report an issue—for example, that practically all of the input fields in OpenEMR have some kind of problem with them—should that count as one bug report? Since it applies to hundreds of fields, should it count as hundreds of bug reports? When such a pervasive overall problem exists, should the tester make a report for each and every field in which he observes a problem? And if you want to answer Yes to that question: is it worth the opportunity cost to do that when there are plenty of other problems in the product?
So again, there were so many instances of unconstrained and unchecked input that I stopped recording specifics and instead reported a general pattern in the bug tracking system. My decision to do this was an instance of the Dead Horse stopping heuristic; reporting yet another instance of the same class of problem would be like flogging a dead horse. I could have wasted a lot of time and energy reporting each instance of each problem I observed, along with specific symptoms and possible ramifications of each one. Yet I'm very skeptical that this would serve the project well. In my experience as a program manager for a product whose code was being developed outside our company, I found that there was steadily diminishing return in value for many reports of the same ilk. When, in testing, we identified a general pattern of failure, we stopped looking for more instances. We sent the product back to the development shop, and required the programmers and their testers to review the product through-and-through for that kind of problem.
If I were to be evaluated on the number of bugs that I found, I'd find it hard to resist the easy pickings of yet another input constraint attack bug report. Yet when I'm testing, every moment of bug investigation and reporting is, by some reckoning, another moment that I can't spend on obtaining more test coverage (more about that here). By focusing on investigating and reporting on input problems (and thereby increasing my bug count), am I missing opportunities to design and perform tests on scheduling conflict-resolution algorithms, workflows, database integrity,...?
There were two other fairly serious problems that I observed. One was that the Chinese version of the product showed a remarkable number of English words, presumably untranslated, interspersed among the ideograms; I expected to see no English at all. I treated that problem in the same way as the input constraint problem: with a single report of a general problem.
The second serious problem was that searches of various kinds would place a link in the address bar. The link represented a command to a CGI script of some kind, which evidently constructed and forwarded a query to an underlying SQL database. Backspacing over the last digit in the address bar and replacing it with a slash caused a lovely SQL error message to appear on the screen, unhandled by any of OpenEMR's code. The message could have been used, said our local product owner, to expose the structure of the database to snoops or hackers. I found that problem by a defocusing heuristic—looking at the browser, rather than the browser window.
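I never saw the code behind that page, so the following is not OpenEMR's implementation. It is a minimal sketch of how this class of symptom can arise, using Python's built-in sqlite3 module as a stand-in for the real database; the table, the data, and the function names are all hypothetical. The first function pastes the value from the address bar straight into the query text; the second checks the value and passes it as a parameter.

```python
import sqlite3


def make_db():
    """Build a throwaway in-memory database with one hypothetical record."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO patients VALUES (12, 'Example Patient')")
    return conn


def search_unsafe(conn, raw_id):
    """Concatenate the raw URL value into the SQL text (the risky pattern)."""
    query = "SELECT name FROM patients WHERE id = " + raw_id
    return conn.execute(query).fetchall()


def search_checked(conn, raw_id):
    """Reject non-numeric input, then bind the value as a query parameter."""
    try:
        patient_id = int(raw_id)
    except ValueError:
        return None  # the caller can show a friendly message instead
    query = "SELECT name FROM patients WHERE id = ?"
    return conn.execute(query, (patient_id,)).fetchall()


conn = make_db()
print(search_checked(conn, "12"))   # normal lookup
print(search_checked(conn, "1/"))   # last digit replaced with a slash
try:
    search_unsafe(conn, "1/")
except sqlite3.OperationalError as err:
    # This is the kind of raw database message that ended up on the screen.
    print("Unhandled in the unsafe version:", err)
```

In the unsafe version, the slash turns the query into malformed SQL, and the database's own error text is what the user sees unless something catches it. Whether the product I tested was built this way is a guess on my part; the symptom is consistent with it.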
I don't know which of these problems took Best Bug honours. I'm not sure that the presenters specified which bug they were crediting with Best Bug. That makes a certain kind of sense, since I can't tell which of these problems is the most serious either. After all, a problem isn't its own thing; it's a relationship between a person and a product or a situation. There are plenty of ways to address a problem. You could fix the product or the situation. You could change the perspective or the perception of the person observing the problem, say by keeping the problem as it is but providing a workaround. You could choose to ignore the problem yourself, which underscores the fact that a problem for some person might not be a problem for you. That's why it's not helpful to count problems.
Managers: do you see how evaluating testers based on test cases or bug counts, rather than the value of reporting, will lead to distortion at best, and more likely to dysfunction? Do you see how providing overstructured test scripts or test cases could reduce the diversity—and therefore the quality—of testing? Do you see how the notion of "one test per requirement" or "one positive and one negative test per requirement" is misleading?
Testers: do you see how being evaluated on bug counts could lead to inattentional blindness with respect to problems more serious than the low-hanging fruit? Do you see how focusing on bugs, rather than focusing on test coverage, could reduce the value of your testing?
Instead of counting things, let's consider evaluating testing work in a different way. Let's consider the overall testing story and its many dimensions. Let's think about the story around each bug, and each bug report—not just the number of reports, but the meaning and significance of each one. Let's look at the value of the information to stakeholders, primarily to programmers and to product owners. Let's think about the extent to which the tester makes things easier for others on the team, including other testers. Let's look at the diversity of problems discovered, the diversity of approaches used, and the diversity of tools and techniques applied. And rather than using this information to reward or punish testers, let's use it to guide coaching, mentoring, and training such that the focus is on developing skill for everyone.
The dimensions above are qualitative, rather than quantitative. Yet if our mission is to provide information to inform decisions about quality, we of all people should recognize that expressing value in terms of numbers often removes important information rather than adding it.
Additional reading:
Measuring and Managing Performance in Organizations (Robert D. Austin)
Software Engineering Metrics: What Do They Measure and How Do We Know? (Kaner and Bond)
Quality Software Management, Vol. 2: First Order Measurement (Weinberg)
Perfect Software (and Other Illusions About Testing) (Weinberg)
I tested OpenEMR. For me, one candidate for the most serious problem would have been a consistent pattern of inconsistency in input handling and error checking. I observed over a dozen instances of some kind of sloppiness.
This reminded me of a problem that we testers often see in project work, the problem of measuring by counting things—counting bugs, counting bug reports, counting requirements. When the requirement is to defend the application against overflowing text fields and vulnerability to input constraint attacks by hackers, how should we count? How many mentions of that should there be? One, in a statement of general principles at the beginning of a requirements document? Hundreds, in a statement of specific purpose for each input field in a functional specification? How many requirements are there to make sure that fields don't overflow? How many requirements that they support only the characters, numbers, or date ranges that they're supposed to? What about traceability? If this is a genuine problem, and the requirements documents don't mention a particular requirement explicitly, should we refrain from reporting on a problem with that implicit requirement?
When I report an issue—for example, that practically all of the input fields in OpenEMR have some kind of problem with them—should that count as one bug report? Since it applies to hundreds of fields, should it count as hundreds of bug reports? When such a pervasive overall problem exists, should the tester make a report for each and every field in which he observes a problem? And if you want to answer Yes, to that question: is it worth the opportunity cost to do that when there are plenty of other problems in the product?
So again, there were so many instances of unconstrained and unchecked input that I stopped recording specifics and instead reported a general pattern in the bug tracking system. My decision to do this was an instance of the Dead Horse stopping heuristic; reporting yet another instance of the same class of problem would be like flogging a dead horse. I could have wasted a lot of time and energy reporting each instance of each problem I observed, along with specific symptoms and possible ramifications of each one. Yet I'm very skeptical that this would serve the project well. In my experience as a program manager for a product whose code was being developed outside our company, I found that there was steadily diminishing return in value for many reports of the same ilk. When, in testing, we identified a general pattern of failure, we stopped looking for more instances. We sent the product back to the development shop, and required the programmers and their testers to review the product through-and-through for that kind of problem.
If I were to be evaluated on the number of bugs that I found, I'd find it hard to resist the easy pickings of yet another input constraint attack bug report. Yet when I'm testing, every moment of bug investigation and reporting is, by some reckoning, another moment that I can't spend on obtaining more test coverage (more about that here). By focusing on investigating and reporting on input problems (and thereby increasing my bug count), am I missing opportunities to design and perform tests on scheduling conflict-resolution algorithms, workflows, database integrity,...?
There were two other fairly serious problems that I observed. One was that the Chinese version of the product showed a remarkable number of English words, presumably untranslated, interspersed among the ideograms; I expected to see no English at all. I treated that problem in the same way as the input constraint problem: with a single report of a general problem.
The second serious problem was that searches of various kinds would place a link in the address bar. The link represented a command to a CGI script of some kind, which evidently constructed and forwarded a query to an underlying SQL database. Backspacing over the last digit in the address bar and replacing it with a slash caused a lovely SQL error message to appear on the screen, unhandled by any of OpenEMR's code. The message could have been used, said our local product owner, to expose the structure of the database to snoops or hackers. I found that problem by a defocusing heuristic—looking at the browser, rather than the browser window.
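Here is a minimal sketch of this class of problem, in Python with SQLite standing in for the real database. Nothing in it comes from OpenEMR: the table, the patient id parameter, and both function names are assumptions made purely for illustration. The first function pastes the address-bar value straight into the SQL text, so a stray slash produces a database error; the second checks the value and binds it as a parameter, so malformed input never reaches the database.

```python
import sqlite3


def make_db():
    """Build a tiny in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO patients (id, name) VALUES (12, 'Test Patient')")
    return conn


def search_unchecked(conn, raw_id):
    """Paste the address-bar value straight into the SQL text."""
    query = "SELECT name FROM patients WHERE id = " + raw_id
    # Malformed input such as "1/" makes the database raise an error here.
    return conn.execute(query).fetchall()


def search_checked(conn, raw_id):
    """Validate the value first, then pass it as a bound parameter."""
    try:
        patient_id = int(raw_id)
    except ValueError:
        # Signal "bad request" so the caller can show a generic message
        # that reveals nothing about the database.
        return None
    return conn.execute(
        "SELECT name FROM patients WHERE id = ?", (patient_id,)
    ).fetchall()


if __name__ == "__main__":
    conn = make_db()
    print(search_checked(conn, "12"))   # a normal search
    print(search_checked(conn, "1/"))   # rejected before any SQL runs
    try:
        search_unchecked(conn, "1/")
    except sqlite3.OperationalError as err:
        # In a web page with no handler, this text is what the user would see.
        print("database error:", err)
```

The point of the sketch is only that the error text should stay on the server side; how a real product ought to log or report it is a separate design decision.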
I don't know which of these problems took Best Bug honours. I'm not sure that the presenters specified which bug they were crediting with Best Bug. That makes a certain kind of sense, since I can't tell which of these problems is the most serious either. After all, a problem isn't its own thing; it's a relationship between a person and a product or a situation. There are plenty of ways to address a problem. You could fix the product or the situation. You could change the perspective or the perception of the person observing the problem, say by keeping the problem as it is but providing a workaround. You could choose to ignore the problem yourself, which underscores the fact that a problem for some person might not be a problem for you. That's why it's not helpful to count problems.
Managers: do you see how evaluating testers based on test cases or bug counts, rather than the value of reporting, will lead to distortion at best, and more likely to dysfunction? Do you see how providing overstructured test scripts or test cases could reduce the diversity—and therefore the quality—of testing? Do you see how the notion of "one test per requirement" or "one positive and one negative test per requirement" is misleading?
Testers: do you see how being evaluated on bug counts could lead to inattentional blindness with respect to problems more serious than the low-hanging fruit? Do you see how focusing on bugs, rather than focusing on test coverage, could reduce the value of your testing?
Instead of counting things, let's consider evaluating testing work in a different way. Let's consider the overall testing story and its many dimensions. Let's think about the story around each bug, and each bug report—not just the number of reports, but the meaning and significance of each one. Let's look at the value of the information to stakeholders, primarily to programmers and to product owners. Let's think about the extent to which the tester makes things easier for others on the team, including other testers. Let's look at the diversity of problems discovered, the diversity of approaches used, and the diversity of tools and techniques applied. And rather than using this information to reward or punish testers, let's use it to guide coaching, mentoring, and training such that the focus is on developing skill for everyone.
The dimensions above are qualitative, rather than quantitative. Yet if our mission is to provide information to inform decisions about quality, we of all people should recognize that expressing value in terms of numbers often removes important information rather than adding it.
Additional reading:
Measuring and Managing Performance in Organizations (Robert D. Austin)
Software Engineering Metrics: What Do They Measure and How Do We Know? (Kaner and Bond)
Quality Software Management, Vol. 2: First Order Measurement (Weinberg)
Perfect Software (and Other Illusions About Testing) (Weinberg)
EuroSTAR's Test Lab: Bravo!
One of the coolest things about EuroSTAR 2009 was the test lab set up by James Lyndsay and Bart Knaack.
James and Bart (who self-identified as Test Lab Rats) provided testers with the opportunity to have a go at two applications, FreeMind (an open-source mind-mapping program) and OpenEMR (an open-source product for tracking medical records). The Lab Rats did a splendid job of setting things up and providing the services and information that participants needed to get up and running quickly.
Sponsorship in the form of five laptop computers was provided through the good graces of Steve Green at Test Partners, Stuart Noakes at Transition Consulting Ltd., and Bart Knaack at Logica. James Lyndsay also lent a server and a router to the event.
Sponsorship was also provided by tool vendors (here in alphabetical order) Andagon, Microsoft, MicroFocus, Neotys, and Testing Technologies. These sponsors had their tools installed on the laptops, and presented their demos by applying them to OpenEMR and FreeMind as they were installed in the Test Lab. On a loose schedule, some of the presenters did talks and demonstrations of how they tested.
The aforementioned Stuart Noakes and Mieke Gievers gave advice and assistance to the Lab Rats.
Well, that's all very nice, but what was it like?
As someone who spent a couple of hours in the lab, exploring the applications and listening in on the presentations, I'd say it was terrific (although the prospect that OpenEMR is being used in actual medical practices seemed faintly alarming). Both applications were sophisticated enough for some reasonably serious testing, and had interesting problems to discover and report.
Interestingly, none of the certificationists or the standardization folks sat in the lab and tested, to my knowledge.
Bravo to James and Bart, to the sponsors, to the conference organizers and to the program committee for putting this together. Let's see more actual testing at testing conferences!
