Note: this post is long from the perspective of the kitten-like attention spans that modern social media tends to encourage. Fear not. Reading it could help you to recognize how you might save yourself hours, weeks, or months of excess and unnecessary work, especially if you’re working as a tester or manager in a regulated environment.
Testers frequently face problems associated with excessive emphasis on formal, procedurally scripted testing. Politics, bureaucracy, and paperwork combine with fixation on test cases. Project managers and internal auditors mandate test cases structured and written in a certain form “because FDA”. When someone tells you this, it’s a pretty good indication that they haven’t read the FDA’s guidance documentation.
Because here’s what it really says:
For each of the software life cycle activities, there are certain “typical” tasks that support a conclusion that the software is validated. However, the specific tasks to be performed, their order of performance, and the iteration and timing of their performance will be dictated by the specific software life cycle model that is selected and the safety risk associated with the software application. For very low risk applications, certain tasks may not be needed at all. However, the software developer should at least consider each of these tasks and should define and document which tasks are or are not appropriate for their specific application. The following discussion is generic and is not intended to prescribe any particular software life cycle model or any particular order in which tasks are to be performed.
General Principles of Software Validation;
Final Guidance for Industry and FDA Staff, 2002
The General Principles of Software Validation document is to some degree impressive for its time, 2002. It describes some important realities. Software problems are mostly due to design and development, far less to building and reproduction. Even trivial programs are complex. Testing can’t find all the problems in a product. Software doesn’t wear out like physical things do, and so problems often manifest without warning. Little changes can have big, wide-ranging, and unanticipated effects. Using standard and well-tested software components addresses one kind of risk, but integrating those components requires careful attention.
There are lots of problems with the General Principles of Software Validation document, too. I’ll address several of these, I hope, in future posts.
Apropos of the present discussion, the document doesn’t describe what a test case is, nor how it should be documented. By my count, the document mentions “test case” or “test cases” 30 times. Here’s one instance:
“Test plans and test cases should be created as early in the software development process as feasible.”
Here are two more:
“A software product should be challenged with test cases based on its internal structure and with test cases based on its external specification.”
If you choose to interpret “test case” as an artifact, and consider that challenge sufficient, this would be pretty terrible advice. It would be analogous to saying that children should be fed with recipes, or that buildings should be constructed with blueprints. A shallow reading could suggest that the artifact and the performance guided by that artifact are the same thing; that you prepare the recipe before you find out what the kids can and can’t eat, and what’s in the fridge; that you evaluate the building by comparing it to the blueprints and then you’re done.
On the other hand, if you substitute “test cases” with “tests” or “testing”, it’s pretty great advice. It’s a really good idea to challenge a software product with tests, with testing, based on internal and external perspectives.
The FDA does not define “test case” in the guidance documentation. A definition does appear in Glossary of Computer System Software Development Terminology (8/95).
test case. (IEEE) Documentation specifying inputs, predicted results, and a set of execution conditions for a test item. Syn: test case specification. See: test procedure
Okay, let’s see “test procedure”:
test procedure (NIST) A formal document developed from a test plan that presents detailed instructions for the setup, operation, and evaluation of the results for each defined test. See: test case.
So it is pretty terrible advice after all.
(Does that “8/95” refer to August 1995? Yes, it does. None of the source documents for the Glossary of Computer System Software Development Terminology (8/95) is dated after 1994. For some perspective, that’s before Windows 95; before Google; before smartphones and tablets; before the Manifesto for Agile Software Development; before the principles of context-driven testing…)
But happily, in Section 2 of General Principles of Software Validation, before any of the guidance on testing itself, is the Principle of the Least Burdensome Approach:
We believe we should consider the least burdensome approach in all areas of medical device regulation. This guidance reflects our careful review of the relevant scientific and legal requirements and what we believe is the least burdensome way for you to comply with those requirements. However, if you believe that an alternative approach would be less burdensome, please contact us so we can consider your point of view.
The “careful review” happened in the period leading up to 2002, which is the publication date of this guidance document. In the testing community of those days, anything other than ponderously scripted procedural test cases was viewed with great suspicion in writing and conference talks. Thanks to work led by Cem Kaner, James Bach, and other prominent voices in the testing community, the world is now a safer place for exploration in testing. And, as noted in the previous post in this series, the FDA itself has acknowledged the significance and importance of exploratory work.
Test documentation may take many forms more efficient and effective than formally scripted procedures, and the Least Burdensome Approach appears to allow a lot of leeway as long as evidence is sufficient and the actual regulations are followed. (For those playing along at home, the regulations include Title 21 Code of Federal Regulations (CFR) Part 11.10 and 820, and 61 Federal Register (FR) 52602.)
Several years ago, James Bach began some consulting work with a company that made medical devices. They had hired him to analyze, report on, and contribute to the testing work being done for a particular Class III device. (I have also done some work for this company.)
The device consisted of a Control Box, operated by a technician. The Control Box was connected to a Zapper Box that delivered Healing Energy to the patient’s body. (We’ve modified some of the specific words and language here to protect confidentiality and to summarize what the devices do.) Insufficient Healing Energy is just Energy. Too much Healing Energy, or the right amount for too long, turns into Hurting Energy or Killing Energy.
When James arrived, he examined the documentation being given to testers. He found more than a hundred pages of stuff like this:
9.8.1 To verify Power Accuracy
9.8.1.1 Connect the components according to the General Setup document.
9.8.1.2 Power on and connect Power Monitor (instead of electrodes).
9.8.1.3 Power on the Zapper Box.
9.8.1.4 Power on the Control Box.
9.8.1.5 Set default settings of temperature and power for zapping.
9.8.1.6 Set test jig load to nominal value.
9.8.1.7 Select nominal duration and nominal power setting.
9.8.1.8 Press the Start button.
9.8.1.9 Verify Zapper reports the power setting value ±10% on display.
Is this good formal testing?
It’s certainly a formal procedure to follow, but where’s the testing part? The closest thing is that little molecule of actual testing in the last line: the tester is instructed to apply an oracle by comparing the power setting on the Control Box with what the Zapper reports on its display. There’s nothing to suggest examining the actual power being delivered by noting the results from the Power Monitor. There’s nothing about inducing variation to obtain and extend coverage, either.
At one point, James and another tester defrosted this procedure. They tried turning on the Control Box first, and then waited for a variety of intervals to turn on the Zapper Box. To their amazement, the Zapper Box could end up in one of four different states, depending on how long they waited to start it—and at least a couple of those states were potentially dangerous to the patient or to the operator.
James replaced 50 pages of this kind of stuff with two paragraphs containing things that had not been covered previously. He started by describing the test protocol:
3.1 General testing protocol
In the test descriptions that follow, the word “verify” is used to highlight specific items that must be checked. In addition to those items a tester shall, at all times, be alert for any unexplained or erroneous behavior of the product. The tester shall bear in mind that, regardless of any specific requirements for any specific test, there is the overarching general requirement that the product shall not pose an unacceptable risk of harm to the patient, including any unacceptable risks due to reasonably foreseeable misuse.
Read that paragraph carefully, sentence by sentence, phrase by phrase. Notice the emphasis on looking for problems and risks—especially on the risk of human error.
Then he described the qualifications necessary for testers to work on this product:
3.2 Test personnel requirements
The tester shall be thoroughly familiar with the Zapper Box and Control Box Functional Requirements Specification, as well as with the working principles of the devices themselves. The tester shall also know the working principles of the Power Monitor Box test tool and associated software, including how to configure and calibrate it, and how to recognize if it is not working correctly. The tester shall have sufficient skill in data analysis and measurement theory to make sense of statistical test results. The tester shall be sufficiently familiar with test design to complement this protocol with exploratory testing, in the event that anomalies appear that require investigation. The tester shall know how to keep test records to a credible, professional standard.
In summary: Be a scientist. Know the domain, know the tools, be an analyst, be an investigator, keep good lab notes.
Then James provided some concise test ideas, leaving plenty of room for variation designed to shake out bugs. Here’s an example like something from the real thing:
3.2.2 Fields and Screens
3.2.2.1 With the Power Monitor test tool already running, start the Zapper Box and the Control Box. Vary the order and timing in which you start them, retain the Control Box and Power Monitor log files, and note any inconsistent or unexpected behaviour.
3.2.2.2 Visually inspect the displays and verify conformance to the requirements and for the presence of any behaviour or attribute that could impair the performance or safety of the product in any material way.
3.2.2.3 With the system settings at default values change the contents of every user-editable field through the range of all possible values for that field. (e.g. Use the knob to change the session duration from 1 to 300 seconds.) Visually verify that appropriate values appear and that everything that happens on the screen appears normal and acceptable.
3.2.2.4 Repeat 3.2.2.3 with system settings changed to their most extreme possible values.
3.2.2.5 Select at least one field and use the on-screen keyboard, knob, and external keyboard respectively to edit that field.
3.2.2.6 Scan the Control Box and Power Monitor log files for any recorded error conditions or anomalies.
To examine certain aspects of the product and its behaviour, sometimes very specific test design matters. Here’s a representative snippet based on James’ test documentation:
3.5.2 Single Treatment Session Power Accuracy Measurement
3.5.2.3 From the Power Monitor log file, extract the data for the measured electrode. This sample should comprise the entire power session, including cooldown, as well as the stable power period with at least 50 measurements (i.e., taken at least five times per second over 10 seconds of stable period data).
3.5.2.4 From the Control Box log file, extract the corresponding data for the stable power period of the measured electrode.
3.5.2.5 Calculate the deviation by subtracting the reported power for the measured electrode from the corresponding Power Monitor reading (use interpolation to synchronize the time stamp of the power meter and generation logs).
3.5.2.6 Calculate the mean of the power sample ($\bar{X}$) and its standard deviation ($s$).
3.5.2.7 Find the 99% confidence and 99% two-sided tolerance interval k for the sample. (Use Table 5 of SOP-QAD-10, or use the equation below for large samples.)
3.5.2.8 The equation for calculating the tolerance interval k is:
$$k = \sqrt{\frac{(N-1)\left(1+\frac{1}{N}\right) z^2_{(1-p)/2}}{\chi^2_{\gamma,\,N-1}}}$$
where $\chi^2_{\gamma,\,N-1}$ is the critical value of the chi-square distribution with degrees of freedom $N-1$ that is exceeded with probability $\gamma$; and $z_{(1-p)/2}$ is the critical value of the normal distribution which is exceeded with probability $(1-p)/2$. (See NIST Engineering Statistics Handbook.)
Now, that’s some real formal testing. And it was accepted just fine by the organization and the FDA auditors. Better yet, following this protocol revealed some surprising behaviours and outright bugs that prompted more careful evaluation of the requirements for the product.
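For readers who would like to see the arithmetic in steps 3.5.2.5 through 3.5.2.8 spelled out, here is a minimal sketch in Python. It is not the code or tooling used on the project. It assumes that the equation in 3.5.2.8 is the large-sample approximation for the two-sided tolerance factor from the NIST Engineering Statistics Handbook, k = sqrt((N-1)(1 + 1/N) z² / χ²), with the symbols as defined in that step. The function names and the log format (lists of timestamp and watts pairs) are hypothetical. To stay within the standard library, the chi-square critical value is itself approximated with the Wilson-Hilferty formula, which is a substitution of mine; real work would use the table named in the protocol or a proper statistics library.

```python
import math
from statistics import NormalDist, mean, stdev


def interpolate(times, values, t):
    """Linearly interpolate a logged value at time t.

    Assumes at least two samples with strictly ascending timestamps.
    """
    if not times[0] <= t <= times[-1]:
        raise ValueError("timestamp outside the logged range")
    for i in range(1, len(times)):
        if t <= times[i]:
            t0, t1 = times[i - 1], times[i]
            v0, v1 = values[i - 1], values[i]
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def deviations(monitor, reported):
    """Step 3.5.2.5: Power Monitor reading minus reported power.

    Both arguments are lists of (timestamp_seconds, watts) pairs in
    ascending time order (a hypothetical log format). The reported
    power is interpolated onto each Power Monitor timestamp.
    """
    rep_t = [t for t, _ in reported]
    rep_w = [w for _, w in reported]
    return [w - interpolate(rep_t, rep_w, t) for t, w in monitor]


def chi2_exceeded_with_prob(gamma, dof):
    """Approximate chi-square value that is exceeded with probability gamma.

    Uses the Wilson-Hilferty approximation (an assumption made here to
    avoid a third-party dependency); reasonable for large dof.
    """
    z = NormalDist().inv_cdf(1.0 - gamma)  # lower-tail normal quantile
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * math.sqrt(c)) ** 3


def tolerance_factor(n, p=0.99, gamma=0.99):
    """Steps 3.5.2.7 and 3.5.2.8: two-sided tolerance factor k (large n)."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - p) / 2.0)  # exceeded with prob (1-p)/2
    chi2 = chi2_exceeded_with_prob(gamma, n - 1)
    return math.sqrt((n - 1) * (1.0 + 1.0 / n) * z * z / chi2)


def tolerance_interval(sample, p=0.99, gamma=0.99):
    """Step 3.5.2.6 plus the factor above: mean plus or minus k * s."""
    x_bar, s = mean(sample), stdev(sample)
    k = tolerance_factor(len(sample), p, gamma)
    return x_bar - k * s, x_bar + k * s
```

Under these assumptions, a tester would pass the output of `deviations(...)` to `tolerance_interval(...)` and compare the resulting interval with the accuracy requirement. For the protocol's minimum of 50 measurements at 99%/99%, `tolerance_factor(50)` comes out at roughly 3.4, which is a useful sanity check against a published table.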
What are some lessons we could learn from this? One key point, it seems to me, is that when you’re working as a tester in a regulated environment, it’s crucial that you read the regulations and the guidance documentation. If you don’t, you run the risk of being pushed around by people who haven’t read them, and who are working on the basis of mythology and folklore.
Our over-arching mission as testers is to seek and find problems that threaten the value of the product. In contexts where human life, health, or safety are on the line, the primary job at hand is to learn about the product and problems that pose risks and hazards to people. Excessive bureaucracy and paperwork can distract us from that mission; even displace it. Therefore, we must find ways to do the best testing possible, while still providing the best and least evidence that still completely satisfies auditors and regulators that we’ve done it.
Back in our coaching session, Frieda, acting the part of the manager, replied, “But… we don’t have the time to train testers to do that kind of stuff. We need them to be up to speed ASAP.”
“What does ‘up to speed’ actually mean?” I asked.
Frieda, still in character, replied “We want them to be banging on keys as quickly as possible.”
Uh huh. Imagine a development manager responsible for a medical device saying, “We don’t have time for the developers to learn what they’re developing. We want them up to speed as quickly as possible. (And, as we all know, programming is really just banging on keys.)”
The error in this line of thinking is the belief that testing is about pushing buttons; producing widgets on a production line; flipping testburgers. If you treat testing as flipping testburgers, then there’s a risk that testers will flip whatever vaguely burger-shaped thing comes their way… burgers, frisbees, cow pies, hockey pucks… You may not get the burger you want.
If you think of testing as an investigation of the product, testers must be investigators, and skillful ones at that. Upon engaging with the product and the project, testers set to learning about the product they’re investigating and the domain in which it operates. Testers keep excellent lab notes and document their work carefully, but not to the degree that documentation displaces the goal of testing the system and finding problems in it. Testers are focused on risk, and trained to be aware of problems that they might encounter as they’re testing (per CFR Title 21 Part 820.25 (b)(2)).
If they’re not sufficiently skilled when you hire them, you’ll supervise and train them until they are. And if they’re unskilled and can’t be trained… are you really sure you want them testing a device that could deliver Killing Energy?
How else might you guide testing work, whether in projects in regulated contexts or not? That’s a topic for next time.
[…] Breaking the Test Case Addiction (Part 4) – Michael Bolton – http://www.developsense.com/blog/2019/01/breaking-the-test-case-addiction-part-4/ […]
Another excellent and thoughtful post. It really is akin to the difference between giving a man a fish and teaching him how to catch one of his own: people need to be lured and baited into independent thought and analysis. If all you want out of the world is mindless automatons that blindly push buttons, then that’s all you’re ever going to get. But if you empower people to take a step back and ask the why and not the “where does this go?”, they can see the bigger picture and not be trapped in the minutiae.
Michael replies: Thanks for the comment, Jeremy.
This is a great instalment of the series. I am personally involved in testing in a ‘regulated’ environment, and I can confirm all these ‘myths’. In our case, I’ve heard dozens of times something like “our factory will be closed because of the lack of fully detailed test steps”.
I like the idea of simplified test ideas; it gives a kind of balance between the freedom and space for investigation and tangible verification points for auditors.
From my experience, what is also crucial is the recorded results after testing. In my company, I failed at ‘breaking the test case addiction’ mainly because the test results were not sufficient.
@Michael, do you have any tips or examples of how such test results could look so that they are accepted by auditors in case of an inspection or audit?
Michael replies: That’s still coming! Sorry about the delay.