
Gaming the Tests

Let’s imagine, for a second, that you had a political problem at work. Your CEO has promised his wife that their feckless son Ambrose, having flunked his university entrance exams, will be given a job at your firm this fall. Company policy is strict: in order to prevent charges of nepotism, anyone holding a job must be qualified for it. You know, from having met him at last year’s Christmas party, that Ambrose is (how to put this gently?) a couple of tomatoes short of a thick sauce. Yet the policy is explicit: every candidate must not only pass a multiple choice test, but must get every answer right. The standard number of correct answers required is (let’s say) 40.

So, the boss has a dilemma. He’s not completely out to lunch. He knows that Ambrose is (how can I say this?) not the sharpest razor in the barbershop. Yet the boss adamantly wants his son to get a job with the firm. At the same time, the boss doesn’t want to be seen to be violating his own policy. So he leaves it to you to solve the problem. And if you solve the problem, the boss lets you know subtly that you’ll get a handsome bonus. Equally subtly, he lets you know that if Ambrose doesn’t pass, your career path will be limited.

You ponder for a while, and you realize that, although you have to give Ambrose an exam, you have the authority to set the content and conditions of the exam. This gives you some possibilities.

A. You could give a multiple choice test in which all the answers were right. That way, anyone completing the test would get a perfect score.

B. You could give a multiple choice test for which the answers were easy to guess, but irrelevant to the work Ambrose would be asked to do. For example, you could include questions like, “What is the very bright object in the sky that rises in the morning and sets in the evening?” and provide “The Sun” as one choice of answer, and the names of hockey players for the other choices.

C. You could find out what questions Ambrose might be most likely to answer correctly in the domain of interest, and then craft an exam based on that.

D. You could give a multiple choice test in which, for every question, one of A, B, or C was the correct answer, and answer D was always “One of the above.”

E. You might give a reasonably difficult multiple-choice exam, but when Ambrose got an answer wrong, you could decide that there’s another way to interpret the answer, and quietly mark it right.

F. You might give Ambrose a very long set of multiple-choice questions (say 400 of them), and then, of his answers, pick 40 correct ones. You then present those questions and answers as the completed exam.

G. You could give Ambrose a set of questions, but give him as much time as he wanted to provide an answer. In addition, you don’t watch him carefully (although not watching carefully is a strategy that nicely supports most of these options).

H. You could ask Ambrose one multiple choice question. If he got it wrong, correct him until he gets it right. Then you could develop another question, ask that, and if he gets it wrong, correct him until he gets it right. Then continue in a loop until you get to 40 questions.

I. This approach is like H, but instead you could give a multiple choice test for which you had chosen an entire set of 40 questions in advance. If Ambrose didn’t get them all right, you could correct him, and then give him the same set of questions again. And again. And over and over again, until he finally gets them all right. You don’t have to publicize the failed attempts; only the final, successful one. That might take some time and effort, and Ambrose wouldn’t really be any more capable of anything except answering these specific questions. But, like all the other approaches above, you could effect a perfect score for Ambrose.

When the boss is clamoring for a certain result, you feel under pressure and you’re vulnerable. You wouldn’t advise anyone to do any of the things above, and you wouldn’t do them yourself. Or at least, you wouldn’t do them consciously. You might even do them with the best of intentions.

There’s an obvious parallel here—or maybe not. You may be thinking of the exam in terms of a certain kind of certification scheme that uses only multiple-choice questions, the boss as the hiring manager for a test group, and Ambrose as a hapless tester that everyone wants to put into a job for different reasons, even though no one is particularly thrilled about the idea. Some critical outsider might come along and tell you point-blank that your exam wasn’t going to evaluate Ambrose accurately. Even a sympathetic observer might offer criticism. If that were to happen, you’d want to keep the information under your hat—and quite frankly, the other interested parties would probably be complacent too. Dealing with the critique openly would disturb the idea that everyone can save face by saying that Ambrose passed a test.

Yet that’s not what I had in mind—not specifically, at least. I wanted to point out some examples of bad or misleading testing, which you can find in all kinds of contexts if you put your mind to it. Imagine that the exam is a set of tests—checks, really. The boss is a product owner who wants to get the product released. The boss’ wife is a product marketing manager. Hapless Ambrose is a program—not a very good program to be sure, but one that everyone wants to release for different reasons, even though no one is particularly thrilled by the idea. You, whether a programmer or a tester or a test manager, are responsible for “testing”, but you’re really setting up a set of checks. And you’re under a lot of pressure. How might your judgement—consciously or subconsciously—be compromised? Would your good intentions bend and stretch as you tried to please your stakeholders and preserve your integrity? Would you admit to the boss that your testing was suspect? If you were under enough pressure, would you even notice that your testing was suspect?

So this story is actually about any circumstance in which someone might set up a set of checks that provide some illusion of success. Can you think of any more ways that you might game the tests… or worse, fool yourself?
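To make the parallel concrete, here is a minimal sketch in Python of how options A, C, and F might look when Ambrose is a program. Everything in it is hypothetical and invented for illustration: `ambrose_sort` stands in for a program that claims to sort but returns its input unchanged, and the check functions and the 400 sample inputs are assumptions, not anything from a real project.

```python
def ambrose_sort(items):
    """Hypothetical program under test: claims to sort, but returns its input unchanged."""
    return list(items)


def check_always_right(items):
    # Option A: every answer is "right". Any list at all is accepted.
    return isinstance(ambrose_sort(items), list)


def check_easy_questions():
    # Option C: only ask about inputs Ambrose happens to handle (already sorted).
    easy_inputs = [[], [1], [1, 2, 3], [5, 5, 5]]
    return all(ambrose_sort(x) == sorted(x) for x in easy_inputs)


def cherry_picked_report(inputs, wanted=40):
    # Option F: run many checks, then present only the ones that passed.
    passed = [x for x in inputs if ambrose_sort(x) == sorted(x)]
    return passed[:wanted]


def honest_check(inputs):
    # Report passes and failures together, without hiding anything.
    failures = [x for x in inputs if ambrose_sort(x) != sorted(x)]
    return len(inputs) - len(failures), len(failures)


# 400 hypothetical two-element inputs; some are already in order, some are not.
inputs = [[i % 10, 5] for i in range(400)]

if __name__ == "__main__":
    print("Option A passes:", check_always_right([3, 1, 2]))
    print("Option C passes:", check_easy_questions())
    print("Option F reports", len(cherry_picked_report(inputs)), "passing checks")
    print("Honest (passed, failed):", honest_check(inputs))
```

The first three checks all come back green, and the cherry-picked report shows a perfect 40 out of 40. Only the last function tells the story that the stakeholders would actually need to hear.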

22 replies to “Gaming the Tests”

  1. Poor, poor Ambrose. He’s really caught between a rock and a hard place this time. If he fails, daddy will disown him; if he passes, he will either
    a) make a fool of himself later on
    b) perform as expected, making daddy proud but becoming more and more unhappy in the process. Is this it? He never expected all this rote check… ehm factory work in his first paid job.

    With regards to the multiple choice test, you could also organize one-on-one coaching/training sessions, in which Ambrose is thoroughly prepared for the actual exam:
    – Give him the most important definitions and stress that these are really important – high probability of being asked on the exam
    – Give him some exercises, let him practice till he gets it right, and make sure the exercises asked on the exam are surprisingly similar.
    – Provide a lot of “examples” of exam questions. Tell him these are samples from the previous exams. Make sure that these examples constitute a big part of the exam.

    Et voilà. Watch that probability of passing skyrocket.

    Michael replies: Yes, you could do that. And, as you point out, you’d eventually run afoul of the difference between a certified applicant (or application, I emphasize) and a qualified one. One presents a stamp of approval; the other would present a more circumspect and more nuanced picture.

  2. – You might fool yourself into thinking that working at a company with this boss is worthwhile enduring for much longer.
    – You might fool yourself that you don’t have a choice and, since you’re the bread earner in the family you need to play the game.
    – You might fool yourself in thinking that you’re only making a compromise and not sending your integrity to Nirvana never to be seen again.

    Some other ideas about gaming – you could release the test sheet to “the son” before he takes the test (including answers). Or, to make it seem more robust to criticism, send him 3 different ones with slightly different questions. Memorising 120 questions and answers surely shows knowledge in the subject area?
    Another alternative, similar to driving lessons, is to let him sit the test until he gets all 40 right, a variation of H which takes a bit more time (but, as the driving tests show, is popular in practice). Some countries have what’s lovingly called an idiot’s test to be done if you fail x number of times; that might not be such a good idea in this context.

    Integrity is the key word here in my opinion. It takes years for people to respect you and can be gone in a flash.
    I was once asked to change a test report which was too honest to be sent to a customer. I declined on grounds of integrity. The report was sent out changed but not carrying my name.
    People will do what they believe is the right thing to do at the time. I can choose to support others in theirs or decline. The choice is always there.

  3. This is nice, and I saw this happening very often already. The problem of course is the pressure, and the professional attitude that we as professionals have to maintain. We need to be able to say the truth when we see it, provide this information reasonably well, but still leave room for us to say “no” to temptations of pleasing the big boss. This might be hard, but on the other hand, what might change, if we found ourselves on a new job one year from now? Who knows….

  4. I would have resigned. No ifs, ands, or buts about it. I work morally, or not at all.

    In the case of Ambrose being a program, I would open communication with all parties involved and have everyone decide what the critical areas are that they must release with. I would test around that, keeping communication open and giving results daily or hourly (if necessary). Policy or not, they are the ones to make the final decisions – I merely provide truthful information. If they do not like the information I provide, and I see I’m slamming my head against the wall with them, I refuse to play numbers games – and I will be out of there. They can find someone else to do that.

    By the sounds of it, the father works at a larger corporate company that is slow to change and thrives on policy instead of what’s best for the group, product, and customers. I personally wouldn’t choose to work there.

  5. Also to consider:
    What if the candidate is so dumb that he messes up the test in any combination, or he is malevolent and does it just to put you in trouble?

    The “best” option (if going that route; morals laid aside) is to just make a fake exam result, put his name on it and present that.
    Still, there is uncertainty about whether the candidate plays along and keeps his trap shut.

    Regards,
    MaikNog

  6. If we testers fail to provide the information we’re supposed to deliver (whether the boss likes it or not), we have failed at doing the job: to analyze/test/check software (or whatever it is you’re testing) *and* provide information about its usability for the intended purpose in a production environment.
    The ‘and’ is very important in my opinion: Why bother doing all the work, if you’re not going to learn something from it — or at least pass the information to decision makers?

    Pleasing bosses may be tempting at times, but we testers might be the *only* ones in a company who know about the real state of affairs.

    As Vesna indicated before: Not a place where you love to work anyway…

  7. “Here are the results of our checks. And here are my suggestions for what they might tell us about which job duties Ambrose may be able to perform successfully. I don’t create job vacancies, so I can’t make the decision on whether there is a current or possible future role here that is a good match for his abilities, but this is the information I’ve been able to discover about what he might be able to do for us.”

    Michael replies: That’s the kind of answer I’d give, along with “Over and above the checks, here is some information that I’ve discovered about Ambrose, and here are some other ideas that you might like to consider in your evaluation.”

    This is really just another variant of answer A, where no answers are wrong, they just give you a different piece of information – but one which puts the final decision back in the hands of the product owner. This is what the system under test is capable of. Are you happy with that? Does that fit your needs? Can you make use of it?

    Yes. The central question in testing is often touted as “pass or fail?”, when the really important questions to me are “Is this (dear Programmer, dear Product Owner) the product that you thought you had?” and “Is this the product that you want?”

    Now, I fully expect that a) it would be rather difficult to create a multiple choice test of 40 questions that would really provide much useful detail on what Ambrose might be capable of, and b) that Ambrose’s father is likely to put the pressure on for some sort of decision anyway. So, this is the sort of professional situation where I would be thinking about getting my CV up to date.

    If I’m working at the company, then does that imply I passed the multiple choice test myself in order to work there? Or is it possible that it was brought in after I started? Nowadays I think that application process would sift *me* out at a relatively early stage.

    That’s a serious concern for me, because I know you, Anna. You’re not like Ambrose. It seems preposterous to me that a company would refrain from hiring you on the basis of the fact that you hadn’t passed the Ambrose Test. Similarly, it seems preposterous that a development group would decide to stop work on a product and throw it out simply because it’s not passing a set of unit checks or acceptance checks, when there’s ample evidence available to show that it’s an otherwise robust product. Surely one important thing to do in both cases would be to question the checks, wouldn’t it?

    However, I have worked there in the past, or at least on similar projects. I find it very hard. I don’t find it rewarding to be in an environment where the appearance of success is more important than providing a good product, or a good service, to our customers.

    Yes. That’s something that testers everywhere (including those who claim to “qualify” products or testers) should consider, in my view.

    And where honesty is considered problematic, then that also makes it hard for me to do good testing anyway – one valued defence against fooling myself is to be open about what I’m doing, why I’m doing it, and that I’m keen to hear feedback on how that could work better.

    My usefulness is ultimately constrained by the system I’m working in – if everything about that system is set up so that it prevents me from doing useful work, _and_ _nobody_ _cares_ about changing it, then I need to find a better place to work if I want to be useful.

    Yes—or empower yourself and form alliances to allow you to make a better place to work.

    Thanks for the comment, Anna.

  8. Interesting post and so the responses. Ethically it would have been wrong for me to continue working in this company. Let’s say quitting is not an option here. So what else can be done?
    I would have talked to Ambrose’s father, would have tried to showcase the REAL picture to him based on FACTS and would have left it to him (CEO) to decide further. If he is ready to take the RISK of having his UNSKILLED son (read unfit for the post) in the company, it would have been his decision.
    Now, similar thought process goes for a faulty product as well. As a tester my job is to help people (stakeholders) make an Informed Decision and I don’t have any control on their decision making process apart from providing inputs.
    I’ll try my best to provide facts based on my investigation but stakeholders will have a final say.
    Thanks,
    Rahul Gupta
    http://BugMagnate.blogspot.com

  9. Publish the test(s) in advance and let everybody, Ambrose included, prepare for the test. If the questions are difficult enough (3 years work experience difficult) then crap like Ambrose will not pass.

    Trust your tests and share them, and if Ambrose or anybody else fails – you did your best, but unfortunately their best was not good enough.

    P.S. If your tests are evil enough Ambrose will prefer a career in the fast food industry and won’t even bother showing up to take the test.

    Michael replies: Hi, Rasmus. Thanks for the reply.

    I have a few concerns. One is that if you publish the tests in the form of binary or multiple choice questions and answers, they turn into checks, rather than tests. A computer program, or something else non-sapient, can pass them, irrespective of the amount of experience.

    More significantly, I don’t see anything in your answer that speaks explicitly to the political problem of having to deal with pressure from the other agencies (boss, marketer, etc.) and the ensuing cognitive problem (what steps can you take to prevent the possibility that you will fool others, or even fool yourself?)—unless your suggestion is to maintain your standards and deal with whatever the consequences are. In which case, I agree. 🙂

  10. Excellent post, Michael. We always say that testers should be the headlights on projects we are working on. A major part of this is standing our ground in the face of adversity. To me we should be the headlights on any process we are part of and professional ethics forms a major part of that.

    I therefore concur with Rahul in that I would have to highlight the risks the company was taking by employing unskilled labour regardless of family connections. Ultimately the decision rests with the CEO but I could not rest easy unless I had spoken my mind.

    These are the same principles that I apply to my daily testing.

    Stephen

  11. Interesting post!

    I recognise both situations – the exam and the work situation. I’ve been put under pressure before to make a call that isn’t mine to make, namely whether the product is ready for release.

    I can imagine it won’t be the last time I’m asked for such an opinion /ever/. In the past situations (as well as any future situations) I’ll describe the activity I’ve performed and the conditions, constraints, data config and user profiles used, and give a summary based on that – including areas not covered, areas for further investigation and potential improvement areas.

    If the product manager uses only my input into the ship/no-ship decision then he’s (how can I say this?) missing a couple of books from his bookshelf.

    How not to game the test: Tell the story, the whole story (including conditions & constraints), and nothing but the story!

  12. Hi Michael,

    I thought this was a brilliant post. I especially like when you flipped it from Ambrose as a person to Ambrose as a program…

    Michael replies: Thanks, Michele.

    “Can you think of any more ways that you might game the tests… or worse, fool yourself?”

    One thing I have seen that I would consider “gaming the tests” is when test coverage becomes about the number of tests run and not the quality of tests run.

    Yes; apparently a very frequent theme.

    I worked for one lead that had test cases that asked me to verify the spelling of things in the application. Evidently at one point development must have spelled the words wrong and perhaps she became paranoid that they might change the spelling again at some point, just to see if she was still “checking” it. And, yes, I am serious…

    Just to be clear, there’s no harm in checking for spelling problems; that’s a good thing to do. Things start to go pear-shaped, though, when a) the tests are counted, and b) the counts are used as an indication of coverage or test quality. Those problems are rife in our little craft, so it seems. Thanks for the example.

  13. As I am in charge of the exam I do miss option K:
    – I ask the 40 questions I *want* to ask and see what happens.
    Maybe my career gets crippled, maybe I get fired, maybe there will be another response I didn’t anticipate.
    I might choose one of the other options in the end… but will I ever be taken seriously after that?

    Nice post 😉

  14. Nice post, Michael.
    My understanding goes like this… In this story, ‘the program’ (sorry, Ambrose) is supposed to be hired irrespective of what skills he has, and the person who is preparing the exam for him has a constraint {Ambrose must not fail}:
    Expected result must be “Ambrose should pass” and
    Actual result is “Ambrose should pass”

    To match the Expected and Actual, you need to design tests (not just tests, but a variety of tests) that satisfy this one and ONLY criterion, so that the report looks as if Ambrose has been tested thoroughly.

    If we do this consciously – it is fake testing or fooling
    If we do it unconsciously – it is bad testing

    A conscious tester has skills which he is using to fake
    An unconscious tester lacks skills, which is causing bad testing

    we can find both type of testers in companies, it is very hard to identify, especially in projects that have 100’s and 1000’s of them.

    when the product fails, it would be hard to identify if it was due to fake testing or bad testing

    I think the importance of not compromising under pressure should be part of a tester’s training. For some it’s natural, but for a few, due to lack of skills, experience, and pressure from the hierarchy, it’s hard.

  15. I think the best way to avoid that trap would be to ignore the pressure to pass or fail, and simply run the best tests you can. After all, it’s not our position to state whether or not a piece of software SHOULD be released, but rather to report on the state of it as we see it at a particular moment, right?

  16. To put it in the context of the hiring scenario, Ambrose’s test should be appropriate for the position. If the results come back poor, that should be reported to the people who make the business decision on whether the test results should be ignored (the software gets released) or heeded (the software’s release gets re-evaluated).

  17. Ok, one more idea, and this applies to both the software and person Ambrose. What’s to say that while they fail in one area, they aren’t extremely well-suited to some other use? Human Ambrose might be exceptionally gifted in the arts and be hired in to Marketing. Software Ambrose might fail miserably at one function, but excel in another.

  18. Great post and interesting metaphor. I think this is a good catalyst for thought and discussion.

    I think this can illustrate both why testers can and can not make release decisions.

    At the Christmas party in this thought experiment, our first interactions with Ambrose – we tested Ambrose. This built up some good evidence that Ambrose should not ship. If you decided to question him with integrity, instead of faking, and also gained more evidence that he should not ship, then we can ask:

    Why can a tester make a decision to ship or hire Ambrose?
    – The evidence gathered at the Christmas party and the interview would lead me to believe that he should not be hired.

    Why can a tester not make a decision to ship or hire Ambrose?
    – Maybe the stakeholders are aware of Ambrose’s limitations, but he may still have value in some other area.

    I think there is value in a tester’s opinion of a software, and if it is fit for release. The ultimate decision really rests in the CEO’s hands anyway whether you are used by him to make the decision or he has integrity and takes your information and makes the decision.

    I guess it is important to understand the mission. The CEO wanted you to provide fake evidence that Ambrose would be fit for a task that he was not fit for, not determine if Ambrose was fit for a particular task.

    I would stop right at the beginning, I can do the following for you:

    Test Ambrose to see if he is fit for task A
    – after doing this I think I would be comfortable saying – I think he is fit for task A.

    Or

    Test Ambrose to see what he may be fit for
    – after doing this I would be comfortable saying I think he provides value here instead of A.

    What I would really like to understand is why is there such strong polarity favoring testers not making release decisions?

    Michael replies: Do you believe that

    • camera operators should decide whether to release a motion picture?
    • investigative reporters should decide whether to jail someone that they’ve profiled?
    • waiters should decide what you should have for dinner?
    • financial advisors should decide what furniture you should buy?
    • lab technicians should decide whether you have surgery?
    • air traffic controllers should have remote control over the throttle and the rudder of airplanes?
    • network administrators should decide on corporate sales strategies?

    Isn’t our view of the system valuable – I think it is? I kind of understand the polarity, but I would like to understand with more clarity. Please let me know your thoughts.

    Do you understand the difference between providing insight and making a decision? Therein lies the answer to your question.

  19. Thanks, this really helps clarify to my grownup side, but try explaining the difference between providing insight and making a decision to Ralph… (google video “from a to zzzz”).

    I also think there is a difference between providing insight with caution, and providing insight with honesty.

    Providing insight does have power and a real effect on the decision making process. When you offer your informed insight you are putting your blood in the cup.

    I get a sense testers really fear when stakeholders ask for a decision to ship – maybe stakeholders are just asking for honest insight and not a decision. They would probably be quite irresponsible if they were just going to fall into the arms of the tester’s opinion. I haven’t picked up irresponsibility when asked this, just that someone trusts me and wants my honest opinion – because they know I care and dig and question and on some level understand the system.

    I think when you give insight with fear and caution your integrity is questioned. Wouldn’t you question the integrity of a waiter if you asked for his recommendation and he gave it with caution for fear you wouldn’t be happy? The waiter can really say “I would recommend A because…”, or “I would not recommend A because…”.

    As testers, does having our blood in the cup help our integrity or harm it?

  20. I found a principle this morning that I think very much applies to this situation, so much so that I wrote a whole post about it on my blog. Bagdikian’s Observation states:

    Trying to be a first-rate reporter on the average American newspaper is like trying to play Bach’s ‘St. Matthew’s Passion’ on a ukulele.

    I like what a lot of the comments have said about integrity. If we want to be great, first-rate testers, we have to have integrity and uncover the truth (be it about software or new hires). If we are stuck in a company with Ambrose’s dad that is acting like a ukulele (which actually means ‘jumping flea’ if anyone cares) then we need to either work with him to identify what he really expects from us, or move on to another company where we can act with integrity.

    In my situation, I have found that open communication with management about what testers can do is the best approach. Most people are just trying to do the best they can to reach their goals, and if we sit and talk about what the real goals are we can all reach them without compromising our integrity. A little education and clarification can go a long way.

