The risk benefit misdirection

Peter Selvey, October 2018

While auditing a hearing aid company several years ago, I started to get a little concerned about a device with a peak output of 140dB: surely that sound level was moving into a seriously dangerous region? As a first defence, the manufacturer simply said “that’s what the doctors are asking for”, itself an interesting side question of who should take responsibility in such cases. But the case was also an interesting study of the whole concept of risk and benefit: the manufacturer eventually provided a clinical study showing that around 6% of users suffered hearing loss even from normal 120dB type hearing aids – in other words, the medical device that was meant to help the patient was actually causing a significant amount of injury.

Mounting a further defence, the manufacturer quickly turned to other literature showing that without the hearing aid, the brain reallocates the unused neural networks to other tasks – leaving the patient effectively deaf. So those 6% of patients would have ended up deaf anyway, while the other 94% were happy. The benefits far outweighed the risks, making the overall situation acceptable.

Highlighting benefit to justify risks is an obvious defence for those who provide devices, services, goods or facilities. Google the phrase “balancing risk and benefit” and you will receive “about 170,000,000” hits (a slightly suspicious figure, given that’s one article for every 45 people on the planet, but it nevertheless indicates the phrase is rather popular).

An obvious implication of using benefit to justify risk is that the higher the benefit, the higher the risk that we can tolerate.

Medical devices provide significant benefit. It follows then that, for example, we can accept significantly higher risks for a dialysis machine than, say, a toaster.

Another obvious conclusion: since medical devices vary greatly in the benefits they provide, it follows that the criteria for acceptable risk should also vary greatly with each type of medical device. The risk we can accept from a tongue depressor is vastly different from the risk we can accept from a life-saving, cancer-killing gamma-ray therapy machine.

Or not.

The logic in paragraphs 3, 4 and 5 above is at best misleading, and the logic in 2 and 6 is utterly wrong. Incorrect. Mistaken. Erroneous. Not stupid, mind you … because we have all been seduced at some stage by the argument of balancing risk and benefit. But definitely barking up the wrong tree.

Some 20 years ago, while preparing material for internal risk management training, I decided to analyse my wallet. My target was to find a mathematical model using risk and benefit that could derive the optimal amount of cash to withdraw from the ATM. Withdraw too much (say, $1,000), and the risks of a lost or stolen wallet would be excessive. Withdraw too little (say $10) and the lack of benefits of having cash to pay for purchases would become apparent. It was a good case study since it was easy to quantify risk in dollars, meaning it should be possible to build a mathematical model and find a specific value. It was just a matter of figuring out the underlying math.

It turned out to be a seminal exercise in understanding risk. Among several light bulb moments was the realisation that the optimal point had nothing to do with the benefit. The mantra of “balancing risk and benefit” (or mathematically “Risk = Benefit”) was so deeply embedded in my skull that it took some effort to prise it loose. Eventually I realised the target is to find the point of “maximum net benefit”, a value that takes into account three parameters: benefit, risks and costs. The benefit, though, was constant in the region of interest, so ultimately it played no role in the decision. That meant the optimal value was simply a matter of minimising risks and costs.

From there the equations tumbled out like a dam released: risks (lost or stolen wallet) and costs (withdrawal fees, my time) could both be modelled as a function of the cash withdrawn. Solving the equations finds the point at which risks and costs are minimised (mathematically, the point where the derivative of the sum is zero; surprisingly, this needed quadratic equations!). I don’t remember the exact result – it was something like $350. But I do remember the optimal point had nothing to do with benefit.
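A minimal sketch of that kind of wallet model shows why the benefit drops out. The spending, fee and loss figures below are made-up assumptions, and the cost and risk functions are just one plausible way to model them; the point is that the benefit of having cash to spend is the same for any workable amount, so only the cost and risk terms depend on the withdrawal.

```python
# Sketch of the wallet model (all figures are hypothetical assumptions).
# Expected yearly burden = withdrawal costs + expected loss from a lost/stolen
# wallet. The benefit (having cash to spend) never appears.
from math import sqrt

S = 6_000.0   # assumed cash spending per year ($)
f = 1.00      # assumed cost per withdrawal: ATM fee plus time ($)
p = 0.10      # assumed probability of a lost or stolen wallet per year

def annual_cost(w: float) -> float:
    """Expected yearly cost + risk for a policy of withdrawing $w at a time."""
    withdrawals_per_year = S / w    # trips to the ATM
    average_cash_held = w / 2       # roughly half the withdrawal sits in the wallet
    return f * withdrawals_per_year + p * average_cash_held

# Setting the derivative -f*S/w**2 + p/2 to zero gives the optimum (a quadratic in w):
w_opt = sqrt(2 * f * S / p)
print(f"optimal withdrawal ≈ ${w_opt:.0f}, yearly cost + risk ≈ ${annual_cost(w_opt):.2f}")
```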

Mathematics aside, it’s fairly easy to generate case studies showing that using benefit to justify risk doesn’t make sense. It’s not rocket science. Consider for example the hearing aid case: yes, the benefits far exceed the risk. But what if there were a simple, low cost software solution that reduced the injury rate from 6% to 1%? Say a feature that monitors the accumulated daily exposure and reduces the output accordingly, or minimises hearing aid feedback events (that noisy ringing, which if pumped out at 140dB would surely be destructive). If such software existed, and added only $0.05 to the cost of each unit, obviously it makes sense to use it. But how do you arrive at that decision using risk and benefit?
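To make the hypothetical concrete, here is one way such a dose limiter could look. It borrows the occupational noise convention that allowable exposure time halves for every 3 dB above a reference level; the reference level, exchange rate, update interval and gain-reduction policy are all assumptions for illustration, not anything from a real hearing aid.

```python
# Hypothetical sketch of a daily sound-dose limiter for a hearing aid.
# Borrows the occupational-noise convention: 8 hours at the reference level is
# a full daily dose, and the allowable time halves for every +3 dB.
# Reference level, exchange rate and gain policy are illustrative only.

REFERENCE_DB = 85.0       # assumed level giving a full dose after 8 hours
REFERENCE_HOURS = 8.0
EXCHANGE_RATE_DB = 3.0    # +3 dB halves the allowable exposure time

class DoseLimiter:
    def __init__(self) -> None:
        self.dose = 0.0   # fraction of the daily allowance consumed (1.0 = 100%)

    def update(self, output_db: float, seconds: float) -> float:
        """Accumulate exposure for an interval at output_db; return a gain trim in dB."""
        allowed_hours = REFERENCE_HOURS * 2 ** (-(output_db - REFERENCE_DB) / EXCHANGE_RATE_DB)
        self.dose += (seconds / 3600.0) / allowed_hours
        if self.dose <= 1.0:
            return 0.0                               # within the daily allowance
        return -min(20.0, 6.0 * (self.dose - 1.0))   # assumed policy: up to -20 dB trim

limiter = DoseLimiter()
for hour in range(1, 9):
    trim = limiter.update(output_db=95.0, seconds=3600)
    print(f"hour {hour}: dose = {limiter.dose:.2f}, gain trim = {trim:.1f} dB")
```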

You can’t. There is no logical reason to accept risk because of the benefit a device provides, whether a toaster, hearing aid, tongue depressor or dialysis machine. Instead, the analysis should focus on whether further risk reduction is practical. If risk reduction is not practical, we can use risk/benefit to justify going ahead anyway. But the literal decision on whether to reduce risk is not based on the risk/benefit ratio; rather the decision is based purely on whether risk reduction is “practical”. Risk/benefit is like a gate which is applied at the end of the process, after all reasonable efforts to minimise risk have been taken.

So the hearing aid manufacturer was wrong to point to risk/benefit as a first justification. They should have pointed to solutions A, B and C which they were already implementing, and perhaps X, Y and Z which they considered but rejected as impractical.

The authors of ISO 14971 obviously knew this, and careful inspection of the standard shows that the word “benefit” appears only twice in the normative text, and each time it is preceded by a qualifier that further risk reduction was “not practicable”.

The standard, though, is a little cheeky: no records are required to document why risk reduction was “not practicable”. This makes it easy for manufacturers to overlook this part and jump directly to risk/benefit, as our hearing aid manufacturer did.

Beyond this, another key reason why manufacturers (and to some extent regulators) might want to skip over documenting the “not practicable” part is that it’s an ethical minefield. Reasons for not taking action can include:

  • the absence of suitable technology

  • risk controls that increase other risks or reduce the benefit

  • high cost of risk control

  • competitive pressure

The latter two items (cost, competition) are valid concerns: the healthcare budget is not unlimited, and making your product more expensive than the rest does not reduce the risk if nobody buys it. Valid as they are, though, cost and competition are tricky things to write up in a regulatory file as justification for taking no action. ISO 14971 seems to have quietly (deliberately?) sidestepped the issue by not requiring any records.

Even so, skipping over the “not practicable” part and jumping straight to risk/benefit can lead to reasonable risk controls being overlooked. That not only places patients at risk, it also exposes the manufacturer to claims of negligence when things go wrong. The risk/benefit argument might work for the media, but end up in court and the criterion will be whether or not reasonable action was taken.

There is one more aspect to this story: even before we get to the stage of deciding whether further risk reduction is “not practicable”, there is an earlier point in the process – determining whether risk reduction is necessary in the first place. For this we need to establish the “criteria for acceptable risk” used in the main part of risk management.

Thanks to the prevalence of risk/benefit, there remains a deeply held belief that the criteria should vary significantly between devices. But as the discussion above shows, the criteria should be developed independently of any benefit, and as such should be fairly consistent from device to device for the same types of harm.

Instead, common sense (supported by literature) suggests that the initial target should be to make the risk “negligible”. What counts as “negligible” is of course debatable, and a separate article looks into a possible theory justifying fairly low rates, such as the classic “one in a million” events per year for death. Aside from death, medical devices vary greatly in the types of harm that can occur, so we should expect to see criteria tuned for each particular device – tuned not to the benefit, but to the different types of harm.

At least for death, we can find evidence in standards that the “negligible” criterion is well respected, irrespective of the benefits.

Dialysis machines, for example, arguably provide huge benefit: extending the patient’s life by several years. Despite this benefit, designers still work towards a negligible criterion for death wherever possible. Close attention is given to the design of independent control and protection systems for temperature, flow, pressure, air and dialysate, using double sensors, two CPUs and double switching, and even periodic start-up testing of the protection systems. The effective failure rates are far below the 1/1,000,000 events per year criterion normally applied for death.
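The arithmetic behind that kind of architecture shows why such low rates are reachable. A rough sketch, with failure rates and a test interval that are assumptions rather than figures from any real machine: if harm requires the control channel to fail while an independent protection channel is already sitting in a failed, undetected state, the effective rate is roughly the product of the two failure rates and the average time a protection failure goes unnoticed.

```python
# Rough model of an independent control + protection architecture.
# All numbers are illustrative assumptions, not figures from a real dialysis machine.
# Harm requires the control channel to fail while the protection channel is
# already failed and not yet caught by the start-up self-test.

control_failures_per_year = 0.01     # assumed dangerous failure rate, control channel
protection_failures_per_year = 0.01  # assumed dangerous failure rate, protection channel
startup_tests_per_year = 365         # assumed: protection self-test at each daily start-up

# A protection failure sits undetected for, on average, half the test interval.
mean_undetected_time_years = 0.5 / startup_tests_per_year

# Probability the protection channel is down at any given moment (approximation).
prob_protection_down = protection_failures_per_year * mean_undetected_time_years

# Rate of the hazardous situation: control fails while protection is down.
hazard_rate_per_year = control_failures_per_year * prob_protection_down
print(f"≈ {hazard_rate_per_year:.1e} events per device-year")  # well below 1e-6 with these inputs
```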

Other high risk devices such as infant incubators, infusion pumps and surgical lasers follow the same criteria: making the risk of death from device failure negligible wherever possible.

Ironically, it is often the makers of medium risk devices (such as our hearing aid manufacturer) that are most likely to misunderstand the role of risk/benefit.

Regardless, the next time you hear someone reach for risk/benefit as a justification: tell them to stop, take a step back and explain how they concluded that further risk reduction was “not practicable”. ISO 14971 may not require any records, but it’s still a requirement.  

ISO 14971: a madman's criteria for acceptable risk

When auditing risk management files it can be a surprise to see a wide divergence in what companies deem to be acceptable risk. Some companies say a high severity event should be less frequent than the proverbial one-in-a-million per year, while others say one event in 100,000 uses is OK. Put on the same time scale, these limits differ by roughly a factor of 1000 - something an industry outsider would scratch their head trying to understand.

Perhaps at the heart of this discrepancy is our inability to measure risk, which in turn means we can't test the decision-making system to see if it makes any sense. Experienced designers know that any time we create a “system” (software, hardware, mechanical), as much as 90% of initial ideas fail in practice. These failures get weeded out by actually trying the idea out - set up a prototype, apply some inputs, look for the expected outputs, figure out what went wrong, feed the results back and try it all again until it works. We should be doing the same for the systems we use in risk management, but we can't: there are no reliable inputs. That allows many illogical concepts and outcomes to persist in everyday risk management.

But what if we ignored the problem on the measurement side, and tried to use maths and logic to establish a reasonable, broadly acceptable probability for, say, death? Would such an analysis lean towards the 1 in 100,000 per use, or the more conservative 1 in 1,000,000 per year?

It turns out that maybe neither figure is right - probably (pun intended)  the rate should be more like 1 in 100,000,000 events per year. Shocked? Sounds crazy? Impossible? Safety gone mad? Read on to find out how a madman thinks. 

Let’s start with the high end: “1 in 100,000 per use” - this looks reasonable from the point of view that 99.999% of patients will be fine. The raw probability of 0.001% is a tiny, tiny amount. And the patients are receiving significant benefit, so they should be prepared to wear a tiny bit of risk along the way. Every medical procedure involves some risk. 

Yes ... but ... no. It is seductive reasoning that falls apart with a bit of a bench test.

The first mistake is to consider the individual "risk" - that is, the individual sequence of events leading to harm - as a single case in isolation. From the patient's point of view that's irrelevant - what they perceive is the sum of all the risks in the device, together with all the risks from the many other devices used in the course of treatment. If every manufacturer used a limit of 1 in 100,000 per use for each hazardous situation associated with a high severity event, the cumulative risk would easily exceed what society would consider reasonable, and even what is commercially viable. If the figure of 1 in 100,000 per use per sequence were accurate, a typical hospital would be dealing with equipment-triggered high severity events on a daily basis.
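That last claim is easy to sanity-check with a back-of-envelope calculation. The device and usage counts below are rough assumptions, chosen only to show the scale:

```python
# Back-of-envelope check: what 1-in-100,000 per use, per hazardous situation,
# would mean for a single large hospital. The counts are rough assumptions.

p_per_use_per_situation = 1e-5   # the criterion under test
situations_per_device = 10       # assumed high-severity hazardous situations per device
device_uses_per_day = 10_000     # assumed device uses per day across a large hospital

events_per_day = p_per_use_per_situation * situations_per_device * device_uses_per_day
print(f"expected high-severity events: {events_per_day:.1f} per day, "
      f"about {events_per_day * 365:.0f} per year")
```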

Some might still feel that 0.001% is nevertheless an extremely small number and struggle to see why it's fundamentally wrong. To help grasp the cumulative concept it can be useful to consider an equivalent situation - tax. The amount of tax an individual pays is so tiny they might argue: what's the point in paying? And, to be honest, it would not make any difference - it is a fraction of a fraction of a rounding error. Of course, we know that's wrong: a responsible person looks not at their own contribution but at the cumulative result, assuming everyone took the same action. It’s the same deal with a criterion of 0.001% per use: for an individual line item in a risk management file it is genuinely tiny and plausibly acceptable, but if every manufacturer used the same figure the cumulative result would be unacceptable.

The second mistake manufacturers (and just about everyone else) make is to consider the benefit - as a tangible quantity - in the justification for acceptable risk. A manufacturer might say: OK, yes, there is a tiny bit of residual risk, but hey, look over here at all this wonderful benefit we are providing! Again a seductive argument, but one that fails the plausibility test when thrown on the bench and given some light.

As detailed in a related article, benefit should not play any role in the criteria for acceptable risk: it’s a misdirection. Instead, our initial target should be to try and make all risks “negligible”. If, after this phase, significant risk remains and it is confirmed that further risk reduction is “not practicable”, we can turn to risk/benefit to justify releasing the device to market anyway. At this point the risk/benefit ratio might look important, but on close inspection the ratio turns out not to play any role in the decision: it’s just an end stage gate after all reasonable efforts in risk reduction have been applied. And in the real world, the benefit always far exceeds the risk, so the ratio itself is irrelevant.

So manufacturers often make two significant mistakes in determining acceptable risk: (1) failing to appreciate cumulative risk, and (2) using benefit to justify higher rates.

Before we take our pitchforks out we need to keep in mind a mitigating factor - the tendency to overestimate probabilities. A common mistake is to record the probability of a key event in the sequence rather than the overall probability of harm. On top of this, safety experts often encourage manufacturers to overestimate probabilities, such as the failure rates for electronics. And when things get complicated, we opt for simplified models, such as assuming that all faults in a system lead to harm, even though this is clearly not the case. These practices often lead to probability estimates 10, 100 or even 1000 times higher than are actually observed in practice.
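To see how large that overestimate can be, it helps to write out the chain from fault to harm. The probabilities below are made up for illustration; the point is simply that recording the first event in the sequence, rather than the product of the whole chain, inflates the estimate by orders of magnitude.

```python
# Illustration of over-estimating risk by recording a key event in the sequence
# rather than the overall probability of harm. All probabilities are made up.

p_fault = 1e-3                 # the fault occurs (per year)
p_hazard_given_fault = 0.1     # the fault actually creates a hazardous situation
p_exposure = 0.05              # someone is exposed while the hazardous situation exists
p_harm_given_exposure = 0.2    # the exposure actually results in the harm

p_harm = p_fault * p_hazard_given_fault * p_exposure * p_harm_given_exposure
print(f"recorded as 'probability of harm': {p_fault:.0e}")
print(f"actual probability of harm:        {p_harm:.0e}")
print(f"overestimate factor:               {p_fault / p_harm:.0f}x")
```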

So the two mistakes often cancel each other out. But not always: every now and then a situation occurs where conflicts of interest (cost, competition, complexity, complacency … ) can push manufacturers genuinely into a higher probability zone which is unreasonable given that risk reduction is still feasible. The absence of good criteria then allows the decision to be deemed “acceptable”. So keep the pitchforks on hand just in case. 

In summary, the correct approach is first to try to make risks “negligible”, against criteria that take into account the cumulative risk to the patient (and operator and environment). If the residual risk is still significant, and further risk reduction is not practical, we can use the risk/benefit ratio to justify marketing the device anyway.

What, then, is "negligible" for death? Surely 1 in 1,000,000 per year is more than enough? Why would a madman suggest 1 in 100,000,000?

Before delving into this question, there’s one more complication to address: direct and indirect harm. Historically, safety has been related to direct harm - from sources such as electric shock, mechanical movement, thermal, high energy, flow or radiation. This was even reflected in the definition of safety in the 1988 edition of IEC 60601-1. One of the quiet changes in the 2005 edition was to adopt the broader definition of safety from ISO 14971, which does not refer to direct or indirect, just “harm”. This change makes sense, as indirect harm such as failure to diagnose or treat is also a valid concern for society.

One problem though: acceptable risk for indirect harm is vastly more complex. This type of harm generally involves a large number of factors external to the medical device, including pre-existing illness, decisions by healthcare professionals, treatment parameters, other medical devices, drugs and patient actions. The cumulative logic above is still sound, but it is incredibly messy to extract from it a figure for, say, an appropriate failure rate for the parts of a particular medical device associated with diagnosis and treatment.

This article deals with a far simpler situation - say, an infant incubator where the temperature control system goes crazy and kills the patient - which boils down to a simpler question: what is an acceptable probability of death for events which are 100% caused by the equipment?

It turns out that for high severity direct harm from electrical devices - electric shock, burn, fire, mechanical - the actual rates of death per device, per year, per situation are well below 1 in 100,000,000. Manufacturers (and regulators, standards writers, test agencies) are doing a pretty good job. And closer study of the events that do occur finds that few are due to random failure; most involve illegal imports that never met standards in the first place, devices used far (far) beyond their designed lifetime, or devices used or modified far outside the intended purpose. In any case, the evidence indicates that the 1 in 100,000,000 per year figure, while perhaps crazy, is absolutely achievable.

You can also turn the figures around and estimate the cumulative number of incidents if the proverbial one-in-a-million were the true rate. And it's not good news. In the USA, for example, there are 350 million people; assume 20 electrical devices per person, and that each device has 10 high severity hazardous situations (for shock, fire, mechanical, thermal). That adds up to 70,000 deaths per year - just for electrical devices - far higher than society would consider reasonable if cost effective risk controls are available. Which they obviously are, based on the rates observed in practice.
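The arithmetic behind that estimate, for anyone who wants to check it, is just a straight multiplication of the figures above:

```python
# Cumulative estimate if 1-in-1,000,000 per year, per hazardous situation,
# were the true rate (figures taken from the example above).

population = 350_000_000
devices_per_person = 20
high_severity_situations_per_device = 10
rate_per_situation_per_year = 1e-6

deaths_per_year = (population * devices_per_person
                   * high_severity_situations_per_device * rate_per_situation_per_year)
print(f"{deaths_per_year:,.0f} deaths per year")   # 70,000
```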

So in general a target of 1 in 100,000,000 per year for death might not be such a crazy point of view after all.  

But to be honest, the precise targets are probably irrelevant - whether it is 1 in 1,000,000 or 100,000,000, the numbers are far too small to measure or control. It's great if we can get 1 in 100,000,000 in practice, but that seems to be more by luck than controlled design. 

Or is it?

One of the magical yet hidden aspects of statistics is how easily infinitesimally small probabilities can be created without much effort. All you need is a pack of cards or a handful of dice to demonstrate how this is done. Shuffle a deck of cards and you can be confident that no one else in the history of mankind has ever ordered, or ever will order, the pack in the same way. The number of combinations is just too big - a staggering 80,658,175,170,943,878,571,660,636,856,403,766,975,289,505,440,883,277,824,000,000,000,000 - yet there you are, holding it in your hand.
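That staggering figure is simply 52 factorial, the number of distinct orderings of a 52-card deck, which anyone can verify in a couple of lines:

```python
# The number of ways to order a standard 52-card deck: 52!
from math import factorial

orderings = factorial(52)
print(orderings)             # the 68-digit number quoted above
print(f"≈ {orderings:.2e}")  # ≈ 8.07e+67
```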

In engineering it could be called the “sigma effect”: the number of standard deviations we are away from the point of 50% failure. For a typical medium complexity device you need to be working around 5 sigma for individual parts to make the overall design commercially viable. Moving up a couple of notches to 7 sigma usually requires little in the way of resources, but failure rates drop to fantastically small values. By 8 sigma, Microsoft’s Excel gets heartburn even trying to calculate the failure rates, yet 8 sigma is easily obtained and often used in practical design. Of course, nobody actually measures the sigma directly - rather it is built into the margins of the design: using a 100mW resistor when the actual dissipation is 15mW, or using a 5V±3% regulator for a microprocessor that needs 5V±10%. Good engineers roughly know where the “knee point” is (the point at which things start to cause trouble), and then use a margin that puts the design well into the 7+ sigma region.
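For a sense of the numbers involved, the one-sided normal tail probability at k standard deviations is 0.5·erfc(k/√2), which the Python standard library can evaluate directly (no scientific packages needed):

```python
# One-sided tail probability of a normal distribution at k standard deviations:
# P(X > k*sigma) = 0.5 * erfc(k / sqrt(2))
from math import erfc, sqrt

for k in (3, 5, 7, 8):
    tail = 0.5 * erfc(k / sqrt(2))
    print(f"{k} sigma: {tail:.1e}")   # roughly 1.3e-3, 2.9e-7, 1.3e-12, 6.2e-16
```

At 7 sigma the failure rate is already in the parts-per-trillion range, which is why a healthy design margin buys such fantastically small probabilities.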

In complex systems the large number of discrete parts can bring the system failure rate back into the realm of reality again, despite good design. Even so, negligible probabilities as appropriate to high severity events (e.g. death) can still be readily achieved by using redundant systems (independent protection) and other strategies.

Overall, engineers achieve these rates every day, as evidenced by the low rates of serious events recorded in practice.

But there is a catch: the biggest headache designers face is the whack-a-mole phenomenon: try to fix problem A and problem B occurs. Fix B and C pops up. You could fix C, but then problem A partly sticks its head up again. The designer then has to find a solution that minimises the various issues, trading off parameters in a spirit of compromise. In those situations, obtaining 7+ sigma can be impossible.

Non-medical devices have both “whack a mole” problems and high risk issues, but it’s rare that the two are related. So designers usually have a clear separation of thinking: on the functional side, compromise when needed; for the high risk stuff, don’t mess around - always go for the 7+ sigma solution.

In contrast, the “whack a mole” issues in medical devices are often tied to the high risk functions, forcing compromise. As such it’s easy to get mixed up and assume that, having been forced to accept a “low sigma” solution in one particular situation, we can accept low sigma solutions elsewhere. In other words, compromise in one area easily bleeds into all decisions, ultimately leading to an organisational culture of justifying every decision on risk/benefit.

That’s the misdirection of risk/benefit raising its head again: the cart before the horse. We can justify a low sigma solution on risk/benefit grounds only if there are no other feasible solutions around. And remember, feasible solutions are not always expensive - experience working with manufacturers of high risk devices has repeatedly shown that, with a little push, designers can create simple, low cost solutions that achieve 7+ sigma without breaking the bank. The only thing holding them back was the false assumption that a low sigma solution was acceptable in the first place.

For designers of high risk active medical devices, it takes good training and constant reminders to look for 7+ sigma solutions in the first instance, and only accept less when it’s “not practical” to do otherwise.