ISO 14971: a madman's criteria for acceptable risk

When auditing risk management files it can be a surprise to see a wide divergence in what companies deem to be acceptable risk. Some companies say a high severity event should be less than the proverbial one-in-a-million per year, while others say one event in 100,000 uses is OK. Put on the same time scale, these limits differ by roughly a factor of 1,000 - something an industry outsider would scratch their head trying to understand.

Perhaps at the heart of this discrepancy is our inability to measure risk, which in turn means we can't test the decision making system to see if it makes any sense. Experienced designers know that anytime we create a “system” (software, hardware, mechanical), as much as 90% of initial ideas fail in practice. These failures get weeded out by actually trying the idea out - set up a prototype, apply some inputs, look for the expected outputs, figure out what went wrong, feed the lessons back and try it all again until it works. We should be doing the same for the systems we use in risk management, but we can't: there are no reliable inputs. That allows many illogical concepts and outcomes to persist in everyday risk management.

But what if we ignore the problem on the measurement side, and try to use maths and logic to establish a reasonable, broadly acceptable probability for, say, death? Would such an analysis lean towards the 1 in 100,000 per use or the more conservative 1 in 1,000,000 per year?

It turns out that maybe neither figure is right - probably (pun intended) the rate should be more like 1 in 100,000,000 events per year. Shocked? Sounds crazy? Impossible? Safety gone mad? Read on to find out how a madman thinks.

Let’s start with the high end: “1 in 100,000 per use” - this looks reasonable from the point of view that 99.999% of patients will be fine. The raw probability of 0.001% is a tiny, tiny amount. And the patients are receiving significant benefit, so they should be prepared to wear a tiny bit of risk along the way. Every medical procedure involves some risk. 

Yes ... but ... no. It is seductive reasoning that falls apart with a bit of a bench test.

The first mistake is to consider the individual "risk" - that is, the individual sequence of events leading to harm - as a single case in isolation. From the patient's point of view that's irrelevant - what they perceive is the sum of all the risks in the device, together with all the risks from the many other devices that get used in the course of treatment. If every manufacturer used a limit of 1 in 100,000 per use for each hazardous situation associated with a high severity event, the cumulative risk would easily exceed what society would consider reasonable, and even what is commercially viable. If the figure of 1 in 100,000 per use per sequence were accurate, a typical hospital would be dealing with equipment-triggered high severity events on a daily basis.
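To put rough numbers on that claim, here is a minimal back-of-envelope sketch. Every figure in it (bed count, devices per bed, hazardous situations per device, uses per day) is an assumption chosen for illustration, not data from any particular hospital.

```python
# Back-of-envelope sketch: cumulative risk in a hypothetical large hospital.
# All figures below are illustrative assumptions, not measured data.
beds = 500                 # hypothetical large hospital
devices_per_bed = 10       # devices in use around each patient
sequences_per_device = 10  # high severity hazardous situations per device
uses_per_day = 2           # uses of each device, per bed, per day
p_per_use = 1e-5           # the "1 in 100,000 per use" criterion, per sequence

expected_events_per_day = (
    beds * devices_per_bed * sequences_per_device * uses_per_day * p_per_use
)
print(f"Expected high severity events per day: {expected_events_per_day:.1f}")
# -> 1.0, i.e. roughly one equipment-triggered high severity event every day
```

Even with fairly modest assumptions, the expected rate lands at around one high severity event per day in a single hospital.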

Some might still feel that 0.001% is nevertheless an extremely small number and struggle to see why it's fundamentally wrong. To help grasp the cumulative concept it can be useful to consider an equivalent situation - tax. The amount of tax an individual pays is so tiny they might argue: what's the point in paying? And, to be honest, it would not make any difference. It is a fraction of a fraction of a rounding error. Of course, we know that's wrong - a responsible person considers not their own contribution in isolation but the cumulative result if everyone took the same action. It’s the same deal with a criterion of 0.001% per use: for an individual line item in a risk management file it is genuinely tiny and plausibly acceptable, but if every manufacturer used the same figure the cumulative result would be unacceptable.

The second mistake manufacturers (and just about everyone) make is to consider the benefit - as a tangible quantity - in the justification for acceptable risk. A manufacturer might say, OK yeah, there is a tiny bit of residual risk, but hey, look over here at all this wonderful benefit we are providing! Again a seductive argument, but one that fails a plausibility test when thrown on the bench and given some light.

As detailed in a related article, benefit should not play any role in the criteria for acceptable risk: it’s a misdirection. Instead, our initial target should be to try and make all risks “negligible”. If, after this phase, significant risk remains and it is confirmed that further risk reduction is “not practicable”, we can turn to risk/benefit to justify releasing the device to market anyway. At this point the risk/benefit ratio might look important, but on close inspection the ratio turns out not to play any role in the decision: it’s just an end stage gate after all reasonable efforts in risk reduction have been applied. And in the real world, the benefit always far exceeds the risk, so the ratio itself is irrelevant.

So manufacturers often make two significant mistakes in determining acceptable risk: (1) failure to appreciate cumulative risk, and (2) using benefit to justify higher rates.

Before we take our pitchforks out we need to keep in mind a mitigating factor - the tendency to overestimate probabilities. A common mistake is to record the probability of a key event in the sequence rather than the overall probability of harm. On top of this, safety experts often encourage manufacturers to overestimate probabilities, such as the failure rates for electronics. And when things get complicated, we opt for simplified models, such as assuming that all faults in a system lead to harm, even though this is clearly not the case. These practices often lead to probability estimates 10, 100 or even 1000 times higher than are actually observed in practice.
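A simple sequence-of-events sketch shows how large that overestimate can be. The figures below are hypothetical; the point is only that the overall probability of harm is the product of the probabilities along the chain, so recording just one event in the chain can inflate the estimate by orders of magnitude.

```python
# Hypothetical sequence of events leading to harm (illustrative figures only).
p_fault = 1e-4               # probability per year that the component fails
p_exposure = 0.05            # probability a patient is exposed while the fault exists
p_harm_given_exposure = 0.1  # probability that exposure actually leads to harm

p_harm = p_fault * p_exposure * p_harm_given_exposure
print(f"Overall probability of harm: {p_harm:.1e} per year")                   # 5.0e-07
print(f"Overestimate if only the fault is recorded: {p_fault / p_harm:.0f}x")  # 200x
```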

So the overestimated probabilities and the overly lax criteria often cancel each other out. But not always: every now and then a situation occurs where conflicts of interest (cost, competition, complexity, complacency … ) push manufacturers genuinely into a higher probability zone, which is unreasonable given that risk reduction is still feasible. The absence of good criteria then allows the decision to be deemed “acceptable”. So keep the pitchforks on hand just in case.

In summary, the correct approach is first to try and make risks “negligible”, against criteria that take into account the cumulative risk to the patient (operator, environment). If the residual risk is still significant, and further risk reduction is not practicable, we can use the risk/benefit ratio to justify marketing the device anyway.

What, then, is "negligible" for death? Surely 1 in 1,000,000 per year is more than enough? Why would a madman suggest 1 in 100,000,000?

Before delving into this question, there’s one more complication to address: direct and indirect harm. Historically, safety has been related to direct harm - from sources such as electric shock, mechanical movement, thermal hazards, high energy, flow or radiation. This was even included in the definition of safety in the 1988 edition of IEC 60601-1. One of the quiet changes in the 2005 edition was to adopt the broader definition of safety from ISO 14971, which does not refer to direct or indirect, just “harm”. This change makes sense, as indirect harms such as failure to diagnose or treat are also valid concerns for society.

One problem though: acceptable risk for indirect harm is vastly more complex. This type of harm generally involves a large number of factors external to the medical device, including pre-existing illness, decisions by healthcare professionals, treatment parameters, other medical devices, drugs and patient action. The cumulative logic above remains sound, but it becomes incredibly messy to extract a figure for, say, an appropriate failure rate for the parts of a particular medical device that are associated with diagnosis and treatment.

This article deals with a far simpler situation - say an infant incubator where the temperature control system goes crazy and kills the patient - which boils down to a simpler question: what is an acceptable probability of death for events which are 100% caused by the equipment?

It turns out that for high severity direct harm from electrical devices - electric shock, burn, fire, mechanical - the actual rates of death per device, per year, per situation are well below 1 in 100,000,000. Manufacturers (and regulators, standards writers, test agencies) are doing a pretty good job. And closer study of the events that do occur finds that few are due to random failure, but rather to illegal imports that never met standards in the first place, devices used far (far) beyond their designed lifetime, or devices used or modified far outside the intended purpose. In any case, evidence indicates that the 1 in 100,000,000 per year figure, while perhaps crazy, is absolutely achievable.

You can also turn the figures around and estimate the cumulative number of incidents if the proverbial one-in-a-million were the true rate. And it's not good news. For example, in the USA there are 350 million people; assume 20 electrical devices per person, and each device has 10 high severity hazardous situations (for shock, fire, mechanical, thermal). At 1 in 1,000,000 per situation per year, that adds up to 70,000 deaths per year - just for electrical devices - far higher than society would consider reasonable if cost effective risk controls are available. Which, based on the rates observed in practice, they obviously are.
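For those who like to check the arithmetic, the estimate above reduces to a one-line calculation:

```python
# Reproducing the back-of-envelope estimate from the paragraph above.
population = 350_000_000     # USA
devices_per_person = 20      # assumed electrical devices per person
situations_per_device = 10   # assumed high severity hazardous situations per device
p_per_year = 1e-6            # the proverbial one-in-a-million per year

deaths_per_year = population * devices_per_person * situations_per_device * p_per_year
print(f"{deaths_per_year:,.0f} deaths per year")  # 70,000
```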

So in general a target of 1 in 100,000,000 per year for death might not be such a crazy point of view after all.  

But to be honest, the precise targets are probably irrelevant - whether it is 1 in 1,000,000 or 100,000,000, the numbers are far too small to measure or control. It's great if we can get 1 in 100,000,000 in practice, but that seems to be more by luck than controlled design. 

Or is it?

One of the magical yet hidden aspects of statistics is how easily infinitesimally small probabilities can be created without much effort. All you need is a pack of cards or a handful of dice to demonstrate how this is done. Shuffle a deck of cards and you can be confident that no one else in the history of mankind has ever, or will ever, order the pack in the same way. The number of combinations is just too big - a staggering 80,658,175,170,943,878,571,660,636,856,403,766,975,289,505,440,883,277,824,000,000,000,000 - yet there you are, holding it in your hand.
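If you want to verify that staggering number, it is simply 52 factorial - the number of distinct orderings of a 52 card deck:

```python
import math

# 52! - the number of distinct orderings of a shuffled 52 card deck.
print(math.factorial(52))
# 80658175170943878571660636856403766975289505440883277824000000000000
```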

In engineering it could be called the "sigma effect": the number of standard deviations we are away from the point of 50% failure. For a typical medium complexity device you need to be working around 5 sigma for individual parts to make the overall design commercially viable. Moving up a couple of notches to 7 sigma usually requires little in the way of resources, but failure rates drop to fantastically small values. By 8 sigma, Microsoft's Excel gets heartburn even trying to calculate the failure rates, yet 8 sigma is easily obtained and often used in practical design. Of course, nobody actually measures the sigma directly - rather it is built into the margins of the design: using a 100mW resistor when the actual dissipation is 15mW, or a 5V±3% regulator for a microprocessor that needs 5V±10%. Good engineers roughly know where the “knee point” is (the point at which things start to cause trouble), and then use a good margin that puts the design well into the 7+ sigma region.
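As a rough illustration of how fast the numbers collapse, the sketch below computes the one-sided normal tail probability at 5, 7 and 8 standard deviations. Treating the "sigma effect" as a simple normal tail is an assumption made here for illustration; real design margins map to sigma only loosely.

```python
import math

# One-sided normal tail probability at k standard deviations from the mean -
# a rough proxy for the "sigma effect" described above (illustrative only).
def tail_probability(k: float) -> float:
    return 0.5 * math.erfc(k / math.sqrt(2))

for k in (5, 7, 8):
    print(f"{k} sigma: {tail_probability(k):.2e}")
# 5 sigma: 2.87e-07
# 7 sigma: 1.28e-12
# 8 sigma: 6.22e-16  (already brushing against double precision limits)
```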

In complex systems the large number of discrete parts can bring the system failure rate back into the realm of reality again, despite good design. Even so, negligible probabilities as appropriate to high severity events (e.g. death) can still be readily achieved by using redundant systems (independent protection) and other strategies.
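A minimal sketch of the redundancy arithmetic, again with hypothetical figures: if a primary control and an independent protection system each sit at a fairly ordinary failure probability, the combined probability of the high severity event is already down in the negligible zone.

```python
# Illustrative redundancy arithmetic (hypothetical figures, independence assumed).
p_primary_per_year = 1e-4   # primary control fails dangerously, per year
p_backup_on_demand = 1e-4   # independent protection fails to act when called upon

p_event_per_year = p_primary_per_year * p_backup_on_demand
print(f"Probability of the high severity event: {p_event_per_year:.0e} per year")  # 1e-08
```

The catch is the independence assumption: the arithmetic only holds if the two systems genuinely fail independently.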

Overall, engineers easily achieve these rates every day, as evidenced by the low rates of serious events recorded in practice.

But there is a catch: the biggest headache designers face is the whack-a-mole phenomenon: try to fix problem A and problem B occurs. Fix B and C pops up. You could fix C, but problem A partly sticks its head up again. The designer then has to find a solution that minimises the various issues, trading off parameters in a spirit of compromise. In those situations, obtaining 7+ sigma can be impossible.

Non-medical devices have both “whack a mole” problems and high risk issues, but it’s rare that they are related. So designers usually have a clear separation of thinking: for the functional side, compromise when needed; but for the high risk stuff, don’t mess around - always go for the 7+ sigma solution.

In contrast, the “whack a mole” issues for medical devices are often related to the high risk functions, requiring compromise. As such, it’s easy to get mixed up and assume that, having been forced to accept a “low sigma” solution in one particular situation, we can accept low sigma solutions elsewhere. In other words, compromise in one area easily bleeds into all decisions, ultimately leading to an organisational culture of justifying every decision on risk/benefit.

That’s the misdirection of risk/benefit raising its head again: the cart before the horse. We can justify low sigma based on risk/benefit only if there are no other feasible solutions around. And remember - feasible solutions are not always expensive: experience working with manufacturers of high risk devices frequently found that, with a little push, designers were able to create simple low cost solutions that achieve 7+ sigma without breaking the bank. The only thing holding them back was the false assumption that a low sigma solution was acceptable in the first place.

For designers of high risk active medical devices, it takes good training and constant reminders to look for 7+ sigma solutions in the first instance, and only accept less when it’s “not practical” to do otherwise.