Knowing what you do not Know

Measure twice cut once

As readers will have noticed, my wife and I have spent a lot of time talking to medical practitioners in recent months. The same readers will also know that my wife is a Structural Biologist, whose work I have featured before in Data Visualisation – A Scientific Treatment [1]. Some of our previous medical interactions had led to me thinking about the nexus between medical science and statistics [2]. More recently, my wife had a discussion with a doctor which brought to mind some of her own previous scientific work. Her observations about the connections between these two areas have formed the genesis of this article. While the origins of this piece are in science and medicine, I think that the learnings have broader applicability.

So the general context is a medical test, the result of which was my wife being told that all was well [3]. Given that humans are complicated systems (to say the very least), my wife was less than convinced that just because reading X was OK it meant that everything else was also necessarily OK. She contrasted the approach of the physician with something from her own experience and in particular one of the experiments that formed part of her PhD thesis. I’m going to try to share the central point she was making with you without going into all of the scientific details [4]. However, to do this I need to provide at least some high-level background.

Structural Biology is broadly the study of the structure of large biological molecules, which mostly means proteins and protein assemblies. What is important is not the chemical make up of these molecules (how many carbon, hydrogen, oxygen, nitrogen and other atoms they consist of), but how these atoms are arranged to create three dimensional structures. An example of this appears below:

The 3D structure of a bacterial Ribosome

This image is of a bacterial Ribosome. Ribosomes are miniature machines which assemble amino acids into proteins as part of the chain which converts information held in DNA into useful molecules [5]. Ribosomes are themselves made up of a number of different proteins as well as RNA.

In order to determine the structure of a given protein, it is necessary to first isolate it in sufficient quantity (i.e. to purify it) and then subject it to some form of analysis, for example X-ray crystallography, electron microscopy or a variety of other biophysical techniques. Depending on the analytical procedure adopted, further work may be required, such as growing crystals of the protein. Something that is generally very important in this process is to increase the stability of the protein that is being investigated [6]. The type of protein that my wife was studying [7] is particularly unstable as its natural home is as part of the wall of cells – removed from this supporting structure these types of proteins quickly degrade.

So one of my wife’s tasks was to better stabilise her target protein. This can be done in a number of ways [8] and I won’t get into the technicalities. After one such attempt, my wife looked to see whether her work had been successful. In her case the relative stability of her protein before and after modification is determined by a test called a Thermostability Assay.

Sigmoidal Dose Response Curve A
© University of Cambridge – reproduced under a Creative Commons 2.0 licence

In the image above, you can see the combined results of several such assays carried out on both the unmodified and modified protein. Results for the unmodified protein are shown as a green line [9] and those for the modified protein as a blue line [10]. The fact that the blue line (and more particularly the section which rapidly slopes down from the higher values to the lower ones) is to the right of the green one indicates that the modification has been successful in increasing thermostability.

So my wife had done a great job – right? Well things were not so simple as they might first seem. There are two different protocols relating to how to carry out this thermostability assay. These basically involve doing some of the required steps in a different order. So if the steps are A, B, C and D, then protocol #1 consists of A ↦ B ↦ C ↦ D and protocol #2 consists of A ↦ C ↦ B ↦ D. My wife was thorough enough to also use this second protocol with the results shown below:

Sigmoidal Dose Response Curve B
© University of Cambridge – reproduced under a Creative Commons 2.0 licence

Here we have the opposite finding: the same modification to the protein now seems to have decreased its stability. There are some good reasons why this type of discrepancy might have occurred [11], but overall my wife could not conclude that this attempt to increase stability had been successful. This sort of thing happens all the time and she moved on to the next idea. This is all part of the rather messy process of conducting science [12].

I’ll let my wife explain her perspective on these results in her own words:

In general you can’t explain everything about a complex biological system with one set of data or the results of one test. It will seldom be the whole picture. Protocol #1 for the thermostability assay was the gold standard in my lab before the results I obtained above. Now protocol #1 is used in combination with another type of assay whose efficacy I also explored. Together these give us an even better picture of stability. The gold standard shifted. However, not even this bipartite test tells you everything. In any complex system (be that Biological or a complicated dataset) there are always going to be unknowns. What I think is important is knowing what you can and can’t account for. In my experience in science, there is generally much much more that can’t be explained than can.

Belt and Braces [or suspenders if you are from the US, which has quite a different connotation in the UK!]

As ever translating all of this to a business context is instructive. Conscientious Data Scientists or business-focussed Statisticians who come across something interesting in a model or analysis will always try (where feasible) to corroborate this by other means; they will try to perform a second “experiment” to verify their initial findings. They will also realise that even two supporting results obtained in different ways will not in general be 100% conclusive. However the highest levels of conscientiousness may be more honoured in breach than observance [13]. Also there may not be an alternative “experiment” that can be easily run. Whatever the motivations or circumstances, it is not beyond the realm of possibility that some Data Science findings are true only in the same way that my wife thought she had successfully stabilised her protein before carrying out the second assay.

I would argue that business will often have much to learn from the levels of rigour customary in most scientific research [14]. It would be nice to think that the same rigour is always applied in commercial matters as academic ones. Unfortunately experience would tend to suggest the contrary is sometimes the case. However, it would also be beneficial if people working on statistical models in industry went out of their way to stress not only what phenomena these models can explain, but what they are unable to explain. Knowing what you don’t know is the first step towards further enlightenment.


[1] Indeed this previous article had a sub-section titled Rigour and Scrutiny, echoing some of the themes in this piece.
[2] See More Statistics and Medicine.
[3] As in the earlier article, apologies for the circumlocution. I’m both looking to preserve some privacy and save the reader from boredom.
[4] Anyone interested in more information is welcome to read her thesis which is in any case in the public domain. It is 188 pages long, which is reasonably lengthy even by my standards.
[5] They carry out translation which refers to synthesising proteins based on information carried by messenger RNA, mRNA.
[6] Some proteins are naturally stable, but many are not and will not survive purification or later steps in their native state.
[7] G Protein-coupled Receptors or GPCRs.
[8] Chopping off flexible sections, adding other small proteins which act as scaffolding, getting antibodies or other biological molecules to bind to the protein and so on.
[9] Actually a sigmoidal dose-response curve.
[10] For anyone with colour perception problems, the green line has markers which are diamonds and the blue line has markers which are triangles.
[11] As my wife writes [with my annotations]:

A possible explanation for this effect was that while T4L [the protein she added to try to increase stability – T4 Lysozyme] stabilised the binding pocket, the other domains of the receptor were destabilised. Another possibility was that the introduction of T4L caused an increase in the flexibility of CL3, thus destabilising the receptor. A method for determining whether this was happening would be to introduce rigid linkers at the AT1R-T4L junction [AT1R was the protein she was studying, angiotensin II type 1 receptor], or other placements of T4L. Finally AT1R might exist as a dimer and the addition of T4L might inhibit the formation of dimers, which could also destabilise the receptor.

© University of Cambridge – reproduced under a Creative Commons 2.0 licence

[12] See also Toast.
[13] Though to be fair, the way that this phrase is normally used today is probably not what either Hamlet or Shakespeare intended by it back around 1600.
[14] Of course there are sadly examples of specific scientists falling short of the ideals I have described here.



More Statistics and Medicine

Weighing Medicine in the balance

I wrote last on the intersection of these two disciplines back in March 2011 (Medical Malpractice). What has prompted me to return to the subject is some medical tests that I was offered recently. If the reader will forgive me, I won’t go into the medical details – and indeed have also obfuscated some of the figures I was quoted – but neither is that relevant to the point that I wanted to make. This point relates to how statistics are sometimes presented in medical situations and – more pertinently – the disconnect between how these may be interpreted by the man or woman in the street, as opposed to what is actually going on.

Rather than tie myself in knots, let’s assume that the test is for a horrible disease called PJT Syndrome [1]. Let’s further assume that I am told that the test on offer has an accuracy of 80% [2]. This in and of itself is a potentially confusing figure. Does the test fail to detect the presence of PJT Syndrome 20% of the time, or does it instead erroneously detect PJT Syndrome, when the patient is actually perfectly healthy, 20% of the time? In this case, after an enquiry, I was told that a negative result was a negative result, but that a positive one did not always mean that the subject suffered from PJT Syndrome; so the issue is confined to false positives, not false negatives. This definition of 80% accuracy is at least a little clearer.

So what is a reasonable person to deduce from the 80% figure? Probably that, if they test positive, there is an 80% certainty that they have PJT Syndrome. I think that my visceral reaction would probably be along those lines. However, such a conclusion can be incorrect, particularly where the incidence of PJT Syndrome is low in a population. I’ll try to explain why.

If we know that PJT Syndrome occurs in 1 in every 100 people on average, what does this mean for the relevance of our test results? Let’s take a graphical look at a wholly representative population of exactly 100 people. The PJT Syndrome sufferer appears in red at the bottom right.

1 in 100

Now what is the result of the 80% accuracy of our test, remembering that this means that 20% of people taking it will be falsely diagnosed as having PJT Syndrome? Well 20% of 100 is – applying a complex algorithm – approximately 20 people. Let’s flag these up on our population schematic in grey.

20 in 100

So 20 people have the wrong diagnosis. One is correctly identified as having PJT Syndrome and 79 are correctly identified as not having PJT Syndrome; so a total of 80 have the right diagnosis.

What does this mean for those 21 people who have been unfortunate enough to test positive for PJT Syndrome (the one person coloured red and the 20 coloured grey)? Well only one of them actually has the malady. So, if I test positive, my chances of actually having PJT Syndrome are not 80% as we originally thought, but instead 1 in 21 or 4.76%. So my risk is still low having tested positive. It is higher than the risk in the general population, which is 1 in 100, or 1%, but not much more so.
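For anyone who would rather see the head-count as code, here is a minimal sketch of the reasoning above. The function name and its parameters are my own inventions for illustration. It follows the article's simplification of flagging 20% of the whole population as false positives, and assumes (as I was told) that there are no false negatives.

```python
def positives_by_counting(population, sufferers, false_positive_rate):
    """Count positive test results the way the article does.

    Assumptions (illustrative only):
    - every genuine sufferer tests positive (no false negatives);
    - the false positive rate is applied to the whole population,
      which is the article's rounding, not a strict calculation.
    """
    false_positives = round(population * false_positive_rate)
    true_positives = sufferers
    return true_positives, false_positives


tp, fp = positives_by_counting(population=100, sufferers=1, false_positive_rate=0.20)
chance = tp / (tp + fp)  # share of positive results that are genuine
print(f"{tp} in {tp + fp} = {chance:.2%}")  # prints "1 in 21 = 4.76%"
```

The point of writing it this way is that the 80% figure never appears in the final division; what matters is how many of the people who test positive are genuinely ill.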

The problem arises if having a condition is rare (here 1 in 100) and the accuracy of a test is low (here it is wrong for 20% of people taking it). If you consider that the condition that I was being offered a test for actually has an incidence of around 1 in 20,000 people, then with an 80% accurate test we would get the following:

  1. In a population of 20,000, 1 person has the condition
  2. In the same population, a test with our 80% accuracy means that 20% of people will test positive for it when they are perfectly healthy; this amounts to 4,000 people
  3. So in total, 4,001 people will test positive, 1 correctly, 4,000 erroneously
  4. Which means that, after a positive test, my chance of having the condition being tested for is 1 in 4,001, or 0.025%; still a pretty unlikely event
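The steps above round things slightly, because the 20% is applied to everyone, including the one genuine sufferer. A stricter version applies the false positive rate only to healthy people, which is just Bayes' theorem. The sketch below is my own illustration and not anything a doctor quoted to me; the function name is hypothetical, and the default sensitivity of 1.0 encodes the assumption that a negative result is always a true negative.

```python
def chance_given_positive(prevalence, false_positive_rate, sensitivity=1.0):
    """Probability of having the condition after a positive test.

    prevalence          -- fraction of the population with the condition
    false_positive_rate -- fraction of healthy people who test positive
    sensitivity         -- fraction of sufferers who test positive
                           (1.0 assumes no false negatives)
    """
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * false_positive_rate
    return true_pos / (true_pos + false_pos)


# The two incidences discussed in the article, with a 20% false positive rate.
for one_in in (100, 20_000):
    p = chance_given_positive(1 / one_in, 0.20)
    print(f"1 in {one_in:,}: {p:.3%}")
```

The stricter figures come out at roughly 4.8% and 0.025%, so the rounding in the head-count makes no practical difference to the conclusion.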

Low accuracy tests and rare conditions are a very bad combination. As well as causing people unnecessary distress, the real problem is where the diagnosis leads potential sufferers to take actions (e.g. undergoing further diagnosis, which could be invasive, or even embarking on a course of treatment) which may themselves have the potential to cause injury to the patient.

I am not of course suggesting that people ignore medical advice, but Doctors are experts in medicine and not statistics. When deciding what course of action to take in a situation similar to the one I recently experienced, taking the time to more accurately assess risks and benefits is extremely important. Humans are well known to overestimate some risks (and underestimate others). There are circumstances when crunching the numbers and seeing what they tell you is not only a good idea, it can help to safeguard your health.

For what it’s worth, I opted out of these particular tests.


[1] A terrible condition which renders sufferers unable to express any thought in under 1,000 words.
[2] Not the actual figure quoted, but close to it.



Medical malpractice

8 plus 7 equals 15, carry one, er...

I was listening to a discussion with two medical practitioners on the radio today while driving home from work. I’ll remove the context of the diseases they were debating as the point I want to make is not specifically to do with this aspect and dropping it removes a degree of emotion from the conversation. The bone of contention between the two antagonists was the mortality rate from a certain set of diseases in the UK and whether this was to do with the competency of general practitioners (GPs, or “family doctors” for any US readers) and the diagnostic procedures they use, or to do with some other factor.

In defending her colleagues from the accusations of the first interviewee, the general practitioner said that the rate of mortality for sufferers of these diseases in other European countries (she specifically cited Belgium and France) was greater than in the UK. I should probably pause at this point to note that this comment seemed the complete opposite of every other European health survey I have read in recent years, but we will let that pass and instead focus on the second part of her argument. This was that better diagnoses would be made if the UK hired more doctors (like her), thereby allowing them to spend more time with each patient. She backed up this assertion by then saying that France has many more doctors per 1,000 people than the UK (the figures I found were 3.7 per 1,000 for France and 2.2 per 1,000 for the UK; these were totally different to the figures she quoted, but again I’ll let that pass as she did seem to at least have the relation between the figures in each country the right way round this time).

What the GP seemed to be saying is summarised in the following chart:

Vive la difference

I have no background in medicine, but to me the lady in question made the opposite point to the one she seemed to want to. If there are fewer doctors per capita in the UK than in France, but UK mortality rates are better, it might be more plausible to argue that fewer doctors implies better survival rates; this is what the above chart suggests. Of course this assertion is open to challenge and – as with most statistical phenomena – there are undoubtedly many other factors. There is also of course the old chestnut of correlation not implying causality (not that the above chart even establishes correlation). However, at the very least, the “facts” as presented did not seem to be a prima facie case for hiring more UK doctors.

Sadly for both the GP in question and for inhabitants of the UK, I think that the actual graph is more like:

This exhibit could perhaps suggest that the second doctor had a potential point, but such simplistic observations, much as we may love to make them, do not always stand up to rigorous statistical analysis. Statistical findings can be as counter-intuitive as many other mathematical results.

Speaking of statistics, when challenged on whether she had the relative mortality rates for France and the UK the right way round, the same GP said, “well you can prove anything with statistics.” We hear this phrase so often that I guess many of us come to believe it. In fact it might be more accurate to say, “selection bias is all pervasive”, or perhaps even “innumeracy will generally lead to erroneous conclusions being drawn.”

When physicians are happy to appear on national radio and exhibit what is at best a tenuous grasp of figures, one can but wonder about the risk of numerically-based medical decisions sometimes going awry. With doctors also increasingly involved in public affairs (either as expert advisers or – in the UK at least – often as members of parliament), perhaps these worries should also be extended into areas of policy making.

Even more fundamentally (but then as an ex-Mathematician I would say this), perhaps the UK needs to reassess how it teaches mathematics. Also maybe UK medical schools need to examine numeric proficiency again just before students graduate as well as many years earlier when candidates apply; just in case something in the process of producing new doctors has squeezed their previous mathematical ability out of them.

Before I begin to be seen as an opponent of the medical profession, I should close by asking a couple of questions that are perhaps closer to home for some readers. How many of the business decisions that are taken using information lovingly crafted by information professionals such as you and me are marred by an incomplete understanding of numbers on the part of [hopefully] a small subsection of users? As IT professionals, what should we be doing to minimise the likelihood of such an occurrence in our organisations?