Clinicians' Misunderstanding of Probabilities Makes Them Like Backwards Probabilities Such As Sensitivity, Specificity, and Type I Error
Optimum decision making in the presence of uncertainty comes from probabilistic thinking. The relevant probabilities are of a predictive nature: P(the unknown given the known). Thresholds are not helpful and are completely dependent on the utility/cost/loss function.
Corollary: Since p-values are P(someone else's data are more extreme than mine if H0 is true) and we don't know whether H0 is true, it is a non-predictive probability that is not useful for decision making.
Clinicians are people trained in the science and practice of medicine, and most of them are very good at it. They are also very good at many aspects of research. But they are generally not taught probability, and this can limit their research skills. Many excellent clinicians even let their limitations in understanding probability make them believe that their clinical decision making is worse than it actually is. I have taught many clinicians who say “I need a hard and fast rule so I know how to diagnose or treat patients. I need a hard cutoff on blood pressure, HbA1c, etc. so that I know what to do, and the fact that I either treat or not treat the patient means that I don’t want to consider a probability of disease but desire a simple classification rule.” This makes the clinician try to influence the statistician to use inefficient, arbitrary methods such as categorization, stratification, and matching.
In reality, clinicians do not act that way when treating patients. They are smart enough to know that if a patient has cholesterol just over someone’s arbitrary threshold they may not start statin therapy right away if the patient has no other risk factors (e.g., smoking) going against him. They know that sometimes you start a patient on a lower dose and see how she responds, or start one drug and try it for a while and then switch drugs if the efficacy is unacceptable or there is a significant side effect.
So I emphasize the need to understand probabilities when I’m teaching clinicians. A probability is a self-contained summary of the current information, except for the patient’s risk aversion and other utilities. Clinicians need to be comfortable with a probability of 0.5 meaning “we don’t know much” and not requesting a classification of disease/normal that does nothing but cover up the problem. A classification does not account for gray zones or patient and physician utility functions.
Even physicians who understand the meaning of a probability are often not understanding conditioning. Conditioning is all important, and conditioning on different things massively changes the meaning of the probabilities being computed. Every physician I’ve known has been taught probabilistic medical diagnosis by first learning about sensitivity (sens) and specificity (spec). These are probabilities that are in backwards time- and information flow order. How did this happen? Sensitivity, specificity, and receiver operating characteristic curves were developed for radar and radio research in the military. It was a important to receive radio signals from distant aircraft, and to detect an incoming aircraft on radar. The ability to detect something that is really there is definitely important. In the 1950s, virologists appropriated these concepts to measure the performance of viral cultures. Virus needs to be detected when it’s present, and not detected when it’s not. Sensitivity is the probability of detecting a condition when it is truly present, and specificity is the probability of not detecting it when it is truly absent. One can see how these probabilities would be useful outside of virology and bacteriology when the samples are retrospective, as in a case-control studies. But I believe that clinicians and researchers would be better off if backward probabilities were not taught or were mentioned only to illustrate how not to think about a problem.
But the way medical students are educated, they assume that sens and spec are what you first consider in a prospective cohort of patients! This gives the professor the opportunity of teaching Bayes’ rule and requires the use of a supposedly unconditional probability known as prevalence which is actually not very well defined. The students plugs everything into Bayes’ rule and fails to notice that several quantities cancel out. The result is the following: the proportion of patients with a positive test who have disease, and the proportion with a negative test who have disease. These are trivially calculated from the cohort data without knowing anything about sens, spec, and Bayes. Instead of computing the obvious, the previous backwards way of thinking harms the student’s understanding for years to come and influences those who later engage in clinical and pharmaceutical research to believe that type I error and p-values are directly useful.
The situation in medical diagnosis gets worse when referral bias (also called workup bias) is present. When certain types of patients do not get a final diagnosis, sens and spec are biased. For example, younger women with a negative test may not get the painful procedure that yields the final diagnosis. There are formulas that must be used to correct sens and spec. But wait! When Bayes’ rule is used to obtain the probability of disease we needed in the first place, these corrections completely cancel out when the usual correction methods are used! Using forward probabilities in the first place means that one just conditions on age, sex, and result of the initial diagnostic test and no special methods other than (sometimes) logistic regression are required.
There is an analogy to statistical testing. p-values and type I error are affected by sequential testing and a host of other factors, but forward-time probabilities (Bayesian posterior probabilities) are not. Posterior probabilities condition on what is known and does not have to imagine alternate paths to getting to what is known (as do sens and spec when workup bias exists). p-values and type I errors are backwards-information-flow measures, and clinical researchers and regulators come to believe that type I error is the error of interest. They also very frequently misinterpret p-values. Type I error is one minus spec, and power is sens. The posterior probability is exactly analogous to the probability of disease.
Sens and spec are so pervasive in medicine, bioinformatics, and biomarker research that we don’t question how silly they would be in other contexts. Do we dichotomize a response variable so that we can compute the probability that a patient is on treatment B given a “positive” response? On the contrary we want to know the full continuous distribution of the response given the assigned treatment. Again this represents forward probabilities.
Some Useful Resources
- Properties of diagnostic data distributions by AP Dawid
- The (in)validity of sensitivity and specificity by Irene Guggenmoos‐Holzmann and Hans C. van Houwelingen
- The Flawed Reasoning Behind the Replication Crisis by Aubrey Clayton
- Damage Caused by Classification Accuracy and Other Discontinuous Improper Accuracy Scoring Rules
- Statistical Errors in the Medical Literature
- Statistically Efficient Ways to Quantify Added Predictive Value of New Measurements
- Musings on Multiple Endpoints in RCTs
- Is Medicine Mesmerized by Machine Learning?