Researchers have used contorted, inefficient, and arbitrary analyses to demonstrated added value in biomarkers, genes, and new lab measurements. Traditional statistical measures have always been up to the task, and are more powerful and more flexible. It’s time to revisit them, and to add a few slight twists to make them more helpful.
This article provides my reflections after the PCORI/PACE Evidence and the Individual Patient meeting on 2018-05-31. The discussion includes a high-level view of heterogeneity of treatment effect in optimizing treatment for individual patients.
This article discusses issues related to alpha spending, effect sizes used in power calculations, multiple endpoints in RCTs, and endpoint labeling. Changes in endpoint priority is addressed. Included in the the discussion is how Bayesian probabilities more naturally allow one to answer multiple questions without all-too-arbitrary designations of endpoints as “primary” and “secondary”. And we should not quit trying to learn.
Deep learning and other forms of machine learning are getting a lot of press in medicine. The reality doesn’t match the hype, and interpretable statistical models still have a lot to offer.
Professor of Biostatistics
Vanderbilt University School of Medicine
Professor of Psychiatry and, by courtesy, of Medicine (Cardiovascular Medicine) and of Biomedical Data Science
Stanford University School of Medicine
Revised July 17, 2017 It is often said that randomized clinical trials (RCTs) are the gold standard for learning about therapeutic effectiveness. This is because the treatment is assigned at random so no variables, measured or unmeasured, will be truly related to treatment assignment.
Misinterpretation of P-values and Main Study Results Dichotomania Problems With Change Scores Improper Subgrouping Serial Data and Response Trajectories Cluster Analysis As Doug Altman famously wrote in his Scandal of Poor Medical Research in BMJ in 1994, the quality of how statistical principles and analysis methods are applied in medical research is quite poor. According to Doug and to many others such as Richard Smith, the problems have only gotten worse.
What clinicians learn from clinical practice, unless they routinely do n-of-one studies, is based on comparisons of unlikes. Then they criticize like-vs-like comparisons from randomized trials for not being generalizable. This is made worse by not understanding that clinical trials are designed to estimate relative efficacy, and relative efficacy is surprisingly transportable. Many clinicians do not even track what happens to their patients to be able to inform their future patients.
Optimum decision making in the presence of uncertainty comes from probabilistic thinking. The relevant probabilities are of a predictive nature: P(the unknown given the known). Thresholds are not helpful and are completely dependent on the utility/cost/loss function.
Corollary: Since p-values are P(someone else’s data are more extreme than mine if H0 is true) and we don’t know whether H0 is true, it is a non-predictive probability that is not useful for decision making.