I have been critical of a number of articles, authors, and journals in this growing blog article. Linking the blog with Twitter is a way to expose the blog to more readers. It is far too easy to slip into hyperbole on the blog and even easier on Twitter with its space limitations. Importantly, many of the statistical problems pointed out in my article, are very, very common, and I dwell on recent publications to get the point across that inadequate statistical review at medical journals remains a serious problem.
(In a Bayesian analysis) It is entirely appropriate to collect data until a point has been
proven or disproven, or until the data collector runs out of time, money, or patience.
Edwards, Lindman, Savage (1963)
Introduction Bayesian inference, which follows the likelihood principle, is not affected by the experimental design or intentions of the investigator. P-values can only be computed if both of these are known, and as been described by Berry (1987) and others, it is almost never the case that the computation of the p-value at the end of a study takes into account all the changes in design that were necessitated when pure experimental designs encounter the real world.
The following examples are intended to show the advantages of Bayesian reporting of treatment efficacy analysis, as well as to provide examples contrasting with frequentist reporting. As detailed here, there are many problems with p-values, and some of those problems will be apparent in the examples below. Many of the advantages of Bayes are summarized here. As seen below, Bayesian posterior probabilities prevent one from concluding equivalence of two treatments on an outcome when the data do not support that (i.
Professor of Biostatistics
Vanderbilt University School of Medicine
Professor of Psychiatry and, by courtesy, of Medicine (Cardiovascular Medicine) and of Biomedical Data Science
Stanford University School of Medicine
Revised July 17, 2017 It is often said that randomized clinical trials (RCTs) are the gold standard for learning about therapeutic effectiveness. This is because the treatment is assigned at random so no variables, measured or unmeasured, will be truly related to treatment assignment.
Misinterpretation of P-values and Main Study Results Dichotomania Problems With Change Scores Improper Subgrouping Serial Data and Response Trajectories As Doug Altman famously wrote in his Scandal of Poor Medical Research in BMJ in 1994, the quality of how statistical principles and analysis methods are applied in medical research is quite poor. According to Doug and to many others such as Richard Smith, the problems have only gotten worse.
Type I error for smoke detector: probability of alarm given no fire=0.05
Bayesian: probability of fire given current air data
Frequentist smoke alarm designed as most research is done:
Set the alarm trigger so as to have a 0.8 chance of detecting an inferno
Advantage of actionable evidence quantification:
Set the alarm to trigger when the posterior probability of a fire exceeds 0.02 while at home and at 0.
Randomized clinical trials (RCT) have long been held as the gold standard for generating evidence about the effectiveness of medical and surgical treatments, and for good reason. But I commonly hear clinicians lament that the results of RCTs are not generalizable to medical practice, primarily for two reasons:
Patients in clinical practice are different from those enrolled in RCTs Drug adherence in clinical practice is likely to be lower than that achieved in RCTs, resulting in lower efficacy.