This article discusses issues related to alpha spending, effect sizes used in power calculations, multiple endpoints in RCTs, and endpoint labeling. Changes in endpoint priority is addressed. Included in the the discussion is how Bayesian probabilities more naturally allow one to answer multiple questions without all-too-arbitrary designations of endpoints as "primary" and "secondary". And we should not quit trying to learn.
What are the major elements of learning from data that should inform the research process? How can we prevent having false confidence from statistical analysis? Does a Bayesian approach result in more honest answers to research questions? Is learning inherently subjective anyway, so we need to stop criticizing Bayesians' subjectivity? How important and possible is pre-specification? When should replication be required? These and other questions are discussed.
To avoid "false positives" do away with "positive".
A good poker player plays the odds by thinking to herself "The probability I can win with this hand is 0.91" and not "
Professor of Biostatistics
Vanderbilt University School of Medicine
Professor of Psychiatry and, by courtesy, of Medicine (Cardiovascular Medicine) and of Biomedical Data Science
Stanford University School of Medicine
Misinterpretation of P-values and Main Study Results Dichotomania Problems With Change Scores Improper Subgrouping Serial Data and Response Trajectories Cluster Analysis As Doug Altman famously wrote in his Scandal of Poor Medical Research in BMJ in 1994, the quality of how statistical principles and analysis methods are applied in medical research is quite poor.
The difference between Bayesian and frequentist inference in a nutshell:
With Bayes you start with a prior distribution for θ and given your data make an inference about the θ-driven process generating your data (whatever that process happened to be), to quantify evidence for every possible value of θ.
What clinicians learn from clinical practice, unless they routinely do n-of-one studies, is based on comparisons of unlikes. Then they criticize like-vs-like comparisons from randomized trials for not being generalizable.