Statistically Efficient Ways to Quantify Added Predictive Value of New Measurements

Researchers have used contorted, inefficient, and arbitrary analyses to demonstrated added value in biomarkers, genes, and new lab measurements. Traditional statistical measures have always been up to the task, and are more powerful and more flexible. It’s time to revisit them, and to add a few slight twists to make them more helpful.

In Machine Learning Predictions for Health Care the Confusion Matrix is a Matrix of Confusion

The performance metrics chosen for prediction tools, and for Machine Learning in particular, have significant implications for health care and a penetrating understanding of the AUROC will lead to better methods, greater ML value, and ultimately, benefit patients.

Data Methods Discussion Site

This article lays out the rationale and overall design of a new discussion site about quantitative methods.

Viewpoints on Heterogeneity of Treatment Effect and Precision Medicine

This article provides my reflections after the PCORI/PACE Evidence and the Individual Patient meeting on 2018-05-31. The discussion includes a high-level view of heterogeneity of treatment effect in optimizing treatment for individual patients.

Navigating Statistical Modeling and Machine Learning

This article elaborates on Frank Harrell’s post providing guidance in choosing between machine learning and statistical modeling for a prediction project.

Road Map for Choosing Between Statistical Modeling and Machine Learning

This article provides general guidance to help researchers choose between machine learning and statistical modeling for a prediction project.

Musings on Multiple Endpoints in RCTs

This article discusses issues related to alpha spending, effect sizes used in power calculations, multiple endpoints in RCTs, and endpoint labeling. Changes in endpoint priority is addressed. Included in the the discussion is how Bayesian probabilities more naturally allow one to answer multiple questions without all-too-arbitrary designations of endpoints as “primary” and “secondary”. And we should not quit trying to learn.

Improving Research Through Safer Learning from Data

What are the major elements of learning from data that should inform the research process? How can we prevent having false confidence from statistical analysis? Does a Bayesian approach result in more honest answers to research questions? Is learning inherently subjective anyway, so we need to stop criticizing Bayesians’ subjectivity? How important and possible is pre-specification? When should replication be required? These and other questions are discussed.

Is Medicine Mesmerized by Machine Learning?

Deep learning and other forms of machine learning are getting a lot of press in medicine. The reality doesn’t match the hype, and interpretable statistical models still have a lot to offer.

Information Gain From Using Ordinal Instead of Binary Outcomes

This article gives examples of information gained by using ordinal over binary response variables. This is done by showing that for the same sample size and power, smaller effects can be detected