Data Methods Discussion Site

This article lays out the rationale and overall design of a new discussion site about quantitative methods.

Viewpoints on Heterogeneity of Treatment Effect and Precision Medicine

This article provides my reflections after the PCORI/PACE Evidence and the Individual Patient meeting on 2018-05-31. The discussion includes a high-level view of heterogeneity of treatment effect in optimizing treatment for individual patients.

Navigating Statistical Modeling and Machine Learning

This article elaborates on Frank Harrell’s post providing guidance in choosing between machine learning and statistical modeling for a prediction project.

Road Map for Choosing Between Statistical Modeling and Machine Learning

This article provides general guidance to help researchers choose between machine learning and statistical modeling for a prediction project.

Musings on Multiple Endpoints in RCTs

This article discusses issues related to alpha spending, effect sizes used in power calculations, multiple endpoints in RCTs, and endpoint labeling. Changes in endpoint priority is addressed. Included in the the discussion is how Bayesian probabilities more naturally allow one to answer multiple questions without all-too-arbitrary designations of endpoints as “primary” and “secondary”. And we should not quit trying to learn.

Improving Research Through Safer Learning from Data

What are the major elements of learning from data that should inform the research process? How can we prevent having false confidence from statistical analysis? Does a Bayesian approach result in more honest answers to research questions? Is learning inherently subjective anyway, so we need to stop criticizing Bayesians’ subjectivity? How important and possible is pre-specification? When should replication be required? These and other questions are discussed.

Is Medicine Mesmerized by Machine Learning?

Deep learning and other forms of machine learning are getting a lot of press in medicine. The reality doesn’t match the hype, and interpretable statistical models still have a lot to offer.

Information Gain From Using Ordinal Instead of Binary Outcomes

This article gives examples of information gained by using ordinal over binary response variables. This is done by showing that for the same sample size and power, smaller effects can be detected

Why I Don't Like Percents

I prefer fractions and ratios over percents. Here are the reasons.

How Can Machine Learning be Reliable When the Sample is Adequate for Only One Feature?

It is easy to compute the sample size N1 needed to reliably estimate how one predictor relates to an outcome. It is next to impossible for a machine learning algorithm entertaining hundreds of features to yield reliable answers when the sample size < N1.