# Posts

### Why I Don't Like Percents

I prefer fractions and ratios over percents. Here are the reasons.

### How Can Machine Learning be Reliable When the Sample is Adequate for Only One Feature?

It is easy to compute the sample size N1 needed to reliably estimate how one predictor relates to an outcome. It is next to impossible for a machine learning algorithm entertaining hundreds of features to yield reliable answers when the sample size < N1.

### New Year Goals

Methodologic goals and wishes for research and clinical practice for 2018

### Scoring Multiple Variables, Too Many Variables and Too Few Observations: Data Reduction

This post will grow to cover questions about data reduction methods, also known as unsupervised learning methods. These are intended primarily for two purposes: collapsing correlated variables into an overall score so that one does not have to disentangle correlated effects, which is a difficult statistical task reducing the effective number of variables to use in a regression or other predictive model, so that fewer parameters need to be estimated The latter example is the “too many variables too few subjects” problem.

### Statistical Criticism is Easy; I Need to Remember That Real People are Involved

I have been critical of a number of articles, authors, and journals in this growing blog article. Linking the blog with Twitter is a way to expose the blog to more readers. It is far too easy to slip into hyperbole on the blog and even easier on Twitter with its space limitations. Importantly, many of the statistical problems pointed out in my article, are very, very common, and I dwell on recent publications to get the point across that inadequate statistical review at medical journals remains a serious problem.

### Continuous Learning from Data: No Multiplicities from Computing and Using Bayesian Posterior Probabilities as Often as Desired

(In a Bayesian analysis) It is entirely appropriate to collect data until a point has been proven or disproven, or until the data collector runs out of time, money, or patience. Edwards, Lindman, Savage (1963) Introduction Bayesian inference, which follows the likelihood principle, is not affected by the experimental design or intentions of the investigator. P-values can only be computed if both of these are known, and as been described by Berry (1987) and others, it is almost never the case that the computation of the p-value at the end of a study takes into account all the changes in design that were necessitated when pure experimental designs encounter the real world.

### Bayesian vs. Frequentist Statements About Treatment Efficacy

The following examples are intended to show the advantages of Bayesian reporting of treatment efficacy analysis, as well as to provide examples contrasting with frequentist reporting. As detailed here, there are many problems with p-values, and some of those problems will be apparent in the examples below. Many of the advantages of Bayes are summarized here. As seen below, Bayesian posterior probabilities prevent one from concluding equivalence of two treatments on an outcome when the data do not support that (i.

### Integrating Audio, Video, and Discussion Boards with Course Notes

As a biostatistics teacher I’ve spent a lot of time thinking about inverting the classroom and adding multimedia content. My first thought was to create YouTube videos corresponding to sections in my lecture notes. This typically entails recording the computer screen while going through slides, adding a voiceover. I realized that the maintenance of such videos is difficult, and this also creates a barrier to adding new content. In addition, the quality of the video image is lower than just having the student use a pdf viewer on the original notes.

### EHRs and RCTs: Outcome Prediction vs. Optimal Treatment Selection

Frank Harrell Professor of Biostatistics Vanderbilt University School of Medicine Laura Lazzeroni Professor of Psychiatry and, by courtesy, of Medicine (Cardiovascular Medicine) and of Biomedical Data Science Stanford University School of Medicine Revised July 17, 2017 It is often said that randomized clinical trials (RCTs) are the gold standard for learning about therapeutic effectiveness. This is because the treatment is assigned at random so no variables, measured or unmeasured, will be truly related to treatment assignment.

### Statistical Errors in the Medical Literature

Misinterpretation of P-values and Main Study Results Dichotomania Problems With Change Scores Improper Subgrouping Serial Data and Response Trajectories As Doug Altman famously wrote in his Scandal of Poor Medical Research in BMJ in 1994, the quality of how statistical principles and analysis methods are applied in medical research is quite poor. According to Doug and to many others such as Richard Smith, the problems have only gotten worse.