This post will grow to cover questions about data reduction methods, also known as unsupervised learning methods. These are intended primarily for two purposes:
collapsing correlated variables into an overall score so that one does not have to disentangle correlated effects, which is a difficult statistical task reducing the effective number of variables to use in a regression or other predictive model, so that fewer parameters need to be estimated The latter example is the “too many variables too few subjects” problem.
I have been critical of a number of articles, authors, and journals in this growing blog article. Linking the blog with Twitter is a way to expose the blog to more readers. It is far too easy to slip into hyperbole on the blog and even easier on Twitter with its space limitations. Importantly, many of the statistical problems pointed out in my article, are very, very common, and I dwell on recent publications to get the point across that inadequate statistical review at medical journals remains a serious problem.
(In a Bayesian analysis) It is entirely appropriate to collect data until a point has been proven or disproven, or until the data collector runs out of time, money, or patience.
— Edwards, Lindman, Savage (1963) Introduction Bayesian inference, which follows the likelihood principle, is not affected by the experimental design or intentions of the investigator. P-values can only be computed if both of these are known, and as been described by Berry (1987) and others, it is almost never the case that the computation of the p-value at the end of a study takes into account all the changes in design that were necessitated when pure experimental designs encounter the real world.
To avoid “false positives” do away with “positive”.
A good poker player plays the odds by thinking to herself “The probability I can win with this hand is 0.91” and not “I’m going to win this game” when deciding the next move.
State conclusions honestly, completely deferring judgments and actions to the ultimate decision makers. Just as it is better to make predictions than classifications in prognosis and diagnosis, use the word “probably” liberally, and avoid thinking “the evidence against the null hypothesis is strong, so we conclude the treatment works” which creates the opportunity of a false positive.
As a biostatistics teacher I’ve spent a lot of time thinking about inverting the classroom and adding multimedia content. My first thought was to create YouTube videos corresponding to sections in my lecture notes. This typically entails recording the computer screen while going through slides, adding a voiceover. I realized that the maintenance of such videos is difficult, and this also creates a barrier to adding new content. In addition, the quality of the video image is lower than just having the student use a pdf viewer on the original notes.
Professor of Biostatistics
Vanderbilt University School of Medicine
Professor of Psychiatry and, by courtesy, of Medicine (Cardiovascular Medicine) and of Biomedical Data Science
Stanford University School of Medicine
Revised July 17, 2017 It is often said that randomized clinical trials (RCTs) are the gold standard for learning about therapeutic effectiveness. This is because the treatment is assigned at random so no variables, measured or unmeasured, will be truly related to treatment assignment.
Misinterpretation of P-values and Main Study Results Dichotomania Problems With Change Scores Improper Subgrouping Serial Data and Response Trajectories Cluster Analysis As Doug Altman famously wrote in his Scandal of Poor Medical Research in BMJ in 1994, the quality of how statistical principles and analysis methods are applied in medical research is quite poor. According to Doug and to many others such as Richard Smith, the problems have only gotten worse.
While being engaged in biomedical research for a few decades and watching reproducibility of research as a whole, I’ve developed my own ranking of reliability/quality/usefulness of research across several subject matter areas. This list is far from complete. Let’s start with a subjective list of what I perceive as the areas in which published research is least likely to be both true and useful. The following list is ordered in ascending order of quality, with the most problematic area listed first.
I discussed the many advantages or probability estimation over classification. Here I discuss a particular problem related to classification, namely the harm done by using improper accuracy scoring rules. Accuracy scores are used to drive feature selection, parameter estimation, and for measuring predictive performance on models derived using any optimization algorithm. For this discussion let Y denote a no/yes false/true 0/1 event being predicted, and let Y=0 denote a non-event and Y=1 the event occurred.
The difference between Bayesian and frequentist inference in a nutshell:
With Bayes you start with a prior distribution for θ and given your data make an inference about the θ-driven process generating your data (whatever that process happened to be), to quantify evidence for every possible value of θ. With frequentism, you make assumptions about the process that generated your data and infinitely many replications of them, and try to build evidence for what θ is not.