This blog is devoted to statistical thinking and its impact on science and everyday life. Emphasis is given to maximizing the use of information, avoiding statistical pitfalls, describing problems caused by the frequentist approach to statistical inference, describing advantages of Bayesian and likelihood methods, and discussing intended and unintended differences between statistics and data science. I’ll also cover regression modeling strategies, clinical trials, drug evaluation, medical diagnosis, and decision making.
Frank Harrell is a Professor of Biostatistics in the School of Medicine at Vanderbilt University. His research interests include statistical modeling, semiparametric ordinal models, predictive models and model validation, longitudinal models, Bayesian statistics, Bayesian clinical trial design, clinical trial design, analysis, and reporting, statistical computing, statistical graphics, reproducible research, drug development, medical decision making and diagnostic research, health services research, cardiology, COVID-19 therapeutic clinical trial design, and teaching.
PhD in Biostatistics, 1979
University of North Carolina
BS in Mathematics, 1973
University of Alabama in Birmingham
This is a free web course in introductory and intermediate biostatistics. Details are on the course web page.
I teach the BIOS7330 Regression Modeling Strategies course in the Biostatistics Graduate Program at Vanderbilt University in the spring semester. The course web page is here. I teach a 4-day virtual version of this course each May. Registration information for the short course may be found here.
The next scheduled offerings of the RMS short course are:
I co-teach this course at Vanderbilt each February for postdoctoral medical and surgical fellows and junior faculty in the MSCI program.
Quarto. I start by covering the creation of annotated analysis files, discovering missing data patterns, and running descriptive statistics, with the goals of understanding the data and assessing its quality and completeness. Functions in the Hmisc package are used to annotate data frames and data tables with labels and units of measurement and to produce tabular and graphical statistical summaries. Several examples of processing and manipulating data using the data.table package are given. Much attention is paid to the use of minimal-assumption methods for describing relationships with continuous variables, avoiding disasters such as computing mean Y as a function of quintiles of body mass index. Examples of diagramming exclusions of observations from analysis, caching results, doing parallel processing, and running simulations are presented. This article is a synopsis of the R Workflow electronic book.
Statistical Thinking News is a companion site for news and opinions on data analysis and statistical modeling, prediction, statistical computing, research design and interpretation, clinical trials, and research integrity. It is updated at least weekly.
datamethods.org is a discussion site where data methodologists meet each other and subject matter experts including clinical trialists and clinical researchers. Its development is documented here. Datamethods is provided by the Department of Biostatistics, Vanderbilt University School of Medicine.
I have written some short articles on the site, listed below.