Statistical Criticism is Easy; I Need to Remember That Real People are Involved

Criticism of medical journal articles is easy. I need to keep in mind that much good research is done even if there are some flaws in the design, analysis, or interpretation. I also need to remember that real people are involved.

Author: Frank Harrell
Affiliation: Vanderbilt University, School of Medicine, Department of Biostatistics
Published: November 5, 2017
Modified: November 15, 2020

I have been critical of a number of articles, authors, and journals in [this](../errmed) growing blog article. Linking the blog with Twitter is a way to expose the blog to more readers. It is far too easy to slip into hyperbole on the blog and even easier on Twitter with its space limitations. Importantly, many of the statistical problems pointed out in my article are very, very common, and I dwell on recent publications to get the point across that inadequate statistical review at medical journals remains a serious problem. Equally important, many of the issues I discuss, from p-values and null hypothesis testing to issues with change scores, are not well covered in medical education (of authors and referees), and p-values have caused a phenomenal amount of damage to the research enterprise. Still, journals insist on emphasizing p-values. I spend a lot of time educating biomedical researchers about statistical issues and serving as a reviewer for many medical journals, but I am still on a quest to impact journal editors.

Besides statistical issues, there are very real human issues, and challenges in keeping clinicians interested in academic clinical research when there are so many pitfalls, complexities, and compliance issues. In the many clinical trials with which I have been involved, I’ve always been glad to be the statistician and not the clinician responsible for protocol logistics, informed consent, recruiting, compliance, etc.

A recent case discussed [here](../errmed#pcisham) has brought the human issues home, after I came to know of the extraordinary efforts made by the [ORBITA](http://www.thelancet.com/journals/lancet/article/PIIS0140-6736(17)32714-9/fulltext) study’s first author, Rasha Al-Lamee, to make this study a reality. Placebo-controlled device trials are very difficult to conduct and to recruit patients into, and this was Rasha’s first effort to launch and conduct a randomized clinical trial. I very much admire Rasha’s bravery and perseverance in conducting this trial of PCI, when it is possible that many past trials of PCI vs. medical therapy were affected by placebo effects.

[Darrel Francis](https://www.imperial.ac.uk/people/d.francis), Professor of Cardiology at Imperial College London, a co-author on the above paper, and Rasha’s mentor, elegantly pointed out to me that there is a real person on the receiving end of my criticism, and I heartily agree with him that none of us would ever want to discourage a clinical researcher from ever conducting her second randomized trial. This is especially true when the investigator has a burning interest in tackling difficult unanswered clinical questions. I don’t mind criticizing statistical designs and analyses, but I can do a better job of respecting the sincere efforts and hard work of biomedical researchers.

I note in passing that I had the honor of being a co-author with Darrel on [this paper](http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0081699), of which I am extremely proud.

Dr Francis gave me permission to include his thoughts, which are below. After that I list some ideas for making the path to presenting clinical research findings a more pleasant journey.

As the PI for ORBITA, I apologise for this trial being 40 years late, due to a staffing issue. I had to wait for the lead investigator, Rasha Al-Lamee, to be born, go to school, study Medicine at Oxford University, train in interventional cardiology, and start as a consultant in my hospital, before she could begin the trial.

Rasha had just finished her fellowship. She had experience in clinical research, but this was her first leadership role in a trial. She was brave to choose for her PhD a rigorous placebo-controlled trial in this controversial but important area.

Funding was difficult: grant reviewers, presumably interventional cardiologists, said the trial was (a) unethical and (b) unnecessary. This trial only happened because Rasha was able to convince our colleagues that the question was important and the patients would not be without stenting for long. Recruitment was challenging because it required interventionists to not apply the oculostenotic reflex. In the end the key was Rasha keeping the message at the front of all our colleagues’ minds with her boundless energy and enthusiasm. Interestingly, when the concept was explained to patients, they agreed to participate more easily than we thought, and dropped out less frequently than we feared. This means we should indeed acquire placebo-controlled data on interventional procedures.

Incidentally, I advocate the term “placebo” over “sham” for these trials, for two reasons. First, placebo control is well recognised as essential for assessing drug efficacy, and this helps people understand the need for it with devices. Second, “sham” is a pejorative word, implying deception. There is no deception in a placebo controlled trial, only pre-agreed withholding of information.

There are several ways to improve the system that I believe would foster clinical research and make peer review more objective and productive.

- Have journals conduct reviews of background and methods without knowledge of results.
- Abandon journals and use researcher-led online systems that invite open post-“publication” peer review and give researchers the opportunities to improve their “paper” in an ongoing fashion.
- If not publishing the entire paper online, deposit the background and methods sections for open pre-journal submission review.
- Abandon null hypothesis testing and p-values. Before that, always keep in mind that a large p-value means nothing more than “we don’t yet have evidence against the null hypothesis”, and emphasize confidence limits.
- Embrace Bayesian methods that provide safer and more actionable evidence, including measures that quantify clinical significance. And if one is trying to amass evidence that the effects of two treatments are similar, compute the direct probability of similarity using a Bayesian model.
- Improve statistical education of researchers, referees, and journal editors, and strengthen statistical review for journals.
- Until everyone understands the most important statistical concepts, better educate researchers and peer reviewers on [statistical problems to avoid](https://discourse.datamethods.org/t/author-checklist).
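
To make the Bayesian item in the list above concrete, here is a minimal sketch of computing a direct probability of similarity. It is only an illustration under stated assumptions: the treatment difference is summarized by an estimate and standard error, the prior is normal (nearly flat by default), and the posterior is therefore normal. The function names, the similarity margin, and the numbers are all hypothetical and do not come from any trial discussed here.

```python
from statistics import NormalDist


def posterior_for_difference(estimate, std_error, prior_mean=0.0, prior_sd=1e6):
    """Normal-normal conjugate update for a treatment difference.

    Assumption: the estimate is approximately normal with known standard
    error, and the prior on the true difference is normal.
    """
    prior_precision = 1.0 / prior_sd**2
    data_precision = 1.0 / std_error**2
    post_precision = prior_precision + data_precision
    post_mean = (prior_precision * prior_mean + data_precision * estimate) / post_precision
    post_sd = post_precision**-0.5
    return NormalDist(post_mean, post_sd)


def prob_similar(posterior, margin):
    """Posterior probability that the true difference lies within +/- margin."""
    return posterior.cdf(margin) - posterior.cdf(-margin)


def prob_benefit(posterior, threshold=0.0):
    """Posterior probability that the true difference exceeds threshold."""
    return 1.0 - posterior.cdf(threshold)


# Hypothetical inputs: estimated difference 15 units, standard error 13,
# and a difference smaller than 20 units in absolute value counted as "similar".
posterior = posterior_for_difference(estimate=15.0, std_error=13.0)
p_similar = prob_similar(posterior, margin=20.0)
p_benefit = prob_benefit(posterior)

print(f"P(|difference| < 20 | data) = {p_similar:.3f}")
print(f"P(difference > 0 | data)    = {p_benefit:.3f}")
```

Unlike a large p-value, which only says that evidence against the null is lacking, these two numbers answer the clinical questions directly. In real use the margin and the prior would need to be justified in advance, and a full Bayesian model of the raw data would replace the normal approximation.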

On a final note, I regularly review clinical trial design papers for medical journals. I am often shocked at design flaws that authors state are “too late to fix” in their response to the reviews. This includes problems caused by improper endpoint variables that necessitated the randomization of triple the number of patients actually needed to establish efficacy. Such papers have often been through statistical review before the journal submission. This points out two challenges: (1) there is a lot of between-statistician variation that statisticians need to address, and (2) there are many fundamental statistical concepts that are not known to many statisticians (witness the widespread use of change scores and dichotomization of variables even when senior statisticians are among a paper’s authors).
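
As a rough illustration of how an endpoint choice can inflate the required sample size, the sketch below compares the approximate number of patients per arm needed when a continuous outcome is analyzed as is versus when it is dichotomized into "responder" and "non-responder". It assumes a normally distributed outcome with equal variance in both arms, uses standard normal-approximation sample size formulas, and picks a hypothetical effect size and hypothetical cutoffs; it does not describe any specific paper I have reviewed.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def n_per_group_continuous(effect_sd, alpha=0.05, power=0.80):
    """Approximate patients per arm for comparing two means.

    effect_sd is the true difference in means in standard deviation units.
    """
    z = STD_NORMAL.inv_cdf(1 - alpha / 2) + STD_NORMAL.inv_cdf(power)
    return 2 * z**2 / effect_sd**2


def n_per_group_dichotomized(effect_sd, cutoff_sd, alpha=0.05, power=0.80):
    """Approximate patients per arm when the same outcome is cut at
    cutoff_sd (SD units above the control mean) and analyzed as yes/no."""
    z = STD_NORMAL.inv_cdf(1 - alpha / 2) + STD_NORMAL.inv_cdf(power)
    p_control = 1 - STD_NORMAL.cdf(cutoff_sd)
    p_treated = 1 - STD_NORMAL.cdf(cutoff_sd - effect_sd)
    variance_sum = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    return z**2 * variance_sum / (p_treated - p_control) ** 2


# Hypothetical scenario: a true effect of 0.3 standard deviations.
effect = 0.3
n_continuous = n_per_group_continuous(effect)

# Inflation factors for a cut at the control median and for a cut in the tail.
ratio_median_cut = n_per_group_dichotomized(effect, cutoff_sd=0.0) / n_continuous
ratio_tail_cut = n_per_group_dichotomized(effect, cutoff_sd=1.5) / n_continuous

print(f"Continuous outcome: about {n_continuous:.0f} patients per arm")
print(f"Cut at control median: sample size inflated {ratio_median_cut:.2f}-fold")
print(f"Cut 1.5 SD above control mean: sample size inflated {ratio_tail_cut:.2f}-fold")
```

Under these assumptions even the most favorable cut, at the median, requires noticeably more patients than the continuous analysis, and the inflation grows as the cutoff moves toward the tail of the distribution.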

Reuse

CC BY 4.0
