RCT Analyses With Covariate Adjustment

This article summarizes arguments for the claim that the primary analysis of treatment effect in an RCT should be adjusted for baseline covariates. It reiterates some findings and statements from classic papers, with an illustration in the GUSTO-I trial.

Leiden University, NL


July 19, 2020

The PATH (Predictive Approaches to Treatment effect Heterogeneity) Statement outlines principles, criteria, and key considerations for applying predictive approaches to clinical trials to provide patient-centered evidence in support of decision making. The focus of PATH is on modeling of “heterogeneity of treatment effect” (HTE), which refers to the nonrandom variation in the magnitude of the absolute treatment effect (‘treatment benefit’) across individual patients. A more focused definition is that HTE refers to variation of treatment effect on a scale for which it is possible that no such variation exists, even if the treatment has a nonzero effect on the average.

The recent PATH statement lists a number of principles and guidelines. A key principle is in Fig 2:

“A risk-modeling approach to RCT analysis is likely to be most valuable when an overall treatment effect is well established; subgroup results (including risk-based subgroup results) from overall null trials should be interpreted cautiously.”

Here I discuss how we establish ‘overall treatment effect’. I reiterate some findings and statements from classic papers in favor of covariate adjustment as the key analysis.

Illustration in the GUSTO-I trial

For illustration, we analyze 30,510 patients with an acute myocardial infarction who were included in the GUSTO-I trial. The illustration starts in the same way as Frank Harrell's blog post on examining HTE.

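# Packages: Hmisc (upData, Cs, describe), rms (lrm), table1, DescTools (OddsRatio, BinomDiffCI),
# knitr (kable), kableExtra (kable_styling, %>%); assumes the data frame `gusto` is already loaded
library(Hmisc); library(rms); library(table1); library(DescTools); library(knitr); library(kableExtra)
# keep only SK and tPA arms; and selected set of covariates
gusto <- upData(gusto, subset=tx %in% c('SK', 'tPA'),
                keep=Cs(day30, tx, age, Killip, sysbp, pulse, pmi, miloc, sex))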

Input object size:  5241552 bytes;  29 variables  40830 observations
Modified variable   tx
Kept variables      day30,tx,age,Killip,sysbp,pulse,pmi,miloc,sex
New object size:    1349744 bytes;   9 variables  30510 observations

html(describe(gusto), scroll=FALSE)
gusto Descriptives

9 Variables   30510 Observations


sex: Sex
 Value        male female
 Frequency   22795   7715
 Proportion  0.747  0.253 

Killip: Killip Class
 Value          I    II   III    IV
 Frequency  26007  3857   417   229
 Proportion 0.852 0.126 0.014 0.008 

age
lowest : 19.027 20.781 20.969 20.984 21.449 , highest: 91.938 92.328 96.547 108 110

pulse: Heart Rate beats/min
     n  missing  distinct   Info   Mean    Gmd  .05  .10  .25  .50  .75  .90  .95
 30510        0       157  0.999  75.38   19.5   50   55   62   73   86   98  107
lowest : 0 1 6 9 20 , highest: 191 200 205 210 220

sysbp: Systolic Blood Pressure mmHg
     n  missing  distinct   Info   Mean    Gmd    .05    .10    .25    .50    .75    .90    .95
 30510        0       196  0.999    129  26.58   92.0  100.0  112.0  129.5  144.0  160.0  170.0
lowest : 0 36 40 43 46 , highest: 266 274 275 276 280

miloc: MI Location
 Value      Inferior    Other Anterior
 Frequency     17582     1062    11866
 Proportion    0.576    0.035    0.389 

pmi: Previous MI
 Value         no   yes
 Frequency  25452  5058
 Proportion 0.834 0.166 

tx
 Value         SK   tPA
 Frequency  20162 10348
 Proportion 0.661 0.339 

Overall treatment effect

The simplest analysis of treatment effect is an intention-to-treat analysis of the randomized patients for the primary outcome (30-day mortality). In GUSTO-I, 10,348 patients randomized to tPA and 20,162 randomized to SK had a known 30-day mortality status. The 30-day mortality was 653/10,348 = 6.3% with tPA vs 1475/20,162 = 7.3% with SK: an absolute difference of 1.0 percentage point, or an odds ratio of 0.85 [95% CI 0.78-0.94].

# simple cross-table
table1(~ as.factor(day30) | tx, data=gusto, digits=2)  
                   SK              tPA             Overall
                   (N=20162)       (N=10348)       (N=30510)
as.factor(day30)
  0                18687 (92.7%)   9695 (93.7%)    28382 (93.0%)
  1                1475 (7.3%)     653 (6.3%)      2128 (7.0%)
tab2 <- table(gusto$day30, gusto$tx)
result <- OddsRatio(tab2, conf.level = 0.95)
names(result) <- c("Odds Ratio", "Lower CI", "Upper CI")

kable(as.data.frame(t(result))) %>% kable_styling(full_width=F, position = "left")
Odds Ratio   Lower CI   Upper CI
     0.853      0.776      0.939
# BinomDiffCI(x1 = events1, n1 = n1, x2 = events2, n2 = n2, ...)
CI      <- BinomDiffCI(x1 = tab2[2,1], n1 = sum(tab2[,1]), x2 = tab2[2,2], n2 = sum(tab2[,2]),
                       method = "scorecc")
colnames(CI) <- c("Absolute difference", "Lower CI", "Upper CI")

result <- round(CI, 3) # absolute difference with confidence interval
kable(as.data.frame(result)) %>% kable_styling(full_width=F, position = "left")
Absolute difference   Lower CI   Upper CI
               0.01      0.004      0.016
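
As a cross-check that needs no add-on packages, the odds ratio and the absolute difference reported above can be recomputed from the four cell counts in base R. This is only a minimal sketch: the variable names are illustrative, and the Wald interval on the log-odds scale is an assumption made here, not necessarily the interval method that `OddsRatio()` applies.

```r
# Cell counts taken from the cross-table above (30-day mortality by treatment arm)
deaths <- c(SK = 1475, tPA = 653)
n      <- c(SK = 20162, tPA = 10348)
alive  <- n - deaths

# Odds ratio of tPA versus SK, with a Wald interval on the log-odds scale (assumed method)
log_or    <- log((deaths[["tPA"]] / alive[["tPA"]]) / (deaths[["SK"]] / alive[["SK"]]))
se_log_or <- sqrt(sum(1 / deaths) + sum(1 / alive))
or_ci     <- exp(log_or + c(-1, 0, 1) * qnorm(0.975) * se_log_or)
names(or_ci) <- c("Lower CI", "Odds Ratio", "Upper CI")

# Absolute risk difference (SK minus tPA)
risk      <- deaths / n
risk_diff <- risk[["SK"]] - risk[["tPA"]]

print(round(or_ci, 3))
print(round(risk_diff, 3))
```

The log odds ratio and its standard error computed this way also correspond to the coefficient and standard error of a logistic regression with treatment as the only covariate.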

Adjustment for baseline covariates

The unadjusted odds ratio of 0.853 is a marginal estimate, while a lot can be said in favor of conditional estimates, where we adjust for prognostically important baseline characteristics.

Three compelling arguments can be made in favor of conditioning on baseline covariates when we consider binary outcomes:

  1. Interpretation
  2. Statistical power
  3. Correction for baseline imbalance

Support from literature

Let’s look at some supportive points from references on these arguments.

  1. Interpretation: Hauck et al, 1998, provide strong support.

Abstract: “The analyses of the primary objectives of randomized clinical trials often are not adjusted for covariates, except possibly for stratification variables. For analyses with linear models, adjustment is a precision issue only. … For nonlinear analyses, omitting covariates from the analysis of randomized trials leads to a loss of efficiency as well as a change in the treatment effect being estimated. We recommend that the primary analyses adjust for important prognostic covariates in order to come as close as possible to the clinically most relevant subject-specific measure of treatment effect. Additional benefits would be an increase in efficiency of tests for no treatment effect and improved external validity.”
Controlled Clin Trials 1998;19:249–256.

  • So, these authors emphasize argument 1 (“to come as close as possible to the clinically most relevant subject-specific measure of treatment effect”) and argument 2 (“increase in efficiency of tests for no treatment effect”), while also recognizing a remarkable issue in nonlinear models (“a change in the treatment effect being estimated”). This change is different from linear models, where the adjusted and unadjusted effects are equal in expectation. In nonlinear models such as the logistic regression model, effect estimates are non-collapsible.

  2. Statistical power: Robinson & Jewell, 1991, provide a fascinating paper on the impact of covariate adjustment in nonlinear models such as the logistic regression model. The precision of the estimated treatment effect is worse with adjustment than without (the standard error increases), while conditioning moves the expected effect estimate further from the null. Which impact is stronger? They show that efficiency is expected to increase (provided that the covariate is prognostic for the outcome).

  3. Correction for baseline imbalance: In RCTs, imbalance will arise by pure chance. It may hamper the interpretation of a treatment effect in a specific RCT if one group has a better prognosis according to baseline characteristics than the other.
    Of course, we can only adjust for observed baseline characteristics. We argued in a 2000 AHJ paper that potential imbalances on other, unobserved patient characteristics do not invalidate attempts to correct for observed covariates.
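
Non-collapsibility can be made concrete with a small numeric sketch in base R. The two strata, their baseline risks, and the conditional odds ratio of 0.5 are hypothetical values chosen for illustration; they are not GUSTO-I estimates. Treatment has exactly the same odds ratio within each stratum and the strata are perfectly balanced between arms, yet the marginal odds ratio ends up closer to 1.

```r
# Hypothetical example: two equally sized strata (low and high risk),
# perfectly balanced between arms, so there is no imbalance to correct for
p_control      <- c(low = 0.10, high = 0.50)  # assumed control-arm risks per stratum
or_conditional <- 0.5                         # assumed odds ratio within each stratum

# Treated-arm risk per stratum: apply the odds ratio on the logit scale
p_treated <- plogis(qlogis(p_control) + log(or_conditional))

# Marginal odds ratio: compare the risks averaged over the two strata
odds <- function(p) p / (1 - p)
or_marginal <- odds(mean(p_treated)) / odds(mean(p_control))

print(round(c(conditional = or_conditional, marginal = or_marginal), 3))
```

In this constructed example the marginal odds ratio is about 0.56 although the odds ratio is 0.5 in both strata, so omitting a prognostic covariate moves the odds ratio toward 1 even without any imbalance.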

Practice in medical research

The statistical model for covariate adjustment can be simple or more complex. In various reviews researchers have noted that typically 5 to 10 baseline covariates are considered.
Poor practice was noted for papers published in 1997. Assmann, Pocock, et al (Lancet 2000) note:

FINDINGS: Most trials presented baseline comparability in a table. These tables were often unduly large, and about half the trials inappropriately used significance tests for baseline comparison. Methods of randomisation, including possible stratification, were often poorly described. There was little consistency over whether to use covariate adjustment and the criteria for selecting baseline factors for which to adjust were often unclear. Most trials emphasised the simple unadjusted results and covariate adjustment usually made negligible difference. Two-thirds of the reports presented subgroup findings, but mostly without appropriate statistical tests for interaction. Many reports put too much emphasis on subgroup analyses that commonly lacked statistical power.

INTERPRETATION: Clinical trials need a predefined statistical analysis plan for uses of baseline data, especially covariate-adjusted analyses and subgroup analyses. Investigators and journals need to adopt improved standards of statistical reporting, and exercise caution when drawing conclusions from subgroup findings.

More recent papers show that covariate-adjusted analyses are far more common:

Trials published in 2014 … reported adjusted analyses in 87% with pre-specified adjustment in analyses in 95% …

Importantly, EMA guidance is available on how to do such analyses:

6.2. Number of covariates in the analysis
No more than a few covariates should be included in the primary analysis. Even though methods of adjustment, such as analysis of covariance, can theoretically adjust for a large number of covariates it is safer to pre-specify a simple model.

Illustration in GUSTO-I: adjust for age

A simple illustration is to examine the impact of age (which is a strong prognostic factor in many diseases) for adjustment of the primary treatment effect in GUSTO-I.

# Analyses
f0 <- lrm(day30 ~ tx, data=gusto)
print(f0) # coef tpa: -0.1586

Logistic Regression Model

lrm(formula = day30 ~ tx, data = gusto)

                         Model Likelihood      Discrimination          Rank Discrim.
                           Ratio Test             Indexes                Indexes
Obs              30510   LR χ²       10.82    R²              0.001    C     0.517
 0               28382   d.f.            1    R²(1,30510)     0.000    Dxy   0.035
 1                2128   Pr(>χ²)    0.0010    R²(1,5938.7)    0.002    γ     0.079
max |∂log L/∂β| 3×10⁻⁸                        Brier           0.065    τa    0.005

             β         S.E.     Wald Z   Pr(>|Z|)
Intercept   -2.5392    0.0270   -93.88   <0.0001
tx=tPA      -0.1586    0.0486    -3.26    0.0011

So, we note that the unadjusted regression coefficient for tPA was -0.159.

Let’s continue with age adjustment for the tPA effect.
1. How different was the mean age between randomized groups?
2. How much of the difference between the adjusted and unadjusted effect estimates can be attributed to this imbalance?

# Examine impact of age
table1(~ age | tx, data=gusto, digits=4) # age 61.03 in tpa vs 60.86 in SK group, delta: 0.17 years
                    SK                    tPA                   Overall
                    (N=20162)             (N=10348)             (N=30510)
age
  Mean (SD)         60.86 (11.87)         61.03 (11.97)         60.91 (11.90)
  Median [Min, Max] 61.58 [19.03, 110.0]  61.57 [20.78, 108.0]  61.58 [19.03, 110.0]
f.age <- lrm(day30 ~ tx + age, data=gusto)
print(f.age) # coef tPA: -0.1878

Logistic Regression Model

lrm(formula = day30 ~ tx + age, data = gusto)

                         Model Likelihood      Discrimination          Rank Discrim.
                           Ratio Test             Indexes                Indexes
Obs              30510   LR χ²     1506.21    R²              0.121    C     0.741
 0               28382   d.f.            2    R²(2,30510)     0.048    Dxy   0.482
 1                2128   Pr(>χ²)   <0.0001    R²(2,5938.7)    0.224    γ     0.483
max |∂log L/∂β| 2×10⁻⁷                        Brier           0.061    τa    0.063

             β         S.E.     Wald Z   Pr(>|Z|)
Intercept   -7.9008    0.1634   -48.34   <0.0001
tx=tPA      -0.1878    0.0500    -3.75    0.0002
age          0.0821    0.0023    35.23   <0.0001

So, we note that the adjusted regression coefficient for tPA was -0.188.
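
To compare the two models on the odds ratio scale, the coefficients and standard errors printed above can be exponentiated. The following is a minimal sketch that uses the rounded values from the two `lrm` summaries; the helper `or_wald()` is a hypothetical name introduced here and is not part of `rms`.

```r
# Odds ratio with a Wald interval from a log-odds coefficient and its standard error
or_wald <- function(coef, se, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  round(exp(c(OR = coef, lower = coef - z * se, upper = coef + z * se)), 3)
}

# Coefficients and standard errors as printed in the two model summaries
or_unadjusted <- or_wald(-0.1586, 0.0486)
or_adjusted   <- or_wald(-0.1878, 0.0500)

print(rbind(unadjusted = or_unadjusted, "age-adjusted" = or_adjusted))
```

The age-adjusted odds ratio is about 0.83 versus 0.85 unadjusted: further from 1, with a slightly larger standard error on the log-odds scale.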

# Difference in tx effect by adjustment
d.tx.age <- f.age$coefficients[2] - f0$coefficients[2] 
# Impact of age difference on tx effect
d.age <- with(gusto, mean(age[tx=='SK']) - mean(age[tx=='tPA']))
d.tx.ageimpact <- f.age$coefficients[3] * d.age

# Decomposition of the change in the tx coefficient after adjustment for age
# d.tx.age                  # -0.0291: stronger effect after adjustment
# d.tx.ageimpact            # -0.0138: prognostic difference, i.e. age difference between randomized groups
# d.tx.age - d.tx.ageimpact # -0.0154: attributable to conditioning on age (stratification effect, non-collapsibility)

f.unadjusted <- c("coef"=as.vector(f0$coefficients[2]) , SE=sqrt(f0$var[2,2]), d.coef=NA, d.SE=NA, 
                  "Imbalance (%)"=NA, "Stratification (%)"=NA)
f.age.adj <- c("coef"=as.vector(f.age$coefficients[2]) , SE=sqrt(f.age$var[2,2]), 
               d.coef=f.age$coefficients[2] / f0$coefficients[2] - 1, 
               d.SE=sqrt(f.age$var[2,2]) / sqrt(f0$var[2,2]) - 1 , 
               "Imbalance (%)"= d.tx.ageimpact / f0$coefficients[2], 
               "Stratification (%)"=(d.tx.age - d.tx.ageimpact)/ f0$coefficients[2] )
kable(as.data.frame(rbind(f.unadjusted, f.age.adj)), digits=3) %>% 
  kable_styling(full_width=F, position = "left")
                coef      SE   d.coef    d.SE   Imbalance (%)   Stratification (%)
f.unadjusted  -0.159   0.049       NA      NA              NA                   NA
f.age.adj     -0.188   0.050    0.184   0.028           0.087                0.097
# As Table III in Steyerberg 2000 paper

Summary of impact of adjustment on coefficient and SE

Coefficient behavior: the unadjusted coefficient was -0.159; adjusted for age it is -0.188. This is a difference of -0.029, or an 18% larger estimate of the treatment effect (Steyerberg, Bossuyt, Lee; AHJ 2000).
Part of this change is attributable to a difference in age at baseline: the tPA group was slightly disadvantaged by a higher mean age (61.03 years) compared to the SK group (60.86 years). The difference of -0.168 years accounts for a change of -0.014 in the treatment effect estimate:

d.age (in years) x f.age$coef[3] =
-0.168 x 0.082 = -0.014.

The remaining difference is:

delta coefficient - delta attributable to age imbalance =
d.tx.age - d.tx.ageimpact =
-0.029 - (-0.014) = -0.015.

So, the 18% more extreme effect estimate can be split into 8.7% attributable to imbalance and 9.7% attributable to using a conditional rather than an unconditional model: stratification, or non-collapsibility (see also Gail et al, 1984).


The GUSTO-I trial serves well to illustrate the impact of conditioning on baseline covariates when we consider binary outcomes. First, the age-adjusted estimate of the overall treatment effect has a different interpretation than the unadjusted estimate: the effect for ‘a patient with an acute MI of a certain age’ rather than for ‘patients with acute MI’. Second, the statistical power for testing the adjusted effect is higher than for the unadjusted effect. The required sample size is reduced to a fraction of approximately \(1 - R^2\) of the unadjusted requirement. For the model with age, the Nagelkerke \(R^2\) was 12%. This implies that an analysis of the age-adjusted treatment effect with 88% of the sample size would have the same power as an unadjusted analysis in 100% of the sample. Finally, the age-adjusted treatment effect estimate is corrected for baseline imbalance.
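
Taking the \(1 - R^2\) approximation above at face value, the sample size statement can be written out as a short worked calculation. The symbols \(n_{\text{adjusted}}\) and \(n_{\text{unadjusted}}\) are notation introduced here for illustration, and the rounded \(R^2\) of 0.12 is used:

\[
n_{\text{adjusted}} \approx (1 - R^2)\, n_{\text{unadjusted}} = (1 - 0.12) \times 30{,}510 \approx 26{,}849
\]

So, under this approximation, roughly 26,800 patients analyzed with age adjustment would give about the same power as the 30,510 patients analyzed without adjustment.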

Implications for estimating heterogeneity of treatment effect

The unadjusted and adjusted estimates of the overall treatment effect discussed above are effect estimates on a relative scale: odds ratios on the odds scale. The translation from relative to absolute scale can be made by explicitly considering the baseline risk. If the baseline risk is low, the treatment benefit can only be small; if the baseline risk is high, the treatment benefit can be large.
The recent Predictive Approaches to Treatment Effect Heterogeneity (PATH) Statement provides guidance on predictive approaches to heterogeneous treatment effects. The practical implementation of principles from the PATH statement is discussed in another blog.
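
The translation from a relative to an absolute scale described above can be sketched in a few lines of base R. The function name `abs_benefit()` and the three baseline risks are hypothetical choices for illustration; the odds ratio is the age-adjusted estimate from the model above, and the sketch assumes that this odds ratio is constant across baseline risk.

```r
# Absolute risk reduction implied by a constant odds ratio at a given baseline risk
abs_benefit <- function(baseline_risk, odds_ratio) {
  treated_risk <- plogis(qlogis(baseline_risk) + log(odds_ratio))
  baseline_risk - treated_risk
}

or_tpa        <- exp(-0.188)          # age-adjusted odds ratio for tPA vs SK
baseline_risk <- c(0.02, 0.07, 0.30)  # hypothetical 30-day mortality risks under SK

benefit <- abs_benefit(baseline_risk, or_tpa)
print(data.frame(baseline_risk,
                 treated_risk = round(baseline_risk - benefit, 4),
                 benefit      = round(benefit, 4)))
```

With the same odds ratio, the absolute benefit is well below 1 percentage point at a baseline risk of 2% and several percentage points at a baseline risk of 30%.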



MH Gail, S Wieand, S Piantadosi - Biometrika, 1984
Biased estimates of treatment effect in randomized experiments with nonlinear regressions and omitted covariates

SJ Senn - Statistics in medicine, 1989
Covariate imbalance and random allocation in clinical trials

LD Robinson, NP Jewell - International Statistical Review, 1991
Some surprising results about covariate adjustment in logistic regression models

WW Hauck, S Anderson, SM Marcus - Controlled clinical trials, 1998
Should we adjust for covariates in nonlinear regression analyses of randomized trials?

SJ Pocock, SE Assmann, LE Enos… - Statistics in Medicine, 2002
Subgroup analysis, covariate adjustment and baseline comparisons in clinical trial reporting: current practice and problems

Illustrations focused on neurotrauma RCTs

AV Hernández, EW Steyerberg, GS Taylor… - Neurosurgery, 2005
Subgroup analysis and covariate adjustment in randomized clinical trials of traumatic brain injury: a systematic review

AV Hernández, EW Steyerberg, I Butcher… - Journal of Neurotrauma, 2006
Adjustment for strong predictors of outcome in traumatic brain injury trials: 25% reduction in sample size requirements in the IMPACT study

P Perel,…, EW Steyerberg, CRASH Trial Collaborators - Journal of clinical epidemiology, 2012
Covariate adjustment increased power in randomized controlled trials: an example in traumatic brain injury

GUSTO-I references

EW Steyerberg, PMM Bossuyt, KL Lee - American heart journal, 2000
Clinical trials in acute myocardial infarction: should we adjust for baseline characteristics?

Gusto Investigators - New England Journal of Medicine, 1993
An international randomized trial comparing four thrombolytic strategies for acute myocardial infarction

Califf R, …, ML Simoons, EJ Topol, GUSTO-I Investigators - American heart journal, 1997
Selection of thrombolytic therapy for individual patients: development of a clinical model

Other references

EMA 2015: Guideline on adjustment for baseline covariates in clinical trials

BC Kahan, V Jairath, CJ Doré, TP Morris - Trials, 2014
The risks and rewards of covariate adjustment in randomized trials: an assessment of 12 outcomes from 8 studies

AV Hernández, MJC Eijkemans, EW Steyerberg - Annals of epidemiology, 2006
Randomized controlled trials with time-to-event outcomes: how much does prespecified covariate adjustment increase power?

AV Hernández, EW Steyerberg… - Journal of clinical epidemiology, 2004
Covariate adjustment in randomized controlled trials with dichotomous outcomes increases statistical power and reduces sample size requirements

DD Thompson, HF Lingsma, WN Whiteley, GD Murray, EW Steyerberg - Journal of clinical epidemiology, 2015
Covariate adjustment had similar benefits in small and large randomized controlled trials

PATH Statement references

The Predictive Approaches to Treatment effect Heterogeneity (PATH) Statement
David M. Kent, MD, MS; Jessica K. Paulus, ScD; David van Klaveren, PhD; Ralph D’Agostino, PhD; Steve Goodman, MD, MHS, PhD; Rodney Hayward, MD; John P.A. Ioannidis, MD, DSc; Bray Patrick-Lake, MFS; Sally Morton, PhD; Michael Pencina, PhD; Gowri Raman, MBBS, MS; Joseph S. Ross, MD, MHS; Harry P. Selker, MD, MSPH; Ravi Varadhan, PhD; Andrew Vickers, PhD; John B. Wong, MD; and Ewout W. Steyerberg, PhD
Ann Intern Med. 2020;172:35-45.

Annals of Internal Medicine, main text

Annals of Internal Medicine, Explanation and Elaboration

Editorial by Localio et al, 2020