e.w.steyerberg@lumc.nl
Twitter: ESteyerberg
The PATH (Predictive Approaches to Treatment effect Heterogeneity) Statement outlines principles, criteria, and key considerations for applying predictive approaches to clinical trials to provide patient-centered evidence in support of decision making. The focus of PATH is on modeling of “heterogeneity of treatment effect” (HTE), which refers to the nonrandom variation in the magnitude of the absolute treatment effect (‘treatment benefit’) across individual patients. A more focused definition is that HTE refers to variation of treatment effect on a scale for which it is possible that no such variation exists, even if the treatment has a nonzero effect on the average.
The recent PATH statement lists a number of principles and guidelines. A key principle is in Fig 2:
“A risk-modeling approach to RCT analysis is likely to be most valuable when an overall treatment effect is well established; subgroup results (including risk-based subgroup results) from overall null trials should be interpreted cautiously.”
Here I discuss how we establish ‘overall treatment effect’. I reiterate some findings and statements from classic papers in favor of covariate adjustment as the key analysis.
Illustration in the GUSTO-I trial
For illustration we analyze 30,510 patients with an acute myocardial infarction who were included in the GUSTO-I trial. The illustration starts in the same way as the blog post by Frank Harrell on examining HTE.
library(Hmisc)  # upData(), Cs(), describe()
load(url('http://hbiostat.org/data/gusto.rda'))
# keep only SK and tPA arms; and selected set of covariates
gusto <- upData(gusto, subset=tx %in% c('SK', 'tPA'),
                tx=droplevels(tx),
                keep=Cs(day30, tx, age, Killip, sysbp, pulse, pmi, miloc, sex))
Input object size: 5241552 bytes; 29 variables 40830 observations
Modified variable tx
Kept variables day30,tx,age,Killip,sysbp,pulse,pmi,miloc,sex
New object size: 1349744 bytes; 9 variables 30510 observations
html(describe(gusto), scroll=TRUE)
9 Variables 30510 Observations
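day30
| n | missing | distinct | Info | Sum | Mean | Gmd |
|---|---|---|---|---|---|---|
| 30510 | 0 | 2 | 0.195 | 2128 | 0.06975 | 0.1298 |

sex: Sex
| n | missing | distinct |
|---|---|---|
| 30510 | 0 | 2 |

| Value | male | female |
|---|---|---|
| Frequency | 22795 | 7715 |
| Proportion | 0.747 | 0.253 |

Killip: Killip Class
| n | missing | distinct |
|---|---|---|
| 30510 | 0 | 4 |

| Value | I | II | III | IV |
|---|---|---|---|---|
| Frequency | 26007 | 3857 | 417 | 229 |
| Proportion | 0.852 | 0.126 | 0.014 | 0.008 |

age
| n | missing | distinct | Info | Mean | Gmd | .05 | .10 | .25 | .50 | .75 | .90 | .95 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 30510 | 0 | 5342 | 1 | 60.91 | 13.58 | 40.92 | 44.73 | 52.11 | 61.58 | 69.84 | 76.19 | 79.42 |

pulse: Heart Rate beats/min
| n | missing | distinct | Info | Mean | Gmd | .05 | .10 | .25 | .50 | .75 | .90 | .95 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 30510 | 0 | 157 | 0.999 | 75.38 | 19.5 | 50 | 55 | 62 | 73 | 86 | 98 | 107 |

sysbp: Systolic Blood Pressure mmHg
| n | missing | distinct | Info | Mean | Gmd | .05 | .10 | .25 | .50 | .75 | .90 | .95 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 30510 | 0 | 196 | 0.999 | 129 | 26.58 | 92.0 | 100.0 | 112.0 | 129.5 | 144.0 | 160.0 | 170.0 |

miloc: MI Location
| n | missing | distinct |
|---|---|---|
| 30510 | 0 | 3 |

| Value | Inferior | Other | Anterior |
|---|---|---|---|
| Frequency | 17582 | 1062 | 11866 |
| Proportion | 0.576 | 0.035 | 0.389 |

pmi: Previous MI
| n | missing | distinct |
|---|---|---|
| 30510 | 0 | 2 |

| Value | no | yes |
|---|---|---|
| Frequency | 25452 | 5058 |
| Proportion | 0.834 | 0.166 |

tx
| n | missing | distinct |
|---|---|---|
| 30510 | 0 | 2 |

| Value | SK | tPA |
|---|---|---|
| Frequency | 20162 | 10348 |
| Proportion | 0.661 | 0.339 |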
Overall treatment effect
The simplest analysis of treatment effect is an intention-to-treat analysis of the randomized patients for the primary outcome (30-day mortality). In GUSTO-I, 10,348 patients randomized to tPA and 20,162 patients randomized to SK had known 30-day mortality status. The 30-day mortality was 653/10,348 = 6.3% vs 1475/20,162 = 7.3%; an absolute difference of 1.0%, or an odds ratio of 0.85 [0.78-0.94].
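These figures can be checked from the four cell counts alone. The sketch below uses base R only; the Wald interval on the log odds scale is an assumption about the interval type, chosen here because it is the simplest option, and the variable names are illustrative.

```r
# Cell counts: deaths and total patients per arm
deaths <- c(SK = 1475, tPA = 653)
n      <- c(SK = 20162, tPA = 10348)
alive  <- n - deaths

risk      <- deaths / n                        # 30-day mortality per arm
risk_diff <- unname(risk["SK"] - risk["tPA"])  # absolute risk reduction with tPA

# Odds ratio of tPA versus SK, with a Wald interval on the log odds scale
log_or <- unname(log((deaths["tPA"] / alive["tPA"]) / (deaths["SK"] / alive["SK"])))
se     <- sqrt(sum(1 / deaths) + sum(1 / alive))
or_ci  <- exp(log_or + c(-1, 0, 1) * qnorm(0.975) * se)
names(or_ci) <- c("lower", "estimate", "upper")

round(risk, 3)
round(risk_diff, 3)
round(or_ci, 3)
```

The log odds ratio computed here is the same quantity that a logistic regression with treatment as the only covariate estimates as its treatment coefficient.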
# simple crosstable
library(table1)
table1(~ as.factor(day30) | tx, data=gusto, digits=2)
| | SK (N=20162) | tPA (N=10348) | Overall (N=30510) |
|---|---|---|---|
| as.factor(day30) | | | |
| 0 | 18687 (92.7%) | 9695 (93.7%) | 28382 (93.0%) |
| 1 | 1475 (7.3%) | 653 (6.3%) | 2128 (7.0%) |
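library(DescTools)   # OddsRatio(), BinomDiffCI()
library(knitr)       # kable()
library(kableExtra)  # kable_styling(), %>%
tab2 <- table(gusto$day30, gusto$tx)
result <- OddsRatio(tab2, conf.level = 0.95)
names(result) <- c("Odds Ratio", "Lower CI", "Upper CI")
kable(as.data.frame(t(result))) %>% kable_styling(full_width=F, position = "left")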
| Odds Ratio | Lower CI | Upper CI |
|---|---|---|
| 0.853 | 0.776 | 0.939 |
# BinomDiffCI(x1 = events1, n1 = n1, x2 = events2, n2 = n2, ...)
# x1, n1: deaths and patients in the SK arm; x2, n2: the same for the tPA arm
CI <- BinomDiffCI(x1 = tab2[2,1], n1 = sum(tab2[,1]), x2 = tab2[2,2], n2 = sum(tab2[,2]),
                  method = "scorecc")
colnames(CI) <- c("Absolute difference", "Lower CI", "Upper CI")
result <- round(CI, 3) # absolute difference with confidence interval
kable(as.data.frame(result)) %>% kable_styling(full_width=F, position = "left")
| Absolute difference | Lower CI | Upper CI |
|---|---|---|
| 0.01 | 0.004 | 0.016 |
Adjustment for baseline covariates
The unadjusted odds ratio of 0.853 is a marginal estimate, while a lot can be said in favor of conditional estimates, where we adjust for prognostically important baseline characteristics.
There may be 3 compelling arguments in favor of conditioning on baseline covariates when we consider binary outcomes.
1. Interpretation
2. Statistical power
3. Correction for baseline imbalance
Support from literature
Let’s look at some supportive points from references on these arguments.
1. Interpretation: Hauck et al, 1998, provide strong support.
Abstract: “The analyses of the primary objectives of randomized clinical trials often are not adjusted for covariates, except possibly for stratification variables. For analyses with linear models, adjustment is a precision issue only. … For nonlinear analyses, omitting covariates from the analysis of randomized trials leads to a loss of efficiency as well as a change in the treatment effect being estimated. We recommend that the primary analyses adjust for important prognostic covariates in order to come as close as possible to the clinically most relevant subject-specific measure of treatment effect. Additional benefits would be an increase in efficiency of tests for no treatment effect and improved external validity.”
Controlled Clin Trials 1998;19:249–256.
So, these authors emphasize argument 1 (“to come as close as possible to the clinically most relevant subject-specific measure of treatment effect”), and argument 2 (“increase in efficiency of tests for no treatment effect”); while also recognizing a remarkable issue in nonlinear models (“a change in the treatment effect being estimated”). This differs from linear models, where the adjusted and unadjusted effects are equal in expectation. In nonlinear models such as the logistic regression model, effect estimates are non-collapsible: the conditional (adjusted) odds ratio is expected to be further from 1 than the marginal (unadjusted) odds ratio, even when the covariate is perfectly balanced between the arms.
2. Statistical power: Robinson & Jewell, 1991, provide a fascinating paper on the impact of covariate adjustment in nonlinear models, such as the logistic regression model: the precision of the estimated treatment effect is worse than without adjustment, while conditioning moves the expected effect further from the null. Which impact is stronger? They show that efficiency is expected to increase (provided that the covariate is prognostic for the outcome).
3. Correction for baseline imbalance: In RCTs, imbalance will arise by pure chance. It may hamper the interpretation of a treatment effect in a specific RCT if one group has a better prognosis according to baseline characteristics than another.
Of course, we can only adjust for observed baseline characteristics. We argued in a 2000 AHJ paper that potential imbalances on other, unobserved patient characteristics do not invalidate attempts to correct for observed covariates.
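The non-collapsibility mentioned under the first argument can be made concrete with a small calculation. The sketch below assumes two equally sized strata of a prognostic covariate with perfect balance between arms; the intercepts and the conditional log odds ratio are hypothetical values chosen only for illustration.

```r
# Two equally sized strata of a prognostic covariate (e.g. low vs high risk);
# intercepts and the log odds ratio are hypothetical illustration values
intercept   <- c(low = -3, high = 0)  # log odds of the outcome under control
cond_log_or <- -0.5                   # same conditional treatment effect in both strata

risk_control <- plogis(intercept)
risk_treated <- plogis(intercept + cond_log_or)

# Marginal (unadjusted) effect: average the risks over strata, then take the log odds ratio
marg_log_or <- qlogis(mean(risk_treated)) - qlogis(mean(risk_control))

round(c(conditional = cond_log_or, marginal = marg_log_or), 3)
```

Although the treatment effect is -0.5 in each stratum and there is no imbalance at all, the marginal log odds ratio is roughly -0.39, that is, closer to the null. Omitting a prognostic covariate therefore changes the quantity being estimated, not just its precision.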
Practice in medical research
The statistical model for covariate adjustment can be simple or more complex. In various reviews researchers have noted that typically 5 to 10 baseline covariates are considered.
Poor practice was noted for trial reports published in 1997. Assmann, Pocock et al, Lancet 2000 note:
FINDINGS: Most trials presented baseline comparability in a table. These tables were often unduly large, and about half the trials inappropriately used significance tests for baseline comparison. Methods of randomisation, including possible stratification, were often poorly described. There was little consistency over whether to use covariate adjustment and the criteria for selecting baseline factors for which to adjust were often unclear. Most trials emphasised the simple unadjusted results and covariate adjustment usually made negligible difference. Two-thirds of the reports presented subgroup findings, but mostly without appropriate statistical tests for interaction. Many reports put too much emphasis on subgroup analyses that commonly lacked statistical power.
INTERPRETATION: Clinical trials need a predefined statistical analysis plan for uses of baseline data, especially covariate-adjusted analyses and subgroup analyses. Investigators and journals need to adopt improved standards of statistical reporting, and exercise caution when drawing conclusions from subgroup findings.
More recent papers show that covariate-adjusted analyses are far more common:
Trials published in 2014 … reported adjusted analyses in 87% with prespecified adjustment in analyses in 95% …
Importantly, EMA guidance is available on how to do such analyses:
6.2. Number of covariates in the analysis
No more than a few covariates should be included in the primary analysis. Even though methods of adjustment, such as analysis of covariance, can theoretically adjust for a large number of covariates it is safer to prespecify a simple model.
Illustration in GUSTO-I: adjust for age
A simple illustration is to examine the impact of age (which is a strong prognostic factor in many diseases) for adjustment of the primary treatment effect in GUSTO-I.
# Analyses
library(rms)  # lrm()
options(prType='html')
f0 <- lrm(day30 ~ tx, data=gusto)
print(f0) # coef tPA: -0.1586
Logistic Regression Model
lrm(formula = day30 ~ tx, data = gusto)

| | Model Likelihood Ratio Test | Discrimination Indexes | Rank Discrim. Indexes |
|---|---|---|---|
| Obs 30510 | LR χ^{2} 10.82 | R^{2} 0.001 | C 0.517 |
| 0 28382 | d.f. 1 | g 0.071 | D_{xy} 0.035 |
| 1 2128 | Pr(>χ^{2}) 0.0010 | g_{r} 1.074 | γ 0.079 |
| max \|∂log L/∂β\| 3×10^{-8} | | g_{p} 0.005 | τ_{a} 0.005 |
| | | Brier 0.065 | |

| | β | S.E. | Wald Z | Pr(>\|Z\|) |
|---|---|---|---|---|
| Intercept | -2.5392 | 0.0270 | -93.88 | <0.0001 |
| tx=tPA | -0.1586 | 0.0486 | -3.26 | 0.0011 |
Let’s continue with age adjustment for the tpa effect.
1. How different was the mean age between randomized groups?
2. How much of the difference between adjusted and unadjusted effect estimate can be attributed to this imbalance?
# Examine impact of age
table1(~ age | tx, data=gusto, digits=4) # age 61.03 in tPA vs 60.86 in SK group, delta: 0.17 years
| | SK (N=20162) | tPA (N=10348) | Overall (N=30510) |
|---|---|---|---|
| age | | | |
| Mean (SD) | 60.86 (11.87) | 61.03 (11.97) | 60.91 (11.90) |
| Median [Min, Max] | 61.58 [19.03, 110.0] | 61.57 [20.78, 108.0] | 61.58 [19.03, 110.0] |
f.age <- lrm(day30 ~ tx + age, data=gusto)
print(f.age)
Logistic Regression Model
lrm(formula = day30 ~ tx + age, data = gusto)

| | Model Likelihood Ratio Test | Discrimination Indexes | Rank Discrim. Indexes |
|---|---|---|---|
| Obs 30510 | LR χ^{2} 1506.21 | R^{2} 0.121 | C 0.741 |
| 0 28382 | d.f. 2 | g 1.118 | D_{xy} 0.482 |
| 1 2128 | Pr(>χ^{2}) <0.0001 | g_{r} 3.060 | γ 0.483 |
| max \|∂log L/∂β\| 2×10^{-7} | | g_{p} 0.062 | τ_{a} 0.063 |
| | | Brier 0.061 | |

| | β | S.E. | Wald Z | Pr(>\|Z\|) |
|---|---|---|---|---|
| Intercept | -7.9008 | 0.1634 | -48.34 | <0.0001 |
| tx=tPA | -0.1878 | 0.0500 | -3.75 | 0.0002 |
| age | 0.0821 | 0.0023 | 35.23 | <0.0001 |
# Difference in tx effect by adjustment
d.tx.age <- f.age$coefficients[2] - f0$coefficients[2]
# Impact of age difference on tx effect
d.age <- with(gusto, mean(age[tx=='SK']) - mean(age[tx=='tPA']))
d.tx.ageimpact <- f.age$coefficients[3] * d.age
# Impact of stratification on age
# d.tx.age # -0.0291 stronger effect
# d.tx.ageimpact # -0.0138 because of prognostic difference: age difference between randomized groups
# d.tx.age - d.tx.ageimpact # -0.0154 attributable to conditioning on age: stratification effect, noncollapsibility
f.unadjusted <- c("coef"=as.vector(f0$coefficients[2]), SE=sqrt(f0$var[2,2]), d.coef=NA, d.SE=NA,
                  "Imbalance (%)"=NA, "Stratification (%)"=NA)
f.age.adj <- c("coef"=as.vector(f.age$coefficients[2]), SE=sqrt(f.age$var[2,2]),
               d.coef=f.age$coefficients[2] / f0$coefficients[2] - 1,
               d.SE=sqrt(f.age$var[2,2]) / sqrt(f0$var[2,2]) - 1,
               "Imbalance (%)"= d.tx.ageimpact / f0$coefficients[2],
               "Stratification (%)"=(d.tx.age - d.tx.ageimpact) / f0$coefficients[2])
kable(as.data.frame(rbind(f.unadjusted, f.age.adj)), digits=3) %>%
  kable_styling(full_width=F, position = "left")
| | coef | SE | d.coef | d.SE | Imbalance (%) | Stratification (%) |
|---|---|---|---|---|---|---|
| f.unadjusted | -0.159 | 0.049 | NA | NA | NA | NA |
| f.age.adj | -0.188 | 0.050 | 0.184 | 0.028 | 0.087 | 0.097 |
# As Table III in Steyerberg 2000 paper
Summary of impact of adjustment on coefficient and SE
Coefficient behavior: the unadjusted coefficient was -0.159; adjusted for age it is -0.188. This is a difference of -0.029, or +18% in the estimate of the treatment effect (Steyerberg, Bossuyt, Lee; AHJ 2000).
Part of this change is attributable to a difference in age at baseline: the tPA group was slightly disadvantaged by a higher age (61.03 years) compared to the SK group (60.86 years). The difference of -0.168 years (SK minus tPA) accounts for a change of -0.014 in the treatment effect estimate:
d.age (in years) x f.age$coef[3] =
-0.168 x 0.082 = -0.014.
The remaining difference is:
delta coefficient - delta attributable to age imbalance =
d.tx.age - d.tx.ageimpact =
-0.029 - (-0.014) = -0.015.
So, of the 18% more extreme effect estimate, 8.7 percentage points can be attributed to imbalance, and 9.7 percentage points to using a conditional rather than an unconditional model: stratification, or non-collapsibility (see also Gail et al, 1984).
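The decomposition can be retraced from the rounded numbers in the model printouts, without refitting the models. The sketch below is only such a check on reported values; the variable names are illustrative and differ from those in the analysis code.

```r
# Coefficients as reported in the two model printouts (log odds ratios, tPA vs SK)
b_unadj <- -0.1586  # day30 ~ tx
b_adj   <- -0.1878  # day30 ~ tx + age
b_age   <-  0.0821  # age coefficient (per year) in the adjusted model
d_age   <- -0.168   # mean age in SK minus mean age in tPA, in years

total_change   <- b_adj - b_unadj               # change in tx coefficient by adjustment
imbalance_part <- b_age * d_age                 # part explained by the age difference
strat_part     <- total_change - imbalance_part # remainder: conditioning on age

# Express each part relative to the unadjusted coefficient
shares <- c(total = total_change, imbalance = imbalance_part,
            stratification = strat_part) / b_unadj
round(shares, 3)
```

The three shares come out at 0.184, 0.087 and 0.097, in line with the d.coef, Imbalance and Stratification columns of the table.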
Conclusions
GUSTO-I serves well to illustrate the impact of conditioning on baseline covariates when we consider binary outcomes. The age-adjusted estimate of the overall treatment effect has a different interpretation than the unadjusted estimate: the effect for ‘a patient with an acute MI of a certain age’ rather than for ‘patients with acute MI’. The statistical power for testing the adjusted effect is higher than that for the unadjusted effect. The required sample size is reduced by a factor of approximately (1 – R^2). For age, the Nagelkerke R^2 was 12%. This implies that an analysis of the age-adjusted treatment effect with 88% of the sample size would have the same power as an unadjusted analysis in 100% of the sample. Finally, the age-adjusted treatment effect is corrected for baseline imbalance.
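As a back-of-the-envelope version of the sample size statement, the approximate factor (1 – R^2) can be applied to the number of patients analyzed here. This is a rough sketch under that approximation, not a formal power calculation.

```r
n_total <- 30510  # patients analyzed in the SK and tPA arms
r2_age  <- 0.12   # Nagelkerke R^2 for age, as quoted in the text

n_equivalent <- n_total * (1 - r2_age)  # size of an adjusted analysis with similar power
n_saved      <- n_total - n_equivalent

round(c(n_equivalent = n_equivalent, n_saved = n_saved))
```

Under this approximation, an age-adjusted analysis of roughly 26,800 patients would match the power of the unadjusted analysis of all 30,510.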
Implications for estimating heterogeneity of treatment effect
The unadjusted and adjusted estimates of the overall treatment effect discussed above are effect estimates on a relative scale: odds ratios on the odds scale. The translation from relative to absolute scale can be made by explicitly considering the baseline risk. If the baseline risk is low, the treatment benefit can only be small; if the baseline risk is high, the treatment benefit can be large.
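The translation from the relative to the absolute scale can be sketched in a few lines. The example assumes a constant odds ratio across all patients and uses the age-adjusted coefficient from the model above; the baseline risks and the helper function `abs_benefit` are hypothetical and serve only as illustration.

```r
# Absolute risk reduction implied by a constant odds ratio at a given baseline risk
abs_benefit <- function(baseline_risk, log_or) {
  baseline_risk - plogis(qlogis(baseline_risk) + log_or)
}

log_or   <- -0.1878                                # age-adjusted tPA effect (log odds ratio)
baseline <- c(0.01, 0.02, 0.05, 0.10, 0.20, 0.30)  # hypothetical 30-day mortality risks under SK

benefit <- abs_benefit(baseline, log_or)
data.frame(baseline_risk = baseline, benefit = round(benefit, 4))
```

With the same relative effect, the absolute benefit grows from well under one percentage point at a baseline risk of 1% to several percentage points at a baseline risk of 30%.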
The recent Predictive Approaches to Treatment Effect Heterogeneity (PATH) Statement provides guidance on predictive approaches to heterogeneous treatment effects. The practical implementation of principles from the PATH statement is discussed in another blog.
References
Classics
MH Gail, S Wieand, S Piantadosi  Biometrika, 1984
Biased estimates of treatment effect in randomized experiments with nonlinear regressions and omitted covariates
SJ Senn  Statistics in medicine, 1989
Covariate imbalance and random allocation in clinical trials
LD Robinson, NP Jewell  International Statistical Review, 1991
Some surprising results about covariate adjustment in logistic regression models
WW Hauck, S Anderson, SM Marcus  Controlled clinical trials, 1998
Should we adjust for covariates in nonlinear regression analyses of randomized trials?
SJ Pocock, SE Assmann, LE Enos…  Statistics in Medicine, 2002
Subgroup analysis, covariate adjustment and baseline comparisons in clinical trial reporting: current practice and problems
Illustrations focused on neurotrauma RCTs
AV Hernández, EW Steyerberg, GS Taylor…  Neurosurgery, 2005
Subgroup analysis and covariate adjustment in randomized clinical trials of traumatic brain injury: a systematic review
AV Hernández, EW Steyerberg, I Butcher…  Journal of Neurotrauma, 2006
Adjustment for strong predictors of outcome in traumatic brain injury trials: 25% reduction in sample size requirements in the IMPACT study
P Perel,…, EW Steyerberg, CRASH Trial Collaborators  Journal of clinical epidemiology, 2012
Covariate adjustment increased power in randomized controlled trials: an example in traumatic brain injury
GUSTO-I references
EW Steyerberg, PMM Bossuyt, KL Lee  American heart journal, 2000
Clinical trials in acute myocardial infarction: should we adjust for baseline characteristics?
GUSTO Investigators  New England Journal of Medicine, 1993
An international randomized trial comparing four thrombolytic strategies for acute myocardial infarction
Califf R, …, ML Simoons, EJ Topol, GUSTO-I Investigators  American heart journal, 1997
Selection of thrombolytic therapy for individual patients: development of a clinical model
Other references
EMA 2015: Guideline on adjustment for baseline covariates in clinical trials
BC Kahan, V Jairath, CJ Doré, TP Morris  Trials, 2014
The risks and rewards of covariate adjustment in randomized trials: an assessment of 12 outcomes from 8 studies
AV Hernández, MJC Eijkemans, EW Steyerberg  Annals of epidemiology, 2006
Randomized controlled trials with time-to-event outcomes: how much does prespecified covariate adjustment increase power?
AV Hernández, EW Steyerberg…  Journal of clinical epidemiology, 2004
Covariate adjustment in randomized controlled trials with dichotomous outcomes increases statistical power and reduces sample size requirements
DD Thompson, HF Lingsma, WN Whiteley, GD Murray, EW Steyerberg  Journal of clinical epidemiology, 2015
Covariate adjustment had similar benefits in small and large randomized controlled trials
PATH Statement references
The Predictive Approaches to Treatment effect Heterogeneity
(PATH) Statement
David M. Kent, MD, MS; Jessica K. Paulus, ScD; David van Klaveren, PhD; Ralph D’Agostino, PhD;
Steve Goodman, MD, MHS, PhD; Rodney Hayward, MD; John P.A. Ioannidis, MD, DSc; Bray Patrick-Lake, MFS; Sally Morton, PhD;
Michael Pencina, PhD; Gowri Raman, MBBS, MS; Joseph S. Ross, MD, MHS; Harry P. Selker, MD, MSPH; Ravi Varadhan, PhD;
Andrew Vickers, PhD; John B. Wong, MD; and Ewout W. Steyerberg, PhD
Ann Intern Med. 2020;172:35-45.
Annals of Internal Medicine, main text