Background
As explained here, the power for a group comparison can be greatly increased over that provided by a binary endpoint, with greater increase when an ordinal endpoint has several wellpopulated categories or has a great many categories, in which it becomes a standard continuous variable. When a randomized clinical trial (RCT) is undertaken and deaths can occur, there are disadvantages to
 excluding the death and analyzing responses only on survivors
 using death as a competing risk, which makes for hardtointerpret results and doesn’t penalize efficacy for death
 using a complex estimand that involves counterfactuals or other complexities
By making death the worst level of an ordinal response , nothing is swept under the rug, and a treatment having more deaths is penalized for that. Evidence for treatment effectiveness may be driven by the nonfatal outcomes. Suppose for example that is renal function at 6 weeks, measured by serum creatinine, with death coded as a value higher than the highest observed creatinine (it doesn’t matter how high for ordinal analyses). Evidence for treatment effectiveness in improving may be stated as “the treatment improved renal function accounting for death”.
Often sponsors want evidence for a specific effect on mortality, even though they are unwilling to budget for a study large enough to provide evidence for a mortality benefit on its own. In that case, the only way to have Bayesian or frequentist power to detect a mortality improvement is to assume that some of the treatment benefit on nonfatal outcome components spills over to mortality. The partial proportional odds (PO) semiparametric ordinal logistic regression model by Peterson & Harrell, 1990 when coupled with a Bayesian implementation of the model provides a very formal way to borrow treatment effect information across levels of Y.
Suppose that for simplicity we ignore powerenhancing baseline covariates, and have an outcome variable where represents death. The PO model can be written as
where , (inverse logit), is the intercept corresponding to a cutoff of (), for treatment B and for treatment A, and is the B:A log odds ratio. Hence is the B:A odds ratio (OR). Under the PO assumption the possible B:A ORs for are the same for all . For example the treatment effect on death is , just like the treatment effect on the last three categories combined, for example if .
Peterson and Harrell proposed the partial PO model and the constrained partial PO model. Using the latter we allow for a special effect of treatment only on Y= and assume a constant OR for all other cutoffs, for example. This constrained partial PO model is
where is an indicator variable that is if , otherwise. represents a “special effect of treatment B” for . So the B:A odds ratio for is .
If the model is fitted using a frequentist maximum likelihood approach, or using a Bayesian procedure that puts a noninformative prior on , the precision of (or its antilog, the OR for mortality) will come from the effective sample size for a pure death outcome.
Example Partial Proportional Odds Analysis
Suppose that we have a parallelgroup twotreatment randomized trial, and are in the bizarre situation where there are no patient risk factors, i.e., patient outcomes are homogeneous within treatment so that no covariates are needed. The treatment is designed to keep hospitalized patients from requiring mechanical ventilation, and hopefully also to lower inhospital mortality. Suppose the outcomes are for alive and not on ventilator, alive and on ventilator, or dead, respectively. Suppose the following outcome frequencies and summary statistics arise.
Code
require(Hmisc)
require(rmsb)
require(data.table)
require(ggplot2)
options(prType='html',
datatable.print.topn=50, datatable.print.class=FALSE,
mc.cores=4, rmsb.backend='cmdstan')
cmdstanr::set_cmdstan_path(cmdstan.loc) # cmdstan.loc is in ~/.Rprofile
k < rbind(A = c(300, 70, 30),
B = c(335, 40, 25))
colnames(k) < c('0', '1', '2')
p < k / 400
cs < function(x) rev(cumsum(rev(x)))
cp < rbind(A=cs(p[1,]), B=cs(p[2,]))
or < function(p) {
ob < p[2] / (1  p[2])
oa < p[1] / (1  p[1])
ob / oa
}
OR < c(or(cp[, 2]), or(cp[, 3]))
names(OR) < c('Y>=1', 'Y=2')
rOR < OR[2] / OR[1]
printL(Frequencies=k, Proportions=p, 'Cumulative Proportions'=cp, # printL in Hmisc
'B:A Odds Ratios'=OR, 'Y=2 : Y>=1 Ratio of Odds Ratios'=rOR, dec=4)
Frequencies:
0 1 2
A 300 70 30
B 335 40 25
Proportions:
0 1 2
A 0.7500 0.175 0.0750
B 0.8375 0.100 0.0625
Cumulative Proportions:
0 1 2
A 1 0.2500 0.0750
B 1 0.1625 0.0625
B:A Odds Ratios:
Y>=1 Y=2
0.5821 0.8222
Y=2 : Y>=1 Ratio of Odds Ratios: 1.4125
Create a one row per patient dataset and fit a Bayesian constrained partial PO model using the rmsb
package blrm
function. Start by using noninformative priors for the overall B:A treatment effect and the extra treatment effect on death, .
Code
# Create one row per patient
w < expand.grid(tx=c('A', 'B'), y=0:2)
setDT(w)
d < w[, .(y=rep(y, k[tx, as.character(y)])), by=.(tx, y)]
# Define the form of nonPO: special effect for death
g < function(y) 1 * (y == 2)
# Second formula given to blrm specifies predictors allowed
# to operate in nonPO
set.seed(11)
f < blrm(y ~ tx, ~ tx, cppo=g, data=d)
Running MCMC with 4 parallel chains...
Chain 1 finished in 1.7 seconds.
Chain 2 finished in 1.7 seconds.
Chain 3 finished in 1.7 seconds.
Chain 4 finished in 1.7 seconds.
All 4 chains finished successfully.
Mean chain execution time: 1.7 seconds.
Total execution time: 1.8 seconds.
Code
Bayesian Constrained Partial Proportional Odds Ordinal Logistic Model
Dirichlet Priors With Concentration Parameter 0.541 for Intercepts
blrm(formula = y ~ tx, ppo = ~tx, cppo = g, data = d)
Obs 800 
LOO log L 510.82±23.29 
g 0.278 [0.108, 0.455] 
C 0.62 [0.62, 0.62] 
0 635 
LOO IC 1021.64±46.57 
g_{p} 0.045 [0.016, 0.071] 
D_{xy} 0.24 [0.24, 0.24] 
1 110 
Effective p 3.97±0.27 
EV 0.013 [0, 0.027] 

2 55 
B 0.162 [0.162, 0.163] 
v 0.085 [0.001, 0.172] 

Draws 4000 

vp 0.002 [0, 0.005] 

Chains 4 



Time 2.6s 



p 1 



y≥1 
1.0986 
1.0953 
1.0934 
0.1130 
1.3152 
0.8720 
0.0000 
0.93 
y≥2 
2.5123 
2.5209 
2.5153 
0.1863 
2.9037 
2.1708 
0.0000 
0.93 
tx=B 
0.5412 
0.5453 
0.5449 
0.1773 
0.8935 
0.1968 
0.0005 
1.03 
tx=B x f(y) 
0.3454 
0.3362 
0.3375 
0.2355 
0.1232 
0.7955 
0.9232 
0.97 
From the output there is a very low probability that , i.e., a very high probability that , i.e., that treatment B is beneficial.
Compute posterior mode odds ratios from this fit and compare to the observed ORs.
Code
h < function(f) {
k < coef(f, stat='mode')
beta < k['tx=B']
tau < k['tx=B x f(y)']
or < c(exp(beta), exp(beta + tau), exp(tau))
names(or) = c('B:A overall', 'B:A Y=2', 'Y=2:Y>=1')
printL('Posterior mode odds ratios'=or)
}
h(f)
Posterior mode odds ratios:
B:A overall B:A Y=2 Y=2:Y>=1
0.5820643 0.8221778 1.4125205
The Bayesian estimates are virtually the same as the empirical estimates.
Now add informative priors. For the overall treatment effect the prior is chosen so that there is only a 0.025 chance of the OR exceeding 4 and a 0.025 chance of being below . For the departure from PO, represents the log of the ratio of the OR for death to the OR for the general effect of treatment denoted by . The prior of is chosen so that there is a 0.9 chance that is in the interval . So we are assuming it is unlikely that the treatment effect on death differs from the general effect by more than a factor of 2 in either direction.
For the blrm
function priors are stated in terms of contrasts so that convenient user data units can be used.
Code
. < list # convenient abbreviation
# Contrast for beta
sigma < log(4) / qnorm(0.975)
pcon < .(sd=sigma, c1=.(tx='B'), c2=.(tx='A'),
contrast=expression(c1  c2))
npcon < pcon
npcon$sd < log(2) / 1.644854
f < blrm(y ~ tx, ~ tx, cppo=g, data=d,
pcontrast=pcon, npcontrast=npcon)
Running MCMC with 4 parallel chains...
Chain 1 finished in 1.5 seconds.
Chain 3 finished in 1.6 seconds.
Chain 2 finished in 1.7 seconds.
Chain 4 finished in 1.7 seconds.
All 4 chains finished successfully.
Mean chain execution time: 1.6 seconds.
Total execution time: 1.9 seconds.
Code
Bayesian Constrained Partial Proportional Odds Ordinal Logistic Model
Dirichlet Priors With Concentration Parameter 0.541 for Intercepts
blrm(formula = y ~ tx, ppo = ~tx, cppo = g, data = d, pcontrast = pcon,
npcontrast = npcon)
Obs 800 
LOO log L 510.57±23.23 
g 0.256 [0.09, 0.41] 
C 0.62 [0.62, 0.62] 
0 635 
LOO IC 1021.14±46.46 
g_{p} 0.041 [0.014, 0.065] 
D_{xy} 0.24 [0.24, 0.24] 
1 110 
Effective p 3.63±0.25 
EV 0.012 [0, 0.024] 

2 55 
B 0.162 [0.162, 0.163] 
v 0.073 [0.002, 0.149] 

Draws 4000 

vp 0.002 [0, 0.004] 

Chains 4 



Time 2.5s 



p 1 



y≥1 
1.1157 
1.1142 
1.1132 
0.1144 
1.3440 
0.8964 
0.0000 
1.00 
y≥2 
2.4876 
2.4937 
2.4914 
0.1808 
2.8419 
2.1447 
0.0000 
0.98 
tx=B 
0.5028 
0.5034 
0.5050 
0.1693 
0.8466 
0.1830 
0.0018 
0.97 
tx=B x f(y) 
0.2588 
0.2524 
0.2576 
0.2038 
0.1115 
0.6843 
0.8890 
0.91 
Contrasts Given Priors
[1] list(sd = 0.707306038302134, c1 = list(tx = "B"), c2 = list(tx = "A"),
[2] contrast = expression(c1  c2))
Contrasts Given Priors for NonProportional Odds Effects
[1] list(sd = 0.421403468368588, c1 = list(tx = "B"), c2 = list(tx = "A"),
[2] contrast = expression(c1  c2))
Code
Posterior mode odds ratios:
B:A overall B:A Y=2 Y=2:Y>=1
0.6048371 0.7834805 1.2953578
The parameters are attenuated by the skeptical priors.
Compute the probability of an overall effect, a treatment effect on mortality, and the probability that the the differential treatment effect on mortality vs. overall is by more than a factor of 1.2.
Code
# The rmsb function PostF makes it easy to compute posterior probs.
# of any assertions (by computing proportion of posterior draws
# meeting the conditions)
P < PostF(f, pr=TRUE)
Original Name Short Name
y>=1 a1
y>=2 a2
tx=B b1
tx=B x f(y) b2
Code
printL('Pr(beta < 0)' = P(b1 < 0),
'Pr(beta + tau < 0)' = P(b1 + b2 < 0),
'Pr(ratio of ORS > fold change of 1.2)' = P(abs(b2) > log(1.2)), dec=4)
Pr(beta < 0): 0.9982
Pr(beta + tau < 0): 0.8342
Pr(ratio of ORS > fold change of 1.2): 0.6652
We see strong evidence for an overall treatment benefit, moderate evidence for a mortality reduction, and little evidence for a strong nonPO effect, i.e., ratio of ORs or . In other words, there is not much evidence for an inconsistent treatment effect. To be fair, there is not much evidence for a consistent treatment effect either, using a cutoff for of 1.2 or its reciprocal.
Sixlevel Outcome Example and Frequentist Power
Suppose that in the control group we expect the following proportions of levels of .
Code
p < c(.2, .32, .2, .105, .1, 0.075)
names(p) < paste0('Y=', 0:5)
p
Y=0 Y=1 Y=2 Y=3 Y=4 Y=5
0.200 0.320 0.200 0.105 0.100 0.075
So the probability of death in treatment A is 0.075. Say we want to detect a common OR of 0.65 with 0.9 power at with equal sample sizes in the two groups. The needed sample size for the full ordinal outcome is computed below using the R Hmisc
package posamsize
function. For us to have any hope of detecting a mortality difference, set the binary outcome power to 0.5.
This function needs the average probabilities over the two treatments so we have move the probabilities by the square root of the odds ratio to get the averages and still set up for the target OR in the B:A comparison. The pomodm
function in Hmisc
does the needed OR shifting of probabilities.
Code
or < 0.65
pmid < pomodm(p=p, odds.ratio=sqrt(or))
pB < pomodm(p=p, odds.ratio=or)
round(rbind(A=p, B=pB, Average=pmid), 3)
Y=0 Y=1 Y=2 Y=3 Y=4 Y=5
A 0.200 0.320 0.200 0.105 0.100 0.075
B 0.278 0.347 0.173 0.081 0.071 0.050
Average 0.237 0.337 0.188 0.093 0.085 0.061
Code
nord < round(posamsize(p=pmid, odds.ratio=or, power=0.9)$n)
printL('Sample size needed for ordinal Y' = nord)
Sample size needed for ordinal Y: 723
Code
pdeath1 < p[length(p)]
pdeath2 < plogis(qlogis(pdeath1) + log(or))
printL('Probabilities of death for the two treatments' = c(pdeath1, pdeath2), dec=4)
Probabilities of death for the two treatments:
Y=5 Y=5
0.0750 0.0501
Code
nbin < round(sum(bsamsize(pdeath1, pdeath2, power=0.5)))
printL('Sample size needed for binary Y' = nbin)
Sample size needed for binary Y: 1449
Code
n < max(nord, nbin)
# Check OR for mortality
printL('OR for mortality:' = (0.05/0.95)/(0.075/0.925), dec=3)
Select as the study sample size the minimum of the sample sizes needed for the ordinal comparison (at 0.9 power) and for the binary comparison (at 0.5 power). The total sample size needed is =1449.
As a check, compute the power for the standalone mortality comparison using the PO/Wilcoxon test formula.
Code
printL('Binomial test power' = bpower(0.075, odds.ratio=or, n=n), dec=3)
Code
pmA < c(1  p[6], p[6])
pmMid < c(1  pmid[6], pmid[6])
printL('Ordinal test power for binary Y' = popower(p=pmMid, odds.ratio=or, n=n))
Ordinal test power for binary Y:
Power: 0.503
Efficiency of design compared with continuous response: 0.173
Approximate standard error of log odds ratio: 0.2191
More About Bayesian Borrowing
The assumption we’ll make is that there is some similarity of treatment effect on as there is on (death). Similarity will be expressed as how likely it is for the effect on to be very different than the effect on . The similarity of treatment effects is captured by the ratio of two odds ratios—the B:A OR for and the B:A OR for where : . Let’s put a prior on , the log ratio of ORs, such that it is equally likely that the ratio falls is below 1 as it is above 1. Use a Gaussian (normal) prior distribution for the log ratio, , with mean zero to accomplish this. Then select the standard deviation of the normal distribution to control the amount of borrowing of treatment effect information across levels.
Let be the ratio of ORs as before. Skepticism against being far from 1.0 (closeness to 1.0 indicating a belief that the mortality effect is similar to the other effects) can be specified in many ways. One way is by selecting an interval for which one believes there is a chance of 0.9 of being in that interval. Let’s make the tail areas equal so that . If then and if has a normal distribution with mean and standard deviation then in this symmetric case. In general and where , the 0.95 quantile of the standard normal distribution.
For a variety of here are the corresponding prior distribution mean and standard deviations. All but the last has and so has a mean of zero. The last one, , gives only a 0.05 chance for the treatment effect on mortality to be better (smaller OR) than the effect on the other outcome levels.
Code
bv < c(1.1, 1.25, 1.5, 1.75, 2:7, 7)
av < c(1 / bv[1 : 10], 1)
w < data.table(a=av, b=bv, mu=(log(bv) + log(av)) / 2,
sigma=(log(bv)  log(av)) / (2 * 1.644854))
g < function(x) ifelse(x >= 1, x, paste0('1/', 1/x))
u < w[, .(Interval=paste0('[', g(a), ', ', b, ']'),
mu=round(mu, 3), sigma)]
print(u, digits=3)
Interval mu sigma
1: [1/1.1, 1.1] 0.000 0.0579
2: [1/1.25, 1.25] 0.000 0.1357
3: [1/1.5, 1.5] 0.000 0.2465
4: [1/1.75, 1.75] 0.000 0.3402
5: [1/2, 2] 0.000 0.4214
6: [1/3, 3] 0.000 0.6679
7: [1/4, 4] 0.000 0.8428
8: [1/5, 5] 0.000 0.9785
9: [1/6, 6] 0.000 1.0893
10: [1/7, 7] 0.000 1.1830
11: [1, 7] 0.973 0.5915
Here are the prior distributions for for the above .
Code
x < seq(3.5, 3.5, length=300)
f < function(mu, sigma) list(x=x, y=dnorm(x, mu, sigma))
w < u[, f(mu, sigma), by=.(mu, sigma, Interval)]
br < sort(log(c(1/c(1,2,3,4,6,9), 2:9)))
br2 < sort(log(c(1/(1:10), 2:10)))
ggplot(w, aes(x, y, col=Interval)) + geom_line() +
scale_x_continuous(breaks=br, labels=g(exp(br)),
minor_breaks = br2) +
xlab('Ratio of ORs') + ylab('')
Bayesian Power Simulation
Bayesian power is taken to mean the probability that the posterior probability of a benefit on mortality exceeds 0.95. For a total =1449 use the blrm
function to fit the constrained partial PO model with a variety 0.9 intervals for . The intervals are symmetric in . By simulating 1000 upper 0.9 interval values we don’t need to replicate any one of them in the simulations as we can just interpolate/smooth over the iterations to estimate Bayesian power as a function of the 0.9 interval chosen.
The constraint for partial PO will be such that it applies to death (). The prior for the main treatment effect will be left as noninformative, and the prior for the mortalitydifferential treatment effect is specified through a nonPO contrast (argument npcontrast
to blrm
).
In this simulation, the data generating mechanism is such that the treatment effect is constant (OR=0.65) across all levels of outcomes. The analysis of a mortalityspecific effect does not have access to this information.
Code
w < data.table(b = seq(1.01, 4, length=1000)) # using intervals [1/b, b]
w[, sigma := log(b) / 1.644854]
# Function to simulate one sample and compute the posterior probability
# of treatment benefit on mortality
sim1 < function(n, pcontrol, or=0.65, sigma) {
# n: number of pts in each group
# pcontrol: vector of outcome probabilities for control group
# or: odds ratio to detect
i << i + 1
cat(i, '\n', file='/tmp/z')
# Compute outcome probabilities for new treatment
ptr < pomodm(p=pcontrol, odds.ratio=or)
k < length(pcontrol)  1
tx < c(rep(0, n), rep(1, n))
y < c(sample(0 : k, n, replace=TRUE, prob=pcontrol),
sample(0 : k, n, replace=TRUE, prob=ptr) )
# Set prior on the "treatment by Y interaction" contrast
npcon < list(sd=sigma, c1=list(tx=1), c2=list(tx=0),
contrast=expression(c1  c2))
f < blrm(y ~ tx,
y ~ tx, cppo=function(y) y == 5,
npcontrast=npcon,
loo=FALSE, method='sampling', iter=1000)
beta < f$draws[, 'tx']
tau < f$draws[, 'tx x f(y)']
list(p1=mean(beta < 0), p2=mean(beta + tau < 0), p3=mean(abs(tau) > 0.2))
}
g < function() {
set.seed(7)
R < copy(w)
R[, sim1(round(n / 2), p, sigma=sigma), by=.(sigma, b)]
}
i < 0
R < runifChanged(g, w, n, file='sim.rds')
Look at the distribution of the three posterior probabilities.
Code
u < meltData(p1 + p2 + p3 ~ b, tall='left', data=R)
lb < c(p1='beta < 0', p2='beta + tau < 0', p3='abs(tau) > 0.2')
u[, variable := lb[variable]]
cnt < c(1:5, 10, 25, 50)
ggplot(u, aes(x=b, y=value)) +
stat_binhex(aes(fill=cut2(after_stat(count), cnt)), bins=75) +
facet_wrap(~ variable, nrow=3, label='label_parsed') +
guides(fill=guide_legend(title='Frequency')) +
xlab('0.9 Interval Upper Limit b') +
ylab('Posterior Probability')
Show Bayesian power vs. , which summarizes the amount of endpoint borrowing. Powers of three assessments are shown:
 treatment effect on ordinal outcome: (top curves)
 treatment effect on mortality: (middle curves)
 strength of nonPO: (bottom curves)
Use binary logistic regression and overlapping moving window proportions to interpolate/smooth the results.
Code
v < u[, hit := 1 * (value > 0.95), by=.(variable, b)]
m < movStats(hit ~ b + variable,
eps=100, lrm=TRUE, lrm_args=list(maxit=30, tol=1e13),
nignore=0, data=v, melt=TRUE)
ggplot(m,
aes(x=b, y=hit, col=variable, linetype=Type)) + geom_line() +
geom_hline(yintercept=0.8, alpha=0.3) +
scale_x_continuous(breaks=seq(1, 7, by=1),
minor_breaks=seq(1, 7, by=.25)) +
scale_y_continuous(breaks=seq(0, 1, by=0.1),
minor_breaks=seq(0, 1, by=0.05)) +
scale_colour_discrete(labels = scales::parse_format()) +
guides(color=guide_legend(title='')) +
ylab('Bayesian Power')
Code
# Find b such that power is 0.8
b8 < m[variable=='beta + tau < 0' & Type=='LR',
approx(hit, b, xout=0.8)$y]
b8 < round(c(1 / b8, b8), 3)
printL('0.9 Interval for r yielding 0.8 power for mortality assessment' = b8)
0.9 Interval for r yielding 0.8 power for mortality assessment: 0.416, 2.403
The posterior probability that hovered around zero, which is good since data were generated assuming PO. Bayesian power for the overall treatment effect is nearly 1.0. The minimum value of required to achieve 0.8 Bayesian power is 2.403.
Our example considers death as a binary event. Analyzing it instead as timetoevent in its own model will allow for variable followup time and will increase power if deaths are common. If the incidence of death is low as in our simulation design, little power is gained from utilizing event times.
A Markov longitudinal ordinal model can jointly analyze ordinal responses as well as time to event while allowing for variable followup.
Summary
The partial proportional odds model provides a formal way to quantify evidence for a treatment effect on a given interval of an ordinal outcome. When one of the outcomes is very important but relatively rare, it is difficult to size studies to have power to show a treatment effect on that outcome without borrowing some information from the treatment effect on the outcomes that have a higher effective sample size. The question is how much borrowing is clinically acceptable. The above simulation example shows how much borrowing is required for power for a treatment comparison on the highest outcome level to be adequate. Researchers can run similar simulations for their situations, and judge whether the amount of borrowing needed for adequate Bayesian power is justifiable. In the simulation, the required amount of borrowing to achieve at least 0.8 Bayesian power for mortality assessment involves assuming that the differential treatment effect on mortality (ratio of two odds ratios) is in the interval with 0.9 probability, or is in a narrower interval.
A separate simulation was done using the sample size. The amount of borrowing needed to achieve 0.8 Bayesian power for a mortality assessment was . A good rule of thumb may be to expect to need a power of a standalone mortality assessment to be at least 0.5 for the Bayesian power to be sufficient with a reasonable amount of borrowing.
When a study doesn’t have a good chance of demonstrating a mortality difference with little or no borrowing of information across levels of , it may be important to clinically prespecify the prior for the inconsistency of mortality and nonfatal effects before the study, and to stick with that for computing the posterior probability of a mortality benefit.
It is important to note that because the PO model is always consistent with a Wilcoxon test in terms of the direction of a treatment effect (and whether the effect is zero), the more powerful PO model that does not include the partial PO parameter for a special effect of treatment on mortality will have strong ability to discern which treatment is better overall, penalizing for death, if there is clinical consensus on the outcome rankings. This is despite the problem caused by strong violation of PO for the treatment variable: the PO model may give the wrong assessment of the isolated effect of treatment on mortality.
]]>