If You Like the Wilcoxon Test You Must Like the Proportional Odds Model

Clearly, the dependence of the proportional odds model on the assumption of proportionality can be over-stressed. Suppose that two different statisticians would cut the same three-point scale at different cut points. It is hard to see how anybody who could accept either dichotomy could object to the compromise answer produced by the proportional odds model. — Stephen Senn


The Wilcoxon-Mann-Whitney two-sample rank-sum test is a special case of the proportional odds (PO) ordinal logistic regression model. The numerator of the PO model score \(\chi^2\) test for comparing two groups without covariate adjustment is exactly the Wilcoxon statistic. The equivalence of the PO model and the Wilcoxon test in this simple two-group setting is perhaps demonstrated more compellingly by showing how the Wilcoxon test statistic may be accurately approximated by a simple function of the odds ratio (OR) estimate from the PO model, even when PO is strongly violated.

In Violation of Proportional Odds is Not Fatal I used simulation to derive an accurate approximation to the Wilcoxon statistic from the PO model group 2 : group 1 odds ratio estimate (OR). When the Wilcoxon statistic is re-scaled to have a 0-1 range, i.e., to a concordance probability \(c\), the approximation is \[c = \frac{\mathrm{OR}^{0.66}}{1 + \mathrm{OR}^{0.66}}\]

Over a wide variety of simulated datasets, this approximation has a mean absolute error of 0.0021.

For \(n\) overall observations, \(n_1\) in group 1 and \(n_2\) in group 2, let \(R\) denote the vector of ranks of all the observations ignoring group membership. In case of ties, midranks are used. The Wilcoxon rank-sum statistic \(W\) is based on the sum of ranks in group 2. Let \(X_{i}\) be [group = 2], the 1/0 indicator of being in group 2 for the \(i^{\mathrm{th}}\) observation.

\[W = \sum_{i=1}^{n} X_{i} R_{i} - \frac{n_{2} (n_{2} + 1)}{2}\]

Letting \(\bar{R}\) denote the mean of the group 2 ranks (\(\frac{1}{n_{2}} \sum_{i=1}^{n} X_{i}R_{i})\), the Wilcoxon statistic is proportional to

\[c = \frac{\bar{R} - \frac{n_{2} + 1}{2}}{n_{1}}\] where \(c\) is the c-index or concordance probability. It is the proportion of all possible pairs of observations, one from group 2 and one from group 1, such that the the observation from group 2 is the larger of the two. Since midranks are used for ties, \(c\) counts a tied pair as \(\frac{1}{2}\) concordant. So letting \(Y_{1}\) and \(Y_{2}\) represent, respectively, random observations from groups 1 and 2, \(c\) estimates \(\Pr(Y_{2} > Y_{1}) + \frac{1}{2} \Pr(Y_{2} = Y_{1})\).

There is an identity with Somers’ \(D_{yx}\) rank correlation, which is the probability of concordance minus the probability of discordance. \(D_{yx}\) can be also be written as \(D_{yx} = 2 \times (c - \frac{1}{2})\). Note that the R Hmisc package function rcorr.cens used below computes \(D_{xy}\), but \(D_{yx}(X, Y)\) is the same as \(D_{xy}(Y, X)\). \(D_{yx}\) means “discard ties on \(X\), let ties on \(Y\) count against us.” In a two-group comparison we are discarding ties on \(X\), i.e., are not comparing observations from group 1 with other observations in group 1.

Discrete Ordinal Y Example

Let’s go through the calculations and check the Wilcoxon PO OR-based approximation using data in which there is a severe violation of the PO assumption. We have three levels of \(Y\) (0, 1, 2), with the group 2 : group 1 OR for \(Y=2\) being 2.5 but the OR for \(Y \geq 1\) being 0.795. The compromise OR from assuming PO is 1.114.

w <- expand.grid(group=1:2, y=0:2)
n <- c(100, 110, 50, 10, 30, 60)
u <- w[rep(1:6, n),]
with(u, table(group, y))
group   0   1   2
    1 100  50  30
    2 110  10  60
or2 <- exp(coef(lrm(y == 2 ~ group, data=u))['group'])
or1 <- exp(coef(lrm(y >= 1 ~ group, data=u))['group'])
or12 <- exp(coef(lrm(y ~ group, data=u))['group'])
ors <- c(or2, or1, or12)
names(ors) <- c('y=2', 'y>=1', 'y')
      y=2      y>=1         y 
2.5000000 0.7954545 1.1135786 

Now compute the Wilcoxon statistic

wilcox.test(y ~ group, u, correct=FALSE)

    Wilcoxon rank sum test

data:  y by group
W = 15750, p-value = 0.6061
alternative hypothesis: true location shift is not equal to 0
sumr1 <- with(u, sum(rank(y)[group == 1]))
sumr2 <- with(u, sum(rank(y)[group == 2]))
n1 <- sum(u$group == 1)
n2 <- sum(u$group == 2)
# wilcox.test uses sum of ranks in group 1
W <- sumr1 - n1 * (n1 + 1) / 2   # equals wilcox.test
[1] 15750
# Going forward use sum of ranks in group 2
W <- sumr2 - n2 * (n2 + 1) / 2

Compute \(c\) three different ways.

W / (n1 * n2)
[1] 0.5138889
with(u, (mean(rank(y)[group == 2]) - (n2 + 1) / 2) / n1)
[1] 0.5138889
b <- with(u, rcorr.cens(y, group))
       C Index            Dxy           S.D.              n        missing 
  5.138889e-01   2.777778e-02   5.470021e-02   3.600000e+02   0.000000e+00 
    uncensored Relevant Pairs     Concordant      Uncertain 
  3.600000e+02   6.480000e+04   3.330000e+04   0.000000e+00 
conc <- b['C Index']
  C Index 

Now compare the concordance probability with the approximation from the PO-estimated OR:

po <- ors['y']
capprox <- po ^ 0.66 / (1 + po ^ 0.66)

The approximation is off by 0.004.

Check against the original regression equation fitted here:

plogis((log(po) - 0.0003) / 1.5179)

Continuous Y Example

Now consider a two-sample problem with continuous Y. We could induce mild non-PO by sampling from two normal distributions with equal variance and a nonzero difference in means2. But let’s induce major non-PO by also allowing the variance in the two groups to be unequal. Draw a random sample of size 100 from a normal distribution with mean 0 and variance 1 and a second sample from a normal distribution with mean 1 and variance 0.2.

n1 <- n2 <- 100
y1 <- rnorm(n1, 0, 1)
y2 <- rnorm(n1, 1, sqrt(0.2))
group <- c(rep(1, n1), rep(2, n2))
y <- c(y1, y2)
Ecdf(~ y, group=group, fun=qlogis, ylab='logit ECDF')

Serious non-parallelism of the logit of the two empirical cumulative distributions means serious non-proportional odds. Let’s get the \(c\) index (concordance probability) and its approximation from the PO model, as before. We use the rms package orm function which is designed to efficiently analyze continuous Y. Here the model has 199 intercepts since there are no ties in the data. Below I have back-solved for the Wilcoxon test \(\chi^2\) statistic.

conc <- (mean(rank(y)[group == 2]) - (n2 + 1) / 2) / n1
w <- wilcox.test(y ~ group)
wchisq <- qchisq(1 - w$p.value, 1)

[1] 59.67072

f <- orm(y ~ group)

Logistic (Proportional Odds) Ordinal Regression Model

 orm(formula = y ~ group)
Model Likelihood
Ratio Test
Rank Discrim.
Obs 200LR χ2 69.08R2 0.292ρ 0.548
Distinct Y 200d.f. 1g 1.146
Y0.5 0.7002136Pr(>χ2) <0.0001gr 3.144
max |∂log L/∂β| 6×10-7Score χ2 66.46|Pr(Y ≥ median)-½| 0.257
Pr(>χ2) <0.0001
βS.E.Wald ZPr(>|Z|)
group 2.2796 0.29227.80<0.0001
or <- exp(coef(f)['group'])
capprox <- or ^ 0.66 / (1 + or ^ 0.66)
z <- c(conc, capprox, or)
names(z) <- c('c-index', 'c-approx', 'OR')
  c-index  c-approx        OR 
0.8162000 0.8182461 9.7723775 

The approximation is off by 0.002.

Computing Environment

 R version 4.0.4 (2021-02-15)
 Platform: x86_64-pc-linux-gnu (64-bit)
 Running under: Pop!_OS 20.10
 Matrix products: default
 BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
 LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
 attached base packages:
 [1] stats     graphics  grDevices utils     datasets  methods   base     
 other attached packages:
 [1] rms_6.1-2       SparseM_1.78    Hmisc_4.6-0     ggplot2_3.3.2  
 [5] Formula_1.2-4   survival_3.2-7  lattice_0.20-41
To cite R in publications use:

R Core Team (2021). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/.

To cite the Hmisc package in publications use:

Harrell F (2021). Hmisc: Harrell Miscellaneous. R package version 4.6-0, https://hbiostat.org/R/Hmisc/.

To cite the rms package in publications use:

Harrell Jr FE (2021). rms: Regression Modeling Strategies. https://hbiostat.org/R/rms/, https://github.com/harrelfe/rms.

  1. Note that \(0.66 = \frac{1}{1.52}\) from the blog article.↩︎

  2. Note that PO would hold if one simulated from a logistic distribution with a shift in location only.↩︎

Frank Harrell
Frank Harrell
Professor of Biostatistics

My research interests include Bayesian statistics, predictive modeling and model validation, statistical computing and graphics, biomedical research, clinical trials, health services research, cardiology, and COVID-19 therapeutics.

comments powered by Disqus