# If You Like the Wilcoxon Test You Must Like the Proportional Odds Model

Clearly, the dependence of the proportional odds model on the assumption of proportionality can be over-stressed. Suppose that two different statisticians would cut the same three-point scale at different cut points. It is hard to see how anybody who could accept either dichotomy could object to the compromise answer produced by the proportional odds model. — Stephen Senn

# Background

The Wilcoxon-Mann-Whitney two-sample rank-sum test is a special case of the proportional odds (PO) ordinal logistic regression model. The numerator of the PO model score \(\chi^2\) test for comparing two groups without covariate adjustment is exactly the Wilcoxon statistic. The equivalence of the PO model and the Wilcoxon test in this simple two-group setting is perhaps demonstrated more compellingly by showing how the Wilcoxon test statistic may be accurately approximated by a simple function of the odds ratio (OR) estimate from the PO model, even when PO is strongly violated.

In Violation of Proportional Odds is Not Fatal I used simulation to derive an accurate approximation to the Wilcoxon statistic from the PO model group 2 : group 1 odds ratio estimate (OR). When the Wilcoxon statistic is re-scaled to have a 0-1 range, i.e., to a *concordance probability* \(c\), the approximation is \[c = \frac{\mathrm{OR}^{0.66}}{1 + \mathrm{OR}^{0.66}}\]

Over a wide variety of simulated datasets, this approximation has a mean absolute error of 0.002^{1}.

For \(n\) overall observations, \(n_1\) in group 1 and \(n_2\) in group 2, let \(R\) denote the vector of ranks of all the observations ignoring group membership. In case of ties, midranks are used. The Wilcoxon rank-sum statistic \(W\) is based on the sum of ranks in group 2. Let \(X_{i}\) be [group = 2], the 1/0 indicator of being in group 2 for the \(i^{\mathrm{th}}\) observation.

\[W = \sum_{i=1}^{n} X_{i} R_{i} - \frac{n_{2} (n_{2} + 1)}{2}\]

Letting \(\bar{R}\) denote the mean of the group 2 ranks (\(\frac{1}{n_{2}} \sum_{i=1}^{n} X_{i}R_{i})\), the Wilcoxon statistic is proportional to

\[c = \frac{\bar{R} - \frac{n_{2} + 1}{2}}{n_{1}}\] where \(c\) is the c-index or concordance probability. It is the proportion of all possible pairs of observations, one from group 2 and one from group 1, such that the the observation from group 2 is the larger of the two. Since midranks are used for ties, \(c\) counts a tied pair as \(\frac{1}{2}\) concordant. So letting \(Y_{1}\) and \(Y_{2}\) represent, respectively, random observations from groups 1 and 2, \(c\) estimates \(\Pr(Y_{2} > Y_{1}) + \frac{1}{2} \Pr(Y_{2} = Y_{1})\).

There is an identity with Somers’ \(D_{yx}\) rank correlation, which is the probability of concordance minus the probability of discordance. \(D_{yx}\) can be also be written as \(D_{yx} = 2 \times (c - \frac{1}{2})\). Note that the R `Hmisc`

package function `rcorr.cens`

used below computes \(D_{xy}\), but \(D_{yx}(X, Y)\) is the same as \(D_{xy}(Y, X)\). \(D_{yx}\) means “discard ties on \(X\), let ties on \(Y\) count against us.” In a two-group comparison we are discarding ties on \(X\), i.e., are not comparing observations from group 1 with other observations in group 1.

# Discrete Ordinal Y Example

Let’s go through the calculations and check the Wilcoxon PO OR-based approximation using data in which there is a severe violation of the PO assumption. We have three levels of \(Y\) (0, 1, 2), with the group 2 : group 1 OR for \(Y=2\) being 2.5 but the OR for \(Y \geq 1\) being 0.795. The compromise OR from assuming PO is 1.114.

```
w <- expand.grid(group=1:2, y=0:2)
n <- c(100, 110, 50, 10, 30, 60)
u <- w[rep(1:6, n),]
with(u, table(group, y))
```

```
y
group 0 1 2
1 100 50 30
2 110 10 60
```

```
or2 <- exp(coef(lrm(y == 2 ~ group, data=u))['group'])
or1 <- exp(coef(lrm(y >= 1 ~ group, data=u))['group'])
or12 <- exp(coef(lrm(y ~ group, data=u))['group'])
ors <- c(or2, or1, or12)
names(ors) <- c('y=2', 'y>=1', 'y')
ors
```

```
y=2 y>=1 y
2.5000000 0.7954545 1.1135786
```

Now compute the Wilcoxon statistic

`wilcox.test(y ~ group, u, correct=FALSE)`

```
Wilcoxon rank sum test
data: y by group
W = 15750, p-value = 0.6061
alternative hypothesis: true location shift is not equal to 0
```

```
sumr1 <- with(u, sum(rank(y)[group == 1]))
sumr2 <- with(u, sum(rank(y)[group == 2]))
n1 <- sum(u$group == 1)
n2 <- sum(u$group == 2)
# wilcox.test uses sum of ranks in group 1
W <- sumr1 - n1 * (n1 + 1) / 2 # equals wilcox.test
W
```

`[1] 15750`

```
# Going forward use sum of ranks in group 2
W <- sumr2 - n2 * (n2 + 1) / 2
```

Compute \(c\) three different ways.

`W / (n1 * n2)`

`[1] 0.5138889`

`with(u, (mean(rank(y)[group == 2]) - (n2 + 1) / 2) / n1)`

`[1] 0.5138889`

```
b <- with(u, rcorr.cens(y, group))
b
```

```
C Index Dxy S.D. n missing
5.138889e-01 2.777778e-02 5.470021e-02 3.600000e+02 0.000000e+00
uncensored Relevant Pairs Concordant Uncertain
3.600000e+02 6.480000e+04 3.330000e+04 0.000000e+00
```

```
conc <- b['C Index']
conc
```

```
C Index
0.5138889
```

Now compare the concordance probability with the approximation from the PO-estimated OR:

```
po <- ors['y']
capprox <- po ^ 0.66 / (1 + po ^ 0.66)
capprox
```

```
y
0.517743
```

The approximation is off by 0.004.

Check against the original regression equation fitted here:

`plogis((log(po) - 0.0003) / 1.5179)`

```
y
0.5176616
```

# Continuous Y Example

Now consider a two-sample problem with continuous Y. We could induce mild non-PO by sampling from two normal distributions with equal variance and a nonzero difference in means^{2}. But let’s induce major non-PO by also allowing the variance in the two groups to be unequal. Draw a random sample of size 100 from a normal distribution with mean 0 and variance 1 and a second sample from a normal distribution with mean 1 and variance 0.2.

```
set.seed(1)
n1 <- n2 <- 100
y1 <- rnorm(n1, 0, 1)
y2 <- rnorm(n1, 1, sqrt(0.2))
group <- c(rep(1, n1), rep(2, n2))
y <- c(y1, y2)
Ecdf(~ y, group=group, fun=qlogis, ylab='logit ECDF')
```

Serious non-parallelism of the logit of the two empirical cumulative distributions means serious non-proportional odds. Let’s get the \(c\) index (concordance probability) and its approximation from the PO model, as before. We use the `rms`

package `orm`

function which is designed to efficiently analyze continuous Y. Here the model has 199 intercepts since there are no ties in the data. Below I have back-solved for the Wilcoxon test \(\chi^2\) statistic.

```
conc <- (mean(rank(y)[group == 2]) - (n2 + 1) / 2) / n1
w <- wilcox.test(y ~ group)
wchisq <- qchisq(1 - w$p.value, 1)
wchisq
```

[1] 59.67072

```
f <- orm(y ~ group)
f
```

**Logistic (Proportional Odds) Ordinal Regression Model**

orm(formula = y ~ group)

Model Likelihood Ratio Test | Discrimination Indexes | Rank Discrim. Indexes | |
---|---|---|---|

Obs 200 | LR χ^{2} 69.08 | R^{2} 0.292 | ρ 0.548 |

Distinct Y 200 | d.f. 1 | g 1.146 | |

Y_{0.5} 0.7002136 | Pr(>χ^{2}) <0.0001 | g_{r} 3.144 | |

max |∂log L/∂β| 6×10^{-7} | Score χ^{2} 66.46 | |Pr(Y ≥ median)-½| 0.257 | |

Pr(>χ^{2}) <0.0001 |

β | S.E. | Wald Z | Pr(>|Z|) | |
---|---|---|---|---|

group | 2.2796 | 0.2922 | 7.80 | <0.0001 |

```
or <- exp(coef(f)['group'])
capprox <- or ^ 0.66 / (1 + or ^ 0.66)
z <- c(conc, capprox, or)
names(z) <- c('c-index', 'c-approx', 'OR')
z
```

```
c-index c-approx OR
0.8162000 0.8182461 9.7723775
```

The approximation is off by 0.002.

# Computing Environment

R version 4.0.4 (2021-02-15) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Pop!_OS 20.10 Matrix products: default BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0 LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] rms_6.1-2 SparseM_1.78 Hmisc_4.6-0 ggplot2_3.3.2 [5] Formula_1.2-4 survival_3.2-7 lattice_0.20-41To cite R in publications use:

R Core Team (2021).
*R: A Language and Environment for Statistical Computing*.
R Foundation for Statistical Computing, Vienna, Austria.
https://www.R-project.org/.

`Hmisc`package in publications use:

Harrell F (2021).
*Hmisc: Harrell Miscellaneous*.
R package version 4.6-0, https://hbiostat.org/R/Hmisc/.

`rms`package in publications use:

Harrell Jr FE (2021).
*rms: Regression Modeling Strategies*.
https://hbiostat.org/R/rms/, https://github.com/harrelfe/rms.