# Adjudication and Statistical Efficiency

## Background

In clinical and epidemiologic studies one is frequently tasked with maximizing accuracy when assessing the presence of clinical conditions (symptoms, diagnoses, syndromes, etc.) or verifying outcome events such as stroke, myocardial infarction, or death from a specific cause. Prospective studies have the advantage that they can standardize definitions of clinical conditions, minimize bias, and be transparent about disagreements over clinical designations. Many studies have clinical endpoint committees or adjudication committees. Statistical efficiency and completeness of reporting are optimized by having as many committee members as feasible, and by having the members operate as independently as possible.

Statistical efficiency also comes from minimizing forced choices and utilizing gray zones. For example, if a study has only one adjudicator, and this clinical expert is uncertain about some of the designations, it is best for her to code determinations using at least one level of gray. The way to understand why this is more statistically efficient than having forced choices is to consider a 3-level (negative, uncertain, positive) clinical outcome that is being correlated with a 5-level severity of a symptom. Uncertain outcomes may occur more often for patients having a middle symptom severity. Making use of 3 levels of outcome will capitalize on this to increase power.

Sometimes the clinical condition is needed not as a multilevel ordinal outcome but as a criterion for subsetting patients. For example, one may want to analyze a subset of the cohort consisting of patients designated as having a certain clinical syndrome at baseline. It is not hard to analyze subsets when the subsetting is uncertain. For example, if one translates an adjudication to the probability that the patient has a syndrome, one can easily use multiple imputation to analyze subsets under uncertainty. If a given patient has a probability of 0.6 of having syndrome X, 10 imputations of the binary syndrome indicator can be generated. In the long run, \(\frac{6}{10}\) of the imputations will be positive for the syndrome. The needed subset analysis can be done by including, for each of the multiple imputations, all the patients imputed to be positive for X. By repeating this process over, say, 10 multiple imputations, noise in this process will average out and a one-time forced-choice classification is unnecessary.
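The imputation scheme just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `impute_membership` and `imputed_subset_mean` are hypothetical, the "subset analysis" is reduced to a simple mean of an outcome, and the estimates are combined by plain averaging. A real analysis would also combine variances across imputations (for example with Rubin's rules), which is not shown here.

```python
import random


def impute_membership(probs, rng):
    """Draw one binary syndrome indicator per patient from its probability."""
    # rng.random() lies in [0, 1), so p = 1 is always positive and p = 0 never is.
    return [rng.random() < p for p in probs]


def imputed_subset_mean(probs, outcomes, n_imputations=10, seed=0):
    """Average a subset mean over repeated imputations of subset membership.

    probs    : adjudicated probability that each patient has syndrome X
    outcomes : the quantity analyzed within the syndrome-positive subset
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_imputations):
        member = impute_membership(probs, rng)
        subset = [y for y, m in zip(outcomes, member) if m]
        if subset:  # skip an imputation in which nobody was imputed positive
            estimates.append(sum(subset) / len(subset))
    if not estimates:
        raise ValueError("no imputation produced a non-empty subset")
    # Simple pooling of the point estimates across imputations
    return sum(estimates) / len(estimates), estimates


if __name__ == "__main__":
    # Made-up illustrative data: four patients with differing certainty
    probs = [0.9, 0.6, 0.6, 0.1]
    outcomes = [12.0, 9.0, 10.0, 4.0]
    pooled, per_imputation = imputed_subset_mean(probs, outcomes)
    print(pooled, per_imputation)
```

A patient with probability 0.6 enters the subset in roughly 6 of every 10 imputations over the long run, so that patient contributes to the pooled estimate in proportion to the adjudicated certainty rather than being forced in or out once.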

Even if one does not want to use multiple imputation or Bayesian models to account for adjudication uncertainties, it is important to design the adjudications to lead to an optimum final negative/positive designation.

## A Hierarchy of Statistical Information and Power

Besides having independent adjudicators, statistical information is maximized when one delays forced-choice designations as much as possible and respects gray zones to the extent possible. Here is a hierarchy of statistical information/efficiency/power from highest to lowest, for various strategies.

- Have each adjudicator record the probability the patient is in the clinical category of interest, then average these probabilities to yield a final result that is used in analyses. When the clinical category is used as an outcome variable, ordinal regression may be used in the final analysis. This can be used to estimate the probability that the outcome is at a certain level \(y\) or higher, for any \(y\) and for any level of baseline variables.
- Classify the patient as negative/positive depending on whether this average probability that the condition exists exceeds a pre-specified level.
- Have each adjudicator record a forced choice of negative/positive. Code the final result as the proportion (over adjudicators) of positives.
- Have each adjudicator record a forced choice of negative/positive. Code the final result as negative/positive depending on a majority rule. One would need to have an odd number of reviewers for this rule.
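The four strategies in the list above can be compared side by side in a short Python sketch. The function names are hypothetical, the probabilities are made-up illustrative values, and the sketch assumes that each adjudicator's forced choice is "positive" when his or her own probability exceeds 0.5, which is an assumption of this example rather than something the text specifies.

```python
def mean_probability(probs):
    """Strategy 1: average the adjudicators' probabilities."""
    return sum(probs) / len(probs)


def thresholded_mean(probs, cutoff=0.5):
    """Strategy 2: dichotomize the averaged probability at a pre-specified level."""
    return mean_probability(probs) > cutoff


def forced_choices(probs, cutoff=0.5):
    """Assumed behavior: each adjudicator votes positive if their probability exceeds cutoff."""
    return [p > cutoff for p in probs]


def proportion_positive(votes):
    """Strategy 3: proportion of adjudicators voting positive."""
    return sum(votes) / len(votes)


def majority_vote(votes):
    """Strategy 4: majority rule, which needs an odd number of reviewers."""
    if len(votes) % 2 == 0:
        raise ValueError("majority rule needs an odd number of reviewers")
    return sum(votes) > len(votes) / 2


if __name__ == "__main__":
    # Two hypothetical patients, three adjudicators each
    patient_a = [0.55, 0.60, 0.45]
    patient_b = [0.90, 0.95, 0.40]
    for probs in (patient_a, patient_b):
        votes = forced_choices(probs)
        print(
            mean_probability(probs),
            thresholded_mean(probs),
            proportion_positive(votes),
            majority_vote(votes),
        )
```

In this example the two patients have clearly different averaged probabilities, yet once each adjudicator is forced to choose they produce identical votes, identical proportions of positives, and identical majority decisions. That is the sense in which each step down the hierarchy discards information.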

When one has a probability of being in a clinical class and such probabilities are not all near 0 or 1, the probabilities are self-contained in terms of capturing the difficulty of the task of classifying patients. This translates directly to quantifying the arbitrariness of forced-choice classifications.
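One way to make this concrete, offered as a sketch rather than a prescribed method: if the recorded probability is taken at face value (that is, assumed to be well calibrated), then a forced choice made by thresholding it is wrong with probability equal to the distance from the called side, which is \(\min(p, 1-p)\) when the threshold is 0.5. The function name below is hypothetical.

```python
def forced_choice_error(p, cutoff=0.5):
    """Chance that a forced negative/positive call is wrong.

    Assumes p is the true (calibrated) probability that the condition is
    present and that the patient is called positive when p exceeds cutoff.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    called_positive = p > cutoff
    # A positive call is wrong when the condition is absent, and vice versa.
    return 1.0 - p if called_positive else p


if __name__ == "__main__":
    for p in (0.02, 0.45, 0.60, 0.98):
        print(p, forced_choice_error(p))
```

Probabilities near 0 or 1 give a small chance of error, so forcing a choice costs little, while probabilities near the middle make the forced designation close to a coin flip.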

## Resources

- Probabilistic readjudication of heart failure hospitalization events in the PARAGON-HF study by GM Felker, J Butler, JL Januzzi, AS Desai, JJV McMurray, SD Solomon (includes a multiple imputation approach)
- A comparison of approaches for adjudicating outcomes in clinical trials by BC Kahan, B Feagan, V Jairath
- Descriptive approach to analyzing observer variability
- How breaking ties in a variable increases statistical power
- Against diagnosis in favor of matters of degree by AJ Vickers, E Basch, MW Kattan
- Probabilistic prediction in patient management and clinical trials by DJ Spiegelhalter
- The end of the “syndrome” in critical care by Lawrence Lynn