# Dichotomization

## Datamethods

datamethods.org is a discussion site where data methodologists meet one another and subject-matter experts, including clinical trialists and clinical researchers. Its development is documented here. Datamethods is provided by the Department of Biostatistics, Vanderbilt University School of Medicine. I have written some short articles on the site, listed below:

- Responder analysis: Loser x 4
- Problems with NNT
- Should we ignore covariate imbalance and stop presenting a stratified ‘table one’ for randomized trials?

## Information Gain From Using Ordinal Instead of Binary Outcomes

This article gives examples of the information gained by using ordinal rather than binary response variables. It does so by showing that, for the same sample size and power, smaller effects can be detected.

## Statistical Errors in the Medical Literature

- Misinterpretation of P-values and Main Study Results
- Dichotomania
- Problems With Change Scores
- Improper Subgrouping
- Serial Data and Response Trajectories
- Cluster Analysis

As Doug Altman famously wrote in his *Scandal of Poor Medical Research* in the BMJ in 1994, statistical principles and analysis methods are applied quite poorly in medical research. According to Doug and many others, such as Richard Smith, the problems have only gotten worse.

## Damage Caused by Classification Accuracy and Other Discontinuous Improper Accuracy Scoring Rules

Previously I discussed the many advantages of probability estimation over classification. Here I discuss a particular problem related to classification, namely the harm done by using improper accuracy scoring rules. Accuracy scores are used to drive feature selection and parameter estimation, and to measure the predictive performance of models derived using any optimization algorithm. For this discussion let Y denote a no/yes, false/true, 0/1 event being predicted, with Y=0 denoting a non-event and Y=1 denoting that the event occurred.
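A proper scoring rule is optimized, in expectation, by reporting the true probability. The sketch below contrasts two rules for a single subject whose true P(Y=1) is p and who is assigned a reported probability q. The first is the expected proportion classified correctly, which is discontinuous and improper. The second is the expected Brier score, a proper rule. It assumes a classification threshold of 0.5, and the function names and the value p = 0.7 are hypothetical. Expected accuracy is flat across every q on the same side of the threshold, so it cannot reward reporting the true probability. The expected Brier score equals (q − p)² + p(1 − p), which is uniquely minimized at q = p.

```python
def expected_accuracy(p, q, threshold=0.5):
    """Expected proportion correct when we predict Y=1 whenever the reported
    probability q exceeds `threshold` (an assumed cutoff) and Y=0 otherwise."""
    return p if q > threshold else 1 - p


def expected_brier(p, q):
    """Expected Brier score (lower is better) for reporting q when P(Y=1) = p."""
    return p * (1 - q) ** 2 + (1 - p) * q ** 2


if __name__ == "__main__":
    p_true = 0.7  # hypothetical true risk for one subject
    for q in (0.51, 0.7, 0.99):
        print(f"q={q:.2f}  expected accuracy={expected_accuracy(p_true, q):.3f}  "
              f"expected Brier={expected_brier(p_true, q):.4f}")
```

All three reported probabilities earn the same expected accuracy (0.7), whereas the Brier score is best at q = 0.7. An optimizer driven by accuracy therefore has no incentive to get the probabilities right.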

## Clinicians' Misunderstanding of Probabilities Makes Them Like Backwards Probabilities Such As Sensitivity, Specificity, and Type I Error

Optimum decision making in the presence of uncertainty comes from probabilistic thinking. The relevant probabilities are of a predictive nature: P(the unknown given the known). Thresholds are not helpful and are completely dependent on the utility/cost/loss function. Corollary: since a p-value is P(someone else’s data are more extreme than mine if H0 is true), and we don’t know whether H0 is true, it is a non-predictive probability that is not useful for decision making.
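The sketch below shows two things. First, the "backwards" quantities sensitivity and specificity must be combined with prevalence through Bayes' rule to obtain the forward, predictive probability P(disease | positive test). Second, a decision threshold falls out of a loss function rather than being fixed in advance. It assumes a simple two-action decision (act or don't act) with constant losses for a false positive and a false negative. The function names and numbers are hypothetical and for illustration only.

```python
def post_test_probability(sens, spec, prevalence):
    """P(disease | positive test) via Bayes' rule: a forward, predictive probability."""
    true_pos = sens * prevalence
    false_pos = (1 - spec) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def bayes_threshold(loss_false_pos, loss_false_neg):
    """Risk above which acting has lower expected loss than not acting.

    Acting on a patient with risk p costs (1 - p) * loss_false_pos in expectation;
    not acting costs p * loss_false_neg. Acting is better when p exceeds the value below.
    """
    return loss_false_pos / (loss_false_pos + loss_false_neg)


if __name__ == "__main__":
    # Hypothetical test with sensitivity = specificity = 0.9 at 1% prevalence
    print("P(disease | +):", round(post_test_probability(0.9, 0.9, 0.01), 4))
    # Hypothetical losses: a missed case is 9 times worse than an unneeded treatment
    print("Act when risk exceeds:", bayes_threshold(1, 9))
```

Here a test that is "90% sensitive and 90% specific" yields a post-test probability of only about 0.083. The action threshold of 0.1 comes entirely from the assumed losses: change the losses and the threshold moves, while the predictive probability does not.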