Null Hypothesis Significance Testing Never Worked

Much has been written about problems with our most-used statistical paradigm: frequentist null hypothesis significance testing (NHST), p-values, type I and type II errors, and confidence intervals. Rejection of straw-man null hypotheses leads researchers to believe that their theories are supported, and the unquestioning use of a threshold such as p<0.05 has resulted in hypothesis substitution, search for subgroups, and other gaming that has badly damaged science. But we seldom examine whether the original idea of NHST actually delivered on its goal of making good decisions about effects, given the data.