It is easy to compute the sample size N1 needed to reliably estimate how one predictor relates to an outcome. It is next to impossible for a machine learning algorithm entertaining hundreds of features to yield reliable answers when the sample size < N1.
I discussed the many advantages or probability estimation over classification. Here I discuss a particular problem related to classification, namely the harm done by using improper accuracy scoring rules. Accuracy scores are used to drive feature selection, parameter estimation, and for measuring predictive performance on models derived using any optimization algorithm. For this discussion let Y denote a no/yes false/true 0/1 event being predicted, and let Y=0 denote a non-event and Y=1 the event occurred.
Classification combines prediction and decision making and usurps the decision maker in specifying costs of wrong decisions. The classification rule must be reformulated if costs/utilities change. Predictions are separate from decisions and can be used by any decision maker. The field of machine learning arose somewhat independently of the field of statistics. As a result, machine learning experts tend not to emphasize probabilistic thinking. Probabilistic thinking and understanding uncertainty and variation are hallmarks of statistics.