Navigating Statistical Modeling and Machine Learning

data-science
machine-learning
prediction
2018
This article elaborates on Frank Harrell’s post, which provides guidance on choosing between machine learning and statistical modeling for a prediction project.
Affiliation: GoodScience, Inc.

Published: May 14, 2018

… the art of data analysis is about choosing and using multiple tools.
  — Regression Modeling Strategies, p. vii

Frank Harrell’s post, Road Map for Choosing Between Statistical Modeling and Machine Learning, does us the favor of providing a contrast of statistical modeling (SM) and machine learning (ML) in terms of fundamental attributes (signal:noise and data requirements, dependence on assumptions and structure, interest in “special” parameters, accounting of uncertainties, and predictive accuracy). This is a clarifying perspective. Despite the prevalent conflation of SM and ML under the rubric of ‘data science’, Frank’s post underscores that SM and ML are different in important ways, and the individual considerations in this contrast should assist us in making deliberate decisions about when and how to apply one approach or the other. This cogent set of criteria helps us select tools that are fit-for-purpose and serve our particular ends with the best means. Getting clarity about what our real ends are might be the harder part.

To extend the analogy, the guideposts identified by Frank could be illustrated as a route map by casting them as a series of junctures (and termini). Here is an example (a toy sketch in code follows the list):

  1. Do you want to isolate the effect of special variables or have an interpretable model? If yes, turn left toward SM; if no, keep driving …
  2. Is your sample size less than huge? If yes, park in the space designated “SM”; if no, …
  3. Is your signal:noise low? If yes, take the ramp toward “SM”; if no, …
  4. Is there interest in estimating the uncertainty in forecasts? If yes, merge into SM lane; if no, …
  5. Is non-additivity/complexity expected to be strong? If yes, gun the pedal toward ML; if no, … you can continue the journey with SM.
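
To make the junctures concrete, here is a deliberately simplistic sketch of the route map as code. It only encodes the ordering of the questions above; the function name, argument names, and the strict first-yes-wins sequencing are assumptions of this sketch, not an algorithm from Frank’s post.

```python
# A toy encoding of the route map above. The questions and their order come
# from the numbered list; everything else is an illustrative assumption.
def route_map(isolate_effects_or_interpret: bool,
              sample_size_less_than_huge: bool,
              low_signal_to_noise: bool,
              want_forecast_uncertainty: bool,
              strong_nonadditivity_expected: bool) -> str:
    if isolate_effects_or_interpret:        # juncture 1: turn left toward SM
        return "SM"
    if sample_size_less_than_huge:          # juncture 2: park in the "SM" space
        return "SM"
    if low_signal_to_noise:                 # juncture 3: take the ramp toward SM
        return "SM"
    if want_forecast_uncertainty:           # juncture 4: merge into the SM lane
        return "SM"
    if strong_nonadditivity_expected:       # juncture 5: gun the pedal toward ML
        return "ML"
    return "SM"                             # otherwise, continue the journey with SM

# Example: huge n, high signal:noise, no need for uncertainty estimates,
# strong non-additivity expected -> ML
print(route_map(False, False, False, False, True))  # prints "ML"
```

Any real project would, of course, weigh these questions jointly rather than as a strict sequence of exits.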

This allegorical cartoon is simplistic: the situation is certainly much more nuanced than this. But it is more systematic thinking than is often employed (such as ‘I have lots of data, therefore ML’). There are other maps that people could draw, and other junctures to consider. The route illustrated above is intended to encourage others to plot a course thoughtfully. And the allegory is certainly narratively thin: there are surprises lurking in the landscape along the highway.

Frank’s contrast between SM and ML exposes an essential question: “who/what is actually learning?” For the most part, in ML only the machine is learning. Little or no understanding escapes from the black box into human knowledge, which means that ML is purely instrumental. In some ways ML is like operant conditioning, or the automatic System 1 thinking process in humans (Kahneman’s Thinking, Fast and Slow): both take in information and produce behavioral outputs, but operate below the level of conscious awareness. The machine is largely ‘dumb’ and cannot tell you very well what it has learned; nor can it be aware of when and how it may be fundamentally wrong. While ML can serve many purposes, there are potential risks and costs associated with mechanical opacity (viz., machine trading).

The Scientific Method demonstrates that you can use controlled experiments and strong claims to understand causation and predict events. SMs somewhat blur the boundary between rigorous causal understanding and purely instrumental utility: they are predictive tools that are also comprehensible to humans (Inference to the Best Explanation), and the random component of the model represents the sources of uncontrolled variation. ML shows that if you relax the first two requirements to weak claims, you can still predict events, but perhaps not understand them [special thanks to Garrett Grolemund for his thinking and language about these issues]. We have seen how ML vs. SM can be reframed as positions on a spectrum (e.g., the spectrum of human intermediation in Big Data and Machine Learning in Health Care, by AL Beam and IS Kohane). This suggests yet another spectrum:

Experimental Science → clear causal understanding and predictions
Statistical Models → understanding that holds under a set of assumptions, and supplies predictions and uncertainty estimates
Machine Learning → predictions

This spectrum invites consideration of the various ways we use predictions: from corroborating or refuting theory (as in the scientific method), to calibrating fit and positing structure, to utilitarian prognostication.

For several medical applications a black-box prediction tool would appear to be entirely suitable, such as reading pathology, predicting treatment non-adherence, or some high-complexity non-linear systems-biology problems. Predicting accurately in such applications may be entirely enough, whether or not you know why the predictions are accurate. We don’t need to be mechanics to have a car get us to our destinations. In this sense, ML may best be construed literally as a ‘tool’ in the instrumental sense: a form of augmentation of human effective capacity. Generically, AI/ML is, to date, primarily about building systems that address a discrete and specific problem by processing enormous volumes of data and providing answers to highly structured questions in an automated way, very quickly.

So it may be that one of the first forks in the road map for choosing between ML and SM should be whether or not you want to claim to be doing formal science. For the endeavor to be scientific, you have to have, and empirically assess, hypotheses or theories about how some aspect of the world works; such hypotheses are minimal or absent in ML. If learning, in the sense of accruing knowledge about how the world works, is not a predicate of ML, then however highly technical ML may be, it should not be misconstrued as scientific. Despite being a central feature of the current Data Science meme, ML should surrender any pretensions to being science. But it is a potentially highly effective technology.

This reasoning also exposes an obverse issue in how SM is sometimes used in medicine. While SM provides prediction based on evaluation of specific hypotheses about nature, it is very frequently used to rationalize a simplistic heuristic approach to clinical decision making, inadvertently forsaking the full probabilistic information available for the decision. Ultimately, real-world medical decision-making is a forecast: conditional on a set of premises provided in data, it is a prediction about what course of action is likely to yield the best result, especially for individual patient-level decision-making (e.g., Precision Medicine, Personalized Medicine). Traditional rigorous causal inference has led to a reductionist focus on particular independent effects and has encouraged a selective focus on a limited set of terms on the right-hand side (r.h.s.) of the model equation. With SM a prevalent tendency is to focus, after adjustment, on selected variables and just use these ‘risk factors’. Frequently, only categorical classes of the selected variables are used in making decisions about care, further reducing them to heuristics for decision making, much as we tend to use p-values as facile surrogates for richer evidence. This is also similar to promoting the value of a new biomarker that, in isolation, provides less information than the basic clinical data already available. We have a strong tendency to reduce the information for decisions to singular, simple binary inputs. This is entropic dissipation of information, due largely to our stubborn preference for cognitive ease in decision making.
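
A small simulation can make the cost of this reduction tangible. The sketch below is my own illustration, not anything from Frank’s post: it fits a full probabilistic model on two simulated continuous predictors and compares it with a heuristic that dichotomizes a single ‘risk factor’. The data-generating model, the age ≥ 65 cut-point, and all variable names are assumptions chosen only to show the direction of the effect.

```python
# Illustration of the information lost when a full probabilistic model is
# replaced by a single dichotomized "risk factor" (all quantities simulated).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, brier_score_loss

rng = np.random.default_rng(42)
n = 5000
age = rng.normal(60, 10, n)              # continuous predictor
biomarker = rng.normal(1.0, 0.3, n)      # second continuous predictor

# Assumed true risk depends smoothly on both predictors
logit = -8.0 + 0.08 * age + 2.0 * biomarker
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

# Full probabilistic model: both predictors kept on their original scales
X_full = np.column_stack([age, biomarker])
full = LogisticRegression(max_iter=1000).fit(X_full, y)
p_full = full.predict_proba(X_full)[:, 1]

# Heuristic: dichotomize one "risk factor" (age >= 65) and discard the rest
x_cut = (age >= 65).astype(float).reshape(-1, 1)
cut = LogisticRegression(max_iter=1000).fit(x_cut, y)
p_cut = cut.predict_proba(x_cut)[:, 1]

# In-sample comparison on simulated data, purely to show the direction of the effect
print("AUC   full model:             ", round(roc_auc_score(y, p_full), 3))
print("AUC   dichotomized heuristic: ", round(roc_auc_score(y, p_cut), 3))
print("Brier full model:             ", round(brier_score_loss(y, p_full), 3))
print("Brier dichotomized heuristic: ", round(brier_score_loss(y, p_cut), 3))
```

On simulated data like this, the dichotomized heuristic typically shows markedly worse discrimination and calibration than the full model, which is exactly the entropic dissipation of information described above.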

Models that make accurate predictions of responses for future observations, by incorporating the relevant information for decision making, perform the correct calculus of integrating information and provide the correct output for informing decisions: explicit probability and uncertainty estimates (the left-hand side of the equation, l.h.s.). You will hear remarks that reflect resistance to probability-based clinical decision making: complaints that probabilities are too complex, and objections that emphasize what physicians want or need. I think this objective is misplaced at a fundamental level. The correct objective and focus is what ultimately leads to the best outcomes for patients. This should not be about how to make things easy for physicians; it is about finding and adopting the best process for decision making that serves the interests of patients, no matter how difficult, awkward, or inconvenient for physicians. I have sympathy for clinicians: they are only human, with limited cognitive capacity (information bandwidth) like the rest of us. Because thinking consumes our limited energy, human cognition is prone to take the path of least resistance. And we are generally entirely unaware of this as we are doing it. Cognitive laziness is built deep into our nature. But the real value to be served in clinical decision making is the quality of care and outcomes for patients. Where individual patients are involved, rich multi-variable information rigorously integrated for individual patient-level decision-making leads to much greater acuity in predicting the consequences of health care actions, and ultimately to better decisions and outcomes.

Well-formulated real-world posterior conditional probabilities (i.e., the l.h.s.) are high-value information about both potential outcomes and uncertainties. Decision-making based on the l.h.s. maps observations to actions, better informs effective care-related decisions, and potentially improves outcomes for patients. Paradoxically, while we may have learned something specific and scientific from the data with SM, we are often not using the predictive capacity of SM (the l.h.s.) optimally either. Prediction generalizes estimation and, to some extent, hypothesis testing (Regression Modeling Strategies, p. 1); for SM, as for ML, overall prediction remains a major goal.
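
One standard way to make the mapping from observations to actions explicit is a decision-theoretic threshold on the predicted probability: act when expected benefit exceeds expected harm. The sketch below is my illustration of that idea, not anything from the original post; the utilities (“benefit”, “harm”) and the numbers are hypothetical.

```python
# Mapping a predicted probability to an action with an explicit trade-off.
# "benefit" is the assumed utility of treating a patient who will have the
# event; "harm" is the assumed cost of treating one who will not.
def decide(p_event: float, benefit: float, harm: float) -> str:
    """Treat iff p*benefit > (1 - p)*harm, i.e. iff p > harm / (harm + benefit)."""
    threshold = harm / (harm + benefit)
    return "treat" if p_event > threshold else "do not treat"

# With harm 1 and benefit 9, the action threshold is 0.10
print(decide(p_event=0.15, benefit=9.0, harm=1.0))  # "treat"
print(decide(p_event=0.05, benefit=9.0, harm=1.0))  # "do not treat"
```

The point is that the decision rule consumes the full probability rather than a pre-dichotomized risk factor, and the threshold carries the clinical trade-off explicitly.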

Our tendency toward cognitive ease (our allergy to complexity) may explain part of the sex appeal of ML: the allure of outsourcing cognitive effort to the machine. The perceived value of our technology is in removing difficulty and uncertainty from our lives; this is a source of its seductive power. Part of what is attractive about ML is that it appears to absolve humans of the need to think hard, and that solutions will appear out of the machine ‘automagically’. ML appeals to our bias for cognitive ease and risks a beguiling “magical thinking” (a term I borrowed from What Algorithms Want by Ed Finn). There is a prevalent fantasy about “the killer app”, and how it will liberate us from our cognitive limitations and the effort of hard thought. And this “killer app” fantasy (in combination with our lazy thinking) reinforces the notion that success is all about the technology: about the algorithm.

Judging from the prevalence of articles and advertisements in the vocational literature and the lay press, the requirement of ML experience in job postings, the emphasis on ML at professional meetings, and so on, you might think that SM has gone the way of the horse-and-buggy, or is an endangered species occupying a precarious ecological niche. But while in this epoch we are carried along in a tsunami of data, and ML requires big data, it does not follow that doing ML is now obligatory. We need to think more carefully than that. An important initial reflection should be on the temptation to do ‘Big Data Science’ for the sake of doing ‘Big Data Science’. This is a prevalent confusion of means and ends: solutions in search of a problem. It confuses instruments with objectives. While there are many useful technologies, wisdom resides in knowing which to use, and when to (and not to) use them. True value is in the quality of the results, not in just being able to claim pride of place on the Data Science bandwagon. Notwithstanding the rare lucky shot, arbitrary applications of a technology more often than not have underwhelming results. “Give somebody a hammer, and he will treat everything as a nail” very often leads to “This hammer is no good at pounding this screw!” There are many and diverse sources of knowledge about individual statistical methods and applications, but “… the art of data analysis is about choosing and using multiple tools” (Regression Modeling Strategies, p. vii). True value will emerge from the judicious and appropriate application of tools for settled purposes. This is where the road map for choosing between ML and SM is useful.

The issue of a false dichotomy is moot: ML and SM are different. A better question may be: are there conditions and ways in which ML and SM can be complementary for specific purposes? Are there ways they can be combined? Are they compatible within the domain of modern applied practice? In the general domain of practice, SM and ML fully displace one another only from a perspective of chauvinistic zero-sum domination. They only appear to compete when their respective advantages under specific conditions and for specific purposes are not understood, that is, under conditions of prejudice or incomplete understanding. Frank’s roadmap does much to resolve this.