As discussed in more detail in Section 5.3 of the Regression Modeling Strategies Course Notes and the same section of the RMS book, data splitting is an unstable method for validating models or classifiers, especially when the number of subjects is less than about 20,000 (fewer if the signal:noise ratio is high). This is because, were you to split the data again, develop a new model on the training sample, and test it on the holdout sample, the results would likely vary substantially. Data splitting requires a much larger sample size than resampling to work acceptably well. See also Section 10.11 of BBR.
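The instability is easy to see in a small simulation. The following is a minimal sketch, not taken from the RMS materials: it assumes a single-predictor linear model, 200 subjects, a 70/30 split, and R² as the accuracy measure, and all function names (`simulate`, `fit_line`, `r_squared`, `split_validation`) are hypothetical. It re-splits the same dataset 500 times and records the holdout R² each time.

```python
import random
import statistics

def simulate(n, rng, slope=0.5):
    """Simulate (x, y) pairs from y = slope * x + standard normal noise (assumed setup)."""
    xs = [rng.gauss(0, 1) for _ in range(n)]
    return [(x, slope * x + rng.gauss(0, 1)) for x in xs]

def fit_line(data):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    mx = statistics.fmean(x for x, _ in data)
    my = statistics.fmean(y for _, y in data)
    sxx = sum((x - mx) ** 2 for x, _ in data)
    sxy = sum((x - mx) * (y - my) for x, y in data)
    slope = sxy / sxx
    return my - slope * mx, slope

def r_squared(model, data):
    """R^2 of a fitted (intercept, slope) model evaluated on the given data."""
    intercept, slope = model
    my = statistics.fmean(y for _, y in data)
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in data)
    sst = sum((y - my) ** 2 for _, y in data)
    return 1 - sse / sst

def split_validation(data, rng, train_frac=0.7):
    """One random split: fit on the training part, report R^2 on the holdout part."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = int(train_frac * len(shuffled))
    train, test = shuffled[:cut], shuffled[cut:]
    return r_squared(fit_line(train), test)

rng = random.Random(1)
data = simulate(200, rng)

# Same data, same modeling procedure; only the random split changes.
estimates = [split_validation(data, rng) for _ in range(500)]
print("lowest holdout R^2: ", round(min(estimates), 3))
print("highest holdout R^2:", round(max(estimates), 3))
```

The spread between the lowest and highest holdout R² reflects nothing but the luck of the split, which is the variation a single split-sample validation hides.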
There are also subtler problems:
- When feature selection is done, data splitting validates just one of a myriad of potential models. In effect it validates an example model. Resampling (repeated cross-validation or the bootstrap) validates the process that was used to develop the model. Resampling is honest in reporting the results because it depicts the uncertainty in feature selection, e.g., the disagreements in which variables are selected from one resample to the next.
- It is not uncommon for researchers to be disappointed in the test sample validation and to ask for a "re-do" whereby another split is made or the modeling starts over, or both. When reporting the final result they sometimes neglect to mention that the result was the third attempt at validation.
- Users of split-sample validation are wise to recombine the two samples to get a better model once the first model is validated. But then they have no validation of the new model fitted to the combined data.
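To make concrete what it means for resampling to validate the process rather than one example model, here is a minimal sketch of an optimism-corrected bootstrap in which feature selection is repeated inside every resample. Everything specific in it is an assumption made for illustration: the simulated data (10 candidate predictors, of which only the first two carry signal), the deliberately crude selection rule (keep the single predictor most correlated with the outcome), R² as the accuracy index, and the hypothetical function names `simulate`, `develop`, and `r_squared`. The model is developed on all of the data, so no subjects are set aside.

```python
import random
import statistics
from collections import Counter

def simulate(n, p, rng):
    """Rows of (predictor list, y); only the first two predictors carry signal (assumed)."""
    rows = []
    for _ in range(n):
        x = [rng.gauss(0, 1) for _ in range(p)]
        y = 0.4 * x[0] + 0.3 * x[1] + rng.gauss(0, 1)
        rows.append((x, y))
    return rows

def develop(rows):
    """The whole modeling process: select one predictor, then fit it by least squares.

    Returns (selected index, intercept, slope).
    """
    ys = [y for _, y in rows]
    my = statistics.fmean(ys)
    best = None
    for j in range(len(rows[0][0])):
        xs = [x[j] for x, _ in rows]
        mx = statistics.fmean(xs)
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        score = sxy ** 2 / sxx  # proportional to the squared correlation with y
        if best is None or score > best[0]:
            slope = sxy / sxx
            best = (score, j, my - slope * mx, slope)
    return best[1], best[2], best[3]

def r_squared(model, rows):
    """R^2 of a (index, intercept, slope) model evaluated on the given rows."""
    j, intercept, slope = model
    my = statistics.fmean(y for _, y in rows)
    sse = sum((y - (intercept + slope * x[j])) ** 2 for x, y in rows)
    sst = sum((y - my) ** 2 for _, y in rows)
    return 1 - sse / sst

rng = random.Random(2)
data = simulate(100, 10, rng)

# Final model uses every subject; its apparent R^2 is optimistic.
final_model = develop(data)
apparent = r_squared(final_model, data)

selected = Counter()  # which predictor each resample chose
optimism = []
for _ in range(300):
    boot = [rng.choice(data) for _ in data]  # sample subjects with replacement
    model = develop(boot)                    # repeat selection AND fitting
    selected[model[0]] += 1
    # Optimism: accuracy on the data used for fitting minus accuracy on the original data.
    optimism.append(r_squared(model, boot) - r_squared(model, data))

corrected = apparent - statistics.fmean(optimism)
print("apparent R^2:          ", round(apparent, 3))
print("optimism-corrected R^2:", round(corrected, 3))
print("selection counts by predictor index:", dict(selected))
```

The selection counts show directly how often the resamples disagree about which variable to keep, and the corrected index applies to the model fitted on the full sample, so the recombination problem in the last bullet does not arise.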