Statistical Learning - Prediction Error (Training versus Test)
1 - About
In general, the more data you have (that is, the bigger the sample size), the more information is available and the lower the prediction error tends to be.
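A small simulation can make the sample-size claim concrete. It is a sketch under stated assumptions, none of which come from this article: data drawn from a normal distribution with mean 5 and standard deviation σ = 1, and a deliberately simple model that predicts the training-set mean. For this setup the expected test mean squared error (MSE) is σ²(1 + 1/n), so it should shrink toward σ² = 1 as the sample size n grows. The function name `average_test_error` is hypothetical.

```python
import random
import statistics

def average_test_error(n, reps=1000, n_test=100, seed=0):
    """Average test MSE of the 'predict the training mean' model fitted on n points.

    Hypothetical data-generating process: y ~ Normal(mean=5, sd=1).
    """
    rng = random.Random(seed)
    errors = []
    for _ in range(reps):
        train = [rng.gauss(5, 1) for _ in range(n)]
        prediction = statistics.fmean(train)              # the fitted "model"
        test = [rng.gauss(5, 1) for _ in range(n_test)]   # fresh data the model never saw
        errors.append(statistics.fmean((y - prediction) ** 2 for y in test))
    return statistics.fmean(errors)

errors = {n: average_test_error(n) for n in (5, 20, 100, 500)}
for n, err in errors.items():
    # Theory for this setup: expected test MSE = sigma**2 * (1 + 1/n), with sigma = 1
    print(f"n={n:3d}  simulated={err:.3f}  theory={1 + 1 / n:.3f}")
```

The error falls toward 1.0, the noise variance, but not below it: more data reduces the error that comes from estimating the model, not the irreducible noise in the data.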
3 - Metrics
3.1 - Regression
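This heading does not list a metric. For regression, the usual choice (an assumption here, not stated in the section) is the mean squared error: the average of the squared differences between observed and predicted values. A minimal sketch, with a hypothetical function name:

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("expected two non-empty sequences of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

print(mean_squared_error([3.0, 5.0, 2.0], [2.5, 5.0, 4.0]))  # (0.25 + 0 + 4) / 3 ≈ 1.417
```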
3.2 - Classification
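This heading does not list a metric either. For classification, a common choice (again an assumption, not stated in the section) is the misclassification rate: the fraction of cases whose predicted label differs from the true one. A minimal sketch, with a hypothetical function name:

```python
def misclassification_rate(y_true, y_pred):
    """Fraction of cases whose predicted label differs from the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("expected two non-empty sequences of equal length")
    # True counts as 1 and False as 0 when summed
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)

print(misclassification_rate(["spam", "ham", "ham", "spam"],
                             ["spam", "ham", "spam", "spam"]))  # 1 of 4 wrong: 0.25
```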
4 - Type
4.1 - Training
Training error is the error we get when we apply the model to the same data it was trained on.
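As a minimal sketch, training error is obtained by fitting a model and then scoring it on the very points it was fitted to. The data-generating process (y = 1 + 2x plus standard normal noise), the least-squares line as the model, and the helper names `fit_line` and `mse` are assumptions for illustration only.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def mse(a, b, xs, ys):
    """Mean squared error of the line y = a + b*x on the points (xs, ys)."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(1)
train_x = [rng.uniform(0, 1) for _ in range(30)]
train_y = [1 + 2 * x + rng.gauss(0, 1) for x in train_x]  # hypothetical process

a, b = fit_line(train_x, train_y)
training_error = mse(a, b, train_x, train_y)  # scored on the same data used for fitting

# Least squares picks the line with the smallest squared error on these very points,
# so even the true line (known here only because the data is simulated) cannot beat it.
true_line_error = mse(1, 2, train_x, train_y)
print(f"training MSE of fitted line:         {training_error:.3f}")
print(f"MSE of the true line, same points:   {true_line_error:.3f}")
```

Because the fitted line is tuned to these 30 points, its training error is an optimistic measure of how it will perform on points it has not seen.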
4.2 - Test
Test error is the error that we incur on new data. It measures how well the model will actually do on future data it hasn't seen.
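When no genuinely new data is available, a common way to estimate test error is to hold out part of the data before fitting. This sketch assumes simulated data (y = 1 + 2x plus standard normal noise), a least-squares line as the model, and a 70/30 split; the split ratio and helper names are illustrative choices, not prescriptions.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

rng = random.Random(2)
xs = [rng.uniform(0, 1) for _ in range(100)]
ys = [1 + 2 * x + rng.gauss(0, 1) for x in xs]  # hypothetical process

# Hold out 30 randomly chosen points; they play the role of "future" data.
idx = list(range(len(xs)))
rng.shuffle(idx)
test_idx, train_idx = idx[:30], idx[30:]

# Fit on the training part only: the held-out points are never seen during fitting.
a, b = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])

def mse(indices):
    """Mean squared error of the fitted line on the points with the given indices."""
    return sum((ys[i] - (a + b * xs[i])) ** 2 for i in indices) / len(indices)

train_error = mse(train_idx)
test_error = mse(test_idx)
print(f"training MSE: {train_error:.3f}   test MSE: {test_error:.3f}")
```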
5 - Training vs Test
Training error almost always UNDERestimates test error, sometimes dramatically.
The gap is largest when the model is very complex relative to the size of the training set; when the model is not very complex, training error is a fairly good estimate of test error. Sampling luck also plays a role: if, by chance, the test set contains fewer hard-to-predict points than the training set (or the training set contains more of them), the test error can even come out LESS than the training error.
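Both effects can be sketched under assumptions that are not from the text: simulated data from y = 1 + 2x plus standard normal noise; 1-nearest-neighbour regression as the "very complex" model (it memorises the 20 training points, so its training error is exactly zero when the x values are distinct); and a least-squares line as the simple model. The last part repeats a small experiment many times to show that, with only 5 test points, test error can land below training error purely by chance. All helper names are hypothetical.

```python
import random

def sample(n, rng):
    """Hypothetical process: x ~ Uniform(0, 1), y = 1 + 2x + Normal(0, 1) noise."""
    xs = [rng.uniform(0, 1) for _ in range(n)]
    return xs, [1 + 2 * x + rng.gauss(0, 1) for x in xs]

def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

def line_model(a, b):
    """Prediction function for the line y = a + b*x."""
    return lambda x: a + b * x

def mse(predict, xs, ys):
    """Mean squared error of a prediction function on the points (xs, ys)."""
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(3)
train_x, train_y = sample(20, rng)
test_x, test_y = sample(2000, rng)

def one_nn(x):
    """Very complex model: 1-nearest-neighbour regression memorises all 20 points."""
    j = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[j]

# Simple model: a straight line has only 2 parameters for 20 points.
line = line_model(*fit_line(train_x, train_y))

results = {}
for name, model in (("1-NN", one_nn), ("line", line)):
    results[name] = (mse(model, train_x, train_y), mse(model, test_x, test_y))
    print(f"{name:4s}  train MSE={results[name][0]:.3f}  test MSE={results[name][1]:.3f}")

# Chance: with only 5 test points, test error can land below training error.
below = 0
for _ in range(500):
    tx, ty = sample(30, rng)
    model = line_model(*fit_line(tx, ty))
    sx, sy = sample(5, rng)
    if mse(model, sx, sy) < mse(model, tx, ty):
        below += 1
print(f"test MSE < training MSE in {below} of 500 trials")
```

The 1-NN training error of exactly 0 says nothing about its test error, whereas the line's training and test errors should sit much closer together. The exact count in the last line depends on the seed.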