Data Mining - Test Set

> (Statistics|Probability|Machine Learning|Data Mining|Data and Knowledge Discovery|Pattern Recognition|Data Science|Data Analysis)

Table of Contents

1 - About

The test set is a set that is used to validate the model.

Test set represent the foresight (unknown data, real data) whereas training Set represents the hindsight.

Generally, the test data is created during the building phase through resampling methods.

You shouldn't use any information about the class values in the test set to help within a learning method, otherwise the model has already seen it and has already captured the test set information. The test error will then be badly improved.