Data Mining - Test Set


About

The test set is a set of data, kept apart from model building, that is used to validate (evaluate) the model.

The test set represents the foresight (unseen, real data), whereas the training set represents the hindsight.

Generally, the test data is created during the model-building phase through resampling methods, such as a random percentage split.
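As an illustration of one such resampling method, the random percentage split (holdout), here is a minimal sketch using only the Python standard library. The function name `percentage_split`, the 75/25 default and the fixed seed are illustrative assumptions, not part of the original text.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def percentage_split(rows: Sequence[T], train_fraction: float = 0.75,
                     seed: int = 42) -> Tuple[List[T], List[T]]:
    """Hypothetical helper: randomly hold out (1 - train_fraction) of the rows as a test set."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible split
    cut = int(len(shuffled) * train_fraction)
    # Rows before the cut build the model; the remaining rows are never shown to it.
    return shuffled[:cut], shuffled[cut:]


train, test = percentage_split(range(100))
print(len(train), len(test))  # 75 25
```

Every row lands in exactly one of the two sets, so the test rows stay unseen while the model is built.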

You should not use any information about the class values in the test set within the learning method. Otherwise, the model has already seen the test data and captured its information, and the test error will be artificially improved: it becomes an optimistically biased estimate that understates the error on genuinely new data.
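To make the effect concrete, here is a minimal sketch, assuming a toy 1-nearest-neighbour classifier on synthetic one-dimensional data with 20% label noise. The helpers `predict_1nn` and `error_rate` and the data generator are hypothetical, chosen only for illustration. Training on the training rows alone gives an honest test error; also feeding the test rows (with their class values) into training lets the model memorize them, and the test error drops to zero.

```python
import random
from typing import List, Tuple

Row = Tuple[float, int]  # (feature value, class label); hypothetical toy format


def predict_1nn(train: List[Row], x: float) -> int:
    # Return the label of the closest training point (1-nearest neighbour).
    return min(train, key=lambda row: abs(row[0] - x))[1]


def error_rate(train: List[Row], test: List[Row]) -> float:
    # Fraction of test rows whose predicted label differs from the true label.
    wrong = sum(predict_1nn(train, x) != y for x, y in test)
    return wrong / len(test)


rng = random.Random(0)
# Synthetic data (assumption): label is 1 when x > 0.5, but 20% of labels are flipped.
data: List[Row] = []
for _ in range(200):
    x = rng.random()
    y = int(x > 0.5)
    if rng.random() < 0.2:
        y = 1 - y
    data.append((x, y))
train, test = data[:150], data[150:]

honest = error_rate(train, test)        # model never saw the test class values
leaky = error_rate(train + test, test)  # test class values leaked into training
print(f"honest test error: {honest:.2f}")
print(f"leaky test error:  {leaky:.2f}")  # 0.00
```

The leaky figure looks perfect only because each test point's nearest neighbour is itself; it says nothing about foresight on new data.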





Discover More
Data Mining - Training (Data|Set)

In statistics, the training data is the sample, whereas in data mining and machine learning, the training data is often a subset of the data set. The training set represents the hindsight, whereas the test set...
Machine Learning - (Overfitting|Overtraining|Robust|Generalization) (Underfitting)

A learning algorithm is said to overfit if it is more accurate in fitting known data (i.e. training data, hindsight) but less accurate in predicting new data (i.e. test data, foresight). I.e. the model...
Statistics - (Discretizing|binning) (bin)

Discretization is the process of transforming numeric variables into nominal variables called bins. The created variables are nominal but are ordered (which is a concept that you will not find in true...
Statistics - Model Evaluation (Estimation|Validation|Testing)

Evaluation is how to determine if the model is a good representation of the truth. Validation applies the model to test data in order to determine whether the model, built on a training set, is generalizable...
Statistics - Resampling through Random Percentage Split

Percentage Split (Fixed or Holdout) is a resampling method that leaves out a random N% of the original data. For example, you might select 75% of the rows to form the training set for building the model...


