R - Feature Selection - Indirect Model Selection


1 - About

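In a feature selection process, once you have generated all candidate models, you have to select the best one. This article covers the indirect methods, which adjust the training error with a penalty for model size (Cp, BIC, adjusted R squared) instead of estimating the test error directly.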

3 - Model Selection

3.1 - Adjustment formula

We will select the model using Cp but, as you can see below, the summary of the regsubsets object created in the model generation step also contains the adjusted R squared and the BIC:

names(myPathOfModel.summary)
[1] "which"  "rsq"    "rss"    "adjr2"  "cp"     "bic"    "outmat" "obj"   

For each model size, the summary also records which variables the best subset model contains (the which and outmat components).

You can use this data to plot the models.
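As a minimal sketch of how the three adjustment statistics can be compared side by side: the built-in mtcars data set and the formula mpg ~ . are assumptions made for illustration, not the data used in this article. Only the object names myPathOfModel and myPathOfModel.summary come from the text above.

```r
# Illustrative only: mtcars and the formula are assumptions, not the article's data.
library(leaps)

# Model generation step: best subset of each size, up to all 10 predictors.
myPathOfModel <- regsubsets(mpg ~ ., data = mtcars, nvmax = 10)
myPathOfModel.summary <- summary(myPathOfModel)

# One value per model size for each adjustment statistic.
criteria <- data.frame(
  size  = seq_along(myPathOfModel.summary$cp),
  adjr2 = myPathOfModel.summary$adjr2,
  cp    = myPathOfModel.summary$cp,
  bic   = myPathOfModel.summary$bic
)
print(criteria)

# Adjusted R squared is maximised; Cp and BIC are minimised.
best <- c(
  adjr2 = which.max(criteria$adjr2),
  cp    = which.min(criteria$cp),
  bic   = which.min(criteria$bic)
)
print(best)
```

The three statistics do not have to agree on the same model size. BIC penalises extra variables more heavily, so it tends to pick smaller models.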


3.2 - Function

The idea here is to pick the model with the lowest Cp. You can identify it from the Cp plot or with the following function:

which.min(myPathOfModel.summary$cp)
[1] 10

In this case, the model with 10 variables has the smallest Cp.
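Once the index of the lowest Cp is known, the coefficients of that model can be read from the regsubsets object. The sketch below is hedged: it rebuilds a stand-in myPathOfModel on the built-in mtcars data (an assumption, not this article's data), so the selected size will not necessarily be 10.

```r
# Illustrative only: mtcars stands in for the article's (unknown) data set.
library(leaps)

myPathOfModel <- regsubsets(mpg ~ ., data = mtcars, nvmax = 10)
myPathOfModel.summary <- summary(myPathOfModel)

# which.min returns the position of the first minimum,
# which here is the number of variables of the selected model.
best_size <- which.min(myPathOfModel.summary$cp)

# coef() on a regsubsets object takes the model size as id
# and returns the intercept plus the selected variables.
best_coef <- coef(myPathOfModel, id = best_size)
print(best_coef)
```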

3.3 - Plot

3.3.1 - Cp

Cp is an estimate of prediction error.

plot(myPathOfModel.summary$cp,xlab="Number of Variables",ylab="Cp")

You can also plot the best point:

points(10,myPathOfModel.summary$cp[10],pch=20,col="red")
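To make the meaning of Cp more concrete, here is a sketch that computes it by hand in base R with one common form of Mallows' Cp: RSS / sigma^2 - n + 2p, where sigma^2 is estimated from the full model and p counts the coefficients including the intercept. The helper mallows_cp is hypothetical, the mtcars data is an assumption, and the exact scaling convention used inside leaps is not asserted here.

```r
# Hypothetical helper: Mallows' Cp of a candidate lm fit,
# with the error variance estimated from the full model.
mallows_cp <- function(candidate, full) {
  n      <- nobs(full)
  sigma2 <- sum(residuals(full)^2) / df.residual(full)
  rss    <- sum(residuals(candidate)^2)
  p      <- length(coef(candidate))  # coefficients, intercept included
  rss / sigma2 - n + 2 * p
}

# Illustrative data and models (assumptions, not the article's).
full  <- lm(mpg ~ ., data = mtcars)
small <- lm(mpg ~ wt + hp, data = mtcars)

cp_small <- mallows_cp(small, full)
cp_full  <- mallows_cp(full, full)

print(c(small = cp_small, full = cp_full))
```

With this form, the Cp of the full model equals its number of coefficients by construction. A smaller model is attractive when the increase in RSS is small compared with the penalty saved by dropping variables.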

3.3.2 - Cp Model by variables

plot(myPathOfModel,scale="Cp")

This plot gives a quick summary of all the models by variables, as opposed to just seeing the Cp statistics.

  • each row is one model, labelled on the y axis with its Cp value; the rows are ordered from the best model (smallest Cp) at the top to the worst at the bottom (small is good)
  • the variables are on the x axis
  • a black square indicates that the variable is in the model (one) and a white square indicates that it is out (zero)
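The same in/out information shown by the squares is available as a logical matrix in the summary object (the which component). A minimal sketch, again assuming the built-in mtcars data as a stand-in for the data of this article:

```r
# Illustrative only: mtcars stands in for the article's (unknown) data set.
library(leaps)

myPathOfModel <- regsubsets(mpg ~ ., data = mtcars, nvmax = 10)
myPathOfModel.summary <- summary(myPathOfModel)

# Order the models by Cp, lowest (best) first.
ord <- order(myPathOfModel.summary$cp)

# TRUE means the variable is in the model, FALSE means it is out.
in_out <- myPathOfModel.summary$which[ord, ]
rownames(in_out) <- round(myPathOfModel.summary$cp[ord], 2)
print(in_out)

# Variables (and intercept) of the model with the lowest Cp.
best_vars <- names(which(in_out[1, ]))
print(best_vars)
```

Reading the matrix row by row gives the same answer as reading the plot, which is handy when there are too many variables for the squares to be legible.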
lang/r/model_selection_indirect.txt · Last modified: 2017/02/12 21:51 by 66.249.69.140