R - Feature Selection - Indirect Model Selection


About

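In a feature selection process, once you have generated all candidate models, you have to select the best one. This article covers the indirect methods, which adjust the training error for the model size (Cp, BIC, adjusted R squared) instead of estimating the test error directly.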

Model Selection

Adjustment formula

We will select the model using Cp, but as you can see below, the summary of the regsubsets object created in the model generation step also contains the adjusted R squared and the BIC:

names(myPathOfModel.summary)
[1] "which"  "rsq"    "rss"    "adjr2"  "cp"     "bic"    "outmat" "obj"   
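If you do not have the object from the model generation step at hand, the following minimal sketch shows how such a summary can be produced. It assumes that the leaps package is installed. The mtcars data set, the formula and the nvmax value are illustrative choices, not the ones used in this article.

```r
library(leaps)  # provides regsubsets()

# Assumption: mtcars with mpg as the response stands in for the article's data.
# nvmax = 10 lets the search go up to all 10 predictors (the default is 8).
myPathOfModel <- regsubsets(mpg ~ ., data = mtcars, nvmax = 10)
myPathOfModel.summary <- summary(myPathOfModel)

# Components of the summary
print(names(myPathOfModel.summary))

# The three indirect criteria side by side, one row per model size
print(data.frame(
  variables = seq_along(myPathOfModel.summary$cp),
  cp        = myPathOfModel.summary$cp,
  bic       = myPathOfModel.summary$bic,
  adjr2     = myPathOfModel.summary$adjr2
))
```

Each vector has one entry per model size, which is why it can be indexed by the number of variables.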

The summary holds these statistics for the best subset model of each size. You can use this data to plot the models.

Function

The idea here is to pick the model with the lowest Cp. You can identify it on the Cp plot or with the which.min function:

which.min(myPathOfModel.summary$cp)
[1] 10

In this case, the model with 10 variables has the smallest Cp.
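The same lookup works for the other two criteria of the summary. The sketch below is only an illustration: it assumes the leaps package and uses the built-in mtcars data set rather than this article's data, so the selected sizes will differ from the 10 shown above.

```r
library(leaps)

# Assumption: illustrative data, not the article's
myPathOfModel <- regsubsets(mpg ~ ., data = mtcars, nvmax = 10)
myPathOfModel.summary <- summary(myPathOfModel)

# Best model size under each criterion
best <- c(
  cp    = which.min(myPathOfModel.summary$cp),     # lower is better
  bic   = which.min(myPathOfModel.summary$bic),    # lower is better
  adjr2 = which.max(myPathOfModel.summary$adjr2)   # higher is better
)
print(best)

# Coefficients (intercept included) of the model chosen by Cp
print(coef(myPathOfModel, id = best[["cp"]]))
```

The criteria do not have to agree: BIC penalizes the number of variables more heavily and tends to pick smaller models.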

Plot

Cp

Cp is an estimate of the prediction error: it corrects the training error with a penalty that grows with the number of variables.
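To make the penalty concrete, here is a sketch that computes Cp by hand in base R with one common form of Mallows' formula, RSS / sigma2 - n + 2p, where sigma2 is the error variance estimated from the full model and p counts the coefficients (intercept included). The data set, the two models and the helper name mallows_cp are assumptions made for the illustration. Other texts scale the statistic differently, for example (RSS + 2 d sigma2) / n, which ranks the models in the same order.

```r
# Assumption: mtcars and these two models are illustrative only.
full  <- lm(mpg ~ ., data = mtcars)         # all 10 predictors
small <- lm(mpg ~ wt + hp, data = mtcars)   # a candidate subset

n      <- nrow(mtcars)
sigma2 <- sum(resid(full)^2) / df.residual(full)  # error variance from the full model

# Hypothetical helper: Mallows' Cp of a fitted lm model
mallows_cp <- function(model) {
  rss <- sum(resid(model)^2)     # training error
  p   <- length(coef(model))     # number of coefficients, intercept included
  rss / sigma2 - n + 2 * p       # training error plus a penalty on the size
}

print(mallows_cp(small))
print(mallows_cp(full))   # equals the number of coefficients of the full model
```

With this form, the full model always gets a Cp equal to its number of coefficients, so a smaller model is preferred only when its Cp falls below that value.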

plot(myPathOfModel.summary$cp,xlab="Number of Variables",ylab="Cp")

You can also highlight the best point (here the model with 10 variables):

points(10,myPathOfModel.summary$cp[10],pch=20,col="red")
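The index of the best point does not have to be hard-coded. The following sketch computes it with which.min before plotting. It assumes the leaps package and uses the built-in mtcars data set as a stand-in for this article's data.

```r
library(leaps)

# Assumption: illustrative data, not the article's
myPathOfModel <- regsubsets(mpg ~ ., data = mtcars, nvmax = 10)
myPathOfModel.summary <- summary(myPathOfModel)

# Model size with the lowest Cp
best <- which.min(myPathOfModel.summary$cp)

# Cp against the number of variables, with the minimum highlighted in red
plot(myPathOfModel.summary$cp, xlab = "Number of Variables", ylab = "Cp", type = "b")
points(best, myPathOfModel.summary$cp[best], pch = 20, col = "red")
```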


Cp Model by variables

plot(myPathOfModel,scale="Cp")


This plot gives a quick summary of all the models by variables, as opposed to just seeing the Cp statistics.

  • each row is one model, labelled on the y axis with its Cp value; the rows are ordered from the best model (lowest Cp) at the top to the worst at the bottom (small is good)
  • the variables are on the x axis
  • a black square indicates that the variable is in the model and a white square indicates that it is out
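The information drawn by this plot is also available as a table in the which component of the summary. The sketch below prints it with the rows ordered by Cp. It assumes the leaps package and the built-in mtcars data set as a stand-in for this article's data; the names inOut and byCp are only illustrative.

```r
library(leaps)

# Assumption: illustrative data, not the article's
myPathOfModel <- regsubsets(mpg ~ ., data = mtcars, nvmax = 10)
myPathOfModel.summary <- summary(myPathOfModel)

# Logical matrix: one row per model size, one column per variable,
# TRUE when the variable is in the model
inOut <- myPathOfModel.summary$which

# Row order from the lowest (best) Cp to the highest
byCp <- order(myPathOfModel.summary$cp)

print(data.frame(
  cp = round(myPathOfModel.summary$cp[byCp], 2),
  inOut[byCp, ],
  check.names = FALSE
))
```

Each TRUE corresponds to a filled square of the plot and each FALSE to a white one.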




