Data Mining - (Attribute|Feature) (Selection|Importance)
1 - About
Feature selection is the second class of dimension reduction methods. It is used to reduce the number of predictors used by a model by selecting the best d predictors among the original p predictors.
This allows for smaller, faster-scoring, and more meaningful models, such as Generalized Linear Models (GLM).
Feature selection techniques are often used in domains where there are many features and comparatively few samples (or data points).
Feature selection is also useful as part of the data analysis process, as it shows which features are important for prediction, and how these features are related.
Which variables are important to include in the model?
When we have a small number of features, the model becomes more interpretable.
Feature selection is a way of choosing among features to find the ones that are most informative.
We'd like to fit a model that has all the good (signal) variables and leaves out the noise variables.
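As a minimal sketch of this idea (the correlation-based scoring rule and the toy data are assumptions, not a prescribed method), one can rank features by how strongly they correlate with the target and keep only the top d, so the signal variable is retained and the noise variable is dropped:

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def select_top_d(X, y, d):
    """Rank the p feature columns by |correlation| with the target
    and return the indices of the best d features."""
    scores = [abs(pearson(col, y)) for col in X]
    ranked = sorted(range(len(X)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:d])

# Toy data: feature 0 is signal (y is roughly 2 * x0), feature 1 is noise.
x0 = [1.0, 2.0, 3.0, 4.0, 5.0]
x1 = [0.3, -1.2, 0.8, -0.5, 0.1]
y = [2.1, 4.2, 5.9, 8.1, 10.0]
print(select_top_d([x0, x1], y, d=1))  # the signal feature should win
```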
2 - Articles Related
3 - Assumption
The central assumption when using a feature selection technique is that the data contains many redundant or irrelevant features.
4 - Procedure
We have access to p predictors, but we want a simpler model that involves only a subset of them. This model selection is made in two steps:
- model generation: for each number of predictors k, generate candidate models and keep the best one
- model selection: among the best models (one for each k), choose the final model based on some criterion that balances training error with model size.
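The two steps above can be sketched with a toy best-subset search. BIC as the balancing criterion and the simulated data are assumptions made for illustration:

```python
import itertools
import numpy as np

def best_subset(X, y):
    """Step 1: for each size k, fit every subset of predictors by least
    squares and keep the one with the lowest RSS.
    Step 2: compare the best models across k with BIC, which balances
    training error against model size."""
    n, p = X.shape
    best_per_k = {}
    for k in range(1, p + 1):
        for subset in itertools.combinations(range(p), k):
            Xs = X[:, subset]
            beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
            rss = float(np.sum((y - Xs @ beta) ** 2))
            if k not in best_per_k or rss < best_per_k[k][1]:
                best_per_k[k] = (subset, rss)
    # BIC = n * log(RSS / n) + k * log(n): lower is better.
    def bic(item):
        k, (subset, rss) = item
        return n * np.log(rss / n) + k * np.log(n)
    _, (subset, _) = min(best_per_k.items(), key=bic)
    return subset

rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 4))
y = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(scale=0.5, size=n)
chosen_subset = best_subset(X, y)
print(chosen_subset)  # the two signal columns should be selected
```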
4.1 - Models Generation
4.1.1 - Subset selection
All the methods below take a subset of the predictors and use least squares to fit the model.
Subset selection comes in two flavours for generating the models with k predictors:
- best subset selection
- stepwise selection (forward or backward)
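One of these flavours, forward stepwise selection, can be sketched as follows (the greedy RSS rule and the simulated data are assumptions for illustration):

```python
import numpy as np

def forward_stepwise(X, y, k):
    """Greedy forward selection: start from the empty model and repeatedly
    add the predictor that most reduces the least-squares RSS, until k
    predictors have been chosen."""
    n, p = X.shape
    chosen = []
    for _ in range(k):
        best_j, best_rss = None, np.inf
        for j in range(p):
            if j in chosen:
                continue
            Xs = X[:, chosen + [j]]
            beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
            rss = float(np.sum((y - Xs @ beta) ** 2))
            if rss < best_rss:
                best_j, best_rss = j, rss
        chosen.append(best_j)
    return chosen

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(scale=0.5, size=100)
chosen_path = forward_stepwise(X, y, k=2)
print(chosen_path)  # the two signal columns should be picked first
```

Unlike best subset selection, this fits only O(p * k) models instead of all 2^p subsets, at the cost of a greedy (possibly suboptimal) search path.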
4.1.2 - Shrinkage
The shrinkage methods take all of the predictors, but use a shrinkage approach to fit the model instead of least squares.
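A minimal sketch of one shrinkage method, ridge regression; the closed-form solve and the simulated data are assumptions made for illustration:

```python
import numpy as np

def ridge(X, y, lam):
    """Ridge regression: least squares plus an L2 penalty lam * ||beta||^2.
    All predictors stay in the model, but a larger lam shrinks the
    coefficient vector toward zero."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = X @ np.array([4.0, 0.0, -2.0]) + rng.normal(scale=0.1, size=50)
b_small = ridge(X, y, lam=0.01)   # close to the least-squares fit
b_large = ridge(X, y, lam=100.0)  # heavily shrunk coefficients
```

The coefficient vector's norm decreases as the penalty grows, which is the sense in which shrinkage trades a little bias for lower variance.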
4.1.3 - Dimension reduction
These methods use least squares not on the original predictors but on new predictors, which are linear combinations of the original predictors.
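One such method, principal components regression, can be sketched as follows (the SVD-based construction and the simulated data are assumptions for illustration):

```python
import numpy as np

def pcr(X, y, m):
    """Principal components regression: build m new predictors as linear
    combinations of the originals (the top principal components), then
    fit them by least squares and map the fit back to the original
    predictor space."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:m].T          # scores on the first m components
    theta, *_ = np.linalg.lstsq(Z, y - y.mean(), rcond=None)
    beta = Vt[:m].T @ theta    # coefficients on the original predictors
    return beta

rng = np.random.default_rng(2)
X = rng.normal(size=(80, 5))
y = X @ np.array([2.0, 0.0, 0.0, -1.0, 0.0]) + rng.normal(scale=0.2, size=80)
beta_2 = pcr(X, y, m=2)    # reduced model on 2 components
beta_all = pcr(X, y, m=5)  # with all components, PCR matches least squares
```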