Data Mining - Partial least squares (PLS)


1 - About

Partial least squares (PLS) is a dimension reduction method. It follows the same approach as principal components regression, but it selects the new predictors (the components) in a supervised way, that is, by looking at the response.

The PLS approach attempts to find directions (i.e. components) that help explain both:

  • the response
  • and the original predictors.

PLS looks for directions along which the original predictors vary and which are also related to the response.
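To make the supervised versus unsupervised distinction concrete, here is a minimal NumPy sketch. The synthetic data, the variable names (`w_pca`, `w_pls`) and the choice to standardize the predictors are assumptions made for illustration only. It compares the first principal component direction, which ignores the response, with a first PLS direction whose weights are the inner products of each standardized predictor with the centered response.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical data: two strongly correlated predictors that share most of
# the variance, and a third, independent predictor that drives the response.
latent = rng.normal(size=n)
x1 = latent + 0.1 * rng.normal(size=n)
x2 = latent + 0.1 * rng.normal(size=n)
x3 = rng.normal(size=n)
y = 2.0 * x3 + 0.3 * rng.normal(size=n)

X = np.column_stack([x1, x2, x3])
Xs = (X - X.mean(axis=0)) / X.std(axis=0)   # standardized predictors
yc = y - y.mean()                           # centered response

# Unsupervised: first principal component direction (top right singular vector).
_, _, Vt = np.linalg.svd(Xs, full_matrices=False)
w_pca = Vt[0]

# Supervised: first PLS direction, weights proportional to <x_j, y>.
w_pls = Xs.T @ yc
w_pls = w_pls / np.linalg.norm(w_pls)

# Absolute covariance between each derived predictor and the response.
cov_pca = abs((Xs @ w_pca) @ yc) / n
cov_pls = abs((Xs @ w_pls) @ yc) / n

print("PCA direction:", np.round(w_pca, 3), "|cov with y|:", round(cov_pca, 3))
print("PLS direction:", np.round(w_pls, 3), "|cov with y|:", round(cov_pls, 3))
```

Among all unit-length weight vectors, the one proportional to the predictor-response inner products gives the largest covariance with the response, so `cov_pls` can never be smaller than `cov_pca`. In this constructed example the PLS direction puts most of its weight on the third predictor, while the principal component follows the two correlated ones.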


3 - Steps

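  • After standardizing the predictors, the first partial least squares direction z1 is a linear combination of the predictors in the data matrix x, in which the weight on each predictor is proportional to its correlation with the response y.
  • Subsequent directions are found by taking residuals (each predictor is regressed on the direction just found, and only the part it does not explain is kept) and then repeating the above prescription on those residuals.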
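The steps above can be sketched in a few lines of NumPy. This is an illustrative implementation under stated assumptions, not a reference one: the function name `pls_directions` is hypothetical, predictors are standardized with the population standard deviation, the response is univariate, and the directions are left unnormalized.

```python
import numpy as np

def pls_directions(X, y, n_components):
    """Sketch of PLS for one response: returns scores Z (n x M) and fitted values."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # Standardize the predictors and center the response.
    Xk = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = y - y.mean()

    fitted = np.full_like(yc, y.mean())   # start from the intercept-only fit
    scores = []
    for _ in range(n_components):
        # Weight on each (residualized) predictor: its inner product with y.
        phi = Xk.T @ yc
        # Current direction: weighted combination of the predictors.
        z = Xk @ phi
        zz = z @ z
        # Regress the response on the new direction and update the fit.
        theta = (z @ yc) / zz
        fitted = fitted + theta * z
        # Take residuals: remove from every predictor the part explained by z.
        Xk = Xk - np.outer(z, (z @ Xk) / zz)
        scores.append(z)
    return np.column_stack(scores), fitted

# Usage on hypothetical synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)
Z, fitted = pls_directions(X, y, n_components=2)
print(Z.shape, fitted.shape)
```

Because each predictor is residualized against the direction just found, the resulting directions are mutually orthogonal. When the number of components equals the number of (linearly independent) predictors, the fitted values coincide with those of ordinary least squares; using fewer components is what provides the dimension reduction.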

4 - PLS vs PCR

In principle, partial least squares should be a clear improvement over principal components regression (PCR), because it looks at the response when choosing its directions. In practice, PLS often does not give a large gain over PCR.

Ridge regression and principal components regression often work as well, and both are simpler.