R - Linear Discriminant Analysis (LDA)

1 - About

2 - Steps

2.1 - Prerequisites

require(MASS)

2.2 - Model

  • Fit the model
ldaModel=lda(Target~Variable1+Variable2,data=dataframe, subset=VariableN<10)
  • Print it by typing its name
ldaModel
Call:
lda(Target~ Variable1+ Variable2, data = dataframe, subset=VariableN<10)

Prior probabilities of groups:
    False    True
0.491984 0.508016 

Group means:
       Variable1   Variable2
False 0.04279022  0.03389409
True -0.03954635 -0.03132544

Coefficients of linear discriminants:
                 LD1
Variable1 -0.6420190
Variable2 -0.5135293

where:

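  • the prior probabilities are the proportions of False and True in the data used to fit the model. Here they are both close to one half, so neither class dominates.
  • the group means are the averages of each predictor within each class.
  • the coefficients of linear discriminants define the linear combination of the predictors that LDA uses to separate the two groups. With two classes there is a single discriminant (LD1) with one coefficient per predictor, hence two coefficients here.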
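
The three parts of the printed model can also be read directly from the fitted object. The following is a minimal sketch on simulated data, not the data set used above: the data frame `toy` and the object `fit` are hypothetical names introduced only for illustration, and the priors match the class proportions only because no `prior` argument is passed to `lda`.

```r
library(MASS)

# Hypothetical toy data: two numeric predictors and a two-level factor target
set.seed(1)
toy <- data.frame(
  Variable1 = rnorm(200),
  Variable2 = rnorm(200),
  Target    = factor(sample(c("False", "True"), 200, replace = TRUE))
)

fit <- lda(Target ~ Variable1 + Variable2, data = toy)

# Priors default to the class proportions in the training data
fit$prior
prop.table(table(toy$Target))

# Group means: the per-class average of each predictor
fit$means
aggregate(cbind(Variable1, Variable2) ~ Target, data = toy, FUN = mean)

# Two classes give one discriminant (LD1) with one coefficient per predictor
dim(fit$scaling)
```

Comparing `fit$prior` with the output of `prop.table` shows that the printed priors are nothing more than the observed class frequencies.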

2.3 - Plot

plot(ldaModel)

For a two-class model, it plots histograms of the values of the linear discriminant function (LD1), one for each group.

Here the two histograms largely overlap: there's really not much difference between the groups.
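
To put numbers on what the plot shows, the discriminant scores can be summarised per group. This is a rough sketch on simulated data (the names `toy`, `fit` and `scores` are hypothetical); because the simulated predictors carry no information about the target, the two groups are expected to overlap heavily, much like in the plot described above.

```r
library(MASS)

# Hypothetical toy data with predictors unrelated to the target
set.seed(1)
toy <- data.frame(
  Variable1 = rnorm(200),
  Variable2 = rnorm(200),
  Target    = factor(sample(c("False", "True"), 200, replace = TRUE))
)

fit <- lda(Target ~ Variable1 + Variable2, data = toy)

# LD1 score of every observation
scores <- predict(fit, toy)$x[, "LD1"]

# Per-group summary of the scores; similar summaries mean weak separation
tapply(scores, toy$Target, summary)

# Stacked per-group histograms of the scores, drawn directly with ldahist
ldahist(scores, g = toy$Target)
```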

2.4 - Predictions and classification

predictions=predict(ldaModel,dataframe)
# predict() returns a list, as you can see with this function
class(predictions)
# When the elements of a list all have the same number of observations,
# a convenient way of looking at the list is to convert it to a data frame.
# Show the first 5 rows
data.frame(predictions)[1:5,]
       class   posterior.False  posterior.True            LD1
999     True         0.4901792       0.5098208     0.08293096
1000    True         0.4792185       0.5207815     0.59114102
1001    True         0.4668185       0.5331815     1.16723063
1002    True         0.4740011       0.5259989     0.83335022
1003    True         0.4927877       0.5072123    -0.03792892

where:

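  • the first column is the row name of the observation
  • the class column is the predicted class
  • the posterior columns are the posterior probabilities of each class
  • the LD1 column is the value of the linear discriminant function (the score) for the observation, not a coefficient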
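
The relationship between these columns can be checked by hand: the predicted class is the class with the largest posterior probability. The sketch below uses simulated data and hypothetical names (`toy`, `fit`, `pred`, `by_hand`); it is only an illustration of how the pieces of the list fit together.

```r
library(MASS)

# Hypothetical toy data
set.seed(1)
toy <- data.frame(
  Variable1 = rnorm(200),
  Variable2 = rnorm(200),
  Target    = factor(sample(c("False", "True"), 200, replace = TRUE))
)

fit  <- lda(Target ~ Variable1 + Variable2, data = toy)
pred <- predict(fit, toy)

# The list has three elements
names(pred)   # "class" "posterior" "x"

# Each row of posterior probabilities sums to 1
post <- pred$posterior
range(rowSums(post))

# The predicted class is the column with the largest posterior
by_hand <- colnames(post)[apply(post, 1, which.max)]
all(by_hand == as.character(pred$class))

# x holds one LD1 score per observation
dim(pred$x)
```

With two classes, picking the largest posterior is the same as thresholding the posterior of one class at 0.5.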

2.5 - Accuracy

  • Confusion matrix (rows are the predicted classes, columns are the actual classes)
table(predictions$class,dataframe$Target)
       Down  Up
  Down   35  35
  Up     76 106
  • Correct classification rate
mean(predictions$class==dataframe$Target)
[1] 0.5595238
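
The classification rate can also be recovered from the confusion matrix itself, since the correct predictions sit on the diagonal. The sketch below re-enters the counts printed above by hand; the names `confusion` and `accuracy` are introduced only for this illustration.

```r
# Counts copied from the confusion matrix above
# (rows = predicted class, columns = actual class; matrix() fills by column)
confusion <- matrix(
  c(35, 76, 35, 106), nrow = 2,
  dimnames = list(predicted = c("Down", "Up"), actual = c("Down", "Up"))
)

# Correct predictions are on the diagonal: (35 + 106) / 252
accuracy <- sum(diag(confusion)) / sum(confusion)
accuracy   # 0.5595238

# Share of each actual class that was predicted correctly
diag(confusion) / colSums(confusion)
```

The per-class rates show where the errors come from, which the single overall rate hides.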