Statistics Learning - Multi-variant logistic regression

1 - About

A logistic regression with multiple variables and a two-class outcome.

3 - Generalized linear model

<MATH> \begin{array}{rclcl} Pr(Y = 1|X) & = & p(X) & = & \frac{\displaystyle e^{\displaystyle B_0 + B_1 X_1 + \dots + B_i X_i}}{\displaystyle 1+ e^{\displaystyle B_0 + B_1 X_1 + \dots + B_i X_i }} \\ \end{array} </MATH>

Applying the logit transformation (the inverse of the logistic function above) gives a model that is linear in the log odds: <MATH> \begin{array}{rcl} \log \left (\frac {\displaystyle p(X)}{\displaystyle 1 - p(X)} \right ) & = & B_0 + B_1 X_1 + \dots + B_i X_i \\ \end{array} </MATH>
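The relationship between the two formulas can be checked numerically. The following is a minimal sketch in base R; the coefficient values and predictor values are hypothetical and chosen only for illustration.

```r
# Hypothetical coefficients B_0, B_1, B_2 and predictor values X_1, X_2
# (illustrative numbers only, not taken from any fitted model)
B <- c(-1.5, 0.8, 0.3)
x <- c(2, 1)

# Linear predictor: B_0 + B_1 X_1 + B_2 X_2
linearPredictor <- B[1] + sum(B[-1] * x)

# Logistic function: maps the linear predictor to a probability p(X)
p <- exp(linearPredictor) / (1 + exp(linearPredictor))

# Logit transformation: maps the probability back to the linear predictor
logOdds <- log(p / (1 - p))

print(linearPredictor)
print(p)
print(logOdds)
```

The log odds recovered from p(X) equal the linear predictor, up to floating-point error. Base R also provides plogis() and qlogis() for the logistic function and the logit.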

4 - R

myLogisticRegressionModel <- glm(targetVariable ~ ., data = myDataFrame, family = binomial)
summary(myLogisticRegressionModel)
  • The tilde (~) means "is modeled as".
  • The dot (.) means all the other variables in the data frame.
  • family = binomial tells glm to fit a logistic regression model.
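To try the call end to end, the sketch below simulates a small data frame and fits the model on it. The data, the column names x1 and x2, and the coefficients used in the simulation are assumptions made for illustration; only the glm call itself mirrors the one above.

```r
# Simulated data: the predictors x1, x2 and the true coefficients are
# hypothetical, chosen only so the example runs on its own
set.seed(42)
n <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
targetVariable <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.0 * x1 - 0.8 * x2))
myDataFrame <- data.frame(targetVariable, x1, x2)

# Model the target on all the other variables of the data frame
myLogisticRegressionModel <- glm(targetVariable ~ ., data = myDataFrame, family = binomial)
print(summary(myLogisticRegressionModel))

# Fitted probabilities Pr(Y = 1 | X) for the training rows
fittedProbabilities <- predict(myLogisticRegressionModel, type = "response")
print(head(fittedProbabilities))
```

With type = "response", predict returns probabilities. The default, type = "link", returns values on the log odds scale.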

5 - Interpretation

The intercept is usually of little interest; the focus is on the coefficients of the predictor variables.

It is difficult to interpret regression coefficients in a multiple regression model, because correlations between the predictor variables can affect the signs of the coefficients.

When variables are correlated, they act as surrogates for each other, which can affect:

  • the sign of the coefficient
  • the p-value (significant or not)
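The surrogate effect can be illustrated with simulated data. In the sketch below, all names and numbers are assumptions made for illustration: x2 is built as a noisy copy of x1, and the outcome depends on x1 only. The variable x2 is fitted once on its own and once together with x1.

```r
# Hypothetical simulation: x2 is a noisy copy of x1, so the two are
# strongly correlated; the outcome y is generated from x1 only
set.seed(1)
n <- 500
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)
y <- rbinom(n, size = 1, prob = plogis(1.5 * x1))
simulated <- data.frame(y, x1, x2)

# x2 alone: it stands in for x1
fitAlone <- glm(y ~ x2, data = simulated, family = binomial)
# x1 and x2 together: the two compete to explain the same signal
fitJoint <- glm(y ~ x1 + x2, data = simulated, family = binomial)

coefAlone <- summary(fitAlone)$coefficients
coefJoint <- summary(fitJoint)$coefficients

print(cor(simulated$x1, simulated$x2))
print(coefAlone["x2", ])
print(coefJoint[c("x1", "x2"), ])
```

Comparing the two printed tables, the standard error of x2 is much larger in the joint model. Its estimate, and with it the sign and the p-value, is therefore far less stable than when x2 is fitted alone.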
data_mining/multi-variant_logistic_regression.txt · Last modified: 2015/07/13 21:44 by gerardnico