R - Multiple Linear Regression

1 - About

Multiple linear regression in R with functions such as lm, update, summary, confint and predict.

3 - Steps

3.1 - Linear Model

3.1.1 - Unstandardized

Unstandardized multiple regression. In the call below, data.frame is a placeholder for the name of your own data frame.

myFit=lm(response~predictor1+predictor2,data=data.frame)
myFit
Call:
lm(formula = response~ predictor1 + predictor2, data = data.frame)

Coefficients:
(Intercept)   predictor1   predictor2
   33.22276     -1.03207      0.03454  
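To try the call end to end, here is a minimal sketch on the built-in mtcars data set. The data set and the names carFit, mpg, wt and hp are assumptions made for illustration; the output above comes from different data.

```r
# Sketch: unstandardized multiple regression on the built-in mtcars data
# (mtcars is an illustrative choice, not the data behind the output above)
carFit <- lm(mpg ~ wt + hp, data = mtcars)

# Printing the fit shows the call and the estimated coefficients
print(carFit)

# coef() returns the estimates as a named numeric vector
carCoef <- coef(carFit)
names(carCoef)
```

Each slope is read as the expected change in the response for a one-unit change in that predictor, with the other predictor held fixed.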

3.1.2 - Standardized

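Standardized regression: every variable is converted to z-scores with scale before fitting, so the coefficients are expressed in standard deviation units.

modelz <- lm(scale(OutcomeVariable) ~ scale(PredictorVariable1) + scale(PredictorVariable2), data = data)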
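The link between the two scales can be checked numerically. The sketch below assumes the built-in mtcars data and hypothetical names (rawFit, zFit); it shows that a standardized slope is the raw slope multiplied by sd(predictor) / sd(outcome).

```r
# Sketch: standardized vs unstandardized coefficients on mtcars (illustrative data)
rawFit <- lm(mpg ~ wt + hp, data = mtcars)
zFit   <- lm(scale(mpg) ~ scale(wt) + scale(hp), data = mtcars)

# A standardized slope equals the raw slope times sd(predictor) / sd(outcome)
rescaled <- coef(rawFit)[c("wt", "hp")] *
  c(sd(mtcars$wt), sd(mtcars$hp)) / sd(mtcars$mpg)

# Drop the intercept, which is numerically zero once everything is centered
zSlopes <- coef(zFit)[-1]

print(cbind(standardized = unname(zSlopes), rescaled = unname(rescaled)))
```

Because all variables share the same scale, standardized slopes can be compared with each other, which raw slopes in different units do not allow.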

3.1.3 - All variables are predictors

myFit=lm(outcome~.,DataFrame)

The dot (.) is a shortcut that stands for all the other columns of the data frame.
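A small sketch of the dot notation, assuming the built-in mtcars data (the names allFit and mostFit are hypothetical). It also shows that terms can be subtracted from the dot.

```r
# Sketch: the dot stands for every column except the response
allFit <- lm(mpg ~ ., data = mtcars)

# mtcars has 11 columns, so the fit has 1 intercept + 10 slopes
length(coef(allFit))

# Terms can be excluded from the dot with a minus sign
mostFit <- lm(mpg ~ . - cyl - disp, data = mtcars)
names(coef(mostFit))
```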

3.1.4 - Updating a model

Updating a model to remove the non-significant predictors.

myModel1=update(myModel1,~.-NoSignificantPredictor1-NoSignificantPredictor2)
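As a runnable sketch of update, again assuming mtcars and hypothetical model names: qsec and drat are dropped here purely for illustration, not because they were tested as non-significant.

```r
# Sketch: drop predictors from a fitted model with update()
fullFit <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)

# "~ ." keeps the existing right-hand side; "- term" removes a predictor
reducedFit <- update(fullFit, ~ . - qsec - drat)
formula(reducedFit)

# anova() compares the two nested models with an F test
anova(reducedFit, fullFit)
```

Storing the result under a new name, as done here, keeps the original model available for the comparison.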

3.2 - Model Attributes

attributes(myFit)

See R - Names

$names
 [1] "coefficients"  "residuals"     "effects"       "rank"         
 [5] "fitted.values" "assign"        "qr"            "df.residual"  
 [9] "xlevels"       "call"          "terms"         "model"        

$class
[1] "lm"

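where:

  • coefficients: the estimated regression coefficients
  • residuals: the response minus the fitted values
  • fitted.values: the values predicted by the model for the data used in the fit
  • rank: the numeric rank of the fitted model
  • df.residual: the residual degrees of freedom
  • call: the matched call
  • terms: the terms object used
  • model: the model frame used for the fit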
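Most of these components are usually read through accessor functions rather than with $. A minimal sketch, assuming mtcars and a hypothetical model named attrFit:

```r
# Sketch: reading the components of an lm object
attrFit <- lm(mpg ~ wt + hp, data = mtcars)

names(attrFit)            # the same list as attributes(attrFit)$names

coef(attrFit)             # attrFit$coefficients
head(residuals(attrFit))  # attrFit$residuals
head(fitted(attrFit))     # attrFit$fitted.values
df.residual(attrFit)      # attrFit$df.residual

# Fitted values plus residuals reconstruct the observed response
rebuilt <- unname(fitted(attrFit) + residuals(attrFit))
all.equal(rebuilt, mtcars$mpg)
```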

3.3 - Summary

Summary Statistics

summary(myFit)
Call:
lm(formula = response~ predictor1 + predictor2, data = data.frame)

Residuals:
    Min      1Q  Median      3Q     Max 
-15.981  -3.978  -1.283   1.968  23.158 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 33.22276    0.73085  45.458  < 2e-16 ***
predictor1  -1.03207    0.04819 -21.416  < 2e-16 ***
predictor2   0.03454    0.01223   2.826  0.00491 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 6.173 on 503 degrees of freedom
Multiple R-squared:  0.5513,	Adjusted R-squared:  0.5495 
F-statistic:   309 on 2 and 503 DF,  p-value: < 2.2e-16

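where:

  • Residuals: a five-number summary of the residuals
  • Estimate and Std. Error: each coefficient and its standard error
  • t value: the estimate divided by its standard error
  • Pr(>|t|): the two-sided p-value for the null hypothesis that the coefficient is zero
  • Residual standard error: the estimated standard deviation of the errors, with the residual degrees of freedom (number of observations minus number of coefficients)
  • Multiple R-squared: the proportion of the variance of the response explained by the model. The adjusted version penalizes the number of predictors.
  • F-statistic: the test of the null hypothesis that all slopes are zero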
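The numbers printed by summary can also be extracted from the returned object. A minimal sketch, assuming mtcars and hypothetical names (sumFit, coefTable):

```r
# Sketch: extracting values from a summary.lm object
sumFit <- summary(lm(mpg ~ wt + hp, data = mtcars))

# The coefficient table is a numeric matrix
coefTable <- sumFit$coefficients
colnames(coefTable)
coefTable[, "Pr(>|t|)"]    # p-values only

# The t value is the estimate divided by its standard error
tByHand <- coefTable[, "Estimate"] / coefTable[, "Std. Error"]

sumFit$sigma               # residual standard error
sumFit$r.squared           # multiple R-squared
sumFit$adj.r.squared       # adjusted R-squared
sumFit$fstatistic          # F value with its two degrees of freedom
```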

3.4 - Confidence Interval

Confidence Interval

confint(myFit)
                  2.5 %      97.5 %
(Intercept) 31.78687150 34.65864956
predictor1  -1.12674848 -0.93738865
predictor2   0.01052507  0.05856361
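The interval is the estimate plus or minus a t quantile times the standard error. The sketch below, which assumes mtcars and hypothetical names (ciFit, manual), rebuilds the default 95% interval by hand and shows the level argument.

```r
# Sketch: confint() rebuilt by hand
ciFit <- lm(mpg ~ wt + hp, data = mtcars)

ci95 <- confint(ciFit)                 # default level = 0.95
ci99 <- confint(ciFit, level = 0.99)   # wider interval

est   <- coef(ciFit)
se    <- sqrt(diag(vcov(ciFit)))        # standard errors of the coefficients
tCrit <- qt(0.975, df.residual(ciFit))  # t quantile for a 95% two-sided interval

manual <- cbind(est - tCrit * se, est + tCrit * se)
all.equal(unname(ci95), unname(manual))
```

An interval that does not contain zero corresponds to a coefficient that is significant at the matching level.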

3.5 - Prediction

predict(myFit,data.frame(predictor1=c(5,10,15),predictor2=c(20,30,40)),interval="confidence")
       fit      lwr      upr
1 28.75330 27.67694 29.82967
2 23.93841 22.97305 24.90376
3 19.12351 18.12607 20.12094
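predict also accepts interval="prediction", which gives the range for a single new observation instead of the mean response. A minimal sketch comparing the two, assuming mtcars and hypothetical names (predFit, newCars):

```r
# Sketch: confidence interval vs prediction interval
predFit <- lm(mpg ~ wt + hp, data = mtcars)

# New data must use the same column names as the predictors in the formula
newCars <- data.frame(wt = c(2.5, 3.0, 3.5), hp = c(100, 150, 200))

confBand <- predict(predFit, newCars, interval = "confidence")
predBand <- predict(predFit, newCars, interval = "prediction")

# Same fitted values, but the prediction interval is wider
confWidth <- confBand[, "upr"] - confBand[, "lwr"]
predWidth <- predBand[, "upr"] - predBand[, "lwr"]
print(cbind(fit = confBand[, "fit"], confWidth, predWidth))
```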

3.6 - Plot

# A two-by-two grid of frames to hold the four diagnostic plots
par(mfrow=c(2,2))
# Plot the fitted model
plot(myFit)

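Plot gives four diagnostic views of the linear model:

  • Residuals vs Fitted: the residuals against the fitted values. The goal is to detect non-linearity. A curve in the residuals means that the model is missing some non-linear effect.
  • Normal Q-Q: the standardized residuals against normal quantiles. Points far from the straight line suggest non-normal residuals.
  • Scale-Location: the square root of the absolute standardized residuals against the fitted values. A trend suggests non-constant variance.
  • Residuals vs Leverage: the standardized residuals against leverage, with Cook's distance contours, to spot influential observations.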
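To see what a curve in the residuals looks like, the sketch below fits a straight line to simulated data that contains a quadratic effect, then adds the squared term. The data, the seed and the names (linFit, quadFit) are assumptions made for illustration.

```r
# Sketch: a straight-line fit to curved data leaves a curve in the residuals
set.seed(1)
x <- seq(-3, 3, length.out = 100)
y <- 1 + 2 * x + 1.5 * x^2 + rnorm(100)   # simulated data with a quadratic effect

linFit  <- lm(y ~ x)            # ignores the curvature
quadFit <- lm(y ~ x + I(x^2))   # models it explicitly

# which = 1 selects the Residuals vs Fitted plot only
par(mfrow = c(1, 2))
plot(linFit,  which = 1)   # the residuals should follow a U shape
plot(quadFit, which = 1)   # the pattern should disappear

# The same effect in numbers: residuals of the linear fit still track x^2
cor(residuals(linFit), x^2)
cor(residuals(quadFit), x^2)
```

The which argument of plot takes other values for the remaining diagnostic plots, for example which = 2 for the Normal Q-Q plot.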
lang/r/multiple_regression.txt · Last modified: 2017/11/16 18:03 by 172.68.144.53