# Machine Learning - Linear regression

“Linear regression” is a standard mathematical technique for predicting numeric classes.

This is a classical statistical method dating back more than two centuries (the method of least squares was published in 1805).

However, practical problems often require non-linear solutions.

The linear model is an important example of a parametric model.

Linear regression is very extensible and can be used to capture non-linear effects.
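
For example, a non-linear effect can be captured by adding a derived attribute such as $X^2$ and fitting an ordinary linear regression to the extended attribute set. A minimal sketch with made-up data (the numbers are illustrative only):

```python
import numpy as np

# Made-up data generated from a roughly quadratic relationship (y ≈ 1 + x^2).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 1.9, 5.2, 10.1, 16.8, 26.2])

# The model is still linear in the weights: Y = B0 + B1*x + B2*x^2.
X = np.column_stack([np.ones_like(x), x, x ** 2])
weights, *_ = np.linalg.lstsq(X, y, rcond=None)
print(weights)  # intercept, linear weight, quadratic weight
```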

In linear regression, “regression” is an unusual term for what is a linear model. Historically it comes from the idea of regression towards the mean, a concept discussed in the late 1800s and early 1900s; the term has stuck because it has become time honored.

## Problem definition

You have a cloud of data points in two (or $n$) dimensions and are looking for the best straight-line (or hyperplane) fit.

With more than two dimensions, finding the best fit is a standard matrix problem.
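
A sketch of the matrix form (standard least squares, assuming $X^\top X$ is invertible): stack the training instances as rows of a matrix $X$ (with a leading column of ones for the intercept) and the targets in a vector $y$; the weights that minimize the squared error are then

$$\hat{B} = \underset{B}{\arg\min}\ \lVert y - XB \rVert^2 = (X^\top X)^{-1} X^\top y$$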

## Assumption

• Linear regression assumes that the dependence of the target $Y$ on the predictors $X_1, \dots, X_p$ is linear. True regression functions are never exactly linear, but the linear approximation is often useful in practice.

## Algorithm

### Model

It produces a model that is a linear function (i.e., a weighted sum) of the input attributes. There are non-linear methods that build trees of linear models.

In two dimensions, it's a line.

Formula:

$Y = B_0 + B_1 X_1 + B_2 X_2 + \dots + B_p X_p$

where:

• $B_0, B_1, \dots, B_p$ are the weights (coefficients) learned from the training data
• $X_1, \dots, X_p$ are the values of the input attributes

Linear regression works naturally with numeric attributes and a numeric class (not with nominal ones) because the attribute values are multiplied by weights and summed.
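
As a minimal illustration of the model form (the weights and attribute values below are hypothetical, not taken from any dataset):

```python
def predict(intercept, weights, attributes):
    """Return B_0 + B_1*X_1 + ... + B_p*X_p, i.e. a weighted sum of the attributes."""
    return intercept + sum(b * x for b, x in zip(weights, attributes))

# Hypothetical weights and attribute values.
print(predict(intercept=1.0, weights=[0.5, -2.0], attributes=[3.0, 0.25]))  # 1.0 + 1.5 - 0.5 = 2.0
```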

Procedure:

• Calculate the weights ($B$) from the training data. (Model quality is often reported as the correlation coefficient between predicted and actual values; a large correlation is good, and the value cannot be greater than 1.)
• Calculate the error (the squared error) on the training data and choose the weights that minimize it.
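
A minimal sketch of this procedure using NumPy ordinary least squares (the training data below are made up):

```python
import numpy as np

# Made-up training data: 5 instances with 2 numeric attributes and a numeric class.
X = np.array([[1.0, 2.0],
              [2.0, 0.5],
              [3.0, 1.5],
              [4.0, 3.0],
              [5.0, 2.5]])
y = np.array([3.1, 3.9, 6.2, 9.8, 11.0])

# Add a column of ones so the intercept B_0 is learned like any other weight.
X1 = np.hstack([np.ones((X.shape[0], 1)), X])

# Choose the weights that minimize the squared error on the training data.
weights, *_ = np.linalg.lstsq(X1, y, rcond=None)

predictions = X1 @ weights
squared_error = np.sum((y - predictions) ** 2)
print(weights, squared_error)
```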

## Performance

M5P, which builds a tree of linear models, often performs quite a lot better than plain linear regression when the data has non-linear structure.

## Classification

### Two‐class problem

Linear regression can be used for classification as well:

• Calculate a linear function using regression
• then apply a threshold to the output to decide between the two classes (a two-valued nominal class, encoded as 0 and 1).

Steps:

• On the training dataset, convert the nominal class into a binary numeric attribute (0 and 1)
• Use the regression output together with the original nominal class as input to OneR in order to find a threshold
• Use this threshold to predict class 0 or 1
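
A minimal sketch of the two-class scheme (made-up data; for simplicity the threshold is fixed at 0.5 here rather than chosen with OneR):

```python
import numpy as np

# Made-up training data: one numeric attribute, nominal class already converted to 0/1.
X = np.array([[0.2], [0.8], [1.5], [2.4], [3.1], [3.9]])
y = np.array([0, 0, 0, 1, 1, 1])

# Fit an ordinary linear regression to the 0/1 targets.
X1 = np.hstack([np.ones((X.shape[0], 1)), X])
weights, *_ = np.linalg.lstsq(X1, y, rcond=None)

def classify(x, threshold=0.5):
    """Apply the regression, then threshold the numeric output to pick a class."""
    output = weights[0] + weights[1] * x
    return 1 if output >= threshold else 0

print([classify(v) for v in (0.5, 3.0)])  # expected: [0, 1]
```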

### Multi-class problem

For more class labels than 2, the following methods can be used:

• multi-response linear regression
• pairwise linear regression

Steps:

• Training: perform one regression per class ($n$ regressions for a problem with $n$ different classes). Set the output to 1 for training instances that belong to the class and 0 for instances that don’t.
• Prediction: either
  • choose the class whose regression produces the largest output (as in the sketch after the example below),
  • or use “pairwise linear regression”, which performs a regression for every pair of classes.

Example for multi-response linear regression:
For a three-class problem, we build three prediction models; in each model the target is 1 for its own class and 0 for the others. If the actual and predicted outputs for the third instance are:

| Instance Id | Model | Numeric Class (actual) | Prediction |
|---|---|---|---|
| 3 | Blue | 0 | 0.359 |
| 3 | Green | 1 | 0.322 |
| 3 | Red | 0 | 0.32 |

then the predicted class is Blue, because the first model produces the largest output. The actual class of instance 3 is Green, because the numeric class is 1 in the second model.
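
A minimal sketch of multi-response linear regression on made-up data (the class names follow the example above; the attribute values are invented):

```python
import numpy as np

classes = ["Blue", "Green", "Red"]

# Made-up training data: 6 instances, 2 numeric attributes, class index per instance.
X = np.array([[1.0, 0.2], [0.9, 0.4],   # Blue
              [0.1, 1.1], [0.3, 0.9],   # Green
              [0.5, 0.5], [0.6, 0.6]])  # Red
y = np.array([0, 0, 1, 1, 2, 2])

X1 = np.hstack([np.ones((X.shape[0], 1)), X])

# Training: one regression per class, target 1 for that class and 0 otherwise.
models = []
for c in range(len(classes)):
    target = (y == c).astype(float)
    weights, *_ = np.linalg.lstsq(X1, target, rcond=None)
    models.append(weights)

def predict(x):
    """Prediction: run every model and choose the class with the largest output."""
    x1 = np.concatenate(([1.0], x))
    outputs = [w @ x1 for w in models]
    return classes[int(np.argmax(outputs))]

print(predict(np.array([0.95, 0.3])))  # expected: Blue
```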

## Implementation

### Weka

Weka has a supervised attribute filter (not the “unsupervised” one) called NominalToBinary that converts a nominal attribute into the same set of binary attributes used by LinearRegression and M5P.

To show the original instance numbers alongside the predictions, use the AddID unsupervised attribute filter, and the “Output additional attributes” option from the Classifier panel “More options …” menu. Be sure to use the attribute *index* (e.g., 1) rather than the attribute *name* (e.g., ID).