# Data Mining - (Stochastic) Gradient descent (SGD)

Gradient descent can be used to train various kinds of regression and classification models.

It's an iterative process and therefore is well suited for map reduce process.

## 3 - Linear regression

The gradient descent update for linear regression is: $$\mathbf{w}_{i+1} = \mathbf{w}_i - \alpha_i \sum_{j=0}^N (\mathbf{w}_i^\top\mathbf{x}_j - y_j) \mathbf{x}_j \,$$

where:

• $i$ is the iteration number of the gradient descent algorithm,
• $j$ identifies the observation.
• $N$ identifies the number of observations.
• $(\mathbf{w}^\top \mathbf{x} - y) \mathbf{x} \,$ is the summand
• $y$ is the target value
• $\mathbf{x}$ is a features vector.
• $\mathbf{w}_i$ is the weights vector for the iteration $i$ (when starting they are all null). $\mathbf{w}_0 = [0,0,0,\dots, 0]$
• $\alpha_i$ is the step. $\alpha_i = \frac\alpha_{i-1}}n * \sqrt{i+1}$ . alpha start with the value 1. $\alpha_0 = 1$
exampleW = [1, 1, 1]
gradientSummand = (dot([1 1 1], [3 1 4]) - 2) * [3 1 4] = (8 - 2) * [3 1 4] = [18 6 24]