Data Mining - (Stochastic) Gradient descent (SGD)

1 - About

Gradient descent can be used to train various kinds of regression and classification models.

It is an iterative process, and because each iteration's gradient is a sum of independent per-observation terms, it is well suited to a MapReduce implementation: the map phase computes one term per observation and the reduce phase sums them, as sketched below.
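
A minimal plain-Python sketch of that map/reduce structure, assuming in-memory data (the names summand and gradient are illustrative, and the per-observation term is the one derived in the next section):

from functools import reduce

def summand(w, x, y):
    # per-observation gradient term: (w . x - y) * x
    pred = sum(wi * xi for wi, xi in zip(w, x))
    return [(pred - y) * xi for xi in x]

def gradient(w, data):
    # map: one summand per (x, y) observation -- parallelizable
    parts = map(lambda obs: summand(w, obs[0], obs[1]), data)
    # reduce: element-wise sum of all the summands
    return reduce(lambda a, b: [ai + bi for ai, bi in zip(a, b)], parts)

# gradient([0.0, 0.0], [([3.0, 1.0], 2.0), ([1.0, 5.0], 6.0)]) returns [-12.0, -32.0]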

2 - Linear regression

The gradient descent update for linear regression is: <MATH> \mathbf{w}_{i+1} = \mathbf{w}_i - \alpha_i \sum_{j=1}^{N} (\mathbf{w}_i^\top\mathbf{x}_j - y_j) \, \mathbf{x}_j </MATH>

where:

  • <math>i</math> is the iteration number of the gradient descent algorithm,
  • <math>j</math> identifies the observation,
  • <math>N</math> is the number of observations,
  • <math>(\mathbf{w}^\top \mathbf{x} - y) \mathbf{x} \,</math> is the summand,
  • <math>y</math> is the target value,
  • <math>\mathbf{x}</math> is the feature vector,
  • <math>\mathbf{w}_i</math> is the weight vector at iteration <math>i</math>, initialized to all zeros: <math>\mathbf{w}_0 = [0,0,0,\dots,0]</math>,
  • <math>\alpha_i</math> is the step size, updated as <math>\alpha_i = \frac{\alpha_{i-1}}{N \sqrt{i+1}}</math> starting from <math>\alpha_0 = 1</math>.
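
As a quick worked step with illustrative numbers (not from the original): starting from <math>\mathbf{w}_0 = [0,0]</math> and <math>\alpha_0 = 1</math>, a single observation <math>\mathbf{x}_1 = [1,2]</math> with target <math>y_1 = 3</math> gives

<MATH> \mathbf{w}_1 = \mathbf{w}_0 - \alpha_0 (\mathbf{w}_0^\top\mathbf{x}_1 - y_1)\,\mathbf{x}_1 = [0,0] - 1 \cdot (0-3)\,[1,2] = [3,6] </MATH>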

2.1 - Summand

For example, with NumPy (the variable names follow the original example):

import numpy as np

exampleW = np.array([1, 1, 1])
exampleX = np.array([3, 1, 4])
exampleY = 2.0
# (w . x - y) * x = (8 - 2.0) * [3 1 4] = [18. 6. 24.]
gradientSummand = (exampleW.dot(exampleX) - exampleY) * exampleX

Summing this summand over all <math>N</math> observations gives the full gradient used in the update above.

3 - Implementation
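
A minimal NumPy sketch of the batch update and step-size schedule above (the function name linreg_gd and the iteration count are illustrative assumptions, not an original implementation):

import numpy as np

def linreg_gd(X, y, num_iters=50):
    # Batch gradient descent for linear regression.
    # X: (N, d) feature matrix; y: (N,) target vector.
    N, d = X.shape
    w = np.zeros(d)           # w_0 = [0, 0, ..., 0]
    alpha = 1.0               # alpha_0 = 1
    for i in range(num_iters):
        if i > 0:
            alpha = alpha / (N * np.sqrt(i + 1))   # alpha_i = alpha_{i-1} / (N * sqrt(i+1))
        grad = X.T.dot(X.dot(w) - y)               # sum_j (w_i . x_j - y_j) x_j
        w = w - alpha * grad
    return w

# usage (illustrative):
# X = np.array([[3., 1., 4.], [1., 5., 9.]])
# y = np.array([2., 6.])
# w = linreg_gd(X, y)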
