Data Mining - (Stochastic) Gradient descent (SGD)

> (Statistics|Probability|Machine Learning|Data Mining|Data and Knowledge Discovery|Pattern Recognition|Data Science|Data Analysis)

1 - About

Gradient descent can be used to train various kinds of regression and classification models.

It is an iterative process whose core step is a sum over all observations, which makes it well suited to a MapReduce implementation.

3 - Linear regression

The gradient descent update for linear regression is:

<MATH> \mathbf{w}_{i+1} = \mathbf{w}_i - \alpha_i \sum_{j=0}^N (\mathbf{w}_i^\top\mathbf{x}_j - y_j) \mathbf{x}_j \, </MATH>

where:

  • <math>i</math>

    is the iteration number of the gradient descent algorithm,

  • <math>j</math>

    indexes the observations,

  • <math>N</math>

    is the number of observations,

  • <math>(\mathbf{w}^\top \mathbf{x} - y) \mathbf{x} \,</math>

    is the summand, i.e. the gradient contribution of a single observation,

  • <math>y</math>

    is the target value

  • <math>\mathbf{x}</math>

    is a features vector.

  • <math>\mathbf{w}_i</math>

    is the weight vector at iteration

    <math>i</math>

    (initialized to all zeros):

    <math>\mathbf{w}_0 = [0,0,0,\dots, 0]</math>
  • <math>\alpha_i</math>

    is the step size at iteration <math>i</math>. It decays according to

    <math>\alpha_i = \frac{\displaystyle \alpha_{i-1}}{\displaystyle n \sqrt{i+1}}</math>

    where <math>n</math> is the number of observations, starting from

    <math>\alpha_0 = 1</math>
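
The update rule and step-size schedule above can be sketched in a few lines of Python with NumPy. The data values and the iteration count here are made up for illustration:

```python
import numpy as np

# Toy data (made-up values): rows of X are feature vectors x_j, y holds targets.
X = np.array([[3.0, 1.0, 4.0],
              [1.0, 2.0, 1.0],
              [2.0, 0.0, 3.0]])
y = np.array([2.0, 1.0, 1.5])

n = len(y)                  # number of observations
w = np.zeros(X.shape[1])    # w_0 = [0, 0, ..., 0]
alpha = 1.0                 # alpha_0 = 1

for i in range(25):
    # Full gradient: sum over j of (w.x_j - y_j) * x_j
    gradient = (X @ w - y) @ X
    w = w - alpha * gradient
    # Step-size schedule: alpha_{i+1} = alpha_i / (n * sqrt(i+2))
    alpha = alpha / (n * np.sqrt(i + 2))
```

Note that the step size shrinks quickly under this schedule, so the number of iterations matters less than with a constant step.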

3.1 - Summand

exampleW = [1, 1, 1]
exampleX = [3, 1, 4]
exampleY = 2.0
gradientSummand = (dot([1 1 1], [3 1 4]) - 2) * [3 1 4] = (8 - 2) * [3 1 4] = [18 6 24]
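The same summand computed with NumPy (a sketch; the variable names follow the example above):

```python
import numpy as np

exampleW = np.array([1.0, 1.0, 1.0])
exampleX = np.array([3.0, 1.0, 4.0])
exampleY = 2.0

# (w.x - y) * x : the gradient contribution of this single observation
gradientSummand = (exampleW.dot(exampleX) - exampleY) * exampleX
# -> array([18., 6., 24.])
```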


4 - Implementation
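
No implementation is given on this page; below is a minimal pure-Python sketch of how the gradient sum decomposes into a map step (one summand per observation) and a reduce step (an element-wise sum), which is what makes the algorithm MapReduce-friendly. The data and weights are made up; the first record reuses the summand example above:

```python
from functools import reduce

# Made-up dataset: each record is a (features, label) pair.
data = [([3.0, 1.0, 4.0], 2.0),
        ([1.0, 2.0, 1.0], 1.0)]
w = [1.0, 1.0, 1.0]

def map_summand(record):
    """Map: emit the per-observation gradient summand (w.x - y) * x."""
    x, y = record
    err = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [err * xi for xi in x]

def reduce_sum(u, v):
    """Reduce: element-wise sum of partial gradients."""
    return [ui + vi for ui, vi in zip(u, v)]

gradient = reduce(reduce_sum, map(map_summand, data))
# -> [21.0, 12.0, 27.0]
```

In a real cluster setting the map step runs independently on each data partition and only the partial sums are shipped to the reducer, so the per-iteration communication cost is a single vector per partition.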

5 - Documentation / Reference

data_mining/gradient_descendent.txt · Last modified: 2015/07/13 22:12 by gerardnico