Data Mining - (Stochastic) Gradient descent (SGD)


About

Gradient descent can be used to train various kinds of regression and classification models.

It is an iterative process in which each iteration sums a per-observation term over the whole dataset, which makes it well suited to a MapReduce implementation.

Linear regression

The gradient descent update for linear regression is: <MATH> \mathbf{w}_{i+1} = \mathbf{w}_i - \alpha_i \sum_{j=1}^N (\mathbf{w}_i^\top\mathbf{x}_j - y_j) \mathbf{x}_j </MATH>

where:

  • <math>i</math> is the iteration number of the gradient descent algorithm,
  • <math>j</math> identifies the observation,
  • <math>N</math> is the number of observations,
  • <math>(\mathbf{w}_i^\top \mathbf{x}_j - y_j) \mathbf{x}_j</math> is the summand for observation <math>j</math>,
  • <math>y_j</math> is the target value of observation <math>j</math>,
  • <math>\mathbf{x}_j</math> is the feature vector of observation <math>j</math>,
  • <math>\mathbf{w}_i</math> is the weight vector at iteration <math>i</math>, initialized to the zero vector: <math>\mathbf{w}_0 = [0,0,0,\dots, 0]</math>,
  • <math>\alpha_i</math> is the step size at iteration <math>i</math>. It starts at <math>\alpha = 1</math> and decreases with the iteration number: <math>\alpha_i = \frac{\displaystyle \alpha}{\displaystyle N \sqrt{i+1}}</math>.
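
To make the update concrete, here is a minimal NumPy sketch of a single full-batch update (iteration <math>i = 0</math>); the dataset and variable names are made up for illustration:

import numpy as np

# Tiny illustrative dataset: 3 observations with 2 features each (made up)
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 3.0]])
y = np.array([3.0, 3.0, 6.0])
N = len(y)

w = np.zeros(2)                      # w_0 = [0, 0]
alpha = 1.0 / (N * np.sqrt(0 + 1))   # step size for iteration i = 0

# Gradient = sum over observations of the summand (w.x_j - y_j) * x_j
gradient = sum((w.dot(x_j) - y_j) * x_j for x_j, y_j in zip(X, y))
w = w - alpha * gradient             # w_1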

Summand

exampleW = [1, 1, 1]
exampleX = [3, 1, 4]
exampleY = 2.0
gradientSummand = (dot([1 1 1], [3 1 4]) - 2) * [3 1 4] = (8 - 2) * [3 1 4] = [18 6 24]
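
In NumPy, this summand computation can be sketched as follows (the function name gradient_summand is only an illustrative choice):

import numpy as np

def gradient_summand(w, x, y):
    # Summand of the gradient for one observation: (w.x - y) * x
    return (np.dot(w, x) - y) * x

example_w = np.array([1.0, 1.0, 1.0])
example_x = np.array([3.0, 1.0, 4.0])
example_y = 2.0

print(gradient_summand(example_w, example_x, example_y))  # [18.  6. 24.]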


Implementation
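
A minimal NumPy sketch of the batch gradient descent loop described above (the function name, iteration count and synthetic data are illustrative assumptions, not part of the original page). For the stochastic variant (SGD), the same update is applied using the summand of a single randomly drawn observation instead of the full sum.

import numpy as np

def gradient_descent(X, y, num_iters=500, alpha0=1.0):
    # Batch gradient descent for linear regression.
    # X: (N, d) feature matrix, y: (N,) target vector.
    N, d = X.shape
    w = np.zeros(d)                               # w_0 = [0, 0, ..., 0]
    for i in range(num_iters):
        alpha_i = alpha0 / (N * np.sqrt(i + 1))   # decreasing step size
        # Gradient = sum over observations of (w.x_j - y_j) * x_j
        gradient = X.T.dot(X.dot(w) - y)
        w = w - alpha_i * gradient
    return w

# Hypothetical usage: synthetic data generated from y = 2*x1 + 3*x2
rng = np.random.default_rng(0)
X = rng.random((100, 2))
y = X.dot(np.array([2.0, 3.0]))
print(gradient_descent(X, y))                     # close to [2, 3]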
