Data Mining - Cosine Similarity (Measure of Angle)


1 - About

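Cosine similarity measures the angle between two vectors, independent of their magnitudes. It is the cosine of that angle, computed by dividing the dot product of the two vectors by the product of their magnitudes (Euclidean norms). The result ranges from -1 (opposite directions) through 0 (orthogonal) to 1 (same direction). For vectors with non-negative components, such as term counts, it lies between 0 and 1.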
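The division described above can be sketched in plain Python as follows. The function name `cosine_similarity` is an illustrative choice, not a library API. Raising an error for a zero vector (whose angle is undefined) is one assumed convention among several.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between vectors a and b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Dot product: sum of the element-wise products.
    dot = sum(x * y for x, y in zip(a, b))
    # Magnitudes (Euclidean norms) of each vector.
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumed convention: the angle is undefined for a zero-length vector.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1, 2, 2], [2, 1, 2]))  # 8 / (3 * 3), about 0.889
print(cosine_similarity([1, 0], [0, 1]))        # orthogonal: 0.0
print(cosine_similarity([3, 4], [6, 8]))        # same direction, different length: 1.0
```

The last call shows the effect of normalizing by magnitude: `[6, 8]` is twice as long as `[3, 4]`, yet the similarity is 1 because only the direction matters.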


3 - Formula

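The dot product has an algebraic definition, <MATH>a \cdot b = \sum_{i=1}^{n} a_i b_i</MATH>, and a geometric one, <MATH>a \cdot b = \|a\| \|b\| \cos\theta</MATH>. Equating the two and solving for <MATH>\cos\theta</MATH> gives the cosine similarity, a normalized dot product of the two vectors: <MATH> similarity = \cos \theta = \frac{a \cdot b}{\|a\| \|b\|} = \frac{ \sum_{i=1}^{n} a_i b_i }{ \sqrt{\sum_{i=1}^{n} a_i^2} \sqrt{\sum_{i=1}^{n} b_i^2} } </MATH>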
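As a worked example of the formula, take two small vectors chosen only for illustration, a = (1, 2, 2) and b = (2, 1, 2):

<MATH>
\begin{aligned}
a \cdot b &= 1 \cdot 2 + 2 \cdot 1 + 2 \cdot 2 = 8 \\
\|a\| &= \sqrt{1^2 + 2^2 + 2^2} = 3, \qquad \|b\| = \sqrt{2^2 + 1^2 + 2^2} = 3 \\
\cos\theta &= \frac{8}{3 \cdot 3} = \frac{8}{9} \approx 0.889, \qquad \theta \approx 27.3^\circ
\end{aligned}
</MATH>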

  • If the angle is small (for document vectors, the documents share many tokens), the cosine is large, close to 1.
  • If the angle is large (the documents have few tokens in common), the cosine is small, reaching 0 when they share no tokens at all.
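The token intuition in the bullets can be sketched by turning each text into a term-count vector and applying the formula. This is a minimal illustration under stated assumptions: the function name `text_cosine` is hypothetical, tokenization is naive (lowercase and split on whitespace), and returning 0.0 for an empty text is a convention chosen here, not a standard.

```python
import math
from collections import Counter


def text_cosine(doc_a: str, doc_b: str) -> float:
    """Cosine similarity between the term-count vectors of two texts."""
    # Naive tokenization (an assumption): lowercase, split on whitespace.
    tf_a = Counter(doc_a.lower().split())
    tf_b = Counter(doc_b.lower().split())
    # Only tokens present in both texts contribute to the dot product.
    shared = tf_a.keys() & tf_b.keys()
    dot = sum(tf_a[t] * tf_b[t] for t in shared)
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        # Convention chosen here: an empty text is similar to nothing.
        return 0.0
    return dot / (norm_a * norm_b)


# Many shared tokens: small angle, large cosine.
print(text_cosine("the cat sat on the mat", "the cat sat on the hat"))  # about 0.875
# No shared tokens: orthogonal vectors, cosine 0.
print(text_cosine("the cat sat", "dogs bark loudly"))  # 0.0
```

In the first pair, "the" appears twice in each text, so the dot product is 2·2 + 1 + 1 + 1 = 7 and each norm is √8, giving 7/8. Real pipelines usually also strip punctuation, drop stop words, or weight terms (for example with TF-IDF) before computing the cosine.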

4 - Comparison

4.1 - Text

5 - Documentation / Reference