What is the Cosine Similarity or Cosine Distance? (Measure of Angle)

About

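The cosine similarity is a measure of the angle between two vectors that ignores their magnitudes: you divide the dot product of the two vectors by the product of their magnitudes. The result is the cosine of the angle between them. The related cosine distance is commonly obtained as 1 minus the cosine similarity.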

Formula

By equating the algebraic and the geometric definitions of the dot product (see Linear Algebra - (Dot|Scalar|Inner) Product of two vectors), we get the cosine similarity, which is the normalized dot product of two vectors: <MATH> similarity = \cos \theta = \frac{a \cdot b}{\|a\| \, \|b\|} = \frac{ \sum_i a_i b_i }{ \sqrt{\sum_i a_i^2} \, \sqrt{\sum_i b_i^2} } </MATH>
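
As a small worked example of the formula, take two vectors chosen here purely for illustration, a = (1, 0, 1) and b = (1, 1, 0): <MATH> \cos \theta = \frac{1 \cdot 1 + 0 \cdot 1 + 1 \cdot 0}{\sqrt{1^2 + 0^2 + 1^2} \, \sqrt{1^2 + 1^2 + 0^2}} = \frac{1}{\sqrt{2} \, \sqrt{2}} = \frac{1}{2} </MATH>

A cosine of 1/2 corresponds to an angle of 60 degrees between the two vectors.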

  • If the angle is small (for two documents, they share many tokens), the cosine is large, close to 1.
  • If the angle is large (the documents share few tokens), the cosine is small, close to 0 for vectors of term counts.
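
The formula and the two cases above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `cosine_similarity` and `token_vectors` are hypothetical, and splitting text on whitespace is an assumed, deliberately naive way of turning a document into a term-count vector.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Numerator: the dot product, sum of a_i * b_i.
    dot = sum(x * y for x, y in zip(a, b))
    # Denominator: the product of the two Euclidean norms.
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def token_vectors(text_a, text_b):
    """Build two term-count vectors over the shared vocabulary (naive whitespace split)."""
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())
    vocabulary = sorted(set(counts_a) | set(counts_b))
    # A Counter returns 0 for a token it has never seen.
    vec_a = [counts_a[token] for token in vocabulary]
    vec_b = [counts_b[token] for token in vocabulary]
    return vec_a, vec_b


# Two short texts that share two of their three tokens.
similar = cosine_similarity(*token_vectors("the cat sat", "the cat ran"))
# Two short texts with no token in common.
unrelated = cosine_similarity(*token_vectors("the cat sat", "a dog barked"))
```

With these inputs, `similar` should come out larger than `unrelated`, matching the two bullet points: more shared tokens means a smaller angle and a larger cosine. If a cosine distance is needed, one common convention is `1 - cosine_similarity(a, b)`.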

Comparison

Text

See Natural Language - Document (Cosine) Similarity
