Why Cosine Similarity Scores Cannot Be Universally Compared
Interest in how to interpret cosine similarity scores has been growing, especially in e-commerce applications. The question matters because it goes to the heart of how we assess product relevance.
In conversations with stakeholders, when someone says, ‘the cosine similarity scores don’t seem to make sense,’ they often mean that the scores generated for different queries don’t align with their expectations or understanding of relevance. We’ll explore the reasons behind the context-dependent nature of cosine similarity and illustrate how these nuances impact interpretation.
Cosine similarity is widely used in information retrieval and e-commerce to judge the relevance of products based on their embeddings, but this metric can also lead to significant misunderstandings.
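To make the metric concrete, here is a minimal sketch of scoring products against a query with cosine similarity. The two-dimensional vectors and the names `query`, `products`, and `cosine_similarity` are made up for illustration; real embeddings come from a trained model and have hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (both non-zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings, chosen by hand for this example.
query = [1.0, 0.0]
products = {
    "product_a": [2.0, 0.0],  # same direction as the query
    "product_b": [1.0, 1.0],  # 45 degrees away from the query
    "product_c": [0.0, 3.0],  # orthogonal to the query
}

# Score every product against the query, then rank from high to low.
scores = {name: cosine_similarity(query, vec) for name, vec in products.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

for name in ranking:
    print(f"{name}: {scores[name]:.4f}")
```

Note that the score depends only on the angle between the vectors, not on their lengths: `product_a` is twice as long as the query and still scores 1.0. Within this one query the ordering is meaningful, and that ordering is what a ranking system actually consumes.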
The issue I see with cosine similarity is that contrastive loss shapes the embedding space according to the relative distances between candidates rather than their absolute relevance, which makes an individual score hard to interpret. Because the embedding space is optimized for relative comparisons within a query, cosine similarity scores cannot be compared…