CosineSimilarity

Returns the similarity between two embedding vectors as a number between -1 (opposite) and 1 (similar).

Format

CosineSimilarity ( v1 ; v2 )

Parameters

v1 and v2 - any text expressions, text fields, or container fields that each contain an embedding vector.

Data type returned

number

Originated in version

21.0

Description

This function uses the cosine method to measure the similarity between two embedding vectors. For embedding vectors, cosine similarity is a useful measure of how similar two text values are likely to be. Results range from -1 to 1 (inclusive): values closer to 1 indicate higher semantic similarity, 0 indicates no similarity, and values closer to -1 indicate opposite meaning.
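As a rough illustration of what the cosine method computes, the following Python sketch divides the dot product of two vectors by the product of their lengths. It is not FileMaker calculation syntax and makes no claim about how CosineSimilarity is implemented internally; the helper name cosine_similarity and the two-element sample vectors are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (illustrative helper)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of elements")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy vectors chosen to hit the three reference points of the range.
same = cosine_similarity([1.0, 0.0], [1.0, 0.0])       # same direction
unrelated = cosine_similarity([1.0, 0.0], [0.0, 1.0])  # orthogonal
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])  # opposite direction
```

In this sketch, same, unrelated, and opposite land at 1, 0, and -1, the three reference points of the result range described above. Real embedding vectors have many more elements and usually fall somewhere in between.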

If v1 and v2 are text, they must be in the form of JSON arrays. The vectors must also have the same dimensions (the number of elements in the arrays must be the same). Storing embedding vectors as binary container data, however, typically improves performance.
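The text form and the dimension requirement can be checked outside FileMaker with a short Python sketch. It assumes each vector is a JSON array of numbers stored as text; parse_vector and same_dimensions are hypothetical helper names, not part of any Claris API.

```python
import json

def parse_vector(text):
    """Parse a JSON array of numbers into a list of floats (illustrative helper)."""
    values = json.loads(text)
    if not isinstance(values, list):
        raise ValueError("embedding text must be a JSON array")
    for v in values:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(v, bool) or not isinstance(v, (int, float)):
            raise ValueError("embedding array must contain only numbers")
    return [float(v) for v in values]

def same_dimensions(text1, text2):
    """True when both JSON arrays hold the same number of elements."""
    return len(parse_vector(text1)) == len(parse_vector(text2))

# Short made-up vectors; real embeddings have hundreds or thousands of elements.
ok = same_dimensions("[0.1, -0.2, 0.3]", "[0.4, 0.5, -0.6]")
mismatch = same_dimensions("[0.1, -0.2, 0.3]", "[0.4, 0.5]")
```

A check like this only confirms that the two arrays have matching lengths. It cannot tell whether the vectors came from the same model.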

Notes

Normalized embedding vectors are required. All embedding vectors must be generated from the same model to ensure compatibility and performance; mixing embedding vectors from different models isn't supported.
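For readers unfamiliar with the term, a normalized vector has a length (Euclidean norm) of 1. The Python sketch below shows how a vector can be normalized and that, for two normalized vectors, the plain dot product already equals the cosine similarity. This is general vector math offered as background, not a description of the function's internals; normalize and dot are hypothetical helper names.

```python
import math

def normalize(v):
    """Scale a vector to unit length (illustrative helper)."""
    length = math.sqrt(sum(x * x for x in v))
    if length == 0:
        raise ValueError("a zero vector cannot be normalized")
    return [x / length for x in v]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of elements")
    return sum(x * y for x, y in zip(a, b))

a = normalize([3.0, 4.0])   # length 5 before scaling
b = normalize([4.0, 3.0])   # length 5 before scaling
similarity = dot(a, b)      # equals the cosine similarity for unit vectors
```

Many embedding models return vectors that are already normalized; whether a given model does so is something to confirm in that model's documentation.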

Example 1

CosineSimilarity ( "[-0.043686170000000003333, 0.042094484000000001456, ... ]" ; "[-0.049242082999999998993, 0.040926795000000001923, ... ]" ) returns .90848158767415143622 for a particular model.

Example 2

CosineSimilarity ( v1 ; v2 ) returns .54682693950088512302 for a particular model when the v1 and v2 fields contain embedding vectors for the text "Claris" and "Claire", respectively.