CosineSimilarity
Returns the similarity between two embedding vectors as a number between -1 (opposite) and 1 (similar).
Format
CosineSimilarity ( v1 ; v2 )
Parameters
v1 and v2 - any text expression, text field, or container field that contains embedding vectors.
Data type returned
number
Originated in version
21.0
Description
This function returns a measure of the similarity between two embedding vectors using the cosine method. For embedding vectors, cosine similarity gives a useful measure of how similar two text values are likely to be. Results range from -1 to 1 (inclusive): values closer to 1 indicate higher semantic similarity, 0 indicates no similarity, and values closer to -1 indicate opposite meaning.
If v1 and v2 are text, they must be in the form of JSON arrays. The vectors must also have the same dimensions (the number of elements in the arrays must be the same). Typically, though, using embedding vectors as binary container data improves performance.
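To make the text form concrete, here is a minimal Python sketch of the standard cosine formula applied to two vectors passed as JSON array text. It illustrates the math only and is not the function's actual implementation; the name cosine_similarity and the error messages are assumptions made for this example.

```python
import json
import math


def cosine_similarity(v1_text: str, v2_text: str) -> float:
    """Cosine similarity of two vectors given as JSON array text (illustrative only)."""
    a = json.loads(v1_text)
    b = json.loads(v2_text)

    # Both vectors must have the same number of elements.
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of elements")

    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))

    # The cosine is undefined when either vector has zero length.
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")

    return dot / (norm_a * norm_b)


same = cosine_similarity("[1, 0]", "[2, 0]")       # vectors point the same way
unrelated = cosine_similarity("[1, 0]", "[0, 3]")  # vectors are orthogonal
opposite = cosine_similarity("[1, 0]", "[-4, 0]")  # vectors point opposite ways
```

In this sketch the three toy pairs land at the top, middle, and bottom of the -1 to 1 range described above. Real embedding vectors have hundreds or thousands of elements, but the calculation is the same.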
Notes
- Normalized embedding vectors are required. All embedding vectors must be generated from the same model to ensure compatibility and performance; mixing embedding vectors from different models isn't supported.
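As a rough illustration of what "normalized" means here, the Python sketch below scales a vector to unit length. For two unit-length vectors, the cosine similarity equals their plain dot product. The helper names normalize and dot are hypothetical and chosen for this example; many embedding models already return normalized vectors, so this step is often unnecessary in practice.

```python
import math


def normalize(vector):
    """Scale a vector so its Euclidean length is 1 (illustrative helper)."""
    length = math.sqrt(sum(x * x for x in vector))
    if length == 0:
        raise ValueError("a zero vector cannot be normalized")
    return [x / length for x in vector]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


a = normalize([3.0, 4.0])
b = normalize([4.0, 3.0])

# For unit-length vectors, the dot product is the cosine similarity.
similarity = dot(a, b)
```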
Example 1
CosineSimilarity ( "[-0.043686170000000003333, 0.042094484000000001456, ... ]" ; "[-0.049242082999999998993, 0.040926795000000001923, ... ]" )
returns .90848158767415143622 for a particular model.
Example 2
CosineSimilarity ( v1 ; v2 )
returns .54682693950088512302 for a particular model when the v1 and v2 fields contain embedding vectors for the text "Claris" and "Claire," respectively.