NormalizeEmbedding

Normalizes an embedding vector. If specified, the dimension parameter reduces the number of vector dimensions to use before normalizing.

Format

NormalizeEmbedding ( data { ; dimension } )

Parameters

data - any text expression, text field, or container field that contains an embedding vector.

dimension - the number of vector dimensions to use for normalization. If omitted, or if the value is less than or equal to 0 or greater than the vector's actual number of dimensions, all of the vector's dimensions are used in the calculation.

Parameters in braces { } are optional.

Data type returned

text, container

Originated in version

22.0

Description

This function returns a normalized version of the input embedding vector. Normalizing a vector means scaling its values so that its length (magnitude) becomes 1. This is often a required step before performing calculations like cosine similarity, because it ensures that the similarity measure is based only on the direction of the vectors, not their magnitude.
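The arithmetic behind normalization can be sketched in Python. This is only an illustration of the math, not FileMaker's implementation; the helper names normalize and cosine_similarity are hypothetical, and leaving a zero vector unchanged is an assumption.

```python
import math

def normalize(vec):
    """Scale vec so that its Euclidean length is 1 (hypothetical helper)."""
    magnitude = math.sqrt(sum(x * x for x in vec))
    if magnitude == 0:
        return list(vec)  # assumption: a zero vector is left unchanged
    return [x / magnitude for x in vec]

def cosine_similarity(a, b):
    """Cosine similarity computed directly from unnormalized vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    length_a = math.sqrt(sum(x * x for x in a))
    length_b = math.sqrt(sum(y * y for y in b))
    return dot / (length_a * length_b)

a, b = [3.0, 4.0], [6.0, 8.5]
unit_a, unit_b = normalize(a), normalize(b)

# For unit-length vectors, the plain dot product equals the cosine similarity.
dot_of_units = sum(x * y for x, y in zip(unit_a, unit_b))
```

Once both vectors are unit length, the similarity depends only on direction, which is why the dot product of unit_a and unit_b matches cosine_similarity ( a, b ) up to floating-point rounding.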

If the data parameter is text, it must be in the form of a JSON array containing floating-point numbers—for example, [-0.1, 0.5, ...]. Typically, though, using embedding vectors as binary container data improves performance.
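If you prepare the text form outside FileMaker, a minimal Python sketch such as the following can check that a string is a JSON array of numbers before you store it. The function name parse_embedding_text is hypothetical, and the validation rules are an assumption about what counts as well-formed input.

```python
import json

def parse_embedding_text(text):
    """Parse a JSON array of numbers into a list of floats (hypothetical helper)."""
    values = json.loads(text)
    is_number_list = isinstance(values, list) and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    if not is_number_list:
        raise ValueError("expected a JSON array of numbers")
    return [float(v) for v in values]

vector = parse_embedding_text("[-0.1, 0.5, 0.25]")
```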

The dimension parameter allows you to normalize the vector based only on a specified number of its initial dimensions. If dimension is specified, the function keeps only that number of initial elements, calculates the magnitude from those elements, and scales them by that magnitude. The returned vector has the same number of dimensions as the input vector unless the dimension parameter is used to truncate the vector before normalization.

Notes

Most embedding models generate embedding vectors that are already normalized (unit length). In such cases, calling NormalizeEmbedding for these vectors isn't necessary and simply returns the original vector. You typically need to use this function only if you are working with embedding vectors generated by a model that doesn't output normalized vectors, or if you specifically need to normalize based on a subset of the vector's dimensions.

Using the optional dimension parameter can be useful if you want to work with a smaller, fixed-size representation of a larger vector while maintaining comparability based on the initial dimensions.
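To judge whether a model's output is already normalized, you can check its length outside FileMaker. The following Python sketch shows one way; the function name is_unit_length and the tolerance of 1e-6 are assumptions chosen for illustration.

```python
import math

def is_unit_length(vec, tolerance=1e-6):
    """Return True if vec has a Euclidean length of about 1 (hypothetical helper)."""
    magnitude = math.sqrt(sum(x * x for x in vec))
    # The tolerance absorbs floating-point rounding; its value is arbitrary.
    return math.isclose(magnitude, 1.0, abs_tol=tolerance)

already_normalized = is_unit_length([0.6, 0.8])
needs_normalizing = not is_unit_length([3.0, 4.0])
```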

Example 1

NormalizeEmbedding ( "[3, 4]" ) returns [0.5999999999999999778, 0.80000000000000004441], which, for the purpose of illustration, is approximately [0.6, 0.8]. The original vector [3, 4] has a length of Sqrt(3^2 + 4^2) = 5. The normalized vector [0.6, 0.8] has a length of Sqrt(0.6^2 + 0.8^2) = 1.

Example 2

NormalizeEmbedding ( Table::EmbeddingData; 256 ) returns a new vector containing only the first 256 dimensions of the original vector, normalized so that its length is 1. The embedding vector is stored in a container field named Table::EmbeddingData. This is useful if your embedding model produces large vectors, but you only want to use a smaller, fixed number of dimensions for your cosine similarity calculations and need those dimensions to be normalized.
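Reading Example 2 as "keep the first elements, then normalize them," the behavior can be approximated in Python as follows. This is a sketch under that assumption, not FileMaker's implementation. The function name normalize_truncated is hypothetical, using 0 to stand for an omitted dimension is a convention of this sketch, and the fallback to the full vector follows the rule given for the dimension parameter.

```python
import math

def normalize_truncated(vec, dimension=0):
    """Keep the first `dimension` elements of vec and scale them to unit length."""
    # Use the whole vector when dimension is omitted (0 here), non-positive,
    # or larger than the vector's actual number of dimensions.
    if dimension <= 0 or dimension > len(vec):
        dimension = len(vec)
    head = vec[:dimension]
    magnitude = math.sqrt(sum(x * x for x in head))
    if magnitude == 0:
        return list(head)  # assumption: a zero vector is left unchanged
    return [x / magnitude for x in head]

full = [3.0, 4.0, 12.0]
short = normalize_truncated(full, 2)      # two elements, approximately [0.6, 0.8]
fallback = normalize_truncated(full, 10)  # dimension too large, so all three elements are used
```

The truncated result has unit length on its own, so two vectors shortened to the same dimension can be compared with a plain dot product.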