Vector search has become very useful in deep learning applications.
To search dense vectors in Elasticsearch 8.6, you can use the “dense_vector” data type, which has been available since the Elasticsearch 7.x series. This data type lets you store a dense vector as a single field in each document, which can then be scored against a query vector using similarity measures such as cosine similarity, dot product, or Euclidean (L2) distance.
Here’s an example of how to search for similar vectors using cosine similarity:
First, you need to create an index with a dense_vector field:
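A minimal sketch of that step, assuming a cluster reachable at http://localhost:9200, a hypothetical index name my_index, and a 3-dimensional field named my_vector to match the query vector used in the search below. The function names are illustrative, and the access token is a placeholder you would replace with your own credentials:

```python
import json

# Assumptions: local cluster address and a hypothetical index name.
url = "http://localhost:9200/my_index"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_ACCESS_TOKEN",  # placeholder token
}


def build_mapping(dims=3):
    """Return an index body with one dense_vector field named 'my_vector'."""
    return {
        "mappings": {
            "properties": {
                "my_vector": {"type": "dense_vector", "dims": dims}
            }
        }
    }


def create_index_and_add_doc(vector):
    """Create the index, then store one document holding `vector`."""
    import requests  # third-party: pip install requests

    # PUT /my_index creates the index with the mapping above.
    resp = requests.put(url, data=json.dumps(build_mapping(len(vector))), headers=headers)
    resp.raise_for_status()

    # POST /my_index/_doc stores a document; refresh=true makes it searchable at once.
    resp = requests.post(url + "/_doc?refresh=true",
                         data=json.dumps({"my_vector": vector}), headers=headers)
    resp.raise_for_status()
    return resp.json()
```

Calling create_index_and_add_doc([0.2, 0.4, 0.1]) would create the index and add one sample document. The dims value must equal the length of every vector you index and query with.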
    # Search for documents that are similar to a given vector
    import json
    import requests

    # Assumes url holds the index URL, e.g. "http://localhost:9200/my_index"
    data = {
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    "source": "cosineSimilarity(params.queryVector, 'my_vector') + 1.0",
                    "params": {"queryVector": [0.1, 0.5, 0.3]}
                }
            }
        }
    }
    headers = {'Content-Type': 'application/json', 'Authorization': 'Bearer YOUR_ACCESS_TOKEN'}
    response = requests.get(url+'/_search', data=json.dumps(data), headers=headers)
    print(response.json())
Notice that we add 1.0 to the result of the cosineSimilarity function. Cosine similarity ranges from -1 to 1, but Elasticsearch rejects negative scores, so without the offset you may get an error like this: Error: {"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"script_score script returned an invalid score [-0.9679827] for doc [0]. Must be a non-negative score!"}]
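To see why the offset is enough, here is a small plain-Python sketch of the arithmetic. It is not Elasticsearch code, and the helper names are made up for illustration. Cosine similarity always lies between -1 and 1, so adding 1.0 moves the score into the range 0 to 2:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def shifted_score(a, b):
    """Same value the script computes: cosine similarity plus 1.0."""
    return cosine_similarity(a, b) + 1.0


query = [0.1, 0.5, 0.3]
opposite = [-x for x in query]  # points in exactly the opposite direction

# The raw cosine is negative here, which Elasticsearch would reject as a score.
print(cosine_similarity(query, opposite))
# After the shift the score is approximately zero (up to rounding) instead of negative.
print(shifted_score(query, opposite))
```

A side effect of the shift is that every returned _score is offset by 1.0, so subtract 1.0 if you need the raw cosine value.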
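Another thing to notice is that we use cosineSimilarity to calculate the score. However, if the vectors are already normalized to unit length, or you simply want the dot product of the vectors, you can switch to `dotProduct`. For unit-length vectors the dot product equals the cosine similarity, but it skips computing the vector magnitudes, so it needs less computation and is potentially faster in Elasticsearch. Keep in mind that a dot product can also be negative, so the score still has to be kept non-negative.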
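A minimal sketch of the dotProduct variant, assuming the stored vectors in the my_vector field were normalized to unit length before indexing. The helper names are illustrative. The query vector is normalized on the client, and 1.0 is still added because the dot product of two unit vectors lies between -1 and 1:

```python
import json
import math


def normalize(vector):
    """Scale a vector to unit length so dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def build_dot_product_query(query_vector):
    """Build a script_score body that ranks documents by dotProduct + 1.0."""
    return {
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    "source": "dotProduct(params.queryVector, 'my_vector') + 1.0",
                    "params": {"queryVector": normalize(query_vector)},
                },
            }
        }
    }


def search(url, body, headers):
    """Send the query to <url>/_search and return the parsed JSON response."""
    import requests  # third-party: pip install requests

    resp = requests.get(url + "/_search", data=json.dumps(body), headers=headers)
    resp.raise_for_status()
    return resp.json()
```

You would call search(url, build_dot_product_query([0.1, 0.5, 0.3]), headers) with the same url and headers as in the cosine example. If the stored vectors are not normalized, the dot product is unbounded and a fixed offset of 1.0 no longer guarantees a non-negative score.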
Reprint policy: Unless otherwise stated, all articles in this blog are licensed under CC BY 4.0. If you reproduce this article, please credit the source: robot learner.