Keyword & Hybrid search
You can also perform keyword (BM25) searches to find items based on their keyword similarity, or hybrid searches that combine BM25 and semantic/vector searches.
Keyword search
Code
This example finds entries in "MovieCustomVector" with the highest keyword search scores for the term "history", and prints out the title and release year of the top 5 matches.
import weaviate
import weaviate.classes.query as wq
import os
from typing import List
import cohere
from cohere import Client as CohereClient

# Instantiate your client (not shown). e.g.:
# headers = {"X-Cohere-Api-Key": os.getenv("COHERE_APIKEY")}  # Replace with your Cohere API key
# client = weaviate.connect_to_weaviate_cloud(..., headers=headers) or
# client = weaviate.connect_to_local(..., headers=headers)

co_token = os.getenv("COHERE_APIKEY")
co = cohere.Client(co_token)


# Define a function to call the endpoint and obtain embeddings
# (not used by the keyword search itself)
def vectorize(cohere_client: CohereClient, texts: List[str]) -> List[List[float]]:
    response = cohere_client.embed(
        texts=texts, model="embed-multilingual-v3.0", input_type="search_document"
    )
    return response.embeddings


# Get the collection
movies = client.collections.get("MovieCustomVector")

# Perform query
response = movies.query.bm25(
    query="history", limit=5, return_metadata=wq.MetadataQuery(score=True)
)

# Inspect the response
for o in response.objects:
    # Print the title and release year (note the release date is a datetime object)
    print(o.properties["title"], o.properties["release_date"].year)
    # Print the BM25 score of the object from the query
    print(f"BM25 score: {o.metadata.score:.3f}\n")

client.close()
Explain the code
The results are ranked by a keyword search score, computed with the BM25F algorithm.
The limit parameter sets the maximum number of results to return.
The return_metadata parameter takes an instance of the MetadataQuery class, which specifies the metadata to return with the search results. The current query returns the score, which is the BM25 score of the result.
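To give a feel for where a BM25 score comes from, here is a minimal sketch of the textbook single-field BM25 formula in plain Python. It is not Weaviate's implementation (which uses the BM25F variant and its own tokenization). The function name bm25_scores, the whitespace tokenizer, the sample sentences, and the defaults k1=1.2 and b=0.75 are all assumptions made for illustration.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: str, docs: List[str], k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Score each document against the query with a textbook BM25 formula (illustrative only)."""
    # Naive tokenization: lowercase and split on whitespace (an assumption)
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    query_terms = query.lower().split()

    # Document frequency: the number of documents containing each query term
    doc_freq = {term: sum(1 for tokens in tokenized if term in tokens) for term in query_terms}

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        score = 0.0
        for term in query_terms:
            tf = counts[term]
            if tf == 0:
                continue  # A term that does not occur contributes nothing
            # Rarer terms get a higher inverse document frequency
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency saturates via k1; b controls document length normalization
            norm = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


# Made-up sample texts, not data from the collection
docs = [
    "a history of violence and a history of regret",
    "a teacher brings history to life",
    "two robots fall in love",
]
scores = bm25_scores("history", docs)
for doc, score in zip(docs, scores):
    print(f"{score:.3f}  {doc}")
```

A document that mentions the query term more often scores higher, and a document that never mentions it scores zero. This is why a keyword search only surfaces objects that contain the term.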
Example results
American History X 1998
BM25 score: 2.707
A Beautiful Mind 2001
BM25 score: 1.896
Legends of the Fall 1994
BM25 score: 1.663
Hacksaw Ridge 2016
BM25 score: 1.554
Night at the Museum 2006
BM25 score: 1.529
Hybrid search
Code
This example finds entries in "MovieCustomVector" with the highest hybrid search scores for the term "history", and prints out the title and release year of the top 5 matches.
import weaviate
import weaviate.classes.query as wq
import os
from typing import List
import cohere
from cohere import Client as CohereClient

# Instantiate your client (not shown). e.g.:
# headers = {"X-Cohere-Api-Key": os.getenv("COHERE_APIKEY")}  # Replace with your Cohere API key
# client = weaviate.connect_to_weaviate_cloud(..., headers=headers) or
# client = weaviate.connect_to_local(..., headers=headers)

co_token = os.getenv("COHERE_APIKEY")
co = cohere.Client(co_token)


# Define a function to call the endpoint and obtain embeddings
def vectorize(cohere_client: CohereClient, texts: List[str]) -> List[List[float]]:
    response = cohere_client.embed(
        texts=texts, model="embed-multilingual-v3.0", input_type="search_document"
    )
    return response.embeddings


# Vectorize the query text for the vector part of the hybrid search
query_text = "history"
query_vector = vectorize(co, [query_text])[0]

# Get the collection
movies = client.collections.get("MovieCustomVector")

# Perform query
response = movies.query.hybrid(
    query=query_text,  # For BM25 part of the hybrid search
    vector=query_vector,  # For vector part of the hybrid search
    limit=5,
    return_metadata=wq.MetadataQuery(score=True),
)

# Inspect the response
for o in response.objects:
    # Print the title and release year (note the release date is a datetime object)
    print(o.properties["title"], o.properties["release_date"].year)
    # Print the hybrid search score of the object from the query
    print(f"Hybrid score: {o.metadata.score:.3f}\n")

client.close()
Explain the code
The results are ranked by a hybrid search score. A hybrid search blends the results of BM25 and semantic/vector searches.
Because this collection uses custom vectors, we provide the query vector manually to the hybrid query through the vector parameter.
The limit parameter sets the maximum number of results to return.
The return_metadata parameter takes an instance of the MetadataQuery class, which specifies the metadata to return with the search results. The current query returns the score, which is the hybrid score of the result.
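One common way to blend a keyword ranking with a vector ranking is reciprocal rank fusion, sketched below in plain Python. This is an illustration of the general idea rather than a description of what Weaviate does internally. The function name ranked_fusion, the weighting parameter alpha, the constant k=60, and the placeholder document IDs are all assumptions.

```python
from typing import Dict, List


def ranked_fusion(
    keyword_ranking: List[str],
    vector_ranking: List[str],
    alpha: float = 0.5,
    k: int = 60,
) -> Dict[str, float]:
    """Blend two ranked lists of IDs into one score per ID (illustrative only)."""
    fused: Dict[str, float] = {}
    # alpha weights the vector ranking; (1 - alpha) weights the keyword ranking
    for weight, ranking in ((1 - alpha, keyword_ranking), (alpha, vector_ranking)):
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes more for items nearer the top of that list
            fused[doc_id] = fused.get(doc_id, 0.0) + weight / (k + rank)
    return fused


# Placeholder rankings, best match first (made up for this sketch)
keyword_ranking = ["doc_a", "doc_b", "doc_c"]
vector_ranking = ["doc_d", "doc_b", "doc_e"]

fused = ranked_fusion(keyword_ranking, vector_ranking)
for doc_id, score in sorted(fused.items(), key=lambda item: item[1], reverse=True):
    print(f"{doc_id}: {score:.3f}")
```

In this sketch, an item that appears in both lists (doc_b) outranks items that top only one list. Scores produced this way are small fractions, so a fused score is only meaningful relative to the other results of the same query, not as an absolute similarity or as something comparable to a BM25 score.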
Example results
Night at the Museum 2006
Hybrid score: 0.016
The Butterfly Effect 2004
Hybrid score: 0.014
Legends of the Fall 1994
Hybrid score: 0.014
Hidden Figures 2016
Hybrid score: 0.012
A Beautiful Mind 2001
Hybrid score: 0.012