Searches

As collections with named vectors can include multiple vectors, any vector or similarity search must specify a "target" vector.

This applies to near_text and near_vector searches, as well as multimodal searches such as near_image. Let's explore a few examples here.
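To make the idea of a target vector concrete, here is a minimal in-memory sketch. It does not use Weaviate at all: the `objects` list, the 2-D vectors, and the `toy_near_vector` helper are all made up for illustration. It only shows why a search over objects with several vectors has to say which one to compare against.

```python
import math

# Hypothetical toy data: each object stores several named vectors.
# The 2-D values are invented for illustration, not real embeddings.
objects = [
    {"title": "Movie A", "vectors": {"title": [1.0, 0.0], "overview": [0.0, 1.0]}},
    {"title": "Movie B", "vectors": {"title": [0.0, 1.0], "overview": [1.0, 0.0]}},
]


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def toy_near_vector(query_vector, target_vector, limit=5):
    """Rank objects by distance between the query and ONE named vector."""
    ranked = sorted(
        objects,
        key=lambda o: cosine_distance(query_vector, o["vectors"][target_vector]),
    )
    return [o["title"] for o in ranked[:limit]]


# The same query vector is ranked against a different vector space each time.
print(toy_near_vector([1.0, 0.1], target_vector="title"))
print(toy_near_vector([1.0, 0.1], target_vector="overview"))
```

Without the `target_vector` argument, the helper would have no way to know which of the stored vectors to compare with the query, which is the same ambiguity that the target vector resolves in a real collection.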

Text searches

Code

Here, we look for entries in "MovieNVDemo" based on their similarity to the phrase "A joyful holiday film". The same query can be run in multiple versions, each with a different target_vector parameter ("title", "overview" or "poster_title"). The version below targets "title":

import weaviate
import weaviate.classes.query as wq
import os


# Instantiate your client (not shown). e.g.:
# headers = {"X-OpenAI-Api-Key": os.getenv("OPENAI_APIKEY")} # Replace with your OpenAI API key
# client = weaviate.connect_to_local(headers=headers)

# Get the collection
movies = client.collections.get("MovieNVDemo")

# Perform query
response = movies.query.near_text(
    query="A joyful holiday film",
    target_vector="title",  # The target vector to search against
    limit=5,
    return_metadata=wq.MetadataQuery(distance=True),
    return_properties=["title", "release_date", "tmdb_id", "poster"],
)

# Inspect the response
for o in response.objects:
    print(
        o.properties["title"], o.properties["release_date"].year, o.properties["tmdb_id"]
    )  # Print the title, release year and TMDB ID (note the release date is a datetime object)
    print(
        f"Distance to query: {o.metadata.distance:.3f}\n"
    )  # Print the distance of the object from the query

client.close()

Explain the code

Each named vector here is based on a different property of the movie data.

The search with the "title" target vector compares the meaning of the movie title with the query, the search with "overview" compares the entire summary (overview) with the query, and the search with "poster_title" compares the poster (and the title) with the query.

Weaviate also allows each named vector to be set with a different vectorizer. You will recall that the poster_title vector is created by the CLIP models, while the title and overview vectors are created by the OpenAI model.

As a result, each named vector can be further specialized by using the right model for the right property.
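One way to compare target vectors side by side is to loop over their names and issue the same near_text query for each. The helper below is a sketch rather than part of the Weaviate client: `compare_target_vectors` is a hypothetical name, and it assumes the collection object exposes `query.near_text` with the `query`, `target_vector` and `limit` parameters used in the code above.

```python
def compare_target_vectors(collection, query, target_vectors, limit=5, **query_kwargs):
    """Run one near_text query per target vector and collect the responses.

    `collection` is assumed to behave like the `movies` collection above.
    Extra keyword arguments (e.g. return_metadata) are passed through unchanged.
    """
    results = {}
    for name in target_vectors:
        results[name] = collection.query.near_text(
            query=query,
            target_vector=name,  # Only this argument changes between runs
            limit=limit,
            **query_kwargs,
        )
    return results


# Example usage (assumes `movies` is the collection from the code above):
# responses = compare_target_vectors(
#     movies, "A joyful holiday film", ["title", "overview", "poster_title"]
# )
# for name, response in responses.items():
#     print(name, [o.properties["title"] for o in response.objects])
```

Keeping everything except target_vector fixed makes it easier to attribute differences in the results to the named vector itself.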

Explain the results

The results of each search are different, as they are based on different properties of the movie data.

title vs overview

Note that the search with the "overview" target vector includes titles like "Home Alone" and "Home Alone 2: Lost in New York", which are not included in the other searches.

This is because the plots of these movies are holiday-themed, even though the titles are not obviously joyful or holiday-related.

poster

The search with poster_title target vector interestingly includes "Misery" - the Stephen King horror movie! This is very likely because the poster of the movie is a snowy scene. And since the CLIP vectorizer is trained to identify elements of images, it identifies this terrifying film as a result of the search.

Given the imagery of the poster only and no other context, you would have to say that the search isn't wrong, even though anyone who has read the book or watched the movie would agree that it is hardly a joyful holiday film.

How the Grinch Stole Christmas 2000 8871
Distance to query: 0.162

The Nightmare Before Christmas 1993 9479
Distance to query: 0.177

The Pursuit of Happyness 2006 1402
Distance to query: 0.182

Jingle All the Way 1996 9279
Distance to query: 0.184

Mrs. Doubtfire 1993 788
Distance to query: 0.189

Hybrid search

Code

This example finds entries in "MovieNVDemo" with the highest hybrid search scores for the term "history", and prints out the title and release year of the top 5 matches.

import weaviate
import weaviate.classes.query as wq
import os


# Instantiate your client (not shown). e.g.:
# headers = {"X-OpenAI-Api-Key": os.getenv("OPENAI_APIKEY")} # Replace with your OpenAI API key
# client = weaviate.connect_to_local(headers=headers)

# Get the collection
movies = client.collections.get("MovieNVDemo")

# Perform query
response = movies.query.hybrid(
    query="history",
    target_vector="overview",  # The target vector to search against
    limit=5,
    return_metadata=wq.MetadataQuery(score=True),
)

# Inspect the response
for o in response.objects:
    print(
        o.properties["title"], o.properties["release_date"].year
    )  # Print the title and release year (note the release date is a datetime object)
    print(
        f"Hybrid score: {o.metadata.score:.3f}\n"
    )  # Print the hybrid search score of the object from the query

client.close()

Explain the code

Hybrid search with named vectors works the same way as other vector searches with named vectors. You must provide a target_vector parameter to specify the named vector for the vector search component of the hybrid search.

Because named vectors only affect the vector representations of objects, they do not affect keyword searches. You can perform keyword searches on named vector collections using the same syntax as you would for any other collection.
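The split between the two components can be illustrated with a toy score blend. This is a simplified weighted sum with made-up, pre-normalized scores, not Weaviate's actual fusion algorithm; `toy_hybrid_scores` and the `alpha` weighting shown here are assumptions for illustration only. The point is that the keyword scores are shared, while the vector scores depend on the chosen target vector.

```python
def toy_hybrid_scores(keyword_scores, vector_scores_by_target, target_vector, alpha=0.5):
    """Blend keyword scores with vector scores from ONE named vector.

    alpha=0 uses only keyword scores; alpha=1 uses only vector scores.
    """
    vector_scores = vector_scores_by_target[target_vector]
    return {
        obj_id: (1 - alpha) * keyword_scores[obj_id] + alpha * vector_scores[obj_id]
        for obj_id in keyword_scores
    }


# Invented scores for two objects, already scaled to the range 0..1.
keyword_scores = {"a": 1.0, "b": 0.0}  # Same regardless of the target vector
vector_scores_by_target = {
    "title": {"a": 0.0, "b": 1.0},
    "overview": {"a": 1.0, "b": 0.5},
}

# Only the vector component changes when the target vector changes.
print(toy_hybrid_scores(keyword_scores, vector_scores_by_target, "title"))
print(toy_hybrid_scores(keyword_scores, vector_scores_by_target, "overview"))
```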

The use of named vectors enables flexible search options that can be tailored to your needs.

Each object can have as many named vectors as you would like, with any combination of properties and vectorizers, or even multiple custom vectors provided by your own models.

This flexibility allows you to create databases with vector representations that are tailored to your specific use case, and to search for similar items based on any combination of properties.

What about RAG?

Retrieval augmented generation (RAG) queries with named vectors work the same way as other vector searches with named vectors. You must provide a target_vector parameter to specify the named vector for the vector search component of the RAG query.

This, in turn, can improve the quality of the generation. Let's explore a few examples in the next section.
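As a brief preview, a RAG query with a target vector might look like the sketch below. The wrapper name `rag_with_target_vector` is hypothetical. It assumes the v4 Python client's `generate.near_text` method with a `grouped_task` prompt, a response exposing `generated`, and a collection that has a generative module configured; none of these are shown in this section, so treat them as assumptions to check against your client version.

```python
def rag_with_target_vector(collection, query, target_vector, grouped_task, limit=5):
    """Retrieve objects via one named vector, then generate text from them.

    `collection` is assumed to behave like the `movies` collection above and
    to have a generative module configured.
    """
    response = collection.generate.near_text(
        query=query,
        target_vector=target_vector,  # Named vector used for the retrieval step
        limit=limit,
        grouped_task=grouped_task,  # Prompt applied to the retrieved objects as a group
    )
    return response.generated


# Example usage (assumes `movies` is the collection from the code above):
# print(
#     rag_with_target_vector(
#         movies,
#         query="A joyful holiday film",
#         target_vector="overview",
#         grouped_task="What do these movies have in common?",
#     )
# )
```

Because the retrieval step decides which objects reach the prompt, changing the target vector changes the context the model sees.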
