Create a collection
To use named vectors, your collection must be configured with named vector definitions.
Code
This example creates a collection for the movie data, including multiple named vector definitions:
```python
import os

import weaviate
import weaviate.classes.config as wc

# Instantiate your client. This assumes a local Weaviate instance with the
# text2vec-openai, multi2vec-clip and generative-openai modules enabled.
headers = {"X-OpenAI-Api-Key": os.getenv("OPENAI_APIKEY")}  # Reads your OpenAI API key from the environment
client = weaviate.connect_to_local(headers=headers)

client.collections.create(
    name="MovieNVDemo",  # The name of the collection ('NV' for named vectors)
    properties=[
        wc.Property(name="title", data_type=wc.DataType.TEXT),
        wc.Property(name="overview", data_type=wc.DataType.TEXT),
        wc.Property(name="vote_average", data_type=wc.DataType.NUMBER),
        wc.Property(name="genre_ids", data_type=wc.DataType.INT_ARRAY),
        wc.Property(name="release_date", data_type=wc.DataType.DATE),
        wc.Property(name="tmdb_id", data_type=wc.DataType.INT),
        wc.Property(name="poster", data_type=wc.DataType.BLOB),
    ],
    # Define & configure the vectorizer module
    vectorizer_config=[
        # Vectorize the movie title
        wc.Configure.NamedVectors.text2vec_openai(
            name="title", source_properties=["title"]
        ),
        # Vectorize the movie overview (summary)
        wc.Configure.NamedVectors.text2vec_openai(
            name="overview", source_properties=["overview"]
        ),
        # Vectorize the movie poster & title
        wc.Configure.NamedVectors.multi2vec_clip(
            name="poster_title",
            image_fields=[
                wc.Multi2VecField(name="poster", weight=0.9)
            ],  # 90% of the vector is from the poster
            text_fields=[
                wc.Multi2VecField(name="title", weight=0.1)
            ],  # 10% of the vector is from the title
        ),
    ],
    # Define the generative module
    generative_config=wc.Configure.Generative.openai(),
)

client.close()
```
Explain the code
The key difference here is the use of the `NamedVectors` class to define the vectorizer configurations. Let's review the code in further detail.
This code builds on the multimodal example, so review that example for further explanations.
Named vector configuration
This definition allows each object to be represented by three vectors, named `title`, `overview` and `poster_title`.
```python
# Vectorize the movie title
wc.Configure.NamedVectors.text2vec_openai(
    name="title", source_properties=["title"]
),
# Vectorize the movie overview (summary)
wc.Configure.NamedVectors.text2vec_openai(
    name="overview", source_properties=["overview"]
),
# Vectorize the movie poster & title
wc.Configure.NamedVectors.multi2vec_clip(
    name="poster_title",
    image_fields=[
        wc.Multi2VecField(name="poster", weight=0.9)
    ],  # 90% of the vector is from the poster
    text_fields=[
        wc.Multi2VecField(name="title", weight=0.1)
    ],  # 10% of the vector is from the title
),
```
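As a rough mental model, and not Weaviate's internal storage format, each object carries its properties plus a mapping from vector name to vector. Each named vector comes from its own vectorizer, so the vectors need not share a length. In practice, for example, an OpenAI text embedding and a CLIP embedding differ in dimensionality. The `MovieObject` class and the short vectors below are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MovieObject:
    """Toy stand-in for one object: properties plus named vectors (hypothetical)."""
    properties: dict
    vectors: dict = field(default_factory=dict)  # vector name -> list of floats


movie = MovieObject(
    properties={"title": "Example Movie", "overview": "A short plot summary."},
    vectors={
        # Hypothetical, truncated embeddings; real ones are much longer.
        "title": [0.1, 0.3, 0.5, 0.2],        # from a text vectorizer
        "overview": [0.4, 0.1, 0.0, 0.7],     # same text vectorizer, different source property
        "poster_title": [0.9, 0.2, 0.1],      # a different (multimodal) model can yield a different length
    },
)

# Each named vector is independent, so report its size separately.
for name, vec in movie.vectors.items():
    print(f"{name}: {len(vec)} dimensions")
```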
title
This vector representation is generated from the `title` property (set via `source_properties`), using the `text2vec-openai` module for vectorization.
You could use this to search for movies by the similarity of their titles to a query.
overview
This vector representation is generated from the `overview` property. As such, you could use it to search for movies by similarity to their plot or key ideas.
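Because each named vector is indexed separately, a query chooses which one to search against. With the v4 Python client this is done through the `target_vector` parameter, for example on `near_text`. The brute-force sketch below only illustrates the idea. The `search` helper and the 2-D vectors are hypothetical, and Weaviate itself uses an approximate nearest-neighbor index (HNSW by default) per named vector rather than a linear scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(objects, query_vector, target_vector, limit=2):
    """Rank objects by how similar ONE named vector is to the query (brute force)."""
    scored = [
        (cosine_similarity(query_vector, obj["vectors"][target_vector]), obj["title"])
        for obj in objects
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [title for _, title in scored[:limit]]


# Hypothetical 2-D vectors chosen by hand so the two targets disagree.
movies = [
    {"title": "Movie A", "vectors": {"title": [1.0, 0.0], "overview": [0.0, 1.0]}},
    {"title": "Movie B", "vectors": {"title": [0.0, 1.0], "overview": [1.0, 0.0]}},
]

query = [0.9, 0.1]
print(search(movies, query, target_vector="title", limit=1))     # ['Movie A']
print(search(movies, query, target_vector="overview", limit=1))  # ['Movie B']
```

The same query vector returns different top results depending on which named vector is targeted. This is why separate `title` and `overview` vectors are useful.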
poster_title
This vector representation is generated from a combination of the `title` and `poster` properties, using the `multi2vec-clip` module for vectorization.
Note that most of the vector weight is given to the `poster` property (90%), and the rest to the `title` property (10%). This means that the vector representation will be more influenced by the poster than by the title.
Because this uses a multimodal vectorizer, you could search it with either an image or a text query, finding movies by similarity to their combined poster and title representation.
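To see why the 90/10 weighting makes `poster_title` lean toward the poster, here is a sketch that assumes `multi2vec-clip` combines the per-field vectors as a weighted average. The exact method Weaviate uses, including any normalization, may differ. The `weighted_combine` helper and the 3-D vectors are hypothetical. Real CLIP vectors have hundreds of dimensions.

```python
import math


def weighted_combine(weighted_vectors):
    """Weighted average of equal-length vectors; weights are normalized to sum to 1."""
    total = sum(weight for _, weight in weighted_vectors)
    dim = len(weighted_vectors[0][0])
    combined = [0.0] * dim
    for vector, weight in weighted_vectors:
        for i, value in enumerate(vector):
            combined[i] += (weight / total) * value
    return combined


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Hypothetical vectors in a shared CLIP-style embedding space.
poster_vec = [1.0, 0.0, 0.0]
title_vec = [0.0, 1.0, 0.0]

# Mirror the collection config: poster weight 0.9, title weight 0.1.
poster_title = weighted_combine([(poster_vec, 0.9), (title_vec, 0.1)])

print(f"similarity to poster: {cosine_similarity(poster_title, poster_vec):.3f}")  # 0.994
print(f"similarity to title:  {cosine_similarity(poster_title, title_vec):.3f}")   # 0.110
```

Under this assumption, the combined vector sits much closer to the poster vector than to the title vector. That matches the intent of the weights in the configuration.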