Create a collection

To use named vectors, your collection must be configured with named vector definitions.

Code

This example creates a collection for the movie data, including multiple named vector definitions:

import weaviate
import weaviate.classes.config as wc

# Instantiate your client (not shown). e.g.:
# headers = {"X-OpenAI-Api-Key": os.getenv("OPENAI_APIKEY")}  # Replace with your OpenAI API key
# client = weaviate.connect_to_local(headers=headers)

client.collections.create(
    name="MovieNVDemo",  # The name of the collection ('NV' for named vectors)
    properties=[
        wc.Property(name="title", data_type=wc.DataType.TEXT),
        wc.Property(name="overview", data_type=wc.DataType.TEXT),
        wc.Property(name="vote_average", data_type=wc.DataType.NUMBER),
        wc.Property(name="genre_ids", data_type=wc.DataType.INT_ARRAY),
        wc.Property(name="release_date", data_type=wc.DataType.DATE),
        wc.Property(name="tmdb_id", data_type=wc.DataType.INT),
        wc.Property(name="poster", data_type=wc.DataType.BLOB),
    ],
    # Define & configure the vectorizer module
    vectorizer_config=[
        # Vectorize the movie title
        wc.Configure.NamedVectors.text2vec_openai(
            name="title", source_properties=["title"]
        ),
        # Vectorize the movie overview (summary)
        wc.Configure.NamedVectors.text2vec_openai(
            name="overview", source_properties=["overview"]
        ),
        # Vectorize the movie poster & title
        wc.Configure.NamedVectors.multi2vec_clip(
            name="poster_title",
            image_fields=[
                wc.Multi2VecField(name="poster", weight=0.9)
            ],  # 90% of the vector is from the poster
            text_fields=[
                wc.Multi2VecField(name="title", weight=0.1)
            ],  # 10% of the vector is from the title
        ),
    ],
    # Define the generative module
    generative_config=wc.Configure.Generative.openai(),
)

client.close()

Explain the code

The key difference here is the use of the NamedVectors class to define the vectorizer configurations. Let's review the code in further detail:

Revision

This code builds on the multimodal example. Review that example for further explanations.

Named vector configuration

This definition allows each object to be represented by three vectors, named title, overview and poster_title.

# Vectorize the movie title
wc.Configure.NamedVectors.text2vec_openai(
    name="title", source_properties=["title"]
),
# Vectorize the movie overview (summary)
wc.Configure.NamedVectors.text2vec_openai(
    name="overview", source_properties=["overview"]
),
# Vectorize the movie poster & title
wc.Configure.NamedVectors.multi2vec_clip(
    name="poster_title",
    image_fields=[
        wc.Multi2VecField(name="poster", weight=0.9)
    ],  # 90% of the vector is from the poster
    text_fields=[
        wc.Multi2VecField(name="title", weight=0.1)
    ],  # 10% of the vector is from the title
),

title

This vector representation is generated from the title property, as specified in source_properties. The text2vec-openai module is used for vectorization.

You could use this to search for movies by how similar their titles are to a query.
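As a rough illustration of such a search, the sketch below wraps a near_text query that targets a single named vector. The helper name search_by_named_vector and the query text are hypothetical. The sketch assumes the v4 Python client, where near_text accepts a target_vector argument, and a collection that already contains imported data.

```python
def search_by_named_vector(collection, query, target_vector, limit=5):
    """Run a near_text search against one named vector and return movie titles.

    `collection` is assumed to be a Weaviate v4 collection object, e.g. the
    result of client.collections.get("MovieNVDemo").
    """
    response = collection.query.near_text(
        query=query,
        target_vector=target_vector,  # e.g. "title", "overview" or "poster_title"
        limit=limit,
    )
    # Each returned object exposes its stored properties as a dict
    return [obj.properties["title"] for obj in response.objects]


# Usage (needs a running Weaviate instance with movie data imported, not shown):
# movies = client.collections.get("MovieNVDemo")
# print(search_by_named_vector(movies, "dinosaur park", target_vector="title"))
```

Passing "overview" as the target vector instead would run the same query against the vector built from the movie summaries.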

overview

This vector representation is generated from the overview property. As such, you could use this to search for movies by how similar their plot or key ideas are to a query.

poster_title

This vector representation is generated from a combination of the title and poster properties. The multi2vec-clip module is used for vectorization.

Note that the majority of the vector weight is given to the poster property (90%), and the rest to the title property (10%). This means that the vector representation will be more influenced by the poster than the title.
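To get a feel for what a 90/10 weighting does, here is a toy sketch that combines two small vectors as a weighted average and compares the result with each input. The two-dimensional vectors are made-up stand-ins for real embeddings, and whether the multi2vec-clip module combines its embeddings in exactly this way is an assumption. The sketch only illustrates the general effect of the weights.

```python
import math


def weighted_combination(vectors, weights):
    """Combine equal-length vectors as a weighted average, then L2-normalize."""
    total = sum(weights)
    dims = len(vectors[0])
    combined = [
        sum(w * vec[i] for vec, w in zip(vectors, weights)) / total
        for i in range(dims)
    ]
    norm = math.sqrt(sum(x * x for x in combined))
    return [x / norm for x in combined]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


poster_vec = [1.0, 0.0]  # toy stand-in for a poster embedding
title_vec = [0.0, 1.0]   # toy stand-in for a title embedding

combined = weighted_combination([poster_vec, title_vec], [0.9, 0.1])

# The combined vector sits much closer to the poster vector than to the title vector
print(cosine(combined, poster_vec) > cosine(combined, title_vec))  # prints True
```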

Because this uses a multimodal vectorizer, you could search this vector with either an image or a text query to find movies whose poster or title is similar to it.
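An image query against this vector might look like the sketch below. The helper name search_posters_by_image and the image file name are hypothetical. The sketch assumes the v4 Python client, where near_image accepts a base64-encoded image and a target_vector argument, and a collection that already contains imported data.

```python
import base64


def search_posters_by_image(collection, image_bytes, limit=5):
    """Find movies whose poster_title vector is closest to the given image.

    `collection` is assumed to be a Weaviate v4 collection object, e.g. the
    result of client.collections.get("MovieNVDemo").
    """
    # Encode the raw image bytes as a base64 string for the query
    image_b64 = base64.b64encode(image_bytes).decode("utf-8")
    response = collection.query.near_image(
        near_image=image_b64,
        target_vector="poster_title",  # the multimodal named vector
        limit=limit,
    )
    return [obj.properties["title"] for obj in response.objects]


# Usage (needs a running Weaviate instance with movie data imported, not shown):
# movies = client.collections.get("MovieNVDemo")
# with open("some_poster.jpg", "rb") as f:  # hypothetical local image file
#     print(search_posters_by_image(movies, f.read()))
```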
