Create a collection
Weaviate stores data in "collections". A collection is a set of objects that share the same data structure. In our movie database, we might have a collection of movies, a collection of actors, and a collection of reviews.
Here we will create a collection of movies.
Code
This example creates a collection for the movie data:
import weaviate
import weaviate.classes.config as wc

# Instantiate your client (not shown). e.g.:
# headers = {"X-OpenAI-Api-Key": os.getenv("OPENAI_APIKEY")}  # Replace with your OpenAI API key
# client = weaviate.connect_to_local(headers=headers)

client.collections.create(
    name="MovieMM",  # The name of the collection ('MM' for multimodal)
    properties=[
        wc.Property(name="title", data_type=wc.DataType.TEXT),
        wc.Property(name="overview", data_type=wc.DataType.TEXT),
        wc.Property(name="vote_average", data_type=wc.DataType.NUMBER),
        wc.Property(name="genre_ids", data_type=wc.DataType.INT_ARRAY),
        wc.Property(name="release_date", data_type=wc.DataType.DATE),
        wc.Property(name="tmdb_id", data_type=wc.DataType.INT),
        wc.Property(name="poster", data_type=wc.DataType.BLOB),
    ],
    # Define & configure the vectorizer module
    vectorizer_config=wc.Configure.Vectorizer.multi2vec_clip(
        image_fields=[wc.Multi2VecField(name="poster", weight=0.9)],  # 90% of the vector is from the poster
        text_fields=[wc.Multi2VecField(name="title", weight=0.1)],  # 10% of the vector is from the title
    ),
    # Define the generative module
    generative_config=wc.Configure.Generative.openai()
)

client.close()
Each collection definition must have a name. All other parameters are optional; this example also defines the properties, the vectorizer and the generative module.
Explain the code
Properties
Properties are the object attributes that you want to store in the collection. Each property has a name and a data type.
In our movie database, we have properties like title, release_date and genre_ids, with data types like TEXT (string), DATE (date), or INT (integer). It's also possible to have arrays of integers, like we have with genre_ids.
Because each movie is a multimodal object, it also has the poster property, which holds the image data and is stored as the BLOB (binary large object) data type.
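To make the property definitions more concrete, here is a minimal sketch of what one object matching this schema might look like before import. It assumes that BLOB values are supplied as base64-encoded strings and that DATE values are given as timezone-aware datetimes. The helper names (encode_poster, build_movie_object), the sample values and the placeholder bytes are hypothetical and are not part of the Weaviate client.

```python
import base64
from datetime import datetime, timezone


def encode_poster(image_bytes: bytes) -> str:
    """Return raw image bytes as a base64 string (assumed format for a BLOB property)."""
    return base64.b64encode(image_bytes).decode("utf-8")


def build_movie_object(title, overview, vote_average, genre_ids,
                       release_date, tmdb_id, poster_bytes) -> dict:
    """Assemble a dict whose keys match the property names of the MovieMM collection."""
    return {
        "title": title,                          # TEXT
        "overview": overview,                    # TEXT
        "vote_average": float(vote_average),     # NUMBER
        "genre_ids": [int(g) for g in genre_ids],  # INT_ARRAY
        "release_date": release_date,            # DATE (timezone-aware datetime)
        "tmdb_id": int(tmdb_id),                 # INT
        "poster": encode_poster(poster_bytes),   # BLOB (base64 string)
    }


# Hypothetical sample values; the bytes below are a placeholder, not a real image.
placeholder_poster = b"placeholder image bytes"
movie = build_movie_object(
    title="Example Movie",
    overview="A made-up overview used only for illustration.",
    vote_average=7.5,
    genre_ids=[12, 18],
    release_date=datetime(2020, 1, 31, tzinfo=timezone.utc),
    tmdb_id=1,
    poster_bytes=placeholder_poster,
)
```

In a real import, the poster bytes would typically be read from an image file, and the resulting dict would be passed to the client as the object's properties.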
Auto-schema
Weaviate can automatically infer the schema from the data. However, it is good practice to define the properties explicitly, for better control and to avoid surprises.
Vectorizer configuration
If you do not provide a vector yourself, Weaviate uses the vectorizer configured for the collection to generate vector embeddings from your data.
In this code example, we specify the multi2vec-clip module. This module uses the CLIP model to generate vector embeddings from the text and image data.
You can specify any number of text and image properties to be used for vectorization, and weight them differently. The weights are used to determine the relative importance of each property in the vector embedding generation process. In this example, we vectorize the poster property (an image) with a 90% weight and the title property (a string) with a 10% weight.
    vectorizer_config=wc.Configure.Vectorizer.multi2vec_clip(
        image_fields=[wc.Multi2VecField(name="poster", weight=0.9)],  # 90% of the vector is from the poster
        text_fields=[wc.Multi2VecField(name="title", weight=0.1)],  # 10% of the vector is from the title
    ),
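To build intuition for what the weights do, the following sketch combines two per-property vectors into one object vector. It assumes the combination is a weighted average with weights normalized to sum to 1; this tutorial does not show the module's internal logic, so treat this as an illustration rather than the module's implementation. The function name combine_vectors and the tiny three-dimensional vectors are made up for the example.

```python
def combine_vectors(vectors, weights):
    """Return the weighted average of equal-length vectors (illustrative assumption only)."""
    if not vectors or len(vectors) != len(weights):
        raise ValueError("Provide one weight per vector.")
    total = sum(weights)
    if total <= 0:
        raise ValueError("Weights must sum to a positive number.")

    dim = len(vectors[0])
    combined = [0.0] * dim
    for vector, weight in zip(vectors, weights):
        if len(vector) != dim:
            raise ValueError("All vectors must have the same length.")
        share = weight / total  # normalize so the shares sum to 1
        for i, value in enumerate(vector):
            combined[i] += value * share
    return combined


# Made-up stand-ins for the embeddings of the poster and the title.
poster_vector = [1.0, 0.0, 0.0]
title_vector = [0.0, 1.0, 0.0]

combined = combine_vectors([poster_vector, title_vector], [0.9, 0.1])
```

With these weights, the combined vector sits much closer to the poster vector than to the title vector, which is the practical effect of giving the poster a 90% weight.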
Generative configuration
If you wish to use your collection with a generative model (e.g. a large language model), you must specify the generative module.
In this code example, we specify the openai module (generative-openai is the full name) with default options.
    generative_config=wc.Configure.Generative.openai()
A collection's generative model integration configuration is mutable from v1.25.23, v1.26.8 and v1.27.1. See this section for details on how to update the collection configuration.
Python classes
The code example makes use of classes such as Property, DataType and Configure. They are defined in the weaviate.classes.config submodule and are used to define the collection.
For convenience, we import the submodule as wc and use classes from it.
import weaviate.classes.config as wc