Create a collection

Weaviate stores data in "collections". A collection is a set of objects that share the same data structure. In our movie database, we might have a collection of movies, a collection of actors, and a collection of reviews.

Here we will create a collection of movies.

Code

This example creates a collection for the movie data:

import weaviate
import weaviate.classes.config as wc
import os


# Instantiate your client (not shown). e.g.:
# headers = {"X-OpenAI-Api-Key": os.getenv("OPENAI_APIKEY")}  # Replace with your OpenAI API key
# client = weaviate.connect_to_wcs(..., headers=headers) or
# client = weaviate.connect_to_local(..., headers=headers)

client.collections.create(
    name="Movie",
    properties=[
        wc.Property(name="title", data_type=wc.DataType.TEXT),
        wc.Property(name="overview", data_type=wc.DataType.TEXT),
        wc.Property(name="vote_average", data_type=wc.DataType.NUMBER),
        wc.Property(name="genre_ids", data_type=wc.DataType.INT_ARRAY),
        wc.Property(name="release_date", data_type=wc.DataType.DATE),
        wc.Property(name="tmdb_id", data_type=wc.DataType.INT),
    ],
    # Define the vectorizer module
    vectorizer_config=wc.Configure.Vectorizer.text2vec_openai(),
    # Define the generative module
    generative_config=wc.Configure.Generative.openai(),
)

client.close()

Each collection definition must have a name. Beyond that, you can define additional parameters, such as the properties, vectorizer, and generative module set in this example.

Explain the code

Properties

Properties are the object attributes that you want to store in the collection. Each property has a name and a data type.

In our movie database, the properties include title with data type TEXT (string), release_date with data type DATE (date), and tmdb_id with data type INT (integer). Array types are also available: genre_ids is an INT_ARRAY, an array of integers.
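To make the data types concrete, here is a minimal sketch of what one object for this collection could look like in Python, with a plain client-side check of each value's type. The sample values, the `checks` mapping, and the `mismatched_properties` helper are hypothetical and written for illustration only; they are not part of the Weaviate client. The sketch assumes that a DATE value is supplied as a timezone-aware datetime, which serializes to an RFC 3339 style timestamp.

```python
from datetime import datetime, timezone

# Hypothetical sample object; all values are made up for illustration.
movie = {
    "title": "Example Movie",                                     # TEXT
    "overview": "A short plot summary.",                          # TEXT
    "vote_average": 7.9,                                          # NUMBER
    "genre_ids": [18, 35],                                        # INT_ARRAY
    "release_date": datetime(2020, 1, 15, tzinfo=timezone.utc),   # DATE
    "tmdb_id": 12345,                                             # INT
}


def is_int(value):
    """True for integers, excluding booleans (bool is a subclass of int)."""
    return isinstance(value, int) and not isinstance(value, bool)


# Illustrative mapping from property name to a type check (an assumption,
# not Weaviate's own validation logic).
checks = {
    "title": lambda v: isinstance(v, str),
    "overview": lambda v: isinstance(v, str),
    "vote_average": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "genre_ids": lambda v: isinstance(v, list) and all(is_int(x) for x in v),
    "release_date": lambda v: isinstance(v, datetime) and v.tzinfo is not None,
    "tmdb_id": is_int,
}


def mismatched_properties(obj):
    """Return the names of properties that are missing or have an unexpected type."""
    return [name for name, check in checks.items()
            if name not in obj or not check(obj[name])]


print(mismatched_properties(movie))        # prints []
print(movie["release_date"].isoformat())   # prints 2020-01-15T00:00:00+00:00
```

A check like this can run before import, so that a string such as "12345" in tmdb_id is caught on the client side rather than at insert time.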

Auto-schema

Weaviate can automatically infer the schema from the data. However, it is good practice to define the properties explicitly, both for better control and to avoid surprises such as a property being created with an unintended name or data type.
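One kind of surprise is a misspelled key: with auto-schema enabled, a key that is not in the schema is typically added as a new property rather than rejected. The sketch below shows a simple client-side guard against that, comparing an object's keys with the property names declared in the example. The `DECLARED` set and the `undeclared_keys` helper are hypothetical names for this illustration, not a Weaviate feature.

```python
# Property names declared in the collection definition in this example.
DECLARED = {"title", "overview", "vote_average", "genre_ids", "release_date", "tmdb_id"}


def undeclared_keys(obj):
    """Return the keys of obj that are not declared properties, sorted for stable output."""
    return sorted(set(obj) - DECLARED)


# Hypothetical input row; note the typo in "realease_date".
row = {"title": "Example Movie", "realease_date": "2020-01-15T00:00:00Z"}

print(undeclared_keys(row))  # prints ['realease_date']
```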

Vectorizer configuration

If you do not specify the vector yourself, Weaviate will use the vectorizer configured for the collection to generate vector embeddings from your data.

In this code example, we specify the text2vec-openai module with default options.

    vectorizer_config=wc.Configure.Vectorizer.text2vec_openai(),

Generative configuration

If you wish to use your collection with a generative model (e.g. a large language model), you must specify the generative module.

In this code example, we specify the openai module (generative-openai is the full name) with default options.

    generative_config=wc.Configure.Generative.openai()

Python classes

The code example makes use of classes such as Property, DataType and Configure. They are defined in the weaviate.classes.config submodule and are used to define the collection.

For convenience, we import the submodule as wc and use classes from it.

import weaviate.classes.config as wc
import os
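The os import is used only in the commented-out client setup, where the OpenAI API key is read from an environment variable and passed in the X-OpenAI-Api-Key header. Below is a minimal sketch of that step which fails early when the key is missing. The helper name `build_openai_headers` and its `environ` parameter are hypothetical, and the variable name OPENAI_APIKEY is taken from the comment in the example.

```python
import os


def build_openai_headers(environ=os.environ, env_var="OPENAI_APIKEY"):
    """Build the header dict carrying the OpenAI API key, or raise if the key is unset."""
    api_key = environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Environment variable {env_var} is not set")
    return {"X-OpenAI-Api-Key": api_key}


# Demonstration with a placeholder mapping instead of the real environment.
print(build_openai_headers({"OPENAI_APIKEY": "sk-placeholder"}))
# prints {'X-OpenAI-Api-Key': 'sk-placeholder'}
```

In real use you would call build_openai_headers() with no arguments and pass the result as the headers argument when connecting, as in the commented lines of the example.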

Questions and feedback

If you have any questions or feedback, please let us know on our forum.