Manage collections

Every object in Weaviate belongs to exactly one collection. Use the examples on this page to manage your collections.

Terminology

Newer Weaviate documentation discusses "collections." Older Weaviate documentation refers to "classes" instead. Expect to see both terms throughout the documentation.

Create a collection

To create a collection, specify at least the collection name. If you don't specify any properties, auto-schema creates them.

    client.collections.create("Article")
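
Creating a collection under a name that is already in use typically fails, so setup scripts often check for the collection first. The following is a minimal sketch, assuming a connected v4 Python client that exposes `client.collections.exists`; the helper name `ensure_collection` is hypothetical.

```python
def ensure_collection(client, name):
    """Return the named collection, creating it only if it is missing."""
    if client.collections.exists(name):
        # Reuse the existing collection instead of failing on a duplicate name
        return client.collections.get(name)
    return client.collections.create(name)
```

Call it as `articles = ensure_collection(client, "Article")` wherever the page's examples call `client.collections.create` directly.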

Create a collection and define properties

Properties are the data fields in your collection. Each property has a name and a data type.

Additional information

Use properties to configure additional parameters such as data type, index characteristics, or tokenization.

For details, see:

    from weaviate.classes.config import Property, DataType

    client.collections.create(
        "Article",
        properties=[
            Property(name="title", data_type=DataType.TEXT),
            Property(name="body", data_type=DataType.TEXT),
        ]
    )

Disable auto-schema

By default, Weaviate creates missing collections and missing properties. When you configure collections manually, you have more precise control of the collection settings.

To disable auto-schema, set AUTOSCHEMA_ENABLED: 'false' in your system configuration file.

Specify a vectorizer

Specify a vectorizer for a collection.

Additional information

Collection level settings override default values and general configuration parameters such as environment variables.

    import weaviate.classes as wvc

    client.collections.create(
        "Article",
        vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_openai(),
        properties=[  # properties configuration is optional
            wvc.config.Property(name="title", data_type=wvc.config.DataType.TEXT),
            wvc.config.Property(name="body", data_type=wvc.config.DataType.TEXT),
        ]
    )

Define multiple (named) vectors

Added in v1.24

You can define multiple named vectors per collection. This allows each object to be represented by multiple vectors, such as a text vector and an image vector, or a title vector and a body vector.

    import weaviate.classes.config as wc

    client.collections.create(
        "ArticleNV",
        properties=[  # Define properties
            wc.Property(name="title", data_type=wc.DataType.TEXT),
            wc.Property(name="body", data_type=wc.DataType.TEXT),
        ],
        vectorizer_config=[
            # Set a named vector
            wc.Configure.NamedVectors.text2vec_cohere(  # Use the "text2vec-cohere" vectorizer
                name="title", source_properties=["title"]  # Set the source property(ies)
            ),
            # Set another named vector
            wc.Configure.NamedVectors.text2vec_openai(  # Use the "text2vec-openai" vectorizer
                name="body", source_properties=["body"]  # Set the source property(ies)
            )
        ],
    )
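
With more than one named vector, a search has to say which vector to compare against. The following is a minimal sketch, assuming the v4 client's `query.near_text` method and its `target_vector` argument; the helper name `search_by_named_vector` is hypothetical.

```python
def search_by_named_vector(collection, query, vector_name, limit=3):
    """Run a near-text search against one named vector and return the matching properties."""
    response = collection.query.near_text(
        query=query,
        target_vector=vector_name,  # e.g. "title" or "body" from the ArticleNV definition
        limit=limit,
    )
    return [obj.properties for obj in response.objects]
```

For example, `search_by_named_vector(client.collections.get("ArticleNV"), "renewable energy", "title")` would compare the query only against the title vectors.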

Specify vectorizer settings

To configure how a vectorizer works with a specific collection (for example, which model to use), set the vectorizer parameters.

    import weaviate.classes as wvc

    client.collections.create(
        "Article",
        vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_cohere(
            model="embed-multilingual-v2.0",
            vectorize_collection_name=True
        ),
    )

Set vector index type

Set the vector index type for each collection to either hnsw or flat. Compression settings are also available: pq for hnsw indexes and bq for flat indexes.

    import weaviate.classes as wvc

    client.collections.create(
        "Article",
        vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_openai(),
        vector_index_config=wvc.config.Configure.VectorIndex.hnsw(),
        properties=[
            wvc.config.Property(name="title", data_type=wvc.config.DataType.TEXT),
            wvc.config.Property(name="body", data_type=wvc.config.DataType.TEXT),
        ]
    )

Property-level settings

Configure individual properties to control whether the property name is vectorized, whether the property is included in vectorization, and which tokenization type is used.

    import weaviate.classes as wvc

    client.collections.create(
        "Article",
        vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_huggingface(),
        properties=[
            wvc.config.Property(
                name="title",
                data_type=wvc.config.DataType.TEXT,
                vectorize_property_name=True,  # Use "title" as part of the value to vectorize
                tokenization=wvc.config.Tokenization.LOWERCASE  # Use "lowercase" tokenization
            ),
            wvc.config.Property(
                name="body",
                data_type=wvc.config.DataType.TEXT,
                skip_vectorization=True,  # Don't vectorize this property
                tokenization=wvc.config.Tokenization.WHITESPACE  # Use "whitespace" tokenization
            ),
        ]
    )

Specify a distance metric

If you choose to bring your own vectors, you should specify the distance metric.

    import weaviate.classes as wvc

    client.collections.create(
        "Article",
        vector_index_config=wvc.config.Configure.VectorIndex.hnsw(
            distance_metric=wvc.config.VectorDistances.COSINE
        ),
    )
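
To make the metric concrete, here is a small standalone sketch of cosine distance, computed as 1 minus cosine similarity, so 0 means the vectors point the same way and 2 means they point in opposite directions. The function is illustrative only and is not part of the client library.

```python
import math

def cosine_distance(a, b):
    """Cosine distance between two equal-length vectors: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

print(cosine_distance([1.0, 0.0], [1.0, 0.0]))   # 0.0, same direction
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))   # 1.0, orthogonal
print(cosine_distance([1.0, 0.0], [-1.0, 0.0]))  # 2.0, opposite direction
```

If your own vectors were produced for a different metric, such as dot product, configure the collection with that metric instead.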

Additional information

For details on the configuration parameters, see the following:

Specify a generative module

Specify a generative module for a collection (for RAG).

    import weaviate.classes as wvc

    client.collections.create(
        "Article",
        vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_openai(),
        generative_config=wvc.config.Configure.Generative.openai(),
        properties=[  # properties configuration is optional
            wvc.config.Property(name="title", data_type=wvc.config.DataType.TEXT),
            wvc.config.Property(name="body", data_type=wvc.config.DataType.TEXT),
        ]
    )

Replication settings

Configure replication per collection.

    import weaviate.classes as wvc

    client.collections.create(
        "Article",
        replication_config=wvc.config.Configure.replication(
            factor=3
        )
    )

Additional information

To test replication factors greater than one, use a multi-node deployment.

For details on the configuration parameters, see the following:

Sharding settings

Configure sharding per collection.

    import weaviate.classes as wvc

    client.collections.create(
        "Article",
        sharding_config=wvc.config.Configure.sharding(
            virtual_per_physical=128,
            desired_count=1,
            actual_count=1,
            desired_virtual_count=128,
            actual_virtual_count=128,
        )
    )
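
The virtual shard counts in this example are consistent with the physical shard count: one physical shard with 128 virtual shards per physical shard gives 128 virtual shards. Below is a small sketch of that arithmetic, assuming the virtual count is the product of the two values; the function name is hypothetical.

```python
def expected_virtual_count(desired_count, virtual_per_physical):
    """Virtual shard count implied by the physical shard count (assumed to be a simple product)."""
    return desired_count * virtual_per_physical

print(expected_virtual_count(1, 128))  # 128, matches the example above
print(expected_virtual_count(3, 128))  # 384
```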

Additional information

For details on the configuration parameters, see the following:

Multi-tenancy

Added in v1.20

Create a collection with multi-tenancy enabled.

    import weaviate.classes as wvc

    client.collections.create(
        "Article",
        multi_tenancy_config=wvc.config.Configure.multi_tenancy(enabled=True)
    )

Read a single collection definition

Retrieve a collection definition from the schema.

    articles = client.collections.get("Article")
    articles_config = articles.config.get()

    print(articles_config)
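
Printing the whole definition can be noisy when you only need to check the properties. The following sketch assumes the object returned by `config.get()` exposes a `properties` list whose items have `name` and `data_type` attributes, as in the v4 Python client; the helper name `describe_properties` is hypothetical.

```python
def describe_properties(collection):
    """Map each property name to its data type, read from the collection definition."""
    config = collection.config.get()
    return {prop.name: prop.data_type for prop in config.properties}
```

For the Article collection defined earlier, `describe_properties(articles)` would be expected to list the title and body properties with their text data types.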

Sample configuration: Text objects

This configuration for text objects defines the following:

  • The collection name (Article)
  • The vectorizer module (text2vec-cohere) and model (embed-multilingual-v2.0)
  • A set of properties (title, body) with text data types.

    {
      "class": "Article",
      "vectorizer": "text2vec-cohere",
      "moduleConfig": {
        "text2vec-cohere": {
          "model": "embed-multilingual-v2.0"
        }
      },
      "properties": [
        {
          "name": "title",
          "dataType": ["text"]
        },
        {
          "name": "body",
          "dataType": ["text"]
        }
      ]
    }
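
A definition like this is easy to inspect programmatically. The sketch below parses a strict-JSON copy of the text-object definition with the standard library and pulls out the three items listed above. It does not need a running Weaviate instance.

```python
import json

definition_json = """
{
  "class": "Article",
  "vectorizer": "text2vec-cohere",
  "moduleConfig": {"text2vec-cohere": {"model": "embed-multilingual-v2.0"}},
  "properties": [
    {"name": "title", "dataType": ["text"]},
    {"name": "body", "dataType": ["text"]}
  ]
}
"""

definition = json.loads(definition_json)
vectorizer = definition["vectorizer"]
# The model is stored under the module config entry named after the vectorizer
model = definition["moduleConfig"][vectorizer]["model"]
property_types = {p["name"]: p["dataType"][0] for p in definition["properties"]}

print(definition["class"], vectorizer, model)
print(property_types)
```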

Sample configuration: Nested objects

Added in v1.22

This configuration for nested objects defines the following:

  • The collection name (Person)
  • The vectorizer module (text2vec-huggingface)
  • A set of properties (last_name, address)
    • last_name has text data type
    • address has object data type
  • The address property has two nested properties (street and city)

    {
      "class": "Person",
      "vectorizer": "text2vec-huggingface",
      "properties": [
        {
          "dataType": ["text"],
          "name": "last_name"
        },
        {
          "dataType": ["object"],
          "name": "address",
          "nestedProperties": [
            {"dataType": ["text"], "name": "street"},
            {"dataType": ["text"], "name": "city"}
          ]
        }
      ]
    }
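
To see how the nesting is laid out, the following sketch walks the properties of the Person definition and prints one line per property, descending into nestedProperties. The dotted path notation (address.street) is only an illustrative convention chosen for this example.

```python
def flatten_properties(properties, prefix=""):
    """Yield (dotted_path, data_type) pairs, descending into nestedProperties."""
    for prop in properties:
        path = prefix + prop["name"]
        yield path, prop["dataType"][0]
        # Object properties carry their children under "nestedProperties"
        yield from flatten_properties(prop.get("nestedProperties", []), path + ".")

person_properties = [
    {"dataType": ["text"], "name": "last_name"},
    {
        "dataType": ["object"],
        "name": "address",
        "nestedProperties": [
            {"dataType": ["text"], "name": "street"},
            {"dataType": ["text"], "name": "city"},
        ],
    },
]

for path, data_type in flatten_properties(person_properties):
    print(path, data_type)
```

This prints last_name, address, address.street, and address.city with their data types, in that order.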

Sample configuration: Generative search

This configuration for generative search defines the following:

  • The collection name (Article)
  • The default vectorizer module (text2vec-openai)
  • The generative module (generative-openai)
  • A set of properties (title, chunk, chunk_no and url)
  • The tokenization option for the url property
  • The vectorization option (skip vectorization) for the url property

    {
      "class": "Article",
      "vectorizer": "text2vec-openai",
      "vectorIndexConfig": {
        "distance": "cosine"
      },
      "moduleConfig": {
        "generative-openai": {}
      },
      "properties": [
        {
          "name": "title",
          "dataType": ["text"]
        },
        {
          "name": "chunk",
          "dataType": ["text"]
        },
        {
          "name": "chunk_no",
          "dataType": ["int"]
        },
        {
          "name": "url",
          "dataType": ["text"],
          "tokenization": "field",
          "moduleConfig": {
            "text2vec-openai": {
              "skip": true
            }
          }
        }
      ]
    }
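
The skip setting is nested under the property's own module configuration, keyed by the vectorizer name, which makes it easy to overlook. As a rough sketch, a definition in this shape can be scanned for properties that are excluded from vectorization; the helper name `skipped_properties` is hypothetical.

```python
def skipped_properties(definition):
    """Names of properties whose module config sets "skip" for the collection's vectorizer."""
    vectorizer = definition["vectorizer"]
    skipped = []
    for prop in definition.get("properties", []):
        module_config = prop.get("moduleConfig", {}).get(vectorizer, {})
        if module_config.get("skip", False):
            skipped.append(prop["name"])
    return skipped

# Abbreviated copy of the generative search definition, written as a Python dict
article = {
    "class": "Article",
    "vectorizer": "text2vec-openai",
    "properties": [
        {"name": "title", "dataType": ["text"]},
        {"name": "url", "dataType": ["text"], "tokenization": "field",
         "moduleConfig": {"text2vec-openai": {"skip": True}}},
    ],
}

print(skipped_properties(article))  # ['url']
```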

Sample configuration: Images

This configuration for image search defines the following:

  • The collection name (Image)
  • The vectorizer module (img2vec-neural)
    • The image property configures the collection to store image data.
  • The vector index distance metric (cosine)
  • A set of properties (image), with the image property set as blob.

For image searches, see Image search.

    {
      "class": "Image",
      "vectorizer": "img2vec-neural",
      "vectorIndexConfig": {
        "distance": "cosine"
      },
      "moduleConfig": {
        "img2vec-neural": {
          "imageFields": [
            "image"
          ]
        }
      },
      "properties": [
        {
          "name": "image",
          "dataType": ["blob"]
        }
      ]
    }
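
Blob properties hold base64-encoded data, so image bytes need to be encoded before they are stored in the image property. Here is a minimal standard-library sketch; the helper name `to_blob` is hypothetical, and the four bytes used are only the start of a PNG header, not a full image.

```python
import base64

def to_blob(image_bytes):
    """Encode raw image bytes as a base64 string suitable for a blob property."""
    return base64.b64encode(image_bytes).decode("utf-8")

print(to_blob(b"\x89PNG"))  # iVBORw==
```

In practice you would read the bytes from a file, for example with `open(path, "rb").read()`, and pass the encoded string as the value of the image property.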

Read all collection definitions

Fetch the database schema to retrieve all of the collection definitions.

    response = client.collections.list_all()

    print(response)
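
When you only need the collection names, you can read them from the response. This sketch assumes `list_all()` returns a dictionary keyed by collection name, as in the v4 Python client; the helper name `collection_names` is hypothetical.

```python
def collection_names(client):
    """Return the names of all collections, sorted alphabetically."""
    # list_all() is assumed to return a dict keyed by collection name
    return sorted(client.collections.list_all().keys())
```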

Update a collection definition

Some definitions cannot be modified after you create your collection.

    import weaviate.classes as wvc

    articles = client.collections.get("Article")

    # Update the collection definition
    articles.config.update(
        inverted_index_config=wvc.config.Reconfigure.inverted_index(
            bm25_k1=1.5
        )
    )
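
After an update, it can be worth reading the definition back to confirm that the change took effect. The sketch below assumes the v4 client's configuration object exposes the value at `inverted_index_config.bm25.k1`; treat that attribute path and the helper name `current_bm25_k1` as assumptions to verify against your client version.

```python
def current_bm25_k1(collection):
    """Read the BM25 k1 value back from the collection definition."""
    config = collection.config.get()
    # Attribute path is an assumption based on the v4 client's config objects
    return config.inverted_index_config.bm25.k1
```

After the update above, `current_bm25_k1(articles)` would be expected to return 1.5.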

Update a parameter

Some parameters cannot be modified after you create your collection.

    import weaviate.classes as wvc

    # Get the Article collection object
    articles = client.collections.get("Article")

    # Update the collection configuration
    articles.config.update(
        # Note, use Reconfigure here (not Configure)
        inverted_index_config=wvc.config.Reconfigure.inverted_index(
            stopwords_removals=["a", "the"]
        )
    )

Delete a collection

You can delete any unwanted collection(s), along with the data that they contain.

Deleting a collection also deletes its objects

When you delete a collection, you delete all associated objects!

Be very careful with deletes on a production database and anywhere else that you have important data.

This code deletes a collection and its objects.

    # Delete collection "Article" - THIS WILL DELETE THE COLLECTION AND ALL ITS DATA
    client.collections.delete("Article")  # Replace with your collection name
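
Because a delete cannot be undone, one possible safeguard is a wrapper that refuses to run unless the caller repeats the collection name. This is a sketch, assuming the client's `collections.exists` and `collections.delete` methods; the helper `delete_collection` and its `confirm` parameter are hypothetical.

```python
def delete_collection(client, name, confirm):
    """Delete a collection only when the caller repeats its name as confirmation."""
    if confirm != name:
        raise ValueError("confirmation does not match the collection name")
    if not client.collections.exists(name):
        return False  # nothing to delete
    client.collections.delete(name)
    return True
```

A call such as `delete_collection(client, "Article", confirm="Article")` deletes the collection, while a mismatched confirmation raises an error before anything is removed.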