
How to configure a collection schema

Overview

This page describes collection schemas in Weaviate.

New terminology

Weaviate client APIs are transitioning from the term "class" to "collection."

Older Weaviate documentation refers to "classes." Newer documentation uses "collections." Expect to see both terms during the transition period.

Auto-schema

We recommend that you define your schema manually to ensure that it aligns with your specific requirements. However, Weaviate also provides an auto-schema feature.

When a collection definition is missing, or when the schema is inadequate for data import, the auto-schema feature generates a schema. The automatically generated schema is based on the Weaviate system defaults and the properties of the imported objects. For more information, see Auto-schema.

Create a collection

A schema describes the data objects that make up a collection. To create a collection, follow the example below in your preferred language.

Minimal example

At a minimum, you must specify the collection name. In the raw collection definition, this is the class parameter.

client.collections.create("Article")

Property definition

You can use the properties field to specify properties for the collection. A collection definition can include any number of properties.

import weaviate.classes as wvc

client.collections.create(
    "Article",
    properties=[
        wvc.Property(name="title", data_type=wvc.DataType.TEXT),
        wvc.Property(name="body", data_type=wvc.DataType.TEXT),
    ]
)

In addition to the property name, each property definition can set parameters such as the data type and the inverted index tokenization.
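The same property parameters appear in the raw collection definition format used by the sample schema at the end of this page. As a minimal sketch, the helper below builds such a definition as a plain Python dictionary. The helper name `make_property` is hypothetical and is not part of any Weaviate client; the key names (`class`, `properties`, `name`, `dataType`, `tokenization`) are taken from that sample schema.

```python
import json


def make_property(name, data_type="text", tokenization="word"):
    """Build one property entry in the raw collection-definition format.

    Hypothetical helper for illustration only; key names follow the
    sample schema shown on this page.
    """
    return {
        "name": name,
        "dataType": [data_type],  # the raw format wraps the type in a list
        "tokenization": tokenization,
    }


article_definition = {
    "class": "Article",
    "properties": [make_property("title"), make_property("body")],
}

print(json.dumps(article_definition, indent=2))
```

Building the definition as data makes it easy to inspect or compare against the schema that Weaviate returns.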

Specify a vectorizer

You can set an optional vectorizer for each collection. A vectorizer specified for a collection overrides any default values in the general configuration, such as those set through environment variables.

The following code sets the text2vec-openai module as the vectorizer for the Article collection.

import weaviate.classes as wvc

client.collections.create(
    "Article",
    vectorizer_config=wvc.Configure.Vectorizer.text2vec_openai(),
    properties=[  # properties configuration is optional
        wvc.Property(name="title", data_type=wvc.DataType.TEXT),
        wvc.Property(name="body", data_type=wvc.DataType.TEXT),
    ]
)

Collection-level module settings

Configure the moduleConfig parameter at the collection level to set collection-wide module behavior. For example, you can configure the vectorizer to use a particular model (model), or to vectorize the collection name (vectorizeClassName).

import weaviate.classes as wvc

client.collections.create(
    "Article",
    vectorizer_config=wvc.Configure.Vectorizer.text2vec_cohere(
        model="embed-multilingual-v2.0",
        vectorize_class_name=True
    ),
)

The available parameters vary according to the module.

Property-level module settings

Configure the moduleConfig parameter at the property level to set module behavior for an individual property. For example, you can vectorize the property name (vectorizePropertyName), or ignore the property altogether (skip).

import weaviate.classes as wvc

client.collections.create(
    "Article",
    vectorizer_config=wvc.Configure.Vectorizer.text2vec_huggingface(),

    properties=[
        wvc.Property(
            name="title",
            data_type=wvc.DataType.TEXT,
            vectorize_property_name=True  # use "title" as part of the value to vectorize
        ),
        wvc.Property(
            name="body",
            data_type=wvc.DataType.TEXT,
            skip_vectorization=True  # don't vectorize body
        ),
    ]
)

The available parameters vary according to the module.
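In the raw collection definition, property-level settings sit under a moduleConfig entry keyed by the module name, as the sample schema at the end of this page shows for text2vec-openai. The sketch below mirrors the example above in that format. The helper `with_module_config` is a hypothetical name, and using `text2vec-huggingface` as the module key is an assumption based on the vectorizer chosen above.

```python
def with_module_config(prop, module, skip=False, vectorize_property_name=False):
    """Return a copy of a raw property entry with property-level module settings.

    Hypothetical helper for illustration; the "skip" and
    "vectorizePropertyName" keys follow the sample schema on this page.
    """
    updated = dict(prop)  # shallow copy so the input entry is left untouched
    updated["moduleConfig"] = {
        module: {
            "skip": skip,
            "vectorizePropertyName": vectorize_property_name,
        }
    }
    return updated


# Assumed module key, matching the vectorizer used in the example above.
MODULE = "text2vec-huggingface"

title = with_module_config(
    {"name": "title", "dataType": ["text"]},
    MODULE,
    vectorize_property_name=True,  # include "title" in the vectorized text
)
body = with_module_config(
    {"name": "body", "dataType": ["text"]},
    MODULE,
    skip=True,  # leave body out of vectorization
)

print(title["moduleConfig"])
print(body["moduleConfig"])
```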

Indexing, sharding and replication settings

You can also set indexing, sharding and replication settings through the schema. For example, you can set a vector index distance metric or a replication factor for a collection.

This code sets the vector index distance metric and the replication factor.

Note: You need a multi-node setup to test replication factors greater than 1.

import weaviate.classes as wvc

client.collections.create(
    "Article",
    vector_index_config=wvc.Configure.vector_index(
        distance_metric=wvc.VectorDistance.COSINE
    ),

    replication_config=wvc.Configure.replication(
        factor=3
    )
)

For details on the configuration parameters, see the configuration references for indexing, sharding and replication.

Multi-tenancy

Added in v1.20

To enable multi-tenancy, set multiTenancyConfig to {"enabled": true} in the collection definition.

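import weaviate.classes as wvc

client.collections.create(
    "Article",
    multi_tenancy_config=wvc.Configure.multi_tenancy(True)
)

The equivalent setting in a raw collection definition looks like this:

{
  "class": "MultiTenancyClass",
  "multiTenancyConfig": {"enabled": true}
}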

For more details on multi-tenancy operations, see Multi-tenancy operations.

Delete a collection

You can delete unwanted collections, along with the data that they contain.

Deleting a collection == Deleting its objects

Deleting a collection also deletes all of the objects it contains.

Do not do this in a production database, or anywhere else you want to keep your data.

Run the code below to delete the relevant collection and its objects.

if client.collections.exists("Article"):
    # delete collection "Article" - THIS WILL DELETE THE COLLECTION AND ALL ITS DATA
    client.collections.delete("Article")  # Replace with your collection name

Update a collection definition

Some parts of a collection definition are immutable, but you can modify other parts.

The following sections describe how to add a property to a collection and how to modify collection parameters.

Add a property

You can add a new property to an existing collection.

Add new properties to an existing schema one at a time. To add multiple properties, create a list of the new properties. Then, loop through the list to add one new property on each iteration.

import weaviate.classes as wvc

# Get the Article collection object
articles = client.collections.get("Article")

# Add a new property
articles.config.add_property(
    additional_property=wvc.Property(
        name="body",
        data_type=wvc.DataType.TEXT
    )
)

Remove or change an existing property

You cannot remove or rename a property that is part of a collection definition. This is due to the high compute cost associated with reindexing the data.

Modify a parameter

You can modify some parameters of a schema as shown below. However, many parameters are immutable and cannot be changed once set.

import weaviate.classes as wvc

# Get the Article collection object
articles = client.collections.get("Article")

# Update the collection configuration
articles.config.update(
    # Note, use Reconfigure here (not Configure)
    inverted_index_config=wvc.Reconfigure.inverted_index(
        stopwords_removals=["a", "the"]
    )
)

Get the schema

If you want to review the schema, you can retrieve it as shown below.

collection = client.collections.get("Article")
config = collection.config.get()

# print some of the config properties
print(config.vectorizer)
print(config.inverted_index_config)
print(config.inverted_index_config.stopwords.removals)
print(config.multi_tenancy_config)
print(config.vector_index_config)
print(config.vector_index_config.distance_metric)

The response is a JSON object like the one in this example.

Sample schema
{
  "classes": [
    {
      "class": "Article",
      "invertedIndexConfig": {
        "bm25": {
          "b": 0.75,
          "k1": 1.2
        },
        "cleanupIntervalSeconds": 60,
        "stopwords": {
          "additions": null,
          "preset": "en",
          "removals": null
        }
      },
      "moduleConfig": {
        "text2vec-openai": {
          "model": "ada",
          "modelVersion": "002",
          "type": "text",
          "vectorizeClassName": true
        }
      },
      "properties": [
        {
          "dataType": [
            "text"
          ],
          "moduleConfig": {
            "text2vec-openai": {
              "skip": false,
              "vectorizePropertyName": false
            }
          },
          "name": "title",
          "tokenization": "word"
        },
        {
          "dataType": [
            "text"
          ],
          "moduleConfig": {
            "text2vec-openai": {
              "skip": false,
              "vectorizePropertyName": false
            }
          },
          "name": "body",
          "tokenization": "word"
        }
      ],
      "replicationConfig": {
        "factor": 1
      },
      "shardingConfig": {
        "virtualPerPhysical": 128,
        "desiredCount": 1,
        "actualCount": 1,
        "desiredVirtualCount": 128,
        "actualVirtualCount": 128,
        "key": "_id",
        "strategy": "hash",
        "function": "murmur3"
      },
      "vectorIndexConfig": {
        "skip": false,
        "cleanupIntervalSeconds": 300,
        "maxConnections": 64,
        "efConstruction": 128,
        "ef": -1,
        "dynamicEfMin": 100,
        "dynamicEfMax": 500,
        "dynamicEfFactor": 8,
        "vectorCacheMaxObjects": 1000000000000,
        "flatSearchCutoff": 40000,
        "distance": "cosine",
        "pq": {
          "enabled": false,
          "bitCompression": false,
          "segments": 0,
          "centroids": 256,
          "encoder": {
            "type": "kmeans",
            "distribution": "log-normal"
          }
        }
      },
      "vectorIndexType": "hnsw",
      "vectorizer": "text2vec-openai"
    }
  ]
}
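A full schema like the sample above is long, so it can be handy to reduce it to the settings you care about. The sketch below parses a trimmed copy of the sample schema and extracts a short summary per collection. The `summarize` function is a hypothetical helper, and the trimmed JSON keeps only a few of the keys shown above.

```python
import json

# Trimmed copy of the sample schema above, kept short for illustration.
sample_schema = json.loads("""
{
  "classes": [
    {
      "class": "Article",
      "properties": [
        {"dataType": ["text"], "name": "title", "tokenization": "word"},
        {"dataType": ["text"], "name": "body", "tokenization": "word"}
      ],
      "replicationConfig": {"factor": 1},
      "vectorIndexConfig": {"distance": "cosine"},
      "vectorIndexType": "hnsw",
      "vectorizer": "text2vec-openai"
    }
  ]
}
""")


def summarize(schema):
    """Return one summary dict per collection in a raw schema dictionary."""
    return [
        {
            "name": c["class"],
            "vectorizer": c.get("vectorizer"),
            "distance": c.get("vectorIndexConfig", {}).get("distance"),
            "replication_factor": c.get("replicationConfig", {}).get("factor"),
            "properties": [p["name"] for p in c.get("properties", [])],
        }
        for c in schema.get("classes", [])
    ]


summary = summarize(sample_schema)
print(json.dumps(summary, indent=2))
```

Missing keys fall back to None or an empty list, so the helper also tolerates collections that omit optional sections.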