How to configure a collection schema
Overview
This page describes collection schemas in Weaviate.
Weaviate client APIs are transitioning from the term "class" to "collection."
Older Weaviate documentation refers to "classes." Newer documentation uses "collections." Expect to see both terms during the transition period.
Auto-schema
We recommend that you define your schema manually to ensure that it aligns with your specific requirements. However, Weaviate also provides an auto-schema feature.
When a collection definition is missing, or when the existing schema does not cover the data being imported, the auto-schema feature generates the schema. The generated schema is based on the Weaviate system defaults and on the properties of the imported objects. For more information, see Auto-schema.
Create a collection
A schema describes the data objects that make up a collection. To create a collection, follow the example below in your preferred language.
Minimal example
At a minimum, you must specify the collection name, which is the class parameter in the collection definition.
- Python (v4)
client.collections.create("Article")
- Python (v3)
class_obj = {'class': 'Article'}
client.schema.create_class(class_obj)  # returns None on success
- JavaScript/TypeScript
const emptyClassDefinition = {
  class: 'Article',
};
// Add the class to the schema
const result = await client
  .schema
  .classCreator()
  .withClass(emptyClassDefinition)
  .do();
// The returned value is the full class definition, showing all defaults
console.log(JSON.stringify(result, null, 2));
Property definition
You can use the properties field to specify properties for the collection. A collection definition can include any number of properties.
- Python (v4)
import weaviate.classes as wvc
client.collections.create(
    "Article",
    properties=[
        wvc.Property(name="title", data_type=wvc.DataType.TEXT),
        wvc.Property(name="body", data_type=wvc.DataType.TEXT),
    ]
)
- Python (v3)
class_obj = {
    'class': 'Article',
    'properties': [
        {
            'name': 'title',
            'dataType': ['text'],
        },
        {
            'name': 'body',
            'dataType': ['text'],
        },
    ],
}
client.schema.create_class(class_obj)  # returns None on success
- JavaScript/TypeScript
const classWithProps = {
  class: 'Article',
  properties: [
    {
      name: 'title',
      dataType: ['text'],
    },
    {
      name: 'body',
      dataType: ['text'],
    },
  ],
};
// Add the class to the schema
const result = await client
  .schema
  .classCreator()
  .withClass(classWithProps)
  .do();
// The returned value is the full class definition, showing all defaults
console.log(JSON.stringify(result, null, 2));
In addition to the property name, a property definition can configure parameters such as the data type and the inverted index tokenization.
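As an illustration of those per-property parameters, the sketch below builds text property definitions in the same dictionary format as the Python (v3) examples, with an explicit tokenization setting. The helper text_property and the url property are hypothetical, and the set of accepted tokenization values (word, lowercase, whitespace, field) is an assumption; your Weaviate version may accept more.

```python
# Assumed tokenization options for text properties; check your Weaviate version.
TEXT_TOKENIZATIONS = {"word", "lowercase", "whitespace", "field"}


def text_property(name: str, tokenization: str = "word") -> dict:
    """Hypothetical helper: build a text property definition in the v3 dict format."""
    if tokenization not in TEXT_TOKENIZATIONS:
        raise ValueError(f"unsupported tokenization: {tokenization!r}")
    return {"name": name, "dataType": ["text"], "tokenization": tokenization}


class_obj = {
    "class": "Article",
    "properties": [
        text_property("title"),
        # "field" indexes the whole (trimmed) value as a single token, e.g. for exact matches.
        text_property("url", tokenization="field"),
    ],
}
print(class_obj["properties"][1])
```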
Specify a vectorizer
You can set an optional vectorizer for each collection. A vectorizer set on a collection overrides any default vectorizer in the general configuration, such as one set through environment variables.
The following code sets the text2vec-openai module as the vectorizer for the Article collection.
- Python (v4)
import weaviate.classes as wvc
client.collections.create(
    "Article",
    vectorizer_config=wvc.Configure.Vectorizer.text2vec_openai(),
    properties=[  # properties configuration is optional
        wvc.Property(name="title", data_type=wvc.DataType.TEXT),
        wvc.Property(name="body", data_type=wvc.DataType.TEXT),
    ]
)
- Python (v3)
class_obj = {
    'class': 'Article',
    'properties': [
        {
            'name': 'title',
            'dataType': ['text'],
        },
    ],
    'vectorizer': 'text2vec-openai'  # this could be any vectorizer
}
client.schema.create_class(class_obj)
- JavaScript/TypeScript
const classWithVectorizer = {
  class: 'Article',
  properties: [
    {
      name: 'title',
      dataType: ['text'],
    },
  ],
  vectorizer: 'text2vec-openai', // this could be any vectorizer
};
// Add the class to the schema
const result = await client
  .schema
  .classCreator()
  .withClass(classWithVectorizer)
  .do();
// The returned value is the full class definition, showing all defaults
console.log(JSON.stringify(result, null, 2));
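The override rule can be modeled as a small function: a vectorizer named in the collection definition wins, otherwise the server-wide default applies. This is a hypothetical model of the behavior, not Weaviate code; the function name and the "none" fallback (meaning no vectorizer) are assumptions.

```python
from typing import Optional


def effective_vectorizer(class_obj: dict, server_default: Optional[str] = None) -> str:
    """Hypothetical model: which vectorizer module a collection ends up using."""
    # A vectorizer set on the collection overrides the general configuration.
    if class_obj.get("vectorizer"):
        return class_obj["vectorizer"]
    # Otherwise fall back to the server default; "none" stands for no vectorizer (assumption).
    return server_default or "none"


print(effective_vectorizer({"class": "Article", "vectorizer": "text2vec-openai"}, "text2vec-cohere"))
print(effective_vectorizer({"class": "Article"}, "text2vec-cohere"))
```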
Collection-level module settings
Configure the moduleConfig parameter at the collection level to set collection-wide settings for module behavior. For example, you can configure the vectorizer to use a particular model (model), or to vectorize the collection name (vectorizeClassName).
- Python (v4)
import weaviate.classes as wvc
client.collections.create(
    "Article",
    vectorizer_config=wvc.Configure.Vectorizer.text2vec_cohere(
        model="embed-multilingual-v2.0",
        vectorize_class_name=True
    ),
)
- Python (v3)
class_obj = {
    'class': 'Article',
    'properties': [
        {
            'name': 'title',
            'dataType': ['text'],
        },
    ],
    'vectorizer': 'text2vec-cohere',  # this could be any vectorizer
    'moduleConfig': {
        'text2vec-cohere': {  # this must match the vectorizer used
            'vectorizeClassName': True,
            'model': 'embed-multilingual-v2.0',
        }
    }
}
client.schema.create_class(class_obj)
- JavaScript/TypeScript
const classWithModuleSettings = {
  class: 'Article',
  properties: [
    {
      name: 'title',
      dataType: ['text'],
    },
  ],
  vectorizer: 'text2vec-cohere', // this could be any vectorizer
  moduleConfig: {
    'text2vec-cohere': { // this must match the vectorizer used
      vectorizeClassName: true,
      model: 'embed-multilingual-v2.0',
    },
  },
};
// Add the class to the schema
const result = await client
  .schema
  .classCreator()
  .withClass(classWithModuleSettings)
  .do();
// The returned value is the full class definition, showing all defaults
console.log(JSON.stringify(result, null, 2));
The available parameters vary by module. For details, see the documentation for the module you use.
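In the definition format used by the Python (v3) and JavaScript/TypeScript examples, collection-level module settings live under moduleConfig, keyed by module name, which is why the key must match the vectorizer. The sketch below looks up the vectorizer's settings from such a definition; the helper vectorizer_settings is hypothetical.

```python
def vectorizer_settings(class_obj: dict) -> dict:
    """Hypothetical helper: return the moduleConfig entry for the collection's vectorizer."""
    vectorizer = class_obj.get("vectorizer")
    if not vectorizer:
        return {}
    # Settings are keyed by module name, so the key must equal the vectorizer name.
    return class_obj.get("moduleConfig", {}).get(vectorizer, {})


class_obj = {
    "class": "Article",
    "vectorizer": "text2vec-cohere",
    "moduleConfig": {
        "text2vec-cohere": {"vectorizeClassName": True, "model": "embed-multilingual-v2.0"},
    },
}
settings = vectorizer_settings(class_obj)
print(settings["model"])  # embed-multilingual-v2.0
```

If the moduleConfig key does not match the vectorizer name, this lookup returns an empty dict, which is a quick way to catch a typo before you create the collection.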
Property-level module settings
Configure the moduleConfig parameter at the property level to set property-level settings for module behavior. For example, you can vectorize the property name (vectorizePropertyName), or exclude the property from vectorization altogether (skip).
- Python (v4)
import weaviate.classes as wvc
client.collections.create(
    "Article",
    vectorizer_config=wvc.Configure.Vectorizer.text2vec_huggingface(),
    properties=[
        wvc.Property(
            name="title",
            data_type=wvc.DataType.TEXT,
            vectorize_property_name=True  # use "title" as part of the value to vectorize
        ),
        wvc.Property(
            name="body",
            data_type=wvc.DataType.TEXT,
            skip_vectorization=True  # don't vectorize body
        ),
    ]
)
- Python (v3)
class_obj = {
    'class': 'Article',
    'vectorizer': 'text2vec-huggingface',  # this could be any vectorizer
    'properties': [
        {
            'name': 'title',
            'dataType': ['text'],
            'moduleConfig': {
                'text2vec-huggingface': {  # this must match the vectorizer used
                    'skip': False,
                    'vectorizePropertyName': False
                }
            }
        },
    ],
}
client.schema.create_class(class_obj)
- JavaScript/TypeScript
const classWithPropModuleSettings = {
  class: 'Article',
  vectorizer: 'text2vec-huggingface', // this could be any vectorizer
  properties: [
    {
      name: 'title',
      dataType: ['text'],
      moduleConfig: {
        'text2vec-huggingface': { // this must match the vectorizer used
          skip: false,
          vectorizePropertyName: false,
        },
      },
    },
  ],
};
// Add the class to the schema
const result = await client
  .schema
  .classCreator()
  .withClass(classWithPropModuleSettings)
  .do();
// The returned value is the full class definition, showing all defaults
console.log(JSON.stringify(result, null, 2));
The available parameters vary by module. For details, see the documentation for the module you use.
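To check which properties a definition sends to the vectorizer, you can read the property-level skip flag. The sketch below works on the Python (v3) dictionary format; the helper vectorized_properties is hypothetical, it assumes a property without an explicit skip flag is vectorized, and it does not consider data types.

```python
def vectorized_properties(class_obj: dict) -> list:
    """Hypothetical helper: names of properties not marked skip for the collection's vectorizer."""
    vectorizer = class_obj.get("vectorizer")
    names = []
    for prop in class_obj.get("properties", []):
        module_cfg = prop.get("moduleConfig", {}).get(vectorizer, {})
        # Assumption: a property without an explicit skip flag is vectorized.
        if not module_cfg.get("skip", False):
            names.append(prop["name"])
    return names


class_obj = {
    "class": "Article",
    "vectorizer": "text2vec-huggingface",
    "properties": [
        {"name": "title", "dataType": ["text"],
         "moduleConfig": {"text2vec-huggingface": {"skip": False, "vectorizePropertyName": True}}},
        {"name": "body", "dataType": ["text"],
         "moduleConfig": {"text2vec-huggingface": {"skip": True}}},
    ],
}
print(vectorized_properties(class_obj))  # ['title']
```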
Indexing, sharding and replication settings
You can also set indexing, sharding and replication settings through the schema. For example, you can set a vector index distance metric or a replication factor for a collection.
This code sets the vector index distance metric and the replication factor.
You need a multi-node setup to test replication factors greater than 1.
- Python (v4)
import weaviate.classes as wvc
client.collections.create(
    "Article",
    vector_index_config=wvc.Configure.vector_index(
        distance_metric=wvc.VectorDistance.COSINE
    ),
    replication_config=wvc.Configure.replication(
        factor=3
    )
)
- Python (v3)
class_obj = {
    'class': 'Article',
    'vectorIndexConfig': {
        'distance': 'cosine',
    },
    'replicationConfig': {
        'factor': 3,
    },
}
client.schema.create_class(class_obj)
- JavaScript/TypeScript
const classWithIndexReplication = {
  class: 'Article',
  vectorIndexConfig: {
    distance: 'cosine',
  },
  replicationConfig: {
    factor: 3,
  },
};
// Add the class to the schema
const result = await client
  .schema
  .classCreator()
  .withClass(classWithIndexReplication)
  .do();
// The returned value is the full class definition, showing all defaults
console.log(JSON.stringify(result, null, 2));
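Because a replication factor of 3 means three copies of the data, a cluster with fewer nodes than the factor cannot hold every replica. The sketch below encodes that check before you send a definition, assuming each replica is placed on a different node. The helper check_replication_factor and the node_count you pass in are hypothetical, not part of any Weaviate client; the default of 1 matches the replicationConfig shown in the sample schema on this page.

```python
def check_replication_factor(class_obj: dict, node_count: int) -> int:
    """Hypothetical pre-flight check: return the factor, or raise if the cluster is too small."""
    # Without replicationConfig, assume the default factor of 1.
    factor = class_obj.get("replicationConfig", {}).get("factor", 1)
    if factor < 1:
        raise ValueError("replication factor must be at least 1")
    # Assumption: each replica needs its own node.
    if factor > node_count:
        raise ValueError(f"factor {factor} needs at least {factor} nodes, cluster has {node_count}")
    return factor


class_obj = {"class": "Article", "replicationConfig": {"factor": 3}}
print(check_replication_factor(class_obj, node_count=3))  # 3
```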
For details on the configuration parameters, see the configuration reference for each of these settings.
Multi-tenancy
Available from Weaviate v1.20.
To enable multi-tenancy, set multiTenancyConfig to {"enabled": true} in the collection definition.
- Python (v4)
import weaviate.classes as wvc
client.collections.create(
    "Article",
    multi_tenancy_config=wvc.Configure.multi_tenancy(True)
)
- Python (v3)
class_obj = {
    'class': 'Article',
    'multiTenancyConfig': {'enabled': True}
}
client.schema.create_class(class_obj)  # returns None on success
- JavaScript/TypeScript
await client.schema
  .classCreator()
  .withClass({
    class: 'Article',
    multiTenancyConfig: { enabled: true },
  })
  .do();
A collection definition with multi-tenancy enabled looks like this:
{
  "class": "MultiTenancyClass",
  "multiTenancyConfig": {"enabled": true}
}
For more details on multi-tenancy operations, see Multi-tenancy operations.
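If you build collection definitions as dictionaries, as in the Python (v3) example, you can add the multi-tenancy setting with a small helper. The sketch below is hypothetical; it returns a new definition instead of mutating its input, so the same base definition can be reused for single-tenant and multi-tenant collections.

```python
import copy


def with_multi_tenancy(class_obj: dict, enabled: bool = True) -> dict:
    """Hypothetical helper: copy a definition and set multiTenancyConfig.enabled."""
    updated = copy.deepcopy(class_obj)
    updated["multiTenancyConfig"] = {"enabled": enabled}
    return updated


base = {"class": "Article", "properties": [{"name": "title", "dataType": ["text"]}]}
mt = with_multi_tenancy(base)
print(mt["multiTenancyConfig"])      # {'enabled': True}
print("multiTenancyConfig" in base)  # False
```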
Delete a collection
You can delete any collection you no longer need, along with the data it contains.
Deleting a collection also deletes all of its objects.
Do not do this on a production database, or anywhere you do not want to lose data.
Run the code below to delete the collection and its objects.
- Python (v4)
if client.collections.exists("Article"):
    # Delete collection "Article" - THIS WILL DELETE THE COLLECTION AND ALL ITS DATA
    client.collections.delete("Article")  # Replace with your collection name
- Python (v3)
# Delete class "Article" - THIS WILL DELETE ALL DATA IN THIS CLASS
client.schema.delete_class("Article")  # Replace with your class name
- JavaScript/TypeScript
const className: string = 'YourClassName'; // Replace with your class name
await client.schema
  .classDeleter()
  .withClassName(className)
  .do();
- Go
className := "YourClassName"
// Delete the class
if err := client.Schema().ClassDeleter().WithClassName(className).Do(context.Background()); err != nil {
	// Weaviate returns a 400 if the class does not exist, so that case is allowed;
	// any other error, including one that is not a client error, is fatal
	if status, ok := err.(*fault.WeaviateClientError); !ok || status.StatusCode != http.StatusBadRequest {
		panic(err)
	}
}
- Curl
curl \
  -X DELETE \
  https://some-endpoint.weaviate.network/v1/schema/YourClassName
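The curl call maps directly onto a DELETE request against /v1/schema/ followed by the collection name. The sketch below builds the same request with Python's standard library without sending it. The endpoint is the placeholder from the curl example, the helper build_delete_request is hypothetical, and the optional Bearer API key header is an assumption about how your instance is secured. Pass the request to urllib.request.urlopen only when you really intend to delete the collection.

```python
import urllib.parse
import urllib.request
from typing import Optional


def build_delete_request(base_url: str, collection: str,
                         api_key: Optional[str] = None) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) a DELETE request for a collection."""
    url = f"{base_url.rstrip('/')}/v1/schema/{urllib.parse.quote(collection)}"
    headers = {}
    if api_key:
        # Assumption: the instance uses API key authentication via a Bearer token.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(url, headers=headers, method="DELETE")


req = build_delete_request("https://some-endpoint.weaviate.network", "YourClassName")
print(req.get_method(), req.full_url)
```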
Update a collection definition
Some parts of a collection definition are immutable, but you can modify other parts.
The following sections describe how to add a property to a collection and how to modify collection parameters.
Add a property
You can add a new property to an existing collection.
Add new properties to an existing collection one at a time. To add several properties, put the new property definitions in a list, then loop through the list and add one property on each iteration.
- Python (v4)
import weaviate.classes as wvc
# Get the Article collection object
articles = client.collections.get("Article")
# Add a new property
articles.config.add_property(
    additional_property=wvc.Property(
        name="body",
        data_type=wvc.DataType.TEXT
    )
)
- Python (v3)
add_prop = {
    'name': 'body',
    'dataType': ['text'],
}
client.schema.property.create('Article', add_prop)
- JavaScript/TypeScript
const prop = {
  name: 'body',
  dataType: ['text'],
};
const resultProp = await client
  .schema
  .propertyCreator()
  .withClassName('Article')
  .withProperty(prop)
  .do();
// The returned value is the full property definition
console.log(JSON.stringify(resultProp, null, 2));
You cannot remove or rename a property that is part of a collection definition. This is due to the high compute cost associated with reindexing the data.
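The one-property-at-a-time rule can be wrapped in a loop. The sketch below mirrors the Python (v4) articles.config.add_property call and passes each property positionally. The helper add_properties and the summary and author properties are hypothetical, and a small stand-in object replaces a real collection so the loop runs without a Weaviate instance.

```python
from types import SimpleNamespace


def add_properties(collection, new_properties) -> int:
    """Hypothetical helper: add properties to an existing collection, one per call."""
    added = 0
    for prop in new_properties:
        # Each call adds exactly one property.
        collection.config.add_property(prop)
        added += 1
    return added


# Stand-in for client.collections.get("Article") that records the calls it receives.
recorded = []
fake_articles = SimpleNamespace(config=SimpleNamespace(add_property=recorded.append))

new_props = [
    {"name": "summary", "dataType": ["text"]},
    {"name": "author", "dataType": ["text"]},
]
print(add_properties(fake_articles, new_props))  # 2
```

With a real client, replace fake_articles with client.collections.get("Article") and build the list from wvc.Property objects, as in the Python (v4) example.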
Modify a parameter
You can modify some parameters of a collection definition as shown below. However, many parameters are immutable and cannot be changed after they are set.
- Python (v4)
import weaviate.classes as wvc
# Get the Article collection object
articles = client.collections.get("Article")
# Update the collection configuration
articles.config.update(
    # Note: use Reconfigure here (not Configure)
    inverted_index_config=wvc.Reconfigure.inverted_index(
        stopwords_removals=["a", "the"]
    )
)
- Python (v3)
class_obj = {
    'invertedIndexConfig': {
        'stopwords': {
            'preset': 'en',
            'removals': ['a', 'the']
        },
    },
}
client.schema.update_config('Article', class_obj)
- JavaScript/TypeScript
Coming soon.
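The update above changes which words the inverted index treats as stopwords: words in removals are taken out of the preset list, and words in additions are added to it. The sketch below models that set arithmetic. The helper effective_stopwords is hypothetical, and the five-word preset is a hypothetical stand-in, not Weaviate's actual en preset.

```python
def effective_stopwords(preset_words, additions=(), removals=()) -> set:
    """Hypothetical model: preset stopwords plus additions, minus removals."""
    return (set(preset_words) | set(additions)) - set(removals)


# Hypothetical stand-in for part of a preset stopword list.
preset_en_sample = {"a", "an", "the", "and", "of"}
result = effective_stopwords(preset_en_sample, removals=["a", "the"])
print(sorted(result))  # ['an', 'and', 'of']
```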
Get the schema
If you want to review the schema, you can retrieve it as shown below.
- Python (v4)
collection = client.collections.get("Article")
config = collection.config.get()
# Print some of the config properties
print(config.vectorizer)
print(config.inverted_index_config)
print(config.inverted_index_config.stopwords.removals)
print(config.multi_tenancy_config)
print(config.vector_index_config)
print(config.vector_index_config.distance_metric)
- Python (v3)
schema = client.schema.get()  # returns the entire schema as a dict
print(schema)
- JavaScript/TypeScript
const schema = await client
  .schema
  .getter()
  .do();
// The returned value is the entire schema
console.log(JSON.stringify(schema, null, 2));
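The Python (v3) and JavaScript/TypeScript examples return the entire schema as a JSON object like the one in this example. The Python (v4) example returns the configuration of a single collection.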
Sample schema
{
  "classes": [
    {
      "class": "Article",
      "invertedIndexConfig": {
        "bm25": {
          "b": 0.75,
          "k1": 1.2
        },
        "cleanupIntervalSeconds": 60,
        "stopwords": {
          "additions": null,
          "preset": "en",
          "removals": null
        }
      },
      "moduleConfig": {
        "text2vec-openai": {
          "model": "ada",
          "modelVersion": "002",
          "type": "text",
          "vectorizeClassName": true
        }
      },
      "properties": [
        {
          "dataType": [
            "text"
          ],
          "moduleConfig": {
            "text2vec-openai": {
              "skip": false,
              "vectorizePropertyName": false
            }
          },
          "name": "title",
          "tokenization": "word"
        },
        {
          "dataType": [
            "text"
          ],
          "moduleConfig": {
            "text2vec-openai": {
              "skip": false,
              "vectorizePropertyName": false
            }
          },
          "name": "body",
          "tokenization": "word"
        }
      ],
      "replicationConfig": {
        "factor": 1
      },
      "shardingConfig": {
        "virtualPerPhysical": 128,
        "desiredCount": 1,
        "actualCount": 1,
        "desiredVirtualCount": 128,
        "actualVirtualCount": 128,
        "key": "_id",
        "strategy": "hash",
        "function": "murmur3"
      },
      "vectorIndexConfig": {
        "skip": false,
        "cleanupIntervalSeconds": 300,
        "maxConnections": 64,
        "efConstruction": 128,
        "ef": -1,
        "dynamicEfMin": 100,
        "dynamicEfMax": 500,
        "dynamicEfFactor": 8,
        "vectorCacheMaxObjects": 1000000000000,
        "flatSearchCutoff": 40000,
        "distance": "cosine",
        "pq": {
          "enabled": false,
          "bitCompression": false,
          "segments": 0,
          "centroids": 256,
          "encoder": {
            "type": "kmeans",
            "distribution": "log-normal"
          }
        }
      },
      "vectorIndexType": "hnsw",
      "vectorizer": "text2vec-openai"
    }
  ]
}
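To pull specific settings out of a JSON response like the sample above (for example the result of the Python (v3) client.schema.get() call or a GET /v1/schema request), look up the collection in the classes list. The sketch below uses a trimmed copy of the sample; the helper find_collection is hypothetical.

```python
import json

# Trimmed copy of the sample schema above.
schema_json = """
{
  "classes": [
    {
      "class": "Article",
      "vectorizer": "text2vec-openai",
      "vectorIndexType": "hnsw",
      "vectorIndexConfig": {"distance": "cosine", "efConstruction": 128},
      "replicationConfig": {"factor": 1}
    }
  ]
}
"""


def find_collection(schema: dict, name: str) -> dict:
    """Hypothetical helper: return one collection definition from a full schema."""
    for class_def in schema.get("classes", []):
        if class_def.get("class") == name:
            return class_def
    raise KeyError(name)


article = find_collection(json.loads(schema_json), "Article")
print(article["vectorizer"])                     # text2vec-openai
print(article["vectorIndexConfig"]["distance"])  # cosine
```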