
Manage collections

Every object in Weaviate belongs to exactly one collection. Use the examples on this page to manage your collections.

Terminology

Newer Weaviate documentation discusses "collections." Older Weaviate documentation refers to "classes" instead. Expect to see both terms throughout the documentation.

Create a collection

To create a collection, specify at least the collection name. If you don't specify any properties, auto-schema creates them.

Capitalization

Weaviate follows GraphQL naming conventions.

  • Start collection names with an upper case letter.
  • Start property names with a lower case letter.

If you use an initial upper case letter to define a property name, Weaviate changes it to a lower case letter internally.

client.collections.create("Article")
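The capitalization rules above can be applied on the client side before a name is sent to Weaviate. The following is a minimal sketch; the helper names `normalize_collection_name` and `normalize_property_name` are hypothetical and are not part of the Weaviate client.

```python
def normalize_collection_name(name: str) -> str:
    """Return the name with its first character upper-cased."""
    if not name:
        raise ValueError("collection name must not be empty")
    return name[0].upper() + name[1:]


def normalize_property_name(name: str) -> str:
    """Return the name with its first character lower-cased."""
    if not name:
        raise ValueError("property name must not be empty")
    return name[0].lower() + name[1:]


print(normalize_collection_name("article"))
print(normalize_property_name("OnHomepage"))
```

Only the first character is changed, so a camel-case name such as `onHomepage` keeps its inner capital letter.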

Create a collection and define properties

Properties are the data fields in your collection. Each property has a name and a data type.

Additional information

Use properties to configure additional parameters such as data type, index characteristics, or tokenization.

For details, see:

from weaviate.classes.config import Property, DataType

# Note that you can use `client.collections.create_from_dict()` to create a collection from a v3-client-style JSON object
client.collections.create(
    "Article",
    properties=[
        Property(name="title", data_type=DataType.TEXT),
        Property(name="body", data_type=DataType.TEXT),
    ]
)

Disable auto-schema

By default, Weaviate creates missing collections and missing properties. When you configure collections manually, you have more precise control of the collection settings.

To disable auto-schema, set AUTOSCHEMA_ENABLED: 'false' in your system configuration file.
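With auto-schema disabled, Weaviate no longer creates missing properties, so it can help to compare objects against the defined property names before importing them. The sketch below uses plain Python data only; the function name `find_undefined_properties` is hypothetical, and the assumption is that you already know the property names defined for your collection.

```python
def find_undefined_properties(defined_properties, obj):
    """Return the keys of `obj` that are not in the defined property names."""
    defined = set(defined_properties)
    return sorted(key for key in obj if key not in defined)


defined = ["title", "body"]
candidate = {"title": "Hello", "body": "World", "author": "Jane"}

# "author" is not defined, so it would need to be added to the collection first
print(find_undefined_properties(defined, candidate))
```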

Specify a vectorizer

Specify a vectorizer for a collection.

Additional information

Collection level settings override default values and general configuration parameters such as environment variables.

from weaviate.classes.config import Configure, Property, DataType

client.collections.create(
    "Article",
    vectorizer_config=Configure.Vectorizer.text2vec_openai(),
    properties=[  # properties configuration is optional
        Property(name="title", data_type=DataType.TEXT),
        Property(name="body", data_type=DataType.TEXT),
    ]
)

Define multiple (named) vectors

Added in v1.24

You can define multiple named vectors per collection. This allows each object to be represented by multiple vectors, such as a text vector and an image vector, or a title vector and a body vector.

from weaviate.classes.config import Configure, Property, DataType

client.collections.create(
    "ArticleNV",
    vectorizer_config=[
        # Set a named vector
        Configure.NamedVectors.text2vec_cohere(  # Use the "text2vec-cohere" vectorizer
            name="title", source_properties=["title"]  # Set the source property(ies)
        ),
        # Set another named vector
        Configure.NamedVectors.text2vec_openai(  # Use the "text2vec-openai" vectorizer
            name="body", source_properties=["body"]  # Set the source property(ies)
        ),
        # Set another named vector
        Configure.NamedVectors.text2vec_openai(  # Use the "text2vec-openai" vectorizer
            name="title_country", source_properties=["title", "country"]  # Set the source property(ies)
        )
    ],
    properties=[  # Define properties
        Property(name="title", data_type=DataType.TEXT),
        Property(name="body", data_type=DataType.TEXT),
        Property(name="country", data_type=DataType.TEXT),
    ],
)
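Each named vector lists the source properties it is built from, so a typo in a property name is easy to miss. The sketch below checks a planned configuration, written as plain Python data, for source properties that are not defined. The function name `missing_source_properties` and the dictionary layout are illustrative assumptions, not part of the Weaviate client.

```python
def missing_source_properties(named_vectors, property_names):
    """Map each named vector to the source properties that are not defined."""
    defined = set(property_names)
    missing = {}
    for vector_name, sources in named_vectors.items():
        unknown = [prop for prop in sources if prop not in defined]
        if unknown:
            missing[vector_name] = unknown
    return missing


# Mirrors the named vectors and properties of the ArticleNV example
named_vectors = {
    "title": ["title"],
    "body": ["body"],
    "title_country": ["title", "country"],
}
property_names = ["title", "body", "country"]

# An empty result means every source property is defined
print(missing_source_properties(named_vectors, property_names))
```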

Specify vectorizer settings

To configure how a vectorizer works with a specific collection (for example, which model to use), set the vectorizer parameters.

from weaviate.classes.config import Configure

client.collections.create(
    "Article",
    vectorizer_config=Configure.Vectorizer.text2vec_cohere(
        model="embed-multilingual-v2.0",
        vectorize_collection_name=True
    ),
)

Set vector index type

Set the vector index type for each collection. The available index types are hnsw, flat, and dynamic.

from weaviate.classes.config import Configure, Property, DataType

client.collections.create(
    "Article",
    vectorizer_config=Configure.Vectorizer.text2vec_openai(),
    vector_index_config=Configure.VectorIndex.hnsw(),  # Use the HNSW index
    # vector_index_config=Configure.VectorIndex.flat(),  # Use the FLAT index
    # vector_index_config=Configure.VectorIndex.dynamic(),  # Use the DYNAMIC index
    properties=[
        Property(name="title", data_type=DataType.TEXT),
        Property(name="body", data_type=DataType.TEXT),
    ]
)

Set vector index parameters

Various vector index parameters are configurable at collection creation time, including compression.

from weaviate.classes.config import Configure, Property, DataType

client.collections.create(
    "Article",
    # Additional configuration not shown
    vector_index_config=Configure.VectorIndex.flat(
        quantizer=Configure.VectorIndex.Quantizer.bq(
            rescore_limit=200,
            cache=True
        ),
        vector_cache_max_objects=100000
    ),
)

Property-level settings

For each property, you can choose whether to vectorize the property name, whether to include the property in vectorization, and which tokenization type to use.

from weaviate.classes.config import Configure, Property, DataType, Tokenization

client.collections.create(
    "Article",
    vectorizer_config=Configure.Vectorizer.text2vec_huggingface(),
    properties=[
        Property(
            name="title",
            data_type=DataType.TEXT,
            vectorize_property_name=True,  # Use "title" as part of the value to vectorize
            tokenization=Tokenization.LOWERCASE  # Use "lowercase" tokenization
        ),
        Property(
            name="body",
            data_type=DataType.TEXT,
            skip_vectorization=True,  # Don't vectorize this property
            tokenization=Tokenization.WHITESPACE  # Use "whitespace" tokenization
        ),
    ]
)

Specify a distance metric

If you choose to bring your own vectors, you should specify the distance metric.

from weaviate.classes.config import Configure, VectorDistances

client.collections.create(
    "Article",
    vector_index_config=Configure.VectorIndex.hnsw(
        distance_metric=VectorDistances.COSINE
    ),
)
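To make the choice of metric more concrete, the sketch below computes a cosine distance in plain Python, defined here as one minus the cosine similarity. It is an illustration of the metric only, and Weaviate's internal implementation may differ in detail. The function name `cosine_distance` is hypothetical.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


print(cosine_distance([1.0, 0.0], [2.0, 0.0]))  # same direction
print(cosine_distance([1.0, 0.0], [0.0, 3.0]))  # orthogonal
```

Because the metric ignores vector length, vectors that point in the same direction have a distance of zero regardless of their magnitude.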
Additional information

For details on the configuration parameters, see the following:

Set inverted index parameters

Various inverted index parameters are configurable for each collection. Some parameters are set at the collection level, while others are set at the property level.

from weaviate.classes.config import Configure, Property, DataType

client.collections.create(
    "Article",
    # Additional settings not shown
    properties=[  # properties configuration is optional
        Property(
            name="title",
            data_type=DataType.TEXT,
            index_filterable=True,
            index_searchable=True,
        ),
    ],
    inverted_index_config=Configure.inverted_index(  # Optional
        bm25_b=0.7,
        bm25_k1=1.25,
        index_null_state=True,
        index_property_length=True,
        index_timestamps=True
    )
)
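The `bm25_b` and `bm25_k1` values are the free parameters of the BM25 ranking function. The sketch below shows their effect using the textbook BM25 term-frequency component (without the IDF factor). It is an illustration under that assumption, not Weaviate's exact scoring code, and the function name `bm25_term_weight` is hypothetical.

```python
def bm25_term_weight(tf, doc_len, avg_doc_len, k1=1.25, b=0.7):
    """Textbook BM25 term-frequency component, without the IDF factor.

    k1 controls how quickly repeated terms saturate.
    b controls how strongly document length is normalized (0 = not at all).
    """
    length_norm = 1.0 - b + b * (doc_len / avg_doc_len)
    return tf * (k1 + 1.0) / (tf + k1 * length_norm)


# Same term frequency, different document lengths
average_doc = bm25_term_weight(tf=1, doc_len=100, avg_doc_len=100)
long_doc = bm25_term_weight(tf=1, doc_len=200, avg_doc_len=100)
print(average_doc, long_doc)
```

With `b` greater than zero, a match in a longer-than-average document contributes less than the same match in an average-length document. With `b=0`, document length has no effect.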

Specify a generative module

Specify a generative module for a collection (for RAG).

from weaviate.classes.config import Configure, Property, DataType

client.collections.create(
    "Article",
    vectorizer_config=Configure.Vectorizer.text2vec_openai(),
    generative_config=Configure.Generative.openai(),
)

Specify a generative model name

Specify which generative model the module uses.

from weaviate.classes.config import Configure, Property, DataType

client.collections.create(
    "Article",
    vectorizer_config=Configure.Vectorizer.text2vec_openai(),
    generative_config=Configure.Generative.openai(
        model="gpt-4"
    ),
)

Replication settings

Replication factor change in v1.25

In Weaviate v1.25, a replication factor cannot be changed once it is set.

This is due to the schema consensus algorithm change in v1.25. This will be improved in future versions.

Configure replication per collection.

from weaviate.classes.config import Configure

client.collections.create(
    "Article",
    replication_config=Configure.replication(
        factor=3
    )
)
Additional information

To use replication factors greater than one, use a multi-node deployment.
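Because the replication factor cannot be changed in v1.25 once it is set, it may be worth validating it before you create the collection. The sketch below is a plain Python check. The function name `check_replication_factor` is hypothetical, and the rule that the factor cannot exceed the number of nodes is an assumption that is not stated on this page.

```python
def check_replication_factor(factor: int, node_count: int) -> None:
    """Raise ValueError if the factor does not fit the deployment size."""
    if factor < 1:
        raise ValueError("replication factor must be at least 1")
    if factor > 1 and node_count < 2:
        raise ValueError("a replication factor above 1 needs a multi-node deployment")
    # Assumption: a collection cannot have more replicas than there are nodes
    if factor > node_count:
        raise ValueError("replication factor exceeds the number of nodes")


check_replication_factor(factor=3, node_count=3)  # passes silently
```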

For details on the configuration parameters, see the following:

Sharding settings

Configure sharding per collection.

from weaviate.classes.config import Configure

client.collections.create(
    "Article",
    sharding_config=Configure.sharding(
        virtual_per_physical=128,
        desired_count=1,
        actual_count=1,
        desired_virtual_count=128,
        actual_virtual_count=128,
    )
)
Additional information

For details on the configuration parameters, see the following:

Multi-tenancy

Added in v1.20

Create a collection with multi-tenancy enabled.

from weaviate.classes.config import Configure

client.collections.create(
    "Article",
    multi_tenancy_config=Configure.multi_tenancy(True)
)

Read a single collection definition

Retrieve a collection definition from the schema.

articles = client.collections.get("Article")
articles_config = articles.config.get()

print(articles_config)
Sample configuration: Text objects

This configuration for text objects defines the following:

  • The collection name (Article)
  • The vectorizer module (text2vec-cohere) and model (embed-multilingual-v2.0)
  • A set of properties (title, body) with text data types.
{
    "class": "Article",
    "vectorizer": "text2vec-cohere",
    "moduleConfig": {
        "text2vec-cohere": {
            "model": "embed-multilingual-v2.0"
        }
    },
    "properties": [
        {
            "name": "title",
            "dataType": ["text"]
        },
        {
            "name": "body",
            "dataType": ["text"]
        }
    ]
}
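A definition like this can also be inspected programmatically. The sketch below assumes the definition is available as strict JSON (no trailing commas) and summarizes it with the standard library only. The function name `summarize_definition` is hypothetical.

```python
import json

definition_json = """
{
  "class": "Article",
  "vectorizer": "text2vec-cohere",
  "moduleConfig": {"text2vec-cohere": {"model": "embed-multilingual-v2.0"}},
  "properties": [
    {"name": "title", "dataType": ["text"]},
    {"name": "body", "dataType": ["text"]}
  ]
}
"""


def summarize_definition(definition):
    """Return the collection name, vectorizer, and property types."""
    return {
        "collection": definition["class"],
        "vectorizer": definition.get("vectorizer"),
        "properties": {
            prop["name"]: prop["dataType"][0]
            for prop in definition.get("properties", [])
        },
    }


summary = summarize_definition(json.loads(definition_json))
print(summary)
```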
Sample configuration: Nested objects
Added in v1.22

This configuration for nested objects defines the following:

  • The collection name (Person)

  • The vectorizer module (text2vec-huggingface)

  • A set of properties (last_name, address)

    • last_name has text data type
    • address has object data type
  • The address property has two nested properties (street and city)

{
    "class": "Person",
    "vectorizer": "text2vec-huggingface",
    "properties": [
        {
            "dataType": ["text"],
            "name": "last_name"
        },
        {
            "dataType": ["object"],
            "name": "address",
            "nestedProperties": [
                {"dataType": ["text"], "name": "street"},
                {"dataType": ["text"], "name": "city"}
            ]
        }
    ]
}
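Nested properties form a tree, so it can be useful to list them as dotted paths. The sketch below walks a definition of this shape recursively. The function name `flatten_properties` and the dotted-path notation are illustrative choices, not Weaviate conventions.

```python
def flatten_properties(properties, prefix=""):
    """Return (path, data type) pairs, descending into nestedProperties."""
    paths = []
    for prop in properties:
        path = prefix + prop["name"]
        paths.append((path, prop["dataType"][0]))
        # Properties without nested children contribute nothing further
        nested = prop.get("nestedProperties", [])
        paths.extend(flatten_properties(nested, path + "."))
    return paths


person_properties = [
    {"dataType": ["text"], "name": "last_name"},
    {
        "dataType": ["object"],
        "name": "address",
        "nestedProperties": [
            {"dataType": ["text"], "name": "street"},
            {"dataType": ["text"], "name": "city"},
        ],
    },
]

for path, data_type in flatten_properties(person_properties):
    print(path, data_type)
```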
Sample configuration: Generative search

This configuration for generative search defines the following:

  • The collection name (Article)
  • The default vectorizer module (text2vec-openai)
  • The generative module (generative-openai)
  • A set of properties (title, chunk, chunk_no and url)
  • The tokenization option for the url property
  • The vectorization option (skip vectorization) for the url property
{
    "class": "Article",
    "vectorizer": "text2vec-openai",
    "vectorIndexConfig": {
        "distance": "cosine"
    },
    "moduleConfig": {
        "generative-openai": {}
    },
    "properties": [
        {
            "name": "title",
            "dataType": ["text"]
        },
        {
            "name": "chunk",
            "dataType": ["text"]
        },
        {
            "name": "chunk_no",
            "dataType": ["int"]
        },
        {
            "name": "url",
            "dataType": ["text"],
            "tokenization": "field",
            "moduleConfig": {
                "text2vec-openai": {
                    "skip": true
                }
            }
        }
    ]
}

Sample configuration: Images

This configuration for image search defines the following:

  • The collection name (Image)

  • The vectorizer module (img2vec-neural)

    • The image property configures the collection to store image data.
  • The vector index distance metric (cosine)

  • A set of properties (image), with the image property set as blob.

For image searches, see Image search.

{
    "class": "Image",
    "vectorizer": "img2vec-neural",
    "vectorIndexConfig": {
        "distance": "cosine"
    },
    "moduleConfig": {
        "img2vec-neural": {
            "imageFields": [
                "image"
            ]
        }
    },
    "properties": [
        {
            "name": "image",
            "dataType": ["blob"]
        }
    ]
}

Read all collection definitions

Fetch the database schema to retrieve all of the collection definitions.

response = client.collections.list_all(simple=False)

print(response)
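If you only need the collection names, a small helper can extract them. The sketch below assumes `client` is a connected v4 Python client and that `list_all` returns a dictionary keyed by collection name. The function name `collection_names` is hypothetical.

```python
def collection_names(client):
    """Return the names of all collections, sorted alphabetically."""
    # Assumption: the response is a dict that maps collection name to its configuration
    response = client.collections.list_all(simple=True)
    return sorted(response.keys())
```

For example, `print(collection_names(client))` prints the sorted list of names.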

Update a collection definition

Replication factor change in v1.25

In Weaviate v1.25, a replication factor cannot be changed once it is set.

This is due to the schema consensus algorithm change in v1.25. This will be improved in future versions.

You can update a collection definition to change the mutable collection settings.

from weaviate.classes.config import Reconfigure

articles = client.collections.get("Article")

# Update the collection definition
articles.config.update(
    inverted_index_config=Reconfigure.inverted_index(
        bm25_k1=1.5
    )
)

Update a parameter

Some parameters cannot be modified after you create your collection.

from weaviate.classes.config import Reconfigure

# Get the Article collection object
articles = client.collections.get("Article")

# Update the collection configuration
articles.config.update(
    # Note, use Reconfigure here (not Configure)
    inverted_index_config=Reconfigure.inverted_index(
        stopwords_removals=["a", "the"]
    )
)

Delete a collection

You can delete any unwanted collection(s), along with the data that they contain.

Deleting a collection also deletes its objects

When you delete a collection, you delete all associated objects!

Be very careful with deletes on a production database and anywhere else that you have important data.

This code deletes a collection and its objects.

# delete collection "Article" - THIS WILL DELETE THE COLLECTION AND ALL ITS DATA
client.collections.delete("Article") # Replace with your collection name
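Because deletion is irreversible, one option is to wrap it in a helper that asks the caller to repeat the collection name. The sketch below assumes `client` is a connected v4 Python client. The function name `delete_collection_if_confirmed` and the confirmation argument are hypothetical.

```python
def delete_collection_if_confirmed(client, name, confirm_name):
    """Delete a collection only if the caller repeats its exact name.

    Returns True if the collection was deleted, False if it did not exist.
    """
    if confirm_name != name:
        raise ValueError("confirmation does not match the collection name")
    if not client.collections.exists(name):
        return False
    # THIS DELETES THE COLLECTION AND ALL ITS DATA
    client.collections.delete(name)
    return True
```

A call such as `delete_collection_if_confirmed(client, "Article", "Article")` deletes the collection, while a mismatched confirmation raises an error before anything is removed.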

Add a property

Limitations when adding a property after importing objects

Adding a property after importing objects can lead to limitations in inverted-index related behavior.


This is caused by the inverted index being built at import time. If you add a property after importing objects, the inverted index will not be updated. This means that the new property will not be indexed for existing objects. This can lead to unexpected behavior when querying.


To avoid this, you can either:

  • Add the property before importing objects.
  • Delete the collection, re-create it with the new property and then re-import the data.

We are working on a re-indexing API to allow you to re-index the data after adding a property. This will be available in a future release.

from weaviate.classes.config import Property, DataType

articles = client.collections.get("Article")

articles.config.add_property(
    Property(
        name="onHomepage",
        data_type=DataType.BOOL
    )
)
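To make a script that adds a property safe to re-run, you can first check whether the property already exists. The sketch below assumes `collection` is a v4 collection object, that `collection.config.get().properties` is a list of objects with a `name` attribute, and that `new_property` is a `Property` instance. The function name `add_property_if_missing` is hypothetical.

```python
def add_property_if_missing(collection, new_property):
    """Add the property unless one with the same name already exists.

    Returns True if the property was added, False if it was already defined.
    """
    existing = {prop.name for prop in collection.config.get().properties}
    if new_property.name in existing:
        return False
    collection.config.add_property(new_property)
    return True
```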

Inspect shards (for a collection)

A collection's index can consist of multiple shards. Retrieve the shards to inspect their status.

articles = client.collections.get("Article")

article_shards = articles.config.get_shards()
print(article_shards)

Update shard status

You can manually update a shard's status from READONLY to READY, for example after disk pressure has been lowered.

articles = client.collections.get("Article")

article_shards = articles.config.update_shards(
    status="READY",  # Set the shard back to READY
    shard_name="shard-1234"  # Replace with the name of your shard
)

print(article_shards)

Further resources

References

Background knowledge

Questions and feedback

If you have any questions or feedback, let us know in the user forum.