Indexing
Introduction
Weaviate supports two types of indices.
- An approximate nearest neighbor index (ANN) - the ANN index is used to serve all vector-search queries.
- An inverted index - the inverted index allows for filtering by properties, and also serves BM25 queries.
You can configure indices in Weaviate per class. One of Weaviate's core strengths is combining the ANN index with an inverted index.
Some things to bear in mind:
- Especially for large datasets, configuring the indices is important because the more you index, the more storage is needed.
- A rule of thumb -- if you don't query over a specific field or vector space, don't index it.
- One of Weaviate's unique features is how the indices are configured (learn more about this here).
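The rule of thumb above can be sketched as a small Python helper that builds a class definition and switches off the indices you don't intend to query. This is only an illustration: `build_class` and the `internalNote` property are hypothetical names, and the property-level `indexInverted` flag is an assumption about the schema version in use (newer Weaviate releases may expose different flags), so check it against the schema reference for your version.

```python
import json


def build_class(name, properties, index_vectors=True):
    """Build a class definition dict (hypothetical helper, not a Weaviate API).

    index_vectors=False sets `vectorIndexConfig.skip` to true, i.e. the
    class is not added to the vector index.
    """
    return {
        "class": name,
        "vectorIndexType": "hnsw",
        "vectorIndexConfig": {"skip": not index_vectors},
        "properties": properties,
    }


publication = build_class(
    "Publication",
    properties=[
        # Queried property: keep the default indexing behavior.
        {"name": "name", "dataType": ["text"]},
        # Never filtered on: skip the inverted index to save storage.
        # `indexInverted` is an assumed flag name; verify for your version.
        {"name": "internalNote", "dataType": ["text"], "indexInverted": False},
    ],
)

print(json.dumps(publication, indent=2))
```

The resulting dict can be serialized and sent to Weaviate with whichever client you use; only the shape of the configuration is shown here.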
ANN indexing
The important thing to know is that the "A" in ANN (the "approximate") comes with a trade-off: the index is approximate and therefore not always 100% accurate. This is what experts mean when they talk about the "recall" of the algorithm.
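To make "recall" concrete, here is a minimal sketch that compares the IDs returned by an approximate search with the IDs an exact (brute-force) search would return. The function name `recall_at_k` and the sample IDs are made up for illustration; they are not part of Weaviate.

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true nearest neighbors that the ANN result also found."""
    if not exact_ids:
        raise ValueError("exact_ids must not be empty")
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


# Hypothetical top-5 results for one query.
exact = ["a", "b", "c", "d", "e"]   # ground truth from an exact search
approx = ["a", "b", "c", "e", "x"]  # what the ANN index returned

recall = recall_at_k(approx, exact)
print(recall)  # 0.8: four of the five true neighbors were found
```

In practice, recall is averaged over many queries, and it is the number that ANN settings trade against speed.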
There are different ANN algorithms; you can find a nice overview of them on this website. Only algorithms that support CRUD can be used in Weaviate (we want that sweet database UX), and Weaviate's ANN system is pluggable so that other algorithms can be added in the future.
Because vector search use cases are growing rapidly, more and more ANN algorithms are being developed. A "good" ANN algorithm has high recall and is fast. You can dive into the rabbit hole right here. But don't be like Alice; just make sure to come back here.
Let's take a look at a few ANN settings in an example schema.
(Note that we've removed some JSON that's irrelevant to the topic at hand.)
{
"classes": [
{
"class": "Publication",
"properties": [],
"vectorIndexType": "hnsw", // <== the current ANN algorithm
"vectorIndexConfig": { // <== the vector index settings
"skip": false,
"cleanupIntervalSeconds": 300,
"pq": {"enabled": false},
"maxConnections": 64,
"efConstruction": 128,
"ef": -1,
"dynamicEfMin": 100,
"dynamicEfMax": 500,
"dynamicEfFactor": 8,
"vectorCacheMaxObjects": 2000000,
"flatSearchCutoff": 40000,
"distance": "cosine"
}
},
{ } // <== the Author class
]
}
As shown above, an ANN index has quite a few configurable parameters. Modifying them affects Weaviate's performance, for example by trading recall against query time, or query time against import time.
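As one example of how these settings interact, the sketch below assumes that `"ef": -1` lets Weaviate choose `ef` dynamically from the query limit, by multiplying the limit by `dynamicEfFactor` and clamping the result between `dynamicEfMin` and `dynamicEfMax`. The function `effective_ef` is a hypothetical helper that mirrors that assumed rule; it is not Weaviate code, so verify the behavior against the vector index reference for your version.

```python
def effective_ef(limit, ef=-1, dynamic_ef_min=100, dynamic_ef_max=500,
                 dynamic_ef_factor=8):
    """Return the ef value assumed to be used for a query with this limit.

    Defaults match the example schema above. An explicit ef (anything other
    than -1) is used as-is; -1 means "derive ef from the query limit".
    """
    if ef != -1:
        return ef
    candidate = limit * dynamic_ef_factor
    return max(dynamic_ef_min, min(dynamic_ef_max, candidate))


for limit in (10, 25, 100):
    print(limit, effective_ef(limit))
# 10 100   (10 * 8 = 80, raised to dynamicEfMin)
# 25 200   (25 * 8 = 200, within bounds)
# 100 500  (100 * 8 = 800, capped at dynamicEfMax)
```

A larger `ef` generally means a more thorough and therefore slower search, which is the recall versus query-time trade-off described above.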
The ANN benchmark page contains a wide variety of vector search use cases and relative benchmarks. This page is ideal for finding a dataset similar to yours and learning which settings work best for it.
Module configuration
You can use Weaviate with or without modules. To use Weaviate with modules, you must configure them in the schema.
An example configuration:
{
"class": "Author",
"moduleConfig": { // <== module config on class level
"text2vec-transformers": { // <== the name of the module (in this case `text2vec-transformers`)
// the settings for the chosen module
}
},
"properties": [ ]
}
When using vectorizers, you need to set vectorization at the class and property level. If you use text vectorizers, the way the vectorizers work is explained here.
{
"class": "Author",
"moduleConfig": { // <== class level configuration
"text2vec-transformers": { // <== name of the module
"vectorizeClassName": false // <== vectorize the class name?
}
},
"properties": [{
"moduleConfig": { // <== property level configuration
"text2vec-transformers": { // <== name of the module
"skip": false, // <== skip this property for vectorization?
"vectorizePropertyName": false // <== vectorize the property name?
}
},
"dataType": [
"text"
],
"name": "name"
}]
}
Because Weaviate's vectorizer module configuration is set at the class and property level, you can use different vectorizers for different classes. You can even mix multimodal, NLP, and image modules.
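The class-level and property-level settings shown above can also be assembled programmatically. The following is a minimal sketch using plain Python dicts; the helpers `vectorized_property` and `author_class` and the `internalId` property are hypothetical, and only the keys that appear in the JSON above are used.

```python
import json

MODULE = "text2vec-transformers"  # module name taken from the example above


def vectorized_property(name, skip=False, vectorize_property_name=False):
    """Property definition with property-level module configuration."""
    return {
        "name": name,
        "dataType": ["text"],
        "moduleConfig": {
            MODULE: {
                "skip": skip,
                "vectorizePropertyName": vectorize_property_name,
            }
        },
    }


def author_class(properties, vectorize_class_name=False):
    """Class definition with class-level module configuration."""
    return {
        "class": "Author",
        "moduleConfig": {MODULE: {"vectorizeClassName": vectorize_class_name}},
        "properties": properties,
    }


author = author_class([
    vectorized_property("name"),
    # Hypothetical property that should not contribute to the vector.
    vectorized_property("internalId", skip=True),
])

print(json.dumps(author, indent=2))
```

Keeping the module name in one place makes it easier to give another class a different vectorizer without touching the property helpers.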
Recap
- The ANN index needs to be set for your use case (especially if you have a large dataset)
- You can enable or disable the index based on your use case
- You can configure Weaviate modules in the schema
More Resources
If you can't find the answer to your question here, please look at:
- The Frequently Asked Questions
- The knowledge base of old issues
- Stack Overflow, for questions
- The Weaviate Community Forum, for more involved discussion
- Our Slack channel