Indexing

Introduction

Weaviate supports two types of indices.

  1. An approximate nearest neighbor (ANN) index: the ANN index serves all vector-search queries.
  2. An inverted index: the inverted index allows filtering by properties and serves BM25 queries.

You can configure indices in Weaviate per class. One of Weaviate's core strengths is combining the ANN index with an inverted index.

Some things to bear in mind:

  • Especially for large datasets, configuring the indices is important because the more you index, the more storage is needed.
  • A rule of thumb -- if you don't query over a specific field or vector space, don't index it.
  • One of Weaviate's unique features is how the indices are regulated (learn more about this here).
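
The rule of thumb above can be sketched in a few lines of Python. This is a minimal sketch, assuming that setting `"skip": true` inside a class's `vectorIndexConfig` (the same `skip` key that appears in the ANN example schema on this page) tells Weaviate not to build a vector index for that class. The helper name `skip_vector_index` and the `AuditLog` class are hypothetical and exist only for illustration.

```python
import copy
import json


def skip_vector_index(class_def):
    """Return a copy of a class definition with its vector index switched off.

    Assumption: "skip": true under "vectorIndexConfig" disables vector
    indexing for the class. The input dictionary is left untouched.
    """
    updated = copy.deepcopy(class_def)
    updated.setdefault("vectorIndexConfig", {})["skip"] = True
    return updated


# Hypothetical class that is only ever filtered, never searched by vector.
audit_log = {"class": "AuditLog", "properties": []}

print(json.dumps(skip_vector_index(audit_log), indent=2))
```

The helper works on a copy, so the original definition can still be reused for classes that do need a vector index.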

ANN indexing

The important thing to know is that the "A" in ANN (the "approximate") comes with a trade-off: the index is approximate and therefore not always 100% accurate. This is what the experts mean when they talk about the "recall of the algorithm": the fraction of the true nearest neighbors that the index actually returns.
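
To make recall concrete, here is a minimal sketch in plain Python. It compares a stand-in "approximate" result against an exact brute-force search and reports the overlap. Nothing here is Weaviate code: the function names `exact_top_k` and `recall_at_k`, the random data, and the simulated miss are all assumptions made for illustration.

```python
import random


def exact_top_k(query, vectors, k):
    """Brute-force search: rank every vector by squared Euclidean distance."""
    def dist(v):
        return sum((a - b) ** 2 for a, b in zip(query, v))

    ranked = sorted(range(len(vectors)), key=lambda i: dist(vectors[i]))
    return ranked[:k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true nearest neighbors present in the approximate result."""
    if not exact_ids:
        return 1.0
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


rng = random.Random(0)
vectors = [[rng.random() for _ in range(8)] for _ in range(1000)]
query = [rng.random() for _ in range(8)]

truth = exact_top_k(query, vectors, k=10)

# Stand-in for an ANN result: keep 9 true neighbors and swap the 10th
# for some vector that is not among the true top 10.
wrong_id = next(i for i in range(len(vectors)) if i not in truth)
approx = truth[:9] + [wrong_id]

print(recall_at_k(approx, truth))  # 0.9
```

A recall of 0.9 at k=10 means that, on average, one of the ten true nearest neighbors is missing from the result.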

tip

There are different ANN algorithms; you can find a nice overview of them on this website. Only those algorithms that support CRUD can be used in Weaviate (we want that sweet database UX), and Weaviate's ANN system is completely plug-and-playable so that we can always add other algorithms in the future.

note

Because vector search use cases are growing rapidly, more and more ANN algorithms are being produced. A "good" ANN algorithm is one that has high recall and is fast. You can dive into the rabbit hole right here. But! Don't be like Alice; just make sure to come back here.

Let's take a look at a few ANN settings in an example schema.

(Note that we've removed some JSON that's irrelevant to the topic at hand.)

{
  "classes": [
    {
      "class": "Publication",
      "properties": [],
      "vectorIndexType": "hnsw", // <== the current ANN algorithm
      "vectorIndexConfig": { // <== the vector index settings
        "skip": false,
        "cleanupIntervalSeconds": 300,
        "maxConnections": 64,
        "efConstruction": 128,
        "ef": -1,
        "dynamicEfMin": 100,
        "dynamicEfMax": 500,
        "dynamicEfFactor": 8,
        "vectorCacheMaxObjects": 2000000,
        "flatSearchCutoff": 40000,
        "distance": "cosine"
      }
    },
    { } // <== the Author class
  ]
}

As shown above, there are quite a few configurable parameters available for an ANN index. Modifying them can affect Weaviate's performance, for example by trading recall against query time, or query time against import time.
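
As one example of how these parameters interact, consider the `ef` settings in the schema above. The following is a minimal sketch, assuming that with `ef` set to `-1` Weaviate picks the search-time `ef` per query as the query limit multiplied by `dynamicEfFactor`, kept within `dynamicEfMin` and `dynamicEfMax`. The function `effective_ef` is hypothetical and only models that rule; it is not Weaviate's actual implementation, so check the vector index reference before relying on the exact numbers.

```python
def effective_ef(limit, ef=-1, dynamic_ef_min=100, dynamic_ef_max=500, dynamic_ef_factor=8):
    """Model the search-time ef for a query that asks for `limit` results.

    Defaults mirror the example schema. Assumption: ef == -1 means
    "dynamic", i.e. limit * factor clamped to [min, max].
    """
    if ef != -1:
        # An explicit ef is used as-is.
        return ef
    scaled = limit * dynamic_ef_factor
    return max(dynamic_ef_min, min(scaled, dynamic_ef_max))


for limit in (10, 25, 100):
    print(limit, effective_ef(limit))
# 10 100   (8 * 10 = 80, raised to dynamicEfMin)
# 25 200   (8 * 25 = 200, within bounds)
# 100 500  (8 * 100 = 800, capped at dynamicEfMax)
```

Under this model, a higher `ef` means a larger candidate list per query, which generally improves recall at the cost of query time.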

note

The ANN benchmark page contains a wide variety of vector search use cases and relative benchmarks. This page is ideal for finding a dataset similar to yours and learning what the optimal settings are.

Module configuration

You can use Weaviate with or without modules. To use Weaviate with modules, you must configure them in the schema.

An example configuration:

{
  "class": "Author",
  "moduleConfig": { // <== module config on class level
    "text2vec-transformers": { // <== the name of the module (in this case `text2vec-transformers`)
      // the settings for the chosen module
    }
  },
  "properties": [ ]
}

When using vectorizers, you need to configure vectorization at both the class level and the property level. If you use text vectorizers, the way they work is explained here.

{
  "class": "Author",
  "moduleConfig": { // <== class level configuration
    "text2vec-transformers": { // <== name of the module
      "vectorizeClassName": false // <== vectorize the class name?
    }
  },
  "properties": [{
    "moduleConfig": { // <== property level configuration
      "text2vec-transformers": { // <== name of the module
        "skip": false, // <== skip this `string` for vectorization?
        "vectorizePropertyName": false // <== vectorize the property name?
      }
    },
    "dataType": [
      "string"
    ],
    "name": "name"
  }]
}

note

Because Weaviate's vectorizer module configuration is set at the class and property level, you can use different vectorizers for different classes. You can even mix multimodal, NLP, and image modules.
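
The class-level and property-level settings can also be assembled programmatically. The sketch below builds two class definitions as plain Python dictionaries, each with its own vectorizer module, and prints them as JSON. It makes no Weaviate client calls. The helper names `property_with_vectorizer` and `class_with_vectorizer` are hypothetical, and using `text2vec-contextionary` for the `Publication` class is an assumption made only to illustrate mixing modules.

```python
import json


def property_with_vectorizer(name, data_type, module, skip=False, vectorize_property_name=False):
    """Build a property definition with property-level module settings."""
    return {
        "name": name,
        "dataType": [data_type],
        "moduleConfig": {
            module: {
                "skip": skip,
                "vectorizePropertyName": vectorize_property_name,
            }
        },
    }


def class_with_vectorizer(class_name, module, properties, vectorize_class_name=False):
    """Build a class definition with class-level module settings."""
    return {
        "class": class_name,
        "moduleConfig": {module: {"vectorizeClassName": vectorize_class_name}},
        "properties": properties,
    }


author = class_with_vectorizer(
    "Author",
    "text2vec-transformers",
    [property_with_vectorizer("name", "string", "text2vec-transformers")],
)

# Assumption for illustration: a second class using a different module.
publication = class_with_vectorizer(
    "Publication",
    "text2vec-contextionary",
    [property_with_vectorizer("title", "string", "text2vec-contextionary")],
)

print(json.dumps({"classes": [author, publication]}, indent=2))
```

The `Author` dictionary produced here has the same keys and values as the JSON example above, so the two can be compared side by side.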

Recap

  • The ANN index needs to be set for your use case (especially if you have a large dataset)
  • You can enable or disable the index based on your use case
  • You can configure Weaviate modules in the schema

More Resources

If you can't find the answer to your question here, please look at the:

  1. Frequently Asked Questions. Or,
  2. Knowledge base of old issues. Or,
  3. For questions: Stack Overflow. Or,
  4. For issues: GitHub. Or,
  5. Ask your question in the Slack channel: Slack.