Indexing
Weaviate supports several types of indexes.
- Vector indexes - a vector index (e.g. HNSW or flat) is used to serve all vector-search queries.
- Inverted indexes - inverted indexes enable BM25 queries and speed up filtering.
You can configure indexes in Weaviate per collection.
Some things to bear in mind:
- Especially for large datasets, configuring the indexes is important because the more you index, the more storage is needed.
- A rule of thumb: if you don't query over a specific field or vector space, don't index it.
- One of Weaviate's unique features is how the indexes are configured.
Vector indexes
A vector index is used to serve all vector-search queries. Weaviate supports multiple types of vector indexes:
- HNSW - a vector index based on approximate nearest neighbor (ANN) search. HNSW indexes scale well with large datasets.
- Flat - a vector index that is used for brute-force searches. This is useful for small datasets.
- Dynamic - a vector index that is flat when the dataset is small and switches to HNSW when the dataset is large.
For more information on vector indexes, see the Vector Indexing page.
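To make the difference between index types concrete, here is a minimal sketch in plain Python, not Weaviate code. A flat index answers a query by comparing it with every stored vector (exact, but linear in the number of objects), shown here with cosine distance. The helper dynamic_index_kind only mimics how a dynamic index chooses between flat and HNSW; its name and the threshold value are hypothetical, and HNSW itself is not implemented.

```python
import math

# Hypothetical switch-over point for the dynamic-index sketch (an assumption,
# not necessarily Weaviate's default).
DYNAMIC_THRESHOLD = 10_000


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance = 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def flat_search(vectors: dict[str, list[float]], query: list[float], k: int) -> list[str]:
    """Brute-force (exact) k-NN: compare the query with every stored vector.

    The cost grows linearly with the number of vectors, which is why a flat
    index suits small datasets and an ANN index such as HNSW suits large ones.
    """
    ranked = sorted((cosine_distance(vec, query), obj_id) for obj_id, vec in vectors.items())
    return [obj_id for _, obj_id in ranked[:k]]


def dynamic_index_kind(num_objects: int, threshold: int = DYNAMIC_THRESHOLD) -> str:
    """Mimic a dynamic index: flat while small, HNSW once the threshold is reached."""
    return "flat" if num_objects < threshold else "hnsw"


if __name__ == "__main__":
    vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.9, 0.1]}
    print(flat_search(vectors, [1.0, 0.0], k=2))  # ['a', 'c']
    print(dynamic_index_kind(500), dynamic_index_kind(50_000))  # flat hnsw
```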
Inverted indexes
Performance improvements added in Oct 2024
In Weaviate versions v1.24.26, v1.25.20, v1.26.6 and v1.27.0, we introduced performance improvements and bugfixes for the BM25F scoring algorithm.
- The BM25 segment merging algorithm was made faster.
- The WAND algorithm was improved to remove exhausted terms from the score computation and to perform a full sort only when necessary.
- Fixed a bug in BM25F multi-property search that could cause query term scores not to be summed across all segments.
- The BM25 scores are now calculated concurrently for multiple segments.
As always, we recommend upgrading to the latest version of Weaviate to benefit from improvements such as these.
BlockMax WAND algorithm
The BlockMax WAND algorithm is available in v1.29 as a technical preview. This means that the feature is still under development and may change in future releases, including potential breaking changes. We do not recommend using this feature in production environments at this time.
The BlockMax WAND algorithm is a variant of the WAND algorithm that is used to speed up BM25 and hybrid searches (academic paper). It organizes the inverted index in blocks to enable skipping over blocks that are not relevant to the query. This can significantly reduce the number of documents that need to be scored, improving search performance.
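The block-skipping idea can be sketched in a few lines of Python. This is a simplified, hypothetical illustration of block-max upper-bound pruning, not Weaviate's implementation: each posting list is split into fixed-size blocks that store their highest term score, and a document is only scored exactly when the sum of its block maxima could beat the current k-th best score. The full algorithm also uses these bounds to jump over whole blocks without decoding them. The posting scores, block size, and all function names below are made up for illustration.

```python
import bisect
import heapq

# Toy posting lists: term -> sorted [(doc_id, precomputed term score), ...].
# The scores are made-up stand-ins for per-term BM25 contributions.
POSTINGS = {
    "wand": [(1, 0.5), (2, 0.4), (3, 3.0), (4, 0.2), (5, 0.3), (6, 0.1)],
    "block": [(1, 0.2), (3, 2.5), (5, 0.1), (6, 0.3)],
}
BLOCK_SIZE = 2  # tiny on purpose so the example has several blocks


def build_blocks(postings, block_size=BLOCK_SIZE):
    """Split one posting list into blocks that record their last doc ID and max score."""
    blocks = []
    for start in range(0, len(postings), block_size):
        chunk = postings[start:start + block_size]
        blocks.append({
            "last_doc": chunk[-1][0],
            "max_score": max(score for _, score in chunk),
            "postings": dict(chunk),
        })
    return blocks


def block_containing(blocks, doc_id):
    """Return the only block that could hold doc_id, or None if doc_id is past the end."""
    idx = bisect.bisect_left([b["last_doc"] for b in blocks], doc_id)
    return blocks[idx] if idx < len(blocks) else None


def top_k(index, query_terms, k, prune=True):
    """Return (top-k [(score, doc_id)], number of documents scored exactly)."""
    if k <= 0:
        return [], 0
    term_blocks = [index[t] for t in query_terms if t in index]
    candidates = sorted({d for blocks in term_blocks for b in blocks for d in b["postings"]})
    heap = []  # min-heap holding the current best k (score, doc_id) pairs
    scored = 0
    for doc_id in candidates:
        blocks_for_doc = [block_containing(blocks, doc_id) for blocks in term_blocks]
        threshold = heap[0][0] if len(heap) == k else float("-inf")
        # Upper bound from block metadata only: no per-document scores are read.
        bound = sum(b["max_score"] for b in blocks_for_doc if b is not None)
        if prune and bound <= threshold:
            continue  # this document cannot enter the top k, so skip exact scoring
        score = sum(b["postings"].get(doc_id, 0.0) for b in blocks_for_doc if b is not None)
        scored += 1
        if len(heap) < k:
            heapq.heappush(heap, (score, doc_id))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, doc_id))
    return sorted(heap, key=lambda pair: (-pair[0], pair[1])), scored


INDEX = {term: build_blocks(plist) for term, plist in POSTINGS.items()}

if __name__ == "__main__":
    print(top_k(INDEX, ["wand", "block"], k=1))               # ([(5.5, 3)], 3)
    print(top_k(INDEX, ["wand", "block"], k=1, prune=False))  # ([(5.5, 3)], 6)
```

With k=1, only three of the six candidate documents are scored exactly, yet the result matches the exhaustive run (prune=False). The pruning is safe because a block's maximum score is never smaller than the score of any document inside that block.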
If you are experiencing slow BM25 (or hybrid) searches, try enabling BlockMax WAND to see if it improves performance.
To use BlockMax WAND in Weaviate v1.29, you must enable it before creating a collection. As of this version, Weaviate will not migrate existing collections to use BlockMax WAND.
Enable BlockMax WAND by setting the environment variables USE_BLOCKMAX_WAND and USE_INVERTED_SEARCHABLE to true.
Once enabled, BM25 and hybrid searches on collections created afterwards will use the BlockMax WAND algorithm, potentially improving search performance.
Even if BlockMax WAND is enabled, any existing collections will continue to use the default (pre-BlockMax WAND) disk structure and search algorithm. To take advantage of BlockMax WAND, you must create new collections after enabling the feature.
Future versions of Weaviate may enable an ability to migrate existing collections to use BlockMax WAND, and potentially make it the default search algorithm.
Example - Scenario 1
- Set USE_BLOCKMAX_WAND and USE_INVERTED_SEARCHABLE to true in the environment variables.
- Start Weaviate for the first time.
- Ingest data & use Weaviate as usual.
In this scenario, all new data added to Weaviate will use BlockMax WAND for BM25 and hybrid searches.
Example - Scenario 2
- Run Weaviate without BlockMax WAND enabled.
- Add data to collection "OldMovies".
- Enable BlockMax WAND by setting the environment variables.
- Restart Weaviate.
- Create a new collection "NewMovies".
In this scenario, the "NewMovies" collection, created after enabling BlockMax WAND, will use the new search algorithm. However, the "OldMovies" collection will continue to use the default WAND algorithm, including for any data added to it later.
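The rule behind both scenarios (the algorithm is fixed when a collection is created) can be modeled in a few lines. This is a hypothetical toy, not a Weaviate client: ToyWeaviate, blockmax_enabled, and the algorithm labels are made up, and the sketch only accepts the exact lowercase string "true", since how Weaviate parses other spellings is not covered here.

```python
# Hypothetical model: the search algorithm is chosen when a collection is
# created and never changes afterwards.
REQUIRED_FLAGS = ("USE_BLOCKMAX_WAND", "USE_INVERTED_SEARCHABLE")


def blockmax_enabled(env: dict[str, str]) -> bool:
    """Both environment variables must be set to "true" (exact string assumed)."""
    return all(env.get(flag) == "true" for flag in REQUIRED_FLAGS)


class ToyWeaviate:
    """Not a real client: only tracks which algorithm each collection uses."""

    def __init__(self, env: dict[str, str]):
        self.env = dict(env)
        self.collections: dict[str, str] = {}

    def restart(self, env: dict[str, str]) -> None:
        # New settings only affect collections created after the restart.
        self.env = dict(env)

    def create_collection(self, name: str) -> str:
        algorithm = "blockmax_wand" if blockmax_enabled(self.env) else "wand"
        self.collections[name] = algorithm
        return algorithm


if __name__ == "__main__":
    # Scenario 2: OldMovies is created before BlockMax WAND is enabled.
    server = ToyWeaviate(env={})
    server.create_collection("OldMovies")
    server.restart(env={"USE_BLOCKMAX_WAND": "true", "USE_INVERTED_SEARCHABLE": "true"})
    server.create_collection("NewMovies")
    print(server.collections)  # {'OldMovies': 'wand', 'NewMovies': 'blockmax_wand'}
```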
Due to the nature of the BlockMax WAND algorithm, the scores of BM25 and hybrid searches may differ slightly from those of the default WAND algorithm. Additionally, BlockMax WAND scores for single-property and multiple-property searches may differ because of different IDF and property length normalization calculations. This is expected behavior and is not a bug.
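For reference, the textbook BM25 score is shown below. It is the standard formulation rather than a statement of Weaviate's exact implementation, and BM25F additionally combines term frequencies across several properties. Here f(q, D) is the frequency of term q in document D, |D| is the document length, avgdl is the average document length, N is the number of documents, n(q) is the number of documents containing q, and k_1 and b are tuning parameters. The IDF factor depends on N and n(q), and the length normalization on |D| and avgdl, so gathering these statistics differently (for example per property rather than across several properties) changes the scores even when the matching documents are the same.

```latex
\operatorname{score}(D, Q) = \sum_{q \in Q} \operatorname{IDF}(q) \cdot
  \frac{f(q, D)\,(k_1 + 1)}{f(q, D) + k_1 \left(1 - b + b\,\frac{|D|}{\operatorname{avgdl}}\right)}

\operatorname{IDF}(q) = \ln\!\left(1 + \frac{N - n(q) + 0.5}{n(q) + 0.5}\right)
```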
Configure the inverted index
There are three inverted index types in Weaviate:
- indexSearchable - a searchable index for BM25 or hybrid search
- indexFilterable - a match-based index for fast filtering by matching criteria
- indexRangeFilters - a range-based index for filtering by numerical ranges
Each inverted index can be set to true (on) or false (off) on a property level. The indexSearchable and indexFilterable indexes are on by default, while the indexRangeFilters index is off by default.
The filterable indexes are only capable of filtering, while the searchable index can be used for both searching and filtering (though not as fast as the filterable index).
So, setting "indexFilterable": false and "indexSearchable": true (or leaving indexSearchable unset, since it defaults to true) trades worse filtering performance for faster imports (only one index needs to be updated) and lower disk usage.
See the related how-to section to learn how to enable or disable inverted indexes on a property level.
A rule of thumb to follow when determining whether to switch off indexing is: if you will never perform queries based on this property, you can turn it off.
Inverted index types summary
Inverted index type | Description | Applicable data types | Default | Availability |
---|---|---|---|---|
indexSearchable | A BM25-suitable Map index for BM25 or hybrid searching. | text, text[] | true | v1.19 |
indexFilterable | A Roaring Bitmap index for match-based filtering. | Everything except blob, geoCoordinates, object and phoneNumber, including arrays thereof | true | v1.19 |
indexRangeFilters | A Roaring Bitmap index for numerical range-based filtering. | int, number and date only | false | v1.26 |
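The defaults and applicability rules in the table can be combined to work out which inverted indexes a property definition actually ends up with. The helper below is a hypothetical sketch built only from the table: it does not call Weaviate, and it simply ignores flags that do not apply to a data type rather than modeling how Weaviate reacts to such settings.

```python
# Applicability rules and defaults copied from the table above.
SEARCHABLE_TYPES = {"text", "text[]"}
RANGE_FILTER_TYPES = {"int", "number", "date"}
NON_FILTERABLE_BASE_TYPES = {"blob", "geoCoordinates", "object", "phoneNumber"}
DEFAULTS = {"indexSearchable": True, "indexFilterable": True, "indexRangeFilters": False}


def applicable(index_type: str, data_type: str) -> bool:
    """Whether an inverted index type applies to a data type, per the table."""
    base_type = data_type.removesuffix("[]")  # "object[]" -> "object" (Python 3.9+)
    if index_type == "indexSearchable":
        return data_type in SEARCHABLE_TYPES
    if index_type == "indexFilterable":
        return base_type not in NON_FILTERABLE_BASE_TYPES
    if index_type == "indexRangeFilters":
        return data_type in RANGE_FILTER_TYPES
    raise ValueError(f"unknown inverted index type: {index_type}")


def effective_indexes(prop: dict) -> set[str]:
    """Inverted indexes that end up enabled for a property definition."""
    data_type = prop["dataType"][0]
    return {
        index_type
        for index_type, default in DEFAULTS.items()
        if prop.get(index_type, default) and applicable(index_type, data_type)
    }


if __name__ == "__main__":
    title = {"name": "title", "dataType": ["text"], "indexFilterable": False}
    year = {"name": "year", "dataType": ["int"], "indexRangeFilters": True}
    print(sorted(effective_indexes(title)))  # ['indexSearchable']
    print(sorted(effective_indexes(year)))   # ['indexFilterable', 'indexRangeFilters']
```

The title property illustrates the trade-off described earlier: with indexFilterable turned off, only the searchable index remains, so imports touch one index instead of two.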
- Enable one or both of indexFilterable and indexRangeFilters to index a property for faster filtering.
- If only one is enabled, the respective index is used for filtering.
- If both are enabled, indexRangeFilters is used for operations involving comparison operators, and indexFilterable is used for equality and inequality operations.
This chart shows which index performs the comparison when one or both index types are set to true for an applicable property.
Operator | indexRangeFilters only | indexFilterable only | Both enabled |
---|---|---|---|
Equal | indexRangeFilters | indexFilterable | indexFilterable |
Not equal | indexRangeFilters | indexFilterable | indexFilterable |
Greater than | indexRangeFilters | indexFilterable | indexRangeFilters |
Greater than or equal | indexRangeFilters | indexFilterable | indexRangeFilters |
Less than | indexRangeFilters | indexFilterable | indexRangeFilters |
Less than or equal | indexRangeFilters | indexFilterable | indexRangeFilters |
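The chart reduces to a small decision function. This is a sketch of the routing rule only, not Weaviate's query planner; the function name is hypothetical, and the operator strings follow Weaviate's where-filter operator names (Equal, NotEqual, GreaterThan, and so on).

```python
from typing import Optional

EQUALITY_OPERATORS = {"Equal", "NotEqual"}
COMPARISON_OPERATORS = {"GreaterThan", "GreaterThanEqual", "LessThan", "LessThanEqual"}


def index_for_filter(operator: str, range_filters: bool, filterable: bool) -> Optional[str]:
    """Return which inverted index serves a filter operator, following the chart."""
    if operator not in EQUALITY_OPERATORS | COMPARISON_OPERATORS:
        raise ValueError(f"operator not covered by the chart: {operator}")
    if range_filters and filterable:
        # Both enabled: comparisons go to the range index, (in)equality to the filterable one.
        return "indexRangeFilters" if operator in COMPARISON_OPERATORS else "indexFilterable"
    if range_filters:
        return "indexRangeFilters"
    if filterable:
        return "indexFilterable"
    return None  # neither index is enabled; the chart does not cover this case


if __name__ == "__main__":
    print(index_for_filter("Equal", range_filters=True, filterable=True))     # indexFilterable
    print(index_for_filter("LessThan", range_filters=True, filterable=True))  # indexRangeFilters
```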
Inverted index for timestamps
You can also enable an inverted index to filter objects based on their timestamps. Timestamps are currently indexed using the indexFilterable index.
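As a sketch, assuming the collection-level invertedIndexConfig.indexTimestamps setting from Weaviate's collection schema, a collection definition with timestamp indexing could be built like this. The collection name "Article" and the "title" property are placeholders.

```python
import json

# Hypothetical collection definition; "Article" and "title" are placeholders.
article_collection = {
    "class": "Article",
    "invertedIndexConfig": {
        # Index each object's creation and last-update timestamps
        # (_creationTimeUnix and _lastUpdateTimeUnix) so they can be filtered on.
        "indexTimestamps": True,
    },
    "properties": [
        {"name": "title", "dataType": ["text"]},
    ],
}

# Serialize to the JSON body that would be sent when creating the collection.
print(json.dumps(article_collection, indent=2))
```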
Collections without indexes
You can also create a collection with no indexes at all. To do so, disable the vector index for the collection (set "skip": true in vectorIndexConfig) and disable the inverted indexes on each property.
```jsonc
{
  "class": "Author",
  "description": "A description of this collection, in this case, it's about authors",
  "vectorIndexConfig": {
    "skip": true // <== disable vector index
  },
  "properties": [
    {
      "indexFilterable": false, // <== disable filterable index for this property
      "indexSearchable": false, // <== disable searchable index for this property
      "dataType": [
        "text"
      ],
      "description": "The name of the Author",
      "name": "name"
    },
    {
      "indexFilterable": false, // <== disable filterable index for this property
      "dataType": [
        "int"
      ],
      "description": "The age of the Author",
      "name": "age"
    },
    {
      "indexFilterable": false, // <== disable filterable index for this property
      "dataType": [
        "date"
      ],
      "description": "The date of birth of the Author",
      "name": "born"
    },
    {
      "indexFilterable": false, // <== disable filterable index for this property
      "dataType": [
        "boolean"
      ],
      "description": "Whether the Author won a Nobel Prize",
      "name": "wonNobelPrize"
    },
    {
      "indexFilterable": false, // <== disable filterable index for this property
      "indexSearchable": false, // <== disable searchable index for this property
      "dataType": [
        "text"
      ],
      "description": "A description of the author",
      "name": "description"
    }
  ]
}
```