Vector indexes

Vector indexes facilitate efficient, vector-first data storage and retrieval.

HNSW indexes

HNSW indexes scale well and are very fast at query time, but building the index is costly: each object you add has to be inserted into the graph.

HNSW index parameters

Some HNSW parameters are mutable, but others cannot be modified after you create your collection.

| Parameter | Type | Default | Changeable | Details |
| --- | --- | --- | --- | --- |
| cleanupIntervalSeconds | integer | 300 | Yes | Cleanup frequency. This value does not normally need to be adjusted. A higher value means cleanup runs less frequently, but it does more in a single batch. A lower value means cleanup is more frequent, but it may be less efficient on each run. |
| distance | string | cosine | No | Distance metric. The metric that measures the distance between two arbitrary vectors. For available distance metrics, see supported distance metrics. |
| ef | integer | -1 | Yes | Balance search speed and recall. ef is the size of the dynamic list that HNSW uses during search. Search is more accurate when ef is higher, but it is also slower. ef values greater than 512 show diminishing improvements in recall. Dynamic ef: when ef is set to -1, Weaviate automatically adjusts the ef value and creates a dynamic ef list. For more details, see dynamic ef. |
| efConstruction | integer | 128 | No | Balance index search speed and build speed. A high efConstruction value means you can lower your ef settings, but importing is slower. efConstruction must be greater than 0. |
| maxConnections | integer | 32 | No | Maximum number of connections per element. maxConnections is the connection limit per layer for layers above the zero layer. The zero layer can have (2 * maxConnections) connections. maxConnections must be greater than 0. |
| dynamicEfMin | integer | 100 | Yes | Added in v1.10.0. Lower bound for dynamic ef. Protects against creating a search list that is too short. This setting is only used when ef is -1. |
| dynamicEfMax | integer | 500 | Yes | Added in v1.10.0. Upper bound for dynamic ef. Protects against creating a search list that is too long. If the limit is higher than dynamicEfMax, dynamicEfMax does not have any effect. In this case, ef is the limit. This setting is only used when ef is -1. |
| dynamicEfFactor | integer | 8 | Yes | Added in v1.10.0. Multiplier for dynamic ef. Sets the potential length of the search list. This setting is only used when ef is -1. |
| filterStrategy | string | sweeping | Yes | Added in v1.27.0. The filter strategy to use for filtering the search results. Can be set to sweeping (the default filter strategy) or acorn (uses Weaviate's ACORN implementation). |
| flatSearchCutoff | integer | 40000 | Yes | Optional. Threshold for the flat-search cutoff. To force a vector index search, set "flatSearchCutoff": 0. |
| skip | boolean | false | No | When true, do not index the collection. Weaviate decouples vector creation and vector storage. If you skip vector indexing, but a vectorizer is configured (or a vector is provided manually), Weaviate logs a warning on each import. To skip indexing and vector generation, set "vectorizer": "none" when you set "skip": true. See When to skip indexing. |
| vectorCacheMaxObjects | integer | 1e12 | Yes | Maximum number of objects in the memory cache. By default, this limit is set to one trillion (1e12) objects when a new collection is created. For sizing recommendations, see Vector cache considerations. |
| pq | object | -- | Yes | Enable and configure product quantization (PQ) compression. PQ assumes some data has already been loaded. You should have 10,000 to 100,000 vectors per shard loaded before you enable PQ. For PQ configuration details, see PQ configuration parameters. |
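
The dynamicEf* settings are easiest to understand together. The sketch below is an illustration of the parameter descriptions above, not Weaviate's implementation. It assumes that, when ef is -1, the query limit is multiplied by dynamicEfFactor, the result is clamped between dynamicEfMin and dynamicEfMax, and the search list is never shorter than the limit itself. The function name `effective_ef` and its argument names are hypothetical.

```python
def effective_ef(limit, dynamic_ef_min=100, dynamic_ef_max=500, dynamic_ef_factor=8):
    """Estimate the search list size used for a query when ef is -1.

    Assumption: ef = limit * factor, clamped to [min, max], but never
    smaller than the limit itself.
    """
    candidate = limit * dynamic_ef_factor
    # Apply the upper bound first, then the lower bound.
    candidate = max(dynamic_ef_min, min(candidate, dynamic_ef_max))
    # If the limit exceeds dynamicEfMax, the limit wins.
    return max(candidate, limit)


if __name__ == "__main__":
    for limit in (10, 50, 100, 1000):
        print(f"limit={limit:>4} -> ef={effective_ef(limit)}")
```

With the default values, small limits are lifted to dynamicEfMin, mid-sized limits scale with the factor, and large limits are capped at dynamicEfMax until the limit itself exceeds that cap.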

Database parameters for HNSW

Note that some database-level parameters are available to configure HNSW indexing behavior.

  • PERSISTENCE_HNSW_MAX_LOG_SIZE is a database-level parameter that sets the maximum size of the HNSW write-ahead-log. The default value is 500MiB.

Increase this value to improve efficiency of the compaction process, but be aware that this will increase the memory usage of the database. Conversely, decreasing this value will reduce memory usage but may slow down the compaction process.

Preferably, PERSISTENCE_HNSW_MAX_LOG_SIZE should be set to a value close to the size of the HNSW graph.

Tombstone cleanup parameters

Environment variable availability
  • TOMBSTONE_DELETION_CONCURRENCY is available in v1.24.0 and up.
  • TOMBSTONE_DELETION_MIN_PER_CYCLE and TOMBSTONE_DELETION_MAX_PER_CYCLE are available in v1.24.15 / v1.25.2 and up.

Tombstones are records that mark deleted objects. In an HNSW index, tombstones are cleaned up periodically, at the interval set by the cleanupIntervalSeconds parameter.

As the index grows in size, the cleanup process may take longer to complete and require more resources. For very large indexes, this may cause performance issues.

To control the number of tombstones deleted per cleanup cycle and prevent performance issues, set the TOMBSTONE_DELETION_MAX_PER_CYCLE and TOMBSTONE_DELETION_MIN_PER_CYCLE environment variables.

  • Set TOMBSTONE_DELETION_MIN_PER_CYCLE to prevent occurrences of unnecessary cleanup cycles.
  • Set TOMBSTONE_DELETION_MAX_PER_CYCLE to prevent the cleanup process from taking too long and consuming too many resources.

As an example, for a cluster with 300 million objects per shard, a TOMBSTONE_DELETION_MIN_PER_CYCLE value of 1000000 (1 million) and a TOMBSTONE_DELETION_MAX_PER_CYCLE value of 10000000 (10 million) may be good starting points.
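
One way to reason about these two limits is sketched below. This is a simplified model, not Weaviate's cleanup code: it assumes a cycle is skipped while fewer than TOMBSTONE_DELETION_MIN_PER_CYCLE tombstones are pending, and that a cycle removes at most TOMBSTONE_DELETION_MAX_PER_CYCLE tombstones. The function name `tombstones_to_delete` is hypothetical, and the defaults are the starting points from the example above.

```python
def tombstones_to_delete(pending, min_per_cycle=1_000_000, max_per_cycle=10_000_000):
    """Return how many tombstones one cleanup cycle would remove.

    Assumed semantics: skip the cycle below the minimum, cap it at the maximum.
    """
    if pending < min_per_cycle:
        # Too few tombstones to justify a cleanup cycle.
        return 0
    # Never remove more than the per-cycle maximum.
    return min(pending, max_per_cycle)


if __name__ == "__main__":
    for pending in (500_000, 3_000_000, 25_000_000):
        print(pending, "->", tombstones_to_delete(pending))
```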

You can also set the TOMBSTONE_DELETION_CONCURRENCY environment variable to limit the number of threads used for tombstone cleanup. This can help prevent the cleanup process from consuming too many resources or from taking too long.

The default value for TOMBSTONE_DELETION_CONCURRENCY is set to half the number of CPU cores available to Weaviate.

In a cluster with a large number of cores, you may want to set TOMBSTONE_DELETION_CONCURRENCY to a lower value to prevent the cleanup process from consuming too many resources. Conversely, in a cluster with a small number of cores and a large number of deletions, you may want to set TOMBSTONE_DELETION_CONCURRENCY to a higher value to speed up the cleanup process.

PQ configuration parameters

Configure pq with these parameters.

| Parameter | Type | Default | Details |
| --- | --- | --- | --- |
| enabled | boolean | false | Enable PQ when true. The Python client v4 does not use the enabled parameter. To enable PQ with the v4 client, set a quantizer in the collection definition. |
| trainingLimit | integer | 100000 | The maximum number of objects, per shard, used to fit the centroids. Larger values increase the time it takes to fit the centroids. Larger values also require more memory. |
| segments | integer | -- | The number of segments to use. The number of vector dimensions must be evenly divisible by the number of segments. Starting in v1.23, Weaviate uses the number of dimensions to optimize the number of segments. |
| centroids | integer | 256 | The number of centroids to use (max: 256). We generally recommend you do not change this value. Due to the data structure used, a smaller centroid value does not result in smaller vectors, but it may result in faster compression at the cost of recall. |
| encoder | string | kmeans | Encoder specification. There are two encoders. You can specify the type of encoder as either kmeans (default) or tile. |
| distribution | string | log-normal | Encoder distribution type. Only used with the tile encoder. If you use the tile encoder, you can specify the distribution as log-normal (default) or normal. |
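
The divisibility rule for segments and the upper bound on centroids can be checked before you send a configuration to Weaviate. The following is a minimal sketch that uses only the parameter names from the table above. The helper names `valid_segment_counts` and `pq_settings` are hypothetical, and the encoder settings are left out because their exact layout in the request is not described here.

```python
def valid_segment_counts(dimensions):
    """List every segment count that evenly divides the vector dimensions."""
    return [s for s in range(1, dimensions + 1) if dimensions % s == 0]


def pq_settings(dimensions, segments, training_limit=100_000, centroids=256):
    """Build a dict of PQ settings after validating the documented constraints."""
    if dimensions % segments != 0:
        raise ValueError(
            f"{dimensions} dimensions are not evenly divisible by {segments} segments"
        )
    if not 1 <= centroids <= 256:
        raise ValueError("centroids must be between 1 and 256")
    return {
        "enabled": True,
        "trainingLimit": training_limit,
        "segments": segments,
        "centroids": centroids,
    }


if __name__ == "__main__":
    print(valid_segment_counts(12))
    print(pq_settings(dimensions=768, segments=96))
```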

HNSW configuration tips

To determine reasonable settings for your use case, consider the following questions and compare your answers in the table below:

  1. How many queries do you expect per second?
  2. Do you expect a lot of imports or updates?
  3. How high should the recall be?

| Number of queries | Many imports or updates | Recall level | Configuration suggestions |
| --- | --- | --- | --- |
| not many | no | low | This is the ideal scenario. Keep both the ef and efConstruction settings low. You don't need a big machine and you will still be happy with the results. |
| not many | no | high | Here the tricky thing is that your recall needs to be high. Since you're not expecting a lot of requests or imports, you can increase both the ef and efConstruction settings. Keep increasing them until you are happy with the recall. In this case, you can get pretty close to 100%. |
| not many | yes | low | Here the tricky thing is the high volume of imports and updates. Be sure to keep efConstruction low. Since you don't need a high recall, and you're not expecting a lot of queries, you can adjust the ef setting until you've reached the desired recall. |
| not many | yes | high | The trade-offs are getting harder. You need high recall and you're dealing with a lot of imports or updates. This means you need to keep the efConstruction setting low, but you can significantly increase your ef setting because your queries per second rate is low. |
| many | no | low | Many queries per second means you need a low ef setting. Luckily you don't need high recall so you can significantly increase the efConstruction value. |
| many | no | high | Many queries per second means a low ef setting. Since you need a high recall but you are not expecting a lot of imports or updates, you can increase your efConstruction until you've reached the desired recall. |
| many | yes | low | Many queries per second means you need a low ef setting. A high number of imports and updates also means you need a low efConstruction setting. Luckily your recall does not have to be as close to 100% as possible. You can set efConstruction relatively low to support your import or update throughput, and you can use the ef setting to tune the number of queries per second. |
| many | yes | high | Aha, this means you're a perfectionist or you have a use case that needs the best of all three worlds. Increase your efConstruction value until you hit the time limit of imports and updates. Next, increase your ef setting until you reach your desired balance of queries per second versus recall. |

While many people think they need to maximize all three dimensions, in practice that's usually not the case. We leave it up to you to decide, and you can always ask for help in our forum.

Tip

This set of values is a good starting point for many use cases.

| Parameter | Value |
| --- | --- |
| ef | 64 |
| efConstruction | 128 |
| maxConnections | 32 |

Flat indexes

Added in v1.23

Flat indexes are recommended for use cases where the number of objects per index is low, such as in multi-tenancy use cases.

| Parameter | Type | Default | Changeable | Details |
| --- | --- | --- | --- | --- |
| vectorCacheMaxObjects | integer | 1e12 | Yes | Maximum number of objects in the memory cache. By default, this limit is set to one trillion (1e12) objects when a new collection is created. For sizing recommendations, see Vector cache considerations. |
| bq | object | -- | No | Enable and configure binary quantization (BQ) compression. For BQ configuration details, see BQ configuration parameters. |

BQ configuration parameters

Configure bq with these parameters.

| Parameter | Type | Default | Details |
| --- | --- | --- | --- |
| enabled | boolean | false | Enable BQ. Weaviate uses binary quantization (BQ) compression when true. |
| rescoreLimit | integer | -1 | The minimum number of candidates to fetch before rescoring. |
| cache | boolean | false | Whether to use the vector cache. |

Dynamic indexes

Experimental feature

Available starting in v1.25. Dynamic indexing is an experimental feature. Use with caution.

Dynamic index requires ASYNC_INDEXING

Dynamic indexes require asynchronous indexing. To enable asynchronous indexing in a self-hosted Weaviate instance, set the ASYNC_INDEXING environment variable to true. If your instance is hosted in Weaviate Cloud, use the Weaviate Cloud console to enable asynchronous indexing.

A dynamic index starts as a flat index. Once the number of objects exceeds a threshold (10,000 objects by default), Weaviate automatically converts it to an HNSW index.

The switch is one-way. A flat index is converted to an HNSW index, and the index does not change back to a flat index even if deletions bring the object count below the threshold.

The goal of dynamic indexing is to shorten latencies during query time at the cost of a larger memory footprint.
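
The one-way switch can be modeled in a few lines. The class below is a toy model of the behavior described above, not Weaviate code: the name `DynamicIndexSketch` and its methods are hypothetical, and the only fact taken from this page is that the index changes from flat to HNSW once the object count exceeds the threshold and never changes back.

```python
class DynamicIndexSketch:
    """Toy model of a dynamic index that upgrades from flat to hnsw once."""

    def __init__(self, threshold=10_000):
        self.threshold = threshold
        self.object_count = 0
        self.index_type = "flat"

    def set_object_count(self, count):
        """Record the current object count and return the active index type."""
        self.object_count = count
        # Upgrade only; there is no branch that goes back to flat.
        if self.index_type == "flat" and count > self.threshold:
            self.index_type = "hnsw"
        return self.index_type


if __name__ == "__main__":
    index = DynamicIndexSketch()
    print(index.set_object_count(5_000))   # below the threshold
    print(index.set_object_count(10_001))  # threshold exceeded
    print(index.set_object_count(100))     # deletions do not revert the switch
```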

Dynamic index parameters

| Parameter | Type | Default | Details |
| --- | --- | --- | --- |
| distance | string | cosine | Distance metric. The metric that measures the distance between two arbitrary vectors. |
| hnsw | object | default HNSW | HNSW index configuration to be used. |
| flat | object | default Flat | Flat index configuration to be used. |
| threshold | integer | 10000 | Threshold object count at which the flat to hnsw conversion happens. |

Index configuration parameters

Experimental feature

Available starting in v1.25. Dynamic indexing is an experimental feature. Use with caution.

Use these parameters to configure the index type and its properties. You can set them in the collection configuration.

| Parameter | Type | Default | Details |
| --- | --- | --- | --- |
| vectorIndexType | string | hnsw | Optional. The index type. Can be hnsw, flat, or dynamic. |
| vectorIndexConfig | object | - | Optional. Set parameters that are specific to the vector index type. |

How to select the index type

Generally, the hnsw index type is recommended for most use cases. The flat index type is recommended for use cases where the number of objects per index is low, such as in multi-tenancy cases. You can also opt for the dynamic index, which initially configures a flat index and automatically converts it to an hnsw index once the object count exceeds a specified threshold.

See this section for more information about the different index types and how to choose between them.
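
To make the relationship between vectorIndexType and vectorIndexConfig concrete, the sketch below assembles a collection definition as a plain Python dictionary and prints it as JSON. It is an illustration under assumptions: the helper `collection_definition` and the collection name `Article` are hypothetical, the `class` key is assumed to hold the collection name, and nothing is sent to a Weaviate instance. The HNSW values are the starting points suggested in the tip above.

```python
import json

ALLOWED_INDEX_TYPES = {"hnsw", "flat", "dynamic"}


def collection_definition(name, index_type="hnsw", index_config=None):
    """Assemble a collection definition with index type and index settings."""
    if index_type not in ALLOWED_INDEX_TYPES:
        raise ValueError(f"unknown vectorIndexType: {index_type!r}")
    definition = {"class": name, "vectorIndexType": index_type}
    if index_config:
        # Copy so later changes to the caller's dict do not leak in.
        definition["vectorIndexConfig"] = dict(index_config)
    return definition


hnsw_definition = collection_definition(
    "Article",
    index_type="hnsw",
    index_config={"ef": 64, "efConstruction": 128, "maxConnections": 32},
)

if __name__ == "__main__":
    print(json.dumps(hnsw_definition, indent=2))
```

Because efConstruction and maxConnections cannot be changed after the collection is created, it is worth validating a definition like this before you create the collection.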

If faster import speeds are desired, asynchronous indexing allows de-coupling of indexing from object creation.

Asynchronous indexing

Experimental

Available starting in v1.22. This is an experimental feature. Use with caution.

Starting in Weaviate 1.22, you can use asynchronous indexing by opting in.

To enable asynchronous indexing, set the ASYNC_INDEXING environment variable to true in your Weaviate configuration (the docker-compose.yml file if you use Docker Compose). This setting enables asynchronous indexing for all collections.

Example Docker Compose configuration
---
services:
  weaviate:
    command:
    - --host
    - 0.0.0.0
    - --port
    - '8080'
    - --scheme
    - http
    image: cr.weaviate.io/semitechnologies/weaviate:1.28.0
    restart: on-failure:0
    ports:
    - 8080:8080
    - 50051:50051
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      QUERY_MAXIMUM_RESULTS: 10000
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      ENABLE_API_BASED_MODULES: 'true'
      CLUSTER_HOSTNAME: 'node1'
      AUTOSCHEMA_ENABLED: 'false'
      ASYNC_INDEXING: 'true'
...

To get the index status, check the node status endpoint.

Node status example usage

The nodes/shards/vectorQueueLength field shows the number of objects that still have to be indexed.

import weaviate

client = weaviate.connect_to_local()

try:
    nodes_info = client.cluster.nodes(
        collection="JeopardyQuestion",  # If omitted, all collections will be returned
        output="verbose",  # If omitted, will be "minimal"
    )
    print(nodes_info)

finally:
    # Always close the connection, even if the request fails
    client.close()

Then, you can check the status of the vector index queue by inspecting the output.


The vectorQueueLength field will show the number of remaining objects to be indexed. In the example below, the vector index queue has 425 objects remaining to be indexed on the TestArticle shard, out of a total of 1000 objects.

{
  "nodes": [
    {
      "batchStats": {
        "ratePerSecond": 0
      },
      "gitHash": "e6b37ce",
      "name": "weaviate-0",
      "shards": [
        {
          "class": "TestArticle",
          "name": "nq1Bg9Q5lxxP",
          "objectCount": 1000,
          "vectorIndexingStatus": "INDEXING",
          "vectorQueueLength": 425
        }
      ],
      "stats": {
        "objectCount": 1000,
        "shardCount": 1
      },
      "status": "HEALTHY",
      "version": "1.22.1"
    }
  ]
}
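
If you poll the node status while an import is running, a small helper can turn the response into a progress figure per shard. The sketch below works on the JSON form of the response shown above, loaded into a Python dictionary. The function name `indexing_progress` is hypothetical, and the sketch does not assume anything about the object types a client library returns.

```python
def indexing_progress(nodes_response):
    """Map (node name, shard name) to (indexed, total, fraction indexed)."""
    progress = {}
    for node in nodes_response.get("nodes", []):
        for shard in node.get("shards") or []:
            total = shard.get("objectCount", 0)
            queued = shard.get("vectorQueueLength", 0)
            indexed = max(total - queued, 0)
            # An empty shard has nothing left to index.
            fraction = indexed / total if total else 1.0
            progress[(node["name"], shard["name"])] = (indexed, total, fraction)
    return progress


# Trimmed copy of the example response above.
sample = {
    "nodes": [
        {
            "name": "weaviate-0",
            "shards": [
                {
                    "class": "TestArticle",
                    "name": "nq1Bg9Q5lxxP",
                    "objectCount": 1000,
                    "vectorIndexingStatus": "INDEXING",
                    "vectorQueueLength": 425,
                }
            ],
        }
    ]
}

if __name__ == "__main__":
    for (node, shard), (indexed, total, fraction) in indexing_progress(sample).items():
        print(f"{node}/{shard}: {indexed}/{total} indexed ({fraction:.1%})")
```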

Multiple vectors (named vectors)

Added in v1.24.0

Weaviate collections support multiple named vectors.

The vectors in a collection can have their own configurations. Each vector space can set its own index, its own compression algorithm, and its own vectorizer. This means you can use different vectorization models, and apply different distance metrics, to the same object.

To work with named vectors, adjust your queries to specify a target vector for vector search or hybrid search queries.
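
As a rough illustration of per-vector configuration, the sketch below builds a collection definition in which two named vectors use different index types and distance metrics. Treat the layout as an assumption rather than a reference: the `vectorConfig` key, the nested `vectorizer` object, the `dot` distance metric, the collection name `Article`, and the vector names `title` and `body` are not described on this page, and the helper `named_vector` is hypothetical. Check the collection definition reference for the exact schema before using it.

```python
import json


def named_vector(index_type="hnsw", distance="cosine", vectorizer="none"):
    """Describe one named vector space with its own index and distance metric."""
    return {
        "vectorizer": {vectorizer: {}},
        "vectorIndexType": index_type,
        "vectorIndexConfig": {"distance": distance},
    }


# Assumed layout: one entry per named vector under "vectorConfig".
definition = {
    "class": "Article",
    "vectorConfig": {
        "title": named_vector(index_type="hnsw", distance="cosine"),
        "body": named_vector(index_type="flat", distance="dot"),
    },
}

if __name__ == "__main__":
    print(json.dumps(definition, indent=2))
```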

Questions and feedback

If you have any questions or feedback, let us know in the user forum.