
Vector indexes

Vector indexes facilitate efficient, vector-first data storage and retrieval.

Multiple vectors

Added in v1.24.0

Weaviate collections support multiple, named vectors.

Each vector is independent. Each vector space has its own index, its own compression, and its own vectorizer. This means you can create vectors for individual properties, use different vectorization models, and apply different distance metrics to the same object.

You do not have to use multiple vectors in your collections, but if you do, you need to adjust your queries to specify which vector you want to use.
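As a rough illustration of independent vector spaces, the sketch below builds a collection definition with two named vectors as a plain Python dictionary. The collection name, vector names, and vectorizer module are hypothetical, and the `vectorConfig` layout is an assumption about the schema format, so check it against the API reference for your Weaviate version.

```python
import json

# Hypothetical collection with two named vectors. The names, the module,
# and the "vectorConfig" layout are illustrative assumptions.
collection_definition = {
    "class": "Article",
    "properties": [
        {"name": "title", "dataType": ["text"]},
        {"name": "body", "dataType": ["text"]},
    ],
    "vectorConfig": {
        "title_vector": {
            "vectorizer": {"text2vec-openai": {"properties": ["title"]}},
            "vectorIndexType": "hnsw",
            "vectorIndexConfig": {"distance": "cosine"},
        },
        "body_vector": {
            "vectorizer": {"text2vec-openai": {"properties": ["body"]}},
            "vectorIndexType": "flat",
            "vectorIndexConfig": {"distance": "dot"},
        },
    },
}


def index_types(definition):
    """Map each named vector to the index type configured for it."""
    return {
        name: config["vectorIndexType"]
        for name, config in definition["vectorConfig"].items()
    }


print(json.dumps(index_types(collection_definition), indent=2))
```

Each named vector carries its own index type and distance metric, which is why a query against such a collection has to name the vector it targets.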

Index configuration parameters

Use these parameters to configure the index type and its properties. Set them in the collection configuration.

| Parameter | Type | Default | Details |
| --- | --- | --- | --- |
| vectorIndexType | string | hnsw | Optional. The index type. Can be hnsw or flat. |
| vectorIndexConfig | object | - | Optional. Set parameters that are specific to the vector index type. |

How to select the index type

Generally, the hnsw index type is recommended for most use cases. The flat index type is recommended for use cases where the number of objects per index is low, such as in multi-tenancy cases.

See this section for more information about the different index types and how to choose between them.

If faster import speeds are desired, asynchronous indexing allows de-coupling of indexing from object creation.
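To make the two collection-level parameters concrete, here is a minimal sketch that fills in the documented default index type and rejects unknown types before a definition is sent to Weaviate. The helper name `with_index_defaults` and the collection name are hypothetical; only the parameter names and allowed values come from the text above.

```python
ALLOWED_INDEX_TYPES = {"hnsw", "flat"}


def with_index_defaults(definition: dict) -> dict:
    """Return a copy of the definition with index defaults filled in.

    vectorIndexType falls back to "hnsw" (the documented default) and
    vectorIndexConfig falls back to an empty object.
    """
    result = dict(definition)
    result.setdefault("vectorIndexType", "hnsw")
    result.setdefault("vectorIndexConfig", {})
    if result["vectorIndexType"] not in ALLOWED_INDEX_TYPES:
        raise ValueError(
            f"vectorIndexType must be one of {sorted(ALLOWED_INDEX_TYPES)}"
        )
    return result


# Hypothetical collection definitions, for illustration only.
default_index = with_index_defaults({"class": "Article"})
small_index = with_index_defaults({"class": "TenantNotes", "vectorIndexType": "flat"})

print(default_index["vectorIndexType"], small_index["vectorIndexType"])
```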

HNSW indexes

HNSW indexes scale well and are very fast at query time, but adding data is costly because each insert has to update the index graph.

HNSW index parameters

Some HNSW parameters are mutable, but others cannot be modified after you create your collection.

| Parameter | Type | Default | Mutable | Details |
| --- | --- | --- | --- | --- |
| cleanupIntervalSeconds | integer | 300 | Yes | Cleanup frequency. This value does not normally need to be adjusted. A higher value means cleanup runs less frequently, but it does more in a single batch. A lower value means cleanup is more frequent, but it may be less efficient on each run. |
| distance | string | cosine | No | Distance metric. The metric that measures the distance between two arbitrary vectors. For available distance metrics, see supported distance metrics. |
| ef | integer | -1 | Yes | Balance search speed and recall. ef is the size of the dynamic list that HNSW uses during search. Search is more accurate when ef is higher, but it is also slower. ef values greater than 512 show diminishing improvements in recall. Dynamic ef: Weaviate automatically adjusts the ef value and creates a dynamic ef list when ef is set to -1. For more details, see dynamic ef. |
| efConstruction | integer | 128 | No | Balance index search speed and build speed. A high efConstruction value means you can lower your ef settings, but importing is slower. efConstruction must be greater than 0. |
| maxConnections | integer | 64 | No | Maximum number of connections per element. maxConnections is the connection limit per layer for layers above the zero layer. The zero layer can have (2 * maxConnections) connections. maxConnections must be greater than 0. |
| dynamicEfMin | integer | 100 | Yes | New in v1.10.0. Lower bound for dynamic ef. Protects against creating a search list that is too short. This setting is only used when ef is -1. |
| dynamicEfMax | integer | 500 | Yes | New in v1.10.0. Upper bound for dynamic ef. Protects against creating a search list that is too long. If the query limit is higher than dynamicEfMax, dynamicEfMax does not have any effect. In this case, ef is the limit. This setting is only used when ef is -1. |
| dynamicEfFactor | integer | 8 | Yes | New in v1.10.0. Multiplier for dynamic ef. Sets the potential length of the search list. This setting is only used when ef is -1. |
| flatSearchCutoff | integer | 40000 | Yes | Optional. Threshold for the flat-search cutoff. To force a vector index search, set "flatSearchCutoff": 0. |
| skip | boolean | false | No | When true, do not index the collection. Weaviate decouples vector creation and vector storage. If you skip vector indexing, but a vectorizer is configured (or a vector is provided manually), Weaviate logs a warning on each import. To skip indexing and vector generation, set "vectorizer": "none" when you set "skip": true. See When to skip indexing. |
| vectorCacheMaxObjects | integer | 1e12 | Yes | Maximum number of objects in the memory cache. By default, this limit is set to one trillion (1e12) objects when a new collection is created. For sizing recommendations, see Vector cache considerations. |
| pq | object | -- | Yes | Enable and configure product quantization (PQ) compression. PQ assumes some data has already been loaded. You should have 10,000 to 100,000 vectors per shard loaded before you enable PQ. For PQ configuration details, see PQ configuration parameters. |
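The interaction between dynamicEfMin, dynamicEfMax, and dynamicEfFactor can be sketched as a small function. This is a reading of the parameter descriptions, not Weaviate's actual implementation: it assumes ef starts as the query limit times the factor, is clamped between the two bounds, and is replaced by the limit itself once the limit exceeds the upper bound. The function name `dynamic_ef` is hypothetical.

```python
def dynamic_ef(limit: int, factor: int = 8, ef_min: int = 100, ef_max: int = 500) -> int:
    """Estimate the search list length used when ef is set to -1.

    Assumption: ef = limit * factor, clamped to [ef_min, ef_max],
    and never smaller than the query limit itself.
    """
    ef = limit * factor
    ef = max(ef_min, min(ef, ef_max))
    return max(ef, limit)


for query_limit in (10, 20, 100, 600):
    print(query_limit, dynamic_ef(query_limit))
```

With the default values, a limit of 10 is lifted to the lower bound of 100, a limit of 20 yields 160, a limit of 100 is capped at 500, and a limit of 600 bypasses the cap entirely.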

PQ configuration parameters

Configure pq with these parameters.

| Parameter | Type | Default | Details |
| --- | --- | --- | --- |
| enabled | boolean | false | Enable PQ when true. The Python client v4 does not use the enabled parameter. To enable PQ with the v4 client, set a quantizer in the collection definition. |
| trainingLimit | integer | 100000 | The maximum number of objects, per shard, used to fit the centroids. Larger values increase the time it takes to fit the centroids. Larger values also require more memory. |
| segments | integer | -- | The number of segments to use. The number of vector dimensions must be evenly divisible by the number of segments. Starting in v1.23, Weaviate uses the number of dimensions to optimize the number of segments. |
| centroids | integer | 256 | The number of centroids to use. Reducing the number of centroids reduces the size of the quantized (PQ compressed) vectors at the price of recall. If you use the kmeans encoder, centroids is set to 256 (one byte) by default. |
| encoder | string | kmeans | Encoder specification. There are two encoders. You can specify the type of encoder as either kmeans (default) or tile. |
| distribution | string | log-normal | Encoder distribution type. Only used with the tile encoder. If you use the tile encoder, you can specify the distribution as log-normal (default) or normal. |
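The divisibility rule for segments is easy to check ahead of time. The sketch below lists the segment counts that divide a given dimensionality evenly and estimates the size of one PQ code. Both helper names are hypothetical, and the size estimate assumes one centroid id per segment stored in whole bytes (one byte for 256 centroids, as noted above) and 4-byte floats for the uncompressed vector.

```python
def valid_segment_counts(dimensions: int) -> list:
    """Segment counts that divide the vector dimensions evenly."""
    return [s for s in range(1, dimensions + 1) if dimensions % s == 0]


def pq_bytes_per_vector(segments: int, centroids: int = 256) -> int:
    """Approximate size of one PQ code, assuming one centroid id per segment."""
    bits_per_segment = max(1, (centroids - 1).bit_length())
    bytes_per_segment = (bits_per_segment + 7) // 8
    return segments * bytes_per_segment


dimensions = 768
candidates = valid_segment_counts(dimensions)
uncompressed = dimensions * 4  # assumes 4-byte float components
compressed = pq_bytes_per_vector(96)

print(candidates)
print(f"{uncompressed} bytes uncompressed, {compressed} bytes with 96 segments")
```

For a 768-dimensional vector, 96 segments is valid while 100 is not, because 768 is not divisible by 100.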

HNSW Configuration tips

To determine reasonable settings for your use case, consider the following questions and compare your answers in the table below:

  1. How many queries do you expect per second?
  2. Do you expect a lot of imports or updates?
  3. How high should the recall be?

| Number of queries | Many imports or updates | Recall level | Configuration suggestions |
| --- | --- | --- | --- |
| not many | no | low | This is the ideal scenario. Keep both the ef and efConstruction settings low. You don't need a big machine and you will still be happy with the results. |
| not many | no | high | Here the tricky thing is that your recall needs to be high. Since you're not expecting a lot of requests or imports, you can increase both the ef and efConstruction settings. Keep increasing them until you are happy with the recall. In this case, you can get pretty close to 100%. |
| not many | yes | low | Here the tricky thing is the high volume of imports and updates. Be sure to keep efConstruction low. Since you don't need a high recall, and you're not expecting a lot of queries, you can adjust the ef setting until you've reached the desired recall. |
| not many | yes | high | The trade-offs are getting harder. You need high recall and you're dealing with a lot of imports or updates. This means you need to keep the efConstruction setting low, but you can significantly increase your ef setting because your queries per second rate is low. |
| many | no | low | Many queries per second means you need a low ef setting. Luckily you don't need high recall so you can significantly increase the efConstruction value. |
| many | no | high | Many queries per second means a low ef setting. Since you need a high recall but you are not expecting a lot of imports or updates, you can increase your efConstruction until you've reached the desired recall. |
| many | yes | low | Many queries per second means you need a low ef setting. A high number of imports and updates also means you need a low efConstruction setting. Luckily your recall does not have to be as close to 100% as possible. You can set efConstruction relatively low to support your input or update throughput, and you can use the ef setting to regulate the query per second speed. |
| many | yes | high | Aha, this means you're a perfectionist or you have a use case that needs the best of all three worlds. Increase your efConstruction value until you hit the time limit of imports and updates. Next, increase your ef setting until you reach your desired balance of queries per second versus recall. |
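If you want to keep these suggestions next to your deployment scripts, the table can be encoded as a lookup. This is only a condensed paraphrase of the rows above: the hint strings are directional, not concrete values, and the names `TUNING_HINTS` and `tuning_hint` are hypothetical.

```python
# Key: (many_queries, many_imports_or_updates, high_recall)
TUNING_HINTS = {
    (False, False, False): {"ef": "low", "efConstruction": "low"},
    (False, False, True): {"ef": "high", "efConstruction": "high"},
    (False, True, False): {"ef": "raise until recall is acceptable", "efConstruction": "low"},
    (False, True, True): {"ef": "high", "efConstruction": "low"},
    (True, False, False): {"ef": "low", "efConstruction": "high"},
    (True, False, True): {"ef": "low", "efConstruction": "raise until recall is acceptable"},
    (True, True, False): {"ef": "low", "efConstruction": "low"},
    (True, True, True): {
        "ef": "raise last, balancing queries per second against recall",
        "efConstruction": "raise until imports hit your time limit",
    },
}


def tuning_hint(many_queries: bool, many_imports: bool, high_recall: bool) -> dict:
    """Look up the directional suggestion for one combination of answers."""
    return TUNING_HINTS[(many_queries, many_imports, high_recall)]


print(tuning_hint(many_queries=True, many_imports=False, high_recall=False))
```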

While many people think they need to maximize all three dimensions, in practice that's usually not the case. We leave it up to you to decide, and you can always ask for help in our forum.

This set of values is a good starting point for many use cases.


Flat indexes

Added in v1.23

Flat indexes are recommended for use cases where the number of objects per index is low, such as in multi-tenancy use cases.

| Parameter | Type | Default | Mutable | Details |
| --- | --- | --- | --- | --- |
| vectorCacheMaxObjects | integer | 1e12 | Yes | Maximum number of objects in the memory cache. By default, this limit is set to one trillion (1e12) objects when a new collection is created. For sizing recommendations, see Vector cache considerations. |
| bq | object | -- | No | Enable and configure binary quantization (BQ) compression. For BQ configuration details, see BQ configuration parameters. |

BQ configuration parameters

Configure bq with these parameters.

| Parameter | Type | Default | Details |
| --- | --- | --- | --- |
| enabled | boolean | false | Enable BQ. Weaviate uses binary quantization (BQ) compression when true. |
| rescoreLimit | integer | -1 | The minimum number of candidates to fetch before rescoring. |
| cache | boolean | false | Whether to use the vector cache. |
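Putting the flat index and BQ parameters together, a collection definition might look like the sketch below. The collection name and the specific values for vectorCacheMaxObjects and rescoreLimit are illustrative assumptions, not recommendations. The multi-tenancy block is included only because flat indexes are suggested for that case.

```python
import json

# Hypothetical multi-tenant collection that uses a flat index with BQ.
flat_collection = {
    "class": "TenantNotes",
    "multiTenancyConfig": {"enabled": True},
    "vectorIndexType": "flat",
    "vectorIndexConfig": {
        "vectorCacheMaxObjects": 100000,  # illustrative value
        "bq": {
            "enabled": True,
            "rescoreLimit": 200,  # illustrative value; the default is -1
            "cache": True,
        },
    },
}

print(json.dumps(flat_collection, indent=2))
```

Because bq is not mutable according to the table above, it makes sense to settle these values before you create the collection.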

Asynchronous indexing


Available starting in v1.22. This is an experimental feature. Please use with caution.

Starting in Weaviate 1.22, you can use asynchronous indexing by opting in.

To enable asynchronous indexing, set the ASYNC_INDEXING environment variable to true in your Weaviate configuration (the docker-compose.yml file if you use Docker Compose). This setting enables asynchronous indexing for all collections.

Example Docker Compose configuration
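version: '3.4'
services:
  weaviate:
    command:
    - --host
    - 0.0.0.0
    - --port
    - '8080'
    - --scheme
    - http
    # Image name and tag are assumptions; use the release you run (v1.22 or later).
    image: semitechnologies/weaviate:1.22.1
    restart: on-failure:0
    ports:
    - 8080:8080
    - 50051:50051
    environment:
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      ENABLE_MODULES: 'text2vec-cohere,text2vec-huggingface,text2vec-openai,text2vec-palm,generative-cohere,generative-openai,generative-palm'
      # Opt in to asynchronous indexing for all collections.
      ASYNC_INDEXING: 'true'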

To get the index status, call the node status endpoint.

Node status example usage

The nodes/shards/vectorQueueLength field shows the number of objects that still have to be indexed.

import weaviate

client = weaviate.connect_to_local()

try:
    nodes_info = client.cluster.nodes(
        collection="JeopardyQuestion",  # If omitted, all collections are returned
        output="verbose",  # If omitted, defaults to "minimal"
    )
    print(nodes_info)
finally:
    client.close()  # Release the connection when done


Then, you can check the status of the vector index queue by inspecting the output.
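Inspecting the output can also be scripted. The sketch below assumes the node status has already been converted to a plain dictionary with nodes, shards, and vectorQueueLength keys. The helper names and the trimmed sample payload are hypothetical, and the Python client may return typed objects rather than dictionaries.

```python
def pending_by_shard(nodes_response: dict) -> dict:
    """Map (node, collection, shard) to the number of objects still queued."""
    pending = {}
    for node in nodes_response.get("nodes", []):
        for shard in node.get("shards") or []:
            key = (node["name"], shard["class"], shard["name"])
            pending[key] = shard.get("vectorQueueLength", 0)
    return pending


def total_pending(nodes_response: dict) -> int:
    """Total number of objects waiting to be indexed across all shards."""
    return sum(pending_by_shard(nodes_response).values())


# Trimmed, hand-written payload for illustration only.
sample_response = {
    "nodes": [
        {
            "name": "weaviate-0",
            "shards": [
                {
                    "class": "TestArticle",
                    "name": "nq1Bg9Q5lxxP",
                    "objectCount": 1000,
                    "vectorQueueLength": 425,
                }
            ],
        }
    ]
}

print(total_pending(sample_response))  # prints 425
```

A total of 0 means every shard has caught up with its queue.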

The vectorQueueLength field will show the number of remaining objects to be indexed. In the example below, the vector index queue has 425 objects remaining to be indexed on the TestArticle shard, out of a total of 1000 objects.

{
  "nodes": [
    {
      "batchStats": {
        "ratePerSecond": 0
      },
      "gitHash": "e6b37ce",
      "name": "weaviate-0",
      "shards": [
        {
          "class": "TestArticle",
          "name": "nq1Bg9Q5lxxP",
          "objectCount": 1000,
          "vectorIndexingStatus": "INDEXING",
          "vectorQueueLength": 425
        }
      ],
      "stats": {
        "objectCount": 1000,
        "shardCount": 1
      },
      "status": "HEALTHY",
      "version": "1.22.1"
    }
  ]
}