Vector indexes
Vector indexes facilitate efficient, vector-first data storage and retrieval.
HNSW indexes
HNSW indexes scale well and are very fast at query time, but the HNSW algorithm is costly when you add data during the index building process.
HNSW index parameters
Some HNSW parameters are mutable, but others cannot be modified after you create your collection.
Parameter | Type | Default | Changeable | Details |
---|---|---|---|---|
cleanupIntervalSeconds | integer | 300 | Yes | Cleanup frequency. This value does not normally need to be adjusted. A higher value means cleanup runs less frequently, but it does more in a single batch. A lower value means cleanup is more frequent, but it may be less efficient on each run. |
distance | string | cosine | No | Distance metric. The metric that measures the distance between two arbitrary vectors. For available distance metrics, see supported distance metrics. |
ef | integer | -1 | Yes | Balance search speed and recall. ef is the size of the dynamic list that HNSW uses during search. Search is more accurate when ef is higher, but it is also slower. ef values greater than 512 show diminishing improvements in recall. Dynamic ef: Weaviate automatically adjusts the ef value and creates a dynamic ef list when ef is set to -1. For more details, see dynamic ef. |
efConstruction | integer | 128 | No | Balance index search speed and build speed. A high efConstruction value means you can lower your ef settings, but importing is slower. efConstruction must be greater than 0. |
maxConnections | integer | 32 | No | Maximum number of connections per element. maxConnections is the connection limit per layer for layers above the zero layer. The zero layer can have (2 * maxConnections) connections. maxConnections must be greater than 0. |
dynamicEfMin | integer | 100 | Yes | New in v1.10.0. Lower bound for dynamic ef. Protects against creating a search list that is too short. This setting is only used when ef is -1. |
dynamicEfMax | integer | 500 | Yes | New in v1.10.0. Upper bound for dynamic ef. Protects against creating a search list that is too long. If the limit is higher than dynamicEfMax, dynamicEfMax does not have any effect. In this case, ef is the limit. This setting is only used when ef is -1. |
dynamicEfFactor | integer | 8 | Yes | Added in v1.10.0. Multiplier for dynamic ef. Sets the potential length of the search list. This setting is only used when ef is -1. |
filterStrategy | string | sweeping | Yes | Added in v1.27.0. The filter strategy to use for filtering the search results. The filter strategy can be set to sweeping or acorn. sweeping: the default filter strategy. acorn: uses Weaviate's ACORN implementation. |
flatSearchCutoff | integer | 40000 | Yes | Optional. Threshold for the flat-search cutoff. To force a vector index search, set "flatSearchCutoff": 0. |
skip | boolean | false | No | When true, do not index the collection. Weaviate decouples vector creation and vector storage. If you skip vector indexing, but a vectorizer is configured (or a vector is provided manually), Weaviate logs a warning on each import. To skip indexing and vector generation, set "vectorizer": "none" when you set "skip": true. See When to skip indexing. |
vectorCacheMaxObjects | integer | 1e12 | Yes | Maximum number of objects in the memory cache. By default, this limit is set to one trillion (1e12) objects when a new collection is created. For sizing recommendations, see Vector cache considerations. |
pq | object | -- | Yes | Enable and configure product quantization (PQ) compression. PQ assumes some data has already been loaded. You should have 10,000 to 100,000 vectors per shard loaded before you enable PQ. For PQ configuration details, see PQ configuration parameters. |
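The way ef, dynamicEfMin, dynamicEfMax, and dynamicEfFactor interact can be sketched in a few lines of Python. This is an illustration of the behavior described in the table, not Weaviate's implementation. The function name effective_ef is hypothetical, and the assumption that the query limit is multiplied by the factor before the bounds are applied is ours.

```python
def effective_ef(limit, ef=-1, dynamic_ef_min=100, dynamic_ef_max=500, dynamic_ef_factor=8):
    """Return the search list size for a query that asks for `limit` results."""
    if ef != -1:
        # An explicit ef value turns the dynamic behavior off.
        return ef
    # Assumption: the candidate list size scales with the query limit.
    candidate = limit * dynamic_ef_factor
    # Apply the lower and upper bounds.
    candidate = max(dynamic_ef_min, min(candidate, dynamic_ef_max))
    # The upper bound has no effect when the limit itself is larger.
    return max(candidate, limit)


for limit in (10, 20, 100, 1000):
    print(limit, effective_ef(limit))
```

With the default values, small limits are raised to dynamicEfMin, mid-sized limits scale with dynamicEfFactor, and very large limits bypass dynamicEfMax.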
Database parameters for HNSW
Note that some database-level parameters are available to configure HNSW indexing behavior.
PERSISTENCE_HNSW_MAX_LOG_SIZE is a database-level parameter that sets the maximum size of the HNSW write-ahead log. The default value is 500MiB.
Increase this value to improve the efficiency of the compaction process, but be aware that this increases the memory usage of the database. Conversely, decreasing this value reduces memory usage but may slow down the compaction process.
Preferably, set PERSISTENCE_HNSW_MAX_LOG_SIZE to a value close to the size of the HNSW graph.
Tombstone cleanup parameters
TOMBSTONE_DELETION_CONCURRENCY is available in v1.24.0 and up. TOMBSTONE_DELETION_MIN_PER_CYCLE and TOMBSTONE_DELETION_MAX_PER_CYCLE are available in v1.24.15 / v1.25.2 and up.
Tombstones are records that mark deleted objects. In an HNSW index, tombstones are cleaned up periodically, at the interval set by the cleanupIntervalSeconds parameter.
As the index grows in size, the cleanup process may take longer to complete and require more resources. For very large indexes, this may cause performance issues.
To control the number of tombstones deleted per cleanup cycle and prevent performance issues, set the TOMBSTONE_DELETION_MAX_PER_CYCLE and TOMBSTONE_DELETION_MIN_PER_CYCLE environment variables.
- Set TOMBSTONE_DELETION_MIN_PER_CYCLE to prevent occurrences of unnecessary cleanup cycles.
- Set TOMBSTONE_DELETION_MAX_PER_CYCLE to prevent the cleanup process from taking too long and consuming too many resources.
As an example, for a cluster with 300 million objects per shard, a TOMBSTONE_DELETION_MIN_PER_CYCLE value of 1000000 (1 million) and a TOMBSTONE_DELETION_MAX_PER_CYCLE value of 10000000 (10 million) may be good starting points.
You can also set the TOMBSTONE_DELETION_CONCURRENCY environment variable to limit the number of threads used for tombstone cleanup. This can help prevent the cleanup process from consuming too many resources or from taking too long.
The default value for TOMBSTONE_DELETION_CONCURRENCY is half the number of CPU cores available to Weaviate.
In a cluster with a large number of cores, you may want to set TOMBSTONE_DELETION_CONCURRENCY to a lower value to prevent the cleanup process from consuming too many resources. Conversely, in a cluster with a small number of cores and a large number of deletions, you may want to set TOMBSTONE_DELETION_CONCURRENCY to a higher value to speed up the cleanup process.
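As a rough sketch, the three tombstone settings can be assembled and sanity-checked before they go into a deployment configuration. The helper name tombstone_env is hypothetical, and the check that the minimum does not exceed the maximum is our assumption rather than a documented rule. The fallback for concurrency mirrors the documented default using the core count of the machine that runs the script, which may differ from the cores available to Weaviate.

```python
import os


def tombstone_env(min_per_cycle, max_per_cycle, concurrency=None):
    """Build environment variable settings for tombstone cleanup."""
    if min_per_cycle > max_per_cycle:
        # Assumption: a minimum above the maximum is a configuration mistake.
        raise ValueError("min_per_cycle must not exceed max_per_cycle")
    if concurrency is None:
        # Mirror the documented default: half of the available CPU cores.
        concurrency = max(1, (os.cpu_count() or 2) // 2)
    return {
        "TOMBSTONE_DELETION_MIN_PER_CYCLE": str(min_per_cycle),
        "TOMBSTONE_DELETION_MAX_PER_CYCLE": str(max_per_cycle),
        "TOMBSTONE_DELETION_CONCURRENCY": str(concurrency),
    }


# Starting points from the 300-million-objects-per-shard example above.
env = tombstone_env(1_000_000, 10_000_000, concurrency=4)
for name, value in env.items():
    print(f"{name}: '{value}'")
```

The printed lines can be pasted into the environment section of a Docker Compose file.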
PQ configuration parameters
Configure pq with these parameters.
Parameter | Type | Default | Details |
---|---|---|---|
enabled | boolean | false | Enable PQ when true. The Python client v4 does not use the enabled parameter. To enable PQ with the v4 client, set a quantizer in the collection definition. |
trainingLimit | integer | 100000 | The maximum number of objects, per shard, used to fit the centroids. Larger values increase the time it takes to fit the centroids. Larger values also require more memory. |
segments | integer | -- | The number of segments to use. The number of vector dimensions must be evenly divisible by the number of segments. Starting in v1.23, Weaviate uses the number of dimensions to optimize the number of segments. |
centroids | integer | 256 | The number of centroids to use (max: 256). We generally recommend you do not change this value. Due to the data structure used, a smaller centroids value does not result in smaller vectors, but may result in faster compression at the cost of recall. |
encoder | string | kmeans | Encoder specification. There are two encoders. You can specify the type of encoder as either kmeans (default) or tile. |
distribution | string | log-normal | Encoder distribution type. Only used with the tile encoder. If you use the tile encoder, you can specify the distribution as log-normal (default) or normal. |
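The constraints on segments and centroids can be checked up front. The sketch below uses hypothetical helper names (check_pq_settings, compression_ratio). The size estimate assumes uncompressed vectors are stored as 4-byte floats and each segment is stored in one byte, which is an assumption for illustration and not a statement about Weaviate's storage format.

```python
def check_pq_settings(dimensions, segments, centroids=256):
    """Return a list of problems with the given PQ settings (empty if none)."""
    problems = []
    if segments <= 0 or dimensions % segments != 0:
        problems.append("dimensions must be evenly divisible by segments")
    if not 1 <= centroids <= 256:
        problems.append("centroids must be between 1 and 256")
    return problems


def compression_ratio(dimensions, segments):
    """Raw size divided by compressed size, under the stated assumptions."""
    raw_bytes = dimensions * 4  # assumption: float32 input vectors
    compressed_bytes = segments  # assumption: one byte per segment
    return raw_bytes / compressed_bytes


print(check_pq_settings(768, 128))
print(check_pq_settings(768, 100))
print(compression_ratio(768, 128))
```

Under these assumptions the compressed size depends only on the number of segments, which is consistent with the note that a smaller centroids value does not produce smaller vectors.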
HNSW Configuration tips
To determine reasonable settings for your use case, consider the following questions and compare your answers in the table below:
- How many queries do you expect per second?
- Do you expect a lot of imports or updates?
- How high should the recall be?
Number of queries | Many imports or updates | Recall level | Configuration suggestions |
---|---|---|---|
not many | no | low | This is the ideal scenario. Keep both the ef and efConstruction settings low. You don't need a big machine and you will still be happy with the results. |
not many | no | high | Here the tricky thing is that your recall needs to be high. Since you're not expecting a lot of requests or imports, you can increase both the ef and efConstruction settings. Keep increasing them until you are happy with the recall. In this case, you can get pretty close to 100%. |
not many | yes | low | Here the tricky thing is the high volume of imports and updates. Be sure to keep efConstruction low. Since you don't need a high recall, and you're not expecting a lot of queries, you can adjust the ef setting until you've reached the desired recall. |
not many | yes | high | The trade-offs are getting harder. You need high recall and you're dealing with a lot of imports or updates. This means you need to keep the efConstruction setting low, but you can significantly increase your ef setting because your queries per second rate is low. |
many | no | low | Many queries per second means you need a low ef setting. Luckily you don't need high recall so you can significantly increase the efConstruction value. |
many | no | high | Many queries per second means a low ef setting. Since you need a high recall but you are not expecting a lot of imports or updates, you can increase your efConstruction until you've reached the desired recall. |
many | yes | low | Many queries per second means you need a low ef setting. A high number of imports and updates also means you need a low efConstruction setting. Luckily your recall does not have to be as close to 100% as possible. You can set efConstruction relatively low to support your import or update throughput, and you can use the ef setting to regulate the queries per second. |
many | yes | high | Aha, this means you're a perfectionist or you have a use case that needs the best of all three worlds. Increase your efConstruction value until you hit the time limit of imports and updates. Next, increase your ef setting until you reach your desired balance of queries per second versus recall. While many people think they need to maximize all three dimensions, in practice that's usually not the case. We leave it up to you to decide, and you can always ask for help in our forum. |
This set of values is a good starting point for many use cases.
Parameter | Value |
---|---|
ef | 64 |
efConstruction | 128 |
maxConnections | 32 |
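Because only some HNSW parameters can be changed after the collection is created, it can help to check a planned update against the "Changeable" column before sending it. The sketch below starts from the values in the table above. The helper rejected_changes is hypothetical and only covers the parameters marked "No" in the HNSW parameter table.

```python
import json

# Parameters marked "No" in the Changeable column of the HNSW parameter table.
IMMUTABLE = {"distance", "efConstruction", "maxConnections", "skip"}

# Starting values from the table above.
vector_index_config = {
    "ef": 64,
    "efConstruction": 128,
    "maxConnections": 32,
}


def rejected_changes(current, proposed):
    """Return the immutable parameters that `proposed` tries to change."""
    return sorted(
        key
        for key, value in proposed.items()
        if key in IMMUTABLE and current.get(key) != value
    )


print(json.dumps({"vectorIndexConfig": vector_index_config}, indent=2))
print(rejected_changes(vector_index_config, {"ef": 128, "maxConnections": 64}))
```

In this example the change to ef is acceptable, while the change to maxConnections is reported because it would require a new collection.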
Flat indexes
Added in v1.23.
Flat indexes are recommended for use cases where the number of objects per index is low, such as in multi-tenancy use cases.
Parameter | Type | Default | Changeable | Details |
---|---|---|---|---|
vectorCacheMaxObjects | integer | 1e12 | Yes | Maximum number of objects in the memory cache. By default, this limit is set to one trillion (1e12) objects when a new collection is created. For sizing recommendations, see Vector cache considerations. |
bq | object | -- | No | Enable and configure binary quantization (BQ) compression. For BQ configuration details, see BQ configuration parameters. |
BQ configuration parameters
Configure bq with these parameters.
Parameter | Type | Default | Details |
---|---|---|---|
enabled | boolean | false | Enable BQ. Weaviate uses binary quantization (BQ) compression when true. |
rescoreLimit | integer | -1 | The minimum number of candidates to fetch before rescoring. |
cache | boolean | false | Whether to use the vector cache. |
Dynamic indexes
Available starting in v1.25. Dynamic indexing is an experimental feature. Use with caution.
ASYNC_INDEXING
Dynamic indexes require asynchronous indexing. To enable asynchronous indexing in a self-hosted Weaviate instance, set the ASYNC_INDEXING environment variable to true. If your instance is hosted in Weaviate Cloud, use the Weaviate Cloud console to enable asynchronous indexing.
Using the dynamic index initially creates a flat index. Once the number of objects exceeds a threshold (10,000 objects by default), Weaviate automatically converts it to an HNSW index.
This is a one-way switch from a flat index to an HNSW index. The index does not change back to a flat index, even if deletions bring the object count below the threshold.
The goal of dynamic indexing is to shorten latencies during query time at the cost of a larger memory footprint.
Dynamic index parameters
Parameter | Type | Default | Details |
---|---|---|---|
distance | string | cosine | Distance metric. The metric that measures the distance between two arbitrary vectors. |
hnsw | object | default HNSW | HNSW index configuration to be used. |
flat | object | default Flat | Flat index configuration to be used. |
threshold | integer | 10000 | Threshold object count at which the flat-to-HNSW conversion happens. |
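The one-way nature of the switch can be pictured with a toy model. The class below is purely illustrative and is not how Weaviate implements dynamic indexes. The name DynamicIndexModel is hypothetical, and the model ignores that the real conversion runs through asynchronous indexing.

```python
class DynamicIndexModel:
    """Toy model of the flat-to-HNSW switch; not Weaviate's implementation."""

    def __init__(self, threshold=10_000):
        self.threshold = threshold
        self.object_count = 0
        self.index_type = "flat"  # a dynamic index starts out flat

    def add(self, n):
        self.object_count += n
        # Convert once the object count exceeds the threshold.
        if self.object_count > self.threshold:
            self.index_type = "hnsw"

    def delete(self, n):
        self.object_count = max(0, self.object_count - n)
        # Deliberately no switch back to flat.


index = DynamicIndexModel()
index.add(10_000)
print(index.object_count, index.index_type)
index.add(1)
print(index.object_count, index.index_type)
index.delete(5_000)
print(index.object_count, index.index_type)
```

The index stays flat at exactly 10,000 objects, converts when the count passes the threshold, and remains an HNSW index after deletions bring the count back down.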
Index configuration parameters
Available starting in v1.25. Dynamic indexing is an experimental feature. Use with caution.
Use these parameters to configure the index type and its properties. They can be set in the collection configuration.
Parameter | Type | Default | Details |
---|---|---|---|
vectorIndexType | string | hnsw | Optional. The index type. Can be hnsw, flat, or dynamic. |
vectorIndexConfig | object | - | Optional. Set parameters that are specific to the vector index type. |
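As a sketch, the two parameters fit into a collection definition as shown below. The helper collection_definition and the collection names are hypothetical, and the "class" key for the collection name is an assumption about the JSON layout. The nested keys come from the flat, BQ, and dynamic index tables above.

```python
import json

VALID_INDEX_TYPES = {"hnsw", "flat", "dynamic"}


def collection_definition(name, index_type="hnsw", index_config=None):
    """Build a collection definition with an explicit vector index type."""
    if index_type not in VALID_INDEX_TYPES:
        raise ValueError(f"unknown vectorIndexType: {index_type!r}")
    definition = {"class": name, "vectorIndexType": index_type}
    if index_config:
        # Only include vectorIndexConfig when there is something to set.
        definition["vectorIndexConfig"] = index_config
    return definition


# A flat index with binary quantization, e.g. for small per-tenant indexes.
flat_with_bq = collection_definition(
    "TenantNotes",  # hypothetical collection name
    index_type="flat",
    index_config={"bq": {"enabled": True, "rescoreLimit": -1, "cache": False}},
)

# A dynamic index that converts to HNSW past the default threshold.
dynamic = collection_definition(
    "Article",  # hypothetical collection name
    index_type="dynamic",
    index_config={"threshold": 10000, "distance": "cosine"},
)

print(json.dumps(flat_with_bq, indent=2))
print(json.dumps(dynamic, indent=2))
```

Omitting index_type falls back to hnsw, which matches the default in the table.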
How to select the index type
Generally, the hnsw index type is recommended for most use cases. The flat index type is recommended for use cases where the number of objects per index is low, such as in multi-tenancy cases. You can also opt for the dynamic index, which initially configures a flat index and automatically converts it to an hnsw index once the object count exceeds a specified threshold.
See this section for more information about the different index types and how to choose between them.
If faster import speeds are desired, asynchronous indexing allows de-coupling of indexing from object creation.
Asynchronous indexing
Available starting in v1.22. This is an experimental feature. Use with caution.
Starting in Weaviate 1.22, you can use asynchronous indexing by opting in.
To enable asynchronous indexing, set the ASYNC_INDEXING environment variable to true in your Weaviate configuration (the docker-compose.yml file if you use Docker Compose). This setting enables asynchronous indexing for all collections.
Example Docker Compose configuration
---
services:
  weaviate:
    command:
    - --host
    - 0.0.0.0
    - --port
    - '8080'
    - --scheme
    - http
    image: cr.weaviate.io/semitechnologies/weaviate:1.28.2
    restart: on-failure:0
    ports:
    - 8080:8080
    - 50051:50051
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      QUERY_MAXIMUM_RESULTS: 10000
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      ENABLE_API_BASED_MODULES: 'true'
      CLUSTER_HOSTNAME: 'node1'
      AUTOSCHEMA_ENABLED: 'false'
      ASYNC_INDEXING: 'true'
...
To get the index status, check the node status endpoint.
Node status example usage
The nodes/shards/vectorQueueLength field shows the number of objects that still have to be indexed.
- Python Client v4
- Python Client v3
- JS/TS Client (v3)
- JS/TS Client (v2)
- Go
- Java
- Curl
# Python Client v4
import weaviate

client = weaviate.connect_to_local()

try:
    nodes_info = client.cluster.nodes(
        collection="JeopardyQuestion",  # If omitted, all collections will be returned
        output="verbose"  # If omitted, will be "minimal"
    )
    print(nodes_info)

finally:
    # Always release the connection, even if the request fails
    client.close()
# Python Client v3
import weaviate

client = weaviate.Client("http://localhost:8080")

nodes_status = client.cluster.get_nodes_status()
print(nodes_status)
// JS/TS Client (v3)
import weaviate from 'weaviate-client';

const client = await weaviate.connectToLocal()

const response = await client.cluster.nodes({
  collection: 'JeopardyQuestion',
  output: 'minimal'
})

console.log(response)
// JS/TS Client (v2)
import weaviate from 'weaviate-ts-client';

const client = weaviate.client({
  scheme: 'http',
  host: 'localhost:8080',
});

const response = await client.cluster
  .nodesStatusGetter()
  .do();
console.log(response);
// Go
package main

import (
	"context"
	"fmt"

	"github.com/weaviate/weaviate-go-client/v4/weaviate"
)

func main() {
	cfg := weaviate.Config{
		Host:   "localhost:8080",
		Scheme: "http",
	}
	client, err := weaviate.NewClient(cfg)
	if err != nil {
		panic(err)
	}

	// Fetch the status of all nodes in the cluster
	nodesStatus, err := client.Cluster().
		NodesStatusGetter().
		Do(context.Background())
	if err != nil {
		panic(err)
	}
	fmt.Printf("%v", nodesStatus)
}
// Java
package io.weaviate;

import io.weaviate.client.Config;
import io.weaviate.client.WeaviateClient;
import io.weaviate.client.base.Result;
import io.weaviate.client.v1.cluster.model.NodesStatusResponse;

public class App {
  public static void main(String[] args) {
    Config config = new Config("http", "localhost:8080");
    WeaviateClient client = new WeaviateClient(config);

    // Fetch the status of all nodes in the cluster
    Result<NodesStatusResponse> result = client.cluster()
      .nodesStatusGetter()
      .run();

    if (result.hasErrors()) {
      System.out.println(result.getError());
      return;
    }
    System.out.println(result.getResult());
  }
}
curl http://localhost:8080/v1/nodes
Then, you can check the status of the vector index queue by inspecting the output.
The vectorQueueLength field shows the number of remaining objects to be indexed. In the example below, the vector index queue has 425 objects remaining to be indexed on the TestArticle shard, out of a total of 1000 objects.
{
  "nodes": [
    {
      "batchStats": {
        "ratePerSecond": 0
      },
      "gitHash": "e6b37ce",
      "name": "weaviate-0",
      "shards": [
        {
          "class": "TestArticle",
          "name": "nq1Bg9Q5lxxP",
          "objectCount": 1000,
          "vectorIndexingStatus": "INDEXING",
          "vectorQueueLength": 425
        }
      ],
      "stats": {
        "objectCount": 1000,
        "shardCount": 1
      },
      "status": "HEALTHY",
      "version": "1.22.1"
    }
  ]
}
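To turn this output into a progress figure, the queue length can be compared with the object count of each shard. The sketch below parses a trimmed copy of the response shown above. The function name indexing_progress is hypothetical, and it assumes vectorQueueLength never exceeds objectCount.

```python
import json

# Trimmed copy of the node status response shown above.
response_text = """
{
  "nodes": [
    {
      "name": "weaviate-0",
      "shards": [
        {
          "class": "TestArticle",
          "name": "nq1Bg9Q5lxxP",
          "objectCount": 1000,
          "vectorIndexingStatus": "INDEXING",
          "vectorQueueLength": 425
        }
      ]
    }
  ]
}
"""


def indexing_progress(nodes_response):
    """Yield (collection, shard, fraction indexed) for every shard."""
    for node in nodes_response["nodes"]:
        for shard in node.get("shards") or []:
            total = shard["objectCount"]
            queued = shard["vectorQueueLength"]
            # An empty shard counts as fully indexed.
            done = 1.0 if total == 0 else (total - queued) / total
            yield shard["class"], shard["name"], done


progress = list(indexing_progress(json.loads(response_text)))
for collection, shard, done in progress:
    print(f"{collection}/{shard}: {done:.1%} indexed")
```

For the example response, 575 of the 1000 objects are already indexed.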
Multiple vectors (named vectors)
Weaviate collections support multiple named vectors.
The vectors in a collection can have their own configurations. Each vector space can set its own index, its own compression algorithm, and its own vectorizer. This means you can use different vectorization models, and apply different distance metrics, to the same object.
To work with named vectors, adjust your queries to specify a target vector for vector search or hybrid search queries.
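The sketch below shows what a collection definition with two named vectors might look like, and a small check that a query's target vector exists before it is sent. The vectorConfig layout, the "none" vectorizer entry, and the names ArticleNV, title, and body are assumptions for illustration. Consult the collection definition reference for the authoritative format. The helper resolve_target_vector is hypothetical.

```python
import json

# Assumed layout: each named vector carries its own vectorizer and index settings.
collection = {
    "class": "ArticleNV",  # hypothetical collection name
    "vectorConfig": {
        "title": {
            "vectorizer": {"none": {}},  # assumption: vectors are supplied manually
            "vectorIndexType": "hnsw",
            "vectorIndexConfig": {"distance": "cosine"},
        },
        "body": {
            "vectorizer": {"none": {}},
            "vectorIndexType": "flat",
            "vectorIndexConfig": {"distance": "dot"},
        },
    },
}


def resolve_target_vector(definition, target):
    """Return the settings for `target`, or raise if no such named vector exists."""
    named_vectors = definition.get("vectorConfig", {})
    if target not in named_vectors:
        available = ", ".join(sorted(named_vectors))
        raise KeyError(f"unknown target vector {target!r}; available: {available}")
    return named_vectors[target]


print(json.dumps(resolve_target_vector(collection, "title"), indent=2))
```

Each named vector here uses a different index type and distance metric, which reflects the point that vector spaces in one collection are configured independently.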
Questions and feedback
If you have any questions or feedback, let us know in the user forum.