Weaviate ANN benchmarks



ANN benchmarks for the Weaviate vector search engine


About this benchmark

This benchmark is designed to measure and illustrate Weaviate’s ANN performance for a range of real-life use cases.

💡 This is not a comparative benchmark that runs Weaviate against competing solutions.

To make the most of this benchmark, you can look at it from different perspectives:

  • The overall performance – Review the benchmark result section below to draw conclusions about what to expect from Weaviate in a production setting.
  • Expectation for your use case – Find the dataset closest to your production use case, and estimate Weaviate’s expected performance for your use case.
  • Fine Tuning – If you don’t get the results you expect, find the optimal combination of the config parameters (efConstruction, maxConnections, and ef) to achieve the best results for your production configuration.

What is being measured?

For each benchmark test, we varied combinations of the following parameters (a configuration sketch follows the list):

  • efConstruction - The HNSW build parameter that controls the quality of the search at build time.
  • maxConnections - The HNSW build parameter that controls how many outgoing edges a node can have in the HNSW graph.
  • ef - The HNSW query time parameter that controls the quality of the search.
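
These parameters are set per class in the vector index configuration. The snippet below is a minimal, illustrative sketch using the Weaviate Python client (v3 syntax); the class name and the exact values are placeholders, not taken from the benchmark scripts.

```python
import weaviate

client = weaviate.Client("http://localhost:8080")

# Placeholder class for illustration. efConstruction and maxConnections are
# fixed at build time; ef can still be changed on an existing index.
benchmark_class = {
    "class": "Benchmark",
    "vectorizer": "none",  # vectors are supplied at import time
    "vectorIndexType": "hnsw",
    "vectorIndexConfig": {
        "efConstruction": 128,
        "maxConnections": 32,
        "ef": 64,
    },
}

client.schema.create_class(benchmark_class)
```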

For each set of parameters, we ran 10,000 requests and measured the following (see the sketch after this list for how these metrics can be computed):

  • The Recall@1, Recall@10, Recall@100 - by comparing Weaviate’s results to the ground truths specified in each dataset
  • Multi-threaded Queries per Second (QPS) - The overall throughput you can achieve with each configuration
  • Individual Request Latency (mean) - The mean latency over all 10,000 requests
  • P99 Latency - 99% of all requests (9,900 out of 10,000) have a latency that is lower than or equal to this number
  • Import time - Since varying build parameters has an effect on import time, the import time is also included
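
For illustration only (the linked benchmark scripts are the source of truth), the metrics above can be derived from per-request results roughly like this:

```python
import numpy as np

def recall_at_k(result_ids, truth_ids, k=10):
    # Fraction of the true k nearest neighbors that appear in the returned
    # top k, averaged over all queries.
    hits = sum(len(set(res[:k]) & set(truth[:k]))
               for res, truth in zip(result_ids, truth_ids))
    return hits / (len(truth_ids) * k)

def summarize(latencies_ms, wall_clock_s):
    # latencies_ms: one end-to-end latency per request (10,000 per test run).
    # wall_clock_s: total wall-clock time of the multi-threaded run, used for QPS.
    lat = np.asarray(latencies_ms)
    return {
        "mean_ms": float(lat.mean()),
        "p99_ms": float(np.percentile(lat, 99)),
        "qps": len(lat) / wall_clock_s,
    }
```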

By request, we mean: An unfiltered vector search across the entire dataset for the given test. All latency and throughput results represent the end-to-end time that your users would also experience. In particular, this means:

  • Each request time includes the network overhead for sending the results over the wire. In the test setup, the client and server machines were located in the same VPC.
  • Each request includes retrieving all the matched objects from disk. This is a significant difference from ann-benchmarks, where the embedded libraries only return the matched IDs.
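
A single request of this kind, issued with the Python client, might look like the following sketch. The class and property names are placeholders, and the zero vector stands in for one of the dataset's real test queries.

```python
import weaviate

client = weaviate.Client("http://localhost:8080")

query_vector = [0.0] * 128  # stand-in for a real 128d test query (e.g. SIFT1M)

result = (
    client.query
    .get("Benchmark", ["counter"])               # placeholder class / property
    .with_near_vector({"vector": query_vector})  # unfiltered vector search
    .with_limit(10)                              # corresponds to the "Limit 10" results
    .do()
)
```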

Benchmark Setup

Scripts

This benchmark is produced using open-source scripts, so you can reproduce it yourself.

Hardware

Setup with Weaviate and benchmark machine

For the purpose of this benchmark, we used two GCP instances within the same VPC:

  • Benchmark – a c2-standard-30 instance with 30 vCPU cores and 120 GB memory – to host Weaviate.
  • Script – a smaller instance with 8 vCPUs – to run the benchmarking scripts.

💡 The c2-standard-30 was chosen for benchmarking for two reasons:

  • It is large enough to show that Weaviate is a highly-concurrent vector search engine and scales well while running thousands of searches across multiple threads.
  • It is small enough to represent a typical production case without inducing high costs.

Based on your throughput requirements, it is very likely that you will run Weaviate on a considerably smaller or larger machine in production.

In the sections below, we have outlined what to expect when altering the configuration or setup parameters.

Experiment Setup

The selection of datasets is modeled after ann-benchmarks. The same test queries are used to test speed, throughput, and recall. The provided ground truths are used to calculate the recall.

The imports were performed using Weaviate’s Python client. The concurrent (multi-threaded) queries were measured using Go. Each language may have slightly different performance, and you may experience different results if you send your queries using another language. For maximum throughput, we recommend using the Go or Java clients.
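
As a rough illustration of the import side (the actual scripts are linked below), a batched import with the Python client looks roughly like the sketch below; the class name, property, and vectors are placeholders.

```python
import weaviate

client = weaviate.Client("http://localhost:8080")
client.batch.configure(batch_size=256, dynamic=True)  # client-side batching

vectors = [[0.0] * 128 for _ in range(1000)]  # placeholder for the real dataset

with client.batch as batch:
    for i, vector in enumerate(vectors):
        batch.add_data_object(
            data_object={"counter": i},   # placeholder property
            class_name="Benchmark",       # placeholder class
            vector=vector,
        )
```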

The complete import and test scripts are available here.

Results

For each dataset, there is a highlighted configuration. The highlighted configuration is an opinionated pick that represents a good recall/latency/throughput trade-off, and it gives you a good overview of Weaviate’s performance with the respective dataset. Below the highlighted configuration, you can find alternative configurations, grouped by query limit (limit 1, limit 10, and limit 100).

SIFT1M (1M 128d vectors, L2 distance)

Highlighted Configuration

  • Dataset Size: 1.0M
  • Dimensions: 128
  • Distance Metric: l2
  • efConstruction: 128
  • maxConnections: 32
  • ef: 64
  • Recall@10: 98.83%
  • QPS (Limit 10): 8905
  • Mean Latency (Limit 10): 3.31ms
  • p99 Latency (Limit 10): 4.49ms

All Results

QPS vs Recall

Limit 1

| efConstruction | maxConnections | ef | Recall | QPS | QPS/vCore | Mean Latency | p99 Latency | Import time |
|---|---|---|---|---|---|---|---|---|
| 64 | 8 | 64 | 90.91% | 11445 | 381 | 2.59ms | 3.44ms | 186s |
| 512 | 8 | 64 | 95.74% | 11391 | 380 | 2.6ms | 3.4ms | 286s |
| 128 | 16 | 64 | 98.52% | 10443 | 348 | 2.83ms | 3.77ms | 204s |
| 512 | 16 | 64 | 98.69% | 10287 | 343 | 2.87ms | 3.94ms | 314s |
| 128 | 32 | 64 | 98.92% | 9760 | 325 | 3.03ms | 4.15ms | 203s |
| 256 | 32 | 64 | 99.0% | 9462 | 315 | 3.13ms | 4.36ms | 243s |
| 512 | 32 | 64 | 99.22% | 9249 | 308 | 3.2ms | 4.68ms | 351s |
| 512 | 32 | 128 | 99.29% | 7155 | 238 | 4.14ms | 5.84ms | 351s |
| 128 | 32 | 256 | 99.34% | 5694 | 190 | 5.21ms | 6.94ms | 203s |
| 256 | 32 | 512 | 99.37% | 3578 | 119 | 8.27ms | 11.2ms | 243s |

Limit 10

| efConstruction | maxConnections | ef | Recall | QPS | QPS/vCore | Mean Latency | p99 Latency | Import time |
|---|---|---|---|---|---|---|---|---|
| 128 | 8 | 64 | 93.41% | 10237 | 341 | 2.88ms | 4.24ms | 183s |
| 512 | 8 | 64 | 94.03% | 10179 | 339 | 2.89ms | 3.91ms | 286s |
| 128 | 16 | 64 | 98.01% | 9441 | 315 | 3.11ms | 3.98ms | 204s |
| 512 | 16 | 64 | 98.44% | 9361 | 312 | 3.14ms | 3.87ms | 314s |
| 128 | 32 | 64 | 98.83% | 8905 | 297 | 3.31ms | 4.49ms | 203s |
| 128 | 64 | 64 | 98.95% | 8748 | 292 | 3.37ms | 4.3ms | 200s |
| 256 | 32 | 64 | 99.31% | 8633 | 288 | 3.41ms | 4.57ms | 243s |
| 512 | 32 | 64 | 99.48% | 8443 | 281 | 3.49ms | 4.77ms | 351s |
| 512 | 64 | 64 | 99.63% | 8129 | 271 | 3.63ms | 4.66ms | 363s |
| 512 | 32 | 128 | 99.7% | 6711 | 224 | 4.4ms | 5.83ms | 351s |
| 512 | 64 | 128 | 99.77% | 6365 | 212 | 4.63ms | 6.05ms | 363s |
| 512 | 16 | 256 | 99.77% | 5847 | 195 | 5.06ms | 6.58ms | 314s |
| 128 | 32 | 256 | 99.8% | 5379 | 179 | 5.5ms | 7.23ms | 203s |
| 128 | 64 | 256 | 99.82% | 5232 | 174 | 5.67ms | 7.57ms | 200s |
| 256 | 32 | 256 | 99.89% | 5067 | 169 | 5.82ms | 7.52ms | 243s |
| 512 | 32 | 256 | 99.91% | 4866 | 162 | 6.05ms | 7.97ms | 351s |
| 512 | 64 | 256 | 99.92% | 4207 | 140 | 6.83ms | 14.57ms | 363s |
| 512 | 32 | 512 | 99.93% | 3303 | 110 | 8.97ms | 12.08ms | 351s |

Limit 100

| efConstruction | maxConnections | ef | Recall | QPS | QPS/vCore | Mean Latency | p99 Latency | Import time |
|---|---|---|---|---|---|---|---|---|
| 256 | 16 | 64 | 92.47% | 4784 | 159 | 6.03ms | 7.93ms | 227s |
| 128 | 16 | 128 | 94.37% | 4309 | 144 | 6.71ms | 16.56ms | 204s |
| 512 | 16 | 128 | 95.13% | 4266 | 142 | 6.76ms | 16.02ms | 314s |
| 256 | 32 | 128 | 97.63% | 4239 | 141 | 6.83ms | 9.07ms | 243s |
| 512 | 32 | 128 | 98.1% | 4177 | 139 | 6.94ms | 8.88ms | 351s |
| 512 | 64 | 128 | 98.68% | 4052 | 135 | 7.18ms | 9.27ms | 363s |
| 512 | 16 | 256 | 98.76% | 3580 | 119 | 8.08ms | 18.79ms | 314s |
| 128 | 32 | 256 | 99.12% | 3415 | 114 | 8.5ms | 19.64ms | 203s |
| 512 | 32 | 256 | 99.68% | 3389 | 113 | 8.61ms | 11.05ms | 351s |
| 512 | 64 | 256 | 99.8% | 3189 | 106 | 9.14ms | 12.3ms | 363s |
| 128 | 32 | 512 | 99.82% | 2641 | 88 | 11.11ms | 21.75ms | 203s |
| 128 | 64 | 512 | 99.84% | 2559 | 85 | 11.45ms | 23.31ms | 200s |
| 256 | 32 | 512 | 99.92% | 2501 | 83 | 11.67ms | 23.59ms | 243s |
| 512 | 32 | 512 | 99.95% | 2411 | 80 | 12.14ms | 25.73ms | 351s |
| 512 | 64 | 512 | 99.97% | 2265 | 76 | 12.94ms | 25.88ms | 363s |

Glove-25 (1.2M 25d vectors, cosine distance)

Highlighted Configuration

  • Dataset Size: 1.18M
  • Dimensions: 25
  • Distance Metric: cosine
  • efConstruction: 64
  • maxConnections: 16
  • ef: 64
  • Recall@10: 96.56%
  • QPS (Limit 10): 15003
  • Mean Latency (Limit 10): 1.93ms
  • p99 Latency (Limit 10): 2.94ms

All Results

QPS vs Recall

Limit 1

| efConstruction | maxConnections | ef | Recall | QPS | QPS/vCore | Mean Latency | p99 Latency | Import time |
|---|---|---|---|---|---|---|---|---|
| 128 | 8 | 64 | 94.47% | 19752 | 658 | 1.48ms | 2.49ms | 178s |
| 512 | 8 | 64 | 95.3% | 19704 | 657 | 1.48ms | 2.61ms | 272s |
| 512 | 16 | 64 | 99.26% | 17583 | 586 | 1.65ms | 2.82ms | 308s |
| 256 | 16 | 64 | 99.27% | 17177 | 573 | 1.7ms | 2.67ms | 232s |
| 128 | 32 | 64 | 99.72% | 15443 | 515 | 1.9ms | 2.83ms | 184s |
| 256 | 32 | 64 | 99.84% | 15187 | 506 | 1.93ms | 2.82ms | 261s |
| 512 | 32 | 64 | 99.89% | 14401 | 480 | 2.04ms | 2.93ms | 354s |
| 256 | 64 | 64 | 99.9% | 13490 | 450 | 2.17ms | 3.17ms | 276s |
| 512 | 64 | 64 | 99.96% | 12626 | 421 | 2.32ms | 3.37ms | 388s |
| 512 | 64 | 128 | 99.98% | 8665 | 289 | 3.38ms | 4.82ms | 388s |
| 256 | 32 | 256 | 99.98% | 7191 | 240 | 4.07ms | 5.77ms | 261s |
| 128 | 64 | 256 | 99.99% | 6958 | 232 | 4.18ms | 6.17ms | 195s |
| 512 | 32 | 256 | 100.0% | 6694 | 223 | 4.39ms | 5.99ms | 354s |
| 128 | 32 | 512 | 100.0% | 4568 | 152 | 6.4ms | 9.27ms | 184s |

Limit 10

| efConstruction | maxConnections | ef | Recall | QPS | QPS/vCore | Mean Latency | p99 Latency | Import time |
|---|---|---|---|---|---|---|---|---|
| 128 | 8 | 64 | 91.33% | 16576 | 553 | 1.75ms | 2.82ms | 178s |
| 256 | 8 | 64 | 91.98% | 16474 | 549 | 1.76ms | 2.87ms | 205s |
| 512 | 8 | 64 | 92.13% | 16368 | 546 | 1.77ms | 2.85ms | 272s |
| 64 | 16 | 64 | 96.56% | 15003 | 500 | 1.93ms | 2.94ms | 160s |
| 512 | 16 | 64 | 97.95% | 14996 | 500 | 1.92ms | 2.78ms | 308s |
| 64 | 64 | 64 | 98.04% | 14197 | 473 | 2.05ms | 3.14ms | 167s |
| 128 | 32 | 64 | 99.06% | 13482 | 449 | 2.17ms | 3.07ms | 184s |
| 256 | 32 | 64 | 99.44% | 13237 | 441 | 2.2ms | 3.22ms | 261s |
| 512 | 32 | 64 | 99.56% | 12661 | 422 | 2.31ms | 3.32ms | 354s |
| 256 | 64 | 64 | 99.63% | 12014 | 400 | 2.43ms | 3.37ms | 276s |
| 512 | 64 | 64 | 99.76% | 11300 | 377 | 2.58ms | 3.56ms | 388s |
| 512 | 32 | 128 | 99.76% | 9365 | 312 | 3.14ms | 4.73ms | 354s |
| 256 | 64 | 128 | 99.79% | 8669 | 289 | 3.34ms | 4.67ms | 276s |
| 512 | 64 | 128 | 99.89% | 7990 | 266 | 3.65ms | 5.09ms | 388s |
| 256 | 32 | 256 | 99.95% | 6771 | 226 | 4.32ms | 5.84ms | 261s |
| 512 | 32 | 256 | 99.97% | 6286 | 210 | 4.66ms | 6.33ms | 354s |
| 512 | 64 | 256 | 99.99% | 5225 | 174 | 5.55ms | 8.11ms | 388s |
| 256 | 32 | 512 | 100.0% | 4281 | 143 | 6.84ms | 9.55ms | 261s |
| 512 | 32 | 512 | 100.0% | 3917 | 131 | 7.47ms | 10.33ms | 354s |
| 256 | 64 | 512 | 100.0% | 3611 | 120 | 8.03ms | 12.03ms | 276s |

Limit 100

| efConstruction | maxConnections | ef | Recall | QPS | QPS/vCore | Mean Latency | p99 Latency | Import time |
|---|---|---|---|---|---|---|---|---|
| 64 | 8 | 64 | 78.16% | 6202 | 207 | 4.55ms | 6.21ms | 152s |
| 256 | 8 | 64 | 80.07% | 6044 | 201 | 4.59ms | 8.59ms | 205s |
| 64 | 8 | 128 | 81.93% | 5968 | 199 | 4.73ms | 6.98ms | 152s |
| 512 | 16 | 64 | 91.28% | 5930 | 198 | 4.75ms | 6.86ms | 308s |
| 64 | 64 | 64 | 92.52% | 5768 | 192 | 4.91ms | 6.38ms | 167s |
| 128 | 16 | 128 | 93.17% | 5650 | 188 | 5.02ms | 6.47ms | 185s |
| 128 | 32 | 64 | 94.91% | 5543 | 185 | 5.13ms | 6.81ms | 184s |
| 256 | 32 | 64 | 96.07% | 5524 | 184 | 5.12ms | 6.71ms | 261s |
| 512 | 32 | 64 | 96.45% | 5321 | 177 | 5.32ms | 7.51ms | 354s |
| 128 | 32 | 128 | 96.54% | 5254 | 175 | 5.42ms | 7.01ms | 184s |
| 256 | 32 | 128 | 97.48% | 5235 | 175 | 5.43ms | 7.34ms | 261s |
| 512 | 32 | 128 | 97.79% | 5045 | 168 | 5.65ms | 7.15ms | 354s |
| 256 | 64 | 128 | 98.21% | 4889 | 163 | 5.86ms | 7.75ms | 276s |
| 512 | 64 | 128 | 98.75% | 4667 | 156 | 6.13ms | 7.85ms | 388s |
| 128 | 32 | 256 | 99.01% | 4298 | 143 | 6.71ms | 8.76ms | 184s |
| 256 | 32 | 256 | 99.43% | 4242 | 141 | 6.77ms | 8.74ms | 261s |
| 512 | 32 | 256 | 99.57% | 4069 | 136 | 7.1ms | 9.01ms | 354s |
| 256 | 64 | 256 | 99.61% | 3854 | 128 | 7.47ms | 10.13ms | 276s |
| 512 | 64 | 256 | 99.79% | 3634 | 121 | 7.92ms | 10.88ms | 388s |
| 256 | 32 | 512 | 99.92% | 3158 | 105 | 9.18ms | 12.12ms | 261s |
| 512 | 32 | 512 | 99.95% | 2956 | 99 | 9.8ms | 12.86ms | 354s |
| 512 | 64 | 512 | 99.98% | 2581 | 86 | 11.21ms | 15.68ms | 388s |

Deep Image 96 (9.99M 96d vectors, cosine distance)

Highlighted Configuration

  • Dataset Size: 9.99M
  • Dimensions: 96
  • Distance Metric: cosine
  • efConstruction: 128
  • maxConnections: 32
  • ef: 64
  • Recall@10: 96.43%
  • QPS (Limit 10): 6112
  • Mean Latency (Limit 10): 4.7ms
  • p99 Latency (Limit 10): 15.87ms

All Results

QPS vs Recall

Limit 1

| efConstruction | maxConnections | ef | Recall | QPS | QPS/vCore | Mean Latency | p99 Latency | Import time |
|---|---|---|---|---|---|---|---|---|
| 64 | 16 | 64 | 94.44% | 9301 | 310 | 3.14ms | 7.21ms | 3305s |
| 128 | 16 | 64 | 96.06% | 8957 | 299 | 3.28ms | 7.24ms | 3804s |
| 64 | 64 | 64 | 96.84% | 8760 | 292 | 3.36ms | 6.97ms | 3253s |
| 128 | 32 | 64 | 97.88% | 8473 | 282 | 3.48ms | 7.4ms | 3533s |
| 128 | 64 | 64 | 98.27% | 7984 | 266 | 3.66ms | 7.52ms | 3631s |
| 256 | 32 | 64 | 98.78% | 7916 | 264 | 3.71ms | 7.83ms | 4295s |
| 512 | 32 | 64 | 98.95% | 7876 | 263 | 3.73ms | 7.47ms | 5477s |
| 256 | 64 | 64 | 99.06% | 7839 | 261 | 3.75ms | 7.21ms | 4392s |
| 512 | 64 | 64 | 99.32% | 7238 | 241 | 4.05ms | 7.67ms | 6039s |
| 256 | 64 | 128 | 99.42% | 5767 | 192 | 5.1ms | 8.39ms | 4392s |
| 512 | 64 | 128 | 99.52% | 5509 | 184 | 5.34ms | 8.7ms | 6039s |
| 256 | 32 | 256 | 99.66% | 4672 | 156 | 6.32ms | 10.11ms | 4295s |
| 512 | 32 | 256 | 99.82% | 4467 | 149 | 6.62ms | 10.29ms | 5477s |
| 512 | 64 | 256 | 99.9% | 3683 | 123 | 7.97ms | 12.72ms | 6039s |
| 512 | 32 | 512 | 99.94% | 2842 | 95 | 10.37ms | 15.25ms | 5477s |
| 512 | 64 | 512 | 99.95% | 2288 | 76 | 12.84ms | 20.72ms | 6039s |

Limit 10

| efConstruction | maxConnections | ef | Recall | QPS | QPS/vCore | Mean Latency | p99 Latency | Import time |
|---|---|---|---|---|---|---|---|---|
| 64 | 16 | 64 | 91.58% | 8679 | 289 | 3.35ms | 7.3ms | 3305s |
| 128 | 16 | 64 | 93.68% | 8402 | 280 | 3.47ms | 6.9ms | 3804s |
| 64 | 32 | 64 | 94.11% | 8255 | 275 | 3.55ms | 7.61ms | 3275s |
| 64 | 64 | 64 | 94.67% | 8184 | 273 | 3.58ms | 7.19ms | 3253s |
| 128 | 64 | 64 | 96.95% | 7575 | 253 | 3.88ms | 7.79ms | 3631s |
| 256 | 32 | 64 | 97.53% | 7539 | 251 | 3.87ms | 7.81ms | 4295s |
| 512 | 32 | 64 | 97.92% | 7399 | 247 | 3.96ms | 8.04ms | 5477s |
| 256 | 64 | 64 | 98.15% | 7287 | 243 | 4.02ms | 7.3ms | 4392s |
| 512 | 64 | 64 | 98.76% | 6838 | 228 | 4.27ms | 7.96ms | 6039s |
| 256 | 64 | 128 | 98.77% | 5658 | 189 | 5.2ms | 8.7ms | 4392s |
| 512 | 64 | 128 | 99.23% | 5233 | 174 | 5.62ms | 9.25ms | 6039s |
| 256 | 32 | 256 | 99.44% | 4454 | 148 | 6.58ms | 10.11ms | 4295s |
| 512 | 32 | 256 | 99.61% | 4270 | 142 | 6.89ms | 10.77ms | 5477s |
| 512 | 64 | 256 | 99.78% | 3534 | 118 | 8.26ms | 12.97ms | 6039s |
| 256 | 32 | 512 | 99.8% | 2932 | 98 | 10.04ms | 14.79ms | 4295s |
| 512 | 32 | 512 | 99.88% | 2767 | 92 | 10.64ms | 15.67ms | 5477s |
| 512 | 64 | 512 | 99.93% | 2233 | 74 | 13.12ms | 21.24ms | 6039s |

Limit 100

| efConstruction | maxConnections | ef | Recall | QPS | QPS/vCore | Mean Latency | p99 Latency | Import time |
|---|---|---|---|---|---|---|---|---|
| 512 | 8 | 64 | 72.88% | 4734 | 158 | 6.06ms | 9.94ms | 4327s |
| 64 | 16 | 64 | 82.08% | 4645 | 155 | 6.25ms | 9.31ms | 3305s |
| 512 | 16 | 64 | 85.81% | 4556 | 152 | 6.33ms | 9.56ms | 4922s |
| 64 | 32 | 64 | 86.23% | 4492 | 150 | 6.43ms | 9.82ms | 3275s |
| 64 | 64 | 64 | 87.25% | 4488 | 150 | 6.45ms | 9.36ms | 3253s |
| 64 | 32 | 128 | 89.05% | 4347 | 145 | 6.67ms | 10.05ms | 3275s |
| 512 | 16 | 128 | 89.08% | 4347 | 145 | 6.65ms | 10.31ms | 4922s |
| 64 | 64 | 128 | 89.88% | 4284 | 143 | 6.78ms | 9.86ms | 3253s |
| 256 | 32 | 64 | 91.99% | 4146 | 138 | 7.01ms | 10.36ms | 4295s |
| 512 | 32 | 64 | 92.7% | 4092 | 136 | 7.08ms | 10.33ms | 5477s |
| 256 | 64 | 64 | 93.85% | 3917 | 131 | 7.39ms | 10.68ms | 4392s |
| 256 | 32 | 128 | 94.22% | 3913 | 130 | 7.43ms | 10.74ms | 4295s |
| 512 | 32 | 128 | 94.83% | 3856 | 129 | 7.54ms | 11.08ms | 5477s |
| 512 | 64 | 64 | 95.14% | 3816 | 127 | 7.6ms | 11.23ms | 6039s |
| 256 | 64 | 128 | 95.65% | 3688 | 123 | 7.9ms | 11.12ms | 4392s |
| 128 | 32 | 256 | 96.9% | 3317 | 111 | 8.78ms | 12.5ms | 3533s |
| 256 | 32 | 256 | 97.91% | 3182 | 106 | 9.19ms | 12.91ms | 4295s |
| 512 | 32 | 256 | 98.29% | 3090 | 103 | 9.48ms | 13.16ms | 5477s |
| 256 | 64 | 256 | 98.48% | 2896 | 97 | 10.1ms | 14.27ms | 4392s |
| 512 | 64 | 256 | 99.02% | 2707 | 90 | 10.78ms | 15.47ms | 6039s |
| 256 | 32 | 512 | 99.34% | 2310 | 77 | 12.65ms | 17.56ms | 4295s |
| 512 | 32 | 512 | 99.52% | 2200 | 73 | 13.27ms | 18.76ms | 5477s |
| 256 | 64 | 512 | 99.53% | 2032 | 68 | 14.3ms | 21.44ms | 4392s |
| 512 | 64 | 512 | 99.75% | 1879 | 63 | 15.56ms | 23.65ms | 6039s |

GIST 960 (1.0M 960d vectors, cosine distance)

Highlighted Configuration

  • Dataset Size: 1.00M
  • Dimensions: 960
  • Distance Metric: cosine
  • efConstruction: 512
  • maxConnections: 32
  • ef: 128
  • Recall@10: 94.14%
  • QPS (Limit 10): 1935
  • Mean Latency (Limit 10): 15.05ms
  • p99 Latency (Limit 10): 19.86ms

All Results

QPS vs Recall

Limit 1

| efConstruction | maxConnections | ef | Recall | QPS | QPS/vCore | Mean Latency | p99 Latency | Import time |
|---|---|---|---|---|---|---|---|---|
| 64 | 8 | 64 | 66.6% | 2759 | 92 | 10.59ms | 13.77ms | 1832s |
| 128 | 8 | 64 | 70.7% | 2734 | 91 | 10.7ms | 13.97ms | 1861s |
| 512 | 8 | 64 | 75.0% | 2724 | 91 | 10.78ms | 14.87ms | 2065s |
| 64 | 16 | 64 | 79.8% | 2618 | 87 | 11.04ms | 14.69ms | 1838s |
| 128 | 16 | 64 | 83.9% | 2577 | 86 | 11.21ms | 15.55ms | 1904s |
| 256 | 16 | 64 | 87.1% | 2518 | 84 | 11.54ms | 14.49ms | 2016s |
| 128 | 32 | 64 | 89.6% | 2425 | 81 | 11.85ms | 15.37ms | 1931s |
| 256 | 32 | 64 | 92.6% | 2388 | 80 | 12.09ms | 15.99ms | 2074s |
| 256 | 64 | 64 | 94.1% | 2207 | 74 | 13.08ms | 18.56ms | 2130s |
| 512 | 32 | 64 | 94.6% | 2073 | 69 | 14.11ms | 17.37ms | 2361s |
| 512 | 32 | 128 | 96.2% | 1985 | 66 | 14.67ms | 19.32ms | 2361s |
| 512 | 64 | 64 | 96.2% | 1951 | 65 | 14.7ms | 19.61ms | 2457s |
| 512 | 16 | 256 | 96.2% | 1839 | 61 | 15.9ms | 19.84ms | 2217s |
| 512 | 64 | 128 | 96.7% | 1603 | 53 | 18.06ms | 24.44ms | 2457s |
| 512 | 32 | 256 | 98.7% | 1514 | 50 | 19.16ms | 24.43ms | 2361s |
| 512 | 32 | 512 | 99.1% | 999 | 33 | 29.12ms | 38.89ms | 2361s |

Limit 10

| efConstruction | maxConnections | ef | Recall | QPS | QPS/vCore | Mean Latency | p99 Latency | Import time |
|---|---|---|---|---|---|---|---|---|
| 128 | 8 | 64 | 65.88% | 2649 | 88 | 11.02ms | 14.89ms | 1861s |
| 512 | 8 | 64 | 69.68% | 2625 | 88 | 11.03ms | 15.08ms | 2065s |
| 64 | 16 | 64 | 74.17% | 2557 | 85 | 11.29ms | 15.54ms | 1838s |
| 128 | 16 | 64 | 80.23% | 2518 | 84 | 11.5ms | 15.54ms | 1904s |
| 64 | 64 | 64 | 81.98% | 2387 | 80 | 12.11ms | 16.31ms | 1860s |
| 256 | 16 | 64 | 82.87% | 2355 | 79 | 12.37ms | 17.67ms | 2016s |
| 128 | 64 | 64 | 88.11% | 2312 | 77 | 12.56ms | 16.57ms | 1952s |
| 256 | 32 | 64 | 89.83% | 2297 | 77 | 12.55ms | 18.98ms | 2074s |
| 512 | 32 | 64 | 91.85% | 2002 | 67 | 14.47ms | 20.03ms | 2361s |
| 256 | 64 | 64 | 92.04% | 1937 | 65 | 14.93ms | 21.48ms | 2130s |
| 512 | 32 | 128 | 94.14% | 1935 | 65 | 15.05ms | 19.86ms | 2361s |
| 512 | 64 | 64 | 94.72% | 1860 | 62 | 15.56ms | 21.91ms | 2457s |
| 512 | 64 | 128 | 95.99% | 1569 | 52 | 18.38ms | 24.81ms | 2457s |
| 256 | 32 | 256 | 96.48% | 1556 | 52 | 18.6ms | 24.68ms | 2074s |
| 512 | 32 | 256 | 97.76% | 1483 | 49 | 19.53ms | 25.24ms | 2361s |
| 512 | 64 | 256 | 98.62% | 1286 | 43 | 22.3ms | 30.06ms | 2457s |
| 512 | 32 | 512 | 99.16% | 981 | 33 | 29.53ms | 37.97ms | 2361s |
| 512 | 64 | 512 | 99.47% | 880 | 29 | 32.45ms | 44.66ms | 2457s |

Limit 100

| efConstruction | maxConnections | ef | Recall | QPS | QPS/vCore | Mean Latency | p99 Latency | Import time |
|---|---|---|---|---|---|---|---|---|
| 512 | 8 | 64 | 56.05% | 1997 | 67 | 14.5ms | 20.28ms | 2065s |
| 256 | 8 | 128 | 60.26% | 1945 | 65 | 14.66ms | 18.39ms | 1938s |
| 64 | 16 | 64 | 61.8% | 1862 | 62 | 15.42ms | 20.05ms | 1838s |
| 128 | 16 | 64 | 68.05% | 1832 | 61 | 15.61ms | 20.05ms | 1904s |
| 512 | 16 | 128 | 77.53% | 1802 | 60 | 16.1ms | 19.07ms | 2217s |
| 128 | 64 | 64 | 78.26% | 1744 | 58 | 16.59ms | 21.48ms | 1952s |
| 128 | 32 | 128 | 79.71% | 1713 | 57 | 16.68ms | 21.37ms | 1931s |
| 256 | 32 | 64 | 80.3% | 1652 | 55 | 17.49ms | 23.8ms | 2074s |
| 512 | 32 | 128 | 86.91% | 1624 | 54 | 17.83ms | 23.28ms | 2361s |
| 512 | 16 | 256 | 88.31% | 1515 | 51 | 19.08ms | 24.64ms | 2217s |
| 128 | 32 | 256 | 89.11% | 1477 | 49 | 19.72ms | 25.54ms | 1931s |
| 256 | 32 | 256 | 92.63% | 1361 | 45 | 21.34ms | 28.19ms | 2074s |
| 512 | 32 | 256 | 94.49% | 1308 | 44 | 22.17ms | 29.1ms | 2361s |
| 512 | 64 | 256 | 96.44% | 1152 | 38 | 24.88ms | 33.15ms | 2457s |
| 256 | 32 | 512 | 96.94% | 1001 | 33 | 28.71ms | 36.11ms | 2074s |
| 256 | 64 | 512 | 97.87% | 893 | 30 | 31.91ms | 42.6ms | 2130s |
| 512 | 32 | 512 | 98.04% | 870 | 29 | 32.84ms | 42.31ms | 2361s |
| 512 | 64 | 512 | 98.8% | 812 | 27 | 34.96ms | 47.45ms | 2457s |

Learn more & FAQ

What is the difference between latency and throughput?

The latency refers to the time it takes to complete a single request. This is typically measured by taking a mean or percentile distribution of all requests. For example, a mean latency of 5ms means that a single request takes on average 5ms to complete. This does not say anything about how many queries can be answered in a given timeframe.

If Weaviate were single-threaded, the throughput would roughly equal 1s divided by the mean latency. For example, with a mean latency of 5ms, this would mean that 200 requests can be answered in one second.

However, in reality, you often don’t have a single user sending one query after another. Instead, you have multiple users sending queries. This makes the query side concurrent. Similarly, Weaviate can handle concurrent incoming requests. We can identify how many concurrent requests can be served by measuring the throughput.

We can take our single-thread calculation from before and multiply it by the number of server CPU cores. This gives a rough estimate of what the server can handle concurrently. However, you should never trust this calculation alone; always measure the actual throughput. Such scaling may not be linear: synchronization mechanisms (such as locks) that make concurrent access safe have a cost of their own, and if implemented incorrectly they can lead to congestion, which further decreases the concurrent throughput. As a result, you cannot perform a single-threaded benchmark and extrapolate what the numbers would be like in a multi-threaded setting.
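
As a back-of-the-envelope sketch only (the numbers are illustrative, and for the reasons above the result is an optimistic upper bound, not a prediction):

```python
mean_latency_s = 0.005  # 5 ms mean latency
cores = 30              # vCPUs on the server

single_thread_qps = 1 / mean_latency_s            # ~200 QPS
naive_concurrent_qps = single_thread_qps * cores  # ~6000 QPS at best; measure instead
```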

All throughput numbers (“QPS”) outlined in this benchmark are actual multi-threaded measurements on a 30-core machine, not estimations.

What is a p99 latency?

The mean latency gives you an average value of all requests measured. This is a good indication of how long a user will have to wait on average for their request to be completed. Based on this mean value, you cannot make any promises to your users about wait times. 90 out of 100 users might see a considerably better time, but the remaining 10 might see a significantly worse time.

To give a more precise indication, percentile-based latencies are used. A 99th-percentile latency - or “p99 latency” for short - is the value that 99% of requests stay at or below. In other words, 99% of your users will experience a time equal to or better than the stated value. This is a much better guarantee than a mean value.

In production settings, requirements - as stated in SLAs - are often a combination of throughput and a percentile latency. For example, the statement “3000 QPS at p95 latency of 20ms” conveys the following meaning.

  • 3000 requests need to be successfully completed per second
  • 95% of users must see a latency of 20ms or lower.
  • There is no assumption about the remaining 5% of users, implicitly tolerating that they will experience higher latencies than 20ms.

The higher the percentile (e.g. p99 over p95), the “safer” the quoted latency becomes. We have thus decided to use p99 latencies instead of p95 latencies in our measurements.
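
For illustration, checking such an SLA against measured data could look like the sketch below; the function and variable names are hypothetical, not part of the benchmark scripts.

```python
import numpy as np

def meets_sla(latencies_ms, wall_clock_s, qps_target=3000,
              percentile=95, latency_target_ms=20.0):
    # latencies_ms: per-request latencies from a load test.
    # wall_clock_s: total wall-clock time of the run, used to compute QPS.
    measured_qps = len(latencies_ms) / wall_clock_s
    pxx_latency = np.percentile(latencies_ms, percentile)
    return measured_qps >= qps_target and pxx_latency <= latency_target_ms
```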

What happens if I run with fewer or more CPU cores than on the example test machine?

The benchmark outlines a QPS per core measurement. This can help you make a rough estimation of how the throughput would vary on smaller or larger machines. If you do not need the stated throughput, you can run with fewer CPU cores. If you need more throughput, you can run with more CPU cores.

Please note that there is a point of diminishing returns with adding more CPUs because of synchronization mechanisms, disk, and memory bottlenecks. Beyond that point, you can scale horizontally instead of vertically. Horizontal scaling with replication will be available in Weaviate soon.
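
As an illustrative estimate only, using the QPS/vCore column (here the SIFT1M highlighted configuration, which measured 297 QPS per vCore on the 30-core test machine):

```python
qps_per_vcore = 297  # from the SIFT1M highlighted configuration (Limit 10)
my_vcores = 8        # hypothetical smaller production machine

estimated_qps = qps_per_vcore * my_vcores  # ~2376 QPS, before diminishing returns
```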

What are ef, efConstruction, and maxConnections?

These parameters refer to the HNSW build and query parameters. They represent a trade-off between recall, latency & throughput, index size, and memory consumption. This trade-off is highlighted in the benchmark results.

I can’t match the same latencies/throughput in my own setup, how can I debug this?

If you are seeing different numbers in your own setup, here are a couple of things to check:

  • What CPU architecture are you using? The benchmarks above were run on a GCP c2 CPU type, which is based on amd64 architecture. Weaviate also supports arm64 architecture, but not all optimizations are present. If your machine shows maximum CPU usage but you cannot achieve the same throughput, consider switching the CPU type to the one used in this benchmark.

  • Are you using an actual dataset or random vectors? HNSW is known to perform considerably worse with random vectors than with real-world datasets. This is due to the distribution of points in real-world datasets compared to randomly generated vectors. If you cannot achieve the performance (or recall) outlined above with random vectors, switch to an actual dataset.

  • Are your disks fast enough? While the ANN search itself is CPU-bound, the objects must be read from disk after the search has been completed. Weaviate uses memory-mapped files to speed this process up. However, if not enough memory is present or the operating system has allocated the cached pages elsewhere, a physical disk read needs to occur. If your disk is slow, it could then be that your benchmark is bottlenecked by those disks.

  • Are you using more than 2 million vectors? If yes, make sure to set the vector cache large enough for maximum performance.
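
For example, extending the illustrative schema sketch from earlier, the vector cache size is part of the vectorIndexConfig; the value below is a placeholder and should be at least as large as the number of vectors in your dataset.

```python
# Placeholder class definition; vectorCacheMaxObjects is the relevant setting here.
benchmark_class = {
    "class": "Benchmark",
    "vectorizer": "none",
    "vectorIndexConfig": {
        "efConstruction": 128,
        "maxConnections": 32,
        "ef": 64,
        "vectorCacheMaxObjects": 10_000_000,  # large enough to hold every vector
    },
}
```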

Where can I find the scripts to run this benchmark myself?

The repository is located here.