Monitoring

Weaviate can expose Prometheus-compatible metrics for monitoring. A standard Prometheus/Grafana setup can be used to visualize metrics on various dashboards.

Metrics can be used to measure request latencies, import speed, time spent on vector vs object storage, memory usage, application usage, and more.

Configure Monitoring

Enable within Weaviate

To tell Weaviate to collect metrics and expose them in a Prometheus-compatible format, all that's required is to set the following environment variable:

PROMETHEUS_MONITORING_ENABLED=true

By default, Weaviate will expose the metrics at <hostname>:2112/metrics. You can optionally change the port to a custom port using the following environment variable:

PROMETHEUS_MONITORING_PORT=3456
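
As a quick check that the endpoint is reachable, the following minimal sketch builds the metrics URL from the documented default port and the optional port override, then fetches the raw text. It assumes plain HTTP and that the client can see the same environment variable as the server. The function names are illustrative and not part of Weaviate.

```python
import os
from urllib.request import urlopen


def metrics_url(host: str, env=None) -> str:
    """Build the metrics URL for a Weaviate host (illustrative helper)."""
    env = os.environ if env is None else env
    # 2112 is the documented default; PROMETHEUS_MONITORING_PORT overrides it.
    port = int(env.get("PROMETHEUS_MONITORING_PORT", "2112"))
    return f"http://{host}:{port}/metrics"


def fetch_metrics(host: str, timeout: float = 5.0) -> str:
    """Return the raw metrics text. Assumes plain HTTP and a reachable host."""
    with urlopen(metrics_url(host), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, `fetch_metrics("localhost")` should return the metrics text when a local instance runs with monitoring enabled.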

Scrape metrics from Weaviate

Metrics are typically scraped into a time-series database, such as Prometheus. How you consume metrics depends on your setup and environment.
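
If you want to inspect metrics without a time-series database, the scraped text can be parsed directly. The sketch below is a simplified reader for the Prometheus text exposition format, not a complete parser: it skips comment lines, ignores timestamps, and does not unescape label values. The function name is hypothetical.

```python
import re

# Matches lines such as: object_count{class_name="Article",shard_name="s1"} 42
SAMPLE_LINE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_PAIR = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse text-format samples into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue  # ignore anything this simplified pattern cannot read
        name, raw_labels, raw_value = match.groups()
        labels = dict(LABEL_PAIR.findall(raw_labels or ""))
        samples.append((name, labels, float(raw_value)))
    return samples
```

For production use, a maintained Prometheus client library or Prometheus itself is the safer choice.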

The Weaviate examples repo contains a fully pre-configured setup using Prometheus, Grafana, and some example dashboards. You can start the full setup, including monitoring and dashboards, with a single command. The setup uses the following components:

  • Docker Compose is used to provide a fully-configured setup that can be started with a single command.
  • Weaviate is configured to expose Prometheus metrics as outlined in the section above.
  • A Prometheus instance is started with the setup and configured to scrape metrics from Weaviate every 15s.
  • A Grafana instance is started with the setup and configured to use the Prometheus instance as a metrics provider. Additionally, it runs a dashboard provider that contains a few sample dashboards.
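
To illustrate the Prometheus part of such a setup, the sketch below renders a minimal scrape configuration with a 15s interval. It is not the file shipped in the examples repo. The target `weaviate:2112` assumes a Docker Compose service named `weaviate` and the default metrics port, and the job name is arbitrary.

```python
def render_scrape_config(target: str = "weaviate:2112", interval: str = "15s") -> str:
    """Return a minimal prometheus.yml that scrapes a single Weaviate target."""
    lines = [
        "global:",
        f"  scrape_interval: {interval}",
        "scrape_configs:",
        "  - job_name: weaviate  # arbitrary name chosen for this sketch",
        "    metrics_path: /metrics",
        "    static_configs:",
        f"      - targets: ['{target}']",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_scrape_config())
```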

Multi-tenancy

When using multi-tenancy, we suggest setting the PROMETHEUS_MONITORING_GROUP environment variable to true so that data across all tenants is grouped together for monitoring.

Obtainable Metrics

The list of metrics that are obtainable through Weaviate's metric system is constantly being expanded. The complete list is in the prometheus.go source code file.

This page describes some noteworthy metrics and their uses.

Metrics are typically quite granular, because they can always be aggregated later. For example, if the granularity is "shard", you can aggregate all "shard" metrics of the same "class" to obtain a class-level metric, or aggregate all metrics to obtain the metric for the entire Weaviate instance.
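
The aggregation described above can be sketched as follows. Each sample is a (labels, value) pair for a shard-level gauge such as object_count. The sample values in the usage note are made up for illustration. Summation suits counts; latency histograms need bucket-aware aggregation instead.

```python
from collections import defaultdict


def aggregate_by_class(samples):
    """Sum shard-level samples into one value per class_name."""
    totals = defaultdict(float)
    for labels, value in samples:
        totals[labels["class_name"]] += value
    return dict(totals)


def aggregate_instance(samples):
    """Sum all shard-level samples into a single instance-wide value."""
    return sum(value for _, value in samples)
```

With two hypothetical shards of class "Article" holding 100 and 50 objects, `aggregate_by_class` reports 150 for "Article". In a Prometheus setup, the same result is usually obtained with a query-time aggregation over the class_name label.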

| Metric | Description | Labels | Type |
| --- | --- | --- | --- |
| batch_durations_ms | Duration of a single batch operation in ms. The operation label specifies which part of the batch (e.g. object, inverted, vector) is being measured. Granularity is a shard of a class. | operation, class_name, shard_name | Histogram |
| batch_delete_durations_ms | Duration of a batch delete in ms. The operation label specifies which part of the batch delete is being measured. Granularity is a shard of a class. | class_name, shard_name | Histogram |
| objects_durations_ms | Duration of an individual object operation, such as put or delete, as indicated by the operation label. Also recorded when the operation is part of a batch. The step label adds further precision to each operation. Granularity is a shard of a class. | class_name, shard_name | Histogram |
| object_count | Number of objects present. Granularity is a shard of a class. | class_name, shard_name | Gauge |
| async_operations_running | Number of currently running async operations. The operation itself is defined through the operation label. | operation, class_name, shard_name, path | Gauge |
| lsm_active_segments | Number of currently present segments per shard. Granularity is a shard of a class. Grouped by strategy. | strategy, class_name, shard_name, path | Gauge |
| lsm_bloom_filter_duration_ms | Duration of a bloom filter operation per shard in ms. Granularity is a shard of a class. Grouped by strategy. | operation, strategy, class_name, shard_name | Histogram |
| lsm_segment_objects | Number of entries per LSM segment by level. Granularity is a shard of a class. Grouped by strategy and level. | operation, strategy, class_name, shard_name, path, level | Gauge |
| lsm_segment_size | Size of LSM segment by level and unit. | strategy, class_name, shard_name, path, level, unit | Gauge |
| lsm_segment_count | Number of segments by level. | strategy, class_name, shard_name, path, level | Gauge |
| vector_index_tombstones | Number of currently active tombstones in the vector index. Goes up on each incoming delete and goes down after a completed repair operation. | class_name, shard_name | Gauge |
| vector_index_tombstone_cleanup_threads | Number of currently active threads for repairing/cleaning up the vector index after deletes have occurred. | class_name, shard_name | Gauge |
| vector_index_tombstone_cleaned | Total number of deleted and removed vectors after repair operations. | class_name, shard_name | Counter |
| vector_index_operations | Total number of mutating operations on the vector index. The operation itself is defined by the operation label. | operation, class_name, shard_name | Gauge |
| vector_index_size | The total capacity of the vector index. Typically larger than the number of vectors imported, as it grows proactively. | class_name, shard_name | Gauge |
| vector_index_maintenance_durations_ms | Duration of a sync or async vector index maintenance operation. The operation itself is defined through the operation label. | operation, class_name, shard_name | Histogram |
| vector_index_durations_ms | Duration of a regular vector index operation, such as insert or delete. The operation itself is defined through the operation label. The step label adds more granularity to each operation. | operation, step, class_name, shard_name | Histogram |
| startup_durations_ms | Duration of individual startup operations in ms. The operation itself is defined through the operation label. | operation, class_name, shard_name | Histogram |
| startup_diskio_throughput | Disk I/O throughput in bytes/s during startup operations, such as reading back the HNSW index or recovering LSM segments. The operation itself is defined by the operation label. | operation, step, class_name, shard_name | Histogram |
| requests_total | Tracks all user requests and whether each one succeeded or failed. | api, query_type, class_name | GaugeVec |
| index_queue_push_duration_ms | Duration of pushing one or more vectors to the index queue. | class_name, shard_name, target_vector | Summary |
| index_queue_delete_duration_ms | Duration of deleting one or more vectors from the index queue and the underlying index. | class_name, shard_name, target_vector | Summary |
| index_queue_preload_duration_ms | Duration of preloading un-indexed vectors to the index queue. | class_name, shard_name, target_vector | Summary |
| index_queue_preload_count | Number of vectors preloaded to the index queue. | class_name, shard_name, target_vector | Gauge |
| index_queue_search_duration_ms | Duration of searching for vectors in the index queue and the underlying index. | class_name, shard_name, target_vector | Summary |
| index_queue_paused | Whether the index queue is paused. | class_name, shard_name, target_vector | Gauge |
| index_queue_size | Number of vectors in the index queue. | class_name, shard_name, target_vector | Gauge |
| index_queue_stale_count | Number of times the index queue has been marked as stale. | class_name, shard_name, target_vector | Counter |
| index_queue_vectors_dequeued | Number of vectors sent to the workers per tick. | class_name, shard_name, target_vector | Gauge |
| index_queue_wait_duration_ms | Duration of waiting for the workers to finish. | class_name, shard_name, target_vector | Summary |
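
Prometheus histograms and summaries are exposed with companion `_sum` and `_count` series, so a rough mean duration can be derived from them. The sketch below assumes samples in the form of (name, labels, value) tuples; the function name and the input values in the usage note are hypothetical. The result is a mean since process start, not a rate over a time window.

```python
def mean_durations_ms(samples, metric="batch_durations_ms"):
    """Mean observation per label set, from a metric's _sum and _count series."""
    sums, counts = {}, {}
    for name, labels, value in samples:
        key = tuple(sorted(labels.items()))  # hashable identity of a label set
        if name == f"{metric}_sum":
            sums[key] = value
        elif name == f"{metric}_count":
            counts[key] = value
    # Skip label sets with no observations to avoid dividing by zero.
    return {key: sums[key] / counts[key] for key in sums if counts.get(key, 0) > 0}
```

For a shard whose `batch_durations_ms_sum` is 1200 and whose `batch_durations_ms_count` is 10, the mean is 120 ms. Percentiles require the `_bucket` series and are usually computed in Prometheus itself.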

Extending Weaviate with new metrics is very easy. To suggest a new metric, see the contributor guide.

Versioning

Be aware that metrics do not follow the semantic versioning guidelines of other Weaviate features. Weaviate's main APIs are stable and breaking changes are extremely rare. Metrics, however, have shorter feature lifecycles. It can sometimes be necessary to introduce an incompatible change or entirely remove a metric, for example, because the cost of observing a specific metric in production has grown too high. As a result, it is possible that a Weaviate minor release contains a breaking change for the Monitoring system. If so, it will be clearly highlighted in the release notes.

Sample Dashboards

Weaviate does not ship with any dashboards by default. The dashboards listed below are used by the various Weaviate teams, both during development and when helping users. They do not come with any support, but may still be helpful. Treat them as inspiration for designing dashboards that fit your own use case:

| Dashboard | Purpose |
| --- | --- |
| Importing Data Into Weaviate | Visualize speed of import operations (including its components, such as object store, inverted index, and vector index). |
| Object Operations | Visualize speed of whole object operations, such as GET, PUT, etc. |
| Vector Index | Visualize the current state of, as well as operations on, the HNSW vector index. |
| LSM Stores | Get insights into the internals (including segments) of the various LSM stores within Weaviate. |
| Startup | Visualize the startup process, including recovery operations. |
| Usage | Obtain usage metrics, such as number of objects imported, etc. |
| Async index queue | Observe index queue activity. |

nodes API Endpoint

To get collection details programmatically, use the nodes REST endpoint.

The nodes endpoint returns an array of nodes. The nodes have the following fields:

  • name: Name of the node.
  • status: Status of the node (one of: HEALTHY, UNHEALTHY, UNAVAILABLE, INDEXING).
  • version: Version of Weaviate running on the node.
  • gitHash: Short git hash of the latest commit of Weaviate running on the node.
  • stats: Statistics for the node.
    • shardCount: Total number of shards on the node.
    • objectCount: Total number of indexed objects on the node.
  • shards: Array of shard statistics. To see shard details, set output=verbose.
    • name: Name of the shard.
    • class: Name of the collection stored on the shard.
    • objectCount: Number of indexed objects on the shard.
    • vectorQueueLength: Number of objects waiting to be indexed on the shard. (Available starting in Weaviate 1.22 when ASYNC_INDEXING is enabled.)
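
The fields above can be condensed into a per-node summary. The sketch below works on an already-decoded list of node objects. The function name and the sample payload in the usage note are hypothetical, and the shards array is treated as optional because it only appears with verbose output.

```python
def summarize_nodes(nodes):
    """Collect status, object count, and indexing backlog for each node."""
    summary = {}
    for node in nodes:
        shards = node.get("shards") or []  # absent unless verbose output is requested
        summary[node["name"]] = {
            "status": node["status"],
            "objectCount": (node.get("stats") or {}).get("objectCount", 0),
            # vectorQueueLength is only present when async indexing is enabled.
            "queued": sum(shard.get("vectorQueueLength", 0) for shard in shards),
        }
    return summary
```

For a hypothetical node "node1" with status HEALTHY, 150 objects, and two shards with queue lengths 3 and 4, the summary reports 7 queued objects. A non-zero queue means vectors are still waiting to be indexed on that node.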

Questions and feedback

If you have any questions or feedback, let us know in the user forum.