Vector index: Overview
A database index is a data structure that organizes data to make searches more efficient. Think of it as a table of contents in a book, or an index in a library, that helps you find the information you need quickly.
A vector index is a specialized type of index that is designed to store and search vectors.
The choice and configuration of your vector index can significantly impact the performance of your imports as well as searches, and the resource requirements of your Weaviate instance.
For this reason, the vector index a critical component in Weaviate.
Why use a vector index?
Databases can quickly grow very large, to the point where time to search for a specific item can become unacceptably long, or the resources required to search become too high.
A vector index is designed to improve users' experience of searching for items in a large database.
It usually makes a trade-off to balance three key factors: search speed, accuracy, and resource requirements.
Vector index types
Many different types of vector indexes exist. A majority of them are designed to speed up searches by reducing the number of vectors that need to be compared. However, they do this in different ways, and each has its own strengths and weaknesses.
Graph indexes
Graph indexes form a network of vectors, such that similar vectors are connected to each other. This allows for fast "traversal" of the graph to find similar vectors to a query vector.
HNSW, or "Hierarchical Navigable Small World", is the most common graph index type. It creates a set of "layers" of vectors, to enable fast traversal of the graph.
They are very scalable, allow incremental updates, and efficient for high-dimensional vectors.
This is the default index type in Weaviate.
Tree-based indexes
Tree-based indexes divide the vectors into a tree structure.
ANNOY, or "Approximate Nearest Neighbors Oh Yeah", is a well-known tree-based index. It divides the vectors into a binary tree structure.
They can be memory-efficient, and are good for low-dimensional vectors.
Given its nature as a tree, it may be costly to update the index over time. This would depend on whether the tree needs to be rebuilt, or if it can be updated incrementally.
ANNOY itself does not support incremental updates.
Cluster-based indexes
Cluster-based indexes group vectors based on their similarity. As a result, the search space is reduced to only the cluster(s) that is most likely to contain the nearest neighbors.
Their search accuracy (recall and precision) may generally be lower than graph-based indexes, but they can be more memory-efficient.
Flat index
A flat index is the simplest type of index. It stores all vectors in a single list, and searches through all of them to find the nearest neighbors.
This is extremely memory-efficient, but does not scale well at all, as the search time grows linearly with the number of vectors.
Available vector indexes in Weaviate
Weaviate supports multiple types of vector indexes - namely, hnsw
, flat
, and dynamic
.
We will discuss these in more detail in the following sections.
Questions and feedback
If you have any questions or feedback, let us know in the user forum.