Multi-vector encodings
Multi-vector embeddings represent a single data object, like a document or image, using a set of multiple vectors rather than a single vector. This approach allows for a more granular capture of semantic information, as each vector can represent different parts of the object. However, this leads to a significant increase in memory consumption, as multiple vectors are stored for each item.
Compression techniques become especially crucial for multi-vector systems to manage storage costs and improve query latency. Encodings transform the entire set of multi-vectors into a new, more compact single vector representation while aiming to preserve semantic relationships.
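To get a feel for the memory cost described above, the following back-of-the-envelope sketch compares raw storage for a single-vector and a multi-vector embedding. The function name, the vector counts, the dimension sizes, and the 4-byte float32 assumption are all illustrative. They are not taken from any particular model or from Weaviate internals, and index overhead is ignored.

```python
def raw_vector_bytes(num_vectors: int, dimensions: int, bytes_per_value: int = 4) -> int:
    """Raw storage for one object's embedding, ignoring any index overhead."""
    return num_vectors * dimensions * bytes_per_value


# Hypothetical single-vector embedding: one 768-dimensional float32 vector.
single_vector = raw_vector_bytes(num_vectors=1, dimensions=768)

# Hypothetical multi-vector embedding: 100 token vectors of 128 dimensions each.
multi_vector = raw_vector_bytes(num_vectors=100, dimensions=128)

print(f"single vector: {single_vector} bytes")
print(f"multi-vector:  {multi_vector} bytes")
```

Under these assumed numbers the multi-vector representation needs many times the raw storage per object, which is the gap that encodings aim to close.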
MUVERA encoding
MUVERA, which stands for Multi-Vector Retrieval via Fixed Dimensional Encodings, tackles the higher memory usage and slower processing times of multi-vector embeddings by encoding them into single, fixed-dimensional vectors. This leads to reduced memory usage compared to traditional multi-vector approaches.
- Python Client v4
- JS/TS Client v3
- Java
- Go
from weaviate.classes.config import Configure

# `client` is assumed to be an already-connected Weaviate client instance
client.collections.create(
    "DemoCollection",
    vectorizer_config=[
        # Example 1 - Use a model integration
        Configure.NamedVectors.text2colbert_jinaai(
            name="jina_colbert",
            source_properties=["text"],
            vector_index_config=Configure.VectorIndex.hnsw(
                multi_vector=Configure.VectorIndex.MultiVector.multi_vector(
                    encoding=Configure.VectorIndex.MultiVector.Encoding.muvera(
                        # Optional parameters for tuning MUVERA
                        # ksim=4,
                        # dprojections=16,
                        # repetitions=20,
                    )
                )
            ),
        ),
        # Example 2 - User-provided multi-vector representations
        Configure.NamedVectors.none(
            name="custom_multi_vector",
            vector_index_config=Configure.VectorIndex.hnsw(
                multi_vector=Configure.VectorIndex.MultiVector.multi_vector(
                    encoding=Configure.VectorIndex.MultiVector.Encoding.muvera()
                )
            ),
        ),
    ],
    # Additional parameters not shown
)
// TS/JS support coming soon
// Java support coming soon
// Go support coming soon
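The final dimensionality of the MUVERA-encoded vector is repetitions * 2^ksim * dprojections. For example, the values shown in the commented-out parameters above (ksim 4, dprojections 16, repetitions 20) give 20 * 2^4 * 16 = 5120 dimensions. Carefully tuning these parameters is crucial to balance memory usage and retrieval accuracy.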
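The dimensionality formula can be checked with a few lines of Python. This is a minimal sketch: the helper name `muvera_dimensions` and the parameter combinations other than the first are hypothetical, and the byte counts assume 4-byte float32 values, which may not match how the vectors are actually stored.

```python
def muvera_dimensions(ksim: int, dprojections: int, repetitions: int) -> int:
    """Dimensionality of the encoded vector: repetitions * 2^ksim * dprojections."""
    return repetitions * (2 ** ksim) * dprojections


# (ksim, dprojections, repetitions) combinations to compare.
settings = [(4, 16, 20), (3, 16, 20), (4, 8, 10)]

for ksim, dprojections, repetitions in settings:
    dims = muvera_dimensions(ksim, dprojections, repetitions)
    approx_bytes = dims * 4  # assumes float32 storage
    print(f"ksim={ksim} dprojections={dprojections} repetitions={repetitions}"
          f" -> {dims} dimensions (~{approx_bytes} bytes)")
```

Because ksim appears in the exponent, lowering it by one halves the encoded size, while dprojections and repetitions scale the size linearly.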
These parameters can be used to fine-tune MUVERA:
- ksim (int): The number of Gaussian vectors sampled for the SimHash partitioning function. This parameter determines the number of bits in the hash and, consequently, the number of buckets created in the space partitioning step. The total number of buckets is 2^ksim. A higher value of ksim leads to a finer-grained partitioning of the embedding space, potentially improving the accuracy of the approximation but also increasing the dimensionality of the intermediate encoded vectors.
- dprojections (int): The dimensionality of the sub-vectors after the random linear projection in the dimensionality reduction step. After the multi-vector embedding is partitioned into buckets, each bucket's aggregated vector is projected down to dprojections dimensions using a random matrix. A smaller value of dprojections reduces the overall dimensionality of the final fixed-dimensional encoding, lowering memory consumption at the potential cost of some information loss and retrieval accuracy.
- repetitions (int): The number of times the space partitioning and dimensionality reduction steps are repeated. Each repetition captures a different perspective of the multi-vector embedding, which can improve the robustness and accuracy of the final fixed-dimensional encoding. The resulting single vectors from each repetition are concatenated. A higher number of repetitions increases the dimensionality of the final encoding but can lead to a better approximation of the original multi-vector similarity.
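The three steps described by these parameters (SimHash partitioning, random projection of each bucket, and concatenation across repetitions) can be sketched in plain Python. This is a simplified illustration and not Weaviate's implementation: the function name `muvera_style_encode`, the `seed` and `average` arguments, the choice of a random +/-1 projection matrix, and the zero vector for empty buckets are all assumptions made for the example.

```python
import random


def muvera_style_encode(vectors, ksim=4, dprojections=16, repetitions=20,
                        seed=0, average=False):
    """Toy fixed-dimensional encoding of a list of equal-length vectors."""
    rng = random.Random(seed)  # same seed -> same hyperplanes and projections
    dim = len(vectors[0])
    n_buckets = 2 ** ksim
    scale = 1.0 / dprojections ** 0.5
    encoding = []
    for _ in range(repetitions):
        # Space partitioning: ksim Gaussian vectors define a ksim-bit SimHash.
        planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(ksim)]
        # Dimensionality reduction: random +/-1 matrix, dprojections x dim.
        proj = [[rng.choice((-1.0, 1.0)) for _ in range(dim)]
                for _ in range(dprojections)]
        sums = [[0.0] * dim for _ in range(n_buckets)]
        counts = [0] * n_buckets
        for v in vectors:
            bucket = 0
            for plane in planes:
                bit = 1 if sum(p * x for p, x in zip(plane, v)) > 0 else 0
                bucket = (bucket << 1) | bit
            counts[bucket] += 1
            sums[bucket] = [s + x for s, x in zip(sums[bucket], v)]
        for b in range(n_buckets):
            agg = sums[b]  # empty buckets stay as all zeros
            if average and counts[b] > 0:
                agg = [s / counts[b] for s in agg]
            # Project the bucket's aggregate down to dprojections values.
            encoding.extend(scale * sum(r * a for r, a in zip(row, agg))
                            for row in proj)
    return encoding


# Hypothetical input: 12 token vectors with 32 dimensions each.
data_rng = random.Random(42)
tokens = [[data_rng.gauss(0.0, 1.0) for _ in range(32)] for _ in range(12)]
encoded = muvera_style_encode(tokens, ksim=3, dprojections=4, repetitions=2)
print(len(encoded))  # 2 * 2**3 * 4 = 64
```

In this sketch, encodings are only comparable when they are produced with the same seed and parameters, since the seed fixes the random hyperplanes and projection matrices shared by all inputs.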