
Multi-vector encodings

Multi-vector embeddings represent a single data object, such as a document or image, with a set of vectors rather than one vector. This captures semantic information at a finer granularity, as each vector can represent a different part of the object (for example, a token or an image patch). However, it also increases memory consumption significantly, since multiple vectors are stored for each item.

Compression is therefore especially important for multi-vector setups, both to manage storage costs and to reduce query latency. Encodings transform the entire set of vectors into a new, more compact single-vector representation while aiming to preserve semantic relationships.
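
To make the difference concrete, here is a minimal, library-agnostic sketch in plain Python; the dimensions and values are invented purely for illustration:

# A single-vector embedding: one vector represents the whole object.
single_vector = [0.12, -0.34, 0.56, 0.78]

# A multi-vector embedding: one vector per part of the object (e.g. per token),
# so storage grows with the number of parts.
multi_vector = [
    [0.05, 0.21, -0.11, 0.40],
    [0.18, -0.07, 0.33, -0.25],
    [0.02, 0.44, -0.19, 0.08],
]

print(len(single_vector))                 # 4 floats stored
print(sum(len(v) for v in multi_vector))  # 12 floats stored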

MUVERA encoding

MUVERA (Multi-Vector Retrieval via Fixed Dimensional Encodings) tackles the higher memory usage and slower query processing of multi-vector embeddings by encoding each set of vectors into a single, fixed-dimensional vector. This reduces memory usage compared to indexing the multi-vector representations directly.

from weaviate.classes.config import Configure

client.collections.create(
    "DemoCollection",
    vectorizer_config=[
        # Example 1 - Use a model integration
        Configure.NamedVectors.text2colbert_jinaai(
            name="jina_colbert",
            source_properties=["text"],
            vector_index_config=Configure.VectorIndex.hnsw(
                multi_vector=Configure.VectorIndex.MultiVector.multi_vector(
                    encoding=Configure.VectorIndex.MultiVector.Encoding.muvera(
                        # Optional parameters for tuning MUVERA
                        # ksim=4,
                        # dprojections=16,
                        # repetitions=20,
                    )
                )
            ),
        ),
        # Example 2 - User-provided multi-vector representations
        Configure.NamedVectors.none(
            name="custom_multi_vector",
            vector_index_config=Configure.VectorIndex.hnsw(
                multi_vector=Configure.VectorIndex.MultiVector.multi_vector(
                    encoding=Configure.VectorIndex.MultiVector.Encoding.muvera()
                )
            ),
        ),
    ],
    # Additional parameters not shown
)

The final dimensionality of the MUVERA-encoded vector will be repetitions * 2^ksim * dprojections. Tuning these parameters carefully is crucial to balance memory usage and retrieval accuracy.

These parameters can be used to fine-tune MUVERA:

  • ksim (int): The number of Gaussian vectors sampled for the SimHash partitioning function. This parameter determines the number of bits in the hash, and consequently, the number of buckets created in the space partitioning step. The total number of buckets will be 2^ksim. A higher value of ksim leads to a finer-grained partitioning of the embedding space, potentially improving the accuracy of the approximation but also increasing the dimensionality of the intermediate encoded vectors.

  • dprojections (int): The dimensionality of the sub-vectors after the random linear projection in the dimensionality reduction step. After partitioning the multi-vector embedding into buckets, each bucket's aggregated vector is projected down to dprojections dimensions using a random matrix. A smaller value of dprojections helps in reducing the overall dimensionality of the final fixed-dimensional encoding, leading to lower memory consumption but potentially at the cost of some information loss and retrieval accuracy.

  • repetitions (int): The number of times the space partitioning and dimensionality reduction steps are repeated. This repetition allows for capturing different perspectives of the multi-vector embedding and can improve the robustness and accuracy of the final fixed-dimensional encoding. The resulting single vectors from each repetition are concatenated. A higher number of repetitions increases the dimensionality of the final encoding but can lead to a better approximation of the original multi-vector similarity.
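
As a quick check of the dimensionality formula above, here is a minimal sketch in plain Python using the illustrative values shown commented out in the configuration snippet (they are examples, not recommended defaults):

# Final dimensionality of a MUVERA encoding: repetitions * 2^ksim * dprojections
ksim = 4           # 2^4 = 16 buckets per repetition
dprojections = 16  # each bucket's aggregated vector is projected to 16 dimensions
repetitions = 20   # independent partition/projection rounds, concatenated

encoded_dimensions = repetitions * (2 ** ksim) * dprojections
print(encoded_dimensions)  # 20 * 16 * 16 = 5120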

Quantization

Quantization is also available as a compression technique for multi-vector embeddings. It reduces the memory footprint of individual vectors by approximating their values at lower precision. As with single-vector embeddings, multi-vector embeddings support PQ, BQ, and SQ quantization.
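
For example, a minimal sketch of enabling BQ on a multi-vector HNSW index might look like the following. It assumes a connected client as in the configuration example above, and that the quantizer parameter can be combined with the multi-vector settings shown earlier; the collection name is illustrative:

from weaviate.classes.config import Configure

client.collections.create(
    "QuantizedDemoCollection",
    vectorizer_config=[
        Configure.NamedVectors.text2colbert_jinaai(
            name="jina_colbert",
            source_properties=["text"],
            vector_index_config=Configure.VectorIndex.hnsw(
                # BQ quantizer applied to the multi-vector HNSW index
                quantizer=Configure.VectorIndex.Quantizer.bq(),
                multi_vector=Configure.VectorIndex.MultiVector.multi_vector(),
            ),
        ),
    ],
    # Additional parameters not shown
)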

Questions and feedback

If you have any questions or feedback, let us know in the user forum.