Multi-vector embeddings (ColBERT, ColPali, etc.)
In this section, we will explore how to use multi-vector embeddings in Weaviate. Multi-vector embeddings (implemented through models like ColBERT, ColPali, or ColQwen) represent each object or query using multiple vectors instead of a single vector. This approach enables more precise searching through "late interaction" - a technique that matches individual parts of texts rather than comparing them as whole units.
Multi-vector support was added in v1.29 as a technical preview.
This means that the feature is still under development and may change in future releases, including potential breaking changes. Currently, quantization is not supported for multi-vector embeddings.
We do not recommend using this feature in production environments at this time.
Prerequisites
Before starting this tutorial, ensure you have the following:
- An instance of Weaviate (e.g. on Weaviate Cloud, or locally).
- Your preferred Weaviate client library installed.
- An API key for Jina AI
- A free, "toy" key can be obtained from Jina AI.
Introduction
If you have used vector databases before, you may be familiar with the concept of a single vector representing an object. For example, the text "A very nice cat"
could be represented by a vector such as:
[0.0412, 0.1056, 0.5021, ...]
A multi-vector embedding, on the other hand, represents the same object using a set of vectors, i.e. a nested, two-dimensional structure. For example, the text "A very nice cat"
could be represented by a ColBERT embedding as:
[
[0.0543, 0.1941, 0.0451, ...],
[0.0123, 0.0567, 0.1234, ...],
...,
[0.4299, 0.0491, 0.9811, ...]
]
The core idea behind this representation is that the meaning of different parts of the text can be captured by different vectors. For example, the first vector might represent the token "A"
, the second vector might represent the token "very"
, and so on.
Multi-vector representations allow for more nuanced comparisons between objects, and therefore improved retrieval of similar objects.
Weaviate 1.29
introduces support for multi-vector embeddings, allowing you to store and search for objects using multi-vector embeddings.
This tutorial will show you how to use multi-vector embeddings in Weaviate, using either a ColBERT model integration (with JinaAI's model) or user-provided embeddings.
Jump to the section that interests you, or follow along with both.
Late interaction is an approach for computing similarity between texts that preserves fine-grained meaning by comparing individual parts of the text (like words or phrases). Models like ColBERT use this technique to achieve more precise text matching than traditional single-vector methods.
The following visualization shows how late interaction works in a ColBERT model, in comparison to a single-vector model.
More about late interaction
In a single-vector approach, two embeddings have the same dimensionality (e.g. 768), so their similarity can be calculated directly, e.g. as their dot product or cosine distance. In this case, the only interaction occurs when the two finished vectors are compared.
Another approach is an "early interaction" search, as seen in some "cross-encoder" models. In this approach, the query and the object interact throughout the embedding generation and comparison process. While this can lead to more accurate results, the challenge is that embeddings cannot be pre-calculated before the query is known. So, this approach is typically reserved for "reranker" models, where the candidate set is small.
Late interaction is a middle ground between these two approaches, using multi-vector embeddings.
Each multi-vector embedding is composed of multiple vectors, where each vector represents a portion of the object, such as a token. For example, one object's embedding may have a shape of (30, 64), meaning it has 30 vectors of 64 dimensions each, while another object's embedding may have a shape of (20, 64), i.e. 20 vectors of 64 dimensions each.
Late interaction takes advantage of this structure by finding the best match for each query token among all tokens in the target text (the MaxSim operation). For example, when searching for 'data science', each token-level query vector is compared with the most relevant part of a document, rather than trying to match a single vector for the entire phrase at once. The final similarity score combines these individual best matches. This token-level matching helps capture nuanced relationships and word order, making it especially effective for longer texts.
A late interaction search:
- Compares each query vector against each object vector
- Combines these token-level comparisons to produce a final similarity score
This approach often leads to better search results, as it can capture more nuanced relationships between objects.
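As a concrete illustration, here is a minimal sketch of a MaxSim-style late interaction score, written with NumPy. The shapes and the late_interaction_score helper are illustrative assumptions only; Weaviate performs the equivalent computation internally when you search a multi-vector index.

import numpy as np

def late_interaction_score(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    # query_vecs: (num_query_tokens, dim); doc_vecs: (num_doc_tokens, dim)
    # Pairwise dot products between every query token and every document token
    similarities = query_vecs @ doc_vecs.T  # shape: (num_query_tokens, num_doc_tokens)
    # MaxSim: for each query token, keep only its best-matching document token
    best_per_query_token = similarities.max(axis=1)
    # The final score aggregates these per-token maxima
    return float(best_per_query_token.sum())

# Illustrative shapes: a 4-token query and a 30-token document, both with 64-dimensional vectors
query_embedding = np.random.rand(4, 64)
doc_embedding = np.random.rand(30, 64)
print(late_interaction_score(query_embedding, doc_embedding))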
When to use multi-vector embeddings
Multi-vector embeddings are particularly useful for search tasks where word order and exact phrase matching are important. This is due to multi-vector embeddings preserving token-level information and enabling late interaction. However, multi-vector embeddings will typically require more resources than single-vector embeddings.
Although each vector in a multi-vector embedding is smaller than a single-vector embedding, the total size of the multi-vector embedding is typically larger, as each embedding contains many vectors. As an example, a single-vector embedding of 1536 dimensions takes (1536 * 4 bytes) ≈ 6 kB, while a multi-vector embedding of 64 vectors of 96 dimensions takes (64 * 96 * 4 bytes) ≈ 25 kB - roughly 4 times larger.
Multi-vector embeddings therefore require more memory to store and more compute to search.
The inference time and/or cost for embedding generation may also be higher, as multi-vector embeddings require more compute to generate.
Therefore, multi-vector embeddings are best suited for tasks where the benefits of late interaction are important, and the additional resources required are acceptable.
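As a quick check on the arithmetic above, the following sketch compares the two footprints, assuming 4-byte (float32) values:

# Assumes 4-byte (float32) values, as in the example above
single_vector_bytes = 1536 * 4       # 6,144 bytes, roughly 6 kB
multi_vector_bytes = 64 * 96 * 4     # 24,576 bytes, roughly 25 kB

print(f"Single-vector embedding: {single_vector_bytes / 1000:.1f} kB")
print(f"Multi-vector embedding: {multi_vector_bytes / 1000:.1f} kB")
print(f"Ratio: {multi_vector_bytes / single_vector_bytes:.1f}x")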
Option 1: ColBERT model integration
In this section, we will use Weaviate's model integration with JinaAI's ColBERT model to generate multi-vector embeddings for text data.
1.1. Connect to Weaviate
First, connect to your Weaviate instance using your preferred client library. In this example, we assume you are connecting to a local Weaviate instance. For other types of instances, replace the connection details as needed (connection examples).
- Python Client v4
import os
import weaviate
# Recommended: save sensitive data as environment variables
jinaai_key = os.getenv("JINAAI_APIKEY")
client = weaviate.connect_to_local(
    headers={"X-JinaAI-Api-Key": jinaai_key}
)
1.2. Collection configuration
Here, we define a collection called "DemoCollection"
. It has a named vector configured with the jina-colbert-v2
ColBERT model integration.
- Python Client v4
from weaviate.classes.config import Configure, Property, DataType
from weaviate.util import generate_uuid5
collection_name = "DemoCollection"
client.collections.create(
    collection_name,
    vectorizer_config=[
        # ColBERT vectorizer
        Configure.NamedVectors.text2colbert_jinaai(
            name="multi_vector",
            source_properties=["text"],
            model="jina-colbert-v2"
        ),
    ],
    properties=[
        Property(name="text", data_type=DataType.TEXT),
        Property(name="docid", data_type=DataType.TEXT),
    ],
    # Additional parameters not shown
)
1.3. Import data
Now, we can import the data. For this example, we will import a few arbitrary text objects.
Recall that we configured the model integration (for text2colbert-jinaai
) above. This enables Weaviate to obtain embeddings as needed.
- Python Client v4
# An example dataset
documents = [
{"id": "doc1", "text": "Weaviate is a vector database that is great for AI app builders."},
{"id": "doc2", "text": "PyTorch is a deep learning framework that is great for AI model builders."},
{"id": "doc3", "text": "For people building AI driven products, Weaviate is a good database for their tech stack."},
]
collection = client.collections.get(collection_name)
with collection.batch.fixed_size(batch_size=10) as batch:
    for doc in documents:
        # Iterate through the dataset & add to batch
        batch.add_object(
            properties={"text": doc["text"], "docid": doc["id"]},
            uuid=generate_uuid5(doc["id"]),
        )

# Check for errors in batch imports
if collection.batch.failed_objects:
    print(f"Number of failed imports: {len(collection.batch.failed_objects)}")
    print(f"First failed object: {collection.batch.failed_objects[0]}")

print(len(collection))  # This should print `3`
1.3.1. Confirm embedding shape
Let's retrieve an object and inspect the shape of its embeddings.
- Python Client v4
response = collection.query.fetch_objects(limit=3, include_vector=True)
print(f"Embedding data type: {type(response.objects[0].vector['multi_vector'])}")
print(f"Embedding first element type: {type(response.objects[0].vector['multi_vector'][0])}")
for i in range(3):
    # Inspect the shape of the fetched embeddings
    print(f"This embedding's shape is ({len(response.objects[i].vector['multi_vector'])}, {len(response.objects[i].vector['multi_vector'][0])})")
    print()
Inspecting the results, each embedding is composed of a list of lists (of floats).
Embedding data type: <class 'list'>
Embedding first element type: <class 'list'>
This embedding's shape is (22, 128)
This embedding's shape is (25, 128)
This embedding's shape is (22, 128)
Note this in contrast to a single vector, which would be a list of floats.
1.4. Perform queries
Now that we have imported the data, we can perform searches using the multi-vector embeddings. Let's see how to perform semantic, vector, and hybrid searches.
1.4.1. Near text search
Performing a near text, or semantic, search with a ColBERT embedding model integration is the same as with any other embedding model integration. The difference in the embeddings' shape is handled by Weaviate and is not visible to the user.
- Python Client v4
response = collection.query.near_text(
    query="A good database for AI app builders",
    target_vector="multi_vector",
)

for result in response.objects:
    print(result.properties)
1.4.2. Hybrid search (simple)
Similarly to the near text search, a hybrid search with a ColBERT embedding model integration is performed in the same way as with other embedding model integrations.
- Python Client v4
response = collection.query.hybrid(
    query="A good database for AI app builders",
    target_vector="multi_vector",
)

for result in response.objects:
    print(result.properties)
1.4.3. Vector search
When performing a manual vector search, the user must specify the query embedding. In this example, to search the multi_vector
index, the query vector must be a corresponding multi-vector.
Since we use JinaAI's jina-colbert-v2
model in the integration, we obtain the embedding manually through JinaAI's API to make sure the query embedding is compatible with the object embeddings.
Obtain the embedding manually
- Python Client v4
def get_colbert_embedding(source_text: str):
    # As shown in https://jina.ai/api-dashboard/embedding
    # For this example, this only retrieves one embedding at a time
    import requests
    import json

    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {jinaai_key}",
    }
    data = {
        "model": "jina-colbert-v2",
        "dimensions": 128,
        "input_type": "document",
        "embedding_type": "float",
        "input": [source_text],
    }
    response = requests.post(
        "https://api.jina.ai/v1/multi-vector", headers=headers, data=json.dumps(data)
    )
    response_data = json.loads(response.text)
    embedding = response_data["data"][0]["embeddings"]
    return embedding
- Python Client v4
response = collection.query.near_vector(
    near_vector=get_colbert_embedding("A good database for AI app builders"),  # Raw ColBERT embedding, in [[e11, e12, e13, ...], [e21, e22, e23, ...], ...] shape
    target_vector="multi_vector",
)

for result in response.objects:
    print(result.properties)
1.4.4. Hybrid search (manual vector)
In all other searches where a vector embedding is to be specifically provided, it must be a multi-vector embedding, as with the manual vector search shown above.
- Python Client v4
response = collection.query.hybrid(
    query="A good database for AI app builders",
    vector=get_colbert_embedding("A good database for AI app builders"),  # Provide a compatible multi-vector embedding for the vector part of the search
    target_vector="multi_vector",
)

for result in response.objects:
    print(result.properties)
Option 2: User-provided embeddings
In this section, we will use user-provided embeddings to populate Weaviate. This is useful when you want to use a different model than one that Weaviate integrates with.
Note that even if you are using a model integration, you can still provide your own embeddings. If an embedding is provided with an object, it is used instead of generating one through the model integration.
This allows you to reuse any pre-existing embeddings you may have, while benefiting from the convenience of a model integration for other objects.
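A minimal sketch of this pattern is shown below. The precomputed_embedding values and the doc4 object are placeholders for illustration; a real embedding must match the per-vector dimensionality of the other embeddings in the index.

# Hypothetical pre-existing multi-vector embedding (a list of lists of floats)
# Placeholder values; a real embedding must match the index's per-vector dimensionality
precomputed_embedding = [
    [0.0543, 0.1941, 0.0451],
    [0.0123, 0.0567, 0.1234],
]

collection = client.collections.get("DemoCollection")
collection.data.insert(
    properties={"text": "A document with a pre-existing embedding", "docid": "doc4"},
    # The provided embedding is used as-is; the model integration is skipped for this object
    vector=precomputed_embedding,
)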
2.1. Connect to Weaviate
First, connect to your Weaviate instance using your preferred client library. In this example, we assume you are connecting to a local Weaviate instance. For other types of instances, replace the connection details as needed (connection examples).
- Python Client v4
import os
import weaviate
# Recommended: save sensitive data as environment variables
jinaai_key = os.getenv("JINAAI_APIKEY")
client = weaviate.connect_to_local(
    headers={"X-JinaAI-Api-Key": jinaai_key}
)
2.2. Collection configuration
Here, we define a collection called "DemoCollection"
. Note that we do not use a model integration, as we will provide the embeddings manually.
The collection configuration explicitly enables the multi-vector
index option. This is necessary to handle the multi-vector embeddings.
- Python Client v4
from weaviate.classes.config import Configure, Property, DataType
from weaviate.util import generate_uuid5
collection_name = "DemoCollection"
client.collections.create(
    collection_name,
    vectorizer_config=[
        # User-provided embeddings
        Configure.NamedVectors.none(
            name="multi_vector",
            vector_index_config=Configure.VectorIndex.hnsw(
                # Enable multi-vector index with default settings
                multi_vector=Configure.VectorIndex.MultiVector.multi_vector()
            )
        ),
    ],
    properties=[
        Property(name="text", data_type=DataType.TEXT),
        Property(name="docid", data_type=DataType.TEXT),
    ],
    # Additional parameters not shown
)
2.3. Import data
Now, we can import the data. For this example, we will import a few arbitrary text objects.
Note that in this example, each object is sent to Weaviate along with its corresponding multi-vector embedding. Here we obtain Jina AI's ColBERT embeddings, but they could be any multi-vector embeddings.
Obtain the embedding manually
- Python Client v4
def get_colbert_embedding(source_text: str):
    # As shown in https://jina.ai/api-dashboard/embedding
    # For this example, this only retrieves one embedding at a time
    import requests
    import json

    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {jinaai_key}",
    }
    data = {
        "model": "jina-colbert-v2",
        "dimensions": 128,
        "input_type": "document",
        "embedding_type": "float",
        "input": [source_text],
    }
    response = requests.post(
        "https://api.jina.ai/v1/multi-vector", headers=headers, data=json.dumps(data)
    )
    response_data = json.loads(response.text)
    embedding = response_data["data"][0]["embeddings"]
    return embedding
- Python Client v4
# An example dataset
documents = [
{"id": "doc1", "text": "Weaviate is a vector database that is great for AI app builders."},
{"id": "doc2", "text": "PyTorch is a deep learning framework that is great for AI model builders."},
{"id": "doc3", "text": "For people building AI driven products, Weaviate is a good database for their tech stack."},
]
collection = client.collections.get(collection_name)
with collection.batch.fixed_size(batch_size=10) as batch:
    for doc in documents:
        # Iterate through the dataset & add to batch
        batch.add_object(
            properties={"text": doc["text"], "docid": doc["id"]},
            uuid=generate_uuid5(doc["id"]),
            vector=get_colbert_embedding(doc["text"]),  # Provide the embedding manually
        )

# Check for errors in batch imports
if collection.batch.failed_objects:
    print(f"Number of failed imports: {len(collection.batch.failed_objects)}")
    print(f"First failed object: {collection.batch.failed_objects[0]}")

print(len(collection))  # This should print `3`
2.4. Perform queries
Now that we have imported the data, we can perform searches using the multi-vector embeddings. Let's see how to perform vector and hybrid searches.
Note that near text
search is not possible with user-provided embeddings. In this configuration, Weaviate cannot convert a text query into a compatible embedding, as it does not know which model was used to generate the object embeddings.
2.4.1. Vector search
You can perform a manual vector search, by specifying the query embedding. In this example, we convert the query into a vector using the same (JinaAI's jina-colbert-v2
) model used to generate the object embeddings.
This ensures that the query embedding is compatible with the object embeddings.
Obtain the embedding manually
- Python Client v4
def get_colbert_embedding(source_text: str):
    # As shown in https://jina.ai/api-dashboard/embedding
    # For this example, this only retrieves one embedding at a time
    import requests
    import json

    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {jinaai_key}",
    }
    data = {
        "model": "jina-colbert-v2",
        "dimensions": 128,
        "input_type": "document",
        "embedding_type": "float",
        "input": [source_text],
    }
    response = requests.post(
        "https://api.jina.ai/v1/multi-vector", headers=headers, data=json.dumps(data)
    )
    response_data = json.loads(response.text)
    embedding = response_data["data"][0]["embeddings"]
    return embedding
- Python Client v4
response = collection.query.near_vector(
    near_vector=get_colbert_embedding("A good database for AI app builders"),  # Raw ColBERT embedding, in [[e11, e12, e13, ...], [e21, e22, e23, ...], ...] shape
    target_vector="multi_vector",
)

for result in response.objects:
    print(result.properties)
2.4.2. Hybrid search (manual vector)
To perform a hybrid search with user-provided embeddings, provide the query vector along with the hybrid query.
- Python Client v4
response = collection.query.hybrid(
    query="A good database for AI app builders",
    vector=get_colbert_embedding("A good database for AI app builders"),  # Provide a compatible multi-vector query embedding
    target_vector="multi_vector",
)

for result in response.objects:
    print(result.properties)
Summary
This tutorial showed how to use multi-vector embeddings in Weaviate.
Weaviate allows you to use multi-vector embeddings from v1.29
, with either the ColBERT model integration, or by providing your own embeddings.
Note that when using multi-vector embeddings, the vector index may need to be configured explicitly to handle them, because the shape of the embeddings differs from single-vector embeddings. This is done automatically when using a model integration such as ColBERT, but must be done manually when providing your own embeddings.
Once the data has been ingested, you can perform semantic, hybrid, and vector searches as usual. The main difference is in the shape of the embeddings, which must be taken into account when providing embeddings manually.
Further resources
Questions and feedback
If you have any questions or feedback, let us know in the user forum.