
Token Pooling to Scale Multi-Vector Retrieval Systems

Zain Hasan · 2 min read

A preview of the paper

🏹 Multi-vector retrieval approaches like ColBERT have great retrieval quality, but their vector count can balloon. AnswerAI proposes a solution!

Below is an explanation of how ColBERT works and AnswerAI's proposed modification!

Breakdown of different types of encoders:


Cross-encoder:

  • Document and query text strings are concatenated and passed into a cross-encoder, which then outputs a relevance rank/score.


Single-vector bi-encoder:

  • Document text is passed into an encoder, generating a document embedding
  • Query text is separately passed into an encoder, generating a query embedding
  • Similarity between the query and document embeddings is calculated
  • Retrieval performance can suffer, especially on out-of-domain data
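The single-vector flow above can be sketched as follows; the random vectors stand in for real encoder outputs, and the names and dimension are illustrative assumptions, not any particular library's API:

```python
# Single-vector bi-encoder scoring sketch: one embedding per text,
# compared with cosine similarity (toy vectors, not real encoder outputs).
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
doc_embedding = rng.random(384)    # pre-computed offline, once per document
query_embedding = rng.random(384)  # computed at query time

score = cosine_similarity(query_embedding, doc_embedding)
```

Because each document is a single vector computed in isolation, retrieval reduces to one nearest-neighbor lookup per query.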

Multi-vector bi-encoder (e.g. ColBERT):

  • Functions as a bi-encoder: all document representations are pre-computed in isolation
  • Similarity is computed between individual query and document token vectors, rather than between single whole-document embeddings.
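This token-level scoring (often called MaxSim, or late interaction) can be sketched in plain numpy; the toy embeddings and the function name below are illustrative assumptions, not ColBERT's actual implementation:

```python
# MaxSim (late-interaction) scoring sketch: for each query token, take its
# maximum similarity over all document tokens, then sum over query tokens.
import numpy as np

def maxsim_score(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    """Score one document against one query from their per-token vectors."""
    # Normalize rows so dot products are cosine similarities
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sim = q @ d.T                        # (n_query_tokens, n_doc_tokens)
    return float(sim.max(axis=1).sum())  # max over doc tokens, sum over query

rng = np.random.default_rng(0)
query_tokens = rng.random((8, 128))    # 8 query token vectors
doc_tokens = rng.random((100, 128))    # 100 document token vectors

score = maxsim_score(query_tokens, doc_tokens)
```

Note that the document side stores 100 vectors here instead of one, which is exactly the storage blow-up the next section describes.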

Main weakness of multi-vector approaches:

  1. Storage and memory usage balloon: each token in a document requires its own vector (as opposed to one document = one vector)

  2. Efficiently searching through many vectors per document is complicated

AnswerAI's Proposed Token Pooling Solution:

  • Main idea: many of the tokens likely carry somewhat redundant semantic information, so we can cluster them semantically!
  • Requires no model modification whatsoever, nor any complex processing, while greatly improving the scalability of easily updatable indexing methods like HNSW, which are typically harder to use with ColBERT.


  • Tokens are pooled by clustering similar tokens within a given document and averaging (mean pooling) their representations, so each cluster is represented by a single vector
  • After being pooled, the vectors are quantised to 2 bits using the ColBERTv2 quantisation approach
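The cluster-then-mean-pool step can be sketched as below, assuming hierarchical (Ward) clustering via scipy; the function name, parameters, and toy data are illustrative, not AnswerAI's actual code, and the 2-bit quantisation step is omitted:

```python
# Token pooling sketch: cluster a document's token vectors, then represent
# each cluster by the mean of its member vectors.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def pool_tokens(token_vectors: np.ndarray, pool_factor: int) -> np.ndarray:
    """Reduce the token-vector count by ~pool_factor via clustering + mean pooling."""
    n_tokens = token_vectors.shape[0]
    n_clusters = max(1, n_tokens // pool_factor)
    # Ward linkage groups semantically similar token embeddings together
    tree = linkage(token_vectors, method="ward")
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    # One mean-pooled vector per cluster
    return np.stack([token_vectors[labels == c].mean(axis=0)
                     for c in np.unique(labels)])

rng = np.random.default_rng(0)
doc_tokens = rng.random((32, 128))            # 32 token vectors for one document
pooled = pool_tokens(doc_tokens, pool_factor=2)  # ~16 vectors remain
```

Because pooling happens per document at indexing time, it composes with any downstream index structure and leaves the model untouched.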


  • All results are compared to the non-pooled ColBERT vector approach
  • Pooling by a factor of 2 achieves a 50% vector count reduction while retaining 100.6% retrieval performance on average.
  • A pool factor of 3 achieves a 66% reduction while retaining 99% performance.
  • A pool factor of 4 achieves a 75% reduction while retaining 97% performance.



🔗 Blog Link

Ready to start building?

Check out the Quickstart tutorial, and begin building amazing apps with the free trial of Weaviate Cloud (WCD).
