Hybrid Search Explained

Hybrid search is a technique that combines multiple search algorithms to improve the accuracy and relevance of search results. It pairs the strengths of keyword-based search algorithms with those of vector search techniques. By leveraging what each approach does best, it provides a more effective search experience for users.
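One common way to merge a keyword ranking with a vector ranking is weighted reciprocal rank fusion. The sketch below is generic and is not Weaviate's internal implementation. Each document earns `weight / (k + rank)` from each list it appears in. The `alpha` parameter is modeled on Weaviate's hybrid `alpha` convention (0 = pure keyword, 1 = pure vector). The constant `k=60` is a commonly used default and is an assumption here. The function name `hybrid_fuse` and the document IDs are hypothetical.

```python
def hybrid_fuse(keyword_ranking, vector_ranking, alpha=0.5, k=60):
    """Fuse two ranked lists of document IDs into one hybrid ranking.

    Hypothetical helper (not a Weaviate API). alpha=0.0 uses only the
    keyword ranking; alpha=1.0 uses only the vector ranking. Each list adds
    weight / (k + rank) to every document it contains (reciprocal rank fusion).
    """
    scores = {}
    for weight, ranking in ((1 - alpha, keyword_ranking), (alpha, vector_ranking)):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


keyword_results = ["doc_a", "doc_b", "doc_c"]  # e.g. from BM25
vector_results = ["doc_c", "doc_a", "doc_d"]   # e.g. from a dense-vector index
print(hybrid_fuse(keyword_results, vector_results))
# ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

In this example, `doc_a` ranks high in both lists, so it wins the fused ranking. `doc_d` appears in only one list and only at rank 3, so it ends up last.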
The hybrid search feature was introduced in Weaviate 1.17. It uses sparse and dense vectors to represent the semantic meaning and context of search queries and documents.
In this blog post, you will learn what hybrid search is, the role of sparse and dense vectors, and when to use hybrid search. You will also learn how hybrid search is implemented in Weaviate and how to use it.
Sparse and Dense Vectors
Sparse and dense vectors are computed with different algorithms. Sparse vectors consist mostly of zeros with only a few non-zero values, while dense vectors consist mostly of non-zero values. Sparse embeddings are generated from algorithms like BM25 and SPLADE. Dense embeddings are generated from machine learning models like GloVe and Transformer-based models.
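To make the difference concrete, here is a toy sketch. It assumes a hypothetical 8-term vocabulary and builds a term-count sparse vector that stores only its non-zero entries, then compares its density with a made-up dense embedding. Real vocabularies contain tens of thousands of terms, so real sparse vectors are far sparser than this. The dense values are illustrative numbers, not output from GloVe or any Transformer model.

```python
# Hypothetical vocabulary; each term maps to one dimension of the sparse vector.
VOCAB = ["hybrid", "search", "vector", "keyword", "bm25", "dense", "sparse", "weaviate"]

# Made-up dense embedding: every dimension carries a (typically non-zero) value.
DENSE_EXAMPLE = [0.12, -0.48, 0.33, 0.05, -0.27, 0.91, -0.06, 0.40]


def sparse_vector(text):
    """Term counts over VOCAB, stored as {dimension_index: count}; zeros are omitted."""
    counts = {}
    for token in text.lower().split():
        if token in VOCAB:
            idx = VOCAB.index(token)
            counts[idx] = counts.get(idx, 0) + 1
    return counts


def to_dense(sparse, size):
    """Expand a {index: value} sparse vector into a full list, filling zeros."""
    return [float(sparse.get(i, 0)) for i in range(size)]


def density(vec):
    """Fraction of dimensions that are non-zero."""
    return sum(1 for v in vec if v != 0) / len(vec)


sv = sparse_vector("hybrid search with sparse vectors")
print(sv)                                  # {0: 1, 1: 1, 6: 1}
print(density(to_dense(sv, len(VOCAB))))   # 0.375
print(density(DENSE_EXAMPLE))              # 1.0
```

Storing only the non-zero entries is what makes sparse vectors cheap to keep and compare, even when the vocabulary is very large.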
Note that the current implementation of hybrid search in Weaviate combines BM25/BM25F keyword search with vector search.
If you’re interested in learning how dense vector indexes are built and optimized in Weaviate, check out this article.
BM25
BM25 builds on the keyword scoring method TF-IDF (Term Frequency-Inverse Document Frequency). It derives its IDF weight from the Binary Independence Model and adds a normalization penalty that weighs a document’s length relative to the average length of all the documents in the database.
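The following sketch scores a tiny corpus with Okapi BM25 in plain Python. It makes several assumptions that are not confirmed by this post:
- Tokenization is naive whitespace splitting.
- `k1=1.2` and `b=0.75` are used as the usual defaults.
- The IDF is the non-negative "+1" variant, `ln((N - n + 0.5) / (n + 0.5) + 1)`, as used in Lucene.

This is not Weaviate's implementation, and `bm25_scores` is a hypothetical helper name.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every document in `docs` against `query` with Okapi BM25."""
    tokenized = [doc.lower().split() for doc in docs]  # naive tokenization (assumption)
    n_docs = len(tokenized)
    avgdl = sum(len(tokens) for tokens in tokenized) / n_docs  # average document length

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Length normalization: documents longer than average are penalized (b controls how much).
        norm = k1 * (1 - b + b * len(tokens) / avgdl)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a higher IDF weight; the "+1" keeps it non-negative.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term-frequency component saturates as tf grows (controlled by k1).
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores


docs = [
    "hybrid search combines keyword and vector search",
    "vector search uses dense embeddings",
    "bm25 is a keyword scoring function",
]
scores = bm25_scores("keyword search", docs)
# The first document matches both query terms, so it ranks highest.
print(sorted(range(len(docs)), key=lambda i: -scores[i]))
```

The second and third documents each match a single query term with the same IDF. The second scores higher only because it is shorter than average, which is the effect of the length normalization penalty.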
The formula below presents the scoring calculation of BM25: