A hybrid search combines the bm25 search that you just learned about with a vector search, producing rankings from a combination of the two results.
This can produce helpful results when a vector search or a keyword search alone is not producing desired results. For example, it may be useful when a vector search alone is producing too many irrelevant results, and you want particular keywords to weight the results a certain way.
How it works
A hybrid search works by combining the results of a bm25 search with the results of a vector search. More specifically, it uses each result's BM25F search ranking and its vector search ranking among the set of results.
The inverse of the BM25F ranking and the inverse of the vector search ranking are summed to produce a final score for each result, with any weighting (alpha) applied if applicable. The final score is then used to rank the results.
This has the effect of rewarding results that score high in at least one of the searches. For example, take the following five results:
- Result 1: BM25F ranking = 5, vector search ranking = 1 -> Total score: 1.2
- Result 2: BM25F ranking = 4, vector search ranking = 2 -> Total score: 0.75
- Result 3: BM25F ranking = 3, vector search ranking = 3 -> Total score: 0.67
- Result 4: BM25F ranking = 2, vector search ranking = 4 -> Total score: 0.75
- Result 5: BM25F ranking = 1, vector search ranking = 5 -> Total score: 1.2
In this example, results 1 and 5 end up being the top results, because they scored high in at least one of the searches. On the other hand, result 3, which was middle-of-the-pack in both searches, ends up being the lowest-ranked result.
So, hybrid search will bring to the top results that score high in at least one of the searches, while middling results will end up in the lower end of the re-ranking.
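The arithmetic in the example above can be reproduced with a few lines of Python. This is only a sketch of the unweighted calculation described in this section (the function and variable names are made up for illustration); it is not Weaviate's internal implementation.

```python
def hybrid_score(bm25_rank: int, vector_rank: int) -> float:
    """Sum of the inverse rankings, with no alpha weighting applied."""
    return 1 / bm25_rank + 1 / vector_rank


# (BM25F ranking, vector search ranking) for the five example results
rankings = {
    "Result 1": (5, 1),
    "Result 2": (4, 2),
    "Result 3": (3, 3),
    "Result 4": (2, 4),
    "Result 5": (1, 5),
}

# Round to two decimals to match the figures quoted in the text
scores = {name: round(hybrid_score(b, v), 2) for name, (b, v) in rankings.items()}

# Highest combined score first
reranked = sorted(scores, key=scores.get, reverse=True)

for name in reranked:
    print(name, scores[name])
```

Results 1 and 5 come out on top, and result 3 comes out last, matching the re-ranking described above.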
hybrid query syntax
A hybrid query is shown below. Each hybrid query:
- Must include a query string, which can be any length,
- Can optionally include a list of properties to search,
- Can optionally include an alpha value,
- Can optionally include a vector to search for,
- Can optionally request a score and an explainScore value for each result.
response = (
    client.query  # `client`: a connected weaviate.Client (v3 Python client syntax)
    .get("JeopardyQuestion", ["question", "answer"])
    .with_hybrid(
        query="food",  # Query string
        properties=["question", "answer"],  # Searched properties
        vector=None  # Manually provide a vector; if not, Weaviate will vectorize the query string
    )
    .with_additional(["score", "explainScore"])  # Include score & explainScore in the response
    .with_limit(3)  # Return the top 3 objects
    .do()
)
The above query will return the top 3 objects based on their BM25F scores and nearText similarity, using the query string "food". The query will search the question and answer properties of the objects for the BM25F score (while the object vectors remain unaffected by the properties selection).
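The explainScore value requested by this query reports how much each search contributed to a result's score. As a rough sketch, the contributions can be pulled out of that string and added up. The parsing below assumes the "contributed <number> to the score" wording seen in this query's response; the format is not a documented contract and may change, and the helper name is made up for illustration.

```python
import re

# explainScore string copied from the first result of the query's response
explain_score = (
    "(bm25)\n(hybrid) Document df958a90-c3ad-5fde-9122-cd777c22da6c "
    "contributed 0.003968253968253968 to the score\n"
    "(hybrid) Document df958a90-c3ad-5fde-9122-cd777c22da6c "
    "contributed 0.012295081967213115 to the score"
)


def sum_contributions(text: str) -> float:
    """Add up every 'contributed X to the score' value found in the text."""
    values = re.findall(r"contributed ([0-9.]+) to the score", text)
    return sum(float(v) for v in values)


total = sum_contributions(explain_score)
print(total)
```

An object found by only one of the two searches has a single contribution line, so the same helper works for it unchanged.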
An excerpt of the JSON response:
"explainScore": "(bm25)\n(hybrid) Document df958a90-c3ad-5fde-9122-cd777c22da6c contributed 0.003968253968253968 to the score\n(hybrid) Document df958a90-c3ad-5fde-9122-cd777c22da6c contributed 0.012295081967213115 to the score",
"answer": "a closer grocer",
"question": "A nearer food merchant"
"explainScore": "(vector) [0.022335753 -0.027532013 -0.0061008437 0.0023294748 -0.00041679747 -0.007862403 -0.018513374 -0.037407625 -0.004291675 -0.012575763]... \n(hybrid) Document ec776112-e651-519d-afd1-b48e6237bbcb contributed 0.012096774193548387 to the score",
"question": "From the Latin for \"hunger\", it's a period when food is extremely scarce"
"explainScore": "(vector) [0.022335753 -0.027532013 -0.0061008437 0.0023294748 -0.00041679747 -0.007862403 -0.018513374 -0.037407625 -0.004291675 -0.012575763]... \n(hybrid) Document 98807640-cd16-507d-86a1-801902d784de contributed 0.011904761904761904 to the score",
"question": "A popular health food, this soybean curd is used to make a variety of dishes & an ice cream substitute"
hybrid search parameters
A hybrid search includes multiple parameters, some of which you may be familiar with from the earlier bm25 search discussions.
The query parameter and properties parameter are the same as in a bm25 search, with the exception that currently, the boost parameter is not supported in a hybrid search. Some of the parameters, however, are unique to a hybrid search.
The alpha parameter determines the weighting of the BM25 search ranking and the vector search ranking. If you do not include an alpha parameter, the hybrid search will use a default value of 0.5, which weights each equally. An alpha value of 1 is the same as a pure vector search, whereas an alpha value of 0 is the same as a pure BM25 search.
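To build intuition for alpha, here is a minimal sketch of one way such a weighting could work. It assumes alpha linearly interpolates between the two inverse rankings, which is consistent with the endpoints described above (1 for pure vector, 0 for pure BM25) but is an assumption made for illustration; Weaviate's actual formula may differ in its details.

```python
def weighted_hybrid_score(bm25_rank: int, vector_rank: int, alpha: float = 0.5) -> float:
    """Weight the inverse rankings by alpha (assumed linear interpolation).

    alpha = 1 uses only the vector ranking; alpha = 0 uses only the BM25 ranking.
    """
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must be between 0 and 1")
    return (1 - alpha) / bm25_rank + alpha / vector_rank


# A result ranked 5th by BM25 and 1st by vector search
for alpha in (0, 0.5, 1):
    print(alpha, weighted_hybrid_score(bm25_rank=5, vector_rank=1, alpha=alpha))
```

Under this assumption, raising alpha favors results that rank well in the vector search, and lowering it favors results that rank well in the BM25 search.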
Try varying the alpha parameter above. What happens to the results?
- A hybrid search combines bm25 search with vector search, producing rankings from a combination of the two results.
- Hybrid search is helpful when a vector search or a keyword search alone is not producing desired results.
- Hybrid search orders its search results by summing the inverse of the vector and