Hybrid searches
Overview
A hybrid search combines the bm25 search that you just learned about with a vector search, producing a ranking from a combination of the two sets of results.
This can help when a vector search or a keyword search alone is not producing the desired results. For example, it may be useful when a vector search alone is returning too many irrelevant results, and you want particular keywords to influence which results rank highest.
About hybrid queries
How it works
A hybrid search works by combining the results of a bm25 search with the results of a vector search. More specifically, it uses a combination of each result's BM25F search ranking and its vector search ranking among the set of results.
The final score for each result is the sum of the inverses (reciprocals) of its BM25F ranking and its vector search ranking, with any weighting (alpha) applied if applicable. The results are then ranked by this final score.
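Written as a formula, one simplified reading of this description is shown below. This is an assumption for illustration, not Weaviate's exact implementation, which may differ in details (for example, by adding a constant to each rank before inverting it). Here $r_{\text{vector}}$ and $r_{\text{BM25F}}$ are a result's 1-based rankings in each search, and $\alpha$ is the alpha weight described later in this lesson:

$$
s = \alpha \cdot \frac{1}{r_{\text{vector}}} + (1 - \alpha) \cdot \frac{1}{r_{\text{BM25F}}}
$$

With $\alpha = 0.5$, $s$ is exactly half of the unweighted sum used in the example below, so the order of the results is the same.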
This has the effect of rewarding results that score high in at least one of the searches. For example, take the following five results:
- Result 1: BM25F ranking = 5, vector search ranking = 1 -> Total score: 1.2
- Result 2: BM25F ranking = 4, vector search ranking = 2 -> Total score: 0.75
- Result 3: BM25F ranking = 3, vector search ranking = 3 -> Total score: 0.67
- Result 4: BM25F ranking = 2, vector search ranking = 4 -> Total score: 0.75
- Result 5: BM25F ranking = 1, vector search ranking = 5 -> Total score: 1.2
In this example, results 1 and 5 end up as the top results because each ranked first in one of the searches. On the other hand, result 3, which was middle-of-the-pack in both searches, ends up as the lowest-ranked result.
So, hybrid search brings results that score highly in at least one of the searches to the top, while results that are middling in both searches end up at the lower end of the combined ranking.
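The arithmetic in the five-result example can be reproduced with a few lines of Python. This is a minimal sketch of the simplified, unweighted scoring described above (no alpha); the fused_score helper is hypothetical and not part of Weaviate or its client.

```python
# Simplified, unweighted fusion from the example above:
# score = 1 / BM25F ranking + 1 / vector search ranking (rank 1 = best)
def fused_score(bm25_rank: int, vector_rank: int) -> float:
    """Sum of the inverses of the two 1-based rankings."""
    return 1 / bm25_rank + 1 / vector_rank


# (BM25F ranking, vector search ranking) for the five example results
rankings = {
    "Result 1": (5, 1),
    "Result 2": (4, 2),
    "Result 3": (3, 3),
    "Result 4": (2, 4),
    "Result 5": (1, 5),
}

scores = {name: fused_score(b, v) for name, (b, v) in rankings.items()}

# Re-rank by fused score, highest first (ties keep their original order)
reranked = sorted(scores, key=scores.get, reverse=True)

for name in reranked:
    print(f"{name}: {scores[name]:.2f}")
```

Results 1 and 5 come out on top with 1.20 each, while result 3, with 0.67, comes last.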
hybrid query syntax
A hybrid query is shown below. Each hybrid query:
- Must include a query string, which can be any length,
- Can optionally include a list of properties to search,
- Can optionally include an alpha value,
- Can optionally include a vector to search for,
- Can optionally request a score and an explainScore value for each result.
import json
import weaviate

# Assumption: a Weaviate instance at this URL holding the JeopardyQuestion data.
# This example uses the v3 Python client syntax.
client = weaviate.Client("http://localhost:8080")

response = (
    client.query
    .get("JeopardyQuestion", ["question", "answer"])
    .with_hybrid(
        query="food",  # Query string
        properties=["question", "answer"],  # Searched properties
        vector=None  # Manually provide a vector; if not, Weaviate will vectorize the query string
    )
    .with_additional(["score", "explainScore"])  # Include score & explainScore in the response
    .with_limit(3)
    .do()
)

print(json.dumps(response, indent=2))
The above query will return the top 3 objects based on their BM25F score and nearText similarity to the query string "food". The query will search the question and answer properties of the objects for the BM25F score (while the object vectors remain unaffected by the properties selection).
Example JSON response:
{
"data": {
"Get": {
"JeopardyQuestion": [
{
"_additional": {
"explainScore": "(bm25)\n(hybrid) Document df958a90-c3ad-5fde-9122-cd777c22da6c contributed 0.003968253968253968 to the score\n(hybrid) Document df958a90-c3ad-5fde-9122-cd777c22da6c contributed 0.012295081967213115 to the score",
"score": "0.016263336"
},
"answer": "a closer grocer",
"question": "A nearer food merchant"
},
{
"_additional": {
"explainScore": "(vector) [0.022335753 -0.027532013 -0.0061008437 0.0023294748 -0.00041679747 -0.007862403 -0.018513374 -0.037407625 -0.004291675 -0.012575763]... \n(hybrid) Document ec776112-e651-519d-afd1-b48e6237bbcb contributed 0.012096774193548387 to the score",
"score": "0.012096774"
},
"answer": "Famine",
"question": "From the Latin for \"hunger\", it's a period when food is extremely scarce"
},
{
"_additional": {
"explainScore": "(vector) [0.022335753 -0.027532013 -0.0061008437 0.0023294748 -0.00041679747 -0.007862403 -0.018513374 -0.037407625 -0.004291675 -0.012575763]... \n(hybrid) Document 98807640-cd16-507d-86a1-801902d784de contributed 0.011904761904761904 to the score",
"score": "0.011904762"
},
"answer": "Tofu",
"question": "A popular health food, this soybean curd is used to make a variety of dishes & an ice cream substitute"
}
]
}
}
}
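In the response, each object's score arrives as a string inside its _additional field. The following is a minimal sketch of pulling the scores out in Python, using a trimmed copy of the response above (only answer and score are kept); the scored_answers helper is hypothetical, not part of the Weaviate client. It also checks that the top object's score is the sum of the two contributions listed in its explainScore.

```python
from typing import List, Tuple

# Trimmed copy of the response above: only the answer and score are kept
response = {
    "data": {
        "Get": {
            "JeopardyQuestion": [
                {"_additional": {"score": "0.016263336"}, "answer": "a closer grocer"},
                {"_additional": {"score": "0.012096774"}, "answer": "Famine"},
                {"_additional": {"score": "0.011904762"}, "answer": "Tofu"},
            ]
        }
    }
}


def scored_answers(response: dict, class_name: str, prop: str = "answer") -> List[Tuple[str, float]]:
    """Return (property value, score) pairs from a GraphQL Get response.

    The score is returned as a string, so it is converted to float here.
    """
    objects = response["data"]["Get"][class_name]
    return [(obj[prop], float(obj["_additional"]["score"])) for obj in objects]


for answer, score in scored_answers(response, "JeopardyQuestion"):
    print(f"{score:.6f}  {answer}")

# The top object's explainScore lists two contributions; their sum is its score
contributions = [0.003968253968253968, 0.012295081967213115]
print(round(sum(contributions), 9))  # 0.016263336
```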
hybrid search parameters
A hybrid search includes multiple parameters, some of which you may be familiar with from the earlier bm25 search discussions.
The query parameter and properties parameter are the same as in a bm25 search, with the exception that the boost parameter is currently not supported in a hybrid search. Some of the parameters, however, are unique to a hybrid search.
alpha
The optional alpha parameter determines the weighting of the BM25 search ranking and the vector search ranking. If you do not include an alpha parameter, the hybrid search will use a default value of 0.75, which weights the vector search ranking more heavily than the BM25 search ranking.
An alpha value of 1 is the same as a pure vector search, whereas an alpha value of 0 is the same as a pure BM25 search.
Try adding an alpha parameter to the query above and varying its value. What happens to the results?
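To build intuition for alpha without a running Weaviate instance, here is a sketch that applies an alpha-weighted version of the simplified scoring to the five-result example from earlier. The weighting formula and the weighted_score and rerank helpers are assumptions for illustration; Weaviate's own fusion algorithm differs in details.

```python
# Simplified weighted fusion (an assumption for illustration, not Weaviate's exact formula):
# score = alpha * (1 / vector_rank) + (1 - alpha) * (1 / bm25_rank)
def weighted_score(bm25_rank: int, vector_rank: int, alpha: float) -> float:
    return alpha * (1 / vector_rank) + (1 - alpha) * (1 / bm25_rank)


# (BM25F ranking, vector search ranking) for the five example results
rankings = {
    "Result 1": (5, 1),
    "Result 2": (4, 2),
    "Result 3": (3, 3),
    "Result 4": (2, 4),
    "Result 5": (1, 5),
}


def rerank(alpha: float) -> list:
    """Return result names ordered by weighted score, highest first."""
    scores = {name: weighted_score(b, v, alpha) for name, (b, v) in rankings.items()}
    return sorted(scores, key=scores.get, reverse=True)


for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(alpha, rerank(alpha))
```

With alpha = 1.0 the order matches the vector search ranking, and with alpha = 0.0 it matches the BM25F ranking. With alpha = 0.5 each score is half of the unweighted sum from the earlier example, so results 1 and 5 stay on top.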
Review
Key takeaways
- A hybrid search combines bm25 search with vector search, producing rankings from a combination of the two results.
- Hybrid search is helpful when a vector search or a keyword search alone is not producing the desired results.
- Hybrid search orders its search results by summing the inverses of the vector and bm25 rankings.
Questions and feedback
If you have any questions or feedback, let us know in the user forum.