Hybrid searches

Overview

A hybrid search combines the bm25 search that you just learned about with a vector search, and ranks results using a combination of the two.

This can produce helpful results when a vector search or a keyword search alone is not producing desired results. For example, it may be useful when a vector search alone is producing too many irrelevant results, and you want particular keywords to weight the results a certain way.

About hybrid queries

How it works

A hybrid search works by combining the results of a bm25 search with the results of a vector search. More specifically, it uses a combination of each result's BM25F search ranking and its vector search ranking among the set of results.

The final score for each result is the sum of the inverse of its BM25F ranking and the inverse of its vector search ranking, with a weighting (alpha) applied where one is specified. The results are then ordered by this final score.

This has the effect of rewarding results that rank high in at least one of the searches. For example, take the following five results:

  • Result 1: BM25F ranking = 5, vector search ranking = 1 -> Total score: 1/5 + 1/1 = 1.2
  • Result 2: BM25F ranking = 4, vector search ranking = 2 -> Total score: 1/4 + 1/2 = 0.75
  • Result 3: BM25F ranking = 3, vector search ranking = 3 -> Total score: 1/3 + 1/3 ≈ 0.67
  • Result 4: BM25F ranking = 2, vector search ranking = 4 -> Total score: 1/2 + 1/4 = 0.75
  • Result 5: BM25F ranking = 1, vector search ranking = 5 -> Total score: 1/1 + 1/5 = 1.2

In this example, results 1 and 5 end up being the top results, because they scored high in at least one of the searches. On the other hand, result 3, which was middle-of-the-pack in both searches, ends up being the lowest-ranked result.

So, hybrid search will bring to the top results that score high in at least one of the searches, while middling results will end up in the lower end of the re-ranking.
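The arithmetic in the example above can be reproduced with a few lines of Python. This is a minimal sketch of the unweighted sum-of-inverse-ranks idea as described in this section, not Weaviate's internal implementation; the function name `fused_score` and the `rankings` dictionary are made up for illustration.

```python
def fused_score(bm25_rank: int, vector_rank: int) -> float:
    """Sum of inverse ranks, as described above (ranks start at 1)."""
    return 1 / bm25_rank + 1 / vector_rank

# (BM25F ranking, vector search ranking) for the five example results
rankings = {
    "Result 1": (5, 1),
    "Result 2": (4, 2),
    "Result 3": (3, 3),
    "Result 4": (2, 4),
    "Result 5": (1, 5),
}

scores = {name: fused_score(b, v) for name, (b, v) in rankings.items()}

# Highest fused score first; ties keep their original order
ordered = sorted(scores, key=scores.get, reverse=True)

for name in ordered:
    print(f"{name}: {scores[name]:.2f}")
```

Results 1 and 5 come out on top and Result 3 comes out last, matching the list above.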

hybrid query syntax

A hybrid query is shown below. Each hybrid query:

  • Must include a query string, which can be any length,
  • Can optionally include a list of properties to search,
  • Can optionally include an alpha value,
  • Can optionally include a vector to search for,
  • Can optionally request a score and an explainScore value for each result.

import json

# Assumes `client` is an already-connected Weaviate client instance
response = (
    client.query
    .get("JeopardyQuestion", ["question", "answer"])
    .with_hybrid(
        query="food",  # Query string
        properties=["question", "answer"],  # Searched properties
        vector=None,  # Manually provide a vector; if not, Weaviate will vectorize the query string
    )
    .with_additional(["score", "explainScore"])  # Include score & explainScore in the response
    .with_limit(3)
    .do()
)

print(json.dumps(response, indent=2))

The above query returns the top 3 objects based on a combination of their BM25F ranking and their nearText (vector) similarity ranking for the query string "food". The BM25F search covers only the question and answer properties of the objects; the properties selection does not affect the object vectors.

See the JSON response
{
  "data": {
    "Get": {
      "JeopardyQuestion": [
        {
          "_additional": {
            "explainScore": "(bm25)\n(hybrid) Document df958a90-c3ad-5fde-9122-cd777c22da6c contributed 0.003968253968253968 to the score\n(hybrid) Document df958a90-c3ad-5fde-9122-cd777c22da6c contributed 0.012295081967213115 to the score",
            "score": "0.016263336"
          },
          "answer": "a closer grocer",
          "question": "A nearer food merchant"
        },
        {
          "_additional": {
            "explainScore": "(vector) [0.022335753 -0.027532013 -0.0061008437 0.0023294748 -0.00041679747 -0.007862403 -0.018513374 -0.037407625 -0.004291675 -0.012575763]... \n(hybrid) Document ec776112-e651-519d-afd1-b48e6237bbcb contributed 0.012096774193548387 to the score",
            "score": "0.012096774"
          },
          "answer": "Famine",
          "question": "From the Latin for \"hunger\", it's a period when food is extremely scarce"
        },
        {
          "_additional": {
            "explainScore": "(vector) [0.022335753 -0.027532013 -0.0061008437 0.0023294748 -0.00041679747 -0.007862403 -0.018513374 -0.037407625 -0.004291675 -0.012575763]... \n(hybrid) Document 98807640-cd16-507d-86a1-801902d784de contributed 0.011904761904761904 to the score",
            "score": "0.011904762"
          },
          "answer": "Tofu",
          "question": "A popular health food, this soybean curd is used to make a variety of dishes & an ice cream substitute"
        }
      ]
    }
  }
}
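To work with these values in code, it helps to know that score arrives as a string and that explainScore is free text. The sketch below assumes the response has the shape shown above; the helper name `summarize` and the regular expression are illustrative only, and the explainScore wording may differ between Weaviate versions. It converts each score to a float and pulls the per-search contributions out of explainScore.

```python
import re

# Trimmed copy of the response above; the vector preview in explainScore is shortened
response = {
    "data": {"Get": {"JeopardyQuestion": [
        {
            "_additional": {
                "explainScore": (
                    "(bm25)\n(hybrid) Document df958a90-c3ad-5fde-9122-cd777c22da6c "
                    "contributed 0.003968253968253968 to the score\n"
                    "(hybrid) Document df958a90-c3ad-5fde-9122-cd777c22da6c "
                    "contributed 0.012295081967213115 to the score"
                ),
                "score": "0.016263336",
            },
            "answer": "a closer grocer",
        },
        {
            "_additional": {
                "explainScore": (
                    "(vector) [...]... \n(hybrid) Document ec776112-e651-519d-afd1-b48e6237bbcb "
                    "contributed 0.012096774193548387 to the score"
                ),
                "score": "0.012096774",
            },
            "answer": "Famine",
        },
    ]}}
}

# Matches the "contributed <number> to the score" fragments in explainScore
CONTRIBUTION = re.compile(r"contributed ([0-9.]+) to the score")

def summarize(response, class_name):
    """Return (answer, score, contributions) tuples for each returned object."""
    rows = []
    for obj in response["data"]["Get"][class_name]:
        extra = obj["_additional"]
        contributions = [float(c) for c in CONTRIBUTION.findall(extra["explainScore"])]
        rows.append((obj["answer"], float(extra["score"]), contributions))
    return rows

rows = summarize(response, "JeopardyQuestion")
for answer, score, contributions in rows:
    print(answer, score, contributions)
```

In the response above, the first object's score is the sum of its two listed contributions, while the other objects list a single contribution each.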

hybrid search parameters

A hybrid search includes multiple parameters, some of which you may be familiar with from the earlier bm25 search discussions.

The query parameter and properties parameter are the same as in a bm25 search, with the exception that currently, the boost parameter is not supported in a hybrid search. Some of the parameters, however, are unique to a hybrid search.

alpha

The optional alpha parameter determines the weighting of the BM25 search ranking and the vector search ranking. If you do not include an alpha parameter, the hybrid search uses a default value of 0.75, which weights the vector search ranking more heavily. An alpha value of 0.5 weights each equally.

An alpha value of 1 is the same as a pure vector search, whereas an alpha value of 0 is the same as a pure BM25 search.
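To build intuition for the two extremes, here is a small sketch that applies alpha as a linear weight on the inverse ranks. The weighting formula, the function names `weighted_score` and `order`, and the sample rankings are assumptions made for illustration; this section does not spell out how Weaviate applies alpha internally, so treat this as one plausible reading rather than the actual implementation.

```python
def weighted_score(bm25_rank: int, vector_rank: int, alpha: float) -> float:
    """Assumed weighting: alpha scales the vector term, (1 - alpha) the BM25 term."""
    return alpha * (1 / vector_rank) + (1 - alpha) * (1 / bm25_rank)

def order(rankings: dict, alpha: float) -> list:
    """Return result names sorted by weighted score, highest first."""
    return sorted(
        rankings,
        key=lambda name: weighted_score(*rankings[name], alpha=alpha),
        reverse=True,
    )

# (BM25 ranking, vector search ranking) for five hypothetical results
rankings = {
    "Result 1": (5, 1),
    "Result 2": (4, 2),
    "Result 3": (3, 3),
    "Result 4": (2, 4),
    "Result 5": (1, 5),
}

print(order(rankings, alpha=1))  # follows the vector search ranking only
print(order(rankings, alpha=0))  # follows the BM25 ranking only
```

Under this assumption, alpha = 1 reproduces the vector search order and alpha = 0 reproduces the BM25 order, which is the behavior described above.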

Exercise

Try varying the alpha parameter above. What happens to the results?

Review

  Question
How does a hybrid search order its search results?

Review exercise

Key takeaways

  • A hybrid search combines bm25 search with vector search, producing rankings from a combination of the two results.
  • Hybrid search is helpful when a vector search or a keyword search alone is not producing desired results.
  • Hybrid search orders its search results by summing the inverses of the vector and bm25 rankings.

Questions and feedback

If you have any questions or feedback, please let us know on our forum. For example, you can: