Skip to main content

Hybrid search

Hybrid search combines results of a vector search and a keyword (BM25F) search. You can set the weights or the ranking method.

Named vectors

Added in v1.24

A hybrid on collections with named vectors configured must include a target vector name in the query. This allows Weaviate to find the correct vector to compare with the query vector.

reviews = client.collections.get("WineReviewNV")
response = reviews.query.hybrid(
query="A French Riesling",
target_vector="title_country",
limit=3
)

for o in response.objects:
print(o.properties)
Example response

The output is like this:


Combines results of a vector search and a keyword search based on the query string.

jeopardy = client.collections.get("JeopardyQuestion")
response = jeopardy.query.hybrid(
query="food",
limit=3
)

for o in response.objects:
print(o.properties)
Example response

The output is like this:

{
"data": {
"Get": {
"JeopardyQuestion": [
{
"answer": "a closer grocer",
"question": "A nearer food merchant"
},
{
"answer": "Famine",
"question": "From the Latin for \"hunger\", it's a period when food is extremely scarce"
},
{
"answer": "Tofu",
"question": "A popular health food, this soybean curd is used to make a variety of dishes & an ice cream substitute"
}
]
}
}
}

Explain the search results

Use the metadata properties to understand why an object is selected.

from weaviate.classes.query import MetadataQuery

jeopardy = client.collections.get("JeopardyQuestion")
response = jeopardy.query.hybrid(
query="food",
alpha=0.5,
return_metadata=MetadataQuery(score=True, explain_score=True),
limit=3
)

for o in response.objects:
print(o.properties)
print(o.metadata.score, o.metadata.explain_score)
Example response

The output is like this:

{
"data": {
"Get": {
"JeopardyQuestion": [
{
"_additional": {
"explainScore": "(bm25)\n(hybrid) Document df958a90-c3ad-5fde-9122-cd777c22da6c contributed 0.003968253968253968 to the score\n(hybrid) Document df958a90-c3ad-5fde-9122-cd777c22da6c contributed 0.012295081967213115 to the score",
"score": "0.016263336"
},
"answer": "a closer grocer",
"question": "A nearer food merchant"
},
{
"_additional": {
"explainScore": "(vector) [0.0223698 -0.02752683 -0.0061537363 0.0023812135 -0.00036100898 -0.0078375945 -0.018505432 -0.037500713 -0.0042215516 -0.012620432]... \n(hybrid) Document ec776112-e651-519d-afd1-b48e6237bbcb contributed 0.012096774193548387 to the score",
"score": "0.012096774"
},
"answer": "Famine",
"question": "From the Latin for \"hunger\", it's a period when food is extremely scarce"
},
{
"_additional": {
"explainScore": "(vector) [0.0223698 -0.02752683 -0.0061537363 0.0023812135 -0.00036100898 -0.0078375945 -0.018505432 -0.037500713 -0.0042215516 -0.012620432]... \n(hybrid) Document 98807640-cd16-507d-86a1-801902d784de contributed 0.011904761904761904 to the score",
"score": "0.011904762"
},
"answer": "Tofu",
"question": "A popular health food, this soybean curd is used to make a variety of dishes & an ice cream substitute"
}
]
}
}
}

Use the alpha argument to change how much each search affects the results.

  • An alpha of 1 is a pure vector search.
  • An alpha of 0 is a pure keyword search.
client.collections.get("JeopardyQuestion")
response = jeopardy.query.hybrid(
query="food",
alpha=0.25,
limit=3
)

for o in response.objects:
print(o.properties)
Example response

The output is like this:

{
"data": {
"Get": {
"JeopardyQuestion": [
{
"answer": "a closer grocer",
"question": "A nearer food merchant"
},
{
"answer": "food stores (supermarkets)",
"question": "This type of retail store sells more shampoo & makeup than any other"
},
{
"answer": "cake",
"question": "Devil's food & angel food are types of this dessert"
}
]
}
}
}

Change the ranking method

Added in v1.20

Ranked Fusion is the default fusion algorithm.

  • To use objects' keyword and vector search scores instead of ranks, use Relative Score Fusion.
  • To use autocut with the hybrid operator, use Relative Score Fusion.
from weaviate.classes.query import HybridFusion

jeopardy = client.collections.get("JeopardyQuestion")
response = jeopardy.query.hybrid(
query="food",
fusion_type=HybridFusion.RELATIVE_SCORE,
limit=3
)

for o in response.objects:
print(o.properties)
Example response

The output is like this:

{
"data": {
"Get": {
"JeopardyQuestion": [
{
"answer": "a closer grocer",
"question": "A nearer food merchant"
},
{
"answer": "food stores (supermarkets)",
"question": "This type of retail store sells more shampoo & makeup than any other"
},
{
"answer": "cake",
"question": "Devil's food & angel food are types of this dessert"
}
]
}
}
}
Additional information

For a discussion of fusion methods, see this blog post and this reference page

Added in v1.19.0

The keyword search portion of hybrid search can be directed to only search a subset of object properties. This does not affect the vector search portion.

jeopardy = client.collections.get("JeopardyQuestion")
response = jeopardy.query.hybrid(
query="food",
query_properties=["question"],
alpha=0.25,
limit=3
)

for o in response.objects:
print(o.properties)
Example response

The output is like this:

{
"data": {
"Get": {
"JeopardyQuestion": [
{
"answer": "a closer grocer",
"question": "A nearer food merchant"
},
{
"answer": "cake",
"question": "Devil's food & angel food are types of this dessert"
},
{
"answer": "honey",
"question": "The primary source of this food is the Apis mellifera"
}
]
}
}
}

Set weights on property values

Specify the relative value of an object's properties in the keyword search. Higher values increase the property's contribution to the search score.

jeopardy = client.collections.get("JeopardyQuestion")
response = jeopardy.query.hybrid(
query="food",
query_properties=["question^2", "answer"],
alpha=0.25,
limit=3
)

for o in response.objects:
print(o.properties)
Example response

The output is like this:

{
"data": {
"Get": {
"JeopardyQuestion": [
{
"answer": "a closer grocer",
"question": "A nearer food merchant"
},
{
"answer": "cake",
"question": "Devil's food & angel food are types of this dessert"
},
{
"answer": "food stores (supermarkets)",
"question": "This type of retail store sells more shampoo & makeup than any other"
}
]
}
}
}

Specify a vector

To specify a vector instead of using a vector of the query string, pass it in addition to the query string for the keyword search.

query_vector = [-0.02] * 1536  # Some vector that is compatible with object vectors

jeopardy = client.collections.get("JeopardyQuestion")
response = jeopardy.query.hybrid(
query="food",
query_properties=["question^2", "answer"],
vector=query_vector,
limit=3
)

for o in response.objects:
print(o.properties)
Example response

The output is like this:

{
"data": {
"Get": {
"JeopardyQuestion": [
{
"answer": "Risotto",
"question": "From the Italian word for rice, it's a rice dish cooked with broth & often grated cheese"
},
{
"answer": "arrabiata",
"question": "Italian for \"angry\", it describes a pasta sauce spiced up with plenty of chiles"
},
{
"answer": "Fettucine Alfredo",
"question": "Ribbon-shaped noodles, sweet butter, cream, parmesan cheese & black pepper make up this pasta dish"
}
]
}
}
}

limit & offset

Use limit to set a fixed maximum number of objects to return.

Optionally, use offset to paginate the results.

jeopardy = client.collections.get("JeopardyQuestion")
response = jeopardy.query.hybrid(
query="food",
limit=3
)

for o in response.objects:
print(o.properties)

Limit result groups

To limit results to groups with similar distances from the query, use the autocut filter. Specify the Relative Score Fusion ranking method when you use autocut with hybrid search.

from weaviate.classes.query import HybridFusion

jeopardy = client.collections.get("JeopardyQuestion")
response = jeopardy.query.hybrid(
query="food",
fusion_type=HybridFusion.RELATIVE_SCORE,
auto_limit=1
)

for o in response.objects:
print(o.properties)
Example response

The output is like this:

{
"data": {
"Get": {
"JeopardyQuestion": [
{
"answer": "Guards",
"question": "Life, Security, Shin",
"_additional": {
"score": "0.75"
},
},
# ... trimmed for brevity
]
}
}
}

Filter results

For more specific results, use a filter to narrow your search.

from weaviate.classes.query import Filter

jeopardy = client.collections.get("JeopardyQuestion")
response = jeopardy.query.hybrid(
query="food",
filters=Filter.by_property("round").equal("Double Jeopardy!"),
limit=3
)

for o in response.objects:
print(o.properties)
Example response

The output is like this:

{
"data": {
"Get": {
"JeopardyQuestion": [
{
"answer": "food stores (supermarkets)",
"question": "This type of retail store sells more shampoo & makeup than any other",
"round": "Double Jeopardy!"
},
{
"answer": "Tofu",
"question": "A popular health food, this soybean curd is used to make a variety of dishes & an ice cream substitute",
"round": "Double Jeopardy!"
},
{
"answer": "gastronomy",
"question": "This word for the art & science of good eating goes back to Greek for \"belly\"",
"round": "Double Jeopardy!"
}
]
}
}
}

Tokenization

Weaviate converts filter terms into tokens. The default tokenization is word. The word tokenizer keeps alphanumeric characters and splits on whitespace. It converts a string like "test_domain_weaviate" into "test", "domain", and "weaviate".

For details and additional tokenization methods, see Tokenization.

Questions and feedback

If you have any questions or feedback, please let us know on our forum. For example, you can: