Hybrid search

Hybrid search combines the results of a vector search and a keyword (BM25F) search by fusing the two result sets.

The fusion method and the relative weights are configurable.

A basic hybrid search uses a single query string for both the vector search and the keyword search.

jeopardy = client.collections.get("JeopardyQuestion")
response = jeopardy.query.hybrid(query="food", limit=3)

for o in response.objects:
    print(o.properties)
Example response

The output is like this:

{
"data": {
"Get": {
"JeopardyQuestion": [
{
"answer": "a closer grocer",
"question": "A nearer food merchant"
},
{
"answer": "Famine",
"question": "From the Latin for \"hunger\", it's a period when food is extremely scarce"
},
{
"answer": "Tofu",
"question": "A popular health food, this soybean curd is used to make a variety of dishes & an ice cream substitute"
}
]
}
}
}

Named vectors

Added in v1.24

A hybrid search on a collection that has named vectors must specify a target vector. Weaviate uses the query vector to search the target vector space.

reviews = client.collections.get("WineReviewNV")
response = reviews.query.hybrid(
    query="A French Riesling",
    target_vector="title_country",
    limit=3,
)

for o in response.objects:
    print(o.properties)

Explain the search results

To see the object rankings, set the explain score field in your query. The search rankings are part of the object metadata. Weaviate uses the score to order the search results.

from weaviate.classes.query import MetadataQuery

jeopardy = client.collections.get("JeopardyQuestion")
response = jeopardy.query.hybrid(
    query="food",
    alpha=0.5,
    return_metadata=MetadataQuery(score=True, explain_score=True),
    limit=3,
)

for o in response.objects:
    print(o.properties)
    print(o.metadata.score, o.metadata.explain_score)
Example response

The output is like this:

{
"data": {
"Get": {
"JeopardyQuestion": [
{
"_additional": {
"explainScore": "(bm25)\n(hybrid) Document df958a90-c3ad-5fde-9122-cd777c22da6c contributed 0.003968253968253968 to the score\n(hybrid) Document df958a90-c3ad-5fde-9122-cd777c22da6c contributed 0.012295081967213115 to the score",
"score": "0.016263336"
},
"answer": "a closer grocer",
"question": "A nearer food merchant"
},
{
"_additional": {
"explainScore": "(vector) [0.0223698 -0.02752683 -0.0061537363 0.0023812135 -0.00036100898 -0.0078375945 -0.018505432 -0.037500713 -0.0042215516 -0.012620432]... \n(hybrid) Document ec776112-e651-519d-afd1-b48e6237bbcb contributed 0.012096774193548387 to the score",
"score": "0.012096774"
},
"answer": "Famine",
"question": "From the Latin for \"hunger\", it's a period when food is extremely scarce"
},
{
"_additional": {
"explainScore": "(vector) [0.0223698 -0.02752683 -0.0061537363 0.0023812135 -0.00036100898 -0.0078375945 -0.018505432 -0.037500713 -0.0042215516 -0.012620432]... \n(hybrid) Document 98807640-cd16-507d-86a1-801902d784de contributed 0.011904761904761904 to the score",
"score": "0.011904762"
},
"answer": "Tofu",
"question": "A popular health food, this soybean curd is used to make a variety of dishes & an ice cream substitute"
}
]
}
}
}
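
The contributions reported in explainScore above are consistent with a rank-based fusion in which each search adds weight / (60 + rank) to an object's score. The sketch below reproduces the first two scores under that assumption. It is an illustration, not Weaviate's implementation: the function name, the constant 60, the ranks, and the weights 0.25 (keyword) and 0.75 (vector) are all inferred from the printed numbers.

```python
def ranked_fusion_score(keyword_rank, vector_rank, alpha, k=60):
    """Combine 1-based ranks from a keyword search and a vector search.

    A rank of None means the object was not returned by that search.
    The constant k=60 is an assumption inferred from the example output.
    """
    score = 0.0
    if keyword_rank is not None:
        score += (1 - alpha) / (k + keyword_rank)
    if vector_rank is not None:
        score += alpha / (k + vector_rank)
    return score


# "a closer grocer": assumed 3rd in the keyword search and 1st in the vector search.
top = ranked_fusion_score(keyword_rank=3, vector_rank=1, alpha=0.75)
# "Famine": assumed to be returned only by the vector search, at rank 2.
second = ranked_fusion_score(keyword_rank=None, vector_rank=2, alpha=0.75)

print(round(top, 9), round(second, 9))
```

An object that appears in both result sets collects two contributions, which is why it can outrank objects that score well in only one search.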

Hybrid search results can favor the keyword component or the vector component. To change the relative weights of the keyword and vector components, set the alpha value in your query.

  • An alpha of 1 is a pure vector search.
  • An alpha of 0 is a pure keyword search.

jeopardy = client.collections.get("JeopardyQuestion")
response = jeopardy.query.hybrid(
    query="food",
    alpha=0.25,
    limit=3,
)

for o in response.objects:
    print(o.properties)
Example response

The output is like this:

{
"data": {
"Get": {
"JeopardyQuestion": [
{
"answer": "a closer grocer",
"question": "A nearer food merchant"
},
{
"answer": "food stores (supermarkets)",
"question": "This type of retail store sells more shampoo & makeup than any other"
},
{
"answer": "cake",
"question": "Devil's food & angel food are types of this dessert"
}
]
}
}
}

Change the fusion method

Relative Score Fusion is the default fusion method starting in v1.24.

  • To use the keyword and vector search relative scores instead of the search rankings, use Relative Score Fusion.
  • To use autocut with the hybrid operator, use Relative Score Fusion.

from weaviate.classes.query import HybridFusion

jeopardy = client.collections.get("JeopardyQuestion")
response = jeopardy.query.hybrid(
    query="food",
    fusion_type=HybridFusion.RELATIVE_SCORE,
    limit=3,
)

for o in response.objects:
    print(o.properties)
Example response

The output is like this:

{
"data": {
"Get": {
"JeopardyQuestion": [
{
"answer": "a closer grocer",
"question": "A nearer food merchant"
},
{
"answer": "food stores (supermarkets)",
"question": "This type of retail store sells more shampoo & makeup than any other"
},
{
"answer": "cake",
"question": "Devil's food & angel food are types of this dessert"
}
]
}
}
}
Additional information

For a discussion of fusion methods, see this blog post and this reference page.
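
As a rough illustration of score-based fusion, the sketch below scales each search's raw scores to the range 0 to 1 and then takes an alpha-weighted sum. The function names and the toy scores are hypothetical, and Weaviate's actual implementation may differ in detail.

```python
def min_max_normalize(scores):
    """Scale a dict of raw scores to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal; treat every result as a top hit.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def relative_score_fusion(keyword_scores, vector_scores, alpha):
    """Weighted sum of normalized scores; higher is better in both inputs."""
    kw = min_max_normalize(keyword_scores)
    vec = min_max_normalize(vector_scores)
    docs = set(kw) | set(vec)
    fused = {
        doc: (1 - alpha) * kw.get(doc, 0.0) + alpha * vec.get(doc, 0.0)
        for doc in docs
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Toy scores, made up for illustration.
keyword_scores = {"a": 4.0, "b": 2.0, "c": 1.0}
vector_scores = {"a": 0.5, "b": 0.9, "d": 0.1}

ranking = relative_score_fusion(keyword_scores, vector_scores, alpha=0.5)
for doc, score in ranking:
    print(doc, round(score, 3))
```

Unlike a rank-based method, this approach preserves how far apart the scores were within each search, not only their order.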

Specify keyword search properties

Added in v1.19.0

The keyword search portion of hybrid search can be directed to only search a subset of object properties. This does not affect the vector search portion.

jeopardy = client.collections.get("JeopardyQuestion")
response = jeopardy.query.hybrid(
    query="food",
    query_properties=["question"],
    alpha=0.25,
    limit=3,
)

for o in response.objects:
    print(o.properties)
Example response

The output is like this:

{
"data": {
"Get": {
"JeopardyQuestion": [
{
"answer": "a closer grocer",
"question": "A nearer food merchant"
},
{
"answer": "cake",
"question": "Devil's food & angel food are types of this dessert"
},
{
"answer": "honey",
"question": "The primary source of this food is the Apis mellifera"
}
]
}
}
}

Set weights on property values

Specify the relative value of an object's properties in the keyword search. Higher values increase the property's contribution to the search score.

jeopardy = client.collections.get("JeopardyQuestion")
response = jeopardy.query.hybrid(
    query="food",
    query_properties=["question^2", "answer"],
    alpha=0.25,
    limit=3,
)

for o in response.objects:
    print(o.properties)
Example response

The output is like this:

{
"data": {
"Get": {
"JeopardyQuestion": [
{
"answer": "a closer grocer",
"question": "A nearer food merchant"
},
{
"answer": "cake",
"question": "Devil's food & angel food are types of this dessert"
},
{
"answer": "food stores (supermarkets)",
"question": "This type of retail store sells more shampoo & makeup than any other"
}
]
}
}
}
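
The ^ suffix in query_properties attaches a numeric boost to a property name. The helper below is a hypothetical illustration of how such strings can be read as name and weight pairs. It is not part of the Weaviate client, and the default weight of 1 for a property without a boost is an assumption.

```python
def parse_boosted_property(spec):
    """Split a string like "question^2" into ("question", 2.0)."""
    name, sep, boost = spec.partition("^")
    # Assumption: a property without "^" has a weight of 1.
    return name, float(boost) if sep else 1.0


weights = dict(parse_boosted_property(p) for p in ["question^2", "answer"])
print(weights)
```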

Specify a search vector

The vector component of hybrid search can use a query string or a query vector. To specify a query vector instead of a query string, provide a query vector (for the vector search) and a query string (for the keyword search) in your query.

query_vector = [-0.02] * 1536  # Some vector that is compatible with object vectors

jeopardy = client.collections.get("JeopardyQuestion")
response = jeopardy.query.hybrid(
    query="food",
    vector=query_vector,
    alpha=0.25,
    limit=3,
)

for o in response.objects:
    print(o.properties)
Example response

The output is like this:

{
"data": {
"Get": {
"JeopardyQuestion": [
{
"answer": "Risotto",
"question": "From the Italian word for rice, it's a rice dish cooked with broth & often grated cheese"
},
{
"answer": "arrabiata",
"question": "Italian for \"angry\", it describes a pasta sauce spiced up with plenty of chiles"
},
{
"answer": "Fettucine Alfredo",
"question": "Ribbon-shaped noodles, sweet butter, cream, parmesan cheese & black pepper make up this pasta dish"
}
]
}
}
}

Vector search parameters

Added in v1.25

You can specify vector similarity search parameters similar to those of near text or near vector searches, such as group by and move to / move away. To set the equivalent of a distance threshold for the vector search component, use the max vector distance parameter.

from weaviate.classes.query import HybridVector, Move, HybridFusion

jeopardy = client.collections.get("JeopardyQuestion")
response = jeopardy.query.hybrid(
    query="California",
    max_vector_distance=0.4,  # Maximum threshold for the vector search component
    vector=HybridVector.near_text(
        query="large animal",
        move_away=Move(force=0.5, concepts=["mammal", "terrestrial"]),
    ),
    alpha=0.75,
    limit=5,
)
Example response

The output is like this:

{
"data": {
"Get": {
"JeopardyQuestion": [
{
"answer": "Rhinoceros",
"points": 400,
"question": "The \"black\" species of this large horned mammal can grasp twigs with its upper lip"
},
{
"answer": "the hippopotamus",
"points": 400,
"question": "Close relative of the pig, though its name means \"river horse\""
},
{
"answer": "buffalo",
"points": 400,
"question": "Animal that was the main staple of the Plains Indians economy"
},
{
"answer": "California",
"points": 200,
"question": "Its state animal is the grizzly bear, & the state tree is a type of redwood"
},
{
"answer": "California",
"points": 200,
"question": "This western state sent its first refrigerated trainload of oranges back east February 14, 1886"
}
]
}
}
}
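
To illustrate what a distance threshold does, the sketch below computes the cosine distance between a query vector and a few candidate vectors and drops the candidates beyond the threshold. Cosine distance is an assumption here, since the metric depends on how the collection is configured, and the vectors and helper names are made up.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def within_distance(query, candidates, max_distance):
    """Keep only candidates whose distance to the query is at most max_distance."""
    return [
        name
        for name, vec in candidates.items()
        if cosine_distance(query, vec) <= max_distance
    ]


query = [1.0, 0.0]
candidates = {
    "close": [0.9, 0.1],      # small angle to the query
    "far": [0.0, 1.0],        # orthogonal to the query
    "opposite": [-1.0, 0.0],  # points away from the query
}

kept = within_distance(query, candidates, max_distance=0.4)
print(kept)
```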

Group results

Added in v1.25

Define criteria to group search results.

from weaviate.classes.query import GroupBy

# Grouping parameters
group_by = GroupBy(
    prop="round",  # group by this property
    objects_per_group=3,  # maximum objects per group
    number_of_groups=2,  # maximum number of groups
)

# Query
jeopardy = client.collections.get("JeopardyQuestion")
response = jeopardy.query.hybrid(
    alpha=0.75,
    query="California",
    group_by=group_by,
)

for grp_name, grp_content in response.groups.items():
    print(grp_name, grp_content.objects)
Example response

The response is like this:

'Jeopardy!'
'Double Jeopardy!'
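
The effect of the two limits can be sketched in plain Python. The function below walks a ranked result list, opens a new group for each unseen property value until the group limit is reached, and caps the size of each group. It is a simplified model with made-up data; the order in which the server forms groups is an assumption.

```python
def group_results(objects, prop, objects_per_group, number_of_groups):
    """Group ranked objects by a property, preserving the ranking order.

    Groups are created in the order in which their first member appears.
    """
    groups = {}
    for obj in objects:
        key = obj[prop]
        if key not in groups:
            if len(groups) == number_of_groups:
                continue  # group limit reached; skip objects from new groups
            groups[key] = []
        if len(groups[key]) < objects_per_group:
            groups[key].append(obj)
    return groups


# Toy ranked results, best match first.
objects = [
    {"answer": "A", "round": "Jeopardy!"},
    {"answer": "B", "round": "Double Jeopardy!"},
    {"answer": "C", "round": "Jeopardy!"},
    {"answer": "D", "round": "Final Jeopardy!"},
    {"answer": "E", "round": "Jeopardy!"},
    {"answer": "F", "round": "Jeopardy!"},
]

groups = group_results(objects, "round", objects_per_group=3, number_of_groups=2)
for name, members in groups.items():
    print(name, [o["answer"] for o in members])
```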

limit & offset

Use limit to set a fixed maximum number of objects to return.

Optionally, use offset to paginate the results.

jeopardy = client.collections.get("JeopardyQuestion")
response = jeopardy.query.hybrid(
    query="food",
    limit=3,
    offset=1,
)

for o in response.objects:
    print(o.properties)
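
Conceptually, limit and offset select a window from the ranked result list. The sketch below models that window with a list slice; the paginate helper and the sample data are hypothetical.

```python
def paginate(ranked_results, limit, offset=0):
    """Return up to `limit` results, skipping the first `offset` results."""
    return ranked_results[offset:offset + limit]


ranked = ["r1", "r2", "r3", "r4", "r5"]
page = paginate(ranked, limit=3, offset=1)
print(page)
```

With limit=3 and offset=1, the window skips the top result and returns the next three.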

Limit result groups

To limit results to groups with similar distances from the query, use the autocut filter. Specify the Relative Score Fusion ranking method when you use autocut with hybrid search.

from weaviate.classes.query import HybridFusion

jeopardy = client.collections.get("JeopardyQuestion")
response = jeopardy.query.hybrid(
    query="food",
    fusion_type=HybridFusion.RELATIVE_SCORE,
    auto_limit=1,
)

for o in response.objects:
    print(o.properties)
Example response

The output is like this:

{
"data": {
"Get": {
"JeopardyQuestion": [
{
"answer": "Guards",
"question": "Life, Security, Shin",
"_additional": {
"score": "0.75"
}
},
# ... trimmed for brevity
]
}
}
}
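
The idea of cutting results at a jump in score can be sketched as follows. This is a deliberately simplified model, not Weaviate's autocut algorithm: a jump is defined here as a drop between neighboring scores that exceeds a fraction of the full score range, and jump_ratio is a made-up tuning parameter.

```python
def autocut(scores, max_jumps, jump_ratio=0.2):
    """Return how many results to keep from scores sorted best-first.

    A "jump" is a drop between neighbors larger than jump_ratio times the
    full score range. Results before the Nth jump are kept.
    """
    if len(scores) < 2:
        return len(scores)
    score_range = scores[0] - scores[-1]
    if score_range == 0:
        return len(scores)  # no variation, so there is nothing to cut
    jumps = 0
    for i in range(1, len(scores)):
        if scores[i - 1] - scores[i] > jump_ratio * score_range:
            jumps += 1
            if jumps == max_jumps:
                return i  # keep everything before the Nth jump
    return len(scores)


scores = [0.95, 0.93, 0.90, 0.55, 0.52, 0.20]
keep = autocut(scores, max_jumps=1)
print(scores[:keep])
```

Because the cut depends on score differences, it needs scores that carry magnitude information, which is why a score-based fusion method is the natural pairing.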

Filter results

To narrow your search results, use a filter.

from weaviate.classes.query import Filter


jeopardy = client.collections.get("JeopardyQuestion")
response = jeopardy.query.hybrid(
    query="food",
    filters=Filter.by_property("round").equal("Double Jeopardy!"),
    limit=3,
)

for o in response.objects:
    print(o.properties)
Example response

The output is like this:

{
"data": {
"Get": {
"JeopardyQuestion": [
{
"answer": "food stores (supermarkets)",
"question": "This type of retail store sells more shampoo & makeup than any other",
"round": "Double Jeopardy!"
},
{
"answer": "Tofu",
"question": "A popular health food, this soybean curd is used to make a variety of dishes & an ice cream substitute",
"round": "Double Jeopardy!"
},
{
"answer": "gastronomy",
"question": "This word for the art & science of good eating goes back to Greek for \"belly\"",
"round": "Double Jeopardy!"
}
]
}
}
}

Tokenization

Weaviate converts filter terms into tokens. The default tokenization is word. The word tokenizer keeps alphanumeric characters, lowercases them, and splits on whitespace. It converts a string like "Test_domain_weaviate" into "test", "domain", and "weaviate".

For details and additional tokenization methods, see Tokenization.
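
The word tokenization described above can be approximated with a regular expression. The sketch below lowercases the input and splits on runs of characters that are not ASCII letters or digits. The function name is hypothetical, and the ASCII-only handling is a simplification.

```python
import re


def word_tokenize(text):
    """Lowercase the text and split on runs of non-alphanumeric characters."""
    return [token for token in re.split(r"[^a-z0-9]+", text.lower()) if token]


tokens = word_tokenize("Test_domain_weaviate")
print(tokens)
```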
