Skip to main content

BM25 search

Overviewโ€‹

This page shows you how to perform keyword searches using Weaviate using the bm25 parameter.

The bm25 parameter uses the BM25F ranking function to score the results.

Prerequisites

If you haven't yet, we recommend going through the Quickstart tutorial first to get the most out of this section.

To use BM25 search, you must provide a search string as a minimum.

The below example uses default settings, looking for objects containing the keyword food anywhere in the object.

It ranks the results using BM25F, and returns the top 3.

response = (
client.query
.get("JeopardyQuestion", ["question", "answer"])
.with_bm25(
query="food"
)
.with_limit(3)
.do()
)

print(json.dumps(response, indent=2))
Example response

It should produce a response like the one below:

{
"data": {
"Get": {
"JeopardyQuestion": [
{
"answer": "food stores (supermarkets)",
"question": "This type of retail store sells more shampoo & makeup than any other"
},
{
"answer": "cake",
"question": "Devil's food & angel food are types of this dessert"
},
{
"answer": "a closer grocer",
"question": "A nearer food merchant"
}
]
}
}
}

Scoreโ€‹

The score sub-property is the BM25F score used to rank the outputs. It can be retrieved under the _additional property.

The below example adds the score property to the list of retrieved properties.

response = (
client.query
.get("JeopardyQuestion", ["question", "answer"])
.with_bm25(
query="food"
)
.with_additional("score")
.with_limit(3)
.do()
)

print(json.dumps(response, indent=2))
Example response

It should produce a response like the one below:

{
"data": {
"Get": {
"JeopardyQuestion": [
{
"_additional": {
"score": "3.0140665"
},
"answer": "food stores (supermarkets)",
"question": "This type of retail store sells more shampoo & makeup than any other"
},
{
"_additional": {
"score": "2.8725255"
},
"answer": "cake",
"question": "Devil's food & angel food are types of this dessert"
},
{
"_additional": {
"score": "2.7672548"
},
"answer": "a closer grocer",
"question": "A nearer food merchant"
}
]
}
}
}

Selected properties onlyโ€‹

You can specify the object properties to search in.

The below example searches for objects containing the keyword food in the question property only, ranks them using BM25F scores of the searched property, and returns the top 3.

response = (
client.query
.get("JeopardyQuestion", ["question", "answer"])
.with_bm25(
query="food",
properties=["question"]
)
.with_additional("score")
.with_limit(3)
.do()
)

print(json.dumps(response, indent=2))
Example response

It should produce a response like the one below:

{
"data": {
"Get": {
"JeopardyQuestion": [
{
"_additional": {
"score": "3.7079012"
},
"answer": "cake",
"question": "Devil's food & angel food are types of this dessert"
},
{
"_additional": {
"score": "3.4311616"
},
"answer": "a closer grocer",
"question": "A nearer food merchant"
},
{
"_additional": {
"score": "2.8312314"
},
"answer": "honey",
"question": "The primary source of this food is the Apis mellifera"
}
]
}
}
}

Weighing searched propertiesโ€‹

You can specify weighting of object properties in how they affect the BM25F score.

The below example searches for objects containing the keyword food in the question property and the answer property. Weaviate then scores the results with question property's weighting boosted by 2, and returns the top 3.

response = (
client.query
.get("JeopardyQuestion", ["question", "answer"])
.with_bm25(
query="food",
properties=["question^2", "answer"]
)
.with_additional("score")
.with_limit(3)
.do()
)

print(json.dumps(response, indent=2))
Example response

It should produce a response like the one below:

{
"data": {
"Get": {
"JeopardyQuestion": [
{
"_additional": {
"score": "4.0038033"
},
"answer": "cake",
"question": "Devil's food & angel food are types of this dessert"
},
{
"_additional": {
"score": "3.8706005"
},
"answer": "a closer grocer",
"question": "A nearer food merchant"
},
{
"_additional": {
"score": "3.2457707"
},
"answer": "food stores (supermarkets)",
"question": "This type of retail store sells more shampoo & makeup than any other"
}
]
}
}
}

Add a conditional (where) filterโ€‹

You can add a conditional filter to any BM25 search query, which will filter the outputs but not impact the ranking.

The below example searches for objects containing the keyword food in any field and has the round property of Double Jeopardy!, ranks them using BM25F, and returns the top 3.

response = (
client.query
.get("JeopardyQuestion", ["question", "answer", "round"])
.with_bm25(
query="food"
)
.with_where({
"path": ["round"],
"operator": "Equal",
"valueText": "Double Jeopardy!"
})
.with_additional("score")
.with_limit(3)
.do()
)

print(json.dumps(response, indent=2))
Example response

It should produce a response like the one below:

{
"data": {
"Get": {
"JeopardyQuestion": [
{
"_additional": {
"score": "3.0140665"
},
"answer": "food stores (supermarkets)",
"question": "This type of retail store sells more shampoo & makeup than any other",
"round": "Double Jeopardy!"
},
{
"_additional": {
"score": "1.9633813"
},
"answer": "honey",
"question": "The primary source of this food is the Apis mellifera",
"round": "Double Jeopardy!"
},
{
"_additional": {
"score": "1.6719631"
},
"answer": "pseudopods",
"question": "Amoebas use temporary extensions called these to move or to surround & engulf food",
"round": "Double Jeopardy!"
}
]
}
}
}

More Resourcesโ€‹

If you can't find the answer to your question here, please look at the:

  1. Frequently Asked Questions. Or,
  2. Knowledge base of old issues. Or,
  3. For questions: Stackoverflow. Or,
  4. For more involved discussion: Weaviate Community Forum. Or,
  5. We also have a Slack channel.