BM25 search
Overviewโ
This page shows you how to perform keyword searches using Weaviate using the bm25
parameter.
The bm25
parameter uses the BM25F ranking function to score the results.
If you haven't yet, we recommend going through the Quickstart tutorial first to get the most out of this section.
Basic BM25 searchโ
To use BM25 search, you must provide a search string as a minimum.
The below example uses default settings, looking for objects containing the keyword food
anywhere in the object.
It ranks the results using BM25F, and returns the top 3.
- Python
- JavaScript/TypeScript
- GraphQL
response = (
client.query
.get("JeopardyQuestion", ["question", "answer"])
.with_bm25(
query="food"
)
.with_limit(3)
.do()
)
print(json.dumps(response, indent=2))
result = await client.graphql
.get()
.withClassName('JeopardyQuestion')
.withBm25({
query: 'food',
})
.withLimit(3)
.withFields('question answer')
.do();
console.log(JSON.stringify(result, null, 2));
{
Get {
JeopardyQuestion(
limit: 3
bm25: {
query: "food"
}
) {
question
answer
}
}
}
Example response
It should produce a response like the one below:
{
"data": {
"Get": {
"JeopardyQuestion": [
{
"answer": "food stores (supermarkets)",
"question": "This type of retail store sells more shampoo & makeup than any other"
},
{
"answer": "cake",
"question": "Devil's food & angel food are types of this dessert"
},
{
"answer": "a closer grocer",
"question": "A nearer food merchant"
}
]
}
}
}
Scoreโ
The score
sub-property is the BM25F score used to rank the outputs. It can be retrieved under the _additional
property.
The below example adds the score
property to the list of retrieved properties.
- Python
- JavaScript/TypeScript
- GraphQL
response = (
client.query
.get("JeopardyQuestion", ["question", "answer"])
.with_bm25(
query="food"
)
.with_additional("score")
.with_limit(3)
.do()
)
print(json.dumps(response, indent=2))
result = await client.graphql
.get()
.withClassName('JeopardyQuestion')
.withBm25({
query: 'food',
})
.withFields('question answer _additional { score }')
.withLimit(3)
.do();
console.log(JSON.stringify(result, null, 2));
{
Get {
JeopardyQuestion(
limit: 3
bm25: {
query: "food"
}
) {
question
answer
_additional {
score
}
}
}
}
Example response
It should produce a response like the one below:
{
"data": {
"Get": {
"JeopardyQuestion": [
{
"_additional": {
"score": "3.0140665"
},
"answer": "food stores (supermarkets)",
"question": "This type of retail store sells more shampoo & makeup than any other"
},
{
"_additional": {
"score": "2.8725255"
},
"answer": "cake",
"question": "Devil's food & angel food are types of this dessert"
},
{
"_additional": {
"score": "2.7672548"
},
"answer": "a closer grocer",
"question": "A nearer food merchant"
}
]
}
}
}
Selected properties onlyโ
You can specify the object properties
to search in.
The below example searches for objects containing the keyword food
in the question
property only, ranks them using BM25F scores of the searched property, and returns the top 3.
- Python
- JavaScript/TypeScript
- GraphQL
response = (
client.query
.get("JeopardyQuestion", ["question", "answer"])
.with_bm25(
query="food",
properties=["question"]
)
.with_additional("score")
.with_limit(3)
.do()
)
print(json.dumps(response, indent=2))
result = await client.graphql
.get()
.withClassName('JeopardyQuestion')
.withBm25({
query: 'food',
properties: ['question'],
})
.withLimit(3)
.withFields('question answer _additional { score }')
.do();
console.log(JSON.stringify(result, null, 2));
{
Get {
JeopardyQuestion(
limit: 3
bm25: {
query: "food"
properties: ["question"]
}
) {
question
answer
_additional {
score
}
}
}
}
Example response
It should produce a response like the one below:
{
"data": {
"Get": {
"JeopardyQuestion": [
{
"_additional": {
"score": "3.7079012"
},
"answer": "cake",
"question": "Devil's food & angel food are types of this dessert"
},
{
"_additional": {
"score": "3.4311616"
},
"answer": "a closer grocer",
"question": "A nearer food merchant"
},
{
"_additional": {
"score": "2.8312314"
},
"answer": "honey",
"question": "The primary source of this food is the Apis mellifera"
}
]
}
}
}
Weighing searched propertiesโ
You can specify weighting of object properties
in how they affect the BM25F score.
The below example searches for objects containing the keyword food
in the question
property and the answer
property. Weaviate then scores the results with question
property's weighting boosted by 2, and returns the top 3.
- Python
- JavaScript/TypeScript
- GraphQL
response = (
client.query
.get("JeopardyQuestion", ["question", "answer"])
.with_bm25(
query="food",
properties=["question^2", "answer"]
)
.with_additional("score")
.with_limit(3)
.do()
)
print(json.dumps(response, indent=2))
result = await client.graphql
.get()
.withClassName('JeopardyQuestion')
.withBm25({
query: 'food',
properties: ['question^2', 'answer'],
})
.withLimit(3)
.withFields('question answer _additional { score }')
.do();
console.log(JSON.stringify(result, null, 2));
{
Get {
JeopardyQuestion(
limit: 3
bm25: {
query: "food"
properties: ["question^2", "answer"]
}
) {
question
answer
_additional {
score
}
}
}
}
Example response
It should produce a response like the one below:
{
"data": {
"Get": {
"JeopardyQuestion": [
{
"_additional": {
"score": "4.0038033"
},
"answer": "cake",
"question": "Devil's food & angel food are types of this dessert"
},
{
"_additional": {
"score": "3.8706005"
},
"answer": "a closer grocer",
"question": "A nearer food merchant"
},
{
"_additional": {
"score": "3.2457707"
},
"answer": "food stores (supermarkets)",
"question": "This type of retail store sells more shampoo & makeup than any other"
}
]
}
}
}
Add a conditional (where
) filterโ
You can add a conditional filter to any BM25 search query, which will filter the outputs but not impact the ranking.
The below example searches for objects containing the keyword food
in any field and has the round
property of Double Jeopardy!
, ranks them using BM25F, and returns the top 3.
- Python
- JavaScript/TypeScript
- GraphQL
response = (
client.query
.get("JeopardyQuestion", ["question", "answer", "round"])
.with_bm25(
query="food"
)
.with_where({
"path": ["round"],
"operator": "Equal",
"valueText": "Double Jeopardy!"
})
.with_additional("score")
.with_limit(3)
.do()
)
print(json.dumps(response, indent=2))
result = await client.graphql
.get()
.withClassName('JeopardyQuestion')
.withBm25({
query: 'food',
})
.withWhere({
path: ['round'],
operator: 'Equal',
valueText: 'Double Jeopardy!',
})
.withLimit(3)
.withFields('question answer round _additional { score }')
.do();
console.log(JSON.stringify(result, null, 2));
{
Get {
JeopardyQuestion(
limit: 3
bm25: {
query: "food"
}
where: {
path: ["round"]
operator: Equal
valueText: "Double Jeopardy!"
}
) {
question
answer
_additional {
score
}
}
}
}
Example response
It should produce a response like the one below:
{
"data": {
"Get": {
"JeopardyQuestion": [
{
"_additional": {
"score": "3.0140665"
},
"answer": "food stores (supermarkets)",
"question": "This type of retail store sells more shampoo & makeup than any other",
"round": "Double Jeopardy!"
},
{
"_additional": {
"score": "1.9633813"
},
"answer": "honey",
"question": "The primary source of this food is the Apis mellifera",
"round": "Double Jeopardy!"
},
{
"_additional": {
"score": "1.6719631"
},
"answer": "pseudopods",
"question": "Amoebas use temporary extensions called these to move or to surround & engulf food",
"round": "Double Jeopardy!"
}
]
}
}
}
More Resourcesโ
If you can't find the answer to your question here, please look at the:
- Frequently Asked Questions. Or,
- Knowledge base of old issues. Or,
- For questions: Stackoverflow. Or,
- For more involved discussion: Weaviate Community Forum. Or,
- We also have a Slack channel.