Keyword & Hybrid search

You can also perform keyword (BM25) searches to find items based on their keyword similarity, or hybrid searches that combine BM25 and semantic/vector searches.

Code

This example finds entries in "Movie" with the highest keyword search scores for the term "history", and prints out the title and release year of the top 5 matches.

import weaviate, { WeaviateClient, WeaviateReturn } from "weaviate-client";
let client: WeaviateClient;
let response: WeaviateReturn<undefined>

// Instantiate your client (not shown). e.g.:
// const requestHeaders = {'X-VoyageAI-Api-Key': process.env.VOYAGEAI_API_KEY as string,}
// client = weaviate.connectToWeaviateCloud(..., headers: requestHeaders) or
// client = weaviate.connectToLocal(..., headers: requestHeaders)

// Get the collection
const movies = client.collections.get("Movie")

// Perform query
response = await movies.query.bm25("history", {
  limit: 5,
  returnMetadata: ['score'],
})

// Inspect the response
for (const item of response.objects) {
  // Print the title and release date (release_date is returned as a Date object)
  console.log(`${item.properties.title} - ${item.properties.release_date}`)
  // Print the BM25 score of the object for the query
  console.log(`BM25 score: ${item.metadata?.score}`)
}

client.close()

Explain the code

The results are ranked by keyword search score, calculated with the BM25F algorithm, a variant of BM25 that scores matches across multiple properties (fields).

The limit parameter here sets the maximum number of results to return.

The returnMetadata parameter takes an array of strings specifying which metadata to return with the search results. This query requests score, which is the BM25 score of each result.
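To give a feel for where a BM25 score comes from, here is a minimal sketch of a single-field BM25 scorer. It is a simplified illustration, not Weaviate's BM25F implementation: the tokenizer, the sample documents, and the function name `bm25Scores` are all hypothetical, and `k1 = 1.2` and `b = 0.75` are commonly used values assumed here.

```typescript
// Toy single-field BM25 scorer. Illustration only, not Weaviate's implementation.
type Doc = { id: string; text: string };

// Naive tokenizer (assumption): lowercase, split on non-alphanumeric characters
const tokenize = (text: string): string[] =>
  text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);

function bm25Scores(docs: Doc[], query: string, k1 = 1.2, b = 0.75): Map<string, number> {
  const scores = new Map<string, number>();
  if (docs.length === 0) return scores;

  const tokenized = docs.map((d) => tokenize(d.text));
  const avgLen = tokenized.reduce((sum, t) => sum + t.length, 0) / docs.length;
  docs.forEach((doc) => scores.set(doc.id, 0));

  for (const term of Array.from(new Set(tokenize(query)))) {
    // Document frequency: how many documents contain the term
    const df = tokenized.filter((t) => t.includes(term)).length;
    if (df === 0) continue;
    // Rarer terms get a higher inverse document frequency
    const idf = Math.log(1 + (docs.length - df + 0.5) / (df + 0.5));

    docs.forEach((doc, i) => {
      // Term frequency, dampened by k1 and normalized by document length via b
      const tf = tokenized[i].filter((t) => t === term).length;
      const lengthNorm = 1 - b + b * (tokenized[i].length / avgLen);
      const termScore = (idf * (tf * (k1 + 1))) / (tf + k1 * lengthNorm);
      scores.set(doc.id, (scores.get(doc.id) ?? 0) + termScore);
    });
  }
  return scores;
}

// Hypothetical sample data
const sampleDocs: Doc[] = [
  { id: "m1", text: "a history of ancient history" },
  { id: "m2", text: "the history teacher" },
  { id: "m3", text: "space station documentary" },
];
const bm25Result = bm25Scores(sampleDocs, "history");
console.log(bm25Result);
```

In this sketch, a document that mentions the query term more often scores higher, and a document without the term scores zero. BM25 scores are unbounded, so they are useful for ranking results within one query rather than for comparing across queries.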

Example results
American History X 1998
BM25 score: 2.707

A Beautiful Mind 2001
BM25 score: 1.896

Legends of the Fall 1994
BM25 score: 1.663

Hacksaw Ridge 2016
BM25 score: 1.554

Night at the Museum 2006
BM25 score: 1.529

Code

This example finds entries in "Movie" with the highest hybrid search scores for the term "history", and prints out the title and release year of the top 5 matches.

import weaviate, { WeaviateClient, WeaviateReturn } from "weaviate-client";
let client: WeaviateClient;
let response: WeaviateReturn<undefined>

// Instantiate your client (not shown). e.g.:
// const requestHeaders = {'X-VoyageAI-Api-Key': process.env.VOYAGEAI_API_KEY as string,}
// client = weaviate.connectToWeaviateCloud(..., headers: requestHeaders) or
// client = weaviate.connectToLocal(..., headers: requestHeaders)

// Get the collection
const movies = client.collections.get("Movie")

// Perform query

response = await movies.query.hybrid("history", {
  limit: 5,
  returnMetadata: ['score'],
  returnProperties: ["title", "tmdb_id", "release_date"]
})

// Inspect the response
for (const item of response.objects) {
  // Print the title and release date (release_date is returned as a Date object)
  console.log(`${item.properties.title} - ${item.properties.release_date}`)
  // Print the hybrid score of the object for the query
  console.log(`Hybrid score: ${item.metadata?.score}`)
}

client.close()

Explain the code

The results are ranked by hybrid search score. A hybrid search combines the results of a BM25 search and a semantic/vector search into a single ranking.

The limit parameter here sets the maximum number of results to return.

The returnMetadata parameter takes an array of strings specifying which metadata to return with the search results. This query requests score, which is the hybrid score of each result.
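One way to picture how two result lists can be blended is rank-based fusion, where each result earns a score from its position in each list. The sketch below is a toy illustration under stated assumptions, not a description of what Weaviate does internally: the function name `rankedFusion`, the constant `k = 60`, and the sample IDs are hypothetical, and `alpha` is assumed to weight the vector ranking (0 meaning keyword only, 1 meaning vector only).

```typescript
// Toy rank-based fusion of a keyword ranking and a vector ranking. Illustration only.
function rankedFusion(
  keywordRanking: string[],
  vectorRanking: string[],
  alpha = 0.5, // assumed weight of the vector ranking; (1 - alpha) goes to keyword
  k = 60       // assumed smoothing constant so top ranks do not dominate
): Array<[string, number]> {
  const fused = new Map<string, number>();

  const addRanking = (ranking: string[], weight: number): void => {
    ranking.forEach((id, rank) => {
      // rank is 0-based, so the first result contributes weight / (k + 1)
      fused.set(id, (fused.get(id) ?? 0) + weight / (k + rank + 1));
    });
  };

  addRanking(keywordRanking, 1 - alpha);
  addRanking(vectorRanking, alpha);

  // Highest fused score first
  return Array.from(fused.entries()).sort((x, y) => y[1] - x[1]);
}

// Hypothetical object IDs, each list ordered best match first
const fusedResults = rankedFusion(["a", "b", "c"], ["b", "d", "a"], 0.5);
for (const [id, score] of fusedResults) {
  console.log(`${id}: ${score.toFixed(4)}`);
}
```

In this sketch, an object that appears near the top of both lists ends up ahead of one that ranks first in only a single list, which is the intuition behind combining keyword and vector signals. Rank-based scores of this kind are small numbers on a narrow scale, so the ordering matters more than the absolute values.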

Example results
Legends of the Fall 1994
Hybrid score: 0.016

Hacksaw Ridge 2016
Hybrid score: 0.016

A Beautiful Mind 2001
Hybrid score: 0.015

The Butterfly Effect 2004
Hybrid score: 0.015

Night at the Museum 2006
Hybrid score: 0.012
