
Search operators

Overview

This page covers the search operators that can be used in queries: vector search operators (nearText, nearVector, nearObject, etc.), the keyword search operator (bm25), and the hybrid search operator (hybrid).

Only one search operator can be added to queries on the collection level.

Operator availability

Built-in operators

These operators are available in all Weaviate instances regardless of configuration.

Module-specific operators

Module-specific search operators are made available in certain Weaviate modules.

By adding the relevant modules, you can use additional operators such as nearText (vectorizer modules) and ask (Question Answering module).

Vector search operators

nearXXX operators allow you to find data objects based on their vector similarity to the query. The query can be a raw vector (nearVector) or an object UUID (nearObject).

If the appropriate vectorizer model is enabled, a text query (nearText), an image (nearImage), or another media input may be used as the query.

All vector search operators can be used with a certainty or distance threshold to specify the desired similarity or distance between the query and the results, as well as with a limit operator or an autocut operator to restrict the number of results.

nearVector

nearVector finds data objects closest to an input vector.

Variables

Variable | Required | Type | Description
vector | yes | [float] | This variable takes a vector embedding in the form of an array of floats. The array should have the same length as the vectors in this collection.
distance | no | float | The maximum allowed distance to the provided search input. Cannot be used together with the certainty variable. The interpretation of the value of the distance field depends on the distance metric used.
certainty | no | float | Normalized distance between the result item and the search vector. Normalized to be between 0 (perfect opposite) and 1 (identical vectors). Cannot be used together with the distance variable.
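
To make the relationship between distance and certainty more concrete, here is a minimal sketch. It assumes the collection uses the cosine distance metric, for which certainty is derived as 1 - distance / 2; other metrics do not have this mapping. The function names are hypothetical helpers, not part of the Weaviate client.

```python
import math


def cosine_distance(a, b):
    """Cosine distance: 1 - cosine similarity, in the range [0, 2]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def certainty_from_cosine_distance(distance):
    """Assumed mapping for cosine distance only: certainty = 1 - distance / 2."""
    return 1.0 - distance / 2.0


query = [1.0, 0.0]
examples = {
    "identical": [1.0, 0.0],
    "orthogonal": [0.0, 1.0],
    "opposite": [-1.0, 0.0],
}
for name, vec in examples.items():
    d = cosine_distance(query, vec)
    print(name, round(d, 3), round(certainty_from_cosine_distance(d), 3))
```

Under this assumption, identical vectors map to a certainty of 1 and perfectly opposite vectors to a certainty of 0, which matches the range described in the table.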

Example

import weaviate
import weaviate.classes as wvc
import os

client = weaviate.connect_to_local(
    headers={"X-OpenAI-Api-Key": os.getenv("OPENAI_APIKEY")}
)

try:
    collection = client.collections.get("Article")

    # query_vector is assumed to be defined elsewhere: a list of floats
    # with the same length as the vectors in this collection
    response = collection.query.near_vector(
        near_vector=query_vector,
        distance=0.7,
        limit=5,
    )

    for o in response.objects:
        print(o.properties)

finally:
    client.close()

nearObject

nearObject finds data objects closest to an existing object in the same collection. The object is typically specified by its UUID.

  • Note: You can specify an object's id or beacon in the argument, along with a desired certainty.
  • Note that the first result will always be the object used for search.

Variables

Variable | Required | Type | Description
id | yes | UUID | Data object identifier in the uuid format.
beacon | no | url | Data object identifier in the beacon URL format. E.g., weaviate://<hostname>/<kind>/id.
distance | no | float | The maximum allowed distance to the provided search input. Cannot be used together with the certainty variable. The interpretation of the value of the distance field depends on the distance metric used.
certainty | no | float | Normalized distance between the result item and the search vector. Normalized to be between 0 (perfect opposite) and 1 (identical vectors). Cannot be used together with the distance variable.

Example

import weaviate
import weaviate.classes as wvc
import os

client = weaviate.connect_to_local(
    headers={"X-OpenAI-Api-Key": os.getenv("OPENAI_APIKEY")}
)

try:
    collection = client.collections.get("Article")

    # object_id is assumed to be defined elsewhere: the UUID of an
    # existing object in this collection
    response = collection.query.near_object(
        near_object=object_id,
        distance=0.6,
        limit=5,
    )

    for o in response.objects:
        print(o.properties)

finally:
    client.close()
Expected response
{
  "data": {
    "Get": {
      "Publication": [
        {
          "_additional": {
            "distance": -1.1920929e-07
          },
          "name": "The New York Times Company"
        },
        {
          "_additional": {
            "distance": 0.059879005
          },
          "name": "New York Times"
        },
        {
          "_additional": {
            "distance": 0.09176409
          },
          "name": "International New York Times"
        },
        {
          "_additional": {
            "distance": 0.13954824
          },
          "name": "New Yorker"
        },
        ...
      ]
    }
  }
}

nearText

The nearText operator finds data objects based on their vector similarity to a natural language query.

This operator is enabled if a compatible vectorizer module is configured for the collection. Compatible vectorizer modules are:

  • Any text2vec module
  • Any multi2vec module

Variables

Variable | Required | Type | Description
concepts | yes | [string] | An array of strings that can be natural language queries, or single words. If multiple strings are used, a centroid is calculated and used. See the Concept parsing section below for how the concepts are parsed.
distance | no | float | The maximum allowed distance to the provided search input. Cannot be used together with the certainty variable. The interpretation of the value of the distance field depends on the distance metric used.
certainty | no | float | Normalized distance between the result item and the search vector. Normalized to be between 0 (perfect opposite) and 1 (identical vectors). Cannot be used together with the distance variable.
autocorrect | no | boolean | Autocorrect input text values. Requires the text-spellcheck module to be present and enabled.
moveTo | no | object{} | Move your search term closer to another vector described by keywords.
moveTo{concepts} | no | [string] | An array of strings: natural language queries or single words. If multiple strings are used, a centroid is calculated and used.
moveTo{objects} | no | [UUID] | Object IDs to move the results to. This is used to "bias" NLP search results into a certain direction in vector space.
moveTo{force} | no | float | The force to apply to a particular movement. Must be between 0 and 1, where 0 is equivalent to no movement and 1 is equivalent to the largest movement possible.
moveAwayFrom | no | object{} | Move your search term away from another vector described by keywords.
moveAwayFrom{concepts} | no | [string] | An array of strings: natural language queries or single words. If multiple strings are used, a centroid is calculated and used.
moveAwayFrom{objects} | no | [UUID] | Object IDs to move the results from. This is used to "bias" NLP search results into a certain direction in vector space.
moveAwayFrom{force} | no | float | The force to apply to a particular movement. Must be between 0 and 1, where 0 is equivalent to no movement and 1 is equivalent to the largest movement possible.

Example I

This example shows how to use the nearText operator, including how to bias results towards another search query.

import weaviate
import weaviate.classes as wvc
from weaviate.collections.classes.grpc import Move
import os

client = weaviate.connect_to_local()

try:
    publications = client.collections.get("Publication")

    response = publications.query.near_text(
        query="fashion",
        distance=0.6,
        move_to=Move(force=0.85, concepts="haute couture"),
        move_away=Move(force=0.45, concepts="finance"),
        return_metadata=wvc.query.MetadataQuery(distance=True),
        limit=2
    )

    for o in response.objects:
        print(o.properties)
        print(o.metadata)

finally:
    client.close()

Example II

You can also bias results toward other data objects. For example, in this query, we move our query about "travelling in Asia" towards an article on food.

import weaviate
import weaviate.classes as wvc
import os

client = weaviate.connect_to_local(
    headers={"X-OpenAI-Api-Key": os.getenv("OPENAI_APIKEY")}
)

try:
    collection = client.collections.get("Article")

    response = collection.query.near_text(
        query="travelling in Asia",
        certainty=0.7,
        move_to=wvc.query.Move(
            force=0.75,
            objects="c4209549-7981-3699-9648-61a78c2124b9"
        ),
        return_metadata=wvc.query.MetadataQuery(certainty=True),
        limit=5,
    )

    for o in response.objects:
        print(o.properties)
        print(o.metadata.certainty)

finally:
    client.close()
Expected response
{
  "data": {
    "Get": {
      "Article": [
        {
          "_additional": {
            "certainty": 0.9619976580142975
          },
          "summary": "We've scoured the planet for what we think are 50 of the most delicious foods ever created. A Hong Kong best food, best enjoyed before cholesterol checks. When you have a best food as naturally delicious as these little fellas, keep it simple. Courtesy Matt@PEK/Creative Commons/FlickrThis best food Thai masterpiece teems with shrimp, mushrooms, tomatoes, lemongrass, galangal and kaffir lime leaves. It's a result of being born in a land where the world's most delicious food is sold on nearly every street corner.",
          "title": "World food: 50 best dishes"
        },
        {
          "_additional": {
            "certainty": 0.9297388792037964
          },
          "summary": "The look reflects the elegant ambiance created by interior designer Joyce Wang in Hong Kong, while their mixology program also reflects the original venue. MONO Hong Kong , 5/F, 18 On Lan Street, Central, Hong KongKoral, The Apurva Kempinski Bali, IndonesiaKoral's signature dish: Tomatoes Bedugul. Esterre at Palace Hotel TokyoLegendary French chef Alain Ducasse has a global portfolio of restaurants, many holding Michelin stars. John Anthony/JW Marriott HanoiCantonese cuisine from Hong Kong is again on the menu, this time at the JW Marriott in Hanoi. Stanley takes its name from the elegant Hong Kong waterside district and the design touches reflect this legacy with Chinese antiques.",
          "title": "20 best new Asia-Pacific restaurants to try in 2020"
        }
        ...
      ]
    }
  }
}

Additional information

Concept parsing

A nearText query will interpret each term in an array input as distinct strings to be vectorized. If multiple strings are passed, the query vector will be an average vector of the individual string vectors.

  • ["New York Times"] = one vector position is determined based on the occurrences of the words
  • ["New", "York", "Times"] = all concepts have a similar weight.
  • ["New York", "Times"] = a combination of the two above.

A practical example would be: concepts: ["beatles", "John Lennon"]
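
The averaging step described above can be sketched as follows. This is only an illustration: the three-dimensional vectors are made up (real embeddings come from the vectorizer module and are much longer), and centroid is a hypothetical helper, not a Weaviate function.

```python
def centroid(vectors):
    """Element-wise mean of a list of equally sized vectors."""
    if not vectors:
        raise ValueError("at least one vector is required")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same length")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Hypothetical embeddings for the two concepts, invented for illustration
concept_vectors = {
    "beatles": [0.9, 0.1, 0.3],
    "John Lennon": [0.7, 0.3, 0.5],
}

# The query vector is the average of the individual concept vectors
query_vector = centroid(list(concept_vectors.values()))
print(query_vector)
```

With a single concept such as ["New York Times"], the list contains one vector and the centroid is that vector itself.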

Semantic Path
  • Only available in the text2vec-contextionary module

The semantic path returns an array of concepts from the query to the data object. This allows you to see which steps Weaviate took and how the query and data object are interpreted.

Property | Description
concept | the concept that is found in this step.
distanceToNext | the distance to the next step (null for the last step).
distanceToPrevious | the distance to the previous step (null for the first step).
distanceToQuery | the distance of this step to the query.
distanceToResult | the distance of this step to the result.

Note: Building a semantic path is only possible if a nearText: {} operator is set as the explore term represents the beginning of the path and each search result represents the end of the path. Since nearText: {} queries are currently exclusively possible in GraphQL, the semanticPath is therefore not available in the REST API.

Example: showing a semantic path without edges.

import weaviate
import weaviate.classes as wvc
import os

client = weaviate.connect_to_local(
    headers={"X-OpenAI-Api-Key": os.getenv("OPENAI_APIKEY")}
)

try:
    # Semantic path is not yet supported by the V4 client. Please use a raw GraphQL query instead.
    response = client.graphql_raw_query(
        """
        {
          Get {
            Publication (
              nearText:{
                concepts: ["fashion"],
                distance: 0.6,
                moveAwayFrom: {
                  concepts: ["finance"],
                  force: 0.45
                },
                moveTo: {
                  concepts: ["haute couture"],
                  force: 0.85
                }
              }
            ) {
              name
              _additional {
                semanticPath {
                  path {
                    concept
                    distanceToNext
                    distanceToPrevious
                    distanceToQuery
                    distanceToResult
                  }
                }
              }
            }
          }
        }
        """
    )

finally:
    client.close()

Depending on the vectorizer module, you can use additional modalities such as images, audio, or video as the query, and retrieve corresponding, compatible objects.

Some modules, such as multi2vec-clip and multi2vec-bind allow you to search across modalities. For example, you can search for images using a text query, or search for text using an image query.

Please refer to the specific module pages for details.

hybrid

This operator allows you to combine BM25 and vector search to get a "best of both worlds" type search results set.

Variables

Variable | Required | Type | Description
query | yes | string | search query
alpha | no | float | weighting for each search algorithm, default 0.75
vector | no | [float] | optional to supply your own vector
properties | no | [string] | list of properties to limit the BM25 search to, default all text properties
fusionType | no | string | the type of hybrid fusion algorithm (available from v1.20.0)
  • Notes:
    • alpha can be any number from 0 to 1, defaulting to 0.75.
      • alpha = 0 forces using a pure keyword search method (BM25)
      • alpha = 1 forces using a pure vector search method
      • alpha = 0.5 weighs the BM25 and vector methods evenly
    • fusionType can be rankedFusion or relativeScoreFusion
      • rankedFusion (default) adds inverted ranks of the BM25 and vector search methods
      • relativeScoreFusion adds normalized scores of the BM25 and vector search methods

Fusion algorithms

Ranked fusion

The rankedFusion algorithm is Weaviate's original hybrid fusion algorithm.

In this algorithm, each object is scored according to its position in the results for that search (vector or keyword). The top-ranked objects in each search get the highest scores. Scores decrease going from top to least ranked. The total score is calculated by adding the rank-based scores from the vector and keyword searches.

Relative score fusion

New in Weaviate version 1.20.

In relativeScoreFusion, the vector search and keyword search scores are scaled between 0 and 1. The highest raw score becomes 1 in the scaled scores, and the lowest is assigned 0. The remaining values are scaled between 0 and 1 according to where they fall between the lowest and highest raw scores. The total score is a scaled sum of the normalized vector similarity and normalized BM25 scores.

Fusion scoring comparison

This example uses a small search result set to compare the ranked fusion and relative score fusion algorithms. The table shows the following information:

  • document id, from 0 to 4
  • keyword score, sorted
  • vector search score, sorted

Search Type | (id): score | (id): score | (id): score | (id): score | (id): score
Keyword | (1): 5 | (0): 2.6 | (2): 2.3 | (4): 0.2 | (3): 0.09
Vector | (2): 0.6 | (4): 0.598 | (0): 0.596 | (1): 0.594 | (3): 0.009

The ranking algorithms use these scores to derive the hybrid ranking.

Ranked Fusion

The score depends on the rank of the result. The score is equal to 1/(RANK + 60):

Search Type | (id): score | (id): score | (id): score | (id): score | (id): score
Keyword | (1): 0.0154 | (0): 0.0160 | (2): 0.0161 | (4): 0.0167 | (3): 0.0166
Vector | (2): 0.016502 | (4): 0.016502 | (0): 0.016503 | (1): 0.016503 | (3): 0.016666

The score at each rank depends only on the position in the result list, not on the input score.
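
A minimal sketch of the 1/(RANK + 60) formula follows. Whether ranks are counted from 0 or from 1 is not stated here, so the sketch assumes counting from 1; this affects the absolute values but not the resulting order. The function rank_scores is a hypothetical helper.

```python
def rank_scores(ranked_ids, k=60, first_rank=1):
    """Map each id to 1 / (rank + k), where rank is its position in the list.

    first_rank=1 is an assumption; the formula itself does not say
    whether the top result has rank 0 or rank 1.
    """
    return {
        doc_id: 1.0 / (rank + k)
        for rank, doc_id in enumerate(ranked_ids, start=first_rank)
    }


# Result orders taken from the keyword and vector rows of the example
keyword_order = [1, 0, 2, 4, 3]
vector_order = [2, 4, 0, 1, 3]

keyword_scores = rank_scores(keyword_order)
vector_scores = rank_scores(vector_order)

# The top result of each search gets the same score, whatever its raw score was
print(keyword_scores[1], vector_scores[2])
```

Because only the position matters, the large gap between the first and second keyword results (5 versus 2.6) has no influence on these scores.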

Relative Score Fusion

Here, we normalize the scores – the largest score is set to 1 and the lowest to 0, and all entries in-between are scaled according to their relative distance to the maximum and minimum values.

Search Type | (id): score | (id): score | (id): score | (id): score | (id): score
Keyword | (1): 1.0 | (0): 0.511 | (2): 0.450 | (4): 0.022 | (3): 0.0
Vector | (2): 1.0 | (4): 0.996 | (0): 0.993 | (1): 0.986 | (3): 0.0

Here, the scores reflect the relative distribution of the original scores. For example, the vector search scores of the first 4 documents were almost identical, which is still the case for the normalized scores.
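
The normalization described above is a min-max scaling, which can be sketched as follows using the raw scores from the first table. The helper name and the handling of the all-equal case are assumptions made for illustration.

```python
def min_max_normalize(scores):
    """Scale scores so the largest becomes 1 and the smallest becomes 0."""
    lowest, highest = min(scores.values()), max(scores.values())
    if highest == lowest:
        # Assumption for this sketch: identical scores all map to 1.0
        return {doc_id: 1.0 for doc_id in scores}
    return {
        doc_id: (score - lowest) / (highest - lowest)
        for doc_id, score in scores.items()
    }


# Raw scores from the example, keyed by document id
keyword_raw = {1: 5.0, 0: 2.6, 2: 2.3, 4: 0.2, 3: 0.09}
vector_raw = {2: 0.6, 4: 0.598, 0: 0.596, 1: 0.594, 3: 0.009}

keyword_norm = min_max_normalize(keyword_raw)
vector_norm = min_max_normalize(vector_raw)

for doc_id in keyword_raw:
    print(doc_id, round(keyword_norm[doc_id], 3), round(vector_norm[doc_id], 3))
```

Unlike the rank-based scores, these values keep the shape of the original distribution: a clear winner stays far ahead, and near-ties stay close together.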

Weighting & final scores

Before adding these scores up, they are weighted according to the alpha parameter. Let’s assume alpha=0.5, meaning both search types contribute equally to the final result and therefore each score is multiplied by 0.5.

Now, we can add the scores for each document up and compare the results from both fusion algorithms.

Algorithm Type | (id): score | (id): score | (id): score | (id): score | (id): score
Ranked | (2): 0.016301 | (1): 0.015952 | (0): 0.015952 | (4): 0.016600 | (3): 0.016630
Relative | (1): 0.993 | (0): 0.752 | (2): 0.725 | (4): 0.509 | (3): 0.0
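
The weighting step can be sketched as a weighted sum per document, where alpha weights the vector score and (1 - alpha) weights the keyword score. The sketch below is an illustration under stated assumptions, not Weaviate's implementation: the fuse helper is hypothetical, a document missing from one search is assumed to score 0 there, and the rank-based scores assume ranks are counted from 1.

```python
def fuse(keyword_scores, vector_scores, alpha=0.5):
    """Weighted sum per id: alpha for the vector score, (1 - alpha) for keyword."""
    ids = set(keyword_scores) | set(vector_scores)
    fused = {
        doc_id: (1 - alpha) * keyword_scores.get(doc_id, 0.0)
        + alpha * vector_scores.get(doc_id, 0.0)
        for doc_id in ids
    }
    # Highest fused score first
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Normalized scores copied from the relative score fusion table
keyword_relative = {1: 1.0, 0: 0.511, 2: 0.450, 4: 0.022, 3: 0.0}
vector_relative = {2: 1.0, 4: 0.996, 0: 0.993, 1: 0.986, 3: 0.0}

# Rank-based scores 1 / (rank + 60), assuming ranks start at 1
keyword_ranked = {d: 1 / (r + 60) for r, d in enumerate([1, 0, 2, 4, 3], start=1)}
vector_ranked = {d: 1 / (r + 60) for r, d in enumerate([2, 4, 0, 1, 3], start=1)}

relative = fuse(keyword_relative, vector_relative, alpha=0.5)
ranked = fuse(keyword_ranked, vector_ranked, alpha=0.5)

print("relative:", [(doc_id, round(score, 3)) for doc_id, score in relative])
print("ranked:  ", [doc_id for doc_id, _ in ranked])
```

Setting alpha to 0 or 1 in this sketch reduces the result to the pure keyword or pure vector ordering, in line with the notes on the alpha variable.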

What can we learn from this?

For the vector search, the scores for the top 4 objects (IDs 2, 4, 0, 1) were almost identical, and all of them were good results. For the keyword search, however, one object (ID 1) scored much better than the rest.

This is captured in the final result of relativeScoreFusion, which identified object ID 1 as the top result. This is justified because this document was the best result in the keyword search, with a big gap to the next-best score, and was in the top group of the vector search.

In contrast, for rankedFusion, the object ID 2 is the top result, closely followed by objects ID 1 and ID 0.

For a fuller discussion of fusion methods, see this blog post

Additional metadata response

Hybrid search results are sorted by a score, derived as a fused combination of their BM25F score and nearText similarity (higher is more relevant). This score, and additionally the explainScore metadata, can optionally be retrieved in the response.

Example

import weaviate
import weaviate.classes as wvc
import os

client = weaviate.connect_to_local(
    headers={"X-OpenAI-Api-Key": os.getenv("OPENAI_APIKEY")}
)

try:
    collection = client.collections.get("Article")

    response = collection.query.hybrid(
        query="Fisherman that catches salmon",
        alpha=0.5,
        return_metadata=wvc.query.MetadataQuery(score=True, explain_score=True),
        limit=5,
    )

    for o in response.objects:
        print(o.properties)
        print(o.metadata.score)
        print(o.metadata.explain_score)

finally:
    client.close()

Example with vector specified

You can optionally supply the vector query to the vector variable. This will override the query variable for the vector search component of the hybrid search.

import weaviate
import weaviate.classes as wvc
import os

client = weaviate.connect_to_local(
    headers={"X-OpenAI-Api-Key": os.getenv("OPENAI_APIKEY")}
)

try:
    collection = client.collections.get("Article")

    # query_vector is assumed to be defined elsewhere: a list of floats
    # with the same length as the vectors in this collection
    response = collection.query.hybrid(
        query="Fisherman that catches salmon",
        vector=query_vector,
        alpha=0.5,
        return_metadata=wvc.query.MetadataQuery(score=True, explain_score=True),
        limit=5,
    )

    for o in response.objects:
        print(o.properties)
        print(o.metadata.score)
        print(o.metadata.explain_score)

finally:
    client.close()

Hybrid with a conditional filter

Added in v1.18.0

A conditional (where) filter can be used with hybrid.

import weaviate
import weaviate.classes as wvc
import os

client = weaviate.connect_to_local(
    headers={"X-OpenAI-Api-Key": os.getenv("OPENAI_APIKEY")}
)

try:
    collection = client.collections.get("Article")

    response = collection.query.hybrid(
        query="How to catch an Alaskan Pollock",
        alpha=0.5,
        filters=wvc.query.Filter.by_property("wordCount").less_than(1000),
        limit=5,
    )

    for o in response.objects:
        print(o.properties)

finally:
    client.close()
Added in v1.19

A hybrid operator can accept an array of strings to limit the set of properties for the BM25 component of the search. If unspecified, all text properties will be searched.

import weaviate
import weaviate.classes as wvc
import os

client = weaviate.connect_to_local(
    headers={"X-OpenAI-Api-Key": os.getenv("OPENAI_APIKEY")}
)

try:
    collection = client.collections.get("JeopardyQuestion")

    response = collection.query.hybrid(
        query="Venus",
        alpha=0.25,
        query_properties=["question"],
        return_metadata=wvc.query.MetadataQuery(score=True),
        limit=5,
    )

    for o in response.objects:
        print(o.properties)
        print(o.metadata.score)

finally:
    client.close()

Oversearch with relativeScoreFusion

Added in v1.21

When relativeScoreFusion is used as the fusionType with a small search limit, a result set can be very sensitive to the limit parameter due to the normalization of the scores.

To mitigate this effect, Weaviate automatically performs a search with a higher limit (100) and then trims the results down to the requested limit.
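
The sensitivity to the limit can be seen in a small sketch. The raw scores are made up for illustration, and min_max_normalize is a hypothetical helper that mirrors the normalization described for relativeScoreFusion.

```python
def min_max_normalize(scores):
    """Scale a list of scores so the largest becomes 1 and the smallest 0."""
    lowest, highest = min(scores), max(scores)
    if highest == lowest:
        # Assumption for this sketch: identical scores all map to 1.0
        return [1.0 for _ in scores]
    return [(s - lowest) / (highest - lowest) for s in scores]


# Hypothetical raw keyword scores, best first
raw = [5.0, 4.9, 4.8, 2.0, 0.5]

# Normalizing only the top 2: the runner-up is the minimum and drops to 0
print(min_max_normalize(raw[:2]))

# Normalizing over a larger result set first, then trimming to 2
print(min_max_normalize(raw)[:2])
```

In the first case the second result is scored 0 even though its raw score is almost as high as the first; searching with a higher limit before trimming avoids this distortion.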

BM25

The bm25 operator performs a keyword (sparse vector) search, and uses the BM25F ranking function to score the results. BM25F (Best Match 25 with Extension to Multiple Weighted Fields) is an extended version of BM25 that applies the scoring algorithm to multiple fields (properties), producing better results.

The search is case-insensitive, and case matching does not confer a score advantage. Stop words are removed. Stemming is not supported yet.

Schema configuration

The free parameters k1 and b are configurable and optional. See the schema reference for more details.
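
To illustrate what k1 and b control, here is a sketch of the classic single-field BM25 term score. It is not the BM25F variant and not Weaviate's implementation; the function name, the idf form, and the default values k1=1.2 and b=0.75 are assumptions based on commonly used choices.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Classic BM25 score of one query term for one document.

    tf: term frequency in the document
    doc_len / avg_doc_len: document length and average length in the corpus
    n_docs / doc_freq: corpus size and number of documents containing the term
    k1: controls how quickly repeated occurrences stop adding score
    b: controls how strongly long documents are penalized (0 = not at all)
    """
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    length_norm = 1 - b + b * doc_len / avg_doc_len
    return idf * tf * (k1 + 1) / (tf + k1 * length_norm)


common = dict(avg_doc_len=100, n_docs=1000, doc_freq=50)

# More occurrences raise the score, with diminishing returns
print(bm25_term_score(tf=1, doc_len=100, **common))
print(bm25_term_score(tf=5, doc_len=100, **common))

# With b=0, document length no longer matters
print(bm25_term_score(tf=1, doc_len=100, b=0.0, **common))
print(bm25_term_score(tf=1, doc_len=1000, b=0.0, **common))
```

Raising b penalizes longer documents more strongly, while raising k1 lets additional occurrences of a term keep contributing for longer before the score saturates.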

Variables

The bm25 operator supports the following variables:

Variable | Required | Description
query | yes | The keyword search query.
properties | no | Array of properties (fields) to search in, defaulting to all properties in the collection.

Boosting properties

Specific properties can be boosted by a factor specified as a number after the caret sign, for example properties: ["title^3", "summary"].
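
The caret syntax can be read as a property name followed by an optional weight. The sketch below shows one way to interpret it; parse_boosted_property is a hypothetical helper, not part of the Weaviate client, and the default weight of 1 for unboosted properties is an assumption.

```python
def parse_boosted_property(spec):
    """Split 'name^factor' into (name, factor); no caret means a factor of 1.0."""
    name, caret, factor = spec.partition("^")
    if not caret:
        return name, 1.0
    return name, float(factor)


properties = ["title^3", "summary"]
print([parse_boosted_property(p) for p in properties])
```

In this reading, a match in title counts three times as much as a match in summary.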

Additional metadata response

The BM25F score metadata can be optionally retrieved in the response. A higher score indicates higher relevance.

Example query

import weaviate
import weaviate.classes as wvc
import os

client = weaviate.connect_to_local(
    headers={"X-OpenAI-Api-Key": os.getenv("OPENAI_APIKEY")}
)

try:
    collection = client.collections.get("Article")

    response = collection.query.bm25(
        query="fox",
        query_properties=["title"],
        return_metadata=wvc.query.MetadataQuery(score=True),
        limit=5,
    )

    for o in response.objects:
        print(o.properties)
        print(o.metadata.score)

finally:
    client.close()
Expected response
{
  "data": {
    "Get": {
      "Article": [
        {
          "_additional": {
            "certainty": null,
            "distance": null,
            "score": "3.4985464"
          },
          "title": "Tim Dowling: is the dog’s friendship with the fox sweet – or a bad omen?"
        }
      ]
    }
  },
  "errors": null
}

BM25 with a conditional filter

Added in v1.18

A conditional (where) filter can be used with bm25.

import weaviate
import weaviate.classes as wvc
import os

client = weaviate.connect_to_local(
    headers={"X-OpenAI-Api-Key": os.getenv("OPENAI_APIKEY")}
)

try:
    collection = client.collections.get("Article")

    response = collection.query.bm25(
        query="how to fish",
        return_metadata=wvc.query.MetadataQuery(score=True),
        filters=wvc.query.Filter.by_property("wordCount").less_than(1000),
        limit=5,
    )

    for o in response.objects:
        print(o.properties)
        print(o.metadata.score)

finally:
    client.close()
Expected response
{
  "data": {
    "Get": {
      "Article": [
        {
          "summary": "Sometimes, the hardest part of setting a fishing record is just getting the fish weighed. A Kentucky fisherman has officially set a new record in the state after reeling in a 9.05-pound saugeye. While getting the fish in the boat was difficult, the angler had just as much trouble finding an officially certified scale to weigh it on. In order to qualify for a state record, fish must be weighed on an officially certified scale. The previous record for a saugeye in Kentucky ws an 8 pound, 8-ounce fish caught in 2019.",
          "title": "Kentucky fisherman catches record-breaking fish, searches for certified scale"
        },
        {
          "summary": "Unpaid last month because there wasn\u2019t enough money. Ms. Hunt picks up shifts at JJ Fish & Chicken, bartends and babysits. three daughters is subsidized,and cereal fromErica Hunt\u2019s monthly budget on $12 an hourErica Hunt\u2019s monthly budget on $12 an hourExpensesIncome and benefitsRent, $775Take-home pay, $1,400Varies based on hours worked. Daycare, $600Daycare for Ms. Hunt\u2019s three daughters is subsidized, as are her electricity and internet costs. Household goods, $300Child support, $350Ms. Hunt picks up shifts at JJ Fish & Chicken, bartends and babysits to make more money.",
          "title": "Opinion | What to Tell the Critics of a $15 Minimum Wage"
        },
        ...
      ]
    }
  }
}

ask

Enabled by the module: Question Answering.

This operator allows you to return answers to questions by running the results through a Q&A model.

Variables

Variable | Required | Type | Description
question | yes | string | The question to be answered.
certainty | no | float | Desired minimal certainty or confidence of the answer to the question. The higher the value, the stricter the search becomes. The lower the value, the fuzzier the search becomes. If no certainty is set, any answer that could be extracted will be returned.
properties | no | [string] | The properties of the queried collection that contain text. If no properties are set, all are considered.
rerank | no | boolean | If enabled, the qna module will rerank the results based on the answer score. For example, if the 3rd result, as determined by the previous (semantic) search, contained the most likely answer, result 3 will be pushed to position 1, etc. Supported since v1.10.0.

Example

import weaviate
import weaviate.classes as wvc
import os

client = weaviate.connect_to_local(
    headers={"X-OpenAI-Api-Key": os.getenv("OPENAI_APIKEY")}
)

try:
    # QnA module use is not yet supported by the V4 client. Please use a raw GraphQL query instead.
    response = client.graphql_raw_query(
        """
        {
          Get {
            Article(
              ask: {
                question: "Who is the king of the Netherlands?",
                properties: ["summary"],
              },
              limit: 1
            ) {
              title
              _additional {
                answer {
                  hasAnswer
                  property
                  result
                  startPosition
                  endPosition
                }
              }
            }
          }
        }
        """
    )  # closing parenthesis was missing in the original call

finally:
    client.close()

Additional metadata response

The answer and a certainty can be retrieved.