GraphQL - Vector search parameters

TIP: Try these queries

You can try these queries on our demo instance (https://edu-demo.weaviate.network). You can authenticate against it with the read-only Weaviate API key learn-weaviate, and run the query with your preferred Weaviate client.


We include client instantiation examples below:

edu-demo client instantiation
import weaviate

# Instantiate the client with the auth config
client = weaviate.Client(
    url="https://edu-demo.weaviate.network",
    auth_client_secret=weaviate.AuthApiKey(api_key="learn-weaviate"),
    additional_headers={
        "X-OpenAI-Api-Key": "YOUR-OPENAI-API-KEY",  # Only needed for `nearText` or `hybrid` queries
    },
)

Setting search parameters

Vector search parameters are added to GraphQL queries on the class level.

For example:

{
  Get {
    <Class> (
      <filter>: {
        variables: values
      }
    ) {
      property
    }
  }
}
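As a rough illustration of the template above, the sketch below assembles a Get query string in which the search parameter sits in parentheses directly after the class name. `build_get_query` is a hypothetical helper written for this page, not part of any Weaviate client, and it assumes the parameter's arguments are already formatted as GraphQL.

```python
def build_get_query(class_name, properties, parameter=None, arguments=""):
    """Return a GraphQL Get query string for one class.

    Hypothetical helper for illustration only. `arguments` is the
    already-formatted body of the search parameter.
    """
    # The search parameter is attached at the class level, in parentheses.
    param_block = f"({parameter}: {{{arguments}}})" if parameter else ""
    props = " ".join(properties)
    return f"{{ Get {{ {class_name}{param_block} {{ {props} }} }} }}"


query = build_get_query(
    "Publication",
    ["name"],
    parameter="nearObject",
    arguments='id: "32d5a368-ace8-3bb7-ade7-9f7ff03eddb6"',
)
print(query)
```

A string built this way could be sent with the client's raw query method, which is used in the group example later on this page.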

Built-in parameters

Built-in search parameters are available in all Weaviate instances and don't require any modules.

Weaviate provides the following built-in parameters, each described in its own section of this page:

  • nearVector
  • nearObject
  • hybrid
  • bm25
  • group

Module-specific parameters

Module-specific search parameters are made available in certain Weaviate modules.

By adding relevant modules, you can use the following parameters, each described in its own section of this page:

  • nearText
  • ask

nearVector

This filter allows you to find data objects in the vicinity of an input vector. It's supported by the Get{} function.

  • Note: This argument is different from the GraphQL Explore{} function.
  • Note: You cannot use multiple near<Media> arguments, or a near<Media> argument along with an ask argument.

Variables

| Variable | Required | Type | Description |
| --- | --- | --- | --- |
| vector | yes | [float] | A vector embedding in the form of an array of floats. The array must have the same length as the vectors in this class. |
| distance | no | float | The maximum allowed distance between a result and the search vector. The interpretation of the value depends on the distance metric used. Can't be used together with the certainty variable. |
| certainty | no | float | The minimum required certainty, a normalized distance between the result item and the search vector, ranging from 0 (perfect opposite) to 1 (identical vectors). Can't be used together with the distance variable. |

Example

import weaviate

client = weaviate.Client("http://localhost:8080")

nearVector = {
    # Truncated placeholder: replace with a full-length vector compatible with the class
    "vector": [0.1, -0.15, 0.3]
}

result = (
    client.query
    .get("Publication", "name")
    .with_additional("distance")
    .with_near_vector(nearVector)
    .do()
)

print(result)

Additional information

If the distance metric is cosine you can also use certainty instead of distance. Certainty normalizes the distance in a range of 0..1, where 0 represents a perfect opposite (cosine distance of 2) and 1 represents vectors with an identical angle (cosine distance of 0). Certainty is not available on non-cosine distance metrics.
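The relationship between cosine distance and certainty can be sketched as follows. The two endpoints (distance 2 maps to certainty 0, distance 0 maps to certainty 1) come from the paragraph above; the sketch assumes the mapping between them is linear, and both function names are illustrative rather than part of Weaviate.

```python
import math


def cosine_distance(a, b):
    """Cosine distance: 0 for identical direction, 2 for opposite direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)


def distance_to_certainty(distance):
    """Map a cosine distance in [0, 2] onto a certainty in [0, 1] (assumed linear)."""
    return 1.0 - distance / 2.0


same = distance_to_certainty(cosine_distance([1.0, 0.0], [1.0, 0.0]))
opposite = distance_to_certainty(cosine_distance([1.0, 0.0], [-1.0, 0.0]))
orthogonal = distance_to_certainty(cosine_distance([1.0, 0.0], [0.0, 1.0]))
print(same, opposite, orthogonal)
```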

nearObject

This filter allows you to find data objects in the vicinity of other data objects by UUID. It's supported by the Get{} function.

  • Note: You cannot use multiple near<Media> arguments, or a near<Media> argument along with an ask argument.
  • Note: You can specify an object's id or beacon in the argument, along with a desired certainty.
  • Note that the first result will always be the object in the filter itself.
  • Near object search can also be combined with text2vec modules.

Variables

| Variable | Required | Type | Description |
| --- | --- | --- | --- |
| id | yes (either id or beacon) | UUID | Data object identifier in the UUID format. |
| beacon | yes (either id or beacon) | url | Data object identifier in the beacon URL format, for example weaviate://<hostname>/<kind>/id. |
| distance | no | float | The maximum allowed distance between a result and the search vector. The interpretation of the value depends on the distance metric used. Can't be used together with the certainty variable. |
| certainty | no | float | The minimum required certainty, a normalized distance between the result item and the search vector, ranging from 0 (perfect opposite) to 1 (identical vectors). Can't be used together with the distance variable. |

Example

import weaviate

client = weaviate.Client("http://localhost:8080")

# Alternatively: {"beacon": "weaviate://localhost/32d5a368-ace8-3bb7-ade7-9f7ff03eddb6"}
nearObject = {"id": "32d5a368-ace8-3bb7-ade7-9f7ff03eddb6"}

result = (
    client.query
    .get("Publication", "name")
    .with_additional("distance")  # "certainty" is only supported if the distance metric is cosine
    .with_near_object(nearObject)
    .do()
)

print(result)
Expected response
{
"data": {
"Get": {
"Publication": [
{
"_additional": {
"distance": -1.1920929e-07
},
"name": "The New York Times Company"
},
{
"_additional": {
"distance": 0.059879005
},
"name": "New York Times"
},
{
"_additional": {
"distance": 0.09176409
},
"name": "International New York Times"
},
{
"_additional": {
"distance": 0.13954824
},
"name": "New Yorker"
},
...
]
}
}
}
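A response like the one above is a nested dictionary in the Python client, so results can be post-processed with plain dictionary access. The sketch below copies the four objects from the expected response and keeps only those within a chosen distance; `names_within` is a hypothetical helper, not a client method.

```python
response = {
    "data": {
        "Get": {
            "Publication": [
                {"_additional": {"distance": -1.1920929e-07}, "name": "The New York Times Company"},
                {"_additional": {"distance": 0.059879005}, "name": "New York Times"},
                {"_additional": {"distance": 0.09176409}, "name": "International New York Times"},
                {"_additional": {"distance": 0.13954824}, "name": "New Yorker"},
            ]
        }
    }
}


def names_within(response, class_name, max_distance):
    """Return the names of all objects whose distance is at most max_distance."""
    objects = response["data"]["Get"][class_name]
    return [o["name"] for o in objects if o["_additional"]["distance"] <= max_distance]


close_matches = names_within(response, "Publication", 0.1)
print(close_matches)
```

The first object is the query object itself; its distance is effectively zero, and the tiny negative value is floating-point rounding.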

hybrid

This filter allows you to combine BM25 and vector search to get the best of both search methods. It's supported by the Get{} function.

Variables

| Variable | Required | Type | Description |
| --- | --- | --- | --- |
| query | yes | string | The search query. |
| alpha | no | float | Weighting for each search algorithm. Defaults to 0.75. |
| vector | no | [float] | Optional: supply your own vector. |
| properties | no | [string] | List of properties to limit the BM25 search to. Defaults to all text properties. |

  • Note: alpha can be any number from 0 to 1, defaulting to 0.75.
    • alpha = 0 forces using a pure keyword search method (BM25)
    • alpha = 1 forces using a pure vector search method
    • alpha = 0.5 weighs the BM25 and vector methods evenly
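To build intuition for alpha, the sketch below blends a keyword score and a vector score with a simple weighted sum. This is an illustrative simplification, not Weaviate's actual fusion algorithm: it assumes both scores are already on a comparable scale, and `blend_scores` is a hypothetical name.

```python
def blend_scores(keyword_score, vector_score, alpha=0.75):
    """Weighted sum of two scores: alpha weights the vector side."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * vector_score + (1.0 - alpha) * keyword_score


pure_keyword = blend_scores(keyword_score=0.2, vector_score=0.8, alpha=0)
pure_vector = blend_scores(keyword_score=0.2, vector_score=0.8, alpha=1)
even_mix = blend_scores(keyword_score=0.2, vector_score=0.8, alpha=0.5)
print(pure_keyword, pure_vector, even_mix)
```

With alpha = 0 only the keyword score survives, with alpha = 1 only the vector score does, and alpha = 0.5 lands halfway between them.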

GraphQL response

The _additional property in the GraphQL result exposes the score. Results are sorted descending by the score.

{
"_additional": {
"score": "0.016390799"
}
}
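Note that the score in the snippet above is a quoted string, so it needs converting before any numeric comparison. The sketch below does that on made-up sample objects (the titles and the second and third scores are hypothetical; only the first score is copied from the snippet) and checks the descending order described above.

```python
# Hypothetical result objects; only the first score comes from the snippet above.
objects = [
    {"title": "Sample article A", "_additional": {"score": "0.016390799"}},
    {"title": "Sample article B", "_additional": {"score": "0.0125"}},
    {"title": "Sample article C", "_additional": {"score": "0.008"}},
]


def scores_as_floats(objects):
    """Convert the string scores in `_additional` to floats."""
    return [float(o["_additional"]["score"]) for o in objects]


def is_sorted_descending(values):
    """True if each value is greater than or equal to the one after it."""
    return all(a >= b for a, b in zip(values, values[1:]))


scores = scores_as_floats(objects)
print(scores, is_sorted_descending(scores))
```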

Example

result = (
    client.query
    .get("Article", ["title", "summary"])
    .with_additional(["score", "explainScore"])
    .with_hybrid("Fisherman that catches salmon", alpha=0.5)
    .do()
)

Example with vector parameter

If you're providing your own embeddings, you can supply the vector query to the vector variable. If Weaviate is handling the vectorization, then you can ignore the vector variable and use the example code snippets above.

result = (
    client.query
    .get("Article", ["title", "summary"])
    .with_additional(["score"])
    .with_hybrid("Fisherman that catches salmon", alpha=0.5, vector=[1, 2, 3])
    .do()
)

Hybrid with Where filter

Starting with v1.18, you can use where filters with hybrid.

where_filter = {
    "path": ["wordCount"],
    "operator": "LessThan",
    "valueInt": 1000,  # valueInt takes an integer, not a string
}

query_result = (
    client.query
    .get("Article", ["title", "summary"])
    .with_where(where_filter)
    .with_hybrid(query="How to catch an Alaskan Pollock", alpha=0.5)
    .do()
)

Limiting BM25 properties

Starting with v1.19, hybrid accepts a properties array of strings that limits the set of properties that will be searched by the BM25 component of the search. If not specified, all text properties will be searched.

In the example below, the alpha parameter is set close to 0 to favor BM25 search, and changing the properties from "question" to "answer" will yield a different set of results.

import json

result = (
    client.query
    .get("JeopardyQuestion", ["question", "answer"])
    .with_additional(["score"])
    .with_hybrid(
        "Venus",
        alpha=0.25,  # closer to pure keyword search
        properties=["question"],  # changing to "answer" will yield a different result set
    )
    .with_limit(3)
    .do()
)

print(json.dumps(result, indent=4))

BM25

The bm25 operator performs a keyword (sparse vector) search, and uses the BM25F ranking function to score the results. BM25F (Best Match 25 with Extension to Multiple Weighted Fields) is an extended version of BM25 that applies the scoring algorithm to multiple fields (properties), producing better results.

The search is case-insensitive, and case matching does not confer a score advantage. Stop words are removed. Stemming is not supported yet.

Schema configuration

The free parameters k1 and b are configurable and optional. See the schema reference for more details.
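To show what k1 and b control, here is a minimal sketch of the textbook single-field BM25 term score. It is not Weaviate's BM25F implementation, the function name is illustrative, and the defaults k1=1.2 and b=0.75 are commonly used values assumed here rather than taken from this page.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Score one query term against one document with the textbook BM25 formula."""
    if tf == 0:
        return 0.0
    # Rarer terms (small doc_freq) get a larger inverse document frequency.
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # b controls how strongly long documents are penalized.
    length_norm = 1 - b + b * doc_len / avg_doc_len
    # k1 controls how quickly repeated occurrences of the term saturate.
    return idf * tf * (k1 + 1) / (tf + k1 * length_norm)


short_doc = bm25_term_score(tf=2, doc_len=50, avg_doc_len=100, n_docs=1000, doc_freq=10)
long_doc = bm25_term_score(tf=2, doc_len=500, avg_doc_len=100, n_docs=1000, doc_freq=10)
print(short_doc > long_doc)  # the shorter document scores higher when b > 0
```

Setting b to 0 switches off length normalization entirely, so both documents above would then score the same.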

Variables

The bm25 operator supports the following variables:

| Variable | Required | Description |
| --- | --- | --- |
| query | yes | The keyword search query. |
| properties | no | Array of properties (fields) to search in, defaulting to all properties in the class. |

Boosting properties

Specific properties can be boosted by a factor specified as a number after the caret sign, for example properties: ["title^3", "summary"].
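The caret notation can be read as a property name plus a weight. The sketch below splits such strings into pairs; `parse_boost` is a hypothetical helper, and treating an unboosted property as weight 1.0 is an assumption made for illustration.

```python
def parse_boost(spec):
    """Split "title^3" into ("title", 3.0); unboosted names get weight 1.0."""
    name, caret, factor = spec.partition("^")
    return name, float(factor) if caret else 1.0


weights = dict(parse_boost(spec) for spec in ["title^3", "summary"])
print(weights)
```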

Example query

import weaviate

client = weaviate.Client("http://localhost:8080")

bm25 = {
    "query": "fox",
    "properties": ["title"],  # by default, all properties are searched
}

result = (
    client.query
    .get("Article", ["title", "_additional {score}"])
    .with_bm25(**bm25)
    .do()
)

print(result)

GraphQL response

The _additional property in the GraphQL result exposes the score:

{
"_additional": {
"score": "5.3201",
"distance": null, # always null
"certainty": null # always null
}
}
Expected response
{
"data": {
"Get": {
"Article": [
{
"_additional": {
"certainty": null,
"distance": null,
"score": "3.4985464"
},
"title": "Tim Dowling: is the dog’s friendship with the fox sweet – or a bad omen?"
}
]
}
},
"errors": null
}

BM25 with Where Filter

Starting with v1.18, you can use where filters with bm25.

where_filter = {
    "path": ["wordCount"],
    "operator": "LessThan",
    "valueInt": 1000,  # valueInt takes an integer, not a string
}

query_result = (
    client.query
    .get("Article", ["title", "summary"])
    .with_where(where_filter)
    .with_bm25(query="how to fish")
    .do()
)
Expected response
{
"data": {
"Get": {
"Article": [
{
"summary": "Sometimes, the hardest part of setting a fishing record is just getting the fish weighed. A Kentucky fisherman has officially set a new record in the state after reeling in a 9.05-pound saugeye. While getting the fish in the boat was difficult, the angler had just as much trouble finding an officially certified scale to weigh it on. In order to qualify for a state record, fish must be weighed on an officially certified scale. The previous record for a saugeye in Kentucky ws an 8 pound, 8-ounce fish caught in 2019.",
"title": "Kentucky fisherman catches record-breaking fish, searches for certified scale"
},
{
"summary": "Unpaid last month because there wasn\u2019t enough money. Ms. Hunt picks up shifts at JJ Fish & Chicken, bartends and babysits. three daughters is subsidized,and cereal fromErica Hunt\u2019s monthly budget on $12 an hourErica Hunt\u2019s monthly budget on $12 an hourExpensesIncome and benefitsRent, $775Take-home pay, $1,400Varies based on hours worked. Daycare, $600Daycare for Ms. Hunt\u2019s three daughters is subsidized, as are her electricity and internet costs. Household goods, $300Child support, $350Ms. Hunt picks up shifts at JJ Fish & Chicken, bartends and babysits to make more money.",
"title": "Opinion | What to Tell the Critics of a $15 Minimum Wage"
},
...
]
}
}
}

group

You can use the group operator to combine similar concepts (also known as entity merging). There are two ways of grouping semantically similar objects together.

Variables

| Variable | Required | Type | Description |
| --- | --- | --- | --- |
| type | yes | string | Either closest, which shows only the closest concept, or merge, which merges all similar entities into a single string. |
| force | yes | float | The force to apply for a particular movement. Must be between 0 and 1, where 0 is equivalent to no movement and 1 is equivalent to the largest movement possible. |

Example

import weaviate

client = weaviate.Client("http://localhost:8080")

get_articles_group = """
{
  Get {
    Publication(
      group: {
        type: merge,
        force: 0.05
      }
    ) {
      name
    }
  }
}
"""

query_result = client.query.raw(get_articles_group)
print(query_result)

This results in the following. Note that publications International New York Times, The New York Times Company and New York Times are merged. The property values that do not have an exact overlap will all be shown, with the value of the most central concept before the brackets.

Expected response
{
"data": {
"Get": {
"Publication": [
{
"name": "Fox News"
},
{
"name": "Wired"
},
{
"name": "The New York Times Company (New York Times, International New York Times)"
},
{
"name": "Game Informer"
},
{
"name": "New Yorker"
},
{
"name": "Wall Street Journal"
},
{
"name": "Vogue"
},
{
"name": "The Economist"
},
{
"name": "Financial Times"
},
{
"name": "The Guardian"
},
{
"name": "CNN"
}
]
}
}
}

nearText

Enabled by the modules:

This filter allows you to find data objects in the vicinity of the vector representation of a single or multiple concepts. It's supported by the Get{} function.

Variables

| Variable | Required | Type | Description |
| --- | --- | --- | --- |
| concepts | yes | [string] | An array of strings that can be natural language queries or single words. If multiple strings are used, a centroid is calculated and used. See the Concept parsing section on this page for how the concepts are parsed. |
| certainty | no | float | The required degree of similarity between an object's characteristics and the provided filter values. Values can be between 0 (no match) and 1 (perfect match). Can't be used together with the distance variable. |
| distance | no | float | The maximum allowed distance between the result item and the search vector. The interpretation of the value depends on the distance metric used. Can't be used together with the certainty variable. |
| autocorrect | no | boolean | Autocorrect input text values. |
| moveTo | no | object{} | Move your search term closer to another vector described by keywords. |
| moveTo{concepts} | no | [string] | An array of strings: natural language queries or single words. If multiple strings are used, a centroid is calculated and used. |
| moveTo{objects} | no | [UUID] | Object IDs to move the results to. This is used to "bias" NLP search results into a certain direction in vector space. |
| moveTo{force} | no | float | The force to apply to a particular movement. Must be between 0 and 1, where 0 is equivalent to no movement and 1 is equivalent to the largest movement possible. |
| moveAwayFrom | no | object{} | Move your search term away from another vector described by keywords. |
| moveAwayFrom{concepts} | no | [string] | An array of strings: natural language queries or single words. If multiple strings are used, a centroid is calculated and used. |
| moveAwayFrom{objects} | no | [UUID] | Object IDs to move the results from. This is used to "bias" NLP search results into a certain direction in vector space. |
| moveAwayFrom{force} | no | float | The force to apply to a particular movement. Must be between 0 and 1, where 0 is equivalent to no movement and 1 is equivalent to the largest movement possible. |
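The centroid and force ideas in the table can be pictured with plain vector arithmetic. The sketch below averages several vectors into a centroid and shifts a query vector toward or away from a target by linear interpolation. This is an assumed, simplified model of "movement" for illustration only and may differ from what Weaviate computes internally; `centroid` and `move` are hypothetical names.

```python
def centroid(vectors):
    """Component-wise mean of equally long vectors."""
    count = len(vectors)
    return [sum(components) / count for components in zip(*vectors)]


def move(query, target, force, away=False):
    """Shift query toward (or away from) target; force 0 means no movement."""
    if not 0.0 <= force <= 1.0:
        raise ValueError("force must be between 0 and 1")
    sign = -1.0 if away else 1.0
    return [q + sign * force * (t - q) for q, t in zip(query, target)]


query_vector = centroid([[0.0, 2.0], [2.0, 0.0]])         # two concepts -> one centroid
moved_to = move(query_vector, [3.0, 1.0], force=0.5)      # analogous to moveTo
moved_away = move(query_vector, [3.0, 1.0], force=0.5, away=True)  # analogous to moveAwayFrom
print(query_vector, moved_to, moved_away)
```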

Example I

This example shows a basic overview of using the nearText filter.

import weaviate

client = weaviate.Client("http://localhost:8080")

nearText = {
    "concepts": ["fashion"],
    "distance": 0.6,  # prior to v1.14 use "certainty" instead of "distance"
    "moveAwayFrom": {
        "concepts": ["finance"],
        "force": 0.45
    },
    "moveTo": {
        "concepts": ["haute couture"],
        "force": 0.85
    }
}

result = (
    client.query
    .get("Publication", "name")
    # Request "distance" or "certainty"; certainty is only supported if the distance metric is cosine
    .with_additional(["distance"])
    .with_near_text(nearText)
    .do()
)

print(result)

Example II

You can also bias results toward other data objects' vector representations. For example, this query moves a search for "travelling in Asia" towards an article on food.

import weaviate

client = weaviate.Client("http://localhost:8080")

nearText = {
    "concepts": ["travelling in Asia"],
    "certainty": 0.7,
    "moveTo": {
        "objects": [{"id": "c4209549-7981-3699-9648-61a78c2124b9"}],
        "force": 0.85
    }
}

result = (
    client.query
    .get("Article", ["title", "summary", "_additional { certainty }"])
    .with_near_text(nearText)
    .do()
)

print(result)
Expected response
{
"data": {
"Get": {
"Article": [
{
"_additional": {
"certainty": 0.9619976580142975
},
"summary": "We've scoured the planet for what we think are 50 of the most delicious foods ever created. A Hong Kong best food, best enjoyed before cholesterol checks. When you have a best food as naturally delicious as these little fellas, keep it simple. Courtesy Matt@PEK/Creative Commons/FlickrThis best food Thai masterpiece teems with shrimp, mushrooms, tomatoes, lemongrass, galangal and kaffir lime leaves. It's a result of being born in a land where the world's most delicious food is sold on nearly every street corner.",
"title": "World food: 50 best dishes"
},
{
"_additional": {
"certainty": 0.9297388792037964
},
"summary": "The look reflects the elegant ambiance created by interior designer Joyce Wang in Hong Kong, while their mixology program also reflects the original venue. MONO Hong Kong , 5/F, 18 On Lan Street, Central, Hong KongKoral, The Apurva Kempinski Bali, IndonesiaKoral's signature dish: Tomatoes Bedugul. Esterre at Palace Hotel TokyoLegendary French chef Alain Ducasse has a global portfolio of restaurants, many holding Michelin stars. John Anthony/JW Marriott HanoiCantonese cuisine from Hong Kong is again on the menu, this time at the JW Marriott in Hanoi. Stanley takes its name from the elegant Hong Kong waterside district and the design touches reflect this legacy with Chinese antiques.",
"title": "20 best new Asia-Pacific restaurants to try in 2020"
}
...
]
}
}
}

Additional information

Distance metrics

If the distance metric is cosine you can also use certainty instead of distance. Certainty normalizes the distance in a range of 0..1, where 0 represents a perfect opposite (cosine distance of 2) and 1 represents vectors with an identical angle (cosine distance of 0). Certainty is not available on non-cosine distance metrics.

Concept parsing

Strings written in the concepts array are your fuzzy search terms. An array of concepts must be set in the Explore query, and all words in this array should be present in the Contextionary.

There are three ways to define the concepts array argument in the filter.

  • ["New York Times"] = one vector position is determined based on the occurrences of the words
  • ["New", "York", "Times"] = all concepts have a similar weight.
  • ["New York", "Times"] = a combination of the two above.

A practical example would be: concepts: ["beatles", "John Lennon"]

Semantic Path

  • Only available in the text2vec-contextionary module

The semantic path returns an array of concepts from the query to the data object. This allows you to see which steps Weaviate took and how the query and data object are interpreted.

| Property | Description |
| --- | --- |
| concept | The concept that is found in this step. |
| distanceToNext | The distance to the next step (null for the last step). |
| distanceToPrevious | The distance to the previous step (null for the first step). |
| distanceToQuery | The distance of this step to the query. |
| distanceToResult | The distance of this step to the result. |

Note: Building a semantic path is only possible if a nearText: {} filter is set, because the explore term represents the beginning of the path and each search result represents the end of the path. Since nearText: {} queries are currently only possible in GraphQL, the semanticPath is not available in the REST API.

Example: showing a semantic path without edges.

import weaviate

client = weaviate.Client("http://localhost:8080")

near_text_filter = {
    "concepts": ["fashion"],
    "distance": 0.6,  # prior to v1.14 use certainty: 0.7
    "moveAwayFrom": {
        "concepts": ["finance"],
        "force": 0.45
    },
    "moveTo": {
        "concepts": ["haute couture"],
        "force": 0.85
    }
}

additional_props = {
    "semanticPath": "path {distanceToNext distanceToPrevious distanceToQuery distanceToResult}"
}

query_result = (
    client.query
    .get("Publication", "name")
    .with_additional(additional_props)
    .with_near_text(near_text_filter)
    .do()
)

print(query_result)

ask

Enabled by the module: Question Answering.

This filter allows you to return answers to questions by running the results through a Q&A model.

Variables

| Variable | Required | Type | Description |
| --- | --- | --- | --- |
| question | yes | string | The question to be answered. |
| certainty | no | float | Desired minimal certainty or confidence of the answer to the question. The higher the value, the stricter the search becomes. The lower the value, the fuzzier the search becomes. If no certainty is set, any answer that could be extracted will be returned. |
| properties | no | [string] | The properties of the queried class that contain text. If no properties are set, all are considered. |
| rerank | no | boolean | If enabled, the qna module reranks the results based on the answer score. For example, if the third result, as determined by the previous (semantic) search, contained the most likely answer, result 3 is pushed to position 1, and so on. Supported since v1.10.0. |

Example

import weaviate

client = weaviate.Client("http://localhost:8080")

ask = {
    "question": "Who is the king of the Netherlands?",
    "properties": ["summary"]
}

result = (
    client.query
    .get("Article", ["title", "_additional {answer {hasAnswer certainty property result startPosition endPosition} }"])
    .with_ask(ask)
    .with_limit(1)
    .do()
)

print(result)

GraphQL response

The _additional{} property is extended with the answer and a certainty of the answer.
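As a sketch of how such an answer might be read, the snippet below works on a made-up object that uses the answer fields requested in the example query (hasAnswer, certainty, property, result, startPosition, endPosition). All sample values are hypothetical. It also assumes that the positions are character offsets into the named property with an exclusive end, and that the property itself (here summary) was requested in the query.

```python
# Made-up sample object; values are illustrative, not real query output.
sample_object = {
    "title": "Hypothetical article",
    "summary": "Willem-Alexander is the king of the Netherlands.",
    "_additional": {
        "answer": {
            "hasAnswer": True,
            "certainty": 0.9,
            "property": "summary",
            "result": "Willem-Alexander",
            "startPosition": 0,
            "endPosition": 16,
        }
    },
}


def extract_answer(obj):
    """Return (answer_text, source_span), or None when no answer was found."""
    answer = obj["_additional"]["answer"]
    if not answer["hasAnswer"]:
        return None
    # Assumption: start/end are character offsets into the named property.
    source_text = obj[answer["property"]]
    span = source_text[answer["startPosition"]:answer["endPosition"]]
    return answer["result"], span


extracted = extract_answer(sample_object)
print(extracted)
```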
