Additional operators

Syntax

Functions such as limit, autocut, and sort modify queries at the class level.

Limit argument

The limit argument restricts the number of results. These functions support limit:

  • Get
  • Explore
  • Aggregate

import weaviate

# Connect to a local Weaviate instance.
client = weaviate.connect_to_local()

try:
    articles = client.collections.get("Article")
    # Return at most five objects.
    response = articles.query.fetch_objects(
        limit=5
    )

    for o in response.objects:
        print(f"Title: {o.properties['title']}")

finally:
    # Always release the connection, even if the query fails.
    client.close()

Expected response

{
  "data": {
    "Get": {
      "Article": [
        {
          "title": "Backs on the rack - Vast sums are wasted on treatments for back pain that make it worse"
        },
        {
          "title": "Graham calls for swift end to impeachment trial, warns Dems against calling witnesses"
        },
        {
          "title": "Through a cloud, brightly - Obituary: Paul Volcker died on December 8th"
        },
        {
          "title": "Google Stadia Reviewed \u2013 Against The Stream"
        },
        {
          "title": "Managing Supply Chain Risk"
        }
      ]
    }
  }
}

Pagination with offset

To return sets of results, "pages", use offset and limit together to specify a sub-set of the query response.

For example, to list the first ten results, set limit: 10 and offset: 0. To display the next ten results, set offset: 10. To continue iterating over the results, increase the offset by the page size each time. For more details, see performance considerations.

The Get and Explore functions support offset.

import weaviate

# Connect to a local Weaviate instance.
client = weaviate.connect_to_local()

try:
    articles = client.collections.get("Article")
    # Skip the first two objects, then return the next five.
    response = articles.query.fetch_objects(
        limit=5,
        offset=2
    )

    for o in response.objects:
        print(f"Title: {o.properties['title']}")

finally:
    # Always release the connection, even if the query fails.
    client.close()

Expected response

{
  "data": {
    "Get": {
      "Article": [
        {
          "title": "Through a cloud, brightly - Obituary: Paul Volcker died on December 8th"
        },
        {
          "title": "Google Stadia Reviewed \u2013 Against The Stream"
        },
        {
          "title": "Managing Supply Chain Risk"
        },
        {
          "title": "Playing College Football In Madden"
        },
        {
          "title": "The 50 best albums of 2019, No 3: Billie Eilish \u2013 When We All Fall Asleep, Where Do We Go?"
        }
      ]
    }
  }
}
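
The page arithmetic can be sketched in a few lines of plain Python. This is only an illustration and not part of the Weaviate client: the helper name page_to_offset and the zero-based page index are assumptions made for this example.

```python
def page_to_offset(page: int, page_size: int) -> dict:
    """Return the limit/offset pair for a zero-based page index (hypothetical helper)."""
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    return {"limit": page_size, "offset": page * page_size}


# Limit/offset pairs for the first three pages of ten results each.
for page in range(3):
    print(page, page_to_offset(page, page_size=10))
```

The resulting values could then be passed as the limit and offset arguments of a fetch_objects call like the one shown above.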

Performance considerations

Pagination is not a cursor-based implementation. This has the following implications:

  • Response time and system load increase as the number of pages grows. As the offset grows, each additional page request requires a new, larger call against your collection. For example, if your offset and limit specify results from 21-30, Weaviate retrieves 30 objects and drops the first 20. On the next call, Weaviate retrieves 40 objects and drops the first 30.
  • Resource requirements are amplified in multi-shard configurations. Each shard retrieves a full list of objects. Each shard also drops the objects before the offset. If you have 10 shards configured and ask for results 91-100, Weaviate retrieves 1000 objects (100 per shard) and drops 990 of them.
  • The number of objects you can retrieve is limited. A single query returns up to QUERY_MAXIMUM_RESULTS. If the sum of offset and limit exceeds QUERY_MAXIMUM_RESULTS, Weaviate returns an error. To change the limit, edit the QUERY_MAXIMUM_RESULTS environment variable. If you increase QUERY_MAXIMUM_RESULTS, use the lowest value possible to avoid performance problems.
  • Pagination is not stateful. If the database state changes between calls, your pages might miss results. An insertion or a deletion changes the object count. An update could change object order. However, if there are no writes, the overall result set is the same whether you retrieve one large page or many smaller ones.
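
The retrieval arithmetic in the list above can be reproduced with a small calculation. This is a rough model of the behavior described here, not Weaviate code: the function name offset_query_cost is hypothetical, and the default of 10,000 for the maximum is an assumption standing in for whatever QUERY_MAXIMUM_RESULTS is set to in your deployment.

```python
from typing import Tuple


def offset_query_cost(offset: int, limit: int, shards: int = 1,
                      max_results: int = 10_000) -> Tuple[int, int]:
    """Return (objects_retrieved, objects_dropped) for one paginated call.

    max_results stands in for QUERY_MAXIMUM_RESULTS (assumed value).
    """
    if offset + limit > max_results:
        raise ValueError("offset + limit exceeds the configured maximum")
    # Every shard fetches the full offset + limit window.
    retrieved = (offset + limit) * shards
    # Only `limit` objects are returned; the rest are discarded.
    dropped = retrieved - limit
    return retrieved, dropped


print(offset_query_cost(offset=20, limit=10))             # results 21-30, one shard
print(offset_query_cost(offset=90, limit=10, shards=10))  # results 91-100, ten shards
```

Because the cost grows with the offset, deep pagination is the case where this model matters most.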

Autocut

Added in v1.20

The autocut function limits results based on discontinuities in the result set. Specifically, autocut looks for discontinuities, or jumps, in result metrics such as vector distance or search score.

To use autocut, specify how many jumps there should be in your query. The query stops returning results after the specified number of jumps.

For example, consider a nearText search that returns objects with these distance values:

[0.1899, 0.1901, 0.191, 0.21, 0.215, 0.23].

Autocut returns the following:

  • autocut: 1: [0.1899, 0.1901, 0.191]
  • autocut: 2: [0.1899, 0.1901, 0.191, 0.21, 0.215]
  • autocut: 3: [0.1899, 0.1901, 0.191, 0.21, 0.215, 0.23]

Autocut works with these functions:

  • nearXXX
  • bm25
  • hybrid

To use autocut with hybrid search, specify the relativeScoreFusion ranking method.

Autocut is disabled by default. To explicitly disable autocut, set the number of jumps to 0 or a negative value.

If autocut is combined with the limit filter, autocut only considers the first objects returned up to the value of limit.
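
To make the idea of a "jump" concrete, here is a toy sketch that reproduces the example distances above. It is not Weaviate's actual algorithm. It assumes the distances are sorted in ascending order and uses a simple stand-in heuristic in which a jump is any gap between neighbors that is larger than the mean gap. The function name autocut_sketch is hypothetical.

```python
from typing import List


def autocut_sketch(distances: List[float], jumps: int) -> List[float]:
    """Keep results up to, but not including, the `jumps`-th jump.

    Toy heuristic (an assumption for illustration): a jump is a gap
    between neighboring values that exceeds the mean gap.
    """
    # Zero or negative values disable the cut, as described above.
    if jumps <= 0 or len(distances) < 2:
        return list(distances)
    gaps = [b - a for a, b in zip(distances, distances[1:])]
    mean_gap = sum(gaps) / len(gaps)
    seen = 0
    for i, gap in enumerate(gaps):
        if gap > mean_gap:
            seen += 1
            if seen == jumps:
                # gaps[i] sits between distances[i] and distances[i + 1].
                return distances[: i + 1]
    # Fewer jumps than requested: return everything.
    return list(distances)


distances = [0.1899, 0.1901, 0.191, 0.21, 0.215, 0.23]
for n in (1, 2, 3):
    print(n, autocut_sketch(distances, n))
```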

Sample client code:

import weaviate
import weaviate.classes as wvc

# Connect to a local Weaviate instance.
client = weaviate.connect_to_local()

try:
    jeopardy = client.collections.get("JeopardyQuestion")
    response = jeopardy.query.near_text(
        query="animals in movies",
        auto_limit=1,  # number of close groups
        return_metadata=wvc.query.MetadataQuery(distance=True)
    )

    for o in response.objects:
        print(o.properties)
        print(o.metadata.distance)

finally:
    client.close()

Example response

The output is like this:

{
  "data": {
    "Get": {
      "JeopardyQuestion": [
        {
          "_additional": {
            "distance": 0.17591828
          },
          "answer": "meerkats",
          "question": "Group of mammals seen <a href=\"http://www.j-archive.com/media/1998-06-01_J_28.jpg\" target=\"_blank\">here</a>: [like Timon in <i>The Lion King</i>]"
        },
        {
          "_additional": {
            "distance": 0.17837524
          },
          "answer": "dogs",
          "question": "Scooby-Doo, Goofy & Pluto are cartoon versions"
        },
        {
          "_additional": {
            "distance": 0.18658042
          },
          "answer": "The Call of the Wild Thornberrys",
          "question": "Jack London story about the dog Buck who joins a Nick cartoon about Eliza, who can talk to animals"
        },
        {
          "_additional": {
            "distance": 0.18755406
          },
          "answer": "fox",
          "question": "In titles, animal associated with both Volpone and Reynard"
        },
        {
          "_additional": {
            "distance": 0.18817466
          },
          "answer": "Lion Tamers/Wild Animal Trainers",
          "question": "Mabel Stark, Clyde Beatty & Gunther Gebel-Williams"
        },
        {
          "_additional": {
            "distance": 0.19061792
          },
          "answer": "a fox",
          "question": "\"Sly\" creature sought by sportsmen riding to hounds"
        },
        {
          "_additional": {
            "distance": 0.191764
          },
          "answer": "a lion",
          "question": "The animal featured both in Rousseau's \"The Sleeping Gypsy\" & \"The Dream\""
        }
      ]
    }
  }
}

For more client code examples for each functional category, see these pages:

Cursor with after

Starting with version v1.18, you can use after to retrieve objects sequentially. For example, you can use after to retrieve a complete set of objects from a collection.

after creates a cursor that is compatible with single shard and multi-shard configurations.

The after function relies on object ids, and thus it only works with list queries. after is not compatible with where, near<Media>, bm25, hybrid, or similar searches, or in combination with filters. For those use cases, use pagination with offset and limit.

import weaviate

# Connect to a local Weaviate instance.
client = weaviate.connect_to_local()

try:
    articles = client.collections.get("Article")
    # Return the five objects that follow the object with this id.
    response = articles.query.fetch_objects(
        limit=5,
        after="002d5cb3-298b-380d-addb-2e026b76c8ed"
    )

    for o in response.objects:
        print(f"Title: {o.properties['title']}")

finally:
    # Always release the connection, even if the query fails.
    client.close()

Expected response

{
  "data": {
    "Get": {
      "Article": [
        {
          "_additional": {
            "id": "00313a4c-4308-30b0-af4a-01773ad1752b"
          },
          "title": "Managing Supply Chain Risk"
        },
        {
          "_additional": {
            "id": "0042b9d0-20e4-334e-8f42-f297c150e8df"
          },
          "title": "Playing College Football In Madden"
        },
        {
          "_additional": {
            "id": "0047c049-cdd6-3f6e-bb89-84ae20b74f49"
          },
          "title": "The 50 best albums of 2019, No 3: Billie Eilish \u2013 When We All Fall Asleep, Where Do We Go?"
        },
        {
          "_additional": {
            "id": "00582185-cbf4-3cd6-8c59-c2d6ec979282"
          },
          "title": "How artificial intelligence is transforming the global battle against human trafficking"
        },
        {
          "_additional": {
            "id": "0061592e-b776-33f9-8109-88a5bd41df78"
          },
          "title": "Masculine, feminist or neutral? The language battle that has split Spain"
        }
      ]
    }
  }
}
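
Retrieving a complete set of objects with a cursor amounts to a loop that feeds the last id of each page into the next request. The sketch below shows that loop against an in-memory stand-in so that it runs without a Weaviate instance. The names iterate_with_cursor, fake_fetch, and FAKE_OBJECTS are hypothetical, and the stand-in assumes objects are ordered by id.

```python
from typing import Callable, Dict, Iterator, List, Optional

Page = List[Dict[str, str]]


def iterate_with_cursor(
    fetch_page: Callable[[int, Optional[str]], Page], page_size: int
) -> Iterator[Dict[str, str]]:
    """Yield every object by repeatedly asking for the page after the last id."""
    cursor: Optional[str] = None  # None means "start from the beginning"
    while True:
        page = fetch_page(page_size, cursor)
        if not page:
            return  # an empty page means the collection is exhausted
        yield from page
        cursor = page[-1]["id"]


# In-memory stand-in for a collection, ordered by id (assumption for the demo).
FAKE_OBJECTS = [{"id": f"{i:04d}", "title": f"Article {i}"} for i in range(12)]


def fake_fetch(limit: int, after: Optional[str]) -> Page:
    """Return up to `limit` objects whose id sorts after the cursor."""
    remaining = [o for o in FAKE_OBJECTS if after is None or o["id"] > after]
    return remaining[:limit]


titles = [o["title"] for o in iterate_with_cursor(fake_fetch, page_size=5)]
print(len(titles))
```

With the Python client, fetch_page might wrap a fetch_objects call that passes limit and after, using the id of the last returned object as the next cursor.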

Sorting

Added in v1.13.0.

You can sort results by any primitive property, such as text, number, or int. When query results have a natural order (for example, near<Media> vector search results), sort functions override that order.

Sorting considerations

Weaviate's sorting implementation does not lead to massive memory spikes. Weaviate does not load all object properties into memory; only the property values being sorted are kept in memory.

Weaviate does not use any sorting-specific data structures on disk. When objects are sorted, Weaviate identifies each object and extracts the relevant properties. This works reasonably well at small scales (hundreds of thousands or millions of objects). It is expensive if you sort large lists of objects (hundreds of millions or billions). In the future, Weaviate may add a column-oriented storage mechanism to overcome this performance limitation.

Sort order

boolean values

false is considered smaller than true. false comes before true in ascending order and after true in descending order.

null values

null values are considered smaller than any non-null values. null values come first in ascending order and last in descending order.

arrays

Arrays are compared by each element separately. Elements at the same position are compared to each other, starting from the beginning of an array. When Weaviate finds an array element in one array that is smaller than its counterpart in the second array, Weaviate considers the whole first array to be smaller than the second one.

Arrays are equal if they have the same length and all elements are equal. If one array is a subset of another array (for example, a shorter array that matches the beginning of a longer one), it is considered smaller.

Examples:

  • [1, 2, 3] = [1, 2, 3]
  • [1, 2, 4] < [1, 3, 4]
  • [2, 2] > [1, 2, 3, 4]
  • [1, 2, 3] < [1, 2, 3, 4]
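
The ordering rules above can be expressed as a short comparison routine. This is a sketch of the rules as described in this section, not Weaviate's internal code. The names compare_arrays and null_first_key are hypothetical.

```python
from typing import Any, List, Tuple


def compare_arrays(a: List[Any], b: List[Any]) -> int:
    """Return -1, 0, or 1 by comparing elements position by position."""
    for x, y in zip(a, b):
        if x < y:
            return -1
        if x > y:
            return 1
    # All shared positions are equal, so the shorter array is the smaller one.
    return (len(a) > len(b)) - (len(a) < len(b))


def null_first_key(value: Any) -> Tuple[int, Any]:
    """Sort key that places None before every non-null value."""
    return (0, 0) if value is None else (1, value)


print(compare_arrays([1, 2, 4], [1, 3, 4]))
print(sorted([True, None, False], key=null_first_key))
print(sorted([True, None, False], key=null_first_key, reverse=True))
```

The key function relies on Python treating False as smaller than True, which matches the boolean rule above.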

Sorting API

Sorting can be performed by one or more properties. If the values for the first property are identical, Weaviate uses the second property to determine the order, and so on.

The sort function takes either an object, or an array of objects, that describe a property and a sort order.

| Parameter | Required | Type | Description |
| --- | --- | --- | --- |
| path | yes | text | The path to the sort field is a single-element array that contains the field name. GraphQL supports specifying the field name directly. |
| order | varies by client | asc or desc | The sort order, ascending (default) or descending. |

import weaviate
from weaviate.collections.classes.grpc import Sort

# Connect to a local Weaviate instance.
client = weaviate.connect_to_local()

try:
    questions = client.collections.get("JeopardyQuestion")
    # Sort by the "answer" property in ascending order.
    response = questions.query.fetch_objects(
        sort=Sort.by_property(name="answer", ascending=True),
        limit=3
    )

    for o in response.objects:
        print(f"Answer: {o.properties['answer']}")
        print(f"Points: {o.properties['points']}")
        print(f"Question: {o.properties['question']}")

finally:
    client.close()

Expected response

{
  "data": {
    "Get": {
      "JeopardyQuestion": [
        {
          "answer": "$5 (Lincoln Memorial in the background)",
          "points": 600,
          "question": "A sculpture by Daniel Chester French can be seen if you look carefully on the back of this current U.S. bill"
        },
        {
          "answer": "(1 of 2) Juneau, Alaska or Augusta, Maine",
          "points": 0,
          "question": "1 of the 2 U.S. state capitals that begin with the names of months"
        },
        {
          "answer": "(1 of 2) Juneau, Alaska or Honolulu, Hawaii",
          "points": 0,
          "question": "One of the 2 state capitals whose names end with the letter \"U\""
        }
      ]
    }
  }
}

Sorting by multiple properties

To sort by more than one property, pass an array of { path, order } objects to the sort function:

import weaviate
from weaviate.collections.classes.grpc import Sort

# Connect to a local Weaviate instance.
client = weaviate.connect_to_local()

try:
    questions = client.collections.get("JeopardyQuestion")
    response = questions.query.fetch_objects(
        # Note: To sort by multiple properties, chain the relevant `by_xxx` methods.
        sort=Sort.by_property(name="points", ascending=False).by_property(name="answer", ascending=True),
        limit=3
    )

    for o in response.objects:
        print(f"Answer: {o.properties['answer']}")
        print(f"Points: {o.properties['points']}")
        print(f"Question: {o.properties['question']}")

finally:
    client.close()
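
The tie-breaking behavior of a multi-property sort can be checked locally on plain dictionaries. The sketch below is an illustration in ordinary Python, not a description of how Weaviate sorts internally. The sample rows and the function name sort_by_points_then_answer are made up for this example.

```python
from typing import Any, Dict, List

rows: List[Dict[str, Any]] = [
    {"answer": "b", "points": 100},
    {"answer": "a", "points": 200},
    {"answer": "c", "points": 200},
    {"answer": "a", "points": 100},
]


def sort_by_points_then_answer(items: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Sort by points (descending), breaking ties by answer (ascending)."""
    # Python's sort is stable, so sort by the secondary key first...
    by_answer = sorted(items, key=lambda r: r["answer"])
    # ...then by the primary key; rows with equal points keep their answer order.
    return sorted(by_answer, key=lambda r: r["points"], reverse=True)


result = sort_by_points_then_answer(rows)
for row in result:
    print(row["points"], row["answer"])
```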

Metadata properties

To sort with metadata, add an underscore to the property name.

| Property Name | Sort Property Name |
| --- | --- |
| id | _id |
| creationTimeUnix | _creationTimeUnix |
| lastUpdateTimeUnix | _lastUpdateTimeUnix |

import weaviate
import weaviate.classes as wvc
from weaviate.collections.classes.grpc import Sort

# Connect to a local Weaviate instance.
client = weaviate.connect_to_local()

try:
    questions = client.collections.get("JeopardyQuestion")
    response = questions.query.fetch_objects(
        # Request the creation time so it can be printed below.
        return_metadata=wvc.query.MetadataQuery(creation_time=True),
        # Metadata sort properties take a leading underscore.
        sort=Sort.by_property(name="_creationTimeUnix", ascending=True),
        limit=3
    )

    for o in response.objects:
        print(f"Answer: {o.properties['answer']}")
        print(f"Points: {o.properties['points']}")
        print(f"Question: {o.properties['question']}")
        print(f"Creation time: {o.metadata.creation_time}")

finally:
    client.close()

Python client v4 property names

| Property Name | Sort Property Name |
| --- | --- |
| uuid | _id |
| creation_time | _creationTimeUnix |
| last_update_time | _lastUpdateTimeUnix |

Grouping

You can use a group to combine similar concepts (also known as entity merging). There are two ways of grouping semantically similar objects together: closest and merge. To return the closest concept, set type: closest. To combine similar entities into a single string, set type: merge.

Variables

| Variable | Required | Type | Description |
| --- | --- | --- | --- |
| type | yes | string | Either closest or merge. |
| force | yes | float | The force to apply for a particular movement. Must be between 0 and 1. 0 is no movement. 1 is maximum movement. |

Example

import weaviate

# Connect to a local Weaviate instance.
client = weaviate.connect_to_local()

try:
    # Grouping is expressed here as a raw GraphQL query.
    response = client.graphql_raw_query(
        """
        {
          Get {
            Publication(
              group: {
                type: merge,
                force: 0.05
              }
            ) {
              name
            }
          }
        }
        """
    )

    for publication in response.get["Publication"]:
        print(publication)

finally:
    client.close()

The query merges the results for International New York Times, The New York Times Company and New York Times.

The central concept in the group, The New York Times Company, leads the group. Related values follow in parentheses.

Expected response

{
  "data": {
    "Get": {
      "Publication": [
        {
          "name": "Fox News"
        },
        {
          "name": "Wired"
        },
        {
          "name": "The New York Times Company (New York Times, International New York Times)"
        },
        {
          "name": "Game Informer"
        },
        {
          "name": "New Yorker"
        },
        {
          "name": "Wall Street Journal"
        },
        {
          "name": "Vogue"
        },
        {
          "name": "The Economist"
        },
        {
          "name": "Financial Times"
        },
        {
          "name": "The Guardian"
        },
        {
          "name": "CNN"
        }
      ]
    }
  }
}