Conditional filters

Overview

Conditional filters may be added to queries such as Object-level and Aggregate queries, as well as batch deletion. The operator used for filtering is also called a where filter.

A filter may consist of one or more conditions, which are combined using the And or Or operators. Each condition consists of a property path, an operator, and a value.

Single operand (condition)

Each set of algebraic conditions is called an "operand". For each operand, the required properties are:

  • The operator type,
  • The property path, and
  • The value as well as the value type.

For example, this filter only allows objects from the class Article with a wordCount that is GreaterThan 1000.

import weaviate
import weaviate.classes as wvc
import os

client = weaviate.connect_to_local()

try:
    collection = client.collections.get("Article")
    response = collection.query.fetch_objects(
        filters=wvc.query.Filter.by_property("wordCount").greater_than(1000),
        limit=5
    )

    for o in response.objects:
        print(o.properties)  # Inspect returned objects

finally:
    client.close()
Expected response
{
"data": {
"Get": {
"Article": [
{
"title": "Anywhere but Washington: an eye-opening journey in a deeply divided nation"
},
{
"title": "The world is still struggling to implement meaningful climate policy"
},
...
]
}
}
}

Filter structure

The where filter is an algebraic object, which takes the following arguments:

  • Operator (which takes one of the following values)
    • And
    • Or
    • Equal
    • NotEqual
    • GreaterThan
    • GreaterThanEqual
    • LessThan
    • LessThanEqual
    • Like
    • WithinGeoRange
    • IsNull
    • ContainsAny (*Only for array and text properties)
    • ContainsAll (*Only for array and text properties)
  • Path: Is a list of strings in XPath style, indicating the property name of the collection.
    • If the property is a cross-reference, the path should be followed as a list of strings. For an inPublication reference property that refers to the Publication collection, the path selector for name will be ["inPublication", "Publication", "name"].
  • valueType
    • valueInt: For int data type.
    • valueBoolean: For boolean data type.
    • valueString: For string data type (note: string has been deprecated).
    • valueText: For text, uuid, geoCoordinates, phoneNumber data types.
    • valueNumber: For number data type.
    • valueDate: For date (ISO 8601 timestamp, formatted as RFC3339) data type.

If the operator is And or Or, the operands are a list of where filters.
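
As a rough sketch of the structure described above, a where filter can be assembled as plain nested dictionaries, which is the shape the raw where argument takes. The helper names operand and combine are hypothetical and are not part of any Weaviate client; the two sets simply mirror the operator and valueType lists on this page.

```python
# Hypothetical helpers (not part of the Weaviate client) that assemble a
# where filter as plain dictionaries.

COMPARISON_OPERATORS = {
    "Equal", "NotEqual", "GreaterThan", "GreaterThanEqual", "LessThan",
    "LessThanEqual", "Like", "WithinGeoRange", "IsNull",
    "ContainsAny", "ContainsAll",
}
VALUE_TYPES = {
    "valueInt", "valueBoolean", "valueString", "valueText",
    "valueNumber", "valueDate",
}


def operand(path, operator, value_type, value):
    """Build a single condition: property path, operator, typed value."""
    if operator not in COMPARISON_OPERATORS:
        raise ValueError(f"unsupported operator: {operator}")
    if value_type not in VALUE_TYPES:
        raise ValueError(f"unsupported value type: {value_type}")
    return {"path": list(path), "operator": operator, value_type: value}


def combine(operator, *operands):
    """Combine conditions (or nested combinations) with And / Or."""
    if operator not in {"And", "Or"}:
        raise ValueError("operator must be 'And' or 'Or'")
    return {"operator": operator, "operands": list(operands)}


where_filter = combine(
    "And",
    operand(["wordCount"], "GreaterThan", "valueInt", 1000),
    # Cross-reference path: reference property, target collection, property.
    operand(["inPublication", "Publication", "name"], "Equal",
            "valueText", "New Yorker"),
)
print(where_filter)
```

Because combine accepts the output of another combine call as an operand, conditions can be nested to any depth.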

Example filter structure (GraphQL)
{
  Get {
    <Class>(where: {
      operator: <operator>,
      operands: [{
        path: [path],
        operator: <operator>,
        <valueType>: <value>
      }, {
        path: [<matchPath>],
        operator: <operator>,
        <valueType>: <value>
      }]
    }) {
      <propertyWithBeacon> {
        <property>
        ... on <ClassOfWhereBeaconGoesTo> {
          <propertyOfClass>
        }
      }
    }
  }
}
Example response
{
"data": {
"Get": {
"Article": [
{
"title": "Opinion | John Lennon Told Them ‘Girls Don't Play Guitar.' He Was So Wrong."
}
]
}
},
"errors": null
}
Not operator

Weaviate does not support an operator that inverts a filter (e.g. Not Like ... ). If you would like to see such an operator implemented, please let us know by upvoting the issue here.

Filter behaviors

Multi-word queries in Equal filters

The behavior for the Equal operator on multi-word textual properties in where filters depends on the tokenization of the property.

See the Schema property tokenization section for the difference between the available tokenization types.

Stopwords in text filters

Starting with v1.12.0 you can configure your own stopword lists for the inverted index.

Multiple operands

You can set multiple operands or nest conditions.

tip

You can filter datetimes similarly to numbers, with the valueDate given as string in RFC3339 format.

import weaviate
import weaviate.classes as wvc
import os

client = weaviate.connect_to_local()

try:
    collection = client.collections.get("Article")
    response = collection.query.fetch_objects(
        filters=(
            wvc.query.Filter.by_property("wordCount").greater_than(1000)
            & wvc.query.Filter.by_property("title").like("*economy*")
        ),
        limit=5
    )

    for o in response.objects:
        print(o.properties)  # Inspect returned objects

finally:
    client.close()
Expected response
{
"data": {
"Get": {
"Article": [
{
"title": "China\u2019s long-distance lorry drivers are unsung heroes of its economy"
},
{
"title": "\u2018It\u2019s as if there\u2019s no Covid\u2019: Nepal defies pandemic amid a broken economy"
},
{
"title": "A tax hike threatens the health of Japan\u2019s economy"
}
]
}
}
}

Filter operators

Like

The Like operator filters text data based on partial matches. It can be used with the following wildcard characters:

  • ? -> exactly one unknown character
    • car? matches cart, care, but not car
  • * -> zero, one or more unknown characters
    • car* matches car, care, carpet, etc
    • *car* matches car, healthcare, etc.
import weaviate
import weaviate.classes as wvc
import os

client = weaviate.connect_to_local()

try:
    # Query the Publication collection by name, matching the expected response
    collection = client.collections.get("Publication")
    response = collection.query.fetch_objects(
        filters=wvc.query.Filter.by_property("name").like("New *"),
        limit=5
    )

    for o in response.objects:
        print(o.properties)  # Inspect returned objects

finally:
    client.close()
Expected response
{
"data": {
"Get": {
"Publication": [
{
"name": "The New York Times Company"
},
{
"name": "International New York Times"
},
{
"name": "New York Times"
},
{
"name": "New Yorker"
}
]
}
}
}
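
The wildcard rules for Like can be illustrated with a small, self-contained sketch that translates a pattern into a regular expression and tests it against a single token. This only mirrors the semantics listed above; it is not how Weaviate evaluates the filter, and the function names like_to_regex and like_matches are hypothetical. Case sensitivity in Weaviate depends on the property's tokenization, whereas this sketch is always case-sensitive.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern into a regular expression."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")    # exactly one unknown character
        elif ch == "*":
            parts.append(".*")   # zero, one or more unknown characters
        else:
            parts.append(re.escape(ch))  # every other character is literal
    return re.compile("".join(parts), re.DOTALL)


def like_matches(pattern, value):
    """Return True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


for pattern, value in [
    ("car?", "cart"),
    ("car?", "car"),
    ("car*", "carpet"),
    ("*car*", "healthcare"),
]:
    print(pattern, value, like_matches(pattern, value))
```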

Performance of Like

Each Like filter iterates over the entire inverted index for that property. The search time will go up linearly with the dataset size, and may become slow for large datasets.

Wildcard literal matches with Like

Currently, the Like filter is not able to match wildcard characters (? and *) as literal characters. For example, it is currently not possible to only match the string car* and not car, care or carpet. This is a known limitation and may be addressed in future versions of Weaviate.

ContainsAny / ContainsAll

The ContainsAny and ContainsAll operators filter objects using values of an array as criteria.

Both operators expect an array of values and return objects that match based on the input values.

ContainsAny and ContainsAll notes:
  • The ContainsAny and ContainsAll operators treat texts as an array. The text is split into an array of tokens based on the chosen tokenization scheme, and the search is performed on that array.
  • When using ContainsAny or ContainsAll with the REST API for batch deletion, the text array must be specified with the valueTextArray argument. This differs from usage in search, where the valueText argument can be used.

ContainsAny

ContainsAny returns objects where at least one of the values from the input array is present.

Consider a dataset of Person, where each object represents a person with a languages_spoken property with a text datatype.

A ContainsAny query on a path of ["languages_spoken"] with a value of ["Chinese", "French", "English"] will return objects where at least one of those languages is present in the languages_spoken array.

ContainsAll

ContainsAll returns objects where all the values from the input array are present.

Using the same dataset of Person objects as above, a ContainsAll query on a path of ["languages_spoken"] with a value of ["Chinese", "French", "English"] will return objects where all three of those languages are present in the languages_spoken array.
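
The difference between the two operators can be sketched in plain Python. The sample data and the function names below are hypothetical and only illustrate the matching semantics; in the Python client used elsewhere on this page, the corresponding filter methods are contains_any and contains_all.

```python
def contains_any(stored, wanted):
    """True if at least one wanted value is present in the stored array."""
    return any(value in stored for value in wanted)


def contains_all(stored, wanted):
    """True if every wanted value is present in the stored array."""
    return all(value in stored for value in wanted)


# Hypothetical Person objects, keyed by name, with their languages_spoken.
people = {
    "Ana": ["Chinese", "Spanish"],
    "Ben": ["Chinese", "French", "English", "German"],
    "Cy": ["Dutch"],
}
query = ["Chinese", "French", "English"]

any_hits = [name for name, langs in people.items() if contains_any(langs, query)]
all_hits = [name for name, langs in people.items() if contains_all(langs, query)]

print(any_hits)  # people speaking at least one of the queried languages
print(all_hits)  # people speaking all three queried languages
```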

Filter performance

In some edge cases, filter performance may be slow due to a mismatch between the filter architecture and the data structure. For example, if a property has very large cardinality (i.e. a large number of unique values), its range-based filter performance may be slow.

If you are experiencing slow filter performance, we suggest further restricting your query by adding more conditions to the where operator, or adding a limit parameter to your query.

We are working on improving the performance of these filters in a future release. Please upvote this feature if this is important to you, so we can prioritize it accordingly.

Special cases

By id

You can filter objects by their unique id (UUID), giving the id as valueText.

import weaviate
import weaviate.classes as wvc
import os

client = weaviate.connect_to_local()

try:
    collection = client.collections.get("Article")
    response = collection.query.fetch_objects(
        filters=wvc.query.Filter.by_id().equal("00037775-1432-35e5-bc59-443baaef7d80")
    )

    for o in response.objects:
        print(o.properties)  # Inspect returned objects

finally:
    client.close()
Expected response
{
"data": {
"Get": {
"Article": [
{
"title": "Backs on the rack - Vast sums are wasted on treatments for back pain that make it worse"
}
]
}
}
}
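
When ids come from user input, it can help to validate them before building the filter. The sketch below uses Python's standard uuid module and then builds the raw operand; the helper name id_operand is hypothetical, and the ["id"] path is an assumption about the raw filter format that this page does not spell out.

```python
import uuid


def id_operand(object_id):
    """Validate a UUID string and build an Equal condition on the id."""
    canonical = str(uuid.UUID(object_id))  # raises ValueError if malformed
    # Assumption: the raw filter addresses the object id via the path ["id"].
    return {"path": ["id"], "operator": "Equal", "valueText": canonical}


print(id_operand("00037775-1432-35e5-bc59-443baaef7d80"))
```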

By timestamps

Filtering can be performed with internal timestamps as well, such as creationTimeUnix and lastUpdateTimeUnix. These values can be represented either as Unix epoch milliseconds, or as RFC3339 formatted datetimes. Note that epoch milliseconds should be passed in as a valueText, and an RFC3339 datetime should be a valueDate.

info

Filtering by timestamp requires the target class to be configured to index timestamps. See here for details.

import weaviate
import weaviate.classes as wvc
import os
from datetime import datetime

client = weaviate.connect_to_local()

try:
    collection = client.collections.get("Article")
    # %z parses the trailing "Z" as UTC, giving a timezone-aware datetime
    year2k = datetime.strptime("2000-01-01T00:00:00Z", "%Y-%m-%dT%H:%M:%S%z")

    response = collection.query.fetch_objects(
        filters=wvc.query.Filter.by_creation_time().greater_or_equal(year2k),
        return_metadata=wvc.query.MetadataQuery(creation_time=True),
        limit=2
    )

    for o in response.objects:
        print(o.properties)  # Inspect returned objects
        print(o.metadata)  # Inspect returned creation time

finally:
    client.close()
Expected response
{
"data": {
"Get": {
"Article": [
{
"title": "Army builds new body armor 14-times stronger in the face of enemy fire"
},
...
]
}
}
}
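
To make the two timestamp representations concrete, the following sketch converts one instant into both forms and places each in a raw filter operand. The path name "_creationTimeUnix" (with a leading underscore) is an assumption about the raw filter format; check it against your Weaviate version.

```python
from datetime import datetime, timezone

# One instant, expressed as a timezone-aware datetime.
instant = datetime(2000, 1, 1, tzinfo=timezone.utc)

# Unix epoch milliseconds, passed as a string in valueText.
epoch_ms = int(instant.timestamp() * 1000)

# RFC3339 datetime string, passed in valueDate.
rfc3339 = instant.isoformat()

# Assumption: "_creationTimeUnix" is the path used for the creation timestamp.
by_epoch = {
    "path": ["_creationTimeUnix"],
    "operator": "GreaterThanEqual",
    "valueText": str(epoch_ms),
}
by_date = {
    "path": ["_creationTimeUnix"],
    "operator": "GreaterThanEqual",
    "valueDate": rfc3339,
}

print(by_epoch)
print(by_date)
```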

By property length

Filtering can be performed with the length of properties.

The length of properties is calculated differently depending on the type:

  • array types: the number of entries in the array is used, where null (property not present) and empty arrays both have the length 0.
  • strings and texts: the number of characters (unicode characters such as 世 count as one character).
  • numbers, booleans, geo-coordinates, phone-numbers and data-blobs are not supported.

{
  Get {
    <Class>(
      where: {
        operator: <Operator>,
        valueInt: <value>,
        path: ["len(<property>)"]
      }
    )
  }
}

Supported operators are Equal, NotEqual, GreaterThan, GreaterThanEqual, LessThan and LessThanEqual. Values must be 0 or larger.

Note that the path value is a string, where the property name is wrapped in len(). For example, to filter for objects based on the length of the title property, you would use path: ["len(title)"].

To filter for Article class objects with title length greater than 10, you would use:

{
  Get {
    Article(
      where: {
        operator: GreaterThan,
        valueInt: 10,
        path: ["len(title)"]
      }
    )
  }
}

note

Filtering by property length requires the target class to be configured to index the length.
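
The length rules above can be mirrored in a few lines of Python, which may help when predicting which objects a length filter will match. This is an illustration of the stated rules only, not Weaviate's implementation; property_length and length_path are hypothetical helper names.

```python
def property_length(value):
    """Length as described above: arrays by entries, text by characters."""
    if value is None:
        return 0  # a missing property counts as length 0
    if isinstance(value, (str, list)):
        return len(value)  # Python counts Unicode code points for str
    raise TypeError(f"length filtering is not supported for {type(value).__name__}")


def length_path(property_name):
    """Build the path for a length filter, e.g. ["len(title)"]."""
    return [f"len({property_name})"]


print(property_length(None), property_length([]), property_length("世界"))
print(length_path("title"))
```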

By cross-references

You can also filter on the value of a property of a cross-referenced object. Cross-references are also called beacons.

For example, this filter selects objects of the class Article whose inPublication reference points to a publication with New in its name, such as New Yorker.

import weaviate
import weaviate.classes as wvc
import os

client = weaviate.connect_to_local()

try:
    collection = client.collections.get("Article")

    response = collection.query.fetch_objects(
        filters=wvc.query.Filter.by_ref(link_on="inPublication").by_property("name").like("*New*"),
        return_references=wvc.query.QueryReference(link_on="inPublication", return_properties=["name"]),
        limit=2
    )

    for o in response.objects:
        print(o.properties)  # Inspect returned objects
        for ref_o in o.references["inPublication"].objects:
            print(ref_o.properties)

finally:
    client.close()
Expected response
{
"data": {
"Get": {
"Article": [
{
"inPublication": [
{
"name": "New Yorker"
}
],
"title": "The Hidden Costs of Automated Thinking"
},
{
"inPublication": [
{
"name": "New Yorker"
}
],
"title": "The Real Deal Behind the U.S.\u2013Iran Prisoner Swap"
},
...
]
}
}
}

By count of reference

The example above shows how filtering by reference can answer straightforward questions such as "Find all articles that are published by New Yorker". Questions such as "Find all articles that are written by authors that wrote at least two articles", however, cannot be answered by that query structure. It is possible to filter by reference count instead. To do so, provide one of the existing comparison operators (Equal, LessThan, LessThanEqual, GreaterThan, GreaterThanEqual) and apply it directly to the reference property. For example:

import weaviate
import weaviate.classes as wvc
import os

client = weaviate.connect_to_local()

try:
    # Author objects with at least two writesFor references,
    # matching the expected response
    collection = client.collections.get("Author")
    response = collection.query.fetch_objects(
        filters=wvc.query.Filter.by_ref_count(link_on="writesFor").greater_or_equal(2),
        return_references=wvc.query.QueryReference(link_on="writesFor", return_properties=["name"]),
        limit=2
    )

    for o in response.objects:
        print(o.properties)  # Inspect returned objects
        for ref_o in o.references["writesFor"].objects:
            print(ref_o.properties)

finally:
    client.close()
Expected response
{
"data": {
"Get": {
"Author": [
{
"name": "Agam Shah",
"writesFor": [
{
"name": "Wall Street Journal"
},
{
"name": "Wall Street Journal"
}
]
},
{
"name": "Costas Paris",
"writesFor": [
{
"name": "Wall Street Journal"
},
{
"name": "Wall Street Journal"
}
]
},
...
]
}
}
}
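
The idea of comparing a reference count against a value can be sketched on in-memory data. The objects and the helper filter_by_ref_count below are hypothetical; the sketch only shows how each comparison operator applies to the number of references an object holds.

```python
import operator

# Comparison operators that can be applied to a reference count.
COMPARATORS = {
    "Equal": operator.eq,
    "LessThan": operator.lt,
    "LessThanEqual": operator.le,
    "GreaterThan": operator.gt,
    "GreaterThanEqual": operator.ge,
}


def filter_by_ref_count(objects, link_on, op_name, value):
    """Keep objects whose number of references on link_on satisfies the operator."""
    compare = COMPARATORS[op_name]
    return [o for o in objects if compare(len(o.get(link_on) or []), value)]


# Hypothetical Author objects with their writesFor references.
authors = [
    {"name": "Author A", "writesFor": ["Publication X", "Publication X"]},
    {"name": "Author B", "writesFor": ["Publication Y"]},
    {"name": "Author C", "writesFor": []},
]

prolific = filter_by_ref_count(authors, "writesFor", "GreaterThanEqual", 2)
print([a["name"] for a in prolific])
```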

By geo coordinates

A special case of the where filter applies to geoCoordinates properties. This filter is only supported by the Get{} function. If a property has the geoCoordinates data type, you can search for objects within a given distance of a location.

For example, this query returns all publications within a radius of 2 km around a specific geo-location:

import weaviate
import weaviate.classes as wvc
import os

client = weaviate.connect_to_local()

try:
    publications = client.collections.get("Publication")
    response = publications.query.fetch_objects(
        filters=(
            wvc.query.Filter
            .by_property("headquartersGeoLocation")
            .within_geo_range(
                # Centre chosen near the coordinates in the expected response
                coordinate=wvc.data.GeoCoordinate(
                    latitude=51.51,
                    longitude=-0.09
                ),
                distance=2000  # In meters
            )
        ),
    )

    for o in response.objects:
        print(o.properties)  # Inspect returned objects

finally:
    client.close()
Expected response
{
"data": {
"Get": {
"Publication": [
{
"headquartersGeoLocation": {
"latitude": 51.512737,
"longitude": -0.0962234
},
"name": "Financial Times"
},
{
"headquartersGeoLocation": {
"latitude": 51.512737,
"longitude": -0.0962234
},
"name": "International New York Times"
}
]
}
}
}

Note that geoCoordinates uses a vector index under the hood.
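
To reason about which objects fall inside a given radius, a great-circle distance check can be sketched as follows. The haversine formula here is a stand-in chosen for illustration; this page does not describe how Weaviate computes the distance. The centre (51.51, -0.09) is an assumed value near the coordinates shown in the expected response, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_geo_range(point, center, max_distance_m):
    """True if point lies within max_distance_m of center."""
    return haversine_m(*center, *point) <= max_distance_m


center = (51.51, -0.09)                # assumed search centre
headquarters = (51.512737, -0.0962234)  # coordinates from the expected response
far_away = (48.8566, 2.3522)           # a point a few hundred km away

print(within_geo_range(headquarters, center, 2000))
print(within_geo_range(far_away, center, 2000))
```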

By null state

The IsNull operator lets you filter for objects where a given property is null or not null. Note that zero-length arrays and empty strings are equivalent to a null value.

{
  Get {
    <Class>(where: {
      operator: IsNull,
      valueBoolean: <true/false>,
      path: [<property>]
    })
  }
}

note

Filtering by null-state requires the target class to be configured to index this. See here for details.
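
The null-state rule stated above, where empty arrays and empty strings count as null, can be illustrated with a short sketch. The helper names and sample objects are hypothetical, and the code mirrors only the rule as written on this page.

```python
def is_null(value):
    """Null state as described above: missing, empty array, or empty string."""
    return value is None or value == [] or value == ""


def filter_is_null(objects, property_name, want_null):
    """Keep objects whose property null state equals want_null."""
    return [o for o in objects if is_null(o.get(property_name)) == want_null]


# Hypothetical objects with a 'summary' property in different states.
articles = [
    {"title": "A", "summary": "Some text"},
    {"title": "B", "summary": ""},
    {"title": "C"},
    {"title": "D", "summary": []},
]

print([a["title"] for a in filter_is_null(articles, "summary", True)])
print([a["title"] for a in filter_is_null(articles, "summary", False)])
```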