Filters

Overview


Available operators

So far, you've seen query functions such as Get and Aggregate, and search operators such as nearVector, nearObject, and nearText.

Now, let's take a look at filters.

A filter is a way to specify additional criteria to be applied to the results. There are a number of available filters in Weaviate.

Available filters

Weaviate offers many filters, and we do not need to cover them all right now. For now, let's explore a few of the most commonly used ones:

  • where: Apply a Boolean condition to filter the available data.
  • limit: Restrict the maximum number of objects to be retrieved.
  • offset: For pagination of search results.

Filter data with where

The where filter is analogous to the WHERE clause in a SQL query. As in SQL, the where filter applies a Boolean condition to the data.

Single operand example

We ran an example query like this earlier:

response = client.query.get(
    "JeopardyQuestion",
    ["question", "answer"]
).with_limit(2).with_near_text(
    {"concepts": "Intergalactic travel"}
).with_additional(
    ["distance", "id"]
).do()

print(json.dumps(response, indent=2))

Which returned these answers:

{
"data": {
"Get": {
"JeopardyQuestion": [
{
"_additional": {
"distance": 0.1791926,
"id": "b1645a32-0c22-5814-8f35-58f142eadf7e"
},
"answer": "escaping the Earth's gravity (and go off into outer space, on your way to the moon, for instance)",
"question": "It takes approximately 24,840 MPH to achieve this"
},
{
"_additional": {
"distance": 0.18123823,
"id": "ef263438-b152-5540-97f7-99f4076bd124"
},
"answer": "the Milky Way",
"question": "This is the name of our own galaxy"
}
]
}
}
}

Now let's extend the query to include a where argument that uses the Like operator.

response = client.query.get(
    "JeopardyQuestion",
    ["question", "answer"]
).with_limit(2).with_near_text(
    {"concepts": "Intergalactic travel"}
).with_additional(
    ["distance", "id"]
).with_where({
    "path": ["question"],
    "operator": "Like",
    "valueText": "*rocket*"
}).do()

print(json.dumps(response, indent=2))

Can you guess how the earlier response will change, if at all?

Here is the actual response:

See the JSON response
{
"data": {
"Get": {
"JeopardyQuestion": [
{
"_additional": {
"distance": 0.18400955,
"id": "ddcc3f06-5410-5944-85c4-3cb56ab27088"
},
"answer": "space shuttles",
"question": "These transports, first sent up in 1981, lift off like a rocket & land like a plane"
},
{
"_additional": {
"distance": 0.2267003,
"id": "36ffe6ca-9b73-5a54-80eb-a93f01822956"
},
"answer": "Robert Goddard",
"question": "He's been called the \"Father of Modern Rocketry\""
}
]
}
}
}
Explain this query

Observe that the results have changed. The previous results have been removed as they do not contain the text rocket in the question property.

Combining a vector search with a filter is a powerful way to find objects that are similar to a given input and also meet additional criteria. Filtering may remove some objects that would otherwise be "closer" to the query vector than the remaining ones, but it helps surface the most relevant objects by removing false positives.
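To make that trade-off concrete, here is a minimal in-memory sketch. It is not how Weaviate implements filtering; it only illustrates the effect of keeping the candidates that pass a filter and then returning the closest ones. The `filtered_search` and `mentions_rocket` helpers are hypothetical, and the distances and question texts are copied from the responses above.

```python
def filtered_search(candidates, predicate, limit):
    """Keep candidates that satisfy `predicate`, then return the `limit` closest."""
    kept = [c for c in candidates if predicate(c)]
    return sorted(kept, key=lambda c: c["distance"])[:limit]


# Distances and questions copied from the responses shown above.
candidates = [
    {"distance": 0.1791926,
     "question": "It takes approximately 24,840 MPH to achieve this"},
    {"distance": 0.18123823,
     "question": "This is the name of our own galaxy"},
    {"distance": 0.18400955,
     "question": "These transports, first sent up in 1981, lift off like a rocket & land like a plane"},
    {"distance": 0.2267003,
     "question": "He's been called the \"Father of Modern Rocketry\""},
]


def mentions_rocket(obj):
    # Rough stand-in for Like "*rocket*": a case-insensitive substring match.
    return "rocket" in obj["question"].lower()


unfiltered = filtered_search(candidates, lambda obj: True, limit=2)
filtered = filtered_search(candidates, mentions_rocket, limit=2)

print([c["distance"] for c in unfiltered])
print([c["distance"] for c in filtered])
```

The two closest objects drop out of the filtered list because their questions do not mention a rocket, so the next-closest matching objects take their place.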

We can apply the query to filter the data in any number of ways. For example, consider this query:

response = client.query.get(
    "JeopardyQuestion",
    ["question", "answer", "points"]
).with_limit(2).with_near_text(
    {"concepts": "Intergalactic travel"}
).with_additional(
    ["distance", "id"]
).with_where({
    "path": ["points"],
    "operator": "GreaterThan",
    "valueInt": 200
}).do()

print(json.dumps(response, indent=2))

How do you expect the results of this query to differ from those of the earlier queries?

See the JSON response
{
"data": {
"Get": {
"JeopardyQuestion": [
{
"_additional": {
"distance": 0.18251508,
"id": "15f06117-012c-506d-b5c5-24df2e750f35"
},
"answer": "the Milky Way",
"points": 400,
"question": "Into the 20th Century it was thought the universe was one big galaxy--this one"
},
{
"_additional": {
"distance": 0.19289112,
"id": "584a6c68-0ebe-561f-b32a-3a735eadf02e"
},
"answer": "Asteroid",
"points": 400,
"question": "A 1991 photo of Gaspra taken by the Galileo probe was the first close-up of one of these minor planets"
}
]
}
}
}
Explain this query

This query has been modified to only return JeopardyQuestion objects with a points value greater than 200.

Accordingly, the returned data set is very different.
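Note that each where condition pairs the operator with a value key that matches the property's type: valueText for the question text and valueInt for points in the queries above. As a sketch, a small helper could pick the key from the Python type of the value. The `where_condition` function is hypothetical and not part of the Weaviate client. The valueNumber and valueBoolean keys do not appear in this unit and are included on the assumption that the target properties are number and boolean types. Dates and other types are not handled.

```python
def where_condition(path, operator, value):
    """Build a single-operand where filter as a plain dict."""
    # bool must be checked before int, because bool is a subclass of int in Python.
    if isinstance(value, bool):
        value_key = "valueBoolean"
    elif isinstance(value, int):
        value_key = "valueInt"
    elif isinstance(value, float):
        value_key = "valueNumber"
    elif isinstance(value, str):
        value_key = "valueText"
    else:
        raise TypeError(f"unsupported filter value type: {type(value).__name__}")
    return {"path": list(path), "operator": operator, value_key: value}


points_filter = where_condition(["points"], "GreaterThan", 200)
text_filter = where_condition(["question"], "Like", "*rocket*")

print(points_filter)
print(text_filter)
```

The resulting dictionaries have the same shape as the ones passed to .with_where in the queries above.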

Exercise

Try filtering for JeopardyQuestion objects with:

  • a points value equal to 200
  • a points value greater than or equal to 600

You can find the list of available operators on this page.

Multiple operands example

The query syntax can extend beyond a single operand to combine multiple conditions:

response = client.query.get(
    "JeopardyQuestion",
    ["question", "answer", "points"]
).with_limit(2).with_near_text(
    {"concepts": "Intergalactic travel"}
).with_additional(
    ["distance", "id"]
).with_where({
    "operator": "And",
    "operands": [
        {
            "path": ["question"],
            "operator": "Like",
            "valueText": "*rocket*"
        },
        {
            "path": ["points"],
            "operator": "GreaterThan",
            "valueInt": 200
        }
    ]
}).do()

print(json.dumps(response, indent=2))

Take a look at the where argument (i.e. .with_where). What limitations do you expect in the results?

See the JSON response
{
"data": {
"Get": {
"JeopardyQuestion": [
{
"_additional": {
"distance": 0.2267003,
"id": "a488fbe5-c2c6-50ad-8938-4b9f20dc56d1"
},
"answer": "Robert Goddard",
"points": 400,
"question": "He's been called the \"Father of Modern Rocketry\""
},
{
"_additional": {
"distance": 0.24946856,
"id": "c00decd4-4cf1-5b03-a789-a57077e082fb"
},
"answer": "Huntsville",
"points": 1000,
"question": "A campus for the University of Alabama is here, nicknamed \"Rocket City, U.S.A.\""
}
]
}
}
}
Explain this query

This query has been modified to only return JeopardyQuestion objects that have a points value greater than 200 AND include the text rocket in the question field.
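One way to reason about nested operands is to evaluate the filter by hand against a few objects. The sketch below is an illustration only, not Weaviate's implementation: the `matches` function is hypothetical, it covers just the operators used in this unit plus Or, and it approximates Like with a case-insensitive glob match, which may differ from Weaviate's own matching. The first two sample objects come from the responses above; the third is made up for the example.

```python
from fnmatch import fnmatchcase


def matches(where, obj):
    """Evaluate a where filter dict against a plain Python dict."""
    operator = where["operator"]
    if operator == "And":
        return all(matches(operand, obj) for operand in where["operands"])
    if operator == "Or":
        return any(matches(operand, obj) for operand in where["operands"])

    value = obj[where["path"][0]]
    if operator == "Like":
        # Approximation: case-insensitive glob match on the whole property value.
        return fnmatchcase(str(value).lower(), where["valueText"].lower())
    if operator == "GreaterThan":
        return value > where["valueInt"]
    raise ValueError(f"operator not covered by this sketch: {operator}")


where_filter = {
    "operator": "And",
    "operands": [
        {"path": ["question"], "operator": "Like", "valueText": "*rocket*"},
        {"path": ["points"], "operator": "GreaterThan", "valueInt": 200},
    ],
}

goddard = {"question": "He's been called the \"Father of Modern Rocketry\"", "points": 400}
milky_way = {"question": "Into the 20th Century it was thought the universe was one big galaxy--this one", "points": 400}
made_up = {"question": "A made-up clue that mentions a rocket", "points": 100}  # hypothetical object

for obj in (goddard, milky_way, made_up):
    print(matches(where_filter, obj))
```

Only the first object satisfies both operands. The second fails the Like condition and the third fails the points condition, so And rejects both.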

You can also apply these filters to an Aggregate query. Try it yourself.

Exercise

Try these:

  • adding a where filter to an Aggregate query, following the above pattern.

Result pagination with offset

When you query for data, you can use the offset operator to skip a number of results. This is useful for pagination, where you want to show a certain number of results per page.

The offset operator works in conjunction with the existing limit operator to show results offset+1 through offset+limit.

For example, to list the first ten results, set limit: 10. Then, to display the second page of ten results, set offset: 10, limit: 10, and so on.

The syntax using offset is as follows:

response = client.query.get(
    "JeopardyQuestion",
    ["question", "answer"]
).with_limit(2).with_offset(2).with_near_text(
    {"concepts": "Intergalactic travel"}
).do()

print(json.dumps(response, indent=2))

See the JSON response
{
"data": {
"Get": {
"JeopardyQuestion": [
{
"answer": "the Milky Way",
"question": "Into the 20th Century it was thought the universe was one big galaxy--this one"
},
{
"answer": "space shuttles",
"question": "These transports, first sent up in 1981, lift off like a rocket & land like a plane"
}
]
}
}
}
Explain this query

This query retrieves the next 2 results (limit: 2) after the first 2 results (offset: 2).

We can confirm this by comparing the results of two queries with different result limits. The query below retrieves the top 4 results. The last two results from that query are the same as the results of the query above, which uses limit together with offset.

{
  Get {
    JeopardyQuestion(limit: 4) {
      answer
      question
    }
  }
}

response = client.query.get(
    "JeopardyQuestion",
    ["question", "answer"]
).with_limit(4).with_near_text(
    {"concepts": "Intergalactic travel"}
).do()

print(json.dumps(response, indent=2))
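The same relationship can be checked without a running instance. In this sketch, a plain Python list stands in for the ranked search results, and the hypothetical `paginate` helper mimics limit and offset with a slice. The question texts are taken from the responses above, in the order those responses imply. This illustrates the semantics only, not how Weaviate computes results.

```python
def paginate(ranked, limit, offset=0):
    """Return `limit` items after skipping the first `offset` items."""
    return ranked[offset:offset + limit]


# Stand-in for the ranked results of the nearText search.
ranked_questions = [
    "It takes approximately 24,840 MPH to achieve this",
    "This is the name of our own galaxy",
    "Into the 20th Century it was thought the universe was one big galaxy--this one",
    "These transports, first sent up in 1981, lift off like a rocket & land like a plane",
]

top_four = paginate(ranked_questions, limit=4)
second_page = paginate(ranked_questions, limit=2, offset=2)

# The page fetched with offset 2 equals the last two of the top four.
print(second_page == top_four[-2:])
```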
Tip

The nth page, counting from 1, has offset: (n-1)*m, limit: m, where m is the number of results per page.
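As a quick sketch of this arithmetic, the hypothetical `page_params` helper below computes limit and offset for a page number. It assumes pages are counted from 1, so that the second page of 10 results gets offset 10, matching the example earlier in this section.

```python
def page_params(page, per_page):
    """Return limit and offset for a 1-indexed page number."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive integers")
    return {"limit": per_page, "offset": (page - 1) * per_page}


print(page_params(1, 10))
print(page_params(2, 10))
```

The two values could then be passed to .with_limit and .with_offset, as in the query above.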

The offset operator is available with all vector search functions including Get and Aggregate.

Review

  Question
Which filter is used to apply a boolean condition to the data in Weaviate?
  Question
How can you combine the offset and limit operators to display the second page of results with 10 results per page?

Key takeaways

  • Filters are used to apply additional criteria to the results. Some commonly used filters include where, limit and offset.
  • The where filter allows you to apply a Boolean condition to the data being queried. It can be used with various operators such as Like, GreaterThan, and Equal.
  • You can use multiple conditions within a where filter to further refine your query results.
  • The offset operator can be used in conjunction with limit to skip results and build pagination.
