Filters
Overview
Available operators
So far, you've seen different query functions such as Get
, and Aggregate
, and search operators such as nearVector
, nearObject
and nearText
.
Now, let's take a look at filters.
A filter is a way to specify additional criteria to be applied to the results. There are a number of available filters in Weaviate.
Available filters
There exist many available filters, but we do not need to cover them all at this moment. For now, let's explore a few of the most commonly used filters:
where
: Apply a Boolean condition to filter the available data.limit
: Restrict the maximum objects to be retrieved.offset
: For pagination of search results.
Filter data with where
The where
filter is analogous to the WHERE
clause in a SQL query. As in the SQL clause, the where
filter can be used to apply a boolean conditional to the data.
Single operand example
We ran an example query like this earlier:
- Python
- GraphQL
response = client.query.get(
"JeopardyQuestion",
["question", "answer"]
).with_limit(2).with_near_text(
{"concepts": "Intergalactic travel"}
).with_additional(
["distance", "id"]
).do()
print(json.dumps(response, indent=2))
{
Get {
JeopardyQuestion (
limit: 2
nearText: {
concepts: ["Intergalactic travel"],
}
) {
question
answer
_additional {
distance
id
}
}
}
}
Which returned these answers:
{
"data": {
"Get": {
"JeopardyQuestion": [
{
"_additional": {
"distance": 0.1791926,
"id": "b1645a32-0c22-5814-8f35-58f142eadf7e"
},
"answer": "escaping the Earth's gravity (and go off into outer space, on your way to the moon, for instance)",
"question": "It takes approximately 24,840 MPH to achieve this"
},
{
"_additional": {
"distance": 0.18123823,
"id": "ef263438-b152-5540-97f7-99f4076bd124"
},
"answer": "the Milky Way",
"question": "This is the name of our own galaxy"
}
]
}
}
}
So let's extend our query to now include a where
argument that uses a Like
operator.
- Python
- GraphQL
response = client.query.get(
"JeopardyQuestion",
["question", "answer"]
).with_limit(2).with_near_text(
{"concepts": "Intergalactic travel"}
).with_additional(
["distance", "id"]
).with_where({
"path": ["question"],
"operator": "Like",
"valueText": "*rocket*"
}).do()
print(json.dumps(response, indent=2))
{
Get {
JeopardyQuestion (
limit: 2
nearText: {
concepts: ["Intergalactic travel"],
}
where: {
path: ["question"],
operator: Like,
valueText: "*rocket*"
}
) {
question
answer
_additional {
distance
id
}
}
}
}
Can you guess how you would expect the earlier response to change, if at all?
Here is the actual response:
See the JSON response
{
"data": {
"Get": {
"JeopardyQuestion": [
{
"_additional": {
"distance": 0.18400955,
"id": "ddcc3f06-5410-5944-85c4-3cb56ab27088"
},
"answer": "space shuttles",
"question": "These transports, first sent up in 1981, lift off like a rocket & land like a plane"
},
{
"_additional": {
"distance": 0.2267003,
"id": "36ffe6ca-9b73-5a54-80eb-a93f01822956"
},
"answer": "Robert Goddard",
"question": "He's been called the \"Father of Modern Rocketry\""
}
]
}
}
}
Explain this query
Observe that the results have changed. The previous results have been removed as they do not contain the text rocket
in the question
property.
This approach of combining a vector search with a filter is a powerful way to find objects that are similar to a given input, but also meet additional criteria as you see. And while filtering may remove some objects which might otherwise be "closer" to the query vector than the remaining ones, it provides a powerful strategy to find the most relevant objects by removing false positive.
We can apply the query to filter the data in any number of ways. For example, consider this query:
- Python
- GraphQL
response = client.query.get(
"JeopardyQuestion",
["question", "answer", "points"]
).with_limit(2).with_near_text(
{"concepts": "Intergalactic travel"}
).with_additional(
["distance", "id"]
).with_where({
"path": ["points"],
"operator": "GreaterThan",
"valueInt": 200
}).do()
print(json.dumps(response, indent=2))
{
Get {
JeopardyQuestion (
limit: 2
nearText: {
concepts: ["Intergalactic travel"],
}
where: {
path: ["points"],
operator: GreaterThan,
valueInt: 200
}
) {
question
answer
points
_additional {
distance
id
}
}
}
}
How do you expect that this query will be different to the earlier queries?
See the JSON response
{
"data": {
"Get": {
"JeopardyQuestion": [
{
"_additional": {
"distance": 0.18251508,
"id": "15f06117-012c-506d-b5c5-24df2e750f35"
},
"answer": "the Milky Way",
"points": 400,
"question": "Into the 20th Century it was thought the universe was one big galaxy--this one"
},
{
"_additional": {
"distance": 0.19289112,
"id": "584a6c68-0ebe-561f-b32a-3a735eadf02e"
},
"answer": "Asteroid",
"points": 400,
"question": "A 1991 photo of Gaspra taken by the Galileo probe was the first close-up of one of these minor planets"
}
]
}
}
}
Explain this query
This query has been modified to only return JeopardyQuestion
objects with a points
value greater than 200.
Accordingly, the returned data set is very different.
Try filtering for JeopardyQuestion
objects with:
- a
points
value equal to 200 - a
points
value greater than or equal to 600
You can find the list of available operators on this page.
Multiple operands example
The query syntax can extend to beyond a single operand to take advantage of multiple conditions:
- Python
- GraphQL
response = client.query.get(
"JeopardyQuestion",
["question", "answer", "points"]
).with_limit(2).with_near_text(
{"concepts": "Intergalactic travel"}
).with_additional(
["distance", "id"]
).with_where({
"operator": "And",
"operands": [
{
"path": ["question"],
"operator": "Like",
"valueText": "*rocket*"
},
{
"path": ["points"],
"operator": "GreaterThan",
"valueInt": 200
}
]
}).do()
print(json.dumps(response, indent=2))
{
Get {
JeopardyQuestion (
limit: 2
nearText: {
concepts: ["Intergalactic travel"],
}
where: {
operator: And,
operands: [
{
path: ["question"],
operator: Like,
valueText: "*rocket*"
}
{
path: ["points"],
operator: GreaterThan,
valueInt: 400
},
]
}
) {
question
answer
points
_additional {
distance
id
}
}
}
}
Take a look at the where
argument (i.e. .with_where
). What limitations do you expect in the results?
See the JSON response
{
"data": {
"Get": {
"JeopardyQuestion": [
{
"_additional": {
"distance": 0.2267003,
"id": "a488fbe5-c2c6-50ad-8938-4b9f20dc56d1"
},
"answer": "Robert Goddard",
"points": 400,
"question": "He's been called the \"Father of Modern Rocketry\""
},
{
"_additional": {
"distance": 0.24946856,
"id": "c00decd4-4cf1-5b03-a789-a57077e082fb"
},
"answer": "Huntsville",
"points": 1000,
"question": "A campus for the University of Alabama is here, nicknamed \"Rocket City, U.S.A.\""
}
]
}
}
}
Explain this query
This query has been modified to only return JeopardyQuestion
objects with a points
value great than than 400, AND include the text rocket
in the question
field.
You can apply these filters to an Aggregate
query also. Try it yourself.
Try these:
- adding a
where
filter to anAggregation
query, following the above pattern.
Result pagination with offset
When you query for data, you can use the offset
operator to skip a number of results. This is useful for pagination, where you want to show a certain number of results per page.
The offset
operator works in conjunction with the existing limit
operator to shows results from the offset+1
to offset+1+limit
.
For example, to list the first ten results, set limit
: 10. Then, to "display the second page of 10", set offset
: 10, limit
:10 and so on.
The syntax, using offset
is as follows:
- Python
- GraphQL
response = client.query.get(
"JeopardyQuestion",
["question", "answer"]
).with_limit(2).with_offset(2).with_near_text(
{"concepts": "Intergalactic travel"}
).do()
print(json.dumps(response, indent=2))
{
Get {
JeopardyQuestion (
limit: 2,
offset: 2,
nearText: {
concepts: ["Intergalactic travel"],
}
) {
question
answer
}
}
}
See the JSON response
{
"data": {
"Get": {
"JeopardyQuestion": [
{
"answer": "the Milky Way",
"question": "Into the 20th Century it was thought the universe was one big galaxy--this one"
},
{
"answer": "space shuttles",
"question": "These transports, first sent up in 1981, lift off like a rocket & land like a plane"
}
]
}
}
}
Explain this query
This query retrieves the next 2 results (limit
: 2) after the first 2 results (offset
: 2).
We can confirm this by comparing the results of two queries with different result limits. The query below retrieves the top 4 results. The last two results from that query are the same as the result in the query that uses limit with pagination.
{
Get {
JeopardyQuestion(limit: 4) {
answer
question
}
}
}
- Python
- GraphQL
response = client.query.get(
"JeopardyQuestion",
["question", "answer"]
).with_limit(4).with_near_text(
{"concepts": "Intergalactic travel"}
).do()
print(json.dumps(response, indent=2))
{
Get {
JeopardyQuestion (
limit: 4
nearText: {
concepts: ["Intergalactic travel"],
}
) {
question
answer
}
}
}
So, the n
th page would have offset
: n*m
, limit
: m
, where m
is the number of results per page.
The offset
operator is available with all vector search functions including Get
and Aggregate
.
Review
Key takeaways
- Filters are used to apply additional criteria to the results. Some commonly used filters include
where
,limit
andoffset
. - The
where
filter allows you to apply a boolean condition to the data being queried. It can be used with various operators likeLike
,Greater
,Equal
, etc. - You can use multiple conditions within a
where
filter to further refine your query results. - The
offset
operator can be used in conjunction withlimit
to skip results and build pagination.
Questions and feedback
If you have any questions or feedback, let us know in the user forum.