
Aggregate{} the result set

Overview


Now that you have seen how to retrieve individual objects with Get, let's take a look at how to compile information with Aggregate.

Aggregate is a powerful function that combines information from multiple objects into a single result, giving you a quick overview of a set of results.

About Aggregate queries

Aggregate function syntax

While the overall structure of Aggregate queries is similar to that of Get queries, there are some important differences, because Aggregate queries operate on sets of results rather than on individual objects.

The basic syntax for Aggregate queries is as follows:

response = client.query.aggregate(
    <CLASS>,
).with_fields(
    <properties>
).<arguments>.do()
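The method chain above ultimately expresses a GraphQL Aggregate query. As a rough illustration of that shape, the sketch below uses a hypothetical helper, `build_aggregate_query`, to assemble such a query string from the same pieces (class, fields, and optional arguments). It is not the Weaviate client's own builder, and the string the client actually sends may be formatted differently.

```python
import json
from typing import Optional


def build_aggregate_query(
    class_name: str,
    fields: str,
    near_text: Optional[dict] = None,
    group_by: Optional[list] = None,
) -> str:
    """Hypothetical helper: assemble an Aggregate GraphQL query string.

    Illustrates the query shape only; this is not the Weaviate client's builder.
    """
    args = []
    if near_text is not None:
        # Render {"concepts": [...], "distance": 0.2} as GraphQL input fields.
        inner = ", ".join(f"{k}: {json.dumps(v)}" for k, v in near_text.items())
        args.append(f"nearText: {{{inner}}}")
    if group_by is not None:
        # groupBy takes a property path given as a list, e.g. ["round"].
        args.append(f"groupBy: {json.dumps(group_by)}")
    arg_str = f"({', '.join(args)})" if args else ""
    return f"{{ Aggregate {{ {class_name}{arg_str} {{ {fields} }} }} }}"


print(build_aggregate_query("JeopardyQuestion", "meta { count }"))
```

Arguments such as nearText or groupBy appear in parentheses after the class name, while the requested aggregations go in the selection set, mirroring how `.with_fields()` and the argument methods are separate links in the Python chain.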

Unlike in a Get query, the properties available in an Aggregate query depend on the data type of the property being queried.

These reflect the operations that can be performed on each data type. For example, the aggregations available for a text property differ from those for an integer property or a cross-reference.
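As a local analogy in plain Python (not Weaviate's implementation), the functions below compute the kind of summaries that make sense for each type: frequency rankings for text, numeric statistics for integers. The output keys loosely mirror names Weaviate uses (`count`, `topOccurrences`, `minimum`, `maximum`, `mean`, `median`, `sum`), but the documentation is the authoritative list per data type. The sample values are made up.

```python
from collections import Counter
from statistics import mean, median


def aggregate_text(values, top_n=5):
    """Text-style aggregation: a count plus the most frequent values."""
    top = Counter(values).most_common(top_n)
    return {
        "count": len(values),
        "topOccurrences": [{"value": v, "occurs": n} for v, n in top],
    }


def aggregate_int(values):
    """Integer-style aggregation: numeric summaries that make no sense for text."""
    return {
        "count": len(values),
        "minimum": min(values),
        "maximum": max(values),
        "mean": mean(values),
        "median": median(values),
        "sum": sum(values),
    }


# Illustrative values only, not taken from the real dataset.
answers = ["Hawaii", "Boston", "Hawaii", "India"]
points = [100, 200, 400, 200]

print(aggregate_text(answers, top_n=2))
print(aggregate_int(points))
```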

Let's try out some Aggregate queries.

As a reminder, our objects use the following schema:

{
  "classes": [
    {
      "class": "JeopardyQuestion",
      "properties": [
        {
          "dataType": ["text"],
          "name": "question",
          ... // Truncated
        },
        {
          "dataType": ["text"],
          "name": "answer",
          ... // Truncated
        },
        {
          "dataType": ["int"],
          "name": "points"
          ... // Truncated
        },
        ... // Truncated
      ],
      ... // Truncated
    }
  ]
}

Standalone Aggregate queries

Example 1

Take a look at this query:

import json

response = client.query.aggregate(
    "JeopardyQuestion",
).with_meta_count().do()

print(json.dumps(response, indent=2))

What kind of results do you expect to come back?

Now, try it out yourself.

Your query should return something like this:

{
  "data": {
    "Aggregate": {
      "JeopardyQuestion": [
        {
          "meta": {
            "count": 10000
          }
        }
      ]
    }
  }
}

Response object from Weaviate

This response includes an object representing the meta information requested from the JeopardyQuestion class. The meta object contains a count property, which is the total number of objects in the class.

Explain this query

This query aggregates the objects in the JeopardyQuestion class to obtain the total count. Since there are no restrictions, it returns the total number of objects, which is 10,000.

meta property

In the above Aggregate query, we requested the meta property to obtain the count of objects. Note that meta is not a property defined in the class itself. This is a key difference between Aggregate and Get queries.

A Get query retrieves a set of individual results, so we can select properties (e.g. id, or one of the properties unique to the data, such as answer) that apply to each of those individual results.

An Aggregate query, on the other hand, returns an aggregation of the results. Accordingly, we must specify a sub-property that applies to the entire set of results.

The meta property is one such property. It is available for all data types, and can be used with the count sub-property to return the number of retrieved objects.
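Because the count sits several levels deep in the response, it helps to know exactly where to read it. The sketch below pulls `meta.count` out of the example response shown earlier, reproduced as a Python dict. `meta_count` is a hypothetical helper, and it assumes an ungrouped query, where the class key maps to a single-element list.

```python
# The example response from Example 1, as a Python dict.
response = {
    "data": {
        "Aggregate": {
            "JeopardyQuestion": [{"meta": {"count": 10000}}]
        }
    }
}


def meta_count(response: dict, class_name: str) -> int:
    """Hypothetical helper: read meta.count from an ungrouped Aggregate response."""
    # Without groupBy, the class key maps to a list holding one aggregate.
    groups = response["data"]["Aggregate"][class_name]
    return groups[0]["meta"]["count"]


print(meta_count(response, "JeopardyQuestion"))
```

With groupBy, the same list holds one entry per group instead, so indexing `[0]` would only return the first group's count.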

Example 2

Take a look at this query:

response = client.query.aggregate(
    "JeopardyQuestion"
).with_fields("answer {count topOccurrences {value occurs}}").do()

print(json.dumps(response, indent=2))

What fields do you expect back in the results?

Now, try it out yourself.

Your query should return something like this:

{
  "data": {
    "Aggregate": {
      "JeopardyQuestion": [
        {
          "answer": {
            "count": 10000,
            "topOccurrences": [
              {
                "occurs": 19,
                "value": "Australia"
              },
              {
                "occurs": 18,
                "value": "Hawaii"
              },
              {
                "occurs": 16,
                "value": "Boston"
              },
              {
                "occurs": 15,
                "value": "French"
              },
              {
                "occurs": 15,
                "value": "India"
              }
            ]
          }
        }
      ]
    }
  }
}

Explain this query

This response includes an object representing the aggregations of the answer property requested from the JeopardyQuestion class. Because the property contains text, we can request topOccurrences, which lists the most frequent values. Each entry has a value sub-property (the answer text) and an occurs sub-property (the number of times that value occurs).
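To make value and occurs concrete, the sketch below reads the topOccurrences pairs out of a trimmed copy of the response above, then produces a comparable ranking locally with `collections.Counter` on a small made-up list. The local part is only an analogy for what the aggregation reports, not how Weaviate computes it.

```python
from collections import Counter

# Trimmed copy of the "answer" aggregation from the response above.
answer_agg = {
    "count": 10000,
    "topOccurrences": [
        {"occurs": 19, "value": "Australia"},
        {"occurs": 18, "value": "Hawaii"},
        {"occurs": 16, "value": "Boston"},
    ],
}

# Turn the server-side result into (value, occurs) pairs.
top = [(t["value"], t["occurs"]) for t in answer_agg["topOccurrences"]]
print(top)

# Local analogy: the same kind of ranking over a small, made-up list of answers.
sample_answers = ["Boston", "Hawaii", "Boston", "Australia", "Boston", "Hawaii"]
print(Counter(sample_answers).most_common(2))
```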

Available properties

The list of available properties can be found on this page in our documentation.

Aggregate with a search operator

As we did with Get queries, we can also use search operators such as nearText in an Aggregate query. Take a look:

Example (with nearText)

For example, let's say that instead of retrieving individual questions, we now want to know something more holistic about them, such as how many questions might be related to this query:

response = client.query.aggregate(
    "JeopardyQuestion",
).with_near_text(
    {"concepts": ["Intergalactic travel"], "distance": 0.2}
).with_meta_count().do()

print(json.dumps(response, indent=2))

Before looking at the response, or running the query, think about the following:

  • How many results do you expect to be returned?
  • Can you guess how an increase in the distance parameter would change the number of results returned?

Now, try it out yourself. The query should return something like this:

{
  "data": {
    "Aggregate": {
      "JeopardyQuestion": [
        {
          "meta": {
            "count": 9
          }
        }
      ]
    }
  }
}

Explain this query

This query aggregates only the results that fall within the distance threshold. The distance argument restricts the results to objects whose vectors lie within that distance of the query, which are the ones most relevant to the input. Without it, the search results could potentially include the entire class.

This is called "limiting the search space".

Limit search space

In order to produce meaningful aggregations with a vector search, you must limit the search space.

This is different from aggregations in, say, a relational database. There, data can be grouped and aggregated using GROUP BY with functions such as SUM, AVG, MIN, and MAX. A relational query first identifies an exact result set, which can then be aggregated.
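For comparison, here is the relational pattern in miniature, using Python's built-in `sqlite3` module: the WHERE clause pins down an exact result set, and GROUP BY then aggregates each group. The table, the column name `round_name`, and the rows are made up for illustration.

```python
import sqlite3

# An in-memory table standing in for a relational version of the data.
# The rows are illustrative, not the real JeopardyQuestion dataset.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE questions (round_name TEXT, points INTEGER)")
conn.executemany(
    "INSERT INTO questions VALUES (?, ?)",
    [
        ("Jeopardy!", 200),
        ("Jeopardy!", 400),
        ("Double Jeopardy!", 800),
        ("Final Jeopardy!", 0),
    ],
)

# WHERE defines an exact result set; GROUP BY then aggregates each group.
rows = conn.execute(
    "SELECT round_name, COUNT(*), AVG(points) FROM questions "
    "WHERE points > 0 GROUP BY round_name ORDER BY round_name"
).fetchall()
print(rows)
conn.close()
```

Every row either matches the WHERE clause or it does not, so the aggregation has a well-defined input.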

However, a vector search does not inherently exclude any results, because it ranks results by their degree of similarity rather than by whether they match a condition.

Accordingly, the search space must be limited so that only relevant results are included in the aggregation. This can be done by setting an explicit limit or a threshold (distance or certainty) in the query.
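The toy sketch below illustrates why a limit or threshold is needed. It computes a brute-force cosine distance (Weaviate's default metric, assumed here; a class can be configured to use another) between a query and a few made-up 2-D vectors. Every object gets some distance, so nothing drops out on its own. A distance threshold keeps only close objects, while an explicit limit keeps the N closest objects however far away they are.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1 - dot / norm


# Toy 2-D vectors; real embeddings have hundreds of dimensions.
objects = {
    "q1": [1.0, 0.0],
    "q2": [0.9, 0.1],
    "q3": [0.5, 0.5],
    "q4": [0.0, 1.0],
}
query = [1.0, 0.0]

# Every object has *some* distance, so nothing is excluded by default.
distances = {k: cosine_distance(v, query) for k, v in objects.items()}

# Option 1: a distance threshold keeps only sufficiently close objects.
within = [k for k, d in distances.items() if d <= 0.2]

# Option 2: an explicit limit keeps the N closest objects, however far they are.
closest = sorted(distances, key=distances.get)[:3]

print(len(distances), within, closest)
```

Here the threshold keeps q1 and q2, while a limit of 3 also pulls in q3 (distance of roughly 0.29), so the two approaches can yield different sets to aggregate.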

Aggregate with groupBy

So far, we have seen how to use Aggregate queries to compile information relating to one set of results. This can be extended with the groupBy argument to compile information from multiple subsets of results.

Example

For example, let's say we want to know how many questions there are for each value of the round property. We can do this by adding the groupBy argument to the query:

response = client.query.aggregate(
    "JeopardyQuestion",
).with_group_by_filter(
    ["round"]  # the property to group by, given as a list
).with_fields(
    "groupedBy {path value}"
).with_near_text(
    {"concepts": ["Intergalactic travel"], "distance": 0.2}
).with_meta_count().do()

print(json.dumps(response, indent=2))

What do you expect to see here? How will the results differ, now that we've added the groupBy argument? Do you notice what else has changed in the query?

Now, try it out yourself. The query should return something like this:

{
  "data": {
    "Aggregate": {
      "JeopardyQuestion": [
        {
          "groupedBy": {
            "path": [
              "round"
            ],
            "value": "Double Jeopardy!"
          },
          "meta": {
            "count": 5
          }
        },
        {
          "groupedBy": {
            "path": [
              "round"
            ],
            "value": "Jeopardy!"
          },
          "meta": {
            "count": 3
          }
        },
        {
          "groupedBy": {
            "path": [
              "round"
            ],
            "value": "Final Jeopardy!"
          },
          "meta": {
            "count": 1
          }
        }
      ]
    }
  }
}

Explain this query

This query supplies an additional groupBy argument, so the counts are given for each round. The query also requests the groupedBy property, so that each count can be identified by the value of its round group.
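Grouped results come back as a list with one entry per group, so a common next step is to key each count by its group value. The sketch below does that for the response shown above, reproduced as a Python dict; the variable names are illustrative.

```python
# The grouped response shown above, as a Python dict.
response = {
    "data": {
        "Aggregate": {
            "JeopardyQuestion": [
                {"groupedBy": {"path": ["round"], "value": "Double Jeopardy!"}, "meta": {"count": 5}},
                {"groupedBy": {"path": ["round"], "value": "Jeopardy!"}, "meta": {"count": 3}},
                {"groupedBy": {"path": ["round"], "value": "Final Jeopardy!"}, "meta": {"count": 1}},
            ]
        }
    }
}

# One entry per group: key each count by its groupedBy value.
counts = {
    g["groupedBy"]["value"]: g["meta"]["count"]
    for g in response["data"]["Aggregate"]["JeopardyQuestion"]
}
print(counts)
print(sum(counts.values()))
```

In this example the group counts add up to 9, the same total as the ungrouped nearText query, since each question belongs to exactly one round.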

groupBy + groupedBy

Results identified by an Aggregate query can be further grouped using the groupBy argument. This argument takes the path to a property, given as a list (here, ["round"]), and groups the results by the values of that property.

This is a particularly useful query pattern for identifying characteristics for subsets of results of a vector search.

When the groupBy argument is used, an additional groupedBy property becomes available. This property and its sub-properties (path and value) can be used to identify the group that each result belongs to.
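To see what groupBy does to the shape of the result, the sketch below groups a few made-up objects locally by one property and returns entries shaped like the grouped response above. `group_by_count` is a hypothetical helper for illustration; Weaviate performs this grouping on the server.

```python
from collections import defaultdict

# Illustrative objects standing in for the results of a limited vector search.
results = [
    {"round": "Jeopardy!", "points": 200},
    {"round": "Double Jeopardy!", "points": 800},
    {"round": "Jeopardy!", "points": 400},
]


def group_by_count(objs, prop):
    """Hypothetical helper: one entry per value, with groupedBy and meta.count."""
    groups = defaultdict(int)
    for obj in objs:
        groups[obj[prop]] += 1
    return [
        {"groupedBy": {"path": [prop], "value": value}, "meta": {"count": n}}
        for value, n in groups.items()
    ]


for entry in group_by_count(results, "round"):
    print(entry)
```

The groupedBy entry is what makes each count identifiable; without it, you would get a list of bare counts with no indication of which group each belongs to.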

Exercise

Try out the above query again, with these changes.

  • Instead of round, try grouping by the points property.
  • Instead of distance, try adding .with_object_limit(9) to the method chain. Are the results the same?

Review

Review exercise

Try out the above nearText query again, with these changes.

  • Change the distance to another value (say, 0.1, 0.19, 0.21, or 0.25). How do the results change? Are they in line with your expectations?

Key takeaways

  • The Aggregate function is used to compile information from multiple objects, providing an overview.
  • Search operators, like nearText, can be used in Aggregate queries.
    • To produce meaningful aggregations, the search space must be limited by setting an explicit limit or a threshold (distance or certainty) in the query.
  • The groupBy argument can be used to compile information from multiple subsets of results, refining the aggregation.
  • When using the groupBy argument, the additional groupedBy property is made available, helping to identify the group each result belongs to.

Questions and feedback

If you have any questions or feedback, let us know in the user forum.