Aggregate data

Overview

This section shows how to retrieve aggregate data from a results set using the Aggregate function. Aggregate is largely similar to Get, except that it returns summary data about the results set rather than the individual objects in it.

Aggregate function requirements

To use Aggregate, you must specify at least:

  • The target class to search, and
  • One or more aggregated properties. The aggregated properties can include:
    • The meta property,
    • An object property, OR
    • The groupedBy property (if using groupBy).

You must then select at least one sub-property for each selected property.

See the Aggregate function syntax page for details.
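As a rough illustration of these requirements, the sketch below assembles the GraphQL text of a basic Aggregate query from a class name and a mapping of properties to sub-properties. The helper build_aggregate_query is hypothetical (it is not part of the Weaviate client) and it covers only the simplest query shape; it refuses any property that has no sub-property selected.

```python
def build_aggregate_query(class_name, properties):
    """Return GraphQL text for a basic Aggregate query.

    Hypothetical helper for illustration; not part of the Weaviate client.
    `properties` maps each aggregated property to its list of sub-properties.
    """
    if not properties:
        raise ValueError("Select at least one aggregated property")
    parts = []
    for prop, sub_props in properties.items():
        if not sub_props:
            raise ValueError(f"Select at least one sub-property for '{prop}'")
        parts.append(f"{prop} {{ {' '.join(sub_props)} }}")
    return f"{{ Aggregate {{ {class_name} {{ {' '.join(parts)} }} }} }}"


query = build_aggregate_query("JeopardyQuestion", {"meta": ["count"]})
print(query)
```

The printed text is the GraphQL counterpart of the meta count query built with the Python client in the next section.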

Retrieve a meta property

The meta property has only one sub-property (count) available. This returns the count of objects matched by the query.

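import json

# `client` is assumed to be an already-connected Weaviate Python client.
response = (
    client.query
    .aggregate("JeopardyQuestion")
    .with_meta_count()  # request meta { count }
    .do()
)

print(json.dumps(response, indent=2))

Example response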

The query should produce a response like the one below:

{
  "data": {
    "Aggregate": {
      "JeopardyQuestion": [
        {
          "meta": {
            "count": 10000
          }
        }
      ]
    }
  }
}
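To use the count in code, you can index into the response dictionary. The snippet below is a minimal sketch that assumes the response has exactly the shape shown above; the response is hard-coded here so that the snippet runs without a Weaviate instance.

```python
# Sample response copied from the example above.
response = {
    "data": {"Aggregate": {"JeopardyQuestion": [{"meta": {"count": 10000}}]}}
}

# In this example the class entry is a list holding a single result.
results = response["data"]["Aggregate"]["JeopardyQuestion"]
object_count = results[0]["meta"]["count"]
print(object_count)
```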

Retrieve aggregated object properties

You can retrieve aggregations of text, number, int, or boolean data types.

The available sub-properties vary by data type. The exceptions are type, which is available for all data types, and count, which is available for all data types except cross-references.

Example with text

The following example retrieves the count, the data type, and the most commonly occurring values of the answer property:

response = (
    client.query
    .aggregate("JeopardyQuestion")
    .with_fields("answer { count type topOccurrences { occurs value } }")
    .do()
)

print(json.dumps(response, indent=2))

Example response

The query should produce a response like the one below:

{
  "data": {
    "Aggregate": {
      "JeopardyQuestion": [
        {
          "answer": {
            "count": 10000,
            "topOccurrences": [
              {
                "occurs": 19,
                "value": "Australia"
              },
              {
                "occurs": 18,
                "value": "Hawaii"
              },
              {
                "occurs": 16,
                "value": "Boston"
              },
              {
                "occurs": 15,
                "value": "French"
              },
              {
                "occurs": 15,
                "value": "India"
              }
            ],
            "type": "text"
          }
        }
      ]
    }
  }
}
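The topOccurrences list can be post-processed like any other Python list. As a small sketch, the snippet below takes the aggregated values from the response above (hard-coded so it runs on its own) and works out what share of all counted values each top value represents.

```python
# Aggregated values copied from the example response above.
answer_stats = {
    "count": 10000,
    "topOccurrences": [
        {"occurs": 19, "value": "Australia"},
        {"occurs": 18, "value": "Hawaii"},
        {"occurs": 16, "value": "Boston"},
        {"occurs": 15, "value": "French"},
        {"occurs": 15, "value": "India"},
    ],
    "type": "text",
}

# Share of all counted values taken up by each top value.
shares = {
    item["value"]: item["occurs"] / answer_stats["count"]
    for item in answer_stats["topOccurrences"]
}

for value, share in shares.items():
    print(f"{value}: {share:.2%}")
```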

Example with int

The following example retrieves the count and the sum of the points property values:

response = (
    client.query
    .aggregate("JeopardyQuestion")
    .with_fields("points { count sum }")
    .do()
)

print(json.dumps(response, indent=2))

Example response

The query should produce a response like the one below:

{
  "data": {
    "Aggregate": {
      "JeopardyQuestion": [
        {
          "points": {
            "count": 10000,
            "sum": 6324100
          }
        }
      ]
    }
  }
}
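Aggregated values can be combined on the client side. For instance, dividing sum by count gives the average points value, assuming count reflects the number of values that were summed. The sketch below uses the figures from the response above.

```python
# Figures copied from the example response above.
points_stats = {"count": 10000, "sum": 6324100}

# Average points per question, derived client-side from sum and count.
mean_points = points_stats["sum"] / points_stats["count"]
print(f"{mean_points:.2f}")
```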

Retrieve groupedBy properties

You can use the groupBy variable to group the results set into subsets. Then, you can retrieve the grouped aggregate data for each group through the groupedBy properties.

For example, the following query lists all distinct values of the round property and the object count for each:

response = (
    client.query
    .aggregate("JeopardyQuestion")
    .with_group_by_filter(["round"])
    .with_fields("groupedBy { value }")
    .with_meta_count()
    .do()
)

print(json.dumps(response, indent=2))

Example response

The query should produce a response like the one below:

{
  "data": {
    "Aggregate": {
      "JeopardyQuestion": [
        {
          "groupedBy": {
            "value": "Double Jeopardy!"
          },
          "meta": {
            "count": 5193
          }
        },
        {
          "groupedBy": {
            "value": "Jeopardy!"
          },
          "meta": {
            "count": 4522
          }
        },
        {
          "groupedBy": {
            "value": "Final Jeopardy!"
          },
          "meta": {
            "count": 285
          }
        }
      ]
    }
  }
}
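With groupBy, the response contains one entry per group. As a minimal sketch, the snippet below turns the grouped entries from the response above (hard-coded here) into a dictionary keyed by group value, and checks that the group counts add up to the total object count seen earlier.

```python
# Grouped entries copied from the example response above.
groups = [
    {"groupedBy": {"value": "Double Jeopardy!"}, "meta": {"count": 5193}},
    {"groupedBy": {"value": "Jeopardy!"}, "meta": {"count": 4522}},
    {"groupedBy": {"value": "Final Jeopardy!"}, "meta": {"count": 285}},
]

# Map each distinct round value to its object count.
counts_by_round = {g["groupedBy"]["value"]: g["meta"]["count"] for g in groups}
total = sum(counts_by_round.values())

print(counts_by_round)
print(total)
```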

With nearXXX

When using a similarity search parameter (i.e. nearXXX) with Aggregate, you should include a way to limit the search results. This is because a vector search in itself does not exclude any objects from the results set.

Thus, for the vector search to affect the Aggregate output, you must set a limit on:

  • The number of results returned (with limit), or
  • How similar the results are to the query (with distance).
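The reasoning above can be illustrated with a toy model in plain Python. This is a conceptual sketch only, not how Weaviate runs the search, and the distances and points values are made up. Ranking by distance alone leaves the aggregate unchanged; only an object limit or a distance threshold changes the set being summed.

```python
# Toy data: (distance to the query vector, points). All values are made up.
objects = [(0.12, 400), (0.17, 1000), (0.25, 200), (0.40, 800), (0.61, 600)]

# A vector search on its own only orders objects by distance; none are excluded.
ranked = sorted(objects)

# Aggregating the ranked set gives the same sum as aggregating everything.
sum_all = sum(points for _, points in ranked)

# Object limit: keep only the 3 closest objects.
sum_limit = sum(points for _, points in ranked[:3])

# Maximum distance: keep only objects within 0.19 of the query.
sum_distance = sum(points for dist, points in ranked if dist <= 0.19)

print(sum_all, sum_limit, sum_distance)
```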

Set an object limit

You can set an object limit (with_object_limit in the Python client) to specify the maximum number of results to be aggregated.

The query below retrieves the 10 question objects whose vectors are closest to "animals in space", and returns the sum of their points property values.

response = (
    client.query
    .aggregate("JeopardyQuestion")
    .with_near_text({
        "concepts": ["animals in space"]
    })
    .with_object_limit(10)
    .with_fields("points { sum }")
    .do()
)

print(json.dumps(response, indent=2))

Example response

The query should produce a response like the one below:

{
  "data": {
    "Aggregate": {
      "JeopardyQuestion": [
        {
          "points": {
            "sum": 4600
          }
        }
      ]
    }
  }
}

Set a maximum distance

You can set the distance operator to specify the maximum dissimilarity (i.e. minimum similarity) of results to be aggregated.

The query below retrieves all question objects whose vectors are within a distance of 0.19 of "animals in space", and returns the sum of their points property values.

response = (
    client.query
    .aggregate("JeopardyQuestion")
    .with_near_text({
        "concepts": ["animals in space"],
        "distance": 0.19
    })
    .with_fields("points { sum }")
    .do()
)

print(json.dumps(response, indent=2))

Example response

The query should produce a response like the one below:

{
  "data": {
    "Aggregate": {
      "JeopardyQuestion": [
        {
          "points": {
            "sum": 3000
          }
        }
      ]
    }
  }
}

Add a conditional (where) filter

You can add a conditional filter to any aggregate search query, which will filter the results set.

The example below searches for objects where the round property equals Final Jeopardy! and returns the object count.

response = (
    client.query
    .aggregate("JeopardyQuestion")
    .with_where({
        "path": ["round"],
        "operator": "Equal",
        "valueText": "Final Jeopardy!"
    })
    .with_meta_count()
    .do()
)

print(json.dumps(response, indent=2))

Example response

The query should produce a response like the one below:

{
  "data": {
    "Aggregate": {
      "JeopardyQuestion": [
        {
          "meta": {
            "count": 285
          }
        }
      ]
    }
  }
}
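If you build several filters of the same shape, a small helper can keep them consistent. The sketch below is an assumption for illustration: equal_text_filter is a hypothetical function, not part of the Weaviate client, and it only produces the dictionary structure passed to with_where in the example above.

```python
def equal_text_filter(prop, value):
    """Build an Equal filter on a text property (hypothetical helper)."""
    return {
        "path": [prop],
        "operator": "Equal",
        "valueText": value,
    }


# One filter per round value seen in the groupedBy example.
rounds = ["Jeopardy!", "Double Jeopardy!", "Final Jeopardy!"]
filters = [equal_text_filter("round", r) for r in rounds]

print(filters[-1])
```

Each dictionary could then be passed to with_where in a separate Aggregate query.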

More Resources

If you can't find the answer to your question here, please look at the:

  1. Frequently Asked Questions,
  2. Knowledge base of old issues,
  3. Stack Overflow, for questions,
  4. Weaviate Community Forum, for more involved discussion, or
  5. Slack channel.