
Aggregate

Overview

This page covers aggregation queries, referred to collectively as Aggregate queries throughout.

An Aggregate query can aggregate over an entire collection, or the results of a search.

Parameters

An Aggregate query requires the target collection to be specified. Each query can include any of the following types of arguments:

| Argument | Description | Required |
| --- | --- | --- |
| Collection | Also called "class". The object collection to be retrieved from. | Yes |
| Properties | Properties to be retrieved | Yes |
| Conditional filters | Filter the objects to be retrieved | No |
| Search operators | Specify the search strategy (e.g. near text, hybrid, bm25) | No |
| Additional operators | Specify additional operators (e.g. limit, offset, sort) | No |
| Tenant name | Specify the tenant name | Yes, if multi-tenancy is enabled |
| Consistency level | Specify the consistency level | No |

Available properties

Each data type has its own set of available aggregated properties. The following table shows the available properties for each data type.

| Data type | Available properties |
| --- | --- |
| Text | count, type, topOccurrences (value, occurs) |
| Number | count, type, minimum, maximum, mean, median, mode, sum |
| Integer | count, type, minimum, maximum, mean, median, mode, sum |
| Boolean | count, type, totalTrue, totalFalse, percentageTrue, percentageFalse |
| Date | count, type, minimum, maximum, mean, median, mode |
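
To make the numeric properties concrete, here is a minimal sketch that computes the same named statistics over a small list using only the Python standard library. It illustrates what each property means and is not how Weaviate computes them internally; the `aggregate_numbers` helper and the sample values are hypothetical.

```python
from statistics import mean, median, mode


def aggregate_numbers(values):
    """Return the numeric aggregation properties for a list of numbers."""
    return {
        "count": len(values),
        "minimum": min(values),
        "maximum": max(values),
        "mean": mean(values),
        "median": median(values),
        "mode": mode(values),
        "sum": sum(values),
    }


# Hypothetical wordCount values for five objects (not real data).
word_counts = [120, 575, 575, 680, 900]
summary = aggregate_numbers(word_counts)
print(summary)
```

A Date property exposes the same properties except sum, and Text and Boolean properties have their own sets, as listed in the table.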
The GraphQL Aggregate format is as follows:
{
  Aggregate {
    <Class> (groupBy:[<property>]) {
      groupedBy { # requires `groupBy` filter
        path
        value
      }
      meta {
        count
      }
      <propertyOfDatatypeText> {
        count
        type
        topOccurrences (limit: <n_minimum_count>) {
          value
          occurs
        }
      }
      <propertyOfDatatypeNumberOrInteger> {
        count
        type
        minimum
        maximum
        mean
        median
        mode
        sum
      }
      <propertyOfDatatypeBoolean> {
        count
        type
        totalTrue
        totalFalse
        percentageTrue
        percentageFalse
      }
      <propertyWithReference> {
        pointingTo
        type
      }
    }
  }
}

Below is an example query to obtain meta information about the Article collection. Note that the data is not grouped here, and results relate to all data objects in the Article collection.

import weaviate
import weaviate.classes as wvc

# Connect to a locally running Weaviate instance
client = weaviate.connect_to_local()

try:
    collection = client.collections.get("Article")
    # Aggregate over every object in the collection
    response = collection.aggregate.over_all(
        total_count=True,
        return_metrics=wvc.query.Metrics("wordCount").integer(
            count=True,
            maximum=True,
            mean=True,
            median=True,
            minimum=True,
            mode=True,
            sum_=True,
        ),
    )

    print(response.total_count)
    print(response.properties)

finally:
    # Close the connection even if the query fails
    client.close()

The above query will result in something like the following:

{
  "data": {
    "Aggregate": {
      "Article": [
        {
          "inPublication": {
            "pointingTo": [
              "Publication"
            ],
            "type": "cref"
          },
          "meta": {
            "count": 4403
          },
          "wordCount": {
            "count": 4403,
            "maximum": 16852,
            "mean": 966.0113558937088,
            "median": 680,
            "minimum": 109,
            "mode": 575,
            "sum": 4253348,
            "type": "int"
          }
        }
      ]
    }
  }
}
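
The GraphQL response above is plain JSON, so its values can be read with ordinary dictionary access. The sketch below parses a trimmed copy of that response and checks that the reported mean matches sum divided by count. The `first_aggregate` helper is hypothetical and is not part of any Weaviate client.

```python
import json
import math

# Trimmed copy of the response shown above.
raw = """
{
  "data": {
    "Aggregate": {
      "Article": [
        {
          "meta": {"count": 4403},
          "wordCount": {
            "count": 4403,
            "mean": 966.0113558937088,
            "sum": 4253348,
            "type": "int"
          }
        }
      ]
    }
  }
}
"""


def first_aggregate(payload, collection):
    """Return the first (here: only) aggregation result for a collection."""
    return payload["data"]["Aggregate"][collection][0]


article = first_aggregate(json.loads(raw), "Article")
total = article["meta"]["count"]
word_count = article["wordCount"]

# The mean should agree with sum / count, up to floating-point rounding.
recomputed_mean = word_count["sum"] / word_count["count"]
print(total, word_count["mean"], recomputed_mean)
```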
meta { count } will return the query object count

As such, this Aggregate query will retrieve the total object count in a class.

import weaviate

# Connect to a locally running Weaviate instance
client = weaviate.connect_to_local()

try:
    collection = client.collections.get("Article")
    # Request only the total object count
    response = collection.aggregate.over_all(total_count=True)

    print(response.total_count)

finally:
    # Close the connection even if the query fails
    client.close()

groupBy argument

You can use a groupBy argument to get meta information about groups of data objects, from those matching a query. The groups can be based on a property of the data objects.

groupBy limitations
  • groupBy only works with near<Media> operators.
  • The groupBy path is limited to one property or cross-reference. Nested paths are not supported.

The groupBy argument is structured as follows for the Aggregate function:

{
  Aggregate {
    <Class> ( groupBy: ["<propertyName>"] ) {
      groupedBy {
        path
        value
      }
      meta {
        count
      }
      <propertyName> {
        count
      }
    }
  }
}

In the following example, the articles are grouped by the property inPublication, referring to the article's publisher.

import weaviate
import weaviate.classes as wvc
from weaviate.classes.aggregate import GroupByAggregate

# Connect to a locally running Weaviate instance
client = weaviate.connect_to_local()

try:
    collection = client.collections.get("Article")
    # Group articles by publication and compute the mean word count per group
    response = collection.aggregate.over_all(
        group_by=GroupByAggregate(prop="inPublication"),
        total_count=True,
        return_metrics=wvc.query.Metrics("wordCount").integer(mean=True)
    )

    # Each group carries its own count, metrics, and grouping value
    for g in response.groups:
        print(g.total_count)
        print(g.properties)
        print(g.grouped_by)

finally:
    # Close the connection even if the query fails
    client.close()
Expected response
{
  "data": {
    "Aggregate": {
      "Article": [
        {
          "groupedBy": {
            "path": [
              "inPublication"
            ],
            "value": "weaviate://localhost/Publication/16476dca-59ce-395e-b896-050080120cd4"
          },
          "meta": {
            "count": 829
          },
          "wordCount": {
            "mean": 604.6537997587454
          }
        },
        {
          "groupedBy": {
            "path": [
              "inPublication"
            ],
            "value": "weaviate://localhost/Publication/c9a0e53b-93fe-38df-a6ea-4c8ff4501783"
          },
          "meta": {
            "count": 618
          },
          "wordCount": {
            "mean": 917.1860841423949
          }
        },
        ...
      ]
    }
  }
}
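
Conceptually, grouping partitions the objects by the value of one property and then aggregates each partition separately. The following sketch mimics that behaviour in plain Python and produces entries shaped like the response above. The sample objects, the `pub-a` and `pub-b` identifiers, and the `group_and_aggregate` helper are all hypothetical, and this is an illustration rather than Weaviate's implementation.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical objects: a publication reference and a word count each.
articles = [
    {"inPublication": "pub-a", "wordCount": 500},
    {"inPublication": "pub-a", "wordCount": 700},
    {"inPublication": "pub-b", "wordCount": 900},
]


def group_and_aggregate(objects, group_prop, metric_prop):
    """Group objects by one property, then aggregate each group."""
    groups = defaultdict(list)
    for obj in objects:
        groups[obj[group_prop]].append(obj[metric_prop])

    # One result entry per distinct value of the grouping property.
    return [
        {
            "groupedBy": {"path": [group_prop], "value": value},
            "meta": {"count": len(metrics)},
            metric_prop: {"mean": mean(metrics)},
        }
        for value, metrics in groups.items()
    ]


grouped = group_and_aggregate(articles, "inPublication", "wordCount")
for entry in grouped:
    print(entry)
```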

Additional filters

Aggregate functions can be extended with conditional filters.

topOccurrences property

Aggregating data makes the topOccurrences property available. Note that the counts are not dependent on tokenization. The topOccurrences count is based on occurrences of the entire property, or one of the values if the property is an array.

You can optionally specify a limit parameter as a minimum count for the top occurrences. For example, limit: 5 will filter the top occurrences to those with a count of 5 or higher.
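
The whole-value counting described above can be sketched as follows. Each scalar value is counted as a single unit rather than being split into tokens, and each element of an array value is counted individually. The `top_occurrences` helper and the sample data are hypothetical; the sketch leaves out the limit parameter and does not reflect Weaviate's internal implementation.

```python
from collections import Counter


def top_occurrences(values):
    """Count whole property values; array values contribute each element."""
    counts = Counter()
    for value in values:
        if isinstance(value, list):
            counts.update(value)  # array property: count every element
        else:
            counts[value] += 1    # scalar property: count the entire value
    return [
        {"value": value, "occurs": occurs}
        for value, occurs in counts.most_common()
    ]


# Scalar text values: "New York Times" is one value, not three tokens.
titles = ["New York Times", "New York Times", "New York Post"]
title_counts = top_occurrences(titles)

# Array text values: each element is counted on its own.
tags = [["tech", "ai"], ["ai"]]
tag_counts = top_occurrences(tags)

print(title_counts)
print(tag_counts)
```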

Consistency levels

Not available with Aggregate

Aggregate queries are currently not available with different consistency levels.

Multi-tenancy

Added in v1.20

Where multi-tenancy is configured, the Aggregate function can be configured to aggregate results from a specific tenant.

You can do so by specifying the tenant parameter in the query as shown below, or in the client.

{
  Aggregate {
    Article (
      tenant: "tenantA"
    ) {
      meta {
        count
      }
    }
  }
}
See HOW-TO guide

For more information on using multi-tenancy, see the Multi-tenancy operations guide.

note

Combining a vector search with an aggregation, as described in the rest of this page, was added in v1.13.0.

You can combine a vector search (e.g. nearObject, nearVector, nearText, nearImage, etc.) with an aggregation. Internally, this is a two-step process where the vector search first finds the desired objects, then the results are aggregated.

Limiting the search space

Vector searches compare objects by similarity. Thus they do not exclude any objects.

As a result, for a search operator to have an impact on an aggregation, you must limit the search space with an objectLimit or certainty.

You can achieve such a restriction of the search space in two different ways:

  • objectLimit, e.g. objectLimit: 100 tells Weaviate to retrieve the top 100 objects related to a vector search query, then aggregate them. This is useful when you know up front how many results you want to serve, for example in a recommendation scenario where you want to produce 100 recommendations.

  • certainty, e.g. certainty: 0.7 tells Weaviate to retrieve all possible matches that have a certainty of 0.7 or higher. This list has no fixed length; it depends on how many objects were good matches. This is useful in user-facing search scenarios, such as e-commerce, where the user might want all search results semantically similar to "apple iphone" and facets generated from them.

If neither objectLimit nor certainty is set, the query returns an error.
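
The difference between the two restrictions can be illustrated with a small, self-contained sketch. It takes a ranked list of hypothetical search results, restricts it by either a fixed number of objects or a certainty threshold, and then aggregates what remains. The `restrict` and `aggregate` helpers and the sample results are assumptions made for illustration, not Weaviate's implementation.

```python
from statistics import mean

# Hypothetical vector search results, best match first.
results = [
    {"id": "a", "certainty": 0.95, "wordCount": 400},
    {"id": "b", "certainty": 0.82, "wordCount": 600},
    {"id": "c", "certainty": 0.71, "wordCount": 800},
    {"id": "d", "certainty": 0.55, "wordCount": 1000},
]


def restrict(ranked, object_limit=None, certainty=None):
    """Step 1: limit the search space by count and/or certainty."""
    if object_limit is None and certainty is None:
        raise ValueError("set object_limit or certainty")
    selected = ranked
    if certainty is not None:
        selected = [r for r in selected if r["certainty"] >= certainty]
    if object_limit is not None:
        selected = selected[:object_limit]
    return selected


def aggregate(selected):
    """Step 2: aggregate only the objects that survived step 1."""
    return {
        "count": len(selected),
        "mean": mean(r["wordCount"] for r in selected),
    }


by_limit = aggregate(restrict(results, object_limit=2))
by_certainty = aggregate(restrict(results, certainty=0.7))
print(by_limit)
print(by_certainty)
```

With a fixed limit the aggregation always covers the same number of objects, while a certainty threshold lets that number vary with how many good matches exist.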

Examples

Below are examples for nearObject, nearVector, and nearText. Any near<Media> will work.

nearObject

import weaviate
import weaviate.classes as wvc

# Connect to a locally running Weaviate instance
client = weaviate.connect_to_local()

try:
    collection = client.collections.get("Article")
    # Find objects near the given object, then aggregate the matches
    response = collection.aggregate.near_object(
        near_object="00037775-1432-35e5-bc59-443baaef7d80",
        distance=0.6,
        object_limit=200,
        total_count=True,
        return_metrics=[
            wvc.query.Metrics("wordCount").integer(
                count=True,
                maximum=True,
                mean=True,
                median=True,
                minimum=True,
                mode=True,
                sum_=True,
            ),
        ]
    )

    print(response.total_count)
    print(response.properties)

finally:
    # Close the connection even if the query fails
    client.close()

nearVector

Replace placeholder vector

To run this query, replace the placeholder vector with a real vector from the same vectorizer that was used to generate the object vectors.

import weaviate
import weaviate.classes as wvc

# Connect to a locally running Weaviate instance
client = weaviate.connect_to_local()

try:
    collection = client.collections.get("Article")
    # some_vector is a placeholder: supply a real vector before running
    response = collection.aggregate.near_vector(
        near_vector=some_vector,
        distance=0.7,
        object_limit=100,
        total_count=True,
        return_metrics=[
            wvc.query.Metrics("wordCount").integer(
                count=True,
                maximum=True,
                mean=True,
                median=True,
                minimum=True,
                mode=True,
                sum_=True,
            ),
        ]
    )

    print(response.total_count)
    print(response.properties)

finally:
    # Close the connection even if the query fails
    client.close()

nearText

note

For nearText to be available, a text2vec-* module must be installed with Weaviate.

import weaviate
import weaviate.classes as wvc

# Connect to a locally running Weaviate instance
client = weaviate.connect_to_local()

try:
    collection = client.collections.get("Article")
    # Find objects semantically close to the query text, then aggregate them
    response = collection.aggregate.near_text(
        query="apple iphone",
        distance=0.7,
        object_limit=200,
        total_count=True,
        return_metrics=[
            wvc.query.Metrics("wordCount").integer(
                count=True,
                maximum=True,
                mean=True,
                median=True,
                minimum=True,
                mode=True,
                sum_=True,
            ),
        ]
    )

    print(response.total_count)
    print(response.properties)

finally:
    # Close the connection even if the query fails
    client.close()
