Aggregate data
Overview
This section shows how to retrieve aggregate data from a results set using the Aggregate function. Aggregate is largely similar to Get, except that Aggregate returns summary data about the results set instead of the individual objects in it.
Aggregate function requirements
To use Aggregate, you must specify at least:
- The target class to search, and
- One or more aggregated properties. The aggregated properties can include:
  - The meta property,
  - An object property, OR
  - The groupedBy property (if using groupBy).
You must then select at least one sub-property for each selected property.
See the Aggregate function syntax page for details.
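The requirements above can be sketched as a small helper that assembles a GraphQL Aggregate query string and rejects a selection that lacks a sub-property. Note that build_aggregate_query is a hypothetical helper written for illustration; it is not part of any Weaviate client and only checks the rules listed in this section.

```python
def build_aggregate_query(class_name, properties):
    """Assemble a GraphQL Aggregate query string.

    `properties` maps a property name (for example "meta" or "points")
    to a list of sub-property names.
    Hypothetical helper, for illustration only.
    """
    if not class_name:
        raise ValueError("A target class is required")
    if not properties:
        raise ValueError("Select at least one aggregated property")

    selections = []
    for prop, sub_props in properties.items():
        # Every selected property needs at least one sub-property.
        if not sub_props:
            raise ValueError(f"Select at least one sub-property for '{prop}'")
        selections.append(f"{prop} {{ {' '.join(sub_props)} }}")

    return f"{{ Aggregate {{ {class_name} {{ {' '.join(selections)} }} }} }}"


query = build_aggregate_query("JeopardyQuestion", {"meta": ["count"]})
print(query)  # { Aggregate { JeopardyQuestion { meta { count } } } }
```

The resulting string has the same shape as the GraphQL examples in this section, collapsed onto one line.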
Retrieve a meta property
The meta property has only one sub-property (count) available. This returns the count of objects matched by the query.
Python:

import json

response = (
    client.query
    .aggregate("JeopardyQuestion")
    .with_meta_count()
    .do()
)
print(json.dumps(response, indent=2))

TypeScript:

result = await client
  .graphql
  .aggregate()
  .withClassName('JeopardyQuestion')
  .withFields('meta { count }')
  .do();
console.log(JSON.stringify(result, null, 2));

GraphQL:

{
  Aggregate {
    JeopardyQuestion {
      meta {
        count
      }
    }
  }
}
Example response
The query should produce a response like the one below:
{
  "data": {
    "Aggregate": {
      "JeopardyQuestion": [
        {
          "meta": {
            "count": 10000
          }
        }
      ]
    }
  }
}
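As a minimal sketch of how a response of this shape can be read, the snippet below pulls the count out of the nested structure. It works on a copy of the example response rather than on a live query, and get_meta_count is a hypothetical helper name used only for illustration.

```python
import json

# Copy of the example response above, parsed from JSON text.
raw = '{"data": {"Aggregate": {"JeopardyQuestion": [{"meta": {"count": 10000}}]}}}'
response = json.loads(raw)


def get_meta_count(response, class_name):
    """Return meta.count from an ungrouped Aggregate response."""
    groups = response["data"]["Aggregate"][class_name]
    # Without groupBy, the list holds a single entry for the whole results set.
    return groups[0]["meta"]["count"]


print(get_meta_count(response, "JeopardyQuestion"))  # 10000
```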
Retrieve aggregated object properties
You can retrieve aggregations of text, number, int, or boolean data types.
The available sub-properties vary for each data type, except for type, which is available to all, and count, which is available to all but cross-references.
Example with text
The following example retrieves information about the most commonly occurring values in the answer property:
Python:

import json

response = (
    client.query
    .aggregate("JeopardyQuestion")
    .with_fields("answer { count type topOccurrences { occurs value } }")
    .do()
)
print(json.dumps(response, indent=2))

TypeScript:

result = await client
  .graphql
  .aggregate()
  .withClassName('JeopardyQuestion')
  .withFields('answer { count type topOccurrences { occurs value } }')
  .do();
console.log(JSON.stringify(result, null, 2));

GraphQL:

{
  Aggregate {
    JeopardyQuestion {
      answer {
        count
        type
        topOccurrences {
          occurs
          value
        }
      }
    }
  }
}
Example response
The query should produce a response like the one below:
{
  "data": {
    "Aggregate": {
      "JeopardyQuestion": [
        {
          "answer": {
            "count": 10000,
            "topOccurrences": [
              {
                "occurs": 19,
                "value": "Australia"
              },
              {
                "occurs": 18,
                "value": "Hawaii"
              },
              {
                "occurs": 16,
                "value": "Boston"
              },
              {
                "occurs": 15,
                "value": "French"
              },
              {
                "occurs": 15,
                "value": "India"
              }
            ],
            "type": "text"
          }
        }
      ]
    }
  }
}
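The topOccurrences list can be flattened into rows for further processing. The sketch below does this for a copy of the example response and adds each value's share of the total count; top_occurrences is a hypothetical helper name, not a client function.

```python
# Copy of the example response above.
response = {
    "data": {"Aggregate": {"JeopardyQuestion": [{"answer": {
        "count": 10000,
        "type": "text",
        "topOccurrences": [
            {"occurs": 19, "value": "Australia"},
            {"occurs": 18, "value": "Hawaii"},
            {"occurs": 16, "value": "Boston"},
            {"occurs": 15, "value": "French"},
            {"occurs": 15, "value": "India"},
        ],
    }}]}}
}


def top_occurrences(response, class_name, prop):
    """Return (value, occurs, share of total count) for each top occurrence."""
    agg = response["data"]["Aggregate"][class_name][0][prop]
    total = agg["count"]
    return [
        (item["value"], item["occurs"], item["occurs"] / total)
        for item in agg["topOccurrences"]
    ]


for value, occurs, share in top_occurrences(response, "JeopardyQuestion", "answer"):
    print(f"{value}: {occurs} ({share:.2%})")
```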
Example with int
The following example retrieves the count and the sum of the points property values:
Python:

import json

response = (
    client.query
    .aggregate("JeopardyQuestion")
    .with_fields("points { count sum }")
    .do()
)
print(json.dumps(response, indent=2))

TypeScript:

result = await client
  .graphql
  .aggregate()
  .withClassName('JeopardyQuestion')
  .withFields('points { count sum }')
  .do();
console.log(JSON.stringify(result, null, 2));

GraphQL:

{
  Aggregate {
    JeopardyQuestion {
      points {
        count
        sum
      }
    }
  }
}
Example response
The query should produce a response like the one below:
{
  "data": {
    "Aggregate": {
      "JeopardyQuestion": [
        {
          "points": {
            "count": 10000,
            "sum": 6324100
          }
        }
      ]
    }
  }
}
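Because the response carries both count and sum, other summary values can be derived on the client side. As a small worked example using the figures above, the average points value per object is the sum divided by the count:

```python
# Values copied from the example response above.
points = {"count": 10000, "sum": 6324100}

# Average points per object, derived from the two aggregated values.
mean_points = points["sum"] / points["count"]
print(mean_points)  # 632.41
```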
Retrieve groupedBy properties
You can use the groupBy variable to group the results set into subsets. Then, you can retrieve the grouped aggregate data for each group through the groupedBy properties.
For example, to list all distinct values of a property, and the counts for each:
Python:

import json

response = (
    client.query
    .aggregate("JeopardyQuestion")
    .with_group_by_filter(["round"])
    .with_fields("groupedBy { value }")
    .with_meta_count()
    .do()
)
print(json.dumps(response, indent=2))

TypeScript:

result = await client
  .graphql
  .aggregate()
  .withClassName('JeopardyQuestion')
  .withGroupBy(['round'])
  .withFields('groupedBy { value } meta { count }')
  .do();
console.log(JSON.stringify(result, null, 2));

GraphQL:

{
  Aggregate {
    JeopardyQuestion(groupBy: "round") {
      groupedBy {
        value
      }
      meta {
        count
      }
    }
  }
}
Example response
The query should produce a response like the one below:
{
  "data": {
    "Aggregate": {
      "JeopardyQuestion": [
        {
          "groupedBy": {
            "value": "Double Jeopardy!"
          },
          "meta": {
            "count": 5193
          }
        },
        {
          "groupedBy": {
            "value": "Jeopardy!"
          },
          "meta": {
            "count": 4522
          }
        },
        {
          "groupedBy": {
            "value": "Final Jeopardy!"
          },
          "meta": {
            "count": 285
          }
        }
      ]
    }
  }
}
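A grouped response contains one entry per group, so it maps naturally onto a dictionary. The sketch below converts a copy of the example response into a mapping from round to count; counts_by_group is a hypothetical helper name used for illustration. The three group counts add up to 10000, the same total as the ungrouped meta count shown earlier.

```python
# Copy of the example response above.
response = {
    "data": {"Aggregate": {"JeopardyQuestion": [
        {"groupedBy": {"value": "Double Jeopardy!"}, "meta": {"count": 5193}},
        {"groupedBy": {"value": "Jeopardy!"}, "meta": {"count": 4522}},
        {"groupedBy": {"value": "Final Jeopardy!"}, "meta": {"count": 285}},
    ]}}
}


def counts_by_group(response, class_name):
    """Map each groupedBy value to its meta count."""
    groups = response["data"]["Aggregate"][class_name]
    return {g["groupedBy"]["value"]: g["meta"]["count"] for g in groups}


counts = counts_by_group(response, "JeopardyQuestion")
print(counts)
print(sum(counts.values()))  # 10000
```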
With nearXXX
When using a similarity search parameter (i.e. nearXXX) with Aggregate, you should include a way to limit the search results. This is because a vector search in itself does not exclude any objects from the results set.
Thus, for the vector search to affect the Aggregate output, you must set a limit on:
- The number of results returned (with objectLimit), or
- How similar the results are to the query (with distance).
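The reasoning above can be illustrated without a database. The sketch below uses four made-up objects with two-dimensional vectors (toy data, not taken from the Jeopardy dataset) and plain Python in place of an actual vector search. Ranking alone leaves the sum unchanged, while an object limit or a maximum distance changes which objects are aggregated.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two vectors (0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


# Toy objects: made-up vectors and points values, for illustration only.
objects = [
    {"vector": [1.0, 0.0], "points": 100},
    {"vector": [0.9, 0.1], "points": 200},
    {"vector": [0.0, 1.0], "points": 400},
    {"vector": [-1.0, 0.0], "points": 800},
]
query_vector = [1.0, 0.0]

# A vector search only orders the objects by distance to the query.
ranked = sorted(objects, key=lambda o: cosine_distance(o["vector"], query_vector))

sum_all = sum(o["points"] for o in objects)
sum_ranked = sum(o["points"] for o in ranked)      # same objects, same sum
sum_limited = sum(o["points"] for o in ranked[:3])  # object limit of 3
sum_within = sum(
    o["points"]
    for o in ranked
    if cosine_distance(o["vector"], query_vector) <= 0.19  # maximum distance
)

print(sum_all, sum_ranked, sum_limited, sum_within)  # 1500 1500 700 300
```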
Set an object limit
You can set the objectLimit argument to specify the maximum number of results to be aggregated.
The query below retrieves the 10 question objects with vectors that are closest to "animals in space", and returns the sum total of the points property.
Python:

import json

response = (
    client.query
    .aggregate("JeopardyQuestion")
    .with_near_text({
        "concepts": ["animals in space"]
    })
    .with_object_limit(10)
    .with_fields("points { sum }")
    .do()
)
print(json.dumps(response, indent=2))

TypeScript:

result = await client
  .graphql
  .aggregate()
  .withClassName('JeopardyQuestion')
  .withNearText({
    concepts: ['animals in space'],
  })
  .withObjectLimit(10)
  .withFields('points { sum }')
  .do();
console.log(JSON.stringify(result, null, 2));

GraphQL:

{
  Aggregate {
    JeopardyQuestion(
      nearText: {
        concepts: ["animals in space"]
      }
      objectLimit: 10
    ) {
      points {
        sum
      }
    }
  }
}
Example response
The query should produce a response like the one below:
{
  "data": {
    "Aggregate": {
      "JeopardyQuestion": [
        {
          "points": {
            "sum": 4600
          }
        }
      ]
    }
  }
}
Set a maximum distance
You can set the distance operator to specify the maximum dissimilarity (i.e. minimum similarity) of results to be aggregated.
The query below retrieves the question objects with vectors that are within a distance of 0.19 to "animals in space", and returns the sum total of the points property.
Python:

import json

response = (
    client.query
    .aggregate("JeopardyQuestion")
    .with_near_text({
        "concepts": ["animals in space"],
        "distance": 0.19
    })
    .with_fields("points { sum }")
    .do()
)
print(json.dumps(response, indent=2))

TypeScript:

result = await client
  .graphql
  .aggregate()
  .withClassName('JeopardyQuestion')
  .withNearText({
    concepts: ['animals in space'],
    distance: 0.19,
  })
  .withFields('points { sum }')
  .do();
console.log(JSON.stringify(result, null, 2));

GraphQL:

{
  Aggregate {
    JeopardyQuestion(
      nearText: {
        concepts: ["animals in space"]
        distance: 0.19
      }
    ) {
      points {
        sum
      }
    }
  }
}
Example response
The query should produce a response like the one below:
{
  "data": {
    "Aggregate": {
      "JeopardyQuestion": [
        {
          "points": {
            "sum": 3000
          }
        }
      ]
    }
  }
}
Add a conditional (where) filter
You can add a conditional filter to any aggregate search query, which will filter the results set.
The example below searches for objects where the round property equals Final Jeopardy! and returns the object count.
Python:

import json

response = (
    client.query
    .aggregate("JeopardyQuestion")
    .with_where({
        "path": ["round"],
        "operator": "Equal",
        "valueText": "Final Jeopardy!"
    })
    .with_meta_count()
    .do()
)
print(json.dumps(response, indent=2))

TypeScript:

result = await client
  .graphql
  .aggregate()
  .withClassName('JeopardyQuestion')
  .withWhere({
    path: ['round'],
    operator: 'Equal',
    valueText: 'Final Jeopardy!',
  })
  .withFields('meta { count }')
  .do();
console.log(JSON.stringify(result, null, 2));

GraphQL:

{
  Aggregate {
    JeopardyQuestion(where: {
      path: ["round"]
      operator: Equal
      valueText: "Final Jeopardy!"
    }) {
      meta {
        count
      }
    }
  }
}
Example response
The query should produce a response like the one below:
{
  "data": {
    "Aggregate": {
      "JeopardyQuestion": [
        {
          "meta": {
            "count": 285
          }
        }
      ]
    }
  }
}
More Resources
If you can't find the answer to your question here, please look at the:
- Frequently Asked Questions,
- Knowledge base of old issues,
- Stackoverflow, for questions,
- Weaviate Community Forum, for more involved discussion, or
- Our Slack channel.