Query
Finally! It's time to start and query Weaviate!
For this guide, you don't have to load any data into Weaviate; we are going to use the Wikipedia demo dataset.
But before we start, some basics:
- Weaviate's main API is its GraphQL-API
- New to GraphQL? Check this 100-second explainer video.
- Weaviate also has a RESTful API but it is used for other operations.
- You can also use the clients to query Weaviate natively in your language of choice. The clients will automatically determine which API to use for the request.
- Weaviate provides client libraries for convenience.
- The Weaviate Console contains an auto-complete feature to help you write queries.
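To make the split between the GraphQL-API and the clients a little more tangible, here is a rough sketch of what a raw GraphQL request could look like without a client library. It only builds the HTTP request and does not send it. The /v1/graphql path is Weaviate's GraphQL endpoint; the host http://localhost:8080 and the helper name build_graphql_request are assumptions made for illustration, and the query string is a placeholder that uses the Paragraph class introduced later in this guide.

```python
import json
import urllib.request

# Assumed local instance; replace with the URL of your own Weaviate.
WEAVIATE_URL = "http://localhost:8080"


def build_graphql_request(query: str, base_url: str = WEAVIATE_URL) -> urllib.request.Request:
    """Wrap a GraphQL query string in a POST request (hypothetical helper)."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/graphql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


query = "{ Get { Paragraph(limit: 3) { title } } }"
request = build_graphql_request(query)
# To actually run it against a live instance:
# response = urllib.request.urlopen(request).read()
```

In practice the client libraries do this wrapping for you, which is why they are usually the more convenient option.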
Let's get started!
Root GraphQL functions
Weaviate's GraphQL-API has three root functions:
- Get{} to retrieve data based on the schema.
- Aggregate{} to retrieve meta information (e.g., the number of data objects in a class).
- Explore{} to explore the complete vector space without using the schema.
{
Get {
# etc...
}
}
{
Aggregate {
# etc...
}
}
{
Explore {
# etc...
}
}
A more detailed explanation of Weaviate's GraphQL design is available here.
Get{}
In the basics getting started guide, you've learned how Weaviate uses a class-property structure and in the schema getting started guide you've learned how you can define the class-property structure.
Our demo dataset has two classes: Article and Paragraph. The Article class has the properties: title of the data type string, hasParagraphs of the data type Paragraph, and linksToArticles of the data type Article. The Paragraph class has the properties: title of the data type string, content of the data type text, order of the data type int, and inArticle of the data type Article. You can also inspect the schema of the demo dataset in JSON format here.
If we now want to Get{} all Paragraphs without cross-references, we can run the following query ("all" here means up to the default limit; more about this later):
{
Get {
Paragraph {
content
order
title
}
}
}
Or we can include a cross-reference like this:
{
Get {
Paragraph {
content
order
title
inArticle { # <= cross reference
... on Article {
title
}
}
}
}
}
Let's break down what's happening in this section:
inArticle {
... on Article {
title
}
}
- inArticle is the name of the Paragraph property that holds the cross-reference to Article.
- ... on Article is set because a single cross-reference property can point to multiple classes (did you notice that the cross-reference is set inside an array?). In this specific dataset, we only reference Article, but this could be more than one class.
- title is a property of Article. Because we told GraphQL that we are going to query over the cross-reference to Article, it knows that the properties of Article should be the available options.
You might remember that the Article class also has a cross-reference back to Paragraph. So, the following query is valid:
{
Get {
Paragraph {
content
order
title
inArticle {
... on Article {
title
hasParagraphs {
... on Paragraph {
title
order
}
}
}
}
}
}
}
The GraphQL-API has additional properties that can be retrieved with _additional{}. Some modules extend the _additional{} property too; more about this later.
An example with basic _additional{} properties:
{
Get {
Paragraph {
content
title
_additional {
id # <= the UUID of the data object
vector # <= the vector (if any) of the object
creationTimeUnix # <= the creation time as a Unix timestamp
lastUpdateTimeUnix # <= the time of the latest update as a Unix timestamp
}
}
}
}
Get{} with filters
This is where the fun really begins! In Weaviate, arguments are set at the level of the class name, between parentheses.
Let's start with the simplest filter we have: the limit filter, which, as you might have guessed, limits the number of results to a given number.
{
Get {
Paragraph(
limit: 3
) {
content
title
}
}
}
Weaviate also allows you to paginate over the results by adding an offset. The query below skips the first three results and returns the next three:
{
Get {
Paragraph(
limit: 3
offset: 3
) {
content
title
}
}
}
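Paging with limit and offset comes down to simple arithmetic: for a zero-based page number, the offset is the page number times the page size. The sketch below builds the query string for a given page; the helper names page_arguments and paragraph_page_query are hypothetical and exist only for this illustration.

```python
def page_arguments(page: int, page_size: int) -> str:
    """Return the limit/offset arguments for a zero-based page number (hypothetical helper)."""
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    return f"limit: {page_size} offset: {page * page_size}"


def paragraph_page_query(page: int, page_size: int = 3) -> str:
    """Build the Get query for one page of Paragraph objects."""
    return (
        "{ Get { Paragraph(" + page_arguments(page, page_size) + ") "
        "{ content title } } }"
    )


# Page 1 (the second page) with three results per page.
second_page = paragraph_page_query(1)
print(second_page)
```

With a page size of 3, page 0 uses offset 0 and page 1 uses offset 3, which matches a query with limit: 3 and offset: 3.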
You can also use filters to search for objects close to a specific vector, like this:
{
Get {
Paragraph(
nearVector: {
certainty: 0.95
vector: [
-0.14980282,
-0.18726847,
-0.20329526,
... # This may include hundreds (e.g. 384) of dimensions
-0.028092828,
0.41721362,
-0.09374439
]
}
) {
content
title
_additional {
certainty
}
}
}
}
Did you see certainty? It expresses how close each data object is to the query vector, as a number between 0 and 1 (higher means more similar); it is not itself a distance. You can also calculate the cosine similarity from the certainty if you like; more about this here.
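As a small sketch of the relation between certainty and cosine similarity: assuming the class uses cosine distance, certainty is the cosine similarity rescaled linearly from the range [-1, 1] to [0, 1]. The function names below are hypothetical, and the conversion only applies under that cosine-distance assumption.

```python
def certainty_to_cosine(certainty: float) -> float:
    """Map a certainty in [0, 1] back to a cosine similarity in [-1, 1]."""
    if not 0.0 <= certainty <= 1.0:
        raise ValueError("certainty must be between 0 and 1")
    return 2.0 * certainty - 1.0


def cosine_to_certainty(cosine_similarity: float) -> float:
    """Map a cosine similarity in [-1, 1] to a certainty in [0, 1]."""
    if not -1.0 <= cosine_similarity <= 1.0:
        raise ValueError("cosine similarity must be between -1 and 1")
    return (1.0 + cosine_similarity) / 2.0


# A certainty threshold of 0.95 corresponds to a cosine similarity of about 0.9.
threshold_as_cosine = certainty_to_cosine(0.95)
```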
You can do the same based on the UUID of any object in the same vector space. "The same" matters here because objects are matched based on vector length; if you use the same model to vectorize different classes, you can also mix them.
{
Get {
Paragraph(
nearObject: {
id: "fd7383f7-f2e3-3d50-a272-db9b614417cb"
}
limit: 5
) {
content
title
_additional {
certainty
}
}
}
}
Traditional inverted index filtering is also possible. In Weaviate we call this the where filter.
The where filter takes three arguments of its own:
- path is the graph path in your schema.
- operator defines what you want to do with the value inside the path (e.g., Equal or GreaterThan; see the list here).
- value* is set based on the type of the property defined in path. So, if the property in path is an int, this becomes valueInt; if it's a string, it becomes valueString. The value itself is whatever you want to filter on.
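The way the value* key follows the property type can be sketched with a small helper that picks the key from the Python type of the value. This is only an illustration: the function name where_filter is hypothetical, and the sketch covers just the two cases named above (int and string).

```python
from typing import Any, Dict, List, Union


def where_filter(path: List[str], operator: str, value: Union[int, str]) -> Dict[str, Any]:
    """Build a where-filter dict, choosing the value* key from the Python type."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool):
        raise TypeError("only int and str values are handled in this sketch")
    if isinstance(value, int):
        value_key = "valueInt"
    elif isinstance(value, str):
        value_key = "valueString"
    else:
        raise TypeError("only int and str values are handled in this sketch")
    return {"path": path, "operator": operator, value_key: value}


title_filter = where_filter(["title"], "Equal", "Italian cuisine")
order_filter = where_filter(["order"], "GreaterThan", 5)
```

Here title_filter ends up with a valueString key and order_filter with a valueInt key, mirroring the rule described in the list.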
The examples below are a bit more explanatory.
Let's filter for "Italian cuisine" in the title of the Paragraph.
{
Get {
Paragraph(
where: {
path: ["title"]
operator: Equal
valueString: "Italian cuisine"
}
limit: 10
) {
title
order
}
}
}
Or for Paragraphs where the order is higher than 5.
{
Get {
Paragraph(
where: {
path: ["order"]
operator: GreaterThan
valueInt: 5
}
limit: 10
) {
title
order
}
}
}
Or combine them by setting multiple operands:
{
Get {
Paragraph(
where: {
operator: And # <= We can have And, Or, etc.
operands: [{
path: ["title"]
operator: Equal
valueString: "Italian cuisine"
},{
path: ["order"]
operator: GreaterThan
valueInt: 5
}]
}
limit: 5
) {
title
order
}
}
}
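If you assemble nested filters in code, it can help to see how such a structure maps onto the GraphQL syntax used above. The following is a minimal sketch, not part of any Weaviate client: render_where is a hypothetical helper that writes the operator as a bare enum value, recurses into operands, and serializes everything else with JSON, whose literals for strings, numbers, and string arrays are also valid GraphQL.

```python
import json
from typing import Any, Dict


def render_where(where: Dict[str, Any]) -> str:
    """Render a (possibly nested) where-filter dict as a GraphQL input object."""
    parts = []
    for key, value in where.items():
        if key == "operator":
            # Operators are enum values in GraphQL, so they are not quoted.
            parts.append(f"operator: {value}")
        elif key == "operands":
            inner = ", ".join(render_where(operand) for operand in value)
            parts.append(f"operands: [{inner}]")
        else:
            # path, valueString, valueInt: JSON literals are valid GraphQL here.
            parts.append(f"{key}: {json.dumps(value)}")
    return "{" + " ".join(parts) + "}"


combined = {
    "operator": "And",
    "operands": [
        {"path": ["title"], "operator": "Equal", "valueString": "Italian cuisine"},
        {"path": ["order"], "operator": "GreaterThan", "valueInt": 5},
    ],
}
rendered = render_where(combined)
print(rendered)
```

The rendered string can be placed after where: in a Get{} query such as the one above.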
The path is an array, which means you can also point the filter at a property that is reached through a cross-reference:
{
Get {
Paragraph(
where: {
path: ["inArticle", "Article", "title"]
operator: Equal
valueString: "Francesco Bellissimo"
}
limit: 25
) {
content
inArticle {
... on Article {
title
}
}
}
}
}
And yes, you can combine vector search with where filters.
{
Get {
Paragraph(
nearObject: { # <= vector search
id: "fd7383f7-f2e3-3d50-a272-db9b614417cb"
}
where: { # <= where filter
path: ["inArticle", "Article", "title"]
operator: Equal
valueString: "Francesco Bellissimo"
}
limit: 25
) {
content
inArticle {
... on Article {
title
}
}
_additional {
certainty
}
}
}
}
We call Weaviate "vector first". This means that when you combine vector search with a where filter, the where filter first creates an allow-list of matching objects, and the search in the ANN index then skips every entry that is not on that list.
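The allow-list idea can be illustrated with a toy example. This is a conceptual sketch only, not Weaviate's implementation: a brute-force scan with cosine similarity stands in for the ANN index, and the function names and the three sample objects are made up for illustration.

```python
import math
from typing import Any, Callable, Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def filtered_search(
    query: List[float],
    objects: Dict[str, Dict[str, Any]],
    predicate: Callable[[Dict[str, Any]], bool],
    limit: int,
) -> List[Tuple[str, float]]:
    """Apply the where filter first, then rank only the allowed objects."""
    # Step 1: the where filter produces an allow-list of object ids.
    allow_list = {oid for oid, obj in objects.items() if predicate(obj["properties"])}
    # Step 2: the vector search skips every object that is not on the allow-list.
    scored = [
        (oid, cosine_similarity(query, obj["vector"]))
        for oid, obj in objects.items()
        if oid in allow_list
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


# Made-up sample objects with two-dimensional vectors.
objects = {
    "a": {"vector": [1.0, 0.0], "properties": {"order": 7}},
    "b": {"vector": [0.9, 0.1], "properties": {"order": 2}},
    "c": {"vector": [0.0, 1.0], "properties": {"order": 9}},
}
result = filtered_search([1.0, 0.0], objects, lambda p: p["order"] > 5, limit=2)
```

Object "b" is very close to the query vector but never appears in the result, because the filter on order removed it before the vector search ran.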
If you use Weaviate with modules (the current Wikipedia demo dataset uses the text2vec-transformers vectorizer module and the Q&A generator module), they might add custom filters and custom _additional properties. These arguments are described in the documentation of the respective modules themselves.
Let's explore the additional filters for the modules which are part of this dataset.
First, there is the additional nearText filter exposed by the text2vec-transformers module.
{
Get {
Paragraph(
nearText: {
concepts: ["Italian cuisine"]
}
limit: 10
) {
content
inArticle {
... on Article {
title
}
}
_additional {
certainty
}
}
}
}
Second, we can use the ask argument exposed by the Q&A module. Note how there are also additional _additional properties.
{
Get {
Paragraph(
ask: {
question: "Who is Francesco Bellissimo?"
}
limit: 1
) {
title
inArticle {
... on Article {
title
}
}
_additional {
answer {
result
}
}
}
}
}
And last but not least, we can combine all of them together!
{
Get {
Paragraph(
ask: {
question: "When was the program with Daniele Macuglia launched?"
}
where: { # <= where filter
path: ["inArticle", "Article", "title"]
operator: Equal
valueString: "Francesco Bellissimo"
}
limit: 1
) {
content
inArticle {
... on Article {
title
}
}
_additional {
answer {
result
}
}
}
}
}
Talking about filters, wanna see something cool? Weaviate has many more functions out of the box, such as feature projection to visualize your results. This example projects the result vectors down to three dimensions.
{
Get {
Paragraph(
nearText: {
concepts: ["Italian cuisine"]
}
limit: 10
) {
content
inArticle {
... on Article {
title
}
}
_additional {
featureProjection( # <= feature projection
dimensions: 3
){
vector
}
}
}
}
}
Last but not least, all the standard filters are documented in the filters section of the GraphQL references documentation or in the documentation of the individual modules.
Aggregate{}
The Aggregate{} function can be used to show aggregated data. For example, how many objects do I have of the Paragraph class?
There are three core concepts to keep in mind for the Aggregate function.
- Doing something on a class level is done in the meta property.
- Doing something on a property level is done inside the property.
- Different property types (e.g., string, int, etc.) support different aggregate functions.
The examples below are a bit more explanatory.
Let's start with counting the number of data objects in the Paragraph class:
{
Aggregate {
Paragraph {
meta { # <= the meta property
count
}
}
}
}
You can also mix in filters like this:
{
Aggregate {
Paragraph(
nearObject: { # <= vector search
id: "fd7383f7-f2e3-3d50-a272-db9b614417cb"
certainty: 0.5
}
where: { # <= where filter
path: ["inArticle", "Article", "title"]
operator: Equal
valueString: "Francesco Bellissimo"
}
) {
meta {
count
}
}
}
}
The order property in the Paragraph class is a nice example of how you can use the Aggregate{} function for integers.
{
Aggregate {
Paragraph {
order {
count
maximum
mean
median
minimum
mode
sum
type
}
}
}
}
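To get a feeling for what the numeric fields in this query represent, the sketch below computes the same kind of statistics locally with Python's standard library. The list of order values is made up purely for illustration and has nothing to do with the actual demo dataset.

```python
import statistics

# Made-up sample of Paragraph "order" values, purely for illustration.
orders = [1, 2, 2, 3, 7]

summary = {
    "count": len(orders),
    "maximum": max(orders),
    "mean": statistics.mean(orders),
    "median": statistics.median(orders),
    "minimum": min(orders),
    "mode": statistics.mode(orders),
    "sum": sum(orders),
}
print(summary)
```

The type field from the query has no counterpart here; it reports the data type of the property rather than a statistic.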
You can find detailed documentation on the Aggregate{} function here.
Explore{}
The Explore{} function can be used if you want to search through the complete vector space but don't know the class that you're targeting. Bear in mind that if you know the class, you also know the properties, the types, etc.
In short, the Explore{} function lets you explore the vector space.
Important to know: you must set a nearObject or nearVector search parameter, and in almost any situation you need to do two queries when using the Explore{} function:
- Target candidates based on your vector search or similarity search.
- Collect these candidates.
{
Explore(
nearObject: {
id: "fd7383f7-f2e3-3d50-a272-db9b614417cb"
certainty: 0.95
}
) {
beacon
className
certainty
distance
}
}
The Explore{} function is very straightforward and only returns four properties:
- beacon contains the URL and the id. It's a "beacon in the vector space" that you can use to target a data object.
- className contains the name of the class that this data object has.
- certainty is a normalized measure of how close the data object is to the query.
- distance is the distance between the query and the data object.
Data objects without vectors are skipped.
Data objects whose vector length differs from that of the input vector (or of the object referenced by ID) are skipped.
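The step from "target candidates" to "collect these candidates" can be sketched as follows: take the results of an Explore{} query, pull the id out of each beacon, and group the ids per class so that each class can be fetched in a follow-up query. The helper names are hypothetical, and the beacon format used in the sample (weaviate://localhost/ followed by the id as the last path segment) is an assumption; check the beacons your own instance returns.

```python
from collections import defaultdict
from typing import Dict, List
from urllib.parse import urlparse


def beacon_to_id(beacon: str) -> str:
    """Return the last path segment of a beacon, assumed to be the object's id."""
    return urlparse(beacon).path.rstrip("/").split("/")[-1]


def group_candidates(results: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Group the ids found by Explore per class name."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for result in results:
        grouped[result["className"]].append(beacon_to_id(result["beacon"]))
    return dict(grouped)


# Sample Explore results; the beacon format is an assumption for illustration.
explore_results = [
    {
        "beacon": "weaviate://localhost/fd7383f7-f2e3-3d50-a272-db9b614417cb",
        "className": "Paragraph",
    },
]
candidates = group_candidates(explore_results)
```

Once the ids are grouped per class, you know which class to query and can retrieve the properties you need in the second query.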
Recap
Weaviate's GraphQL-API is used to query your datasets. The structure of the dataset is based on the schema you've defined. You can add vector filters, where-filters, filters from modules, and you can mix them all together.
More Resources
If you can't find the answer to your question here, please look at one of the following:
- Frequently Asked Questions.
- Knowledge base of old issues.
- For questions: Stack Overflow.
- For issues: GitHub.
- Ask your question in the Slack channel: Slack.