Queries in detail
Overview
In this section, we will explore different queries that you can perform with Weaviate. Here, we will expand on the nearText queries that you may have seen in the Quickstart tutorial to show you the different query types, filters and metrics that can be used.
By the end of this section, you will have performed vector and scalar searches separately as well as in combination to retrieve individual objects and aggregations.
Prerequisites
We recommend you complete the Quickstart tutorial first.
Before you start this tutorial, you should follow the steps in the Quickstart to have:
- An instance of Weaviate running (e.g. on the Weaviate Cloud),
- An API key for your preferred inference API, such as OpenAI, Cohere, or Hugging Face,
- Installed your preferred Weaviate client library,
- Set up a Question class in your schema, and
- Imported the jeopardy_tiny.json data.
Object retrieval with Get
Weaviate's queries are built using GraphQL. If this is new to you, don't worry. We will take it step-by-step and build up from the basics. Also, in many cases, the GraphQL syntax is abstracted by the client.
You can query Weaviate using one or a combination of a semantic (i.e. vector) search and a lexical (i.e. scalar) search. As you've seen, a vector search allows for similarity-based searches, while scalar searches allow filtering by exact matches.
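To make the GraphQL part a little more concrete, here is a minimal sketch of what a client library does for you behind the scenes. It wraps a GraphQL query string in a JSON body, which is what gets posted to the /v1/graphql endpoint. The helper name build_graphql_payload is our own invention for illustration and is not part of any Weaviate client.

```python
import json


def build_graphql_payload(class_name, properties, limit):
    """Return the JSON body for a basic Get query (hypothetical helper)."""
    fields = " ".join(properties)
    # Doubled braces are literal braces inside an f-string.
    query = f"{{ Get {{ {class_name}(limit: {limit}) {{ {fields} }} }} }}"
    return json.dumps({"query": query})


payload = build_graphql_payload("Question", ["question", "answer"], limit=2)
print(payload)
```

In practice you would rarely build this string by hand, since the client libraries shown in this tutorial do it for you.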
First, we will start by making queries to Weaviate to retrieve Question objects that we imported earlier.
The Weaviate function for retrieving objects is Get.
This might be familiar to some of you. If you have completed our Imports in detail tutorial, you may have performed a Get query to confirm that the data import was successful. Here is the same code as a reminder:
- Python
- JS/TS Client v2
import weaviate
import json
client = weaviate.Client("https://WEAVIATE_INSTANCE_URL/") # Replace with your Weaviate endpoint
some_objects = client.data_object.get()
print(json.dumps(some_objects))
import weaviate from 'weaviate-ts-client';
const client = weaviate.client({
scheme: 'https',
host: 'WEAVIATE_INSTANCE_URL', // Replace with your Weaviate endpoint
});
const response = await client
.data
.getter()
.do();
console.log(JSON.stringify(response, null, 2));
This query simply asks Weaviate for some objects of this (Question) class.
Of course, in most cases we would want to retrieve information based on some criteria. Let's build on this query by adding a vector search.
Get with nearText
This is a vector search using a Get query.
- Python Client v4
- Python Client v3
- JS/TS Client v3
- JS/TS Client v2
- Go
- Curl
import weaviate
import weaviate.classes as wvc
import os
# Best practice: store your credentials in environment variables
wcd_url = os.environ["WCD_DEMO_URL"]
wcd_api_key = os.environ["WCD_DEMO_RO_KEY"]
openai_api_key = os.environ["OPENAI_APIKEY"]
client = weaviate.connect_to_weaviate_cloud(
    cluster_url=wcd_url,  # Replace with your Weaviate Cloud URL
    auth_credentials=wvc.init.Auth.api_key(wcd_api_key),  # Replace with your Weaviate Cloud key
    headers={"X-OpenAI-Api-Key": openai_api_key}  # Replace with appropriate header key/value pair for the required API
)
try:
    # Work with the client. It is closed gracefully in the finally block.
    questions = client.collections.get("Question")
    response = questions.query.near_text(
        query="biology",
        limit=2
    )
    print(response.objects[0].properties)  # Inspect the first object
finally:
    client.close()  # Close client gracefully
import weaviate
import json
client = weaviate.Client(
url = "https://WEAVIATE_INSTANCE_URL", # Replace with your Weaviate endpoint
auth_client_secret=weaviate.auth.AuthApiKey(api_key="YOUR-WEAVIATE-API-KEY"), # Replace with your Weaviate instance API key
additional_headers = {
"X-OpenAI-Api-Key": "YOUR-OPENAI-API-KEY" # Replace with your inference API key
}
)
response = (
client.query
.get("Question", ["question", "answer", "category"])
.with_near_text({"concepts": ["biology"]})
.with_limit(2)
.do()
)
print(json.dumps(response, indent=4))
import weaviate, { WeaviateClient } from 'weaviate-client';
const weaviateURL = process.env.WEAVIATE_URL as string
const weaviateKey = process.env.WEAVIATE_ADMIN_KEY as string
const openaiKey = process.env.OPENAI_API_KEY as string
const client: WeaviateClient = await weaviate.connectToWeaviateCloud(weaviateURL, {
authCredentials: new weaviate.ApiKey(weaviateKey),
headers: {
'X-OpenAI-Api-Key': openaiKey, // Replace with your inference API key
}
})
async function nearTextQuery() {
const questions = client.collections.get('Question');
const result = await questions.query.nearText('biology', {
limit:2
});
for (let object of result.objects) {
console.log(JSON.stringify(object.properties, null, 2));
}
return result;
}
await nearTextQuery();
import weaviate, { WeaviateClient, ApiKey } from 'weaviate-ts-client';
const client: WeaviateClient = weaviate.client({
scheme: 'https',
host: 'WEAVIATE_INSTANCE_URL', // Replace with your Weaviate endpoint
apiKey: new ApiKey('YOUR-WEAVIATE-API-KEY'), // Replace with your Weaviate instance API key
headers: { 'X-OpenAI-Api-Key': 'YOUR-OPENAI-API-KEY' }, // Replace with your inference API key
});
async function nearTextQuery() {
const res = await client.graphql
.get()
.withClassName('Question')
.withFields('question answer category')
.withNearText({concepts: ['biology']})
.withLimit(2)
.do();
console.log(JSON.stringify(res, null, 2));
return res;
}
await nearTextQuery();
// Set these environment variables
// WEAVIATE_URL your Weaviate instance URL, without https prefix
// WEAVIATE_API_KEY your Weaviate instance API key
// OPENAI_API_KEY your OpenAI API key
package main
import (
"context"
"fmt"
"os"
"github.com/weaviate/weaviate-go-client/v4/weaviate"
"github.com/weaviate/weaviate-go-client/v4/weaviate/auth"
"github.com/weaviate/weaviate-go-client/v4/weaviate/graphql"
)
func main() {
// Create the client
cfg := weaviate.Config{
Host: os.Getenv("WEAVIATE_URL"),
Scheme: "https",
AuthConfig: auth.ApiKey{Value: os.Getenv("WEAVIATE_API_KEY")},
Headers: map[string]string{
"X-OpenAI-Api-Key": os.Getenv("OPENAI_API_KEY"),
},
}
client, err := weaviate.NewClient(cfg)
if err != nil {
    panic(err)
}
fields := []graphql.Field{
{Name: "question"},
{Name: "answer"},
{Name: "category"},
}
nearText := client.GraphQL().
NearTextArgBuilder().
WithConcepts([]string{"biology"})
result, err := client.GraphQL().Get().
WithClassName("Question").
WithFields(fields...).
WithNearText(nearText).
WithLimit(2).
Do(context.Background())
if err != nil {
panic(err)
}
fmt.Printf("%v", result)
}
echo '{
"query": "{
Get {
Question (
limit: 2
nearText: {
concepts: [\"biology\"],
}
) {
question
answer
category
}
}
}"
}' | tr -d "\n" | curl \
-X POST \
-H 'Content-Type: application/json' \
-H "Authorization: Bearer $WEAVIATE_API_KEY" \
-H "X-OpenAI-Api-Key: $OPENAI_API_KEY" \
-d @- \
https://WEAVIATE_INSTANCE_URL/v1/graphql # Replace WEAVIATE_INSTANCE_URL with your instance URL
This might also look familiar, as it was used in the Quickstart tutorial. But let's break it down a little.
Here, we are using a nearText operator. We provide Weaviate with a query concept of biology. Weaviate then converts this into a vector through the inference API (OpenAI in this particular example) and uses that vector as the basis for a vector search.
Also note here that we pass the API key in the header. This is required as the inference API is used to vectorize the input query.
Additionally, we use the limit argument to only fetch a maximum of two (2) objects.
If you run this query, you should see the entries on "DNA" and "species" returned by Weaviate.
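If it helps to picture what happens after the query has been vectorized, here is a toy sketch of a vector search with a limit. It ranks a handful of objects by cosine similarity to a query vector and keeps the top two. The vectors are made up for illustration, and the brute-force ranking is our simplification, so please don't read it as a description of how Weaviate searches its index internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query_vector, objects, limit):
    """Return the `limit` objects most similar to the query vector."""
    ranked = sorted(
        objects,
        key=lambda obj: cosine_similarity(query_vector, obj["vector"]),
        reverse=True,
    )
    return ranked[:limit]


# Made-up 3-dimensional vectors; real embeddings have far more dimensions.
objects = [
    {"answer": "DNA", "vector": [0.9, 0.1, 0.0]},
    {"answer": "species", "vector": [0.8, 0.3, 0.1]},
    {"answer": "Elephant", "vector": [0.4, 0.8, 0.2]},
    {"answer": "Antelope", "vector": [0.1, 0.2, 0.9]},
]
query_vector = [1.0, 0.1, 0.0]  # Stand-in for the vectorized "biology" query

for obj in nearest(query_vector, objects, limit=2):
    print(obj["answer"])
```

With these made-up vectors, the two objects pointing in nearly the same direction as the query come out on top.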
Get with nearVector
In some cases, you might wish to input a vector directly as a search query. For example, you might be running Weaviate with a custom, external vectorizer. In such a case, you can use the nearVector operator to provide the query vector to Weaviate.
For example, here is Python code that obtains an OpenAI embedding manually and provides it through the nearVector operator:
import json
import openai  # This snippet uses the legacy interface of the openai package (versions before 1.0)

# Assumes `client` is the Weaviate client (Python Client v3) created in the previous example
openai.api_key = "YOUR-OPENAI-API-KEY"
model = "text-embedding-ada-002"
oai_resp = openai.Embedding.create(input=["biology"], model=model)
oai_embedding = oai_resp['data'][0]['embedding']
result = (
    client.query
    .get("Question", ["question", "answer"])
    .with_near_vector({
        "vector": oai_embedding,
        "certainty": 0.7
    })
    .with_limit(2)
    .do()
)
print(json.dumps(result, indent=4))
It should return the same results as above.
Note that we used the same OpenAI embedding model (text-embedding-ada-002) here so that the vectors are in the same vector "space".
You might also have noticed that we have added a certainty argument in the with_near_vector method. This lets you specify a similarity threshold for objects, and can be very useful for ensuring that no distant objects are returned.
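As a rough mental model of the threshold, you can think of it as dropping every candidate whose certainty falls below the value you pass in. The sketch below shows that idea in plain Python. The candidate list and its certainty values are invented for illustration, and how Weaviate treats a value exactly equal to the threshold is not something this sketch tries to settle.

```python
def apply_certainty_threshold(candidates, certainty):
    """Keep only candidates whose certainty meets the threshold."""
    return [c for c in candidates if c["certainty"] >= certainty]


# Invented candidates; the last one stands in for a "distant" object.
candidates = [
    {"answer": "DNA", "certainty": 0.90},
    {"answer": "species", "certainty": 0.89},
    {"answer": "hypothetical distant object", "certainty": 0.55},
]

kept = apply_certainty_threshold(candidates, certainty=0.7)
print([c["answer"] for c in kept])
```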
Additional properties
We can ask Weaviate to return _additional properties for any returned objects. This allows us to obtain properties such as the vector of each returned object as well as the actual certainty value, so we can verify how close each object is to our query vector. Here is a query that will return the certainty value:
- GraphQL
- Python
- JS/TS Client v2
{
  Get {
    Question(
      nearText: {
        concepts: ["biology"]
      }
      limit: 2
    ) {
      question
      answer
      category
      _additional {
        certainty
      }
    }
  }
}
import weaviate
import json
client = weaviate.Client(
url="https://WEAVIATE_INSTANCE_URL/", # Replace with your Weaviate endpoint
additional_headers={
"X-OpenAI-Api-Key": "YOUR-OPENAI-API-KEY" # Or "X-Cohere-Api-Key" or "X-HuggingFace-Api-Key"
}
)
nearText = {"concepts": ["biology"]}
result = (
client.query
.get("Question", ["question", "answer", "category"])
.with_near_text(nearText)
.with_limit(2)
.with_additional(['certainty'])
.do()
)
print(json.dumps(result, indent=4))
import weaviate from 'weaviate-ts-client';
const client = weaviate.client({
scheme: 'https',
host: 'WEAVIATE_INSTANCE_URL', // Replace with your Weaviate endpoint
headers: { 'X-OpenAI-Api-Key': process.env['OPENAI_API_KEY'] }, // Replace with your API key
});
const response = await client.graphql
.get()
.withClassName('Question')
.withFields('question answer category _additional {certainty}')
.withNearText({ concepts: ['biology'] })
.withLimit(2)
.do();
console.log(JSON.stringify(response, null, 2));
Try it out, and you should see a response like this:
{
"data": {
"Get": {
"Question": [
{
"_additional": {
"certainty": 0.9030631184577942
},
"answer": "DNA",
"category": "SCIENCE",
"question": "In 1953 Watson & Crick built a model of the molecular structure of this, the gene-carrying substance"
},
{
"_additional": {
"certainty": 0.900638073682785
},
"answer": "species",
"category": "SCIENCE",
"question": "2000 news: the Gunnison sage grouse isn't just another northern sage grouse, but a new one of this classification"
}
]
}
}
}
You can try modifying this query to see if you can retrieve the vector (note - it will be a looooong response 😉).
We encourage you to also try out different queries and see how that changes the results and distances not only with this dataset but also with different datasets, and/or vectorizers.
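If you want to compare certainty values across queries, it can be handy to pull them out of the response programmatically. Here is a small sketch that does so using a trimmed copy of the example response from this section. The variable names are our own.

```python
import json

# Trimmed copy of the example response shown in this section.
response_text = """
{
  "data": {
    "Get": {
      "Question": [
        {"_additional": {"certainty": 0.9030631184577942}, "answer": "DNA"},
        {"_additional": {"certainty": 0.900638073682785}, "answer": "species"}
      ]
    }
  }
}
"""

response = json.loads(response_text)

# Map each answer to its certainty, preserving the order of the results.
certainties = {
    obj["answer"]: obj["_additional"]["certainty"]
    for obj in response["data"]["Get"]["Question"]
}

for answer, certainty in certainties.items():
    print(f"{answer}: {certainty:.4f}")
```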
Filters
As useful as it is, vector search alone may sometimes not be sufficient. For example, you may only be interested in Question objects in a particular category.
In these cases, you can use Weaviate's scalar filtering capabilities - either alone, or in combination with the vector search.
Try the following:
- Python
- JS/TS Client v2
import weaviate
import json
client = weaviate.Client(
url="https://WEAVIATE_INSTANCE_URL/",  # Replace with your Weaviate endpoint
additional_headers={
"X-OpenAI-Api-Key": "YOUR-OPENAI-API-KEY"
}
)
where_filter = {
"path": ["category"],
"operator": "Equal",
"valueText": "ANIMALS",
}
result = (
client.query
.get("Question", ["question", "answer", "category"])
.with_near_text({"concepts": ["biology"]})
.with_where(where_filter)
.do()
)
print(json.dumps(result, indent=4))
import weaviate from 'weaviate-ts-client';
const client = weaviate.client({
scheme: 'https',
host: 'WEAVIATE_INSTANCE_URL', // Replace with your Weaviate endpoint
headers: { 'X-OpenAI-Api-Key': process.env['OPENAI_API_KEY'] }, // Replace with your API key
});
const response = await client.graphql
.get()
.withClassName('Question')
.withFields('question answer category')
.withNearText({ concepts: ['biology'] })
.withWhere({
path: ['category'],
operator: 'Equal',
valueText: 'ANIMALS',
})
.do();
console.log(JSON.stringify(response, null, 2));
This query asks Weaviate for Question objects whose category contains the string ANIMALS. You should see a result like this:
{
"data": {
"Get": {
"Question": [
{
"answer": "the diamondback rattler",
"category": "ANIMALS",
"question": "Heaviest of all poisonous snakes is this North American rattlesnake"
},
{
"answer": "Elephant",
"category": "ANIMALS",
"question": "It's the only living mammal in the order Proboseidea"
},
{
"answer": "the nose or snout",
"category": "ANIMALS",
"question": "The gavial looks very much like a crocodile except for this bodily feature"
},
{
"answer": "Antelope",
"category": "ANIMALS",
"question": "Weighing around a ton, the eland is the largest species of this animal in Africa"
}
]
}
}
}
Now that you've seen a scalar filter, let's see how it can be combined with vector search functions.
Vector search with scalar filters
Combining a filter with a vector search is an additive process. Let us show you what we mean by that.
- Python
- JS/TS Client v2
import weaviate
import json
client = weaviate.Client(
url="https://WEAVIATE_INSTANCE_URL/", # Replace with your Weaviate endpoint
additional_headers={
"X-OpenAI-Api-Key": "YOUR-OPENAI-API-KEY" # Or "X-Cohere-Api-Key" or "X-HuggingFace-Api-Key"
}
)
nearText = {"concepts": ["biology"]}
where_filter = {
"path": ["category"],
"operator": "Equal",
"valueText": "ANIMALS",
}
result = (
client.query
.get("Question", ["question", "answer", "category"])
.with_near_text(nearText)
.with_limit(2)
.with_additional(['certainty'])
.with_where(where_filter)
.do()
)
print(json.dumps(result, indent=4))
import weaviate from 'weaviate-ts-client';
const client = weaviate.client({
scheme: 'https',
host: 'WEAVIATE_INSTANCE_URL', // Replace with your Weaviate endpoint
headers: { 'X-OpenAI-Api-Key': process.env['OPENAI_API_KEY'] }, // Replace with your API key
});
const response = await client.graphql
.get()
.withClassName('Question')
.withFields('question answer category _additional { id certainty }')
.withNearText({ concepts: ['biology'] })
.withWhere({
path: ['category'],
operator: 'Equal',
valueText: 'ANIMALS',
})
.withLimit(2)
.do();
console.log(JSON.stringify(response, null, 2));
This query asks Weaviate for Question objects that are closest to "biology", but within the category of ANIMALS. You should see a result like this:
{
"data": {
"Get": {
"Question": [
{
"_additional": {
"certainty": 0.8918434679508209
},
"answer": "the nose or snout",
"category": "ANIMALS",
"question": "The gavial looks very much like a crocodile except for this bodily feature"
},
{
"_additional": {
"certainty": 0.8867587149143219
},
"answer": "Elephant",
"category": "ANIMALS",
"question": "It's the only living mammal in the order Proboseidea"
}
]
}
}
}
Note that the results are confined to the ANIMALS category. While not cutting-edge science, these results are biological factoids.
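One way to picture the "additive" behaviour is as two steps applied together: the scalar filter narrows down which objects are eligible, and the vector search ranks whatever remains. The sketch below mimics that conceptually in plain Python. The certainty values for the first four objects are rounded from the responses shown in this tutorial, while the last two are placeholders we made up, and the function is a conceptual illustration rather than Weaviate's actual implementation.

```python
def filtered_search(objects, category, limit):
    """Keep objects in the category, then return the closest `limit` of them."""
    allowed = [o for o in objects if o["category"] == category]
    ranked = sorted(allowed, key=lambda o: o["certainty"], reverse=True)
    return ranked[:limit]


objects = [
    {"answer": "DNA", "category": "SCIENCE", "certainty": 0.9031},
    {"answer": "species", "category": "SCIENCE", "certainty": 0.9006},
    {"answer": "the nose or snout", "category": "ANIMALS", "certainty": 0.8918},
    {"answer": "Elephant", "category": "ANIMALS", "certainty": 0.8868},
    {"answer": "Antelope", "category": "ANIMALS", "certainty": 0.88},  # Placeholder value
    {"answer": "the diamondback rattler", "category": "ANIMALS", "certainty": 0.87},  # Placeholder value
]

results = filtered_search(objects, category="ANIMALS", limit=2)
print([o["answer"] for o in results])
```

Even though the SCIENCE objects are closer to the query overall, they never make it into the results because the filter excludes them.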
Metadata with Aggregate
As the name suggests, the Aggregate function can be used to show aggregated data, such as for entire classes or groups of objects.
For example, the following query will return the number of data objects in the Question class:
- GraphQL
- Python
- JS/TS Client v2
{
Aggregate {
Question {
meta {
count
}
}
}
}
import weaviate
import json
client = weaviate.Client(
url="https://WEAVIATE_INSTANCE_URL/", # Replace with your Weaviate endpoint
additional_headers={
"X-OpenAI-Api-Key": "YOUR-OPENAI-API-KEY" # Or "X-Cohere-Api-Key" or "X-HuggingFace-Api-Key"
}
)
result = (
client.query
.aggregate("Question")
.with_fields("meta { count }")
.do()
)
print(json.dumps(result, indent=4))
import weaviate, { ApiKey } from 'weaviate-ts-client';
const client = weaviate.client({
scheme: 'https',
host: 'WEAVIATE_INSTANCE_URL', // Replace with your Weaviate endpoint
apiKey: new ApiKey('YOUR-WEAVIATE-API-KEY'), // Replace with your Weaviate API key
headers: { 'X-OpenAI-Api-Key': process.env['OPENAI_API_KEY'] }, // Replace with your API key
});
const response = await client.graphql
.aggregate()
.withClassName('Question')
.withFields('meta { count }')
.do();
console.log(JSON.stringify(response, null, 2));
You can also use the Aggregate function with filters, just as you saw with the Get function above. For example, this query will return the number of Question objects with the category "ANIMALS".
- GraphQL
- Python
- JS/TS Client v2
{
Aggregate {
Question(
where: {
path: ["category"]
operator: Equal
valueText: "ANIMALS"
}
) {
meta {
count
}
}
}
}
import weaviate
import json
client = weaviate.Client(
url="https://WEAVIATE_INSTANCE_URL/", # Replace with your Weaviate endpoint
additional_headers={
"X-OpenAI-Api-Key": "YOUR-OPENAI-API-KEY" # Or "X-Cohere-Api-Key" or "X-HuggingFace-Api-Key"
}
)
where_filter = {
"path": ["category"],
"operator": "Equal",
"valueText": "ANIMALS",
}
result = (
client.query
.aggregate("Question")
.with_fields("meta { count }")
.with_where(where_filter)
.do()
)
print(json.dumps(result, indent=4))
import weaviate from 'weaviate-ts-client';
const client = weaviate.client({
scheme: 'https',
host: 'WEAVIATE_INSTANCE_URL', // Replace with your Weaviate endpoint
headers: { 'X-OpenAI-Api-Key': process.env['OPENAI_API_KEY'] }, // Replace with your API key
});
const response = await client.graphql
.aggregate()
.withClassName('Question')
.withFields('meta { count }')
.withWhere({
path: ['category'],
operator: 'Equal',
valueText: 'ANIMALS',
})
.do();
console.log(JSON.stringify(response, null, 2));
And as you saw above, there are four objects that match the query filter.
{
"data": {
"Aggregate": {
"Question": [
{
"meta": {
"count": 4
}
}
]
}
}
}
Here, Weaviate has identified the same objects that you saw earlier in the similar Get queries. The difference is that instead of returning the individual objects, you are seeing the requested aggregated statistic (count) here.
As you can see, the Aggregate function can return handy aggregated information, or metadata, from the Weaviate database.
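Because Aggregate responses are nested a little differently from Get responses, here is a short sketch of how you might pull the count out of one. It uses a copy of the example response from this section. The helper name extract_count is hypothetical and not part of any Weaviate client.

```python
import json

# Copy of the example Aggregate response shown in this section.
response_text = """
{
  "data": {
    "Aggregate": {
      "Question": [
        {"meta": {"count": 4}}
      ]
    }
  }
}
"""


def extract_count(response, class_name):
    """Return meta.count for a class from an Aggregate response (hypothetical helper)."""
    return response["data"]["Aggregate"][class_name][0]["meta"]["count"]


count = extract_count(json.loads(response_text), "Question")
print(count)
```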
Recap
- Get queries are used for retrieving data objects.
- Aggregate queries can be used to retrieve metadata, or aggregated data.
- Operators such as nearText or nearVector can be used for vector queries.
- Scalar filters can be used for exact filtering, taking advantage of inverted indexes.
- Vector and scalar filters can be combined, and are available on both Get and Aggregate queries.
Suggested reading
- Tutorial: Schemas in detail
- Tutorial: Import in detail
- Tutorial: Introduction to modules
- Tutorial: Introduction to Weaviate Console
Notes
How is certainty calculated?
certainty in Weaviate is a measure of how close the query vector is to the vector of a data object. You can also calculate the cosine similarity from the certainty.
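As a quick sketch of that conversion, and assuming your class uses the cosine distance metric, certainty maps the cosine similarity range of -1 to 1 onto 0 to 1, so that certainty = (1 + cosine similarity) / 2. The function names below are our own.

```python
def certainty_to_cosine_similarity(certainty):
    """Convert a certainty (0 to 1) to a cosine similarity (-1 to 1)."""
    return 2 * certainty - 1


def cosine_similarity_to_certainty(cosine_similarity):
    """Convert a cosine similarity (-1 to 1) to a certainty (0 to 1)."""
    return (1 + cosine_similarity) / 2


# Certainty of the "DNA" result from the example response in this tutorial.
similarity = certainty_to_cosine_similarity(0.9030631184577942)
print(round(similarity, 4))
```

If you use a different distance metric, this relationship may not apply, so it is worth checking the documentation for your configuration.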
Questions and feedback
If you have any questions or feedback, let us know in the user forum.