Weaviate, end-to-end
Overview
Here, you will gain a hands-on overview of what you can do with Weaviate. If some of the steps raise questions, don't worry: later tutorials cover each step in more detail.
By the end of this page, you will have:
- Vectorized the quiz data
- Added the vectorized data to Weaviate, and
- Performed vector searches to retrieve relevant objects
Code examples
We have prepared code examples to help you follow along here. Go to weaviate-tutorials/quickstart on GitHub to take a look.
Prerequisites
At this point, you should have:
- A new instance of Weaviate running (e.g. on the Weaviate Cloud Services),
- An API key for your preferred inference API, such as OpenAI, Cohere, or Hugging Face, and
- Installed your preferred Weaviate client library.
We will be working with this dataset, which will be loaded directly from the remote URL.
Connect to Weaviate
You can connect to your instance of Weaviate using the Weaviate client as shown below. If the client object is created without errors, you should be ready to go.
- Python
import weaviate

client = weaviate.Client(
    url="https://some-endpoint.weaviate.network/",  # Replace with your endpoint
)
- JavaScript
const weaviate = require("weaviate-client");

const client = weaviate.client({
  scheme: 'https',
  host: 'some-endpoint.weaviate.network',  // Replace with your endpoint
});
Import data
Weaviate can take care of data vectorization at import time with its vectorizer modules. So you don't need to worry about vectorization other than choosing an appropriate vectorizer and passing the data to Weaviate.
Using an inference API is one good way to do this. To do so:
- Specify a vectorizer module (e.g. text2vec-openai)
- Provide the API key
- Load & import data into Weaviate
Specify a vectorizer
First, we must define the class that will store the data objects and specify which vectorizer to use. The following will create a Question class with the given vectorizer, and add it to the schema:
This tutorial uses the OpenAI API to obtain vectors, but with WCS you can use any of the Cohere, Hugging Face or OpenAI inference APIs, as the relevant Weaviate modules are already built in by default.
Change the vectorizer setting below to point to your preferred module.
- Python
class_obj = {
    "class": "Question",
    "vectorizer": "text2vec-openai"  # Or "text2vec-cohere" or "text2vec-huggingface"
}

client.schema.create_class(class_obj)
- JavaScript
let classObj = {
  "class": "Question",
  "vectorizer": "text2vec-openai"  // Or "text2vec-cohere" or "text2vec-huggingface"
}

// add the schema
client
  .schema
  .classCreator()
  .withClass(classObj)
  .do()
  .then(res => {
    console.log(res)
  })
  .catch(err => {
    console.error(err)
  });
Weaviate will infer any further schema information from the given data. If you would like to know more, check out this tutorial which covers schemas in more detail.
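Although Weaviate can infer the rest of the schema, a class definition can also list its properties explicitly. The following is a minimal sketch, assuming the three text properties imported later on this page (answer, question, category) and the text data type. The helper name build_question_class and the SUPPORTED_VECTORIZERS set are hypothetical, and the set only covers the three modules named in this tutorial.

```python
# Hypothetical helper: builds a class definition with explicit properties.
# The set below only lists the vectorizer modules mentioned on this page.
SUPPORTED_VECTORIZERS = {
    "text2vec-openai",
    "text2vec-cohere",
    "text2vec-huggingface",
}


def build_question_class(vectorizer="text2vec-openai"):
    """Return a class definition dict for the Question class."""
    if vectorizer not in SUPPORTED_VECTORIZERS:
        raise ValueError(f"Unexpected vectorizer: {vectorizer}")
    return {
        "class": "Question",
        "vectorizer": vectorizer,
        # Assumption: all three fields are plain text
        "properties": [
            {"name": name, "dataType": ["text"]}
            for name in ("answer", "question", "category")
        ],
    }


class_obj = build_question_class("text2vec-cohere")
# class_obj could then be passed to client.schema.create_class(class_obj)
```

Validating the vectorizer name up front is a design choice of this sketch, so that a typo fails locally rather than at the Weaviate instance.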
Name 'Question' already used as a name for an Object class
You may see this error if you try to create a class that already exists in your instance of Weaviate. In this case, you can delete the class by following the instructions under "Deleting classes" below.
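One way to avoid this error is to check for the class before creating it. The sketch below works on a schema dictionary shaped like the schema endpoint response shown on this page; the helper name class_exists is hypothetical, and the sample dictionary is hand-written rather than fetched from a live instance.

```python
def class_exists(schema, class_name):
    """Return True if class_name appears in a schema dict
    shaped like {"classes": [{"class": ...}, ...]}."""
    return any(c.get("class") == class_name for c in schema.get("classes", []))


# Hand-written example; with the Python client, the dict would
# typically come from client.schema.get() instead.
schema = {"classes": [{"class": "Question", "vectorizer": "text2vec-openai"}]}

if class_exists(schema, "Question"):
    print("Question already exists; skip creation or delete it first")
```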
Confirm schema creation
After you have added the class to the schema, you can confirm that it has been created by visiting the schema endpoint. You can inspect the Weaviate schema here (replace the URL with your actual endpoint):
https://some-endpoint.weaviate.network/v1/schema
You should see:
{
    "classes": [
        {
            "class": "Question",
            ... // truncated additional information here
            "vectorizer": "text2vec-openai"
        }
    ]
}
The schema should indicate that the Question class has been added.
Weaviate uses a combination of RESTful and GraphQL APIs. In Weaviate, RESTful API endpoints can be used to add data or obtain information about the Weaviate instance, and the GraphQL interface to retrieve data.
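To make the split concrete, the sketch below builds the REST URLs used on this page (v1/schema and v1/objects) along with the GraphQL endpoint (v1/graphql) from a base URL. The base URL is the placeholder used throughout this tutorial, and the endpoint helper is hypothetical.

```python
from urllib.parse import urljoin

BASE_URL = "https://some-endpoint.weaviate.network/"  # Replace with your endpoint


def endpoint(path):
    """Join a relative API path onto the instance's base URL."""
    return urljoin(BASE_URL, path)


schema_url = endpoint("v1/schema")    # REST: inspect the schema
objects_url = endpoint("v1/objects")  # REST: list or add objects
graphql_url = endpoint("v1/graphql")  # GraphQL: retrieve data with queries

print(schema_url)
```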
Deleting classes
If your Weaviate instance contains data you want removed, you can manually delete the unwanted class(es).
Note that deleting a class will also delete all associated objects!
Do not do this to a production database, or anywhere where you do not wish to delete your data.
Run the code below to delete the relevant class and its objects.
- Python
import weaviate

client = weaviate.Client("https://some-endpoint.weaviate.network/")  # Replace with your endpoint

# delete class "YourClassName" - THIS WILL DELETE ALL DATA IN THIS CLASS
client.schema.delete_class("YourClassName")  # Replace with your class name - e.g. "Question"
- JavaScript
const weaviate = require('weaviate-client');

const client = weaviate.client({
  scheme: 'https',
  host: 'some-endpoint.weaviate.network',  // Replace with your endpoint
});

const className = 'YourClassName';  // Replace with your class name

client.schema
  .classDeleter()
  .withClassName(className)
  .do()
  .then(res => {
    console.log(res);
  })
  .catch(err => {
    console.error(err)
  });
Provide the API key
The API key can be provided to Weaviate as an environment variable, or in the HTTP header with every request. Here, we will add it to the Weaviate client at instantiation as shown below. The client will then send the key as part of the header with every request.
- Python
import weaviate

client = weaviate.Client(
    url="https://some-endpoint.weaviate.network/",  # Replace with your endpoint
    additional_headers={
        "X-OpenAI-Api-Key": "<THE-KEY>"  # Replace with your API key
    }
)
- JavaScript
const weaviate = require("weaviate-client");

const client = weaviate.client({
  scheme: 'https',
  host: 'some-endpoint.weaviate.network',  // Replace with your endpoint
  headers: {'X-OpenAI-Api-Key': '<THE-KEY>'},  // Replace with your API key
});
If you are not using OpenAI, change the API key parameter in the code examples from X-OpenAI-Api-Key to one relevant to your chosen inference API, such as X-Cohere-Api-Key for Cohere or X-HuggingFace-Api-Key for Hugging Face.
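The header names above can be kept in a small lookup so that switching providers only changes one argument. This is a sketch only: the helper inference_headers and the environment variable name INFERENCE_API_KEY are hypothetical, while the three header names are the ones listed above.

```python
import os

# Header names as listed in this tutorial
HEADER_NAMES = {
    "openai": "X-OpenAI-Api-Key",
    "cohere": "X-Cohere-Api-Key",
    "huggingface": "X-HuggingFace-Api-Key",
}


def inference_headers(provider, api_key=None):
    """Build the header dict for the chosen inference API.

    Falls back to the (hypothetical) INFERENCE_API_KEY environment
    variable when no key is passed explicitly.
    """
    key = api_key or os.environ.get("INFERENCE_API_KEY", "<THE-KEY>")
    return {HEADER_NAMES[provider]: key}


headers = inference_headers("cohere", "my-secret-key")
# headers could then be passed as additional_headers to weaviate.Client(...)
```

Reading the key from the environment keeps it out of source files, which is usually preferable to hard-coding it.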
Load & import data
Now, we can load our dataset and import it into Weaviate. The code looks roughly like this:
- Python
# Configure a batch process
with client.batch as batch:
    batch.batch_size = 100
    for i, d in enumerate(data):
        properties = {
            "answer": d["Answer"],
            "question": d["Question"],
            "category": d["Category"],
        }
        # Add the object to the batch queue
        batch.add_data_object(properties, "Question")
- JavaScript
async function importQuestions() {
  // Prepare a batcher
  let batcher = client.batch.objectsBatcher();
  let counter = 0;
  let batchSize = 100;

  data.forEach(question => {
    // Construct an object with a class and properties 'answer', 'question' and 'category'
    const obj = {
      class: 'Question',
      properties: {
        answer: question.Answer,
        question: question.Question,
        category: question.Category,
      },
    }

    // add the object to the batch queue
    batcher = batcher.withObject(obj);

    // When the batch counter reaches batchSize, push the objects to Weaviate
    if (++counter === batchSize) {
      // flush the batch queue
      batcher
        .do()
        .then(res => {
          console.log(res)
        })
        .catch(err => {
          console.error(err)
        });

      // restart the batch queue
      counter = 0;
      batcher = client.batch.objectsBatcher();
    }
  });

  // Flush the remaining objects
  batcher
    .do()
    .then(res => {
      console.log(res)
    })
    .catch(err => {
      console.error(err)
    });
}

importQuestions();
Note that we use a batch import process here for speed. You should use batch imports unless you have a good reason not to. We'll cover more on this later.
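To illustrate what batching does, the sketch below splits any sequence into groups of at most batch_size items, so that each group could be sent in a single request. It is independent of the Weaviate client, and the helper name chunked is hypothetical.

```python
def chunked(items, batch_size=100):
    """Yield lists of at most batch_size items from any iterable."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    # Yield whatever is left over after the last full batch
    if batch:
        yield batch


sizes = [len(b) for b in chunked(range(250), batch_size=100)]
print(sizes)  # [100, 100, 50]
```

With 250 objects and a batch size of 100, only three requests would be needed instead of 250.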
Putting it together
The following code puts it all together, taking care of everything from schema definition to data import. Remember to replace the endpoint and inference API key (and API key name if necessary).
- Python
import json

import requests
import weaviate

client = weaviate.Client(
    url="https://some-endpoint.weaviate.network/",  # Replace with your endpoint
    additional_headers={
        "X-OpenAI-Api-Key": "<THE-KEY>"  # Replace with your API key
    }
)

# ===== add schema =====
class_obj = {
    "class": "Question",
    "vectorizer": "text2vec-openai"
}

client.schema.create_class(class_obj)

# ===== import data =====
# Load data
url = 'https://raw.githubusercontent.com/weaviate-tutorials/quickstart/main/data/jeopardy_tiny.json'
resp = requests.get(url)
data = json.loads(resp.text)

# Configure a batch process
with client.batch as batch:
    batch.batch_size = 100
    # Batch import all Questions
    for i, d in enumerate(data):
        print(f"importing question: {i+1}")

        properties = {
            "answer": d["Answer"],
            "question": d["Question"],
            "category": d["Category"],
        }

        # Add the object to the batch queue
        batch.add_data_object(properties, "Question")
- JavaScript
const weaviate = require("weaviate-client");

const client = weaviate.client({
  scheme: 'https',
  host: 'some-endpoint.weaviate.network',  // Replace with your endpoint
  headers: {'X-OpenAI-Api-Key': '<THE-KEY>'},  // Replace with your API key
});

let classObj = {
  "class": "Question",
  "vectorizer": "text2vec-openai"
}

// add the schema
client
  .schema
  .classCreator()
  .withClass(classObj)
  .do()
  .then(res => {
    console.log(res)
  })
  .catch(err => {
    console.error(err)
  });

async function getJsonData() {
  const file = await fetch('https://raw.githubusercontent.com/weaviate-tutorials/quickstart/main/data/jeopardy_tiny.json');
  return file.json();
}

async function importQuestions() {
  // Get the data from the remote JSON file
  const data = await getJsonData();

  // Prepare a batcher
  let batcher = client.batch.objectsBatcher();
  let counter = 0;
  let batchSize = 100;

  data.forEach(question => {
    // Construct an object with a class and properties 'answer', 'question' and 'category'
    const obj = {
      class: 'Question',
      properties: {
        answer: question.Answer,
        question: question.Question,
        category: question.Category,
      },
    }

    // add the object to the batch queue
    batcher = batcher.withObject(obj);

    // When the batch counter reaches batchSize, push the objects to Weaviate
    if (++counter === batchSize) {
      // flush the batch queue
      batcher
        .do()
        .then(res => {
          console.log(res)
        })
        .catch(err => {
          console.error(err)
        });

      // restart the batch queue
      counter = 0;
      batcher = client.batch.objectsBatcher();
    }
  });

  // Flush the remaining objects
  batcher
    .do()
    .then(res => {
      console.log(res)
    })
    .catch(err => {
      console.error(err)
    });
}

importQuestions();
And that should have populated Weaviate with the data, including corresponding vectors!
If you would rather supply your own vectors, you can pass them to Weaviate directly. See this reference for more information.
Note again that we did not provide any vectors to Weaviate. That's all managed by Weaviate, which calls the inference API for you and obtains a vector corresponding to your object at import time.
Confirm data import
To confirm successful data import, navigate to the objects endpoint to check that all objects have been imported (replace with your actual endpoint):
https://some-endpoint.weaviate.network/v1/objects
You should see:
{
    "deprecations": null,
    "objects": [
        ... // Details of each object
    ],
    "totalResults": 10 // You should see 10 results here
}
You should be able to confirm that you have imported all 10 objects.
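The same check can be scripted. The sketch below parses a response body shaped like the one above and compares totalResults with the expected count. The helper name count_imported is hypothetical, and the sample string is a hand-written stand-in with the object details left out, not real output.

```python
import json


def count_imported(response_text, expected=10):
    """Return (total, matches_expected) for an objects endpoint response body."""
    body = json.loads(response_text)
    total = body.get("totalResults", 0)
    return total, total == expected


# Hand-written stand-in for the response body, with object details omitted
sample = '{"deprecations": null, "objects": [], "totalResults": 10}'

total, ok = count_imported(sample)
print(total, ok)  # 10 True
```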
Query Weaviate
Now that you've built a database, let's try some queries.
Text similarity search
One of the most common use cases is text similarity search. As we have a text2vec module enabled, we can use the nearText parameter for this purpose.
If you want to find entries related to biology, you can apply the nearText parameter like so:
- Python
import weaviate
import json

client = weaviate.Client(
    url="https://some-endpoint.weaviate.network/",  # Replace with your endpoint
    additional_headers={
        "X-OpenAI-Api-Key": "<THE-KEY>"  # Replace with your API key
    }
)

nearText = {"concepts": ["biology"]}

result = (
    client.query
    .get("Question", ["question", "answer", "category"])
    .with_near_text(nearText)
    .with_limit(2)
    .do()
)

print(json.dumps(result, indent=4))
- JavaScript
const weaviate = require("weaviate-client");

const client = weaviate.client({
  scheme: 'https',
  host: 'some-endpoint.weaviate.network',  // Replace with your endpoint
  headers: {'X-OpenAI-Api-Key': '<THE-KEY>'},  // Replace with your API key
});

client.graphql
  .get()
  .withClassName('Question')
  .withFields('question answer category')
  .withNearText({concepts: ["biology"]})
  .withLimit(2)
  .do()
  .then(res => {
    console.log(JSON.stringify(res, null, 2))
  })
  .catch(err => {
    console.error(err)
  });
Note that we use the Get function (or the relevant client implementation) to fetch objects, and the query text is specified in the concepts field.
You should see something like this:
{
    "data": {
        "Get": {
            "Question": [
                {
                    "answer": "DNA",
                    "category": "SCIENCE",
                    "question": "In 1953 Watson & Crick built a model of the molecular structure of this, the gene-carrying substance"
                },
                {
                    "answer": "species",
                    "category": "SCIENCE",
                    "question": "2000 news: the Gunnison sage grouse isn't just another northern sage grouse, but a new one of this classification"
                }
            ]
        }
    }
}
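The results sit a few levels deep in the response. The sketch below pulls out just the answers from a response shaped like the one above; the helper name extract_answers is hypothetical, and the sample dictionary is an abbreviated copy of that response.

```python
def extract_answers(result, class_name="Question"):
    """Return the 'answer' field of each object in a Get response."""
    objects = result.get("data", {}).get("Get", {}).get(class_name, [])
    return [obj["answer"] for obj in objects]


# Abbreviated copy of the response shown above
sample = {
    "data": {
        "Get": {
            "Question": [
                {"answer": "DNA", "category": "SCIENCE"},
                {"answer": "species", "category": "SCIENCE"},
            ]
        }
    }
}

print(extract_answers(sample))  # ['DNA', 'species']
```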
Note that even though the word 'biology' does not appear anywhere, Weaviate has returned biology-related entries (on DNA and species) as the closest results. It has also ranked these entries above many others that relate to animals more generally.
That's a simple but powerful outcome, which shows a big reason behind the popularity of vector searches. Vectorized data objects allow for searches based on degrees of similarity, such as semantic similarity of text as we did here.
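As a rough illustration of "degrees of similarity", the sketch below computes cosine similarity, one common way to compare vectors. The two-dimensional vectors are invented for illustration; real embeddings have many more dimensions, and the distance metric your instance actually uses depends on its configuration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors, made up for this example
query = [1.0, 0.0]
close = [0.9, 0.1]   # points in nearly the same direction as the query
far = [0.0, 1.0]     # orthogonal to the query

print(cosine_similarity(query, close) > cosine_similarity(query, far))  # True
```

A vector search ranks stored objects by a score of this kind, which is why related entries can surface even when they share no words with the query.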
Try it out yourself by changing the query string from "biology" to something else.
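When experimenting with different strings, it can help to see what a raw GraphQL query for this search might look like. The sketch below builds a query string that should be roughly equivalent to the client calls above; the helper name near_text_query is hypothetical, and the exact string the client libraries generate may differ.

```python
import json


def near_text_query(concept, limit=2):
    """Build a raw GraphQL Get query with a nearText filter on Question."""
    # json.dumps gives a correctly quoted and escaped list, e.g. ["biology"]
    concepts = json.dumps([concept])
    return (
        "{ Get { Question("
        f"nearText: {{concepts: {concepts}}}, limit: {limit}"
        ") { question answer category } } }"
    )


query = near_text_query("zoology")
print(query)
```

Using json.dumps for the concept list avoids broken queries when the search string itself contains quotes.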
Recap
If you made it here - well done. We have covered a lot in just a couple of pages, and you've successfully built a fully functioning vector database! 🥳
You have:
- Spun up an instance of Weaviate through WCS,
- Vectorized your dataset through an inference API,
- Populated your WCS instance with the vectorized data, and
- Performed text similarity searches.
Of course, there is a lot more to Weaviate that we have not yet covered, and probably a lot that you wish to know about. So we include a few links below that might help you to get started in your journey with us.
Also, please feel free to reach out to us on our community Slack. We love to hear from our users.
Next
You can choose your direction from here. For example, you can:
- Learn more about how to do things in Tutorials, like build schemas, import data, query data and more.
- Read about important concepts/theory about Weaviate
- Read our references for:
More Resources
If you can't find the answer to your question here, please look at the:
- Frequently Asked Questions,
- Knowledge base of old issues,
- Stackoverflow, for questions,
- GitHub, for issues, or
- Slack channel, to ask your question.