Weaviate, end-to-end

Overview

Here, you will get a hands-on overview of what you can do with Weaviate. If some of the steps raise questions, don't worry: later tutorials cover each step in more detail.

By the end of this page, you will have:

  • Vectorized the quiz data
  • Added the vectorized data to Weaviate, and
  • Performed vector searches to retrieve relevant objects

Code examples

We have prepared code examples to help you follow along here. Go to weaviate-tutorials/quickstart on GitHub to take a look.

Prerequisites

At this point, you should have:

  • A new instance of Weaviate running (e.g. on the Weaviate Cloud Services),
  • An API key for your preferred inference API, such as OpenAI, Cohere, or Hugging Face, and
  • Installed your preferred Weaviate client library.

We will be working with a small quiz dataset (jeopardy_tiny.json), which will be loaded directly from its remote URL.

Connect to Weaviate

You can connect to your instance of Weaviate using the Weaviate client as shown below. If this creates a client instance without raising an error, you should be ready to go.

import weaviate

client = weaviate.Client(
    url="https://some-endpoint.weaviate.network/",  # Replace with your endpoint
)

Import data

Weaviate can take care of data vectorization at import time with its vectorizer modules, so all you need to do is choose an appropriate vectorizer and pass the data to Weaviate.

Using an inference API is one good way to do this. To do so:

  • Specify a vectorizer module (e.g. text2vec-openai)
  • Provide the API key
  • Load & import data into Weaviate

Specify a vectorizer

First, we must define the class that will store the data objects and specify which vectorizer to use. The following will create a Question class with the given vectorizer, and add it to the schema:

Which inference API?

This tutorial uses the OpenAI API to obtain vectors, but you can use any of the Cohere, Hugging Face, or OpenAI inference APIs with WCS, as the relevant Weaviate modules are built in there by default.

Change the vectorizer setting below to point to your preferred module.

class_obj = {
    "class": "Question",
    "vectorizer": "text2vec-openai"  # Or "text2vec-cohere" or "text2vec-huggingface"
}

client.schema.create_class(class_obj)

Weaviate will infer any further schema information from the given data. If you would like to know more, check out this tutorial which covers schemas in more detail.

If you see this error: Name 'Question' already used as a name for an Object class

You may see this error if you try to create a class that already exists in your instance of Weaviate. In this case, you can delete the class by following the instructions under "Deleting classes" below.
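One way to avoid this error is to check the schema before creating the class. The following is a minimal sketch, assuming the schema is a dictionary with a "classes" list in the shape of the /v1/schema output shown on this page. The helper name class_exists is hypothetical and not part of the Weaviate client.

```python
def class_exists(schema: dict, class_name: str) -> bool:
    """Return True if the schema dictionary already lists the class."""
    # "classes" may be missing or None when the schema is empty.
    classes = schema.get("classes") or []
    return any(c.get("class") == class_name for c in classes)


# Example schema in the shape returned by the /v1/schema endpoint.
example_schema = {
    "classes": [
        {"class": "Question", "vectorizer": "text2vec-openai"},
    ]
}

print(class_exists(example_schema, "Question"))
print(class_exists(example_schema, "Answer"))
```

With the Python client, the dictionary could come from client.schema.get(), so that create_class is only called when class_exists returns False.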

Confirm schema creation

After you have added the class to the schema, you can confirm that it was created by visiting the schema endpoint (replace the URL with your actual endpoint):

https://some-endpoint.weaviate.network/v1/schema

You should see:

{
    "classes": [
        {
            "class": "Question",
            ... // truncated additional information here
            "vectorizer": "text2vec-openai"
        }
    ]
}

The schema should show that the Question class has been added.

REST & GraphQL in Weaviate

Weaviate uses a combination of RESTful and GraphQL APIs. The RESTful API endpoints are used to add data or obtain information about the Weaviate instance, and the GraphQL interface is used to retrieve data.
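To make the split concrete, here is a small sketch that only builds the request pieces and sends nothing. It assumes the REST resources live under v1/ as in the URLs on this page, and that GraphQL queries are posted as a JSON body to v1/graphql. The helper names rest_url and graphql_body are hypothetical.

```python
import json
from urllib.parse import urljoin


def rest_url(endpoint: str, path: str) -> str:
    """Build the URL of a REST resource such as 'v1/schema' or 'v1/objects'."""
    return urljoin(endpoint.rstrip("/") + "/", path.lstrip("/"))


def graphql_body(class_name: str, properties: list) -> str:
    """Build the JSON body for a GraphQL Get query on one class."""
    query = "{ Get { %s { %s } } }" % (class_name, " ".join(properties))
    return json.dumps({"query": query})


endpoint = "https://some-endpoint.weaviate.network/"  # Replace with your endpoint

# REST: a GET request to this URL returns information about the instance.
print(rest_url(endpoint, "v1/schema"))

# GraphQL: this body would be sent in a POST request to the v1/graphql URL.
print(rest_url(endpoint, "v1/graphql"))
print(graphql_body("Question", ["question", "answer"]))
```

In practice the client libraries wrap both styles, so you rarely need to build these requests by hand.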


Deleting classes

See how you can delete classes.

If your Weaviate instance contains data you want removed, you can manually delete the unwanted class(es).

Deleting a class == Deleting its objects

Deleting a class also deletes all of the objects associated with it!

Do not do this on a production database, or anywhere else where you do not wish to lose your data.

Run the code below to delete the relevant class and its objects.

import weaviate

client = weaviate.Client("https://some-endpoint.weaviate.network/") # Replace with your endpoint

# delete class "YourClassName" - THIS WILL DELETE ALL DATA IN THIS CLASS
client.schema.delete_class("YourClassName") # Replace with your class name - e.g. "Question"

Provide the API key

The API key can be provided to Weaviate as an environment variable, or in the HTTP header with every request. Here, we will add it to the Weaviate client at instantiation as shown below. The client will then send the key as part of the header with every request.

import weaviate

client = weaviate.Client(
    url="https://some-endpoint.weaviate.network/",  # Replace with your endpoint
    additional_headers={
        "X-OpenAI-Api-Key": "<THE-KEY>"  # Replace with your API key
    }
)

Not using OpenAI?

If you are not using OpenAI, change the API key parameter in the code examples from X-OpenAI-Api-Key to one relevant to your chosen inference API, such as X-Cohere-Api-Key for Cohere or X-HuggingFace-Api-Key for Hugging Face.
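If you switch between inference APIs, it may help to keep the header names in one place. The sketch below uses only the three header names mentioned in this tutorial. The helper build_headers and the environment variable name INFERENCE_API_KEY are hypothetical choices for illustration.

```python
import os

# Header names as given in this tutorial, keyed by vectorizer module.
API_KEY_HEADERS = {
    "text2vec-openai": "X-OpenAI-Api-Key",
    "text2vec-cohere": "X-Cohere-Api-Key",
    "text2vec-huggingface": "X-HuggingFace-Api-Key",
}


def build_headers(vectorizer: str, api_key: str) -> dict:
    """Return the additional_headers dictionary for the chosen vectorizer."""
    if vectorizer not in API_KEY_HEADERS:
        raise ValueError(f"No API key header known for {vectorizer!r}")
    return {API_KEY_HEADERS[vectorizer]: api_key}


# INFERENCE_API_KEY is a hypothetical environment variable name.
api_key = os.environ.get("INFERENCE_API_KEY", "<THE-KEY>")
headers = build_headers("text2vec-cohere", api_key)
print(list(headers))
```

The resulting dictionary can be passed as additional_headers when creating the client. Reading the key from the environment also keeps it out of your source code.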

Load & import data

Now, we can load our dataset and import it into Weaviate. The code looks roughly like this:

# Configure a batch process
with client.batch as batch:
    batch.batch_size = 100
    # Add each record to the batch as a Question object
    for d in data:
        properties = {
            "answer": d["Answer"],
            "question": d["Question"],
            "category": d["Category"],
        }

        batch.add_data_object(properties, "Question")

Note that we use a batch import process here for speed. You should use batch imports unless you have a good reason not to. We'll cover more on this later.
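To illustrate what a batch size of 100 means, the sketch below groups records into lists of at most 100, so that each group could be sent in one request instead of one request per object. It is a standalone illustration, not how the client implements batching. The helpers to_properties and chunked are hypothetical, and the records are made up.

```python
from itertools import islice


def to_properties(record: dict) -> dict:
    """Map one raw dataset record to the property names used in this tutorial."""
    return {
        "answer": record["Answer"],
        "question": record["Question"],
        "category": record["Category"],
    }


def chunked(items, size: int):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# Made-up records in the same shape as the dataset.
records = [
    {"Answer": f"answer {i}", "Question": f"question {i}", "Category": "SCIENCE"}
    for i in range(250)
]

batches = list(chunked((to_properties(r) for r in records), 100))
print([len(b) for b in batches])  # → [100, 100, 50]
```

Here 250 objects would need three requests rather than 250, which is where the speed-up comes from.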

Putting it together

The following code puts it all together, taking care of everything from schema definition to data import. Remember to replace the endpoint and inference API key (and API key name if necessary).

import json

import requests
import weaviate

client = weaviate.Client(
    url="https://some-endpoint.weaviate.network/",  # Replace with your endpoint
    additional_headers={
        "X-OpenAI-Api-Key": "<THE-KEY>"  # Replace with your API key
    }
)

# ===== add schema =====
class_obj = {
    "class": "Question",
    "vectorizer": "text2vec-openai"
}

client.schema.create_class(class_obj)

# ===== import data =====
# Load data
url = 'https://raw.githubusercontent.com/weaviate-tutorials/quickstart/main/data/jeopardy_tiny.json'
resp = requests.get(url)
data = json.loads(resp.text)

# Configure a batch process
with client.batch as batch:
    batch.batch_size = 100
    # Batch import all Questions
    for i, d in enumerate(data):
        print(f"importing question: {i+1}")

        properties = {
            "answer": d["Answer"],
            "question": d["Question"],
            "category": d["Category"],
        }

        batch.add_data_object(properties, "Question")

And that should have populated Weaviate with the data, including corresponding vectors!

Can I specify my own vectors?

Yes! You can bring your own vectors and pass them to Weaviate directly. See this reference for more information.

Note again that we did not provide any vectors to Weaviate. That's all managed by Weaviate, which calls the inference API for you and obtains a vector corresponding to your object at import time.

Confirm data import

To confirm successful data import, navigate to the objects endpoint to check that all objects have been imported (replace with your actual endpoint):

https://some-endpoint.weaviate.network/v1/objects

You should see:

{
    "deprecations": null,
    "objects": [
        ... // Details of each object
    ],
    "totalResults": 10 // You should see 10 results here
}

This confirms that all 10 objects have been imported.
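The same check can be done in code. This is a minimal sketch, assuming the response body has the shape shown above. The helper imported_count is hypothetical, and fetching the body from your endpoint is left out, so a shortened sample string stands in for it.

```python
import json


def imported_count(response_text: str) -> int:
    """Return the totalResults value from a /v1/objects JSON response."""
    payload = json.loads(response_text)
    return payload["totalResults"]


# A shortened stand-in for the response body; real responses list every object.
sample_response = '{"deprecations": null, "objects": [], "totalResults": 10}'

count = imported_count(sample_response)
print("import complete" if count == 10 else f"expected 10 objects, found {count}")
```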

Query Weaviate

Now that you've built a database, let's try some queries.

One of the most common use cases is text similarity search. As we have a text2vec module enabled, we can use the nearText parameter for this purpose.

To find entries related to biology, you can apply the nearText parameter like so:

import weaviate
import json

client = weaviate.Client(
    url="https://some-endpoint.weaviate.network/",  # Replace with your endpoint
    additional_headers={
        "X-OpenAI-Api-Key": "<THE-KEY>"  # Replace with your API key
    }
)

nearText = {"concepts": ["biology"]}

result = (
    client.query
    .get("Question", ["question", "answer", "category"])
    .with_near_text(nearText)
    .with_limit(2)
    .do()
)

print(json.dumps(result, indent=4))

Note that we use the Get function (or the relevant client implementation) to fetch objects, and that the query text is specified in the concepts field.

You should see something like this:

{
    "data": {
        "Get": {
            "Question": [
                {
                    "answer": "DNA",
                    "category": "SCIENCE",
                    "question": "In 1953 Watson & Crick built a model of the molecular structure of this, the gene-carrying substance"
                },
                {
                    "answer": "species",
                    "category": "SCIENCE",
                    "question": "2000 news: the Gunnison sage grouse isn't just another northern sage grouse, but a new one of this classification"
                }
            ]
        }
    }
}

Note that even though the word 'biology' does not appear anywhere in these entries, Weaviate has returned biology-related entries (on DNA and species) as the closest results. It has also ranked them above many entries that relate to animals more generally.
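If you want to work with the returned objects rather than print the whole response, you can pull them out of the nested dictionary. The sketch below assumes the response has the data, Get, class name nesting shown above. The helper extract_hits is hypothetical, and the example dictionary is a shortened copy of that response.

```python
def extract_hits(result: dict, class_name: str) -> list:
    """Return the list of objects for one class from a Get query response."""
    # Each level may be missing or None, for example when the query failed.
    data = result.get("data") or {}
    get_section = data.get("Get") or {}
    return get_section.get(class_name) or []


# A shortened copy of the response shown above.
example_result = {
    "data": {
        "Get": {
            "Question": [
                {"answer": "DNA", "category": "SCIENCE"},
                {"answer": "species", "category": "SCIENCE"},
            ]
        }
    }
}

for hit in extract_hits(example_result, "Question"):
    print(f'{hit["category"]}: {hit["answer"]}')
```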

That's a simple but powerful outcome, which shows a big reason behind the popularity of vector searches. Vectorized data objects allow for searches based on degrees of similarity, such as semantic similarity of text as we did here.
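As a toy illustration of "degrees of similarity", the sketch below ranks a few objects by cosine similarity to a query vector. The three-dimensional vectors and their labels are made up for the example and are not real embeddings. Cosine similarity is used here as one common measure, without claiming it is the metric configured in your instance.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up 3-dimensional vectors; real embeddings have many more dimensions.
query = [0.9, 0.1, 0.0]
objects = {
    "DNA": [0.8, 0.2, 0.1],
    "species": [0.7, 0.3, 0.2],
    "opera": [0.1, 0.2, 0.9],
}

# Rank object names from most to least similar to the query vector.
ranked = sorted(
    objects,
    key=lambda name: cosine_similarity(query, objects[name]),
    reverse=True,
)
print(ranked[:2])  # → ['DNA', 'species']
```

The two vectors that point in nearly the same direction as the query come out on top, which mirrors how a limit of 2 returned the two closest entries in the query above.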

Try it out yourself by replacing "biology" with a different query string.

Recap

If you made it here - well done. We have covered a lot in just a couple of pages, and you've successfully built a fully functioning vector database! 🥳

You have:

  • Spun up an instance of Weaviate through WCS,
  • Vectorized your dataset through an inference API,
  • Populated your WCS instance with the vectorized data, and
  • Performed text similarity searches.

Of course, there is a lot more to Weaviate that we have not yet covered, and probably a lot that you wish to know about. So we include a few links below that might help you to get started in your journey with us.

Also, please feel free to reach out to us on our community Slack. We love to hear from our users.

Next

You can choose your direction from here. For example, you can:

More Resources

If you can't find the answer to your question here, please try one of the following:

  1. The Frequently Asked Questions,
  2. The knowledge base of old issues,
  3. Stack Overflow, for questions,
  4. GitHub, for issues, or
  5. The Weaviate Slack channel.