
Quickstart Tutorial


Overview

Welcome to the Quickstart tutorial. Here, you will:

  • Create a vector database with Weaviate Cloud Services (WCS),
  • Import data, and
  • Perform a vector search.
Object vectors

When you import data into Weaviate, you can optionally:

  • Have Weaviate create vectors, or
  • Specify custom vectors.

This tutorial demonstrates both methods. For the first method, we will use an inference API, and show you how to change it.

Source data

We will use a (tiny) dataset from a TV quiz show ("Jeopardy!").

Take a look at the dataset
|   | Category | Question | Answer |
|---|----------|----------|--------|
| 0 | SCIENCE | This organ removes excess glucose from the blood & stores it as glycogen | Liver |
| 1 | ANIMALS | It's the only living mammal in the order Proboseidea | Elephant |
| 2 | ANIMALS | The gavial looks very much like a crocodile except for this bodily feature | the nose or snout |
| 3 | ANIMALS | Weighing around a ton, the eland is the largest species of this animal in Africa | Antelope |
| 4 | ANIMALS | Heaviest of all poisonous snakes is this North American rattlesnake | the diamondback rattler |
| 5 | SCIENCE | 2000 news: the Gunnison sage grouse isn't just another northern sage grouse, but a new one of this classification | species |
| 6 | SCIENCE | A metal that is "ductile" can be pulled into this while cold & under pressure | wire |
| 7 | SCIENCE | In 1953 Watson & Crick built a model of the molecular structure of this, the gene-carrying substance | DNA |
| 8 | SCIENCE | Changes in the tropospheric layer of this are what gives us weather | the atmosphere |
| 9 | SCIENCE | In 70-degree air, a plane traveling at about 1,130 feet per second breaks it | Sound barrier |

Create a Weaviate instance

First, create a Weaviate database instance. We'll use a free instance from Weaviate Cloud Services (WCS).

  1. Go to the WCS Console, and
    1. Click on Sign in with the Weaviate Cloud Services.
    2. If you don't have a WCS account, click on Register.
  2. Sign in with your WCS username and password.
  3. Click on Create cluster.
[Screenshot: button to create a WCS instance]

Then:

  1. Select the Free sandbox plan tier.
  2. Provide a Cluster name. Your cluster URL will be this name plus a suffix.
  3. Set the Enable Authentication? option to YES.
Your selections should look like this:

[Screenshot: instance configuration]

Finally, click on Create. A tick ✔️ will appear (in about 2 minutes) when the instance has been created.

Make note of cluster details

You will need the cluster URL and authentication details. Click on the Details button to see them. The authentication details (Weaviate API key) can be found by clicking on the key button.

Your cluster details should look like this:

[Screenshot: location of the instance API key]

Install a client library

We recommend you use a Weaviate client library. Currently they are available for Python, TypeScript/JavaScript, Go and Java. Install your preferred client as follows:

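Add weaviate-client to your Python environment with pip. The code in this tutorial uses the v3 Python client API (weaviate.Client); version 4 of the package introduced a different API, so pin a v3 release:

$ pip install "weaviate-client>=3,<4"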

Connect to Weaviate

Now connect to your Weaviate instance. From the Details tab in WCS, get:

  • The Weaviate instance API key, and
  • The Weaviate instance URL.

If you want Weaviate to generate vectors with an inference service API, you must also provide:

  • An inference API key, passed in a request header.
Choose your own vectorizer module

In this example, we use the Hugging Face inference API. But you can use others:


What if I want to use a different vectorizer module?

You can choose any vectorizer (text2vec-xxx) module for this tutorial, as long as:

  • The module is available in the Weaviate instance you are using, and
  • You have an API key (if necessary) for that module.

We use the text2vec-huggingface module in the Quickstart, but all of the following modules are available in the free sandbox.

  • text2vec-cohere
  • text2vec-huggingface
  • text2vec-openai
  • text2vec-palm
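To check which of these modules your own instance offers, one option is the instance's meta endpoint (/v1/meta), whose JSON response includes a `modules` object keyed by module name. The sketch below assumes you have already fetched and parsed that response into a dict; the helper `available_vectorizers` and the sample response are hypothetical and trimmed for illustration.

```python
def available_vectorizers(meta: dict) -> list:
    """Return the sorted text2vec-* module names found in a parsed /v1/meta response."""
    modules = meta.get("modules") or {}
    return sorted(name for name in modules if name.startswith("text2vec-"))


# Hypothetical, heavily trimmed /v1/meta response (module details omitted).
sample_meta = {
    "modules": {
        "text2vec-huggingface": {},
        "text2vec-cohere": {},
        "generative-openai": {},
    }
}

print(available_vectorizers(sample_meta))  # ['text2vec-cohere', 'text2vec-huggingface']
```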

Depending on your choice, pass the API key for the inference service by adding the appropriate header line from below, replacing the placeholder with your actual key:

"X-Cohere-Api-Key": "YOUR-COHERE-API-KEY",  # For Cohere
"X-HuggingFace-Api-Key": "YOUR-HUGGINGFACE-API-KEY",  # For Hugging Face
"X-OpenAI-Api-Key": "YOUR-OPENAI-API-KEY",  # For OpenAI
"X-Palm-Api-Key": "YOUR-PALM-API-KEY",  # For PaLM
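If you switch vectorizers often, a small lookup table keeps each module name and its header in one place. The helper below, `inference_headers`, is hypothetical (it is not part of the Weaviate client); it only uses the header names listed above, and its result is the kind of dict you would pass as `additional_headers` when instantiating the client.

```python
# Header names for each vectorizer module, as listed above.
INFERENCE_HEADERS = {
    "text2vec-cohere": "X-Cohere-Api-Key",
    "text2vec-huggingface": "X-HuggingFace-Api-Key",
    "text2vec-openai": "X-OpenAI-Api-Key",
    "text2vec-palm": "X-Palm-Api-Key",
}


def inference_headers(vectorizer: str, api_key: str) -> dict:
    """Build an additional_headers dict for the chosen vectorizer module."""
    try:
        header = INFERENCE_HEADERS[vectorizer]
    except KeyError:
        raise ValueError(f"No known inference header for {vectorizer!r}") from None
    return {header: api_key}


headers = inference_headers("text2vec-huggingface", "YOUR-HUGGINGFACE-API-KEY")
print(headers)  # {'X-HuggingFace-Api-Key': 'YOUR-HUGGINGFACE-API-KEY'}
```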

Then instantiate the client as follows:

import weaviate
import json

client = weaviate.Client(
    url="https://some-endpoint.weaviate.network",  # Replace with your endpoint
    auth_client_secret=weaviate.AuthApiKey(api_key="YOUR-WEAVIATE-API-KEY"),  # Replace with your Weaviate instance API key
    additional_headers={
        "X-HuggingFace-Api-Key": "YOUR-HUGGINGFACE-API-KEY"  # Replace with your inference API key
    }
)

Now you are connected to your Weaviate instance.

Define a class

Next, we need to define a data collection (a "class" in Weaviate) to store objects in.

Create a Question class with a vectorizer configured as shown below, so that Weaviate can convert data objects to vectors using the inference service. The class definition also includes a suggested basic configuration for the module.

Is a vectorizer setting mandatory?
  • No. You always have the option of providing vector embeddings yourself.
  • Setting a vectorizer gives Weaviate the option of creating vector embeddings for you.
    • If you do not wish to, you can set this to "none".
class_obj = {
    "class": "Question",
    "vectorizer": "text2vec-huggingface",  # If set to "none" you must always provide vectors yourself. Could be any other "text2vec-*" also.
    "moduleConfig": {
        "text2vec-huggingface": {
            "model": "sentence-transformers/all-MiniLM-L6-v2",  # Can be any public or private Hugging Face model.
            "options": {
                "waitForModel": True
            }
        }
    }
}

client.schema.create_class(class_obj)
If you are using a different vectorizer

If you are using a different vectorizer, set its name as the vectorizer instead. For example, a minimal class definition for text2vec-cohere:

class_obj = {
    "class": "Question",
    "vectorizer": "text2vec-cohere",
}
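Because only the `vectorizer` value and the optional `moduleConfig` entry change between modules, the class definition can also be generated. The sketch below is a hypothetical helper (`make_question_class` is not part of the Weaviate client); it assumes the four sandbox modules listed earlier plus "none", and omits `moduleConfig` when no configuration is given.

```python
from typing import Optional

# The modules listed for the free sandbox, plus "none" for bring-your-own vectors.
SUPPORTED_VECTORIZERS = {
    "text2vec-cohere",
    "text2vec-huggingface",
    "text2vec-openai",
    "text2vec-palm",
    "none",
}


def make_question_class(vectorizer: str, module_config: Optional[dict] = None) -> dict:
    """Build a Question class definition for the given vectorizer module."""
    if vectorizer not in SUPPORTED_VECTORIZERS:
        raise ValueError(f"Unsupported vectorizer: {vectorizer!r}")
    class_obj = {"class": "Question", "vectorizer": vectorizer}
    # Module settings are keyed by the module name under "moduleConfig".
    if module_config and vectorizer != "none":
        class_obj["moduleConfig"] = {vectorizer: module_config}
    return class_obj


hf_class = make_question_class(
    "text2vec-huggingface",
    {"model": "sentence-transformers/all-MiniLM-L6-v2", "options": {"waitForModel": True}},
)
cohere_class = make_question_class("text2vec-cohere")
print(cohere_class)  # {'class': 'Question', 'vectorizer': 'text2vec-cohere'}
```

Either result has the same shape as the hand-written class_obj above and could be passed to client.schema.create_class.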

Add objects

Now, we'll add objects using a batch import process. We will:

  • Load objects,
  • Initialize a batch process, and
  • Add objects one by one, specifying the class (in this case, Question) to add to.
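The dataset records use capitalized keys (Category, Question, Answer), while the Question class uses lowercase property names, so each record is mapped before it is added. The sketch below pulls that mapping into a hypothetical helper, `split_record`, which also returns the record's `vector` entry when present (as in the file used for Option 2) and `None` otherwise.

```python
from typing import List, Optional, Tuple


def split_record(record: dict) -> Tuple[dict, Optional[List[float]]]:
    """Split one dataset record into Question properties and an optional vector."""
    properties = {
        "answer": record["Answer"],
        "question": record["Question"],
        "category": record["Category"],
    }
    return properties, record.get("vector")


record = {
    "Category": "SCIENCE",
    "Question": "This organ removes excess glucose from the blood & stores it as glycogen",
    "Answer": "Liver",
}
properties, vector = split_record(record)
print(properties["answer"], vector)  # Liver None
```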

We'll show both options, first using the vectorizer to create object vectors, and then providing custom vectors.

Option 1: Use the vectorizer

The code below builds objects without any vector data. This causes Weaviate to use the vectorizer in the class definition to create a vector embedding for each object.

# Load data
import json
import requests
url = 'https://raw.githubusercontent.com/weaviate-tutorials/quickstart/main/data/jeopardy_tiny.json'
resp = requests.get(url)
resp.raise_for_status()  # Stop early if the download failed
data = json.loads(resp.text)

# Configure a batch process
with client.batch(
    batch_size=100
) as batch:
    # Batch import all Questions
    for i, d in enumerate(data):
        print(f"importing question: {i+1}")

        properties = {
            "answer": d["Answer"],
            "question": d["Question"],
            "category": d["Category"],
        }

        batch.add_data_object(
            properties,
            "Question",
        )

Option 2: Specify custom vectors

Alternatively, you can provide your own vectors to Weaviate. Whether or not a vectorizer is set, if a vector is specified, Weaviate will use it to represent the object.

The example below specifies a pre-computed vector for each object.

# Load data
import json
import requests
fname = "jeopardy_tiny_with_vectors_all-MiniLM-L6-v2.json"  # This file includes vectors, created using `all-MiniLM-L6-v2`
url = f'https://raw.githubusercontent.com/weaviate-tutorials/quickstart/main/data/{fname}'
resp = requests.get(url)
resp.raise_for_status()  # Stop early if the download failed
data = json.loads(resp.text)

# Configure a batch process
with client.batch(
    batch_size=100
) as batch:
    # Batch import all Questions
    for i, d in enumerate(data):
        print(f"importing question: {i+1}")

        properties = {
            "answer": d["Answer"],
            "question": d["Question"],
            "category": d["Category"],
        }

        custom_vector = d["vector"]
        batch.add_data_object(
            properties,
            "Question",
            vector=custom_vector  # Add custom vector
        )
Custom vectors with a vectorizer

Note that you can specify a vectorizer and still provide a custom vector. In this case, make sure the vector comes from the same model as the one specified in the vectorizer configuration.


In this tutorial, they come from sentence-transformers/all-MiniLM-L6-v2 - the same as specified in the vectorizer configuration.
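A cheap sanity check before importing custom vectors is to confirm that every vector has the dimension the model produces; sentence-transformers/all-MiniLM-L6-v2 outputs 384-dimensional embeddings. The helper `check_vectors` below is hypothetical and the records are placeholders, not real embeddings. A matching length does not prove the vectors came from the same model; it only catches obvious mismatches.

```python
import math

EXPECTED_DIM = 384  # Output dimension of sentence-transformers/all-MiniLM-L6-v2


def check_vectors(records, expected_dim=EXPECTED_DIM):
    """Return indices of records whose 'vector' is missing, mis-sized, or non-finite."""
    bad = []
    for i, record in enumerate(records):
        vector = record.get("vector")
        if (
            not isinstance(vector, list)
            or len(vector) != expected_dim
            or not all(isinstance(x, (int, float)) and math.isfinite(x) for x in vector)
        ):
            bad.append(i)
    return bad


# Placeholder records: only the first has a vector of the right size.
records = [
    {"Answer": "Liver", "vector": [0.01] * 384},
    {"Answer": "Elephant", "vector": [0.01] * 768},  # Wrong dimension
    {"Answer": "DNA"},  # No vector at all
]
print(check_vectors(records))  # [1, 2]
```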

(Almost) Always use batch imports

Batch imports significantly improve import performance, so you should almost always use them unless you have a good reason not to, such as creating a single object.
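Conceptually, batching groups objects so that many are sent per request instead of one request per object. The sketch below illustrates only that grouping idea in plain Python; `chunks` is a hypothetical helper, not the Weaviate client's internal implementation. With the 10-object dataset and batch_size=100, everything fits in a single batch.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunks(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # Flush the final, partially filled batch
        yield batch


objects = list(range(10))  # Stand-ins for the 10 quiz objects
print([len(b) for b in chunks(objects, 100)])  # [10]
print([len(b) for b in chunks(objects, 4)])  # [4, 4, 2]
```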

Putting it together

The following code puts it all together. Run it yourself to import the data into your Weaviate instance.

Remember to replace the URL, Weaviate API key and inference API key.
import weaviate
import json

client = weaviate.Client(
    url="https://some-endpoint.weaviate.network",  # Replace with your endpoint
    auth_client_secret=weaviate.AuthApiKey(api_key="YOUR-WEAVIATE-API-KEY"),  # Replace with your Weaviate instance API key
    additional_headers={
        "X-HuggingFace-Api-Key": "YOUR-HUGGINGFACE-API-KEY"  # Replace with your inference API key
    }
)

# ===== add schema =====
class_obj = {
    "class": "Question",
    "vectorizer": "text2vec-huggingface",  # If set to "none" you must always provide vectors yourself. Could be any other "text2vec-*" also.
    "moduleConfig": {
        "text2vec-huggingface": {
            "model": "sentence-transformers/all-MiniLM-L6-v2",  # Can be any public or private Hugging Face model.
            "options": {
                "waitForModel": True
            }
        }
    }
}

client.schema.create_class(class_obj)

# ===== import data =====
# Load data
import requests
url = 'https://raw.githubusercontent.com/weaviate-tutorials/quickstart/main/data/jeopardy_tiny.json'
resp = requests.get(url)
resp.raise_for_status()  # Stop early if the download failed
data = json.loads(resp.text)

# Configure a batch process
with client.batch(
    batch_size=100
) as batch:
    # Batch import all Questions
    for i, d in enumerate(data):
        print(f"importing question: {i+1}")

        properties = {
            "answer": d["Answer"],
            "question": d["Question"],
            "category": d["Category"],
        }

        batch.add_data_object(
            properties,
            "Question",
        )

Congratulations, you've successfully built a vector database!

Query Weaviate

Now, we can run queries.

As we have a text2vec module enabled, Weaviate can perform text-based (nearText) similarity searches.

Try the nearText search shown below, looking for quiz objects related to biology.

import weaviate
import json

client = weaviate.Client(
    url="https://some-endpoint.weaviate.network",  # Replace with your endpoint
    auth_client_secret=weaviate.AuthApiKey(api_key="YOUR-WEAVIATE-API-KEY"),  # Replace with your Weaviate instance API key
    additional_headers={
        "X-HuggingFace-Api-Key": "YOUR-HUGGINGFACE-API-KEY"  # Replace with your inference API key
    }
)

nearText = {"concepts": ["biology"]}

response = (
    client.query
    .get("Question", ["question", "answer", "category"])
    .with_near_text(nearText)
    .with_limit(2)
    .do()
)

print(json.dumps(response, indent=4))

You should see a result like this (results may vary depending on the model used):

{
    "data": {
        "Get": {
            "Question": [
                {
                    "answer": "DNA",
                    "category": "SCIENCE",
                    "question": "In 1953 Watson & Crick built a model of the molecular structure of this, the gene-carrying substance"
                },
                {
                    "answer": "Liver",
                    "category": "SCIENCE",
                    "question": "This organ removes excess glucose from the blood & stores it as glycogen"
                }
            ]
        }
    }
}
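The client returns this response as a Python dict, with results nested under data, then Get, then the class name. Following the GraphQL convention, a failed query reports problems under an `errors` key instead. The sketch below, a hypothetical helper `extract_results`, checks for that key and then pulls out the objects; the `response` dict reproduces the example response above.

```python
def extract_results(response: dict, class_name: str = "Question") -> list:
    """Return the objects for class_name from a Get query response, or raise on errors."""
    if response.get("errors"):
        raise RuntimeError(f"Query failed: {response['errors']}")
    return response["data"]["Get"][class_name]


# The example response above, as the dict the client would return.
response = {
    "data": {
        "Get": {
            "Question": [
                {"answer": "DNA", "category": "SCIENCE", "question": "In 1953 Watson & Crick built a model of the molecular structure of this, the gene-carrying substance"},
                {"answer": "Liver", "category": "SCIENCE", "question": "This organ removes excess glucose from the blood & stores it as glycogen"},
            ]
        }
    }
}

for obj in extract_results(response):
    print(f"{obj['category']}: {obj['answer']}")
```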

Note that even though the word "biology" does not appear anywhere in the dataset, Weaviate returns biology-related entries.

This example shows why vector searches are powerful: because data objects are represented as vectors, Weaviate can search by degree of similarity rather than by exact keyword matches.
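Similarity here is measured as a distance between vectors, and Weaviate's default distance metric is cosine distance (1 minus cosine similarity). The brute-force sketch below ranks a few toy 3-dimensional vectors (made-up numbers standing in for real embeddings, which have hundreds of dimensions) against a stand-in query vector and keeps the closest two, mirroring with_limit(2). Weaviate itself uses an approximate vector index (HNSW) rather than comparing the query with every object.

```python
import math


def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / norm


def nearest(query, candidates, limit=2):
    """Return the names of the `limit` candidates closest to the query."""
    ranked = sorted(candidates.items(), key=lambda kv: cosine_distance(query, kv[1]))
    return [name for name, _ in ranked[:limit]]


# Toy vectors (made up for illustration).
candidates = {
    "DNA": [0.9, 0.1, 0.0],
    "Liver": [0.8, 0.3, 0.1],
    "Sound barrier": [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]  # Stand-in for an embedding of "biology"

print(nearest(query, candidates))  # ['DNA', 'Liver']
```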

Recap

Well done. You have:

  • Created your own cloud-based vector database with Weaviate,
  • Populated it with data objects,
    • Using an inference API, or
    • Using custom vectors, and
  • Performed a text similarity search.

Where you go next is up to you. We include a few links below, or you can check out the sidebar.

Note: Sandbox expiry & options
Sandbox expiry

The sandbox is free, but it will expire after 14 days. After this time, all data in the sandbox will be deleted.
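As a quick worked example of the 14-day lifetime, the sketch below computes the date on which a sandbox created on a given day would expire. The creation date is hypothetical, and the WCS console remains the authoritative source for the exact expiry time.

```python
from datetime import date, timedelta

SANDBOX_LIFETIME = timedelta(days=14)  # Sandbox lifetime stated above


def sandbox_expiry(created: date) -> date:
    """Return the date a sandbox created on `created` expires."""
    return created + SANDBOX_LIFETIME


print(sandbox_expiry(date(2023, 6, 1)))  # 2023-06-15
```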

If you would like to preserve your sandbox data, you can retrieve your data, or contact us to upgrade to a production SaaS instance.

Troubleshooting

Below are answers to some common questions and potential issues.

Confirm class creation

If you are not sure whether the class has been created, you can confirm it by visiting the schema endpoint (replace the URL with your actual endpoint):

https://some-endpoint.weaviate.network/v1/schema
Expected response

You should see:

{
"classes": [
{
"class": "Question",
... // truncated additional information here
"vectorizer": "text2vec-huggingface"
}
]
}

The schema should indicate that the Question class has been added.
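The same check can be done in code. Because this tutorial enables authentication, requests to the REST endpoints need your Weaviate API key, sent as an `Authorization: Bearer <key>` header. The sketch below builds (but does not send) such a request with the Python standard library, and checks a parsed schema for a class name; both helpers are hypothetical, and the sample schema is trimmed to the fields used. With the v3 Python client, client.schema.get() returns the same schema structure.

```python
import urllib.request


def schema_request(endpoint: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated GET request for /v1/schema."""
    return urllib.request.Request(
        f"{endpoint.rstrip('/')}/v1/schema",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def class_exists(schema: dict, class_name: str) -> bool:
    """Return True if the parsed schema contains a class with this name."""
    return any(c.get("class") == class_name for c in schema.get("classes") or [])


req = schema_request("https://some-endpoint.weaviate.network", "YOUR-WEAVIATE-API-KEY")
print(req.full_url)  # https://some-endpoint.weaviate.network/v1/schema

# Trimmed example of a parsed /v1/schema response.
schema = {"classes": [{"class": "Question", "vectorizer": "text2vec-huggingface"}]}
print(class_exists(schema, "Question"))  # True
```

To actually send the request, pass it to urllib.request.urlopen and parse the body with json.load.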

REST & GraphQL in Weaviate

Weaviate uses a combination of RESTful and GraphQL APIs. RESTful API endpoints are used to add data or obtain information about the Weaviate instance, and the GraphQL interface is used to retrieve data.
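For example, the nearText query from the Query Weaviate section is sent to the GraphQL interface. The sketch below builds a roughly equivalent GraphQL query string by hand to show its shape; `near_text_query` is a hypothetical helper, and the string the client actually generates may differ in formatting.

```python
import json


def near_text_query(class_name: str, properties: list, concepts: list, limit: int) -> str:
    """Build a GraphQL Get query with a nearText filter and a result limit."""
    concepts_gql = json.dumps(concepts)  # A JSON list of strings is also a valid GraphQL list
    fields = " ".join(properties)
    return (
        "{ Get { "
        f"{class_name}(limit: {limit}, nearText: {{concepts: {concepts_gql}}}) "
        f"{{ {fields} }} "
        "} }"
    )


query = near_text_query("Question", ["question", "answer", "category"], ["biology"], 2)
print(query)
# { Get { Question(limit: 2, nearText: {concepts: ["biology"]}) { question answer category } } }
```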

If you see Error: Name 'Question' already used as a name for an Object class

You may see this error if you try to create a class that already exists in your Weaviate instance. In this case, you can delete the class as described below.

If your Weaviate instance contains data you want removed, you can manually delete the unwanted class(es).

Deleting a class == Deleting its objects

Note that deleting a class also deletes all of its objects!

Do not do this on a production database, or anywhere you do not want to lose data.

Run the code below to delete the relevant class and its objects.

# delete class "YourClassName" - THIS WILL DELETE ALL DATA IN THIS CLASS
client.schema.delete_class("YourClassName") # Replace with your class name - e.g. "Question"

Confirm data import

To confirm successful data import, navigate to the objects endpoint to check that all objects have been imported (replace with your actual endpoint):

https://some-endpoint.weaviate.network/v1/objects

You should see:

{
"deprecations": null,
"objects": [
... // Details of each object
],
"totalResults": 10 // You should see 10 results here
}

This lets you confirm that all 10 objects have been imported.
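If you fetch /v1/objects in code, you can tally the returned objects by class. Note that this endpoint returns a limited page of objects by default (adjustable with its `limit` query parameter), so for larger datasets a count from one response may not cover everything. The helper `count_by_class` and the trimmed sample response below are hypothetical; real objects carry more fields than `class`.

```python
from collections import Counter


def count_by_class(objects_response: dict) -> Counter:
    """Count objects per class in a parsed /v1/objects response."""
    return Counter(obj.get("class") for obj in objects_response.get("objects") or [])


# Hypothetical, trimmed response: only the "class" field of each object is shown.
sample = {
    "deprecations": None,
    "objects": [{"class": "Question"} for _ in range(10)],
    "totalResults": 10,
}

counts = count_by_class(sample)
print(counts["Question"])  # 10
```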

Next

You can choose your direction from here. For example, you can:

More Resources

If you can't find the answer to your question here, please look at:

  1. The Frequently Asked Questions,
  2. The knowledge base of old issues,
  3. Stack Overflow, for questions,
  4. The Weaviate Community Forum, for more involved discussion, or
  5. Our Slack channel.