
Quickstart Tutorial

Overview

Welcome to the Quickstart guide for Weaviate, an open-source vector database. This tutorial is intended to be a hands-on introduction to Weaviate.

This Quickstart takes about 20 minutes to complete. It introduces some common tasks:

  • Build a Weaviate vector database.
  • Make a semantic search query.
  • Add a filter to your query.
  • Use generative searches and a large language model (LLM) to transform your search results.

Object vectors

Vectors are mathematical representations of data objects, which enable similarity-based searches in vector databases like Weaviate.

With Weaviate, you have options to:

  • Have Weaviate create vectors for you, or
  • Specify custom vectors.

This tutorial demonstrates having Weaviate create vectors with a vectorizer. For a tutorial on using custom vectors, see this tutorial.
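To make "similarity-based search" concrete, here is a minimal sketch in plain Python. The three-dimensional vectors and the query vector are made up for illustration (real embeddings from a vectorizer have far more dimensions), and cosine similarity is used as one common similarity measure. The helper name `cosine_similarity` is our own, not part of any Weaviate API.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy vectors, keyed by the answer of each quiz object
vectors = {
    "DNA": [0.9, 0.1, 0.0],
    "Liver": [0.8, 0.3, 0.1],
    "Sound barrier": [0.0, 0.2, 0.9],
}

# Hypothetical stand-in for the vector of a query such as "biology"
query = [1.0, 0.0, 0.0]

# Rank objects from most to least similar to the query
ranked = sorted(
    vectors,
    key=lambda name: cosine_similarity(query, vectors[name]),
    reverse=True,
)
print(ranked)
```

Objects whose vectors point in a similar direction to the query vector rank first, which is the idea behind the searches later in this tutorial.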

Source data

We will use a (tiny) dataset of quizzes.

See the dataset

The data comes from a TV quiz show ("Jeopardy!")

# | Category | Question | Answer
0 | SCIENCE | This organ removes excess glucose from the blood & stores it as glycogen | Liver
1 | ANIMALS | It's the only living mammal in the order Proboseidea | Elephant
2 | ANIMALS | The gavial looks very much like a crocodile except for this bodily feature | the nose or snout
3 | ANIMALS | Weighing around a ton, the eland is the largest species of this animal in Africa | Antelope
4 | ANIMALS | Heaviest of all poisonous snakes is this North American rattlesnake | the diamondback rattler
5 | SCIENCE | 2000 news: the Gunnison sage grouse isn't just another northern sage grouse, but a new one of this classification | species
6 | SCIENCE | A metal that is "ductile" can be pulled into this while cold & under pressure | wire
7 | SCIENCE | In 1953 Watson & Crick built a model of the molecular structure of this, the gene-carrying substance | DNA
8 | SCIENCE | Changes in the tropospheric layer of this are what gives us weather | the atmosphere
9 | SCIENCE | In 70-degree air, a plane traveling at about 1,130 feet per second breaks it | Sound barrier

For Python users

Try it directly on Google Colab (or go to the file).

Step 1: Create a Weaviate database

You need a Weaviate instance to work with. We recommend creating a free cloud sandbox instance on Weaviate Cloud (WCD).

  • Go to the WCD quickstart and follow the instructions to create a sandbox instance.
  • Get the API key and URL from the Details tab in WCD.
  • Come back here to continue this Quickstart.
Alternative Weaviate instances

If you prefer to use a different Weaviate instance, see Can I use a different deployment method.

Step 2: Install a client library

Install the Weaviate client library for your preferred programming language.

To install the library, run the installation code for your language:

Install client libraries

Add weaviate-client to your Python environment with pip. The v4 client requires Weaviate 1.23 or higher.

pip install -U weaviate-client

Step 3: Connect to Weaviate

To connect to your Weaviate instance, you need the instance connection details and a client to connect with.

Connection details

Gather the following information:

  • The Weaviate URL (get it from WCD Details tab)
  • The Weaviate API key (Get it from the instance Details)
  • An OpenAI inference API key (Sign up at OpenAI)

Client connection code

This sample connection code creates a client object. You can re-use the client object to connect to your Weaviate instance as you work through this tutorial.

Copy the code to a file called quickstart. Add the appropriate extension for your programming language, and run the file to connect to Weaviate.

import weaviate
import weaviate.classes as wvc
import os
import requests
import json

# Best practice: store your credentials in environment variables
wcd_url = os.environ["WCD_DEMO_URL"]
wcd_api_key = os.environ["WCD_DEMO_RO_KEY"]
openai_api_key = os.environ["OPENAI_APIKEY"]

client = weaviate.connect_to_weaviate_cloud(
    cluster_url=wcd_url,  # Replace with your Weaviate Cloud URL
    auth_credentials=wvc.init.Auth.api_key(wcd_api_key),  # Replace with your Weaviate Cloud key
    headers={"X-OpenAI-Api-Key": openai_api_key}  # Replace with appropriate header key/value pair for the required API
)

try:
    pass  # Replace with your code. Close client gracefully in the finally block.

finally:
    client.close()  # Close client gracefully
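The connection code reads its credentials with `os.environ[...]`, which raises a `KeyError` if a variable is not set. As a small sketch, the check below reports which variables are missing before you try to connect. The helper `missing_env_vars` is hypothetical (not part of the Weaviate client); the variable names are the ones used in the sample above.

```python
import os

# Variable names follow the connection code in this tutorial
REQUIRED_VARS = ("WCD_DEMO_URL", "WCD_DEMO_RO_KEY", "OPENAI_APIKEY")

def missing_env_vars(names=REQUIRED_VARS, environ=None):
    """Return the names that are unset or empty (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]

missing = missing_env_vars()
if missing:
    print("Set these environment variables first:", ", ".join(missing))
```

Passing a plain dictionary as `environ` makes the helper easy to try out without touching your real environment.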

Step 4: Define a data collection

Next, we define a data collection (a "collection" in Weaviate) to store objects in. This is analogous to creating a table in relational (SQL) databases.

The following code:

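  • Configures a collection object with:
    • The name Question,
    • The text2vec-openai vectorizer, and
    • The generative-openai generative module.
  • Then creates the collection.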

Run it to create the collection in your Weaviate instance.

questions = client.collections.create(
    name="Question",
    vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_openai(),  # If set to "none" you must always provide vectors yourself. Could be any other "text2vec-*" also.
    generative_config=wvc.config.Configure.Generative.openai()  # Ensure the `generative-openai` module is used for generative queries
)
Change the vectorizer or generator integrations

If you prefer to use a different setup, see this section.
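Running the creation code a second time fails because the collection already exists. One possible way to make this step safe to re-run is sketched below. The helper `get_or_create_collection` is our own illustration; it assumes `client` is a connected v4 Python client and relies only on its `collections.exists`, `collections.get` and `collections.create` methods.

```python
def get_or_create_collection(client, name, **config):
    """Return the collection `name`, creating it only if it does not exist yet.

    Assumption: `client` is a connected Weaviate v4 Python client.
    Any keyword arguments are passed through to `collections.create`.
    """
    if client.collections.exists(name):
        return client.collections.get(name)
    return client.collections.create(name=name, **config)

# Usage sketch (requires a live connection, so it is left as a comment):
# questions = get_or_create_collection(
#     client,
#     "Question",
#     vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_openai(),
#     generative_config=wvc.config.Configure.Generative.openai(),
# )
```

Note that this does not update the configuration of a collection that already exists; it only avoids the "already used" error.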

Now you are ready to add objects to Weaviate.

Step 5: Add objects

You can now add objects to Weaviate. You will be using a batch import (read more) process for maximum efficiency.

The guide covers using the vectorizer defined for the collection to create a vector embedding for each object. You may have to add the API key for your vectorizer.

resp = requests.get('https://raw.githubusercontent.com/weaviate-tutorials/quickstart/main/data/jeopardy_tiny.json')
data = json.loads(resp.text)  # Load data

question_objs = list()
for i, d in enumerate(data):
    question_objs.append({
        "answer": d["Answer"],
        "question": d["Question"],
        "category": d["Category"],
    })

questions = client.collections.get("Question")
questions.data.insert_many(question_objs)

The above code:

  • Loads objects, and
  • Adds the objects to the target collection (Question) in a single batch call with insert_many.
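The two steps above can be sketched as small, separately testable functions. `to_question_objects` mirrors the mapping in the import code. `failed_indexes` is a hypothetical helper that assumes the value returned by `insert_many` exposes `has_errors` and an `errors` mapping keyed by object index, as the v4 Python client's batch result does. The `fake_result` object is a stand-in so the sketch runs without a Weaviate instance.

```python
from types import SimpleNamespace

def to_question_objects(records):
    """Map raw dataset records (capitalized keys) to the property names used in Weaviate."""
    return [
        {"answer": r["Answer"], "question": r["Question"], "category": r["Category"]}
        for r in records
    ]

def failed_indexes(result):
    """Indexes of objects that failed to import (assumes `has_errors` and `errors`)."""
    return sorted(result.errors) if result.has_errors else []

records = [
    {
        "Category": "SCIENCE",
        "Question": "This organ removes excess glucose from the blood & stores it as glycogen",
        "Answer": "Liver",
    }
]
objs = to_question_objects(records)
print(objs[0]["answer"])

# Stand-in for an import result in which the object at index 3 failed
fake_result = SimpleNamespace(has_errors=True, errors={3: "example error"})
print(failed_indexes(fake_result))
```

In real use you would pass the return value of `questions.data.insert_many(question_objs)` to `failed_indexes` and inspect any reported entries.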

Partial recap

The following code puts the above steps together.

If you have not been following along with the snippets, run the code block below. This will let you run queries in the next section.

End-to-end code
Remember to replace the URL, Weaviate API key and inference API key
import weaviate
import weaviate.classes as wvc
import os
import requests
import json

# Best practice: store your credentials in environment variables
wcd_url = os.environ["WCD_DEMO_URL"]
wcd_api_key = os.environ["WCD_DEMO_RO_KEY"]
openai_api_key = os.environ["OPENAI_APIKEY"]

client = weaviate.connect_to_weaviate_cloud(
    cluster_url=wcd_url,  # Replace with your Weaviate Cloud URL
    auth_credentials=wvc.init.Auth.api_key(wcd_api_key),  # Replace with your Weaviate Cloud key
    headers={"X-OpenAI-Api-Key": openai_api_key}  # Replace with appropriate header key/value pair for the required API
)

try:
    # ===== define collection =====
    questions = client.collections.create(
        name="Question",
        vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_openai(),  # If set to "none" you must always provide vectors yourself. Could be any other "text2vec-*" also.
        generative_config=wvc.config.Configure.Generative.openai()  # Ensure the `generative-openai` module is used for generative queries
    )

    # ===== import data =====
    resp = requests.get('https://raw.githubusercontent.com/weaviate-tutorials/quickstart/main/data/jeopardy_tiny.json')
    data = json.loads(resp.text)  # Load data

    question_objs = list()
    for i, d in enumerate(data):
        question_objs.append({
            "answer": d["Answer"],
            "question": d["Question"],
            "category": d["Category"],
        })

    questions = client.collections.get("Question")
    questions.data.insert_many(question_objs)

finally:
    client.close()  # Close client gracefully

Step 6: Queries

Now, let's run some queries on your Weaviate instance. Weaviate powers many different types of searches. We will try a few here.

Let's start with a similarity search. A nearText search looks for objects in Weaviate whose vectors are most similar to the vector for the given input text.

Run the following code to search for objects whose vectors are most similar to that of biology.

import weaviate
import weaviate.classes as wvc
import os

# Best practice: store your credentials in environment variables
wcd_url = os.environ["WCD_DEMO_URL"]
wcd_api_key = os.environ["WCD_DEMO_RO_KEY"]
openai_api_key = os.environ["OPENAI_APIKEY"]

client = weaviate.connect_to_weaviate_cloud(
    cluster_url=wcd_url,  # Replace with your Weaviate Cloud URL
    auth_credentials=wvc.init.Auth.api_key(wcd_api_key),  # Replace with your Weaviate Cloud key
    headers={"X-OpenAI-Api-Key": openai_api_key}  # Replace with appropriate header key/value pair for the required API
)

try:
    questions = client.collections.get("Question")

    response = questions.query.near_text(
        query="biology",
        limit=2
    )

    print(response.objects[0].properties)  # Inspect the first object

finally:
    client.close()  # Close client gracefully

You should see results like this:

{
  "data": {
    "Get": {
      "Question": [
        {
          "answer": "DNA",
          "category": "SCIENCE",
          "question": "In 1953 Watson & Crick built a model of the molecular structure of this, the gene-carrying substance"
        },
        {
          "answer": "Liver",
          "category": "SCIENCE",
          "question": "This organ removes excess glucose from the blood & stores it as glycogen"
        }
      ]
    }
  }
}

The response includes a list of objects whose vectors are most similar to the word biology. The top 2 results are returned here as we have set a limit to 2.
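The query code prints only the first object. To look at every returned object, you can loop over `response.objects`. The sketch below uses a stand-in response built with `SimpleNamespace` so it runs without a Weaviate instance; the only assumption about the real response is that each entry in `objects` has a `properties` dictionary, as used in the query code. The helper name `summarize` is our own.

```python
from types import SimpleNamespace

def summarize(response):
    """Return (category, answer) pairs for every object in a query response."""
    return [(o.properties["category"], o.properties["answer"]) for o in response.objects]

# Stand-in for the response of the near_text query (illustrative only)
fake_response = SimpleNamespace(
    objects=[
        SimpleNamespace(properties={"answer": "DNA", "category": "SCIENCE"}),
        SimpleNamespace(properties={"answer": "Liver", "category": "SCIENCE"}),
    ]
)

for category, answer in summarize(fake_response):
    print(f"{category}: {answer}")
```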

Why is this useful?

Notice that even though the word biology does not appear anywhere, Weaviate returns biology-related entries.

This example shows why vector searches are powerful. Vectorized data objects allow for searches based on degrees of similarity, as shown here.

Semantic search with a filter

You can add Boolean filters to searches. For example, the above search can be modified to only include objects that have a "category" value of "ANIMALS". Run the following code to see the results:

questions = client.collections.get("Question")

response = questions.query.near_text(
    query="biology",
    limit=2,
    filters=wvc.query.Filter.by_property("category").equal("ANIMALS")
)

print(response.objects[0].properties)  # Inspect the first object

You should see results like this:

{
  "data": {
    "Get": {
      "Question": [
        {
          "answer": "Elephant",
          "category": "ANIMALS",
          "question": "It's the only living mammal in the order Proboseidea"
        },
        {
          "answer": "the nose or snout",
          "category": "ANIMALS",
          "question": "The gavial looks very much like a crocodile except for this bodily feature"
        }
      ]
    }
  }
}

The results are limited to objects from the ANIMALS category.

Why is this useful?

Using a Boolean filter allows you to combine the flexibility of vector search with the precision of where filters.
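Conceptually, a filtered semantic search keeps only the objects that satisfy the condition and then ranks what is left by vector distance. The plain-Python sketch below illustrates that idea only; the distance values are invented, the function `filtered_search` is hypothetical, and it says nothing about how Weaviate applies filters internally.

```python
# Invented distances to a "biology" query; smaller means more similar
candidates = [
    {"answer": "DNA", "category": "SCIENCE", "distance": 0.18},
    {"answer": "Elephant", "category": "ANIMALS", "distance": 0.21},
    {"answer": "Liver", "category": "SCIENCE", "distance": 0.19},
    {"answer": "the nose or snout", "category": "ANIMALS", "distance": 0.23},
    {"answer": "Antelope", "category": "ANIMALS", "distance": 0.25},
]

def filtered_search(objects, category, limit):
    """Keep objects in `category`, then return the `limit` closest ones."""
    matching = [o for o in objects if o["category"] == category]
    return sorted(matching, key=lambda o: o["distance"])[:limit]

top = filtered_search(candidates, "ANIMALS", 2)
print([o["answer"] for o in top])
```

Even though two SCIENCE objects are closer to the query, they are excluded because they do not match the filter.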

Generative search (single prompt)

Next, let's try a generative search. A generative search, also called retrieval augmented generation, prompts a large language model (LLM) with a combination of a user query as well as data retrieved from a database.

To see what happens when an LLM uses query results to perform a task that is based on our prompt, run the code below.

Note that the code uses a single prompt query, which asks the model to generate an answer for each retrieved database object.

questions = client.collections.get("Question")

response = questions.generate.near_text(
    query="biology",
    limit=2,
    single_prompt="Explain {answer} as you might to a five-year-old."
)

print(response.objects[0].generated)  # Inspect the generated text

You should see results similar to this:

{
  "data": {
    "Get": {
      "Question": [
        {
          "_additional": {
            "generate": {
              "error": null,
              "singleResult": "DNA is like a special code that tells our bodies how to grow and work. It's like a recipe book that has all the instructions for making you who you are. Just like a recipe book has different recipes for different foods, DNA has different instructions for making different parts of your body, like your eyes, hair, and even your personality! It's really amazing because it's what makes you unique and special."
            }
          },
          "answer": "DNA",
          "category": "SCIENCE",
          "question": "In 1953 Watson & Crick built a model of the molecular structure of this, the gene-carrying substance"
        },
        {
          "_additional": {
            "generate": {
              "error": null,
              "singleResult": "Well, a species is a group of living things that are similar to each other in many ways. They have the same kind of body parts, like legs or wings, and they can have babies with other members of their species. For example, dogs are a species, and so are cats. They look different and act differently, but all dogs can have puppies with other dogs, and all cats can have kittens with other cats. So, a species is like a big family of animals or plants that are all related to each other in a special way."
            }
          },
          "answer": "species",
          "category": "SCIENCE",
          "question": "2000 news: the Gunnison sage grouse isn't just another northern sage grouse, but a new one of this classification"
        }
      ]
    }
  }
}

Weaviate runs the same biology search as before, but each returned object now includes an additional generated text with a plain-language explanation of its answer.
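The `{answer}` placeholder in the single prompt is filled in with the `answer` property of each retrieved object, so the model receives one prompt per object. The plain-Python sketch below mimics that substitution with `str.format` purely for illustration; in the tutorial the substitution happens inside Weaviate, and the `retrieved` list is a hand-written stand-in for the search results.

```python
single_prompt = "Explain {answer} as you might to a five-year-old."

# Hand-written stand-in for the properties of the two retrieved objects
retrieved = [
    {"answer": "DNA", "category": "SCIENCE"},
    {"answer": "species", "category": "SCIENCE"},
]

# One prompt per object; unused properties such as "category" are ignored
prompts = [single_prompt.format(**props) for props in retrieved]
for prompt in prompts:
    print(prompt)
```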

Generative search (grouped task)

The next example uses a grouped task prompt instead to combine all search results and send them to the LLM with a prompt.

To ask the LLM to write a tweet about these search results, run the following code.

questions = client.collections.get("Question")

response = questions.generate.near_text(
    query="biology",
    limit=2,
    grouped_task="Write a tweet with emojis about these facts."
)

print(response.generated)  # Inspect the generated text

The generated text is returned once for the whole set of results rather than once per object. Here's one that we got:

🧬 In 1953, Watson & Crick 🧪 built a model of the molecular structure of DNA, the gene-carrying substance! 🧬

🐦🔍 2000 news: The Gunnison sage grouse isn't just another northern sage grouse, but a new species of its own! 🆕🐔 #ScienceFacts

Why is this useful?

Generative search sends retrieved data from Weaviate to a large language model, or LLM. This allows you to go beyond simple data retrieval and transform the data into a more useful form, without ever leaving Weaviate.
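In contrast to a single prompt, a grouped task sends one prompt that covers all retrieved objects together. The sketch below shows one way such a combined prompt could be assembled. The function `build_grouped_prompt` and the JSON layout are assumptions made for illustration; the exact prompt format that Weaviate sends to the LLM is not described in this tutorial.

```python
import json

def build_grouped_prompt(task, objects):
    """Combine a task with all retrieved objects into a single prompt (illustrative)."""
    return f"{task}\n\n{json.dumps(objects, indent=2)}"

# Hand-written stand-in for the two retrieved objects
retrieved = [
    {"answer": "DNA", "category": "SCIENCE"},
    {"answer": "species", "category": "SCIENCE"},
]

task = "Write a tweet with emojis about these facts."
prompt = build_grouped_prompt(task, retrieved)
print(prompt)
```

Because the model sees every result at once, it can produce a single output, such as one tweet, that draws on all of them.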


Recap

Well done! You have:

  • Created your own cloud-based vector database with Weaviate
  • Populated it with data objects using an inference API
  • Performed searches, including:
    • Semantic search
    • Semantic search with a filter
    • Generative search

Where next is up to you. We include a few links below - or you can check out the sidebar.


Next

You can do much more with Weaviate. We suggest trying one of these:

For more holistic learning, try Weaviate Academy. We have built free courses for you to learn about Weaviate and the world of vector search.

You can also try a larger, 1,000 row version of the Jeopardy! dataset, or this tiny set of 50 wine reviews.


FAQs & Troubleshooting

We provide answers to some common questions and potential issues below.

Questions

Can I use a different deployment method?


Yes, you can use any method listed in our installation options section.


Using Docker Compose may be a convenient option for many. To do so:

  1. Save this Docker Compose file as docker-compose.yml,

---
services:
  weaviate:
    command:
    - --host
    - 0.0.0.0
    - --port
    - '8080'
    - --scheme
    - http
    image: cr.weaviate.io/semitechnologies/weaviate:1.26.6
    ports:
    - 8080:8080
    - 50051:50051
    restart: on-failure:0
    environment:
      OPENAI_APIKEY: $OPENAI_APIKEY
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: 'text2vec-openai'
      ENABLE_MODULES: 'text2vec-openai,generative-openai'
      CLUSTER_HOSTNAME: 'node1'
...

  2. Run docker compose up -d from the location of your docker-compose.yml file, and then
  3. Connect to Weaviate at http://localhost:8080.

If you are using this Docker Compose file, Weaviate will not require API-key authentication. So your connection code will change to:

import weaviate
import os

# The Docker Compose file enables anonymous access, so no Weaviate API key is needed.
# connect_to_local() uses localhost with ports 8080 (HTTP) and 50051 (gRPC) by default.
client = weaviate.connect_to_local(
    headers={
        "X-OpenAI-Api-Key": os.environ["OPENAI_APIKEY"]  # Replace with your inference API key
    }
)
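After starting the container, it can take a moment before Weaviate accepts connections. The sketch below polls Weaviate's readiness endpoint (`/v1/.well-known/ready`) using only the standard library. The helper names `ready_url` and `is_ready` are our own, and the default base URL assumes the Docker Compose setup described above.

```python
import urllib.error
import urllib.request

def ready_url(base_url):
    """Build the readiness endpoint URL for a Weaviate instance."""
    return base_url.rstrip("/") + "/v1/.well-known/ready"

def is_ready(base_url="http://localhost:8080", timeout=2.0):
    """Return True if the instance answers the readiness check with HTTP 200."""
    try:
        with urllib.request.urlopen(ready_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx response
        return False

# Usage sketch: call is_ready() after `docker compose up -d` and
# connect with the client once it returns True.
```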

Can I use different integrations?


In this example, we use the OpenAI inference API. But you can use others.

If you do want to change the embeddings, or the generative AI integrations, you can. You will need to:

  • Ensure that the Weaviate module is available in the Weaviate instance you are using,
  • Modify your collection definition to use your preferred integration, and
  • Make sure to use the right API key(s) (if necessary) for your integration.

Please see the model providers integration section for more information.

Is a vectorizer setting mandatory?

  • No. You always have the option of providing vector embeddings yourself.
  • Setting a vectorizer gives Weaviate the option of creating vector embeddings for you.
    • If you do not wish to, you can set this to none.

What is a sandbox, exactly?

Note: Sandbox expiry & options
Sandbox expiration

The sandbox is free for 14 days. After 14 days, the sandbox expires and all data is deleted.

To retrieve a copy of your sandbox data before it is deleted, use the cursor API.

To preserve your data and upgrade to a paid instance, contact us for help.

Troubleshooting

If you see Error: Name 'Question' already used as a name for an Object class


You may see this error if you try to create a collection that already exists in your instance of Weaviate. In this case, you can follow these instructions to delete the collection.

You can delete any unwanted collection(s), along with the data that they contain.

Deleting a collection also deletes its objects

When you delete a collection, you delete all associated objects!

Be very careful with deletes on a production database and anywhere else that you have important data.

This code deletes a collection and its objects.

# delete collection "Article" - THIS WILL DELETE THE COLLECTION AND ALL ITS DATA
client.collections.delete("Article") # Replace with your collection name
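If you want to guard the delete so that it only runs when the collection is actually present, a check such as the following may help. The helper `delete_collection_if_exists` is hypothetical; it assumes `client` is a connected v4 Python client and uses its `collections.exists` and `collections.delete` methods.

```python
def delete_collection_if_exists(client, name):
    """Delete the collection `name` and ALL of its data, if it exists.

    Returns True if a collection was deleted, False if there was nothing to delete.
    Assumption: `client` is a connected Weaviate v4 Python client.
    """
    if client.collections.exists(name):
        client.collections.delete(name)
        return True
    return False

# Usage sketch (requires a live connection, so it is left as a comment):
# delete_collection_if_exists(client, "Question")
```

The same caution applies: a successful call removes every object in the collection.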

How to confirm collection creation


If you are not sure whether the collection has been created, check the schema endpoint.

Replace WEAVIATE_INSTANCE_URL with your instance URL:

https://WEAVIATE_INSTANCE_URL/v1/schema

You should see:

{
  "classes": [
    {
      "class": "Question",
      ... // truncated additional information here
      "vectorizer": "text2vec-openai"
    }
  ]
}

The schema should indicate that the Question collection has been added.
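If you save or fetch the schema response as JSON, a few lines of Python can pull out the collection names and their vectorizers. This is a small sketch; the helper `collection_vectorizers` is our own, and the sample dictionary contains only the fields shown in the truncated response above.

```python
def collection_vectorizers(schema):
    """Map each collection name in a schema response to its vectorizer."""
    return {c["class"]: c.get("vectorizer") for c in schema.get("classes", [])}

# Sample with only the fields shown in the truncated response
schema = {
    "classes": [
        {"class": "Question", "vectorizer": "text2vec-openai"},
    ]
}

print(collection_vectorizers(schema))
```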

REST & GraphQL in Weaviate

Weaviate uses a combination of RESTful and GraphQL APIs. In Weaviate, RESTful API endpoints can be used to add data or obtain information about the Weaviate instance, and the GraphQL interface to retrieve data.

How to confirm data import


To confirm successful data import, check the objects endpoint to verify that all objects are imported.

Replace WEAVIATE_INSTANCE_URL with your instance URL:

https://WEAVIATE_INSTANCE_URL/v1/objects

You should see:

{
  "deprecations": null,
  "objects": [
    ... // Details of each object
  ],
  "totalResults": 10 // You should see 10 results here
}

You should be able to confirm that you have imported all 10 objects.
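You can also check the object count from Python instead of the REST endpoint. The sketch below assumes the v4 Python client, where a collection's `aggregate.over_all(total_count=True)` call returns a result with a `total_count` attribute. The helper `count_objects` is hypothetical, and the stand-in collection only exists so the sketch runs without a Weaviate instance.

```python
from types import SimpleNamespace

def count_objects(collection):
    """Return the number of objects in a collection via an aggregate query."""
    return collection.aggregate.over_all(total_count=True).total_count

# Stand-in collection that reports 10 objects (illustrative only)
fake_collection = SimpleNamespace(
    aggregate=SimpleNamespace(
        over_all=lambda total_count=False: SimpleNamespace(total_count=10)
    )
)

print(count_objects(fake_collection))

# Usage sketch with a live connection:
# questions = client.collections.get("Question")
# assert count_objects(questions) == 10
```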

If the nearText search is not working


To perform text-based (nearText) similarity searches, you need to have a vectorizer enabled, and configured in your collection.

Make sure the vectorizer is configured like this.

If the search still doesn't work, contact us!

Questions and feedback

If you have any questions or feedback, let us know in the user forum.