Quickstart Tutorial
Overview
Welcome to the Quickstart tutorial. Here, you will:
- Create a vector database with Weaviate Cloud Services (WCS),
- Import data, and
- Perform a vector search
When you import data into Weaviate, you can optionally:
- Have Weaviate create vectors, or
- Specify custom vectors.
This tutorial demonstrates both methods. For the first method, we will use an inference API to generate the vectors, and show you how to switch to a different provider.
Source data
We will use a (tiny) dataset from a TV quiz show ("Jeopardy!").
Take a look at the dataset
| | Category | Question | Answer |
|---|---|---|---|
| 0 | SCIENCE | This organ removes excess glucose from the blood & stores it as glycogen | Liver |
| 1 | ANIMALS | It's the only living mammal in the order Proboseidea | Elephant |
| 2 | ANIMALS | The gavial looks very much like a crocodile except for this bodily feature | the nose or snout |
| 3 | ANIMALS | Weighing around a ton, the eland is the largest species of this animal in Africa | Antelope |
| 4 | ANIMALS | Heaviest of all poisonous snakes is this North American rattlesnake | the diamondback rattler |
| 5 | SCIENCE | 2000 news: the Gunnison sage grouse isn't just another northern sage grouse, but a new one of this classification | species |
| 6 | SCIENCE | A metal that is "ductile" can be pulled into this while cold & under pressure | wire |
| 7 | SCIENCE | In 1953 Watson & Crick built a model of the molecular structure of this, the gene-carrying substance | DNA |
| 8 | SCIENCE | Changes in the tropospheric layer of this are what gives us weather | the atmosphere |
| 9 | SCIENCE | In 70-degree air, a plane traveling at about 1,130 feet per second breaks it | Sound barrier |
Create a Weaviate instance
First, create a Weaviate database instance. We'll use a free instance from Weaviate Cloud Services (WCS).
- Go to the WCS Console, and
- Click on Sign in with the Weaviate Cloud Services.
- If you don't have a WCS account, click on Register.
- Sign in with your WCS username and password.
- Click on Create cluster.
Then:
- Select the Free sandbox plan tier.
- Provide a Cluster name. This plus a suffix will be your URL.
- Set the Enable Authentication? option to YES.
Finally, click on Create. A tick ✔️ will appear (in ~2 minutes) when the instance has been created.
Make note of cluster details
You will need the cluster URL and authentication details. Click the Details button to see them; the authentication details (the Weaviate API key) can be found by clicking the key button.
Install a client library
We recommend you use a Weaviate client library. Currently they are available for Python, TypeScript/JavaScript, Go and Java. Install your preferred client as follows:
- Python
- TypeScript/JavaScript
- Go
- Java
Add weaviate-client to your Python environment with pip:
$ pip install weaviate-client
Add weaviate-ts-client to your project with npm:
npm install weaviate-ts-client
Add weaviate-go-client to your project with go get:
go get github.com/weaviate/weaviate-go-client/v4
Add this dependency to your project:
<dependency>
<groupId>io.weaviate</groupId>
<artifactId>client</artifactId>
<version>4.0.0</version> <!-- Check latest version -->
</dependency>
Connect to Weaviate
Now connect to your Weaviate instance. From the Details tab in WCS, get:
- The Weaviate instance API key, and
- The Weaviate instance URL.
If you want to use an inference service API to generate vectors, you must also provide:
- An additional inference API key in the request header.
In this example, we use the Hugging Face inference API, but you can use others:
What if I want to use a different vectorizer module?
You can choose any vectorizer (text2vec-xxx) module for this tutorial, as long as:
- The module is available in the Weaviate instance you are using, and
- You have an API key (if necessary) for that module.
We use the text2vec-huggingface module in the Quickstart, but all of the following modules are available in the free sandbox:
- text2vec-cohere
- text2vec-huggingface
- text2vec-openai
- text2vec-palm
Depending on your choice, make sure to pass on the API key for the inference service by setting the header with an appropriate line from below, remembering to replace the placeholder with your actual key:
"X-Cohere-Api-Key": "YOUR-COHERE-API-KEY", // For Cohere
"X-HuggingFace-Api-Key": "YOUR-HUGGINGFACE-API-KEY", // For Hugging Face
"X-OpenAI-Api-Key": "YOUR-OPENAI-API-KEY", // For OpenAI
"X-Palm-Api-Key": "YOUR-PALM-API-KEY", // For PaLM
So, instantiate the client as follows:
- Python
- TypeScript
- Go
- Curl
import weaviate
import json
client = weaviate.Client(
url = "https://some-endpoint.weaviate.network", # Replace with your endpoint
auth_client_secret=weaviate.AuthApiKey(api_key="YOUR-WEAVIATE-API-KEY"), # Replace w/ your Weaviate instance API key
additional_headers = {
"X-HuggingFace-Api-Key": "YOUR-HUGGINGFACE-API-KEY" # Replace with your inference API key
}
)
import weaviate, { WeaviateClient, ObjectsBatcher, ApiKey } from 'weaviate-ts-client';
import fetch from 'node-fetch';
const client: WeaviateClient = weaviate.client({
scheme: 'https',
host: 'some-endpoint.weaviate.network', // Replace with your endpoint
apiKey: new ApiKey('YOUR-WEAVIATE-API-KEY'), // Replace w/ your Weaviate instance API key
headers: {'X-HuggingFace-Api-Key': 'YOUR-HUGGINGFACE-API-KEY'}, // Replace with your inference API key
});
cfg := weaviate.Config{
Host: "some-endpoint.weaviate.network/", // Replace with your endpoint
Scheme: "https",
AuthConfig: auth.ApiKey{Value: "YOUR-WEAVIATE-API-KEY"}, // Replace w/ your Weaviate instance API key
Headers: map[string]string{
"X-HuggingFace-Api-Key": "YOUR-HUGGINGFACE-API-KEY", // Replace with your inference API key
},
}
client, err := weaviate.NewClient(cfg)
if err != nil {
panic(err)
}
classObj := &models.Class{
Class: "Question",
Vectorizer: "text2vec-huggingface",
}
if err := client.Schema().ClassCreator().WithClass(classObj).Do(context.Background()); err != nil {
panic(err)
}
With curl, add the API key to the header as shown below:
echo '{
"query": "<QUERY>"
}' | curl \
-X POST \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR-WEAVIATE-API-KEY" \
-H "X-HuggingFace-Api-Key: YOUR-HUGGINGFACE-API-KEY" \
-d @- \
https://some-endpoint.weaviate.network/v1/graphql
Now you are connected to your Weaviate instance.
Define a class
Next, we need to define a data collection (a "class" in Weaviate) to store objects in.
Create a Question class with a vectorizer configured as shown below. This will allow Weaviate to convert data objects to vectors. The class definition also includes a suggested basic configuration for the inference service module that creates the vector embeddings.
Is the vectorizer setting mandatory?
- No. You always have the option of providing vector embeddings yourself.
- Setting a vectorizer gives Weaviate the option of creating vector embeddings for you. If you do not wish to, you can set it to none.
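For illustration only, here is a minimal sketch (using the Python client, not one of the tutorial steps) of a class definition with the vectorizer disabled; with this setting you must supply a vector for every object you import:

```python
# Sketch: a class that relies entirely on user-provided vectors
class_obj = {
    "class": "Question",
    "vectorizer": "none",  # Weaviate will not create vectors for you
}
client.schema.create_class(class_obj)  # assumes the `client` instance created above
```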
- Python
- TypeScript
- Go
- Curl
class_obj = {
"class": "Question",
"vectorizer": "text2vec-huggingface", # If set to "none" you must always provide vectors yourself. Could be any other "text2vec-*" also.
"moduleConfig": {
"text2vec-huggingface": {
"model": "sentence-transformers/all-MiniLM-L6-v2", # Can be any public or private Hugging Face model.
"options": {
"waitForModel": True
}
}
}
}
client.schema.create_class(class_obj)
let classObj = {
'class': 'Question',
'vectorizer': 'text2vec-huggingface', // If set to "none" you must always provide vectors yourself. Could be any other "text2vec-*" also.
'moduleConfig': {
'text2vec-huggingface': {
'model': 'sentence-transformers/all-MiniLM-L6-v2', // Can be any public or private Hugging Face model.
'options': {
'waitForModel': true
}
}
}
}
async function addSchema() {
const res = await client.schema.classCreator().withClass(classObj).do();
console.log(res);
}
await addSchema();
classObj := &models.Class{
Class: "Question",
Vectorizer: "text2vec-huggingface", // If set to "none" you must always provide vectors yourself. Could be any other "text2vec-*" also.
ModuleConfig: map[string]interface{}{
"text2vec-huggingface": map[string]interface{}{
"model": "sentence-transformers/paraphrase-MiniLM-L6-v2",
"options": {
"waitForModel": true
}
},
},
}
// add the schema
err := client.Schema().ClassCreator().WithClass(classObj).Do(context.Background())
if err != nil {
panic(err)
}
echo '{
"class": "Question",
"vectorizer": "text2vec-huggingface",
"moduleConfig": {
"text2vec-huggingface": {
"model": "sentence-transformers/all-MiniLM-L6-v2",
"options": {
"waitForModel": true
}
}
}
}' | curl \
-X POST \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $WEAVIATE_API_KEY" \
-d @- \
https://some-endpoint.weaviate.network/v1/schema
If you are using a different vectorizer
In case you are using a different vectorizer, we also provide suggested module configurations below.
- Cohere
- Hugging Face
- OpenAI
- PaLM
class_obj = {
"class": "Question",
"vectorizer": "text2vec-cohere",
}
class_obj = {
"class": "Question",
"vectorizer": "text2vec-huggingface",
"moduleConfig": {
"text2vec-huggingface": {
"model": "sentence-transformers/all-MiniLM-L6-v2", // Can be any public or private Hugging Face model.
"options": {
"waitForModel": true, // Try this if you get a "model not ready" error
}
}
}
}
class_obj = {
"class": "Question",
"vectorizer": "text2vec-openai",
"moduleConfig": {
"text2vec-openai": {
"model": "ada",
"modelVersion": "002",
"type": "text"
}
}
}
class_obj = {
"class": "Question",
"vectorizer": "text2vec-palm",
"moduleConfig": {
"text2vec-palm": {
"projectId": "YOUR-GOOGLE-CLOUD-PROJECT-ID", // Required. Replace with your value: (e.g. "cloud-large-language-models")
"apiEndpoint": "YOUR-API-ENDPOINT", // Optional. Defaults to "us-central1-aiplatform.googleapis.com".
"modelId": "YOUR-GOOGLE-CLOUD-MODEL-ID", // Optional. Defaults to "textembedding-gecko".
},
}
}
Add objects
Now, we'll add objects using a batch import process. We will:
- Load objects,
- Initialize a batch process, and
- Add objects one by one, specifying the class (in this case, Question) to add to.
We'll show both options: first using the vectorizer to create object vectors, and then providing custom vectors.
Option 1: Use the vectorizer
The code below builds objects without any specific vector data. This causes Weaviate to use the vectorizer in the class definition to create a vector embedding for each object.
- Python
- TypeScript
- Go
- Curl
# Load data
import requests
url = 'https://raw.githubusercontent.com/weaviate-tutorials/quickstart/main/data/jeopardy_tiny.json'
resp = requests.get(url)
data = json.loads(resp.text)
# Configure a batch process
with client.batch(
batch_size=100
) as batch:
# Batch import all Questions
for i, d in enumerate(data):
print(f"importing question: {i+1}")
properties = {
"answer": d["Answer"],
"question": d["Question"],
"category": d["Category"],
}
client.batch.add_data_object(
properties,
"Question",
)
async function getJsonData() {
const file = await fetch('https://raw.githubusercontent.com/weaviate-tutorials/quickstart/main/data/jeopardy_tiny.json');
return file.json();
}
async function importQuestions() {
// Get the questions directly from the URL
const data = await getJsonData();
// Prepare a batcher
let batcher: ObjectsBatcher = client.batch.objectsBatcher();
let counter = 0;
let batchSize = 100;
for (const question of data) {
// Construct an object with a class and properties 'answer' and 'question'
const obj = {
class: 'Question',
properties: {
answer: question.Answer,
question: question.Question,
category: question.Category,
},
}
// add the object to the batch queue
batcher = batcher.withObject(obj);
// When the batch counter reaches batchSize, push the objects to Weaviate
if (counter++ == batchSize) {
// flush the batch queue
const res = await batcher.do();
console.log(res);
// restart the batch queue
counter = 0;
batcher = client.batch.objectsBatcher();
}
}
// Flush the remaining objects
const res = await batcher.do();
console.log(res);
}
await importQuestions();
// Retrieve the data
data, err := http.DefaultClient.Get("https://raw.githubusercontent.com/weaviate-tutorials/quickstart/main/data/jeopardy_tiny.json")
if err != nil {
panic(err)
}
defer data.Body.Close()
// Decode the data
var items []map[string]string
if err := json.NewDecoder(data.Body).Decode(&items); err != nil {
panic(err)
}
// convert items into a slice of models.Object
objects := make([]*models.Object, len(items))
for i := range items {
objects[i] = &models.Object{
Class: "Question",
Properties: map[string]any{
"category": items[i]["Category"],
"question": items[i]["Question"],
"answer": items[i]["Answer"],
},
}
}
// batch write items
batchRes, err := client.Batch().ObjectsBatcher().WithObjects(objects...).Do(context.Background())
if err != nil {
panic(err)
}
for _, res := range batchRes {
if res.Result.Errors != nil {
panic("batch load failed: %v", res.Result.Errors.Error)
}
}
#!/bin/bash
# Requiring `bash` enables process substitution, used when feeding jq output into the while loop.
# Replace with your endpoint and keys
API_URL="https://some-endpoint.weaviate.network/v1/batch/objects"
WEAVIATE_API_KEY="YOUR-WEAVIATE-API-KEY"
# Replace with your inference API key
HUGGINGFACE_API_KEY="YOUR-HUGGINGFACE-API-KEY"
# Set batch size
BATCH_SIZE=100
# Read the JSON file and loop through its entries
lines_processed=0
batch_data='{"objects": ['
while read -r line; do
# Create the class object out of the JSON data
line=$(echo "$line" | jq '{class: "Question", properties: {answer: .Answer, question: .Question, category: .Category}}')
if [ $lines_processed -eq 0 ]; then
batch_data+=$line
else
batch_data+=",$line"
fi
lines_processed=$((lines_processed + 1))
# If the batch is full, send it to the API using curl
if [ $lines_processed -eq $BATCH_SIZE ]; then
batch_data+="]}"
curl -X POST "$API_URL" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $WEAVIATE_API_KEY" \
-H "X-HuggingFace-Api-Key: $HUGGINGFACE_API_KEY" \
-d "$batch_data"
echo "" # Print a newline for better output formatting
# Reset the batch data and counter
lines_processed=0
batch_data='{"objects": ['
fi
done < <(jq -c '.[]' jeopardy_tiny.json) # Process substitution keeps the counters in the current shell
# Send the remaining data (if any) to the API using curl
if [ $lines_processed -ne 0 ]; then
batch_data+="]}"
curl -X POST "$API_URL" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $WEAVIATE_API_KEY" \
-H "X-HuggingFace-Api-Key: $HUGGINGFACE_API_KEY" \
-d "$batch_data"
echo "" # Print a newline for better output formatting
fi
Option 2: Specify custom vectors
Alternatively, you can provide your own vectors to Weaviate. Regardless of whether a vectorizer is set, if a vector is specified, Weaviate will use it to represent the object.
The example below specifies pre-computed vectors with each object.
- Python
- TypeScript
# Load data
import requests
fname = "jeopardy_tiny_with_vectors_all-MiniLM-L6-v2.json" # This file includes vectors, created using `all-MiniLM-L6-v2`
url = f'https://raw.githubusercontent.com/weaviate-tutorials/quickstart/main/data/{fname}'
resp = requests.get(url)
data = json.loads(resp.text)
# Configure a batch process
with client.batch(
batch_size=100
) as batch:
# Batch import all Questions
for i, d in enumerate(data):
print(f"importing question: {i+1}")
properties = {
"answer": d["Answer"],
"question": d["Question"],
"category": d["Category"],
}
custom_vector = d["vector"]
client.batch.add_data_object(
properties,
"Question",
vector=custom_vector # Add custom vector
)
async function getJsonData() {
const fname = 'jeopardy_tiny_with_vectors_all-MiniLM-L6-v2.json';
const url = 'https://raw.githubusercontent.com/weaviate-tutorials/quickstart/main/data/' + fname
const file = await fetch(url);
return file.json();
}
async function importQuestions() {
// Get the questions directly from the URL
const data = await getJsonData(); // Each question object here would include vector data
// Prepare a batcher
let batcher: ObjectsBatcher = client.batch.objectsBatcher();
let counter = 0;
let batchSize = 100;
for (const question of data) {
// Construct an object with a class and properties 'answer' and 'question'
const obj = {
class: 'Question',
properties: {
answer: question.Answer,
question: question.Question,
category: question.Category,
},
vector: question.vector // Add the vector data to the object,
}
// add the object to the batch queue
batcher = batcher.withObject(obj);
// When the batch counter reaches batchSize, push the objects to Weaviate
if (counter++ == batchSize) {
// flush the batch queue
const res = await batcher.do();
console.log(res);
// restart the batch queue
counter = 0;
batcher = client.batch.objectsBatcher();
}
}
// Flush the remaining objects
const res = await batcher.do();
console.log(res);
}
await importQuestions();
Custom vectors with a vectorizer
Note that you can specify a vectorizer and still provide a custom vector. In this scenario, make sure that the vector comes from the same model as the one specified in the vectorizer.
In this tutorial, the vectors come from sentence-transformers/all-MiniLM-L6-v2 - the same model as specified in the vectorizer configuration. You can generate such vectors yourself, as shown in the sketch below.
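As a hedged illustration (not part of the original tutorial), this is how you might generate a compatible vector locally with the sentence-transformers package, assuming it is installed (pip install sentence-transformers); the object properties shown are placeholders:

```python
from sentence_transformers import SentenceTransformer

# Use the same model family as the class vectorizer (all-MiniLM-L6-v2 produces 384-dim vectors)
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
vector = model.encode("The gavial looks very much like a crocodile except for this bodily feature")

with client.batch(batch_size=100) as batch:
    batch.add_data_object(
        {"question": "...", "answer": "...", "category": "ANIMALS"},  # placeholder properties
        "Question",
        vector=vector.tolist(),  # custom vector from the same model as the vectorizer
    )
```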
Batch imports provide significantly better import performance, so you should almost always use them unless you have a good reason not to, for example when creating a single object (a sketch follows below).
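For contrast, here is a minimal sketch of single-object creation with the Python client (again not taken from the tutorial; the property values are placeholders):

```python
# Sketch: create one object without batching (fine for one-off inserts, slow for bulk imports)
uuid = client.data_object.create(
    data_object={
        "question": "Placeholder question",
        "answer": "Placeholder answer",
        "category": "SCIENCE",
    },
    class_name="Question",
)
print(uuid)  # the UUID Weaviate assigned to the new object
```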
Putting it together
The following code puts it all together. Try running it yourself to import the data into your Weaviate instance.
- Python
- TypeScript
- Go
- Curl
import weaviate
import json
client = weaviate.Client(
url = "https://some-endpoint.weaviate.network", # Replace with your endpoint
auth_client_secret=weaviate.AuthApiKey(api_key="YOUR-WEAVIATE-API-KEY"), # Replace w/ your Weaviate instance API key
additional_headers = {
"X-HuggingFace-Api-Key": "YOUR-HUGGINGFACE-API-KEY" # Replace with your inference API key
}
)
# ===== add schema =====
class_obj = {
"class": "Question",
"vectorizer": "text2vec-huggingface", # If set to "none" you must always provide vectors yourself. Could be any other "text2vec-*" also.
"moduleConfig": {
"text2vec-huggingface": {
"model": "sentence-transformers/all-MiniLM-L6-v2", # Can be any public or private Hugging Face model.
"options": {
"waitForModel": True
}
}
}
}
client.schema.create_class(class_obj)
# ===== import data =====
# Load data
import requests
url = 'https://raw.githubusercontent.com/weaviate-tutorials/quickstart/main/data/jeopardy_tiny.json'
resp = requests.get(url)
data = json.loads(resp.text)
# Configure a batch process
with client.batch(
batch_size=100
) as batch:
# Batch import all Questions
for i, d in enumerate(data):
print(f"importing question: {i+1}")
properties = {
"answer": d["Answer"],
"question": d["Question"],
"category": d["Category"],
}
client.batch.add_data_object(
properties,
"Question",
)
import weaviate, { WeaviateClient, ObjectsBatcher, ApiKey } from 'weaviate-ts-client';
import fetch from 'node-fetch';
const client: WeaviateClient = weaviate.client({
scheme: 'https',
host: 'some-endpoint.weaviate.network', // Replace with your endpoint
apiKey: new ApiKey('YOUR-WEAVIATE-API-KEY'), // Replace w/ your Weaviate instance API key
headers: {'X-HuggingFace-Api-Key': 'YOUR-HUGGINGFACE-API-KEY'}, // Replace with your inference API key
});
// Add the schema
let classObj = {
'class': 'Question',
'vectorizer': 'text2vec-huggingface', // If set to "none" you must always provide vectors yourself. Could be any other "text2vec-*" also.
'moduleConfig': {
'text2vec-huggingface': {
'model': 'sentence-transformers/all-MiniLM-L6-v2', // Can be any public or private Hugging Face model.
'options': {
'waitForModel': true
}
}
}
}
async function addSchema() {
const res = await client.schema.classCreator().withClass(classObj).do();
console.log(res);
}
// Import data function
async function getJsonData() {
const file = await fetch('https://raw.githubusercontent.com/weaviate-tutorials/quickstart/main/data/jeopardy_tiny.json');
return file.json();
}
async function importQuestions() {
// Get the questions directly from the URL
const data = await getJsonData();
// Prepare a batcher
let batcher: ObjectsBatcher = client.batch.objectsBatcher();
let counter = 0;
let batchSize = 100;
for (const question of data) {
// Construct an object with a class and properties 'answer' and 'question'
const obj = {
class: 'Question',
properties: {
answer: question.Answer,
question: question.Question,
category: question.Category,
},
}
// add the object to the batch queue
batcher = batcher.withObject(obj);
// When the batch counter reaches batchSize, push the objects to Weaviate
if (counter++ == batchSize) {
// flush the batch queue
const res = await batcher.do();
console.log(res);
// restart the batch queue
counter = 0;
batcher = client.batch.objectsBatcher();
}
}
// Flush the remaining objects
const res = await batcher.do();
console.log(res);
}
async function run() {
await addSchema();
await importQuestions();
}
await run();
package main
import (
"context"
"encoding/json"
"net/http"
"github.com/weaviate/weaviate-go-client/v4/weaviate"
"github.com/weaviate/weaviate-go-client/v4/weaviate/auth"
"github.com/weaviate/weaviate/entities/models"
)
func main() {
cfg := weaviate.Config{
Host: "some-endpoint.weaviate.network/", // Replace with your endpoint
Scheme: "https",
AuthConfig: auth.ApiKey{Value: "YOUR-WEAVIATE-API-KEY"}, // Replace w/ your Weaviate instance API key
Headers: map[string]string{
"X-HuggingFace-Api-Key": "YOUR-HUGGINGFACE-API-KEY", // Replace with your inference API key
},
}
client, err := weaviate.NewClient(cfg)
if err != nil {
panic(err)
}
// add the schema
classObj := &models.Class{
Class: "Question",
Vectorizer: "text2vec-openai",
ModuleConfig: map[string]interface{}{
"text2vec-huggingface": map[string]interface{}{
"model": "sentence-transformers/paraphrase-MiniLM-L6-v2",
},
},
}
if err := client.Schema().ClassCreator().WithClass(classObj).Do(context.Background()); err != nil {
panic(err)
}
// Retrieve the data
items, err := getJSONdata()
if err != nil {
panic(err)
}
// convert items into a slice of models.Object
objects := make([]*models.Object, len(items))
for i := range items {
objects[i] = &models.Object{
Class: "Question",
Properties: map[string]any{
"category": items[i]["Category"],
"question": items[i]["Question"],
"answer": items[i]["Answer"],
},
}
}
// batch write items
batchRes, err := client.Batch().ObjectsBatcher().WithObjects(objects...).Do(context.Background())
if err != nil {
panic(err)
}
for _, res := range batchRes {
if res.Result.Errors != nil {
panic(res.Result.Errors.Error)
}
}
}
func getJSONdata() ([]map[string]string, error) {
// Retrieve the data
data, err := http.DefaultClient.Get("https://raw.githubusercontent.com/weaviate-tutorials/quickstart/main/data/jeopardy_tiny.json")
if err != nil {
return nil, err
}
defer data.Body.Close()
// Decode the data
var items []map[string]string
if err := json.NewDecoder(data.Body).Decode(&items); err != nil {
return nil, err
}
return items, nil
}
#!/bin/bash
# Requiring `bash` above enables process substitution support, used when redirecting the output of jq to the while loop.
echo "Downloading the data file..."
curl -O -L "https://raw.githubusercontent.com/weaviate-tutorials/quickstart/main/data/jeopardy_tiny.json" --no-progress-meter
# Replace with your endpoint and API keys
WEAVIATE_URL=https://some-endpoint.weaviate.network
WEAVIATE_API_KEY=YOUR_WEAVIATE_API_KEY
HUGGINGFACE_API_KEY=YOUR_HUGGINGFACE_API_KEY
SCHEMA_API_URL="$WEAVIATE_URL/v1/schema"
BATCH_API_URL="$WEAVIATE_URL/v1/batch/objects"
BATCH_SIZE=100
# Send to the batch endpoint valid JSON data (no comments, no newlines - https://github.com/weaviate/weaviate/issues/2745)
function send_data() {
curl --no-progress-meter -X POST "$BATCH_API_URL" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $WEAVIATE_API_KEY" \
-H "X-HuggingFace-Api-Key: $HUGGINGFACE_API_KEY" \
-d "$1" \
-o /dev/null # suppress the output because vectors are long
}
# Uncomment to delete all Question objects if you see a "Name 'Question' already used" error
# curl -X DELETE $SCHEMA_API_URL/Question -H "Authorization: Bearer $WEAVIATE_API_KEY"
echo "Creating the schema. Weaviate's autoschema feature will infer class properties when importing..."
echo '{
"class": "Question",
"vectorizer": "text2vec-huggingface",
"moduleConfig": {
"text2vec-huggingface": {
"model": "sentence-transformers/all-MiniLM-L6-v2",
}
}
}' | curl --no-progress-meter \
-X POST \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $WEAVIATE_API_KEY" \
-d @- \
-o /dev/null \
$SCHEMA_API_URL
# Read the JSON file and loop through its entries
lines_processed=0
batch_data='{"objects": ['
while read -r line; do
# Create the class object out of the JSON data
line=$(echo "$line" | jq '{class: "Question", properties: {answer: .Answer, question: .Question, category: .Category}}')
if [ $lines_processed -eq 0 ]; then
batch_data+=$line
else
batch_data+=",$line"
fi
lines_processed=$((lines_processed + 1))
# If the batch is full, send it to the API
if [ $lines_processed -eq $BATCH_SIZE ]; then
batch_data+="]}"
send_data "$batch_data"
# Reset the batch data and counter
lines_processed=0
batch_data='{"objects": ['
fi
done < <(jq -c '.[]' jeopardy_tiny.json) # process substitution
echo "Sending the remaining data (if any) to the API..."
if [ $lines_processed -ne 0 ]; then
batch_data+="]}"
send_data "$batch_data"
fi
echo "Import finished."
Congratulations, you've successfully built a vector database!
Query Weaviate
Now, we can run queries.
Text similarity search
As we have a text2vec module enabled, Weaviate can perform text-based (nearText) similarity searches.
Try the nearText search shown below, looking for quiz objects related to biology.
- Python
- TypeScript
- Go
- Curl
import weaviate
import json
client = weaviate.Client(
url = "https://some-endpoint.weaviate.network", # Replace with your endpoint
auth_client_secret=weaviate.AuthApiKey(api_key="YOUR-WEAVIATE-API-KEY"), # Replace w/ your Weaviate instance API key
additional_headers = {
"X-HuggingFace-Api-Key": "YOUR-HUGGINGFACE-API-KEY" # Replace with your inference API key
}
)
nearText = {"concepts": ["biology"]}
response = (
client.query
.get("Question", ["question", "answer", "category"])
.with_near_text(nearText)
.with_limit(2)
.do()
)
print(json.dumps(response, indent=4))
import weaviate, { WeaviateClient, ObjectsBatcher, ApiKey } from 'weaviate-ts-client';
import fetch from 'node-fetch';
const client: WeaviateClient = weaviate.client({
scheme: 'https',
host: 'some-endpoint.weaviate.network', // Replace with your endpoint
apiKey: new ApiKey('YOUR-WEAVIATE-API-KEY'), // Replace w/ your Weaviate instance API key
headers: {'X-HuggingFace-Api-Key': 'YOUR-HUGGINGFACE-API-KEY'}, // Replace with your inference API key
});
async function nearTextQuery() {
const res = await client.graphql
.get()
.withClassName('Question')
.withFields('question answer category')
.withNearText({concepts: ['biology']})
.withLimit(2)
.do();
console.log(JSON.stringify(res, null, 2));
return res
}
await nearTextQuery();
package main
import (
"context"
"fmt"
"github.com/weaviate/weaviate-go-client/v4/weaviate"
"github.com/weaviate/weaviate-go-client/v4/weaviate/auth"
"github.com/weaviate/weaviate-go-client/v4/weaviate/graphql"
)
func main() {
cfg := weaviate.Config{
Host: "some-endpoint.weaviate.network/", // Replace with your endpoint
Scheme: "https",
AuthConfig: auth.ApiKey{Value: "YOUR-WEAVIATE-API-KEY"}, // Replace w/ your Weaviate instance API key
Headers: map[string]string{"X-HuggingFace-Api-Key": "YOUR-HUGGINGFACE-API-KEY"}, // Replace with your inference API key
}
client, err := weaviate.NewClient(cfg)
if err != nil {
panic(err)
}
fields := []graphql.Field{
{Name: "question"},
{Name: "answer"},
{Name: "category"},
}
nearText := client.GraphQL().
NearTextArgBuilder().
WithConcepts([]string{"biology"})
result, err := client.GraphQL().Get().
WithClassName("Question").
WithFields(fields...).
WithNearText(nearText).
WithLimit(2).
Do(context.Background())
if err != nil {
panic(err)
}
fmt.Printf("%v", result)
}
echo '{
"query": "{
Get{
Question (
limit: 2
nearText: {
concepts: [\"biology\"],
}
){
question
answer
category
}
}
}"
}' | tr -d "\n" | curl \
-X POST \
-H 'Content-Type: application/json' \
-H "X-HuggingFace-Api-Key: YOUR_HUGGINGFACE_API_KEY" \
-d @- \
https://some-endpoint.weaviate.network/v1/graphql # Replace this with your endpoint
You should see a result like this (may vary depending on the model used):
{
"data": {
"Get": {
"Question": [
{
"answer": "DNA",
"category": "SCIENCE",
"question": "In 1953 Watson & Crick built a model of the molecular structure of this, the gene-carrying substance"
},
{
"answer": "Liver",
"category": "SCIENCE",
"question": "This organ removes excess glucose from the blood & stores it as glycogen"
}
]
}
}
}
Notice that even though the word biology does not appear anywhere in the dataset, Weaviate returns biology-related entries.
This example shows why vector searches are powerful. Vectorized data objects allow for searches based on degrees of similarity, as shown here.
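If you imported objects with custom vectors, you can also search with a raw vector. The following is a hedged sketch (not part of the original tutorial) of a nearVector query; the query vector shown is a placeholder and would normally come from the same model as your object vectors:

```python
# Sketch: similarity search with a raw query vector instead of text
query_vector = [0.1, 0.2, 0.3]  # placeholder; use a real 384-dimensional all-MiniLM-L6-v2 vector
response = (
    client.query
    .get("Question", ["question", "answer", "category"])
    .with_near_vector({"vector": query_vector})
    .with_limit(2)
    .do()
)
print(json.dumps(response, indent=4))
```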
Recap
Well done. You have:
- Created your own cloud-based vector database with Weaviate,
- Populated it with data objects,
  - Using an inference API, or
  - Using custom vectors, and
- Performed a text similarity search.
Where you go next is up to you. We include a few links below, or you can check out the sidebar.
Note: Sandbox expiry & options
The sandbox is free, but it will expire after 14 days. After this time, all data in the sandbox will be deleted.
If you would like to preserve your sandbox data, you can retrieve your data, or contact us to upgrade to a production SaaS instance.
Troubleshooting
We provide answers to some common questions and potential issues below.
Confirm class creation
If you are not sure whether the class has been created, you can confirm it by visiting the schema endpoint (replace the URL with your actual endpoint):
https://some-endpoint.weaviate.network/v1/schema
Expected response
You should see:
{
"classes": [
{
"class": "Question",
... // truncated additional information here
"vectorizer": "text2vec-huggingface"
}
]
}
The schema should indicate that the Question class has been added.
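If you prefer to check from code, here is a minimal sketch using the Python client from the connection step:

```python
# Sketch: fetch the schema and confirm the Question class exists
schema = client.schema.get()
print([c["class"] for c in schema["classes"]])  # should include "Question"
```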
Weaviate uses a combination of RESTful and GraphQL APIs: RESTful endpoints can be used to add data or obtain information about the Weaviate instance, while the GraphQL interface is used to retrieve data.
If you see Error: Name 'Question' already used as a name for an Object class
You may see this error if you try to create a class that already exists in your instance of Weaviate. In this case, you can delete the class following the instructions below.
If your Weaviate instance contains data you want removed, you can manually delete the unwanted class(es).
Know that deleting a class will also delete all associated objects!
Do not do this to a production database, or anywhere where you do not wish to delete your data.
Run the code below to delete the relevant class and its objects.
- Python
- TypeScript
- Go
- Curl
# delete class "YourClassName" - THIS WILL DELETE ALL DATA IN THIS CLASS
client.schema.delete_class("YourClassName") # Replace with your class name - e.g. "Question"
var className: string = 'YourClassName'; // Replace with your class name
client.schema
.classDeleter()
.withClassName(className)
.do()
.then((res: any) => {
console.log(res);
})
.catch((err: Error) => {
console.error(err)
});
className := "YourClassName"
// delete the class
if err := client.Schema().ClassDeleter().WithClassName(className).Do(context.Background()); err != nil {
// Weaviate will return a 400 if the class does not exist, so this is allowed, only return an error if it's not a 400
if status, ok := err.(*fault.WeaviateClientError); ok && status.StatusCode != http.StatusBadRequest {
panic(err)
}
}
curl \
-X DELETE \
https://some-endpoint.weaviate.network/v1/schema/YourClassName
Confirm data import
To confirm successful data import, navigate to the objects endpoint to check that all objects have been imported (replace with your actual endpoint):
https://some-endpoint.weaviate.network/v1/objects
You should see:
{
"deprecations": null,
"objects": [
... // Details of each object
],
"totalResults": 10 // You should see 10 results here
}
You should be able to confirm that you have imported all 10 objects.
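Alternatively, here is a minimal sketch of checking the object count with the Python client (assumes the client and json imports from earlier):

```python
# Sketch: count the objects in the Question class
response = client.query.aggregate("Question").with_meta_count().do()
print(json.dumps(response, indent=4))  # the meta count should be 10
```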
Next
You can choose your direction from here. For example, you can:
- Go through our guided Tutorials, for example on how to build schemas, import data, query data, and more.
- Find out how to do specific things.
- Read about important concepts/theory about Weaviate.
- Read our references.
More Resources
If you can't find the answer to your question here, please look at the:
- Frequently Asked Questions,
- Knowledge base of old issues,
- Stackoverflow, for questions,
- Weaviate Community Forum, for more involved discussion, or
- Our Slack channel.