Schemas in detail
Overview
In this section, we will explore schema construction, including discussing some of the more commonly specified parameters. We will also discuss the auto-schema feature and why you might want to take the time to manually set the schema.
Prerequisites
We recommend you complete the Quickstart tutorial first.
Before you start this tutorial, follow the steps in the earlier tutorials so that you have:
- A new instance of Weaviate running (e.g. on the Weaviate Cloud Services),
- An API key for your preferred inference API, such as OpenAI, Cohere, or Hugging Face, and
- Your preferred Weaviate client library installed.
If you have completed the entire Quickstart tutorial, your Weaviate instance will contain data objects and a schema. We recommend deleting the Question class before starting this section. See below for details on how to do so:
Deleting classes
You can delete any unwanted class(es), along with the data that they contain.
Note that deleting a class will also delete all associated objects!
Do not do this to a production database, or anywhere where you do not wish to delete your data.
Run the code below to delete the relevant class and its objects.
- Python
- JavaScript/TypeScript
- Go
- Curl
# delete class "YourClassName" - THIS WILL DELETE ALL DATA IN THIS CLASS
client.schema.delete_class("YourClassName") # Replace with your class name - e.g. "Question"
const className: string = 'YourClassName'; // Replace with your class name
await client.schema
.classDeleter()
.withClassName(className)
.do();
className := "YourClassName"
// delete the class
if err := client.Schema().ClassDeleter().WithClassName(className).Do(context.Background()); err != nil {
	// Weaviate will return a 400 if the class does not exist, so this is allowed; only return an error if it's not a 400
	if status, ok := err.(*fault.WeaviateClientError); ok && status.StatusCode != http.StatusBadRequest {
		panic(err)
	}
}
curl \
-X DELETE \
https://some-endpoint.weaviate.network/v1/schema/YourClassName
Introduction
What is a schema?
Weaviate's schema defines its data structure in a formal language. In other words, it is a blueprint of how the data is to be organized and stored.
The schema defines data classes (i.e. collections of objects), the properties within each class (name, type, description, settings), possible graph links between data objects (cross-references), and the vectorizer module (if any) to be used for the class, as well as settings such as module and index configurations.
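To make these pieces concrete, the following is a minimal sketch of a schema written as a plain Python dictionary, with no Weaviate connection involved. The `Category` class, the `hasCategory` and `title` properties, and the `cross_references` helper are hypothetical names introduced only for illustration. The helper assumes the convention that class names start with a capital letter, so a `dataType` entry beginning with an uppercase letter is treated as a cross-reference.

```python
# A schema sketched as a plain dictionary (no Weaviate connection needed).
# "Category", "hasCategory" and "title" are hypothetical names used for illustration.
schema = {
    "classes": [
        {
            "class": "Question",
            "vectorizer": "text2vec-openai",
            "properties": [
                {"name": "question", "dataType": ["text"]},
                {"name": "answer", "dataType": ["text"]},
                {"name": "hasCategory", "dataType": ["Category"]},
            ],
        },
        {
            "class": "Category",
            "vectorizer": "none",
            "properties": [{"name": "title", "dataType": ["text"]}],
        },
    ]
}


def cross_references(schema):
    """Return (class, property, target class) for each cross-reference."""
    links = []
    for cls in schema["classes"]:
        for prop in cls.get("properties", []):
            for data_type in prop["dataType"]:
                # Class names start with a capital letter; primitive types do not.
                if data_type[:1].isupper():
                    links.append((cls["class"], prop["name"], data_type))
    return links


print(cross_references(schema))
```

Here the only cross-reference is the link from `Question` to `Category`; every other property holds a primitive `text` value.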
Quickstart recap
In the Quickstart tutorial, you saw how to specify the name and the vectorizer for a data collection, called a "class" in Weaviate:
- Python
- JavaScript/TypeScript
- Go
- Curl
class_obj = {
    "class": "Question",
    "vectorizer": "text2vec-openai",  # If set to "none" you must always provide vectors yourself. Could be any other "text2vec-*" also.
    "moduleConfig": {
        "text2vec-openai": {},
        "generative-openai": {}  # Ensure the `generative-openai` module is used for generative queries
    }
}

client.schema.create_class(class_obj)
const classObj = {
  'class': 'Question',
  'vectorizer': 'text2vec-openai',  // If set to "none" you must always provide vectors yourself. Could be any other "text2vec-*" also.
  'moduleConfig': {
    'text2vec-openai': {},
    'generative-openai': {}  // Ensure the `generative-openai` module is used for generative queries
  },
};

async function addSchema() {
  const res = await client.schema.classCreator().withClass(classObj).do();
  console.log(res);
}

await addSchema();
package main

import (
	"context"

	"github.com/weaviate/weaviate-go-client/v4/weaviate"
	"github.com/weaviate/weaviate-go-client/v4/weaviate/auth"
	"github.com/weaviate/weaviate/entities/models"
)

func main() {
	cfg := weaviate.Config{
		Host:       "some-endpoint.weaviate.network/", // Replace with your endpoint
		Scheme:     "https",
		AuthConfig: auth.ApiKey{Value: "YOUR-WEAVIATE-API-KEY"}, // Replace w/ your Weaviate instance API key
		Headers: map[string]string{
			"X-OpenAI-Api-Key": "YOUR-OPENAI-API-KEY", // Replace with your inference API key
		},
	}
	client, err := weaviate.NewClient(cfg)
	if err != nil {
		panic(err)
	}

	classObj := &models.Class{
		Class:      "Question",
		Vectorizer: "text2vec-openai", // If set to "none" you must always provide vectors yourself. Could be any other "text2vec-*" also.
		ModuleConfig: map[string]interface{}{
			"text2vec-openai":   map[string]interface{}{},
			"generative-openai": map[string]interface{}{},
		},
	}

	// add the schema
	err = client.Schema().ClassCreator().WithClass(classObj).Do(context.Background())
	if err != nil {
		panic(err)
	}
}
echo '{
"class": "Question",
"vectorizer": "text2vec-openai",
"moduleConfig": {
"text2vec-openai": {},
"generative-openai": {}
}
}' | curl \
-X POST \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $WEAVIATE_API_KEY" \
-d @- \
https://some-endpoint.weaviate.network/v1/schema
Then, when you navigated to the schema endpoint at https://some-endpoint.weaviate.network/v1/schema, you will have seen the above-specified class name and the vectorizer.
But you might have also noticed that the schema included a whole lot of information that you did not specify.
That's because Weaviate inferred it for you, using the "auto-schema" feature.
Auto-schema vs. manual schema
Weaviate requires a complete schema for each class of data objects.
If any required information is missing, Weaviate will use the auto-schema feature to infer the rest from the data being imported as well as the default settings.
While this may be suitable in some circumstances, in many cases you may wish to explicitly define a schema. Manually defining the schema will help you ensure that the schema is suited for your specific data and needs.
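The idea of inferring property types from imported data can be sketched in a few lines of Python. This is a toy illustration only, not Weaviate's actual auto-schema logic: the function names are hypothetical, and the mapping used here (booleans to `boolean`, numbers to `number`, ISO 8601 timestamps to `date`, other strings to `text`) is a simplifying assumption.

```python
from datetime import datetime


def infer_data_type(value):
    """Guess a Weaviate-style dataType name for a single Python value.

    Toy mapping for illustration; not Weaviate's real inference rules.
    """
    # bool must be checked before int/float because bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        try:
            # Treat strings that parse as ISO 8601 timestamps as dates.
            datetime.fromisoformat(value)
            return "date"
        except ValueError:
            return "text"
    raise TypeError(f"unsupported value type: {type(value).__name__}")


def infer_properties(data_object):
    """Build a list of property definitions from one example object."""
    return [
        {"name": name, "dataType": [infer_data_type(value)]}
        for name, value in data_object.items()
    ]


# Hypothetical example object; "points" and "airDate" are illustrative fields.
example = {
    "question": "This organ removes excess glucose from the blood",
    "answer": "Liver",
    "points": 200,
    "airDate": "1990-01-01T00:00:00+00:00",
}
print(infer_properties(example))
```

A guess like this can easily be wrong for your data (for instance, an identifier that happens to look numeric), which is one reason to define the schema explicitly.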
Create a class
A collection of data in Weaviate is called a "class". We will be adding a class to store our quiz data.
About classes
Here are some key considerations about classes:
Each Weaviate class:
- Is always written with a capital letter first. This distinguishes class names from primitive data type names when cross-referencing.
- Constitutes a distinct vector space. A search in Weaviate is always restricted to a class.
- Can have its own vectorizer (e.g. one class can have a text2vec-openai vectorizer, and another might have a multi2vec-clip vectorizer, or none if you do not intend on using a vectorizer).
- Has property values, where each property specifies the data type to store.
You can also bring your own vectors and pass them to Weaviate directly. See this reference for more information.
Create a basic class
Let's create a class called Question for our data.
Our Question class will:
- Contain three properties:
  - name answer: type text
  - name question: type text
  - name category: type text
- Use a text2vec-openai vectorizer
Run the code below with your client to define the schema for the Question class and display the created schema information.
- Python
- JavaScript/TypeScript
import weaviate
import json

client = weaviate.Client("https://some-endpoint.weaviate.network/")  # Replace with your endpoint

# we will create the class "Question"
class_obj = {
    "class": "Question",
    "description": "Information from a Jeopardy! question",  # description of the class
    "properties": [
        {
            "dataType": ["text"],
            "description": "The question",
            "name": "question",
        },
        {
            "dataType": ["text"],
            "description": "The answer",
            "name": "answer",
        },
        {
            "dataType": ["text"],
            "description": "The category",
            "name": "category",
        },
    ],
    "vectorizer": "text2vec-openai",
}

# add the schema
client.schema.create_class(class_obj)

# get the schema
schema = client.schema.get()

# print the schema
print(json.dumps(schema, indent=4))
import weaviate from 'weaviate-ts-client';

const client = weaviate.client({
  scheme: 'https',
  host: 'some-endpoint.weaviate.network',  // Replace with your endpoint
});

// Define the 'Question' class
const classObj = {
  class: 'Question',
  description: 'Information from a Jeopardy! question',  // description of the class
  properties: [
    {
      dataType: ['text'],
      description: 'The question',
      name: 'question',
    },
    {
      dataType: ['text'],
      description: 'The answer',
      name: 'answer',
    },
    {
      dataType: ['text'],
      description: 'The category',
      name: 'category',
    },
  ],
  vectorizer: 'text2vec-openai',
};

// Add the class to the schema
await client
  .schema
  .classCreator()
  .withClass(classObj)
  .do();

// Get and print the schema
const classInSchema = await client.schema
  .getter()
  .do();

console.log(JSON.stringify(classInSchema, null, 2));
Classes always start with a capital letter. Properties always begin with a small letter. You can use CamelCase class names, and property names allow underscores. Read more about schema classes, properties and data types here.
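These naming rules can be checked before a class definition is sent to Weaviate. The sketch below is a minimal, client-independent check; the two regular expressions are assumptions derived from the rules stated above (they are not copied from Weaviate's own validation), and `check_names` is a hypothetical helper.

```python
import re

# Assumed patterns based on the naming rules described in this tutorial,
# not taken from Weaviate's source code.
CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9_]*$")
PROPERTY_NAME = re.compile(r"^[a-z][A-Za-z0-9_]*$")


def check_names(class_obj):
    """Return a list of naming problems found in a class definition."""
    problems = []
    if not CLASS_NAME.match(class_obj["class"]):
        problems.append(
            f"class name {class_obj['class']!r} should start with a capital letter"
        )
    for prop in class_obj.get("properties", []):
        if not PROPERTY_NAME.match(prop["name"]):
            problems.append(
                f"property name {prop['name']!r} should start with a lowercase letter"
            )
    return problems


# A definition with one badly named class and one badly named property.
print(check_names({
    "class": "question",
    "properties": [{"name": "Answer"}, {"name": "air_date"}],
}))
```

An empty list means the names follow the conventions as this sketch understands them; the server remains the final authority on what it accepts.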
The result should look something like this:
See the returned schema
{
  "classes": [
    {
      "class": "Question",
      "description": "Information from a Jeopardy! question",
      "invertedIndexConfig": {
        "bm25": {
          "b": 0.75,
          "k1": 1.2
        },
        "cleanupIntervalSeconds": 60,
        "stopwords": {
          "additions": null,
          "preset": "en",
          "removals": null
        }
      },
      "moduleConfig": {
        "text2vec-openai": {
          "model": "ada",
          "modelVersion": "002",
          "type": "text",
          "vectorizeClassName": true
        }
      },
      "properties": [
        {
          "dataType": [
            "text"
          ],
          "description": "The question",
          "moduleConfig": {
            "text2vec-openai": {
              "skip": false,
              "vectorizePropertyName": false
            }
          },
          "name": "question",
          "tokenization": "word"
        },
        {
          "dataType": [
            "text"
          ],
          "description": "The answer",
          "moduleConfig": {
            "text2vec-openai": {
              "skip": false,
              "vectorizePropertyName": false
            }
          },
          "name": "answer",
          "tokenization": "word"
        },
        {
          "dataType": [
            "text"
          ],
          "description": "The category",
          "moduleConfig": {
            "text2vec-openai": {
              "skip": false,
              "vectorizePropertyName": false
            }
          },
          "name": "category",
          "tokenization": "word"
        }
      ],
      "replicationConfig": {
        "factor": 1
      },
      "shardingConfig": {
        "virtualPerPhysical": 128,
        "desiredCount": 1,
        "actualCount": 1,
        "desiredVirtualCount": 128,
        "actualVirtualCount": 128,
        "key": "_id",
        "strategy": "hash",
        "function": "murmur3"
      },
      "vectorIndexConfig": {
        "skip": false,
        "cleanupIntervalSeconds": 300,
        "maxConnections": 64,
        "efConstruction": 128,
        "ef": -1,
        "dynamicEfMin": 100,
        "dynamicEfMax": 500,
        "dynamicEfFactor": 8,
        "vectorCacheMaxObjects": 1000000000000,
        "flatSearchCutoff": 40000,
        "distance": "cosine"
      },
      "vectorIndexType": "hnsw",
      "vectorizer": "text2vec-openai"
    }
  ]
}
We get back a lot of information here.
Some of it is what we specified, such as the class name (class), and properties including their dataType and name. But the others are inferred by Weaviate based on the defaults and the data provided.
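One way to see which settings were filled in for you is to compare the class definition you submitted with the one returned by the schema endpoint. The sketch below does this on plain dictionaries; `inferred_keys` is a hypothetical helper, and the `returned` dictionary is a trimmed, hand-copied excerpt of the output above rather than a live response.

```python
def inferred_keys(submitted, returned, prefix=""):
    """List dotted paths that exist in `returned` but not in `submitted`.

    Only nested dictionaries are compared; lists such as "properties"
    are not descended into by this sketch.
    """
    paths = []
    for key, value in returned.items():
        path = f"{prefix}{key}"
        if key not in submitted:
            paths.append(path)
        elif isinstance(value, dict) and isinstance(submitted[key], dict):
            paths.extend(inferred_keys(submitted[key], value, prefix=f"{path}."))
    return paths


# What we sent (abridged) ...
submitted = {
    "class": "Question",
    "vectorizer": "text2vec-openai",
    "invertedIndexConfig": {"bm25": {"k1": 1.5}},
}

# ... and a trimmed excerpt of what a schema response might contain.
returned = {
    "class": "Question",
    "vectorizer": "text2vec-openai",
    "invertedIndexConfig": {
        "bm25": {"b": 0.75, "k1": 1.5},
        "cleanupIntervalSeconds": 60,
    },
    "vectorIndexType": "hnsw",
}

for path in inferred_keys(submitted, returned):
    print(path)
```

Each printed path names a setting that was not in the submitted definition and was therefore supplied by defaults or inference.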
Class property specification examples
Depending on your needs, you might want to change any number of these. For example, you might change:
- dataType to modify the type of data being saved. For example, properties with dataType text will be tokenized differently to those with dataType string (read more).
- moduleConfig to modify how each module behaves. In this case, you could change the model and/or version for the OpenAI inference API, and the vectorization behavior such as whether the class name is used for vectorization.
- properties / moduleConfig to further modify module behavior at a class data property level. For example, you might choose to exclude a particular property from vectorization.
- invertedIndexConfig to add or remove particular stopwords, or change BM25 indexing constants.
- vectorIndexConfig to change vector index (e.g. HNSW) parameters, such as for speed / recall tradeoffs.
So for example, you might specify a schema like the one below:
{
  "class": "Question",
  "description": "Information from a Jeopardy! question",
  "moduleConfig": {
    "text2vec-openai": {
      "vectorizeClassName": false  // Default: true
    }
  },
  "invertedIndexConfig": {
    "bm25": {
      "k1": 1.5,  // Default: 1.2
      "b": 0.75
    }
  },
  "properties": [
    {
      "dataType": ["text"],
      "description": "The question",
      "moduleConfig": {
        "text2vec-openai": {
          "vectorizePropertyName": true  // Default: false
        }
      },
      "name": "question"
    },
    ...
  ]
}
With this you will have changed the specified properties from their defaults. Note that in the rest of the tutorials, we assume that you have not done this.
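If you keep a base class definition in code, overrides like these can be layered on top of it instead of being retyped. The following is a minimal sketch of that approach in Python; `with_overrides` is a hypothetical helper (a plain recursive dictionary merge), and the default values in `base_class` are copied from the returned schema shown earlier.

```python
import copy
import json


def with_overrides(base, overrides):
    """Return a copy of `base` with `overrides` merged in, recursing into dicts."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = with_overrides(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


base_class = {
    "class": "Question",
    "description": "Information from a Jeopardy! question",
    "vectorizer": "text2vec-openai",
    "moduleConfig": {"text2vec-openai": {"vectorizeClassName": True}},
    "invertedIndexConfig": {"bm25": {"k1": 1.2, "b": 0.75}},
}

# Change only the settings that should differ from the defaults.
custom_class = with_overrides(
    base_class,
    {
        "moduleConfig": {"text2vec-openai": {"vectorizeClassName": False}},
        "invertedIndexConfig": {"bm25": {"k1": 1.5}},
    },
)

print(json.dumps(custom_class, indent=2))
```

The merged dictionary has the same shape as the class objects used earlier in this tutorial, so it could be passed to the same class creation call. The base definition is left untouched, which makes it easy to compare the two.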
You can read more about various schema, data types, modules, and index configuration options in the pages below.
Recap
- The schema is where you define the structure of the information to be saved.
- A schema consists of classes and properties, which define concepts.
- Any unspecified setting is inferred by the auto-schema feature based on the data and defaults.
- The schema can be modified through the RESTful API.
- A class or property in Weaviate is immutable, but can always be extended.
Suggested reading
- Reference: schema endpoint RESTful API
- Tutorial: Import in detail
- Tutorial: Queries in detail
- Tutorial: Introduction to modules
- Tutorial: Introduction to Weaviate Console