How to define a schema
Overview
A schema in Weaviate is the blueprint that defines the data structure for each class of objects. A class is a collection of objects of the same type.
In this section, you will learn how to define a schema and gain insight into some key considerations while doing so.
How to define a schema
As you learned earlier, a schema definition includes a great deal of information. Let's cover a few of those properties in this section, starting with:
- The metadata, such as its name (class),
- Its data properties,
- The vectorizer, and
- Module configurations (moduleConfig).
Metadata definition
You can define a name and a description for each class and property.
For classes, these are called:
- class (required), and
- description (optional).
For properties, these are called:
- name (required), and
- description (optional).
In defining a class, the only required parameter is class, as the rest can be inferred by Weaviate. However, it is recommended to include a description for each class and property, as this will help you and others understand the data structure.
To define a class, you can use this syntax.
- Python
# Assumes `client` is the Weaviate client object instantiated earlier.
class_obj = {
    "class": "Article",
}
client.schema.create_class(class_obj)
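The example above sets only the required class key. As a minimal sketch of the recommendation to add descriptions, the snippet below builds a class definition with a description at the class level and on one property. The helper name describe_class and the description strings are hypothetical, introduced only for illustration; the data type on the property is explained in the next part of this section.

```python
def describe_class(name, description, properties=None):
    """Build a class definition dict that carries a description.

    Hypothetical helper for illustration; not part of the Weaviate client.
    """
    class_obj = {"class": name, "description": description}
    if properties:
        class_obj["properties"] = properties
    return class_obj


article = describe_class(
    "Article",
    "A news article",
    properties=[
        {
            "name": "title",
            "description": "Headline of the article",
            "dataType": ["text"],
        },
    ],
)
print(article)
```

The resulting dictionary has the same shape as class_obj above, so it could be passed to client.schema.create_class in the same way.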
Properties with data types
Each class definition will include one or more properties, which must have a data type. If you do not specify a data type, Weaviate will automatically assign one based on your data. But for more predictable results, we recommend that you manually specify them in the schema if possible.
Currently, Weaviate data type support includes the following types:
Available data types in Weaviate
Weaviate Type | Exact Data Type | Formatting | Note |
---|---|---|---|
text | string | string | |
text[] | list of strings | ["string one", "string two"] | |
object | object | {"child": "I'm nested!"} | Available from 1.22 |
object[] | list of objects | [{"child": "I'm nested!"}, {"child": "I'm nested too!"}] | Available from 1.22 |
int | int64 (see note) | 0 | |
int[] | list of int64 (see note) | [0, 1] | |
boolean | boolean | true / false | |
boolean[] | list of booleans | [true, false] | |
number | float64 | 0.0 | |
number[] | list of float64 | [0.0, 1.1] | |
date | string | more info | |
date[] | list of string | more info | |
uuid | string | "c8f8176c-6f9b-5461-8ab3-f3c7ce8c2f5c" | |
uuid[] | list of strings | ["c8f8176c-6f9b-5461-8ab3-f3c7ce8c2f5c", "36ddd591-2dee-4e7e-a3cc-eb86d30a4303"] | |
geoCoordinates | string | more info | |
phoneNumber | string | more info | |
blob | base64 encoded string | more info | |
cross reference | string | more info | |
Deprecated types
Weaviate Type | Exact Data Type | Formatting | Deprecated from |
---|---|---|---|
string | string | "string" | v1.19 |
string[] | list of strings | ["string", "second string"] | v1.19 |
Note that most data types can hold either a single instance or an array of instances, such as text or text[].
- Python
class_obj = {
    "class": "Article",
    "properties": [
        {
            "name": "title",
            "dataType": ["text"],
        },
        {
            "name": "body",
            "dataType": ["text"],
        },
        {
            "name": "url",
            "dataType": ["text"],
        },
    ],
}
client.schema.create_class(class_obj)
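The Article example uses only the text type. As an illustrative sketch of how other types from the table might be declared, including an array type, the definition below mixes several of them. The property names (tags, wordCount, rating, isPublished, publishedAt) are hypothetical and not part of the running example; the snippet only builds and inspects a dictionary and does not contact Weaviate.

```python
# Hypothetical class definition mixing several Weaviate data types.
typed_class_obj = {
    "class": "Article",
    "properties": [
        {"name": "title", "dataType": ["text"]},
        {"name": "tags", "dataType": ["text[]"]},        # array of text values
        {"name": "wordCount", "dataType": ["int"]},
        {"name": "rating", "dataType": ["number"]},
        {"name": "isPublished", "dataType": ["boolean"]},
        {"name": "publishedAt", "dataType": ["date"]},
    ],
}

# Map each property name to its declared type for a quick overview.
declared_types = {
    prop["name"]: prop["dataType"][0] for prop in typed_class_obj["properties"]
}
for name, data_type in declared_types.items():
    print(f"{name}: {data_type}")
```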
If you ran the first class creation command and then the one that defines properties, Weaviate will throw an error because the class Article already exists. For the purposes of this section, delete the class by running the following command.
Deleting a class should not be done lightly, as deleting a class will delete all of its objects.
- Python
client.schema.delete_class("Article")
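Because creating a class that already exists raises an error, one way to make a script re-runnable is to check the schema before creating the class. The following is a minimal sketch rather than an official pattern: recreate_class is a hypothetical helper, and it assumes a client object that exposes client.schema.get(), client.schema.delete_class() and client.schema.create_class() as in the v3 Python client used in this section. Keep in mind that it deletes every object in the class.

```python
def recreate_class(client, class_obj):
    """Delete the class if it already exists, then create it again.

    Hypothetical helper for illustration. WARNING: deleting a class
    also deletes all of the objects stored in it.
    """
    class_name = class_obj["class"]
    # schema.get() is assumed to return a dict with a "classes" list.
    existing = {c["class"] for c in client.schema.get().get("classes", [])}
    if class_name in existing:
        client.schema.delete_class(class_name)
    client.schema.create_class(class_obj)
```

Given a connected client, calling recreate_class(client, class_obj) could then replace the separate delete and create steps while you experiment.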
Setting the vectorizer
The vectorizer parameter for the class specifies the Weaviate module that will be used to generate vector embeddings for the class.
For text objects, you would typically select one of the text2vec modules, such as text2vec-cohere, text2vec-huggingface, text2vec-openai, or text2vec-palm.
Modules are enabled at the instance level through its configuration. You can see the list of available modules for your particular instance by running the following command.
- Python
module_metadata = client.get_meta()
module_metadata['modules']
What is a module, exactly?
By now, you've probably seen mentions of Weaviate modules here and there. Modules are optional Weaviate components used to enhance and customize its capabilities.
Weaviate Academy units will generally assume WCD usage, which is pre-configured with a set of modules. We will cover how to enable modules for local instances in another unit, or you can see our Docker installation page.
WCD instances come pre-configured with a number of modules. For example, the response below shows that the text2vec-openai module is available, so we can use it in our schema.
See the JSON response
{
  "generative-openai": {
    "documentationHref": "https://beta.openai.com/docs/api-reference/completions",
    "name": "Generative Search - OpenAI"
  },
  "qna-openai": {
    "documentationHref": "https://beta.openai.com/docs/api-reference/completions",
    "name": "OpenAI Question & Answering Module"
  },
  "ref2vec-centroid": {},
  "text2vec-cohere": {
    "documentationHref": "https://docs.cohere.com/docs/embeddings",
    "name": "Cohere Module"
  },
  "text2vec-huggingface": {
    "documentationHref": "https://huggingface.co/docs/api-inference/detailed_parameters#feature-extraction-task",
    "name": "Hugging Face Module"
  },
  "text2vec-openai": {
    "documentationHref": "https://beta.openai.com/docs/guides/embeddings/what-are-embeddings",
    "name": "OpenAI Module"
  }
}
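To check programmatically whether a module is available before referencing it in a schema, you could inspect the modules key of the metadata response. The snippet below is a minimal sketch that uses a trimmed copy of the response above as sample data, so it runs without a Weaviate instance; is_module_enabled is a hypothetical helper name.

```python
def is_module_enabled(meta, module_name):
    """Return True if `module_name` appears under the "modules" key of `meta`.

    Hypothetical helper; `meta` is the dict returned by client.get_meta().
    """
    return module_name in meta.get("modules", {})


# Trimmed sample of the metadata response shown above.
sample_meta = {
    "modules": {
        "ref2vec-centroid": {},
        "text2vec-openai": {"name": "OpenAI Module"},
    }
}

print(is_module_enabled(sample_meta, "text2vec-openai"))  # True
print(is_module_enabled(sample_meta, "text2vec-palm"))    # False
```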
- Python
class_obj = {
    "class": "Article",
    "properties": [
        {
            "name": "title",
            "dataType": ["text"],
        },
        {
            "name": "body",
            "dataType": ["text"],
        },
        {
            "name": "url",
            "dataType": ["text"],
        },
    ],
    "vectorizer": "text2vec-openai"
}
client.schema.create_class(class_obj)
Note that you can set the vectorizer to none (written as the string "none" in the class definition) if you would prefer to only deal with your own vectors by providing them at import time.
In some cases, you can configure a vectorizer and still upload your own vectors at import time. In this case, you will need to ensure that the vectorizer (e.g. text2vec-cohere) is using the same model as the one you used to generate the vectors, so that the vectors are compatible.
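As a rough sketch of the bring-your-own-vectors case, assuming the v3 Python client used elsewhere in this section: the class is defined with the vectorizer set to the string "none", and each object is created with an explicit vector argument. The helper import_with_vector, the single-property class definition and the three-dimensional vector are illustrative assumptions, not part of the running example.

```python
# Class definition with no vectorizer module; vectors come from the caller.
own_vector_class = {
    "class": "Article",
    "vectorizer": "none",
    "properties": [{"name": "title", "dataType": ["text"]}],
}


def import_with_vector(client, properties, vector, class_name="Article"):
    """Create one object with a caller-supplied vector (hypothetical helper).

    Assumes `client.data_object.create` accepts `data_object`,
    `class_name` and `vector`, as in the v3 Python client.
    """
    return client.data_object.create(
        data_object=properties,
        class_name=class_name,
        vector=vector,
    )


# Usage, given a connected `client`:
# client.schema.create_class(own_vector_class)
# import_with_vector(client, {"title": "Hello"}, [0.1, 0.2, 0.3])
```

Every vector imported into the same class should have the same length and come from the same embedding model, for the compatibility reason described above.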
Class-level module configurations
You can set the moduleConfig parameter at the class level to apply class-wide settings for module behavior. For example, the vectorizer could be configured to set the model used (model), or whether to vectorize the class name (vectorizeClassName).
- Python
class_obj = {
    "class": "Article",
    "moduleConfig": {
        "text2vec-openai": {
            "vectorizeClassName": False,
            "model": "ada",
            "modelVersion": "002",
            "type": "text"
        }
    },
    "properties": [
        {
            "name": "title",
            "dataType": ["text"],
        },
        {
            "name": "body",
            "dataType": ["text"],
        },
        {
            "name": "url",
            "dataType": ["text"],
        },
    ],
    "vectorizer": "text2vec-openai"
}
client.schema.create_class(class_obj)
Property-level module configurations
You can also set the moduleConfig parameter at the property level to set module behavior for each property. For example, you could set whether to vectorize the property name (vectorizePropertyName), or whether to skip the property from vectorization altogether (skip).
In the following example, the skip parameter is set to True for the url property, so that the URL text will be skipped when producing a vector embedding for the object.
- Python
class_obj = {
    "class": "Article",
    "moduleConfig": {
        "text2vec-openai": {
            "vectorizeClassName": False,
            "model": "ada",
            "modelVersion": "002",
            "type": "text"
        }
    },
    "properties": [
        {
            "name": "title",
            "dataType": ["text"],
        },
        {
            "name": "body",
            "dataType": ["text"],
            "moduleConfig": {
                "text2vec-openai": {
                    "skip": False,
                    "vectorizePropertyName": True
                }
            }
        },
        {
            "name": "url",
            "dataType": ["text"],
            "moduleConfig": {
                "text2vec-openai": {
                    "skip": True,
                }
            }
        },
    ],
    "vectorizer": "text2vec-openai"
}
client.schema.create_class(class_obj)
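To double-check which properties will contribute to the vector, you can read the skip flags back out of a class definition. This is a minimal sketch: vectorized_properties is a hypothetical helper, it works on a shortened copy of the definition above, and it assumes that a property without an explicit skip setting is vectorized.

```python
def vectorized_properties(class_obj, module_name):
    """List the names of properties not marked with skip=True.

    Hypothetical helper; assumes a missing "skip" setting means the
    property is included in vectorization.
    """
    names = []
    for prop in class_obj.get("properties", []):
        settings = prop.get("moduleConfig", {}).get(module_name, {})
        if not settings.get("skip", False):
            names.append(prop["name"])
    return names


# Shortened copy of the Article definition, keeping only the skip settings.
article_obj = {
    "class": "Article",
    "properties": [
        {"name": "title", "dataType": ["text"]},
        {
            "name": "body",
            "dataType": ["text"],
            "moduleConfig": {"text2vec-openai": {"skip": False}},
        },
        {
            "name": "url",
            "dataType": ["text"],
            "moduleConfig": {"text2vec-openai": {"skip": True}},
        },
    ],
}

print(vectorized_properties(article_obj, "text2vec-openai"))  # ['title', 'body']
```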
There are other settings that we haven't covered yet - such as the index settings, or cluster settings such as those relating to replication. We'll cover these in other units later on.
Why so many options?
This might all seem very complex, especially if you are new to Weaviate or databases. But these options will directly impact how your data is stored and how it will react to various queries.
We'll ingest some data in the next section, and then you'll see how these options impact the results of your queries.
Review
Review exercise
Do you have a dataset that you are interested in adding to Weaviate?
Try to construct a schema for that dataset based on what you've learned here.
Key takeaways
- A schema in Weaviate serves as a blueprint defining the data structure for each class of objects.
- A class represents a collection of objects of the same type.
- Schema definition includes metadata, data properties, the vectorizer, and module configurations.
- Data properties in a class need to be assigned a specific data type, such as text or number.
- The vectorizer parameter determines which Weaviate module will be used to generate vector embeddings for a class.
- Module configurations at the class and property levels allow customization of module behavior across the entire class or per property, respectively.
Questions and feedback
If you have any questions or feedback, let us know in the user forum.