
How to define a schema

Overview


A schema in Weaviate is the blueprint that defines its data structure for each class of objects. A class is a collection of objects of the same type.

In this section, you will learn how to define a schema and gain insight into some key considerations while doing so.

How to define a schema

As you learned earlier, a schema definition includes a great deal of information. Let's cover a few of those properties in this section, starting with:

  • The metadata such as its name (class),
  • Its data properties,
  • The vectorizer, and
  • Module configurations (moduleConfig).

Metadata definition

You can define a name and a description for each class and each property.

For classes, these are called:

  • class (required), and
  • description (optional).

For properties, these are called:

  • name (required), and
  • description (optional).

In defining a class, the only required parameter is class, as the rest can be inferred by Weaviate. However, it is recommended to include a description for each class and property, as this will help you and others understand the data structure.

To define a class, you can use this syntax.

# Assumes `client` is an already-connected Weaviate client instance.
class_obj = {
    "class": "Article",
}

client.schema.create_class(class_obj)
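
Since descriptions are recommended, the following sketch builds a class definition that carries a description for the class and for one property. The description strings and the `title` property are illustrative assumptions, and the final call is commented out because it needs a live `client`.

```python
# A minimal sketch: a class definition with optional descriptions.
# The description strings and the "title" property are made up for illustration.
class_obj = {
    "class": "Article",  # required
    "description": "A news article with a title.",  # optional
    "properties": [
        {
            "name": "title",  # required for each property
            "dataType": ["text"],
            "description": "The headline of the article.",  # optional
        },
    ],
}

# Sanity check before sending: a class needs "class", each property needs "name".
assert "class" in class_obj
assert all("name" in prop for prop in class_obj["properties"])

# With a connected client:
# client.schema.create_class(class_obj)
```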

Properties with data types

Each class definition will include one or more properties, each of which has a data type. If you do not specify a data type, Weaviate will automatically assign one based on your data. For more predictable results, however, we recommend specifying data types manually in the schema where possible.

Currently, Weaviate data type support includes the following types:

Available data types in Weaviate
| Weaviate Type | Exact Data Type | Formatting | Note |
| --- | --- | --- | --- |
| text | string | string | |
| text[] | list of strings | ["string one", "string two"] | |
| object | object | {"child": "I'm nested!"} | Available from 1.22 |
| object[] | list of objects | [{"child": "I'm nested!"}, {"child": "I'm nested too!"}] | Available from 1.22 |
| int | int64 (see note) | 0 | |
| int[] | list of int64 (see note) | [0, 1] | |
| boolean | boolean | true/false | |
| boolean[] | list of booleans | [true, false] | |
| number | float64 | 0.0 | |
| number[] | list of float64 | [0.0, 1.1] | |
| date | string | more info | |
| date[] | list of string | more info | |
| uuid | string | "c8f8176c-6f9b-5461-8ab3-f3c7ce8c2f5c" | |
| uuid[] | list of strings | ["c8f8176c-6f9b-5461-8ab3-f3c7ce8c2f5c", "36ddd591-2dee-4e7e-a3cc-eb86d30a4303"] | |
| geoCoordinates | string | more info | |
| phoneNumber | string | more info | |
| blob | base64 encoded string | more info | |
| cross reference | string | more info | |

Deprecated types

| Weaviate Type | Exact Data Type | Formatting | Deprecated from |
| --- | --- | --- | --- |
| string | string | "string" | v1.19 |
| string[] | list of strings | ["string", "second string"] | v1.19 |

Note that most data types come in two forms: a single value (such as text) or an array of values (such as text[]).
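
As a rough illustration of the single-value and array forms, the following sketch defines properties of several types together with a matching data object. All property names other than `title` (`tags`, `wordCount`, `rating`, `isPublished`, `publishedAt`) are hypothetical, and the date is written as an RFC 3339 string.

```python
# A sketch of properties using several Weaviate data types.
# All property names here are hypothetical examples.
properties = [
    {"name": "title", "dataType": ["text"]},
    {"name": "tags", "dataType": ["text[]"]},  # array form of text
    {"name": "wordCount", "dataType": ["int"]},
    {"name": "rating", "dataType": ["number"]},
    {"name": "isPublished", "dataType": ["boolean"]},
    {"name": "publishedAt", "dataType": ["date"]},
]

# A data object whose values match the declared types.
article = {
    "title": "Schemas in practice",
    "tags": ["schema", "tutorial"],
    "wordCount": 1200,
    "rating": 4.5,
    "isPublished": True,
    "publishedAt": "2023-01-15T09:30:00+00:00",  # RFC 3339 date string
}

# Every key in the object should correspond to a declared property.
declared = {prop["name"] for prop in properties}
undeclared = set(article) - declared
print(undeclared)  # an empty set means every key is declared
```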

class_obj = {
    "class": "Article",
    "properties": [
        {
            "name": "title",
            "dataType": ["text"],
        },
        {
            "name": "body",
            "dataType": ["text"],
        },
        {
            "name": "url",
            "dataType": ["text"],
        },
    ],
}

client.schema.create_class(class_obj)
Did you get an error?

If you ran both the first class creation command and this one, Weaviate will throw an error because the class Article already exists. For the purposes of this section, delete the class by running the following command.

Deleting a class should not be done lightly, as it also deletes all of the objects in that class.

client.schema.delete_class("Article")
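
Because deletion is destructive, you may want to confirm that the class exists before removing it. The helper below is a sketch, not part of the client library: `class_exists` is a hypothetical function that inspects a dictionary with the shape returned by `client.schema.get()`, which lists classes under a `classes` key.

```python
def class_exists(schema: dict, class_name: str) -> bool:
    """Return True if `class_name` appears in a schema dictionary.

    `schema` is assumed to have the shape returned by client.schema.get(),
    i.e. {"classes": [{"class": "Article", ...}, ...]}.
    """
    return any(c.get("class") == class_name for c in schema.get("classes", []))


# Stand-in for client.schema.get() so the sketch runs without a server.
example_schema = {"classes": [{"class": "Article", "properties": []}]}

print(class_exists(example_schema, "Article"))  # True
print(class_exists(example_schema, "Author"))   # False

# With a connected client:
# if class_exists(client.schema.get(), "Article"):
#     client.schema.delete_class("Article")
```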

Setting the vectorizer

The vectorizer parameter for the class specifies the Weaviate module that will be used to generate vector embeddings for the class.

For text objects, you would typically select one of the text2vec modules - such as text2vec-cohere, text2vec-huggingface, text2vec-openai, or text2vec-palm.

Modules are enabled at the instance level through its configuration. You can see the list of available modules for your particular instance by running the following command.

module_metadata = client.get_meta()
print(module_metadata["modules"])
What is a module, exactly?

By now, you've probably seen mentions of Weaviate modules here and there. Modules are optional Weaviate components used to enhance and customize its capabilities.


Weaviate Academy units will generally assume WCS usage, which is pre-configured with a set of modules. We will cover how to enable modules for local instances in another unit, or you can see our Docker installation page.

WCS instances come pre-configured with a number of modules. For example, the response below shows that the text2vec-openai module is available, so we can use it in our schema.

See the JSON response
{
    "generative-openai": {
        "documentationHref": "https://beta.openai.com/docs/api-reference/completions",
        "name": "Generative Search - OpenAI"
    },
    "qna-openai": {
        "documentationHref": "https://beta.openai.com/docs/api-reference/completions",
        "name": "OpenAI Question & Answering Module"
    },
    "ref2vec-centroid": {},
    "text2vec-cohere": {
        "documentationHref": "https://docs.cohere.com/docs/embeddings",
        "name": "Cohere Module"
    },
    "text2vec-huggingface": {
        "documentationHref": "https://huggingface.co/docs/api-inference/detailed_parameters#feature-extraction-task",
        "name": "Hugging Face Module"
    },
    "text2vec-openai": {
        "documentationHref": "https://beta.openai.com/docs/guides/embeddings/what-are-embeddings",
        "name": "OpenAI Module"
    }
}
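
Rather than reading the response by eye, you could check for a module in code. The sketch below uses an abbreviated stand-in for the modules dictionary; `first_available` is a hypothetical helper written for this illustration, not a Weaviate API.

```python
from typing import Optional


def first_available(modules: dict, candidates: list) -> Optional[str]:
    """Return the first candidate module name present in `modules`, else None."""
    for name in candidates:
        if name in modules:
            return name
    return None


# Abbreviated stand-in for client.get_meta()["modules"].
modules = {
    "ref2vec-centroid": {},
    "text2vec-cohere": {"name": "Cohere Module"},
    "text2vec-openai": {"name": "OpenAI Module"},
}

vectorizer = first_available(modules, ["text2vec-palm", "text2vec-openai"])
print(vectorizer)  # text2vec-openai
```

The returned name can then be used as the value of the `vectorizer` key in a class definition.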
class_obj = {
    "class": "Article",
    "properties": [
        {
            "name": "title",
            "dataType": ["text"],
        },
        {
            "name": "body",
            "dataType": ["text"],
        },
        {
            "name": "url",
            "dataType": ["text"],
        },
    ],
    "vectorizer": "text2vec-openai"
}

client.schema.create_class(class_obj)
Vectorizers and user-provided vectors

Note that you can set the vectorizer to none (the string "none" in the class definition) if you would prefer to work only with your own vectors, providing them at import time.

In some cases, you can configure a vectorizer and still upload your own vectors at import time. In this case, you will need to ensure that the vectorizer (e.g. text2vec-cohere) uses the same model as the one you used to generate your vectors, so that the vectors are compatible.
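
A minimal sketch of the bring-your-own-vectors case follows. It assumes the vectorizer is disabled with the string value `"none"` and that vectors are passed through the `vector` argument of `client.data_object.create`. The three-dimensional vectors are toy values, since real embeddings typically have hundreds or thousands of dimensions.

```python
# Class definition with no vectorizer: vectors must be supplied at import time.
class_obj = {
    "class": "Article",
    "properties": [{"name": "title", "dataType": ["text"]}],
    "vectorizer": "none",
}

# Toy objects paired with toy vectors (illustrative values only).
items = [
    ({"title": "First article"}, [0.1, 0.2, 0.3]),
    ({"title": "Second article"}, [0.4, 0.5, 0.6]),
]

# All vectors in a class must share the same length.
dimensions = {len(vector) for _, vector in items}
assert len(dimensions) == 1, "vectors must all have the same dimensionality"

# With a connected client:
# client.schema.create_class(class_obj)
# for data_object, vector in items:
#     client.data_object.create(data_object, "Article", vector=vector)
```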

Class-level module configurations

You can set the moduleConfig parameter at the class-level to set class-wide settings for module behavior. For example, the vectorizer could be configured to set the model used (model), or whether to vectorize the class name (vectorizeClassName).

class_obj = {
    "class": "Article",
    "moduleConfig": {
        "text2vec-openai": {
            "vectorizeClassName": False,
            "model": "ada",
            "modelVersion": "002",
            "type": "text"
        }
    },
    "properties": [
        {
            "name": "title",
            "dataType": ["text"],
        },
        {
            "name": "body",
            "dataType": ["text"],
        },
        {
            "name": "url",
            "dataType": ["text"],
        },
    ],
    "vectorizer": "text2vec-openai"
}

client.schema.create_class(class_obj)

Property-level module configurations

You can also set the moduleConfig parameter at the property level to set module behavior for each property. For example, you could set whether to vectorize the property name (vectorizePropertyName), or whether to skip the property from vectorization altogether (skip).

In the following example, the skip parameter is set to True for the url property, so that the URL text will be skipped when producing a vector embedding for the object.

class_obj = {
    "class": "Article",
    "moduleConfig": {
        "text2vec-openai": {
            "vectorizeClassName": False,
            "model": "ada",
            "modelVersion": "002",
            "type": "text"
        }
    },
    "properties": [
        {
            "name": "title",
            "dataType": ["text"],
        },
        {
            "name": "body",
            "dataType": ["text"],
            "moduleConfig": {
                "text2vec-openai": {
                    "skip": False,
                    "vectorizePropertyName": True
                }
            }
        },
        {
            "name": "url",
            "dataType": ["text"],
            "moduleConfig": {
                "text2vec-openai": {
                    "skip": True,
                }
            }
        },
    ],
    "vectorizer": "text2vec-openai"
}

client.schema.create_class(class_obj)
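
To see at a glance which properties will contribute to the embedding, you could inspect the `skip` flags in a class definition. `vectorized_properties` below is a hypothetical helper written for this illustration. It only reads the dictionary, and it treats a missing `skip` as `False`, which is an assumption about the default.

```python
def vectorized_properties(class_obj: dict, module: str) -> list:
    """List names of properties whose moduleConfig does not set skip=True."""
    names = []
    for prop in class_obj.get("properties", []):
        config = prop.get("moduleConfig", {}).get(module, {})
        if not config.get("skip", False):
            names.append(prop["name"])
    return names


# Trimmed version of a class definition with property-level settings.
class_obj = {
    "class": "Article",
    "properties": [
        {"name": "title", "dataType": ["text"]},
        {
            "name": "body",
            "dataType": ["text"],
            "moduleConfig": {"text2vec-openai": {"skip": False}},
        },
        {
            "name": "url",
            "dataType": ["text"],
            "moduleConfig": {"text2vec-openai": {"skip": True}},
        },
    ],
    "vectorizer": "text2vec-openai",
}

print(vectorized_properties(class_obj, "text2vec-openai"))  # ['title', 'body']
```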
But wait, what about the other options?

There are other settings that we haven't covered yet - such as the index settings, or cluster settings such as those relating to replication. We'll cover these in other units later on.

Why so many options?

This might all seem very complex, especially if you are new to Weaviate or to databases in general. However, these options directly affect how your data is stored and how it responds to queries.

We'll ingest some data in the next section, and then you'll see how these options impact the results of your queries.

Review

Review exercise

Exercise

Do you have a dataset that you are interested in adding to Weaviate?

Try to construct a schema for that dataset based on what you've learned here.
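
One way to start the exercise is to draft property definitions from a sample record and then refine them by hand. The helper below is a sketch under simple assumptions: `draft_properties` is a hypothetical function, it handles only a few flat Python types, and it does not detect dates or other string-encoded types. The `Book` class and the sample record are made up for illustration.

```python
def draft_properties(record: dict) -> list:
    """Draft Weaviate property definitions from one sample record."""

    def weaviate_type(value) -> str:
        # bool must be tested before int, because bool is a subclass of int.
        if isinstance(value, bool):
            return "boolean"
        if isinstance(value, int):
            return "int"
        if isinstance(value, float):
            return "number"
        if isinstance(value, str):
            return "text"
        if isinstance(value, list) and value:
            # Use the first element to choose the array type (flat lists only).
            return weaviate_type(value[0]) + "[]"
        raise TypeError(f"unsupported value: {value!r}")

    return [
        {"name": name, "dataType": [weaviate_type(value)]}
        for name, value in record.items()
    ]


# Hypothetical sample record from your own dataset.
sample = {"title": "Hello", "pages": 12, "score": 0.8, "tags": ["a", "b"]}

class_obj = {"class": "Book", "properties": draft_properties(sample)}
for prop in class_obj["properties"]:
    print(prop["name"], prop["dataType"])
```

Review the drafted types before creating the class, since a sample record cannot tell you whether a string should be stored as text, date, or uuid.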

Key takeaways

  • A schema in Weaviate serves as a blueprint defining the data structure for each class of objects.
  • A class represents a collection of objects of the same type.
  • Schema definition includes metadata, data properties, the vectorizer, and module configurations.
  • Data properties in a class need to be assigned a specific data type, such as text or number.
  • The vectorizer parameter determines which Weaviate module will be used to generate vector embeddings for a class.
  • Module configurations at the class and property levels allow customization of module behavior across the entire class or per property, respectively.

Questions and feedback

If you have any questions or feedback, please let us know on our forum. For example, you can: