How to define a schema
Overview
This tutorial walks through an example of creating a schema in Weaviate.
By the end of this tutorial, you should have a good idea of how to create a schema, why it is important, and where to find the information required for schema definition.
Key points
- A schema consists of classes and properties, which define concepts.
- Words in the schema (names of classes and properties) must be part of the text2vec-contextionary.
- The schema can be modified through the RESTful API. Python, JavaScript and Go clients are available.
- A class or property in Weaviate becomes immutable once it is created, but the schema can always be extended.
- Learn about Concepts, Classes, Properties and dataTypes in the API reference guide.
Prerequisites
We recommend reading the Quickstart tutorial before tackling this tutorial.
For the tutorial, you will need a Weaviate instance running with the text2vec-contextionary module. We assume your instance is running at http://localhost:8080.
What is a schema?
Weaviate's schema defines its data structure in a formal language. In other words, it is a blueprint of how the data is to be organized and stored. For example, classes of data objects and properties within each class are defined in the schema. The schema also specifies the data type of each class property, possible graph links between data objects, and the vectorizer module to be used for each class.
If you begin to import data without having defined a schema, Weaviate's auto-schema feature is triggered and Weaviate creates a schema for you.
While this may be suitable in some circumstances, in many cases you may wish to explicitly define a schema. Manually defining the schema will help you ensure that the schema is suited for your specific data and needs.
Creating your first schema (with the Python client)
Let's say you want to create a schema for a news publications dataset. This dataset consists of random news articles from publications such as the Financial Times, the New York Times, CNN and Wired. You also want to capture the authors and some metadata about these objects, such as publication dates.
Follow these steps to create and upload the schema.
1. Start with an empty schema in JSON format.
Schemas are defined in JSON format. Start with an empty schema:
{
"classes": []
}
2. Define classes and properties.
Let's say there are three classes you want to capture from this dataset in Weaviate: Publication, Article and Author. Notice that these words are singular. This is best practice, because each data object is an instance of one of these classes.
Class names always start with a capital letter. Property names always begin with a lowercase letter. To combine several words into one class name or property name, use camelCase. Read more about schema classes, properties and data types in the API reference guide.
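The naming conventions above can be checked before a schema is uploaded. The following is a minimal sketch in plain Python, not part of any Weaviate client: the helper name naming_violations is hypothetical, and it only checks the two first-letter rules stated here. Weaviate itself may apply further rules.

```python
def naming_violations(schema: dict) -> list:
    """Return messages for class or property names that break the conventions."""
    problems = []
    for cls in schema.get("classes", []):
        class_name = cls.get("class", "")
        # Class names are expected to start with a capital letter.
        if not class_name[:1].isupper():
            problems.append(f"class '{class_name}' should start with a capital letter")
        for prop in cls.get("properties", []):
            prop_name = prop.get("name", "")
            # Property names are expected to start with a lowercase letter.
            if not prop_name[:1].islower():
                problems.append(
                    f"property '{class_name}.{prop_name}' should start with a lowercase letter"
                )
    return problems


# Illustrative input with one badly named class and one badly named property.
example = {
    "classes": [
        {
            "class": "publication",
            "properties": [{"name": "HasArticles", "dataType": ["Article"]}],
        }
    ]
}

for message in naming_violations(example):
    print(message)
```

An empty list means every name follows the conventions.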
Let's define the class Publication with the properties name, hasArticles and headquartersGeoLocation in JSON format. name will be the name of the Publication, in string format. hasArticles will be a reference to Article objects. We need to define the class Article in the same schema to make sure the reference is possible. headquartersGeoLocation will be of the special dataType geoCoordinates.
Note that the property "title" of the class "Article" (defined further below) has dataType "string", while the property "content" is of dataType "text". string values are indexed as one token, whereas text values are indexed after applying tokenization. For example, "jane.doe@foobar.com" as string would be indexed as "jane.doe@foobar.com" and would match only that exact value in a GraphQL where filter, whereas as text it would be indexed as ['jane', 'doe', 'foobar', 'com'] and would also match the individual words.
{
"class": "Publication",
"description": "A publication with an online source",
"properties": [
{
"dataType": [
"string"
],
"description": "Name of the publication",
"name": "name"
},
{
"dataType": [
"Article"
],
"description": "The articles this publication has",
"name": "hasArticles"
},
{
"dataType": [
"geoCoordinates"
],
"description": "Geo location of the HQ",
"name": "headquartersGeoLocation"
}
]
}
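To make the difference between string and text described in the note above more concrete, here is a small illustration in plain Python. It does not call Weaviate and is not Weaviate's actual tokenizer: the functions index_as_string, index_as_text and matches are hypothetical, and splitting on non-alphanumeric characters is an assumption chosen so that the result agrees with the email example.

```python
import re


def index_as_string(value: str) -> list:
    # A string value is kept whole, as a single token.
    return [value]


def index_as_text(value: str) -> list:
    # Assumption: split on every run of non-alphanumeric characters.
    return [token for token in re.split(r"[^A-Za-z0-9]+", value) if token]


def matches(term: str, tokens: list) -> bool:
    # A filter term matches only if it equals one of the indexed tokens.
    return term in tokens


email = "jane.doe@foobar.com"
print(index_as_string(email))
print(index_as_text(email))
print(matches("doe", index_as_string(email)))
print(matches("doe", index_as_text(email)))
```

With the string index, only the full address matches. With the text index, each individual word matches.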
Add the classes Article and Author to the same schema, so you will end up with the following classes:
[{
"class": "Publication",
"description": "A publication with an online source",
"properties": [
{
"dataType": [
"string"
],
"description": "Name of the publication",
"name": "name"
},
{
"dataType": [
"Article"
],
"description": "The articles this publication has",
"name": "hasArticles"
},
{
"dataType": [
"geoCoordinates"
],
"description": "Geo location of the HQ",
"name": "headquartersGeoLocation"
}
]
}, {
"class": "Article",
"description": "A written text, for example a news article or blog post",
"properties": [
{
"dataType": [
"string"
],
"description": "Title of the article",
"name": "title"
},
{
"dataType": [
"text"
],
"description": "The content of the article",
"name": "content"
}
]
}, {
"class": "Author",
"description": "The writer of an article",
"properties": [
{
"dataType": [
"string"
],
"description": "Name of the author",
"name": "name"
},
{
"dataType": [
"Article"
],
"description": "Articles this author wrote",
"name": "wroteArticles"
},
{
"dataType": [
"Publication"
],
"description": "The publication this author writes for",
"name": "writesFor"
}
]
}]
Now, add this list of classes to the schema, which will look like this:
{
"classes": [{
"class": "Publication",
"description": "A publication with an online source",
"properties": [
{
"dataType": [
"string"
],
"description": "Name of the publication",
"name": "name"
},
{
"dataType": [
"Article"
],
"description": "The articles this publication has",
"name": "hasArticles"
},
{
"dataType": [
"geoCoordinates"
],
"description": "Geo location of the HQ",
"name": "headquartersGeoLocation"
}
]
}, {
"class": "Article",
"description": "A written text, for example a news article or blog post",
"properties": [
{
"dataType": [
"string"
],
"description": "Title of the article",
"name": "title"
},
{
"dataType": [
"text"
],
"description": "The content of the article",
"name": "content"
}
]
}, {
"class": "Author",
"description": "The writer of an article",
"properties": [
{
"dataType": [
"string"
],
"description": "Name of the author",
"name": "name"
},
{
"dataType": [
"Article"
],
"description": "Articles this author wrote",
"name": "wroteArticles"
},
{
"dataType": [
"Publication"
],
"description": "The publication this author writes for",
"name": "writesFor"
}
]
}]
}
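Because a reference is only possible when the target class is defined in the same schema, it can help to check the assembled schema before uploading it. The sketch below is plain Python and makes one assumption: any dataType that starts with a capital letter is treated as a reference to a class, following the naming convention described earlier. The function name missing_reference_targets is hypothetical.

```python
def missing_reference_targets(schema: dict) -> list:
    """Return (class, property, target) triples whose target class is not defined."""
    defined = {cls["class"] for cls in schema.get("classes", [])}
    missing = []
    for cls in schema.get("classes", []):
        for prop in cls.get("properties", []):
            for data_type in prop.get("dataType", []):
                # Assumption: capitalized data types refer to classes.
                is_reference = data_type[:1].isupper()
                if is_reference and data_type not in defined:
                    missing.append((cls["class"], prop["name"], data_type))
    return missing


# A schema where Publication refers to Article, but Article is not defined.
incomplete = {
    "classes": [
        {
            "class": "Publication",
            "properties": [
                {"name": "name", "dataType": ["string"]},
                {"name": "hasArticles", "dataType": ["Article"]},
            ],
        }
    ]
}

print(missing_reference_targets(incomplete))
```

For the complete three-class schema shown above, the function should return an empty list, since Publication, Article and Author are all defined.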
3. Upload the schema to Weaviate with the Python client.
- Python
import weaviate
client = weaviate.Client("http://localhost:8080")
schema = {
"classes": [{
"class": "Publication",
"description": "A publication with an online source",
"properties": [
{
"dataType": [
"string"
],
"description": "Name of the publication",
"name": "name"
},
{
"dataType": [
"Article"
],
"description": "The articles this publication has",
"name": "hasArticles"
},
{
"dataType": [
"geoCoordinates"
],
"description": "Geo location of the HQ",
"name": "headquartersGeoLocation"
}
]
}, {
"class": "Article",
"description": "A written text, for example a news article or blog post",
"properties": [
{
"dataType": [
"string"
],
"description": "Title of the article",
"name": "title"
},
{
"dataType": [
"text"
],
"description": "The content of the article",
"name": "content"
}
]
}, {
"class": "Author",
"description": "The writer of an article",
"properties": [
{
"dataType": [
"string"
],
"description": "Name of the author",
"name": "name"
},
{
"dataType": [
"Article"
],
"description": "Articles this author wrote",
"name": "wroteArticles"
},
{
"dataType": [
"Publication"
],
"description": "The publication this author writes for",
"name": "writesFor"
}
]
}]
}
client.schema.create(schema)
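After uploading, you may want to confirm which classes Weaviate now holds. The following is a sketch rather than a definitive recipe. It assumes the same Python client as above, whose client.schema.get() returns the current schema as a dictionary with a "classes" list. The helper name upload_and_list_classes is hypothetical.

```python
def upload_and_list_classes(client, schema: dict) -> list:
    """Upload the schema, then return the sorted class names the instance reports."""
    client.schema.create(schema)
    stored = client.schema.get()
    return sorted(cls["class"] for cls in stored.get("classes", []))


# Usage against a running instance (requires the weaviate-client package):
#   import weaviate
#   client = weaviate.Client("http://localhost:8080")
#   print(upload_and_list_classes(client, schema))
```

For the schema in this tutorial, the returned list would be expected to contain Article, Author and Publication.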
Creating your first schema (RESTful API, Python or JavaScript)
Currently, uploading a whole schema at once is only possible with the Python client. If you are not using Python, you need to upload classes to Weaviate one by one. The schema from the previous example can be uploaded in the following steps:
1. Create the classes without references.
References to other classes can only be added if those classes exist in the Weaviate schema. Therefore, we first create the classes with all properties except the references, and add the references in step 2.
Add a class Publication without the property hasArticles to a running Weaviate instance as follows:
- Python
- JavaScript
- Go
- Java
- Curl
import weaviate
client = weaviate.Client("http://localhost:8080")
class_obj = {
"class": "Publication",
"description": "A publication with an online source",
"properties": [
{
"dataType": [
"string"
],
"description": "Name of the publication",
"name": "name"
},
{
"dataType": [
"geoCoordinates"
],
"description": "Geo location of the HQ",
"name": "headquartersGeoLocation"
}
]
}
client.schema.create_class(class_obj)
const weaviate = require('weaviate-client');
const client = weaviate.client({
scheme: 'http',
host: 'localhost:8080',
});
const classObj = {
'class': 'Publication',
'description': 'A publication with an online source',
'properties': [
{
'dataType': [
'string'
],
'description': 'Name of the publication',
'name': 'name'
},
{
'dataType': [
'geoCoordinates'
],
'description': 'Geo location of the HQ',
'name': 'headquartersGeoLocation'
}
]
};
client.schema
.classCreator()
.withClass(classObj)
.do()
.then(res => {
console.log(res)
})
.catch(err => {
console.error(err)
});
package main
import (
"context"
"github.com/weaviate/weaviate-go-client/v4/weaviate"
"github.com/weaviate/weaviate/entities/models"
)
func main() {
cfg := weaviate.Config{
Host: "localhost:8080",
Scheme: "http",
}
client := weaviate.New(cfg)
classObj := &models.Class{
Class: "Publication",
Description: "A publication with an online source",
Properties: []*models.Property{
{
DataType: []string{"string"},
Description: "Name of the publication",
Name: "name",
},
{
DataType: []string{"geoCoordinates"},
Description: "Geo location of the HQ",
Name: "headquartersGeoLocation",
},
},
}
err := client.Schema().ClassCreator().WithClass(classObj).Do(context.Background())
if err != nil {
panic(err)
}
}
package technology.semi.weaviate;
import java.util.ArrayList;
import technology.semi.weaviate.client.Config;
import technology.semi.weaviate.client.WeaviateClient;
import technology.semi.weaviate.client.base.Result;
import technology.semi.weaviate.client.v1.schema.model.DataType;
import technology.semi.weaviate.client.v1.schema.model.Property;
import technology.semi.weaviate.client.v1.schema.model.WeaviateClass;
public class App {
public static void main(String[] args) {
Config config = new Config("http", "localhost:8080");
WeaviateClient client = new WeaviateClient(config);
WeaviateClass clazz = WeaviateClass.builder()
.className("Publication")
.description("A publication with an online source")
.properties(new ArrayList<Property>() { {
add(Property.builder()
.dataType(new ArrayList<String>(){ { add(DataType.STRING); } })
.description("Name of the publication")
.name("name")
.build());
add(Property.builder()
.dataType(new ArrayList<String>(){ { add(DataType.GEO_COORDINATES); } })
.description("Geo location of the HQ")
.name("headquartersGeoLocation")
.build());
} })
.build();
Result<Boolean> result = client.schema().classCreator().withClass(clazz).run();
if (result.hasErrors()) {
System.out.println(result.getError());
return;
}
System.out.println(result.getResult());
}
}
$ curl \
-X POST \
-H "Content-Type: application/json" \
-d '{
"class": "Publication",
"description": "A publication with an online source",
"properties": [
{
"dataType": [
"string"
],
"description": "Name of the publication",
"name": "name"
},
{
"dataType": [
"geoCoordinates"
],
"description": "Geo location of the HQ",
"name": "headquartersGeoLocation"
}
]
}' \
http://localhost:8080/v1/schema
Perform a similar request for the Article and Author classes.
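Separating each class by hand into its plain properties and its reference properties is easy to get wrong. The sketch below shows one way to do the split in plain Python. The function name split_references is hypothetical, and it assumes that any dataType starting with a capital letter is a reference to a class, following the naming convention described earlier.

```python
import copy


def split_references(class_obj: dict):
    """Return (class without reference properties, list of reference properties)."""

    def is_reference(prop: dict) -> bool:
        # Assumption: capitalized data types refer to classes.
        return any(dt[:1].isupper() for dt in prop.get("dataType", []))

    # Work on a copy so the original class definition stays untouched.
    without_refs = copy.deepcopy(class_obj)
    all_props = without_refs.get("properties", [])
    references = [p for p in all_props if is_reference(p)]
    without_refs["properties"] = [p for p in all_props if not is_reference(p)]
    return without_refs, references


publication = {
    "class": "Publication",
    "description": "A publication with an online source",
    "properties": [
        {"dataType": ["string"], "description": "Name of the publication", "name": "name"},
        {"dataType": ["Article"], "description": "The articles this publication has", "name": "hasArticles"},
        {"dataType": ["geoCoordinates"], "description": "Geo location of the HQ", "name": "headquartersGeoLocation"},
    ],
}

class_obj, reference_props = split_references(publication)
print([p["name"] for p in class_obj["properties"]])
print([p["name"] for p in reference_props])
```

The first result can be used in the class creation request of this step, and the second holds the properties to add once all classes exist.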
2. Add reference properties to the existing classes.
There are three classes in your Weaviate schema now, but we have not yet linked them to each other with cross-references. Let's add the reference between Publication and Article in the property hasArticles like this:
- Python
- JavaScript
- Go
- Java
- Curl
import weaviate
client = weaviate.Client("http://localhost:8080")
reference_property = {
"dataType": [
"Article"
],
"description": "The articles this publication has",
"name": "hasArticles"
}
client.schema.property.create("Publication", reference_property)
const weaviate = require('weaviate-client');
const client = weaviate.client({
scheme: 'http',
host: 'localhost:8080',
});
const className = 'Publication';
const prop = {
'dataType': [
'Article'
],
'description': 'The articles this publication has',
'name': 'hasArticles'
};
client.schema
.propertyCreator()
.withClassName(className)
.withProperty(prop)
.do()
.then(res => {
console.log(res);
})
.catch(err => {
console.error(err)
});
package main
import (
"context"
"github.com/weaviate/weaviate-go-client/v4/weaviate"
"github.com/weaviate/weaviate/entities/models"
)
func main() {
cfg := weaviate.Config{
Host: "localhost:8080",
Scheme: "http",
}
client := weaviate.New(cfg)
prop := &models.Property{
DataType: []string{"Article"},
Name: "hasArticles",
Description: "The articles this publication has",
}
err := client.Schema().PropertyCreator().
WithClassName("Publication").
WithProperty(prop).
Do(context.Background())
if err != nil {
panic(err)
}
}
package technology.semi.weaviate;
import java.util.Arrays;
import technology.semi.weaviate.client.Config;
import technology.semi.weaviate.client.WeaviateClient;
import technology.semi.weaviate.client.base.Result;
import technology.semi.weaviate.client.v1.schema.model.Property;
public class App {
public static void main(String[] args) {
Config config = new Config("http", "localhost:8080");
WeaviateClient client = new WeaviateClient(config);
Property property = Property.builder()
.dataType(Arrays.asList("Article"))
.description("The articles this publication has")
.name("hasArticles")
.build();
Result<Boolean> result = client.schema().propertyCreator()
.withClassName("Publication")
.withProperty(property)
.run();
if (result.hasErrors()) {
System.out.println(result.getError());
return;
}
System.out.println(result.getResult());
}
}
$ curl \
-X POST \
-H "Content-Type: application/json" \
-d '{
"dataType": [
"Article"
],
"description": "The articles this publication has",
"name": "hasArticles"
}' \
http://localhost:8080/v1/schema/Publication/properties
Repeat this action for the properties wroteArticles and writesFor of Author, referring to Article and Publication respectively.
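Rather than repeating the request by hand, the references can be added in a loop. This is a sketch that assumes the Python client call shown above, client.schema.property.create(class_name, property). The helper name add_references and the REFERENCES mapping are hypothetical, and the property definitions are copied from the schema earlier in this tutorial.

```python
# Reference properties per class, copied from the tutorial schema.
REFERENCES = {
    "Publication": [
        {
            "dataType": ["Article"],
            "description": "The articles this publication has",
            "name": "hasArticles",
        },
    ],
    "Author": [
        {
            "dataType": ["Article"],
            "description": "Articles this author wrote",
            "name": "wroteArticles",
        },
        {
            "dataType": ["Publication"],
            "description": "The publication this author writes for",
            "name": "writesFor",
        },
    ],
}


def add_references(client, references: dict) -> list:
    """Add each reference property to its class; return the (class, property) pairs added."""
    added = []
    for class_name, props in references.items():
        for prop in props:
            client.schema.property.create(class_name, prop)
            added.append((class_name, prop["name"]))
    return added


# Usage against a running instance (requires the weaviate-client package):
#   import weaviate
#   client = weaviate.Client("http://localhost:8080")
#   add_references(client, REFERENCES)
```

All three classes must already exist in the schema before this is run, otherwise the references cannot be created.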
Next steps
- Check out the RESTful API reference for an overview of all schema API operations.
- Read this article on Weaviate and schema creation.
More Resources
If you can't find the answer to your question here, please try one of the following:
- Frequently Asked Questions
- Knowledge base of old issues
- For questions: Stack Overflow
- For issues: GitHub
- Ask your question in the Slack channel