Data structure

Overview

This document lays out how Weaviate deals with data objects, including how they are stored, represented, and linked to each other.

Data object nomenclature

Weaviate stores data objects (represented as JSON documents) in class-based collections, where each object can be represented by a machine learning vector (i.e., an embedding).

Each class-based collection contains objects of the same class, which are defined by a common schema.

Let's unpack this with an example.

JSON documents as objects

Imagine we need to store information about the following author: Alice Munro.

The data about this author can be represented in JSON like this:

{
"name": "Alice Munro",
"age": 91,
"born": "1931-07-10T00:00:00.0Z",
"wonNobelPrize": true,
"description": "Alice Ann Munro is a Canadian short story writer who won the Nobel Prize in Literature in 2013. Munro's work has been described as revolutionizing the architecture of short stories, especially in its tendency to move forward and backward in time."
}
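
As a minimal sketch of the same idea in Python (the variable names are illustrative and not part of Weaviate), the author can be held as a plain dictionary and serialized into the JSON document shown above. The long description is left out for brevity.

```python
import json

# The author from the example above, as a plain Python dict.
author = {
    "name": "Alice Munro",
    "age": 91,
    "born": "1931-07-10T00:00:00.0Z",  # dates travel as strings in JSON
    "wonNobelPrize": True,  # Python True serializes to JSON true
}

# Serialize to a JSON document, then parse it back to check the round trip.
document = json.dumps(author, indent=2)
restored = json.loads(document)

print(document)
```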

Vectors

As mentioned earlier, we can also attach vector representations to our data objects. This is represented as an array of numbers under a "vector" property, like this:

{
"id": "dedd462a-23c8-32d0-9412-6fcf9c1e8149",
"class": "Author",
"properties": {
"name": "Alice Munro",
(...)
},
"vector": [
-0.16147631,
-0.065765485,
-0.06546908
]
}

You can generate vectors yourself outside of Weaviate, or use one of Weaviate's vectorizer modules.
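
The object layout above can be sketched in Python as follows. This is an illustration only: `make_object` is a hypothetical helper, not a Weaviate client call, and it assumes a random version-4 UUID is acceptable as the object ID and that the vector was computed elsewhere.

```python
import uuid


def make_object(class_name, properties, vector):
    """Wrap properties in the id/class/properties/vector layout shown above.

    Hypothetical helper for illustration; the ID is a random UUID (assumption).
    """
    return {
        "id": str(uuid.uuid4()),
        "class": class_name,
        "properties": dict(properties),
        "vector": [float(x) for x in vector],
    }


# The three-dimensional vector is the toy example from the text; real
# embeddings typically have hundreds of dimensions.
obj = make_object(
    "Author",
    {"name": "Alice Munro"},
    [-0.16147631, -0.065765485, -0.06546908],
)
print(obj["class"], len(obj["vector"]))
```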

Class Collections

Weaviate groups all Authors under the Author class and places them in the same class collection.

Continuing our author example, Weaviate can store multiple authors like this:

[{
"id": "dedd462a-23c8-32d0-9412-6fcf9c1e8149",
"class": "Author",
"properties": {
"name": "Alice Munro",
"age": 91,
"born": "1931-07-10T00:00:00.0Z",
"wonNobelPrize": true,
"description": "Alice Ann Munro is a Canadian short story writer who won the Nobel Prize in Literature in 2013. Munro's work has been described as revolutionizing the architecture of short stories, especially in its tendency to move forward and backward in time."
},
"vector": [
-0.16147631,
-0.065765485,
-0.06546908
]
}, {
"id": "779c8970-0594-301c-bff5-d12907414002",
"class": "Author",
"properties": {
"name": "Paul Krugman",
"age": 69,
"born": "1953-02-28T00:00:00.0Z",
"wonNobelPrize": true,
"description": "Paul Robin Krugman is an American economist and public intellectual, who is Distinguished Professor of Economics at the Graduate Center of the City University of New York, and a columnist for The New York Times. In 2008, Krugman was the winner of the Nobel Memorial Prize in Economic Sciences for his contributions to New Trade Theory and New Economic Geography."
},
"vector": [
-0.93070928,
-0.03782172,
-0.56288009
]
}]

Tip: Every object stored in Weaviate has a UUID, which guarantees uniqueness across all collections.
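
To make the grouping concrete, here is a small in-memory sketch. It only illustrates the concept and says nothing about how Weaviate stores collections internally; `group_by_class` is a hypothetical helper, and properties and vectors are omitted for brevity.

```python
from collections import defaultdict


def group_by_class(objects):
    """Group objects into per-class collections, rejecting duplicate UUIDs."""
    collections = defaultdict(list)
    seen_ids = set()
    for obj in objects:
        if obj["id"] in seen_ids:
            raise ValueError(f"duplicate id: {obj['id']}")
        seen_ids.add(obj["id"])
        collections[obj["class"]].append(obj)
    return dict(collections)


# IDs and class names are taken from the examples in this document.
objects = [
    {"id": "dedd462a-23c8-32d0-9412-6fcf9c1e8149", "class": "Author"},
    {"id": "779c8970-0594-301c-bff5-d12907414002", "class": "Author"},
    {"id": "32d5a368-ace8-3bb7-ade7-9f7ff03eddb6", "class": "Publication"},
]

collections = group_by_class(objects)
print({name: len(members) for name, members in collections.items()})
```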

Cross-references

In some cases we need to link data objects with each other.

For example: "Paul Krugman writes for the New York Times".
To represent this relationship between the Author and the Publication, we need to cross-reference the objects.

Let's say we have a New York Times object, like this:

{
"id": "32d5a368-ace8-3bb7-ade7-9f7ff03eddb6",
"class": "Publication",
"properties": {
"name": "The New York Times"
},
"vector": [...]
}

Then we can use the UUID of the above object to attach it to the Author like this (see "writesFor"):

{
"id": "779c8970-0594-301c-bff5-d12907414002",
"class": "Author",
"properties": {
"name": "Paul Krugman",
...
"writesFor": [
{
"beacon": "weaviate://localhost/32d5a368-ace8-3bb7-ade7-9f7ff03eddb6",
"href": "/v1/objects/32d5a368-ace8-3bb7-ade7-9f7ff03eddb6"
}
]
},
"vector": [...]
}
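
Hrefs vs. beacons

Hrefs and beacons are locations within Weaviate that allow us to retrieve cross-referenced objects. In the example above, the beacon is a weaviate:// URI containing the UUID of the target object, and the href is the RESTful API path for the same object. We will discuss the difference further as we go forward.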
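
The "writesFor" entry can be sketched in Python as below. The two string formats are copied from the example above; `make_reference`, `beacon_to_id`, and the in-memory `store` are hypothetical names used only for illustration, not Weaviate client functions.

```python
def make_reference(target_id, host="localhost"):
    """Build a cross-reference entry in the beacon/href form shown above."""
    return {
        "beacon": f"weaviate://{host}/{target_id}",
        "href": f"/v1/objects/{target_id}",
    }


def beacon_to_id(beacon):
    """Return the UUID at the end of a beacon of the form shown above."""
    return beacon.rsplit("/", 1)[-1]


# A toy in-memory store keyed by UUID, standing in for Weaviate itself.
nyt_id = "32d5a368-ace8-3bb7-ade7-9f7ff03eddb6"
store = {
    nyt_id: {"class": "Publication", "properties": {"name": "The New York Times"}},
}

author_properties = {
    "name": "Paul Krugman",
    "writesFor": [make_reference(nyt_id)],
}

# Follow the cross-reference back to the Publication object.
target = store[beacon_to_id(author_properties["writesFor"][0]["beacon"])]
print(target["properties"]["name"])
```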

Weaviate Schema

Weaviate needs a data schema in place before it stores data; you can define it yourself or let Weaviate generate it.

Weaviate's schema defines its data structure in a formal language. In other words, it is a blueprint of how the data is to be organized and stored. For example, classes of data objects and the properties within each class are defined in the schema. The schema also specifies the data type of each class property, the possible graph links between data objects, and the vectorizer module to be used for each class.

You do not have to design and add a data schema manually. In the absence of a data schema specification, Weaviate will generate a schema automatically from the provided data.
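
To give a feel for what generating a schema from data involves, here is a deliberately simplified sketch. It is not Weaviate's actual auto-schema logic: the mapping rules, the date format check, and the function names are assumptions made for illustration. The output mirrors the class/properties/dataType layout of a Weaviate class definition.

```python
from datetime import datetime


def infer_data_type(value):
    """Map a JSON value to a data type name (simplified, illustrative rules)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        try:
            # Assumes dates look like the "born" value used in this document.
            datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
            return "date"
        except ValueError:
            return "text"
    raise TypeError(f"unsupported value: {value!r}")


def infer_class(class_name, properties):
    """Build a class definition from one example object's properties."""
    return {
        "class": class_name,
        "properties": [
            {"name": name, "dataType": [infer_data_type(value)]}
            for name, value in properties.items()
        ],
    }


author_class = infer_class(
    "Author",
    {
        "name": "Alice Munro",
        "age": 91,
        "born": "1931-07-10T00:00:00.0Z",
        "wonNobelPrize": True,
    },
)
print(author_class)
```

Defining the schema explicitly remains the safer choice when a guessed type would be wrong, for example when a date-like string should stay plain text.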

Schema vs. Taxonomy

A Weaviate data schema is slightly different from a taxonomy, which has a hierarchy. Read more about how taxonomies, ontologies and schemas are related to Weaviate in this blog post.

We have a separate quickstart tutorial for working with a schema.

For now, what's important to know is this:

  1. Classes and properties (as explained above) are defined in the schema.
  2. Every class has its own vector space, which means that you can attach vectors from different models to different classes.
  3. You can link classes (even if they use different embeddings) by setting cross-references.
  4. You can configure module behavior, ANN index settings, inverted index types, etc., in the schema as well (more about this in the schema quickstart tutorial).
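
The points above can be sketched as a schema with two classes. Treat this as an illustration under assumptions: the vectorizer values are placeholders chosen for the example, the property lists are shortened, and `cross_references` is a hypothetical helper rather than a Weaviate API.

```python
# Two classes, each with its own vectorizer setting (placeholder values),
# linked by a cross-reference property whose data type is a class name.
schema = {
    "classes": [
        {
            "class": "Publication",
            "vectorizer": "none",  # assumption: vectors supplied from outside
            "properties": [{"name": "name", "dataType": ["text"]}],
        },
        {
            "class": "Author",
            "vectorizer": "text2vec-transformers",  # assumption: module enabled
            "properties": [
                {"name": "name", "dataType": ["text"]},
                {"name": "writesFor", "dataType": ["Publication"]},
            ],
        },
    ]
}


def cross_references(schema):
    """List (source class, property, target class) for every cross-reference."""
    class_names = {c["class"] for c in schema["classes"]}
    return [
        (c["class"], p["name"], target)
        for c in schema["classes"]
        for p in c["properties"]
        for target in p["dataType"]
        if target in class_names
    ]


print(cross_references(schema))
```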

Recap

  • Inside Weaviate, you can store data objects which can be represented by a machine learning vector.
  • Weaviate represents data objects as JSON documents.
  • Every data object can contain a vector.
  • You can set cross-references as datatypes to link to other objects.
  • You will define classes and properties in a schema.
  • Different classes can represent different vector spaces.
  • The schema has a class-property data structure.
  • You can query using the GraphQL interface or, in some cases, the RESTful API.
  • Vectors come from machine learning models that you run yourself or through a Weaviate module.

More Resources

If you can't find the answer to your question here, please look at:

  1. The Frequently Asked Questions.
  2. The knowledge base of old issues.
  3. Stackoverflow, for questions.
  4. Github, for issues.
  5. The Slack channel, where you can ask your question.