
Data structure


Overview

This document lays out how Weaviate deals with data objects, including how they are stored, represented, and linked to each other.

Data object nomenclature

Each data object in Weaviate always belongs to a Class, and has one or more Properties.

Weaviate stores data objects (represented as JSON documents) in class-based collections, where each object can be represented by a machine learning vector (i.e., an embedding).

Each class-based collection contains objects of the same class, which are defined by a common schema.

Let's unpack this with an example.

JSON documents as objects

Imagine we need to store information about the following author: Alice Munro.

The data about this author can be represented in JSON like this:

{
"name": "Alice Munro",
"age": 91,
"born": "1931-07-10T00:00:00.0Z",
"wonNobelPrize": true,
"description": "Alice Ann Munro is a Canadian short story writer who won the Nobel Prize in Literature in 2013. Munro's work has been described as revolutionizing the architecture of short stories, especially in its tendency to move forward and backward in time."
}
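As a minimal sketch, the same document can be held as a plain Python dictionary and serialized with the standard library. The variable names are illustrative only, the description is shortened to its first sentence, and the date format string is an assumption based on the timestamp shown above.

```python
import json
from datetime import datetime

# The author from the example above as a plain Python dict
# (description shortened to its first sentence for brevity).
author = {
    "name": "Alice Munro",
    "age": 91,
    "born": "1931-07-10T00:00:00.0Z",
    "wonNobelPrize": True,
    "description": "Alice Ann Munro is a Canadian short story writer "
                   "who won the Nobel Prize in Literature in 2013.",
}

# Serializing to JSON and parsing it back yields an equal dict.
as_json = json.dumps(author)
round_tripped = json.loads(as_json)

# The "born" value is a timestamp string; this format string is an
# assumption that matches the example value.
born = datetime.strptime(author["born"], "%Y-%m-%dT%H:%M:%S.%fZ")
```

Each JSON key here corresponds to one property of the data object.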

Vectors

As mentioned earlier, we can also attach vector representations to our data objects. This is represented as an array of numbers under a "vector" property, like this:

{
"id": "dedd462a-23c8-32d0-9412-6fcf9c1e8149",
"class": "Author",
"properties": {
"name": "Alice Munro",
(...)
},
"vector": [
-0.16147631,
-0.065765485,
-0.06546908
]
}

You can generate vectors yourself outside of Weaviate, or use one of Weaviate's vectorizer modules.
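If you bring your own vectors, the object layout shown above can be assembled with a small helper. This is a sketch only: `make_object` is a hypothetical function written for this illustration, not part of any Weaviate client, and generating a random UUID when no ID is given is an assumption.

```python
import uuid

def make_object(class_name, properties, vector, object_id=None):
    """Hypothetical helper: wrap properties and a vector in the
    id/class/properties/vector layout used in the example above."""
    for x in vector:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(x, bool) or not isinstance(x, (int, float)):
            raise ValueError("a vector must contain only numbers")
    return {
        "id": object_id or str(uuid.uuid4()),  # assumption: random UUID by default
        "class": class_name,
        "properties": dict(properties),
        "vector": [float(x) for x in vector],
    }

obj = make_object(
    "Author",
    {"name": "Alice Munro"},
    [-0.16147631, -0.065765485, -0.06546908],
)
```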

Class Collections

Weaviate groups all Authors under the Author class and places them in the same class collection.

Following on from our author example, Weaviate can store multiple authors like this:

[{
"id": "dedd462a-23c8-32d0-9412-6fcf9c1e8149",
"class": "Author",
"properties": {
"name": "Alice Munro",
"age": 91,
"born": "1931-07-10T00:00:00.0Z",
"wonNobelPrize": true,
"description": "Alice Ann Munro is a Canadian short story writer who won the Nobel Prize in Literature in 2013. Munro's work has been described as revolutionizing the architecture of short stories, especially in its tendency to move forward and backward in time."
},
"vector": [
-0.16147631,
-0.065765485,
-0.06546908
]
}, {
"id": "779c8970-0594-301c-bff5-d12907414002",
"class": "Author",
"properties": {
"name": "Paul Krugman",
"age": 69,
"born": "1953-02-28T00:00:00.0Z",
"wonNobelPrize": true,
"description": "Paul Robin Krugman is an American economist and public intellectual, who is Distinguished Professor of Economics at the Graduate Center of the City University of New York, and a columnist for The New York Times. In 2008, Krugman was the winner of the Nobel Memorial Prize in Economic Sciences for his contributions to New Trade Theory and New Economic Geography."
},
"vector": [
-0.93070928,
-0.03782172,
-0.56288009
]
}]
Tip

Every object stored in Weaviate has a UUID, which guarantees uniqueness across all collections.
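To illustrate what this means for data you prepare yourself, the sketch below checks that IDs are well-formed UUIDs and that none is reused across a set of objects. Both helper functions are hypothetical and use only the Python standard library; they are not Weaviate's own validation.

```python
import uuid
from collections import Counter

def is_valid_uuid(value):
    """Return True if value is a canonical UUID string."""
    try:
        return str(uuid.UUID(value)) == value.lower()
    except (ValueError, AttributeError, TypeError):
        return False

def duplicate_ids(objects):
    """Return the sorted IDs that occur more than once, regardless of class."""
    counts = Counter(obj["id"] for obj in objects)
    return sorted(i for i, n in counts.items() if n > 1)

objects = [
    {"id": "dedd462a-23c8-32d0-9412-6fcf9c1e8149", "class": "Author"},
    {"id": "779c8970-0594-301c-bff5-d12907414002", "class": "Author"},
    {"id": "32d5a368-ace8-3bb7-ade7-9f7ff03eddb6", "class": "Publication"},
]
```

With these three objects, `duplicate_ids(objects)` returns an empty list, since no ID is shared between the Author and Publication collections.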

Cross-references

Cross-references do not affect vectors

Creating cross-references does not affect object vectors in either direction.

Where data objects have relationships with each other, they can be represented in Weaviate with cross-references.

For example, let's say that we want to represent the fact that "Paul Krugman writes for the New York Times". We can do this by establishing a cross-reference from the Author object representing Paul Krugman to the Publication object representing the New York Times.

So, given the following Publication object for the New York Times:

{
"id": "32d5a368-ace8-3bb7-ade7-9f7ff03eddb6",
"class": "Publication",
"properties": {
"name": "The New York Times"
},
"vector": [...]
}

We can identify it with its UUID, and specify it in the writesFor property for the Author like this:

{
"id": "779c8970-0594-301c-bff5-d12907414002",
"class": "Author",
"properties": {
"name": "Paul Krugman",
...
"writesFor": [
{
"beacon": "weaviate://localhost/32d5a368-ace8-3bb7-ade7-9f7ff03eddb6",
"href": "/v1/objects/32d5a368-ace8-3bb7-ade7-9f7ff03eddb6"
}
]
},
"vector": [...]
}
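The two fields of a reference entry can be derived from the target's UUID. The sketch below assumes the beacon and href forms shown in the example above; `make_reference` and `beacon_to_id` are hypothetical helpers, and other beacon forms may exist that this does not cover.

```python
def make_reference(target_id, host="localhost"):
    """Hypothetical helper: build a reference entry in the form
    used in the example above."""
    return {
        "beacon": f"weaviate://{host}/{target_id}",
        "href": f"/v1/objects/{target_id}",
    }

def beacon_to_id(beacon):
    """Extract the target UUID, assumed to be the last path segment."""
    if not beacon.startswith("weaviate://"):
        raise ValueError("not a Weaviate beacon: " + beacon)
    return beacon.rsplit("/", 1)[-1]

nyt_id = "32d5a368-ace8-3bb7-ade7-9f7ff03eddb6"
writes_for = [make_reference(nyt_id)]
```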

Each cross-reference relationship in Weaviate is directional.

So, in addition to the Author class having a writesFor property that points to the Publication class, you could have a hasAuthors property in the Publication class that points to the Author class.

Cross-references in Weaviate are best thought of as links that help you retrieve related information. They do not affect the vector of either the source ("from") object or the target ("to") object.
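The directional nature of references can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not how Weaviate resolves references internally: `store` is a plain dictionary keyed by UUID and `follow` is a hypothetical helper.

```python
NYT_ID = "32d5a368-ace8-3bb7-ade7-9f7ff03eddb6"
KRUGMAN_ID = "779c8970-0594-301c-bff5-d12907414002"

# A toy object store keyed by UUID (illustration only).
store = {
    NYT_ID: {
        "id": NYT_ID,
        "class": "Publication",
        "properties": {"name": "The New York Times"},
    },
    KRUGMAN_ID: {
        "id": KRUGMAN_ID,
        "class": "Author",
        "properties": {
            "name": "Paul Krugman",
            "writesFor": [{"beacon": f"weaviate://localhost/{NYT_ID}"}],
        },
    },
}

def follow(store, obj, prop):
    """Return the objects that obj points to through the reference property prop."""
    refs = obj["properties"].get(prop, [])
    return [store[ref["beacon"].rsplit("/", 1)[-1]] for ref in refs]

# Author -> Publication works because writesFor is set on the author.
publications = follow(store, store[KRUGMAN_ID], "writesFor")

# Publication -> Author yields nothing until a hasAuthors reference is added.
authors = follow(store, store[NYT_ID], "hasAuthors")
```

Setting `writesFor` on the author does not create a link in the opposite direction; the publication would need its own `hasAuthors` reference.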

Hrefs vs beacons

Hrefs and beacons are locations within Weaviate that allow us to retrieve cross-referenced objects. We will discuss the difference further as we go forward.

Weaviate Schema

Weaviate requires a data schema to be built before adding data.

Weaviate's schema defines its data structure in a formal language. In other words, it is a blueprint of how the data is to be organized and stored.

The schema defines data classes (i.e., collections of objects), the properties within each class (name, type, description, settings), possible graph links between data objects (cross-references), and per-class settings such as the vectorizer module (if any) and index configurations.

The schema does not have to be designed and added manually, however. If no schema is specified, Weaviate generates one automatically from the data provided.
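To give a feel for what deriving a schema from data involves, the sketch below guesses a type label for each property of the author example. It is a simplified illustration and not Weaviate's actual inference logic: the `infer_type` function, the mapping rules, and the date format are all assumptions.

```python
from datetime import datetime

def infer_type(value):
    """Guess a type label for a JSON value (simplified, hypothetical rules)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        try:
            # Assumed timestamp format, matching the example data.
            datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
            return "date"
        except ValueError:
            return "text"
    raise TypeError(f"unsupported value: {value!r}")

author = {
    "name": "Alice Munro",
    "age": 91,
    "born": "1931-07-10T00:00:00.0Z",
    "wonNobelPrize": True,
}

inferred = {key: infer_type(value) for key, value in author.items()}
```

Automatic inference is convenient for prototyping, but an explicit schema avoids surprises when a guessed type does not match your intent.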

Schema vs. Taxonomy

A Weaviate data schema is slightly different from a taxonomy, which has a hierarchy. Read more about how taxonomies, ontologies and schemas are related to Weaviate in this blog post.

To learn how to build a schema, see our schema tutorial, or how-to on schema configuration.

For now, what's important to know is this:

  1. Classes and properties (as explained above) are defined in the schema.
  2. Every class has its own vector space, which means that you can attach vectors from different models to different classes.
  3. You can link classes (even if they use different embeddings) by setting cross-references.
  4. You can configure module behavior, ANN index settings, reverse index types, etc., in the schema as well (more about this in the schema tutorial).
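The points above can be sketched as a class definition for the Author example, written here as a Python dictionary. Treat the field names and values as assumptions to verify against the schema documentation: the `description` text is made up for illustration, `"vectorizer": "none"` assumes you supply your own vectors, and `cross_reference_properties` is a hypothetical helper that relies on class names starting with a capital letter.

```python
# A possible class definition for Author (illustrative; verify field
# names against the schema documentation before use).
author_class = {
    "class": "Author",
    "description": "A person who writes",  # made-up description
    "vectorizer": "none",                  # assumption: vectors supplied by you
    "properties": [
        {"name": "name", "dataType": ["text"]},
        {"name": "age", "dataType": ["int"]},
        {"name": "born", "dataType": ["date"]},
        {"name": "wonNobelPrize", "dataType": ["boolean"]},
        {"name": "description", "dataType": ["text"]},
        # A cross-reference: the data type is the name of the target class.
        {"name": "writesFor", "dataType": ["Publication"]},
    ],
}

def cross_reference_properties(class_def):
    """List properties whose data type names another class
    (assumed to be those starting with a capital letter)."""
    return [
        prop["name"]
        for prop in class_def["properties"]
        if any(t[:1].isupper() for t in prop["dataType"])
    ]

refs = cross_reference_properties(author_class)
```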

Recap

  • Inside Weaviate, you can store data objects which can be represented by a machine learning vector.
  • Weaviate represents data objects as JSON documents.
  • Every data object can contain a vector.
  • You can set cross-references as datatypes to link to other objects.
  • You will define classes and properties in a schema.
  • Different classes can represent different vector spaces.
  • The schema has a class-property data structure.
  • We can query using the GraphQL interface or, in some cases, the RESTful API.
  • Vectors come from machine learning models that you run yourself or through a Weaviate module.

More Resources

If you can't find the answer to your question here, please look at:

  1. The Frequently Asked Questions.
  2. The knowledge base of old issues.
  3. Stack Overflow, for questions.
  4. The Weaviate Community Forum, for more involved discussion.
  5. Our Slack channel.