The ref2vec-centroid module calculates an object's vector as the centroid of the vectors of the objects it references. Because the vector is derived from an object's references rather than from its own contents, it places the object near the cluster of objects it points to. This is useful in applications such as making suggestions based on the aggregation of a user's actions or preferences.
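To illustrate what "centroid" means here, the following sketch computes the element-wise mean of a set of reference vectors in plain Python. It assumes the centroid is the arithmetic mean, which matches the averaging described later in this section. It is an illustration, not the module's actual implementation.

```python
from typing import List, Optional, Sequence


def centroid(vectors: Sequence[Sequence[float]]) -> Optional[List[float]]:
    """Return the element-wise mean of the given vectors, or None if there are none."""
    if not vectors:
        return None  # no references: nothing to average
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all reference vectors must have the same dimension")
    # Sum each dimension across all vectors, then divide by the number of vectors.
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


print(centroid([[1.0, 2.0], [3.0, 4.0]]))  # [2.0, 3.0]
```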
How to enable
Weaviate Cloud Services
This module is enabled by default on WCS.
Weaviate open source
The modules used by a Weaviate instance can be specified in its Docker Compose file. To use ref2vec-centroid, add it to the instance's list of enabled modules.
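In a typical Docker Compose setup, the enabled modules are listed as a comma-separated string in the ENABLE_MODULES environment variable of the Weaviate service. The helper below is a hypothetical sketch (add_module is not part of any Weaviate tool) showing what that value looks like after ref2vec-centroid is added. The starting value text2vec-contextionary is only an example.

```python
def add_module(enable_modules: str, module: str) -> str:
    """Append a module to a comma-separated ENABLE_MODULES value, avoiding duplicates."""
    modules = [m.strip() for m in enable_modules.split(",") if m.strip()]
    if module not in modules:
        modules.append(module)
    return ",".join(modules)


# Hypothetical starting value for the Weaviate service's environment section.
environment = {"ENABLE_MODULES": "text2vec-contextionary"}
environment["ENABLE_MODULES"] = add_module(environment["ENABLE_MODULES"], "ref2vec-centroid")
print(environment["ENABLE_MODULES"])  # text2vec-contextionary,ref2vec-centroid
```

Keeping the existing vectorizer module alongside ref2vec-centroid matters in this example, because the referenced objects still need their own vectors.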
How to configure
In your Weaviate schema, you must define how you want this module to vectorize your data. If you are new to Weaviate schemas, you might want to check out the tutorial on the Weaviate schema first.
For example, consider an Article class that is configured to use ref2vec-centroid. Doing so requires only a class-level moduleConfig containing two fields:
- referenceProperties: a list of the class's reference properties that should be used to calculate the centroid.
- method: the method by which the centroid is calculated. Currently, only the mean (average) of the reference vectors is supported.
In this example, the Article class specifies its hasParagraphs property as the only reference property to be used in calculating an Article object's vector.
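The role of referenceProperties can be sketched as a filter: only objects reached through the listed properties contribute to the average. This is a toy model, not Weaviate code. The function name, the dict-based inputs, and the second reference property relatedArticles are hypothetical and used only for illustration.

```python
from typing import Dict, List, Optional


def centroid_from_references(
    references: Dict[str, List[str]],
    vectors_by_id: Dict[str, List[float]],
    reference_properties: List[str],
) -> Optional[List[float]]:
    """Average the vectors of objects referenced through the configured properties.

    `references` maps each reference property name to the IDs it points at.
    Properties not listed in `reference_properties` are ignored.
    """
    vectors = [
        vectors_by_id[ref_id]
        for prop in reference_properties
        for ref_id in references.get(prop, [])
    ]
    if not vectors:
        return None  # nothing referenced yet, so there is no centroid
    return [sum(col) / len(vectors) for col in zip(*vectors)]


# Hypothetical Article with a second reference property that is NOT configured.
references = {"hasParagraphs": ["p1", "p2"], "relatedArticles": ["a9"]}
vectors_by_id = {"p1": [1.0, 0.0], "p2": [0.0, 1.0], "a9": [5.0, 5.0]}
print(centroid_from_references(references, vectors_by_id, ["hasParagraphs"]))  # [0.5, 0.5]
```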
It is important to note that, unlike other vectorizer modules (e.g. text2vec, multi2vec, img2vec), ref2vec-centroid does not generate embeddings based on the contents of an object. Rather, the point of this module is to calculate an object's vector from the vectors of its references.
In this example, the Paragraph class is configured to generate vectors using the text2vec-contextionary module. Thus, the vector of an Article object is an average of the text2vec-contextionary vectors of the Paragraph objects it references.
Although this example uses text2vec-contextionary to generate vectors for the Paragraph class, ref2vec-centroid behaves identically for user-provided vectors. In that case, its output is still calculated as an average of the reference vectors; the only difference is the provenance of those vectors.
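A schema matching this example might look like the following sketch. It is written as a Python dict and serialized with json so it can be compared with the JSON a schema request would carry. The class names, descriptions, vectorizers, and the hasParagraphs reference property come from this section. The property names title and content, the text data type, and the method value "mean" are assumptions.

```python
import json

# Sketch of the example schema; "title", "content", and "mean" are assumptions.
SCHEMA = {
    "classes": [
        {
            "class": "Article",
            "description": "A class representing a published article",
            "vectorizer": "ref2vec-centroid",
            "moduleConfig": {
                "ref2vec-centroid": {
                    "referenceProperties": ["hasParagraphs"],
                    "method": "mean",
                }
            },
            "properties": [
                {"name": "title", "dataType": ["text"],
                 "description": "Title of the article"},
                # A reference property: its dataType is the referenced class name.
                {"name": "hasParagraphs", "dataType": ["Paragraph"],
                 "description": "Paragraphs belonging to this article"},
            ],
        },
        {
            "class": "Paragraph",
            "description": "Paragraphs belonging to an Article",
            # Paragraph vectors come from the content, via text2vec-contextionary.
            "vectorizer": "text2vec-contextionary",
            "properties": [
                {"name": "content", "dataType": ["text"],
                 "description": "Content that will be vectorized"},
            ],
        },
    ]
}

print(json.dumps(SCHEMA, indent=2))
```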
How to use
Now that the Article class is properly configured to use the ref2vec-centroid module, we can begin to create some objects. If there are not yet any Paragraph objects to reference, or if we simply don't want to reference a Paragraph object yet, any newly created Article object will have its vector set to nil.
Once we are ready to reference one or more existing Paragraph objects (with non-nil vectors), our Article object will automatically be assigned a centroid vector, calculated from the vectors of all the Paragraph objects it references.
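To make the two cases concrete, the sketch below only builds the JSON bodies one might send to create an Article object. Nothing is sent. It assumes Weaviate's REST object format, with references given as beacons of the form weaviate://localhost/&lt;id&gt;. The title property and the generated IDs are placeholders.

```python
import json
import uuid
from typing import List, Optional


def article_body(title: str, paragraph_ids: Optional[List[str]] = None) -> dict:
    """Build a JSON body for creating an Article object (nothing is sent here)."""
    properties: dict = {"title": title}  # "title" is a hypothetical property name
    if paragraph_ids:
        # Each reference is a beacon pointing at an existing Paragraph object.
        properties["hasParagraphs"] = [
            {"beacon": f"weaviate://localhost/{pid}"} for pid in paragraph_ids
        ]
    return {"class": "Article", "properties": properties}


# Placeholder IDs standing in for existing Paragraph objects.
paragraph_ids = [str(uuid.uuid4()), str(uuid.uuid4())]

# No references: ref2vec-centroid has nothing to average, so the vector stays nil.
print(json.dumps(article_body("Draft article"), indent=2))
# With references: the vector becomes the centroid of the referenced Paragraph vectors.
print(json.dumps(article_body("Published article", paragraph_ids), indent=2))
```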
Updating the centroid
An object whose class is configured to use ref2vec-centroid will have its vector calculated (or recalculated) as a result of these events:
- Creating the object with references already assigned as properties:
  - Object POST: create a single new object with references
  - Batch object POST: create multiple objects at once, each with references
- Updating an existing object's list of references. This can happen in several ways:
  - Object PUT: update all of the object's properties with a new set of references. This completely replaces the object's existing reference list with the newly provided one
  - Object PATCH: update an existing object by adding any newly provided reference(s) to the object's existing reference list
  - Reference POST: create a new reference to an existing object
  - Reference PUT: update all of the object's references
- Deleting references from the object. This can happen in several ways:
  - Object PUT: update all of the object's properties, removing all references
  - Reference DELETE: delete an existing reference from the object's list of references
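The events above can be mirrored in a toy in-memory model, sketched here under the assumption that every listed operation triggers a recalculation and nothing else does. This is not Weaviate code. The class and method names are hypothetical. Returning None for an empty reference list mirrors the nil vector of a newly created object without references.

```python
from typing import Dict, List, Optional


class ToyRef2VecStore:
    """Toy model of when a ref2vec-centroid vector is (re)calculated."""

    def __init__(self, paragraph_vectors: Dict[str, List[float]]) -> None:
        self.paragraph_vectors = paragraph_vectors
        self.refs: Dict[str, List[str]] = {}  # article id -> referenced paragraph ids
        self.vectors: Dict[str, Optional[List[float]]] = {}

    def _recalculate(self, article_id: str) -> None:
        vecs = [self.paragraph_vectors[p] for p in self.refs[article_id]]
        # An empty reference list yields None, like the nil vector of a new object.
        self.vectors[article_id] = (
            [sum(col) / len(vecs) for col in zip(*vecs)] if vecs else None
        )

    def create(self, article_id: str, refs: List[str]) -> None:
        """Object POST / batch object POST with references."""
        self.refs[article_id] = list(refs)
        self._recalculate(article_id)

    def replace_references(self, article_id: str, refs: List[str]) -> None:
        """Object PUT or reference PUT: the new list replaces the old one."""
        self.refs[article_id] = list(refs)
        self._recalculate(article_id)

    def patch(self, article_id: str, new_refs: List[str]) -> None:
        """Object PATCH: new references are appended to the existing list."""
        self.refs[article_id].extend(new_refs)
        self._recalculate(article_id)

    def add_reference(self, article_id: str, ref: str) -> None:
        """Reference POST."""
        self.refs[article_id].append(ref)
        self._recalculate(article_id)

    def delete_reference(self, article_id: str, ref: str) -> None:
        """Reference DELETE."""
        self.refs[article_id].remove(ref)
        self._recalculate(article_id)


store = ToyRef2VecStore({"p1": [1.0, 0.0], "p2": [0.0, 1.0]})
store.create("a1", [])
print(store.vectors["a1"])  # None
store.add_reference("a1", "p1")
store.patch("a1", ["p2"])
print(store.vectors["a1"])  # [0.5, 0.5]
store.delete_reference("a1", "p1")
print(store.vectors["a1"])  # [0.0, 1.0]
```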
Note: Adding references in batches is not currently supported, because the batch reference feature is built specifically to avoid the cost of updating the vector index. If this use case is important to you, please open a feature request on GitHub.
It is important to note that updating a referenced object will not automatically trigger an update to the referencing object's vector.
In other words, using our Article/Paragraph example: let's say an Article object, "On the Philosophy of Modern Ant Colonies", references three Paragraph objects, including "body" and "conclusion". Over time, "body" may be updated as more research is conducted on the dynamic between worker ants and soldier ants. In this case, the existing vector for the article will not be updated with a new vector based on the revised "body" paragraph.
If we want the centroid vector of "On the Philosophy of Modern Ant Colonies" to be recalculated, we need to trigger an update some other way. For example, we could remove the reference to "body" and add it back, or simply replace the Article object with an identical one (e.g. using PUT).
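The stale-vector scenario can be sketched with plain Python lists. This is a toy model, not Weaviate code, in which the article's stored vector is computed only when its references are written. The name "intro" for the third paragraph and all vector values are made up.

```python
from typing import Dict, List

# Toy vectors for the three Paragraph objects; "intro" is a hypothetical name
# and all values are made up.
paragraphs: Dict[str, List[float]] = {
    "intro": [1.0, 0.0],
    "body": [0.0, 1.0],
    "conclusion": [1.0, 1.0],
}
article_refs = ["intro", "body", "conclusion"]


def centroid_of(refs: List[str]) -> List[float]:
    """Element-wise mean of the referenced paragraphs' current vectors."""
    vecs = [paragraphs[r] for r in refs]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


# The article's vector is computed when its references are written.
stored_vector = centroid_of(article_refs)

# Later, "body" is revised and re-vectorized. Nothing recomputes stored_vector.
paragraphs["body"] = [0.0, 4.0]
print(stored_vector == centroid_of(article_refs))  # False: the stored vector is stale

# Re-triggering a write (e.g. PUT the Article with identical references)
# recomputes the centroid from the current paragraph vectors.
stored_vector = centroid_of(article_refs)
print(stored_vector == centroid_of(article_refs))  # True
```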