Parsing Objects & Resolving References
Overview
Objects are parsed twice:
- First, closest to disk, immediately after reading in the byte blob, all non-reference props are parsed and their respective Golang types (e.g. *models.GeoCoordinates or *models.PhoneNumber) are returned.
- A second time, at the root level of the db.DB type, the whole request is parsed again (recursively) and cross-refs are resolved as requested by the user (through traverser.SelectProperties).
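The two passes can be sketched roughly as follows. This is a minimal illustration, not the actual Weaviate code: GeoCoordinates and SingleRef are simplified stand-ins for the models types, parseAtShard and resolveAtRoot are hypothetical function names, the property names are made up, and the byte blob is assumed to be JSON purely to keep the example short.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// GeoCoordinates is a simplified, hypothetical stand-in for *models.GeoCoordinates.
type GeoCoordinates struct {
	Latitude  float64 `json:"latitude"`
	Longitude float64 `json:"longitude"`
}

// SingleRef is a simplified, hypothetical stand-in for *models.SingleRef.
type SingleRef struct {
	Beacon string `json:"beacon"`
}

// parseAtShard is pass one (hypothetical name): it turns the stored blob into
// typed values but leaves reference props as unresolved beacons.
// Assumption: the blob is JSON and the prop names below are known up front.
func parseAtShard(blob []byte) (map[string]interface{}, error) {
	var raw map[string]json.RawMessage
	if err := json.Unmarshal(blob, &raw); err != nil {
		return nil, err
	}
	props := make(map[string]interface{}, len(raw))
	for name, msg := range raw {
		switch name {
		case "location": // non-reference prop with a dedicated Go type
			var geo GeoCoordinates
			if err := json.Unmarshal(msg, &geo); err != nil {
				return nil, err
			}
			props[name] = &geo
		case "author": // reference prop: keep the pointers, do not resolve
			var refs []*SingleRef
			if err := json.Unmarshal(msg, &refs); err != nil {
				return nil, err
			}
			props[name] = refs
		default: // primitive props
			var v interface{}
			if err := json.Unmarshal(msg, &v); err != nil {
				return nil, err
			}
			props[name] = v
		}
	}
	return props, nil
}

// resolveAtRoot is pass two (hypothetical name): it runs once the parts from
// all shards are assembled and replaces reference pointers with their targets,
// but only for the props the user selected.
func resolveAtRoot(props map[string]interface{}, selected map[string]bool,
	store map[string]map[string]interface{}) {
	for name, value := range props {
		refs, isRef := value.([]*SingleRef)
		if !isRef || !selected[name] {
			continue
		}
		resolved := make([]interface{}, 0, len(refs))
		for _, ref := range refs {
			if target, found := store[ref.Beacon]; found {
				resolved = append(resolved, target)
			}
		}
		props[name] = resolved
	}
}

func main() {
	blob := []byte(`{"title":"Hello","location":{"latitude":52.37,"longitude":4.9},` +
		`"author":[{"beacon":"beacon-alice"}]}`)
	props, err := parseAtShard(blob)
	if err != nil {
		panic(err)
	}
	// The store stands in for objects that may live in other shards or indexes.
	store := map[string]map[string]interface{}{"beacon-alice": {"name": "Alice"}}
	resolveAtRoot(props, map[string]bool{"author": true}, store)
	fmt.Printf("%T %v\n", props["location"], props["author"])
}
```

The point of the split is visible in the signatures: parseAtShard needs nothing but the blob, while resolveAtRoot needs a view over objects that may live elsewhere.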
Motivation behind split-parsing
Generally, shards (and also indexes) are self-contained units. It is thus
natural that they return objects which work in isolation and can be interpreted
by the rest of the application (usually in the form of a search.Result or
search.Results, both defined as entities).
However, cross-references aren't predictable. They could point to an item in
another shard or even to an item of another index (because the target belongs
to a different user-facing Class). When running in multi-node mode (horizontal
replication), the shards could be distributed on any node in the cluster.
Furthermore, it is more efficient (see cached resolver) to resolve references for a list of objects than for a single object. At shard level we do not know whether a specific object is part of a list, or whether this list spans shards or indexes.
Thus the second parsing, which enriches the desired cross-references, happens
at the outermost layer of the persistence package, in db.DB, after assembling
the index/shard parts.
Cached Resolver Logic
The cached resolver is a helper struct with a two-step process:
- Cacher: The input object list (in the form of a search.Results) is analyzed for references. This is a recursive process, as each resolved reference might point to another object which the user (as specified through the traverser.SelectProperties) wants to resolve. However, Step 1 ("the cacher") stores all results in a flat list (technically a map). This saves on complexity: only the "finding references" part is recursive, while the storage part stays simple.
- Resolver: In a second step, the schema is parsed recursively again, and each reference pointer (in the form of a *models.SingleRef containing a Beacon string) is replaced with the resolved reference content (in the form of a search.LocalRef). If the result again contains such reference pointers to other objects, these are resolved in the same fashion, recursively, until everything that the user requested is resolved.
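The cacher/resolver split might look roughly like the sketch below. All names here are assumptions made for illustration: Select is a hypothetical, simplified replacement for traverser.SelectProperties (prop name mapped to nested selection), SingleRef and LocalRef are reduced stand-ins for the real types, and the fetch callback stands in for whatever batch lookup the real cacher performs.

```go
package main

import "fmt"

// SingleRef and LocalRef are simplified, hypothetical stand-ins for
// *models.SingleRef and search.LocalRef.
type SingleRef struct{ Beacon string }
type LocalRef struct{ Fields map[string]interface{} }

// Select is a hypothetical stand-in for traverser.SelectProperties:
// ref prop name -> selection to apply on the referenced objects.
type Select map[string]Select

type object = map[string]interface{}

// cacher finds references recursively but stores targets in a flat map.
type cacher struct {
	fetch func(beacons []string) map[string]object // assumed batch lookup
	store map[string]object                        // flat: beacon -> props
}

// build walks one level of the selection for a whole list of objects, fetches
// uncached targets in a single batch, then recurses into the targets.
// Recursion depth is bounded by the depth of the selection.
func (c *cacher) build(objs []object, sel Select) {
	for name, nested := range sel {
		var beacons, missing []string
		for _, obj := range objs {
			refs, _ := obj[name].([]*SingleRef)
			for _, ref := range refs {
				beacons = append(beacons, ref.Beacon)
				if _, cached := c.store[ref.Beacon]; !cached {
					missing = append(missing, ref.Beacon)
				}
			}
		}
		if len(missing) > 0 {
			for beacon, props := range c.fetch(missing) {
				c.store[beacon] = props
			}
		}
		var targets []object
		for _, beacon := range beacons {
			if props, ok := c.store[beacon]; ok {
				targets = append(targets, props)
			}
		}
		if len(targets) > 0 {
			c.build(targets, nested)
		}
	}
}

// resolve returns a copy of obj in which every selected reference pointer is
// replaced by a LocalRef built from the flat store, recursing into targets.
func resolve(obj object, sel Select, store map[string]object) object {
	out := make(object, len(obj))
	for name, value := range obj {
		nested, selected := sel[name]
		refs, isRef := value.([]*SingleRef)
		if !selected || !isRef {
			out[name] = value
			continue
		}
		resolved := make([]interface{}, 0, len(refs))
		for _, ref := range refs {
			if target, ok := store[ref.Beacon]; ok {
				resolved = append(resolved, LocalRef{Fields: resolve(target, nested, store)})
			}
		}
		out[name] = resolved
	}
	return out
}

// fetchFrom builds a batch lookup over an in-memory map (illustration only).
func fetchFrom(db map[string]object) func([]string) map[string]object {
	return func(beacons []string) map[string]object {
		found := make(map[string]object, len(beacons))
		for _, b := range beacons {
			if props, ok := db[b]; ok {
				found[b] = props
			}
		}
		return found
	}
}

func main() {
	db := map[string]object{
		"beacon-author": {"name": "Alice", "livesIn": []*SingleRef{{Beacon: "beacon-city"}}},
		"beacon-city":   {"name": "Amsterdam"},
	}
	posts := []object{{"title": "Hello", "writtenBy": []*SingleRef{{Beacon: "beacon-author"}}}}
	sel := Select{"writtenBy": Select{"livesIn": nil}}

	c := &cacher{fetch: fetchFrom(db), store: map[string]object{}}
	c.build(posts, sel) // step 1: fill the flat cache
	for _, post := range posts {
		fmt.Println(resolve(post, sel, c.store)) // step 2: swap pointers for content
	}
}
```

Because the cache is a flat map keyed by beacon, the resolver never has to fetch anything itself, and the cacher can batch all lookups of one nesting level across the whole result list.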
Relevant Code
- The reference Cacher and its unit tests
- The reference Resolver and its unit tests
- Integration tests for nested refs and refs of different types