Data types
Introduction
When creating a property, Weaviate needs to know what type of data you will give it. Weaviate accepts the following types:
Weaviate Type | Exact Data Type | Formatting | Note |
---|---|---|---|
text | string | string | |
text[] | list of strings | ["string one", "string two"] | |
object | object | {"child": "I'm nested!"} | Available from 1.22 |
object[] | list of objects | [{"child": "I'm nested!"}, {"child": "I'm nested too!"}] | Available from 1.22 |
int | int64 (see note) | 0 | |
int[] | list of int64 (see note) | [0, 1] | |
boolean | boolean | true / false | |
boolean[] | list of booleans | [true, false] | |
number | float64 | 0.0 | |
number[] | list of float64 | [0.0, 1.1] | |
date | string | more info | |
date[] | list of string | more info | |
uuid | string | "c8f8176c-6f9b-5461-8ab3-f3c7ce8c2f5c" | |
uuid[] | list of strings | ["c8f8176c-6f9b-5461-8ab3-f3c7ce8c2f5c", "36ddd591-2dee-4e7e-a3cc-eb86d30a4303"] | |
geoCoordinates | string | more info | |
phoneNumber | string | more info | |
blob | base64 encoded string | more info | |
cross-reference | string | more info | |
Deprecated types
Weaviate Type | Exact Data Type | Formatting | Deprecated from |
---|---|---|---|
string | string | "string" | v1.19 |
string[] | list of strings | ["string", "second string"] | v1.19 |
- Although Weaviate supports int64, GraphQL currently supports only int32 and not int64. As a result, integer fields in Weaviate that hold values outside the int32 range are not returned by GraphQL queries. We are working on solving this issue. The current workaround is to use a string instead.
- Data types are specified as an array (e.g. ["text"]), as this is required for some cross-reference specifications.
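The int32 limit in the note above can be checked before import. The following is a minimal sketch, not part of any Weaviate client: the helper names `fits_int32` and `needs_string_workaround` are hypothetical, and the sketch only tells you whether a set of values would be affected by the GraphQL limitation.

```python
# Bounds of a signed 32-bit integer, the range GraphQL can return.
INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1


def fits_int32(value: int) -> bool:
    """Return True if the value lies inside the signed 32-bit range."""
    return INT32_MIN <= value <= INT32_MAX


def needs_string_workaround(values) -> bool:
    """Return True if any value would be lost in GraphQL results.

    Hypothetical helper: if this returns True, the note above suggests
    storing the field as a string instead of an int.
    """
    return any(not fits_int32(v) for v in values)
```

For example, `needs_string_workaround([1, 2**40])` reports that the property would hold at least one value outside the int32 range.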
DataType: text
Tokenization configuration
Refer to this section on how to configure the tokenization behavior of a text property.
string is deprecated
Prior to v1.19, Weaviate supported an additional datatype string, which differed from text in its tokenization behavior. As of v1.19, this type is deprecated and will be removed in a future release.
Use text instead of string. text supports the tokenization options that are available through string.
DataType: cross-reference
The cross-reference type is the graph element of Weaviate: you can create a link from one object to another. In the schema you can define multiple classes to which a property can point, in a list of strings. The strings in the dataType list are names of classes defined elsewhere in the schema. For example:
{
  "properties": [
    {
      "name": "hasWritten",
      "dataType": [
        "Article",
        "Blog"
      ]
    }
  ]
}
Number of linked instances
Properties of the cross-reference type are arrays by default. This allows you to link to any number of instances of a given class (including zero).
In the above example, our objects can be linked to:
- 0 Articles and 1 Blog
- 1 Article and 3 Blogs
- 2 Articles and 5 Blogs
- etc.
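As a rough illustration of the rule that the strings in dataType name classes defined elsewhere in the schema, the sketch below checks a property definition against a set of known class names. The function `reference_targets` and the class name "Author" are hypothetical and exist only for this example; this is a local check, not how Weaviate itself resolves references.

```python
def reference_targets(prop: dict, class_names: set) -> list:
    """Return the classes a property can point to.

    Assumption for this sketch: a property is treated as a
    cross-reference when every entry in its dataType list is the name
    of a class in the schema. Otherwise an empty list is returned.
    """
    data_type = prop["dataType"]
    if data_type and all(name in class_names for name in data_type):
        return list(data_type)
    return []


# Class names assumed to be defined elsewhere in the schema.
schema_classes = {"Article", "Blog", "Author"}

has_written = {"name": "hasWritten", "dataType": ["Article", "Blog"]}
title = {"name": "title", "dataType": ["text"]}

print(reference_targets(has_written, schema_classes))
print(reference_targets(title, schema_classes))
```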
DataType: object
Available from v1.22
The object type allows you to store nested data structures in Weaviate. The data structure is a JSON object, and can be nested to any depth.
For example, a Person class could have an address property, as an object. It could in turn include nested properties such as street and city:
{
  "class": "Person",
  "properties": [
    {
      "dataType": ["text"],
      "name": "last_name"
    },
    {
      "dataType": ["object"],
      "name": "address",
      "nestedProperties": [
        {"dataType": ["text"], "name": "street"},
        {"dataType": ["text"], "name": "city"}
      ]
    }
  ]
}
An object of this class may have the following structure:
{
  "last_name": "Franklin",
  "address": {
    "city": "London",
    "street": "King Street"
  }
}
As of 1.22, object and object[] datatype properties are not indexed and not vectorized.
Future plans include the ability to index nested properties, for example to allow for filtering on nested properties and vectorization options.
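To make the relationship between nestedProperties and the stored data concrete, here is a small sketch that compares a nested value with the address property definition from the Person example. The helper `unknown_keys` is hypothetical and only performs a local sanity check; it does not describe how Weaviate itself handles undeclared keys.

```python
# The "address" property definition from the Person example above.
ADDRESS_PROPERTY = {
    "dataType": ["object"],
    "name": "address",
    "nestedProperties": [
        {"dataType": ["text"], "name": "street"},
        {"dataType": ["text"], "name": "city"},
    ],
}


def unknown_keys(value: dict, prop: dict, path: str = "") -> list:
    """Return dotted paths of keys that nestedProperties does not declare.

    Recurses into child values whose declared dataType is ["object"],
    so arbitrarily deep nesting is covered.
    """
    declared = {p["name"]: p for p in prop.get("nestedProperties", [])}
    problems = []
    for key, child in value.items():
        child_path = f"{path}.{key}" if path else key
        if key not in declared:
            problems.append(child_path)
        elif declared[key]["dataType"] == ["object"] and isinstance(child, dict):
            problems.extend(unknown_keys(child, declared[key], child_path))
    return problems


address = {"city": "London", "street": "King Street"}
print(unknown_keys(address, ADDRESS_PROPERTY))
```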
DataType: date
A date in Weaviate is represented by an RFC 3339 timestamp in the date-time format. The timestamp includes the time and an offset.
For example:
"1985-04-12T23:20:50.52Z"
"1996-12-19T16:39:57-08:00"
"1937-01-01T12:00:27.87+00:20"
To add a list of dates as a single entity, use an array of date-time formatted strings. For example: ["1985-04-12T23:20:50.52Z", "1937-01-01T12:00:27.87+00:20"]
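One way to produce such timestamps is with Python's standard datetime module. This is a sketch under the assumption that you start from timezone-aware datetime values; the helper name `to_rfc3339` is hypothetical. Because the format requires an offset, the helper rejects naive datetimes rather than guessing a timezone.

```python
from datetime import datetime, timedelta, timezone


def to_rfc3339(moment: datetime) -> str:
    """Format a timezone-aware datetime as an RFC 3339 date-time string."""
    if moment.tzinfo is None or moment.utcoffset() is None:
        raise ValueError("datetime must be timezone-aware to include an offset")
    # isoformat() on an aware datetime emits the time and the UTC offset.
    return moment.isoformat()


pacific = timezone(timedelta(hours=-8))
single = to_rfc3339(datetime(1996, 12, 19, 16, 39, 57, tzinfo=pacific))

# A date[] value is simply a list of such strings.
many = [
    to_rfc3339(datetime(1985, 4, 12, 23, 20, 50, tzinfo=timezone.utc)),
    single,
]
print(many)
```

Note that `isoformat()` writes UTC as `+00:00` rather than `Z`; both spellings are allowed by RFC 3339.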
DataType: blob
The blob datatype accepts any binary data. The data should be base64 encoded, and passed as a string. Characteristics:
- Weaviate doesn't make assumptions about the type of data that is encoded. A module (e.g. img2vec) can investigate file headers as it wishes, but Weaviate itself does not do this.
- When storing, the data is base64 decoded (so Weaviate stores it more efficiently).
- When serving, the data is base64 encoded (so it is safe to serve as json).
- There is no max file size limit.
- This blob field is always skipped in the inverted index, regardless of setting. This means you cannot search by this blob field in a Weaviate GraphQL where filter, and there is no valueBlob field accordingly. Depending on the module, this field can be used in module-specific filters (e.g. nearImage{} in the img2vec-neural filter).
Example:
The dataType blob can be used as property dataType in the data schema as follows:
{
  "properties": [
    {
      "name": "image",
      "dataType": ["blob"]
    }
  ]
}
To obtain the base64-encoded value of an image, you can run the following command, or use the helper methods in the Weaviate clients:
cat my_image.png | base64
You can then import data with blob dataType to Weaviate as follows:
curl \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "class": "FashionPicture",
    "id": "36ddd591-2dee-4e7e-a3cc-eb86d30a4302",
    "properties": {
      "image": "iVBORw0KGgoAAAANS..."
    }
  }' \
  http://localhost:8080/v1/objects
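The same encoding step can be sketched in Python with the standard base64 module. The helper names `encode_blob` and `build_object` are hypothetical, and the sketch only builds the request body shown in the curl example above; sending it is left to whichever HTTP client or Weaviate client you use.

```python
import base64


def encode_blob(raw: bytes) -> str:
    """Base64-encode binary data and return it as a plain string."""
    return base64.b64encode(raw).decode("ascii")


def build_object(class_name: str, property_name: str, raw: bytes) -> dict:
    """Build an object payload with one blob property (hypothetical helper)."""
    return {
        "class": class_name,
        "properties": {property_name: encode_blob(raw)},
    }


# The first eight bytes of every PNG file; a real import would read the
# whole file, e.g. with pathlib.Path("my_image.png").read_bytes().
png_signature = b"\x89PNG\r\n\x1a\n"
payload = build_object("FashionPicture", "image", png_signature)
print(payload["properties"]["image"])
```

The encoded PNG signature begins with the same characters as the truncated value in the curl example, which is why base64-encoded PNG images are easy to recognize.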
DataType: uuid
Available from v1.19
The dedicated uuid and uuid[] data types are more space-efficient than storing the same data as text.
- Each uuid is a 128-bit (16-byte) number.
- The filterable index uses roaring bitmaps.
It is currently not possible to aggregate or sort by uuid or uuid[] types.
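The size difference between the two representations can be seen with Python's standard uuid module. This sketch only compares the raw 16-byte form with the textual form; it makes no claim about Weaviate's actual on-disk layout, and the helper name `uuid_sizes` is hypothetical.

```python
import uuid


def uuid_sizes(value: str) -> tuple:
    """Return (binary size, text size) in bytes for a UUID string."""
    parsed = uuid.UUID(value)  # raises ValueError for malformed input
    binary_size = len(parsed.bytes)               # 128 bits
    text_size = len(str(parsed).encode("utf-8"))  # hex digits plus hyphens
    return binary_size, text_size


print(uuid_sizes("c8f8176c-6f9b-5461-8ab3-f3c7ce8c2f5c"))
```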
DataType: geoCoordinates
Weaviate allows you to store geo coordinates. When querying Weaviate, you can use this type to find items within a radius around a location. A geo coordinate value is a float, and is processed as decimal degrees according to the ISO standard.
To supply a geoCoordinates property, specify the latitude and longitude as floating point decimal degrees.
An example of how geo coordinates are used in a data object:
{
  "City": {
    "location": {
      "latitude": 52.366667,
      "longitude": 4.9
    }
  }
}
Currently, geo-coordinate filtering is limited to the nearest 800 results from the source location, which will be further reduced by any other filter conditions and search parameters.
If you plan on a densely populated dataset, consider using another strategy such as geo-hashing into a text datatype, and filtering further, such as with a ContainsAny filter.
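As an illustration of the geo-hashing strategy mentioned above, the sketch below implements the widely used public geohash encoding, which interleaves longitude and latitude bits and writes them in base32. This is one possible choice, not something Weaviate provides or prescribes; the function name `geohash` and the default precision are assumptions made for this example.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash(latitude: float, longitude: float, precision: int = 7) -> str:
    """Encode decimal-degree coordinates as a geohash string."""
    if not -90.0 <= latitude <= 90.0:
        raise ValueError("latitude must be within [-90, 90]")
    if not -180.0 <= longitude <= 180.0:
        raise ValueError("longitude must be within [-180, 180]")

    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, value = 0, 0
    use_longitude = True  # bits alternate, starting with longitude

    while len(chars) < precision:
        if use_longitude:
            mid = (lon_lo + lon_hi) / 2
            bit = longitude >= mid
            lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = latitude >= mid
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
        value = (value << 1) | int(bit)
        use_longitude = not use_longitude
        bits += 1
        if bits == 5:  # every five bits become one base32 character
            chars.append(_BASE32[value])
            bits, value = 0, 0

    return "".join(chars)


# The coordinates from the City example above.
print(geohash(52.366667, 4.9))
```

A shorter geohash of a point is always a prefix of a longer geohash of the same point, so storing several prefix lengths in a text property would let a filter match coarse or fine cells.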
DataType: phoneNumber
There is a special, primitive data type phoneNumber. When a phone number is added to this field, the input will be normalized and validated, unlike single-value fields such as number and string. Like geoCoordinates, the data field is an object rather than a flat type. The object has multiple fields:
{
  "phoneNumber": {
    "input": "020 1234567", // Required. Raw input in string format
    "defaultCountry": "nl", // Required if only a national number is provided, ISO 3166-1 alpha-2 country code. Only set if explicitly set by the user.
    "internationalFormatted": "+31 20 1234567", // Read-only string
    "countryCode": 31, // Read-only unsigned integer, numerical country code
    "national": 201234567, // Read-only unsigned integer, numerical representation of the national number
    "nationalFormatted": "020 1234567", // Read-only string
    "valid": true // Read-only boolean. Whether the parser recognized the phone number as valid
  }
}
There are two fields that accept input. input must always be set, while defaultCountry must only be set in specific situations. There are two possible scenarios:
- When you enter an international number (e.g. "+31 20 1234567") in the input field, no defaultCountry needs to be entered. The underlying parser will automatically recognize the number's country.
- When you enter a national number (e.g. "020 1234567"), you need to specify the country in defaultCountry (in this case, "nl"), so that the parser can correctly convert the number into all formats. The string in defaultCountry should be an ISO 3166-1 alpha-2 country code.
As you can see in the code snippet above, all other fields are read-only. These fields are filled automatically, and will appear when reading back a field of type phoneNumber.
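The two input scenarios can be sketched as a small helper that builds the writable part of a phoneNumber value. The function name `phone_number_input` is hypothetical, and the check for a leading "+" is a simplifying assumption of this sketch; the actual recognition of international numbers is done by Weaviate's parser.

```python
from typing import Optional


def phone_number_input(raw: str, default_country: Optional[str] = None) -> dict:
    """Build the writable fields of a phoneNumber value.

    Assumption: a number starting with "+" is treated as international,
    so defaultCountry may be omitted. Any other number is treated as
    national and requires an ISO 3166-1 alpha-2 country code.
    """
    is_international = raw.strip().startswith("+")
    if not is_international and not default_country:
        raise ValueError("a national number requires defaultCountry")

    value = {"input": raw}
    if default_country:
        # Only include defaultCountry when the caller explicitly set it.
        value["defaultCountry"] = default_country
    return value


print(phone_number_input("+31 20 1234567"))
print(phone_number_input("020 1234567", "nl"))
```

The read-only fields are deliberately left out, since they are filled in automatically when the value is read back.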
Questions and feedback
If you have any questions or feedback, let us know in the user forum.