Data types
Introduction
When creating a property, you must specify a data type. Weaviate accepts the following types.
Arrays of a data type are specified by adding [] to the type (e.g. text ➡ text[]). Note that not all data types support arrays.
Name | Exact type | Formatting | Array ([]) available (example) | Note |
---|---|---|---|---|
text | string | string | ✅ ["string one", "string two"] | |
boolean | boolean | true /false | ✅ [true, false] | |
int | int64 (see notes) | 123 | ✅ [123, -456] | |
number | float64 | 0.0 | ✅ [0.0, 1.1] | |
date | string | more info | ✅ | |
uuid | string | "c8f8176c-6f9b-5461-8ab3-f3c7ce8c2f5c" | ✅ ["c8f8176c-6f9b-5461-8ab3-f3c7ce8c2f5c", "36ddd591-2dee-4e7e-a3cc-eb86d30a4303"] | |
geoCoordinates | string | more info | ❌ | |
phoneNumber | string | more info | ❌ | |
blob | base64 encoded string | more info | ❌ | |
object | object | {"child": "I'm nested!"} | ✅ [{"child": "I'm nested!"}, {"child": "I'm nested too!"}] | Available from v1.22 |
cross reference | string | more info | ❌ | |
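As a small illustration of the array convention in the table above, the following Python sketch derives the array form of a type name. The helper `array_type` and the `ARRAY_CAPABLE` set are hypothetical names introduced here for illustration; the set simply mirrors the ✅ rows of the table and is not part of any Weaviate client.

```python
# Types marked with a check mark in the "Array available" column above.
# This set is an illustrative assumption copied from the table.
ARRAY_CAPABLE = {"text", "boolean", "int", "number", "date", "uuid", "object"}


def array_type(type_name: str) -> str:
    """Return the array form of a type name, e.g. "text" -> "text[]"."""
    if type_name not in ARRAY_CAPABLE:
        raise ValueError(f"{type_name!r} does not support arrays")
    return f"{type_name}[]"


print(array_type("text"))  # text[]
```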
Deprecated types
Name | Exact type | Formatting | Array available (example) | Deprecated from |
---|---|---|---|---|
string | string | "string" | ✅ ["string", "second string"] | v1.19 |
Further details on each data type are provided below.
text
Use this type for any text data.
- Properties with the text type are used for vectorization and keyword search unless specified otherwise in the property settings.
- If using named vectors, the property vectorization is defined in the named vector definition.
- Text properties are tokenized prior to being indexed for keyword/BM25 searches. See collection definition: tokenization for more information.
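To give a feel for what tokenization means for keyword search, here is a rough Python approximation of the `word`, `lowercase`, `whitespace`, and `field` options. The exact rules are defined in the tokenization documentation rather than on this page, so treat the behavior below as an assumption; the `tokenize` helper is hypothetical and handles ASCII text only.

```python
import re


def tokenize(text: str, method: str) -> list:
    """Approximate how a text value might be split into keyword tokens."""
    if method == "word":
        # Lowercase, then split on anything that is not a letter or digit.
        return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]
    if method == "lowercase":
        # Lowercase, then split on whitespace only (punctuation is kept).
        return text.lower().split()
    if method == "whitespace":
        # Split on whitespace only, preserving case.
        return text.split()
    if method == "field":
        # Treat the whole (trimmed) value as a single token.
        return [text.strip()]
    raise ValueError(f"unknown tokenization method: {method!r}")


sample = "Hello, (beautiful) world"
for method in ("word", "lowercase", "whitespace", "field"):
    print(method, tokenize(sample, method))
```

Under these assumptions, `field` suits identifiers such as `movie_id` that should only match exactly, while `word` suits free text such as titles.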
string is deprecated
Prior to v1.19, Weaviate supported an additional data type, string, which differed from text in its tokenization behavior. As of v1.19, this type is deprecated and will be removed in a future release.
Use text instead of string. text supports the tokenization options that are available through string.
Examples
Property definition
- Python Client v4
- JS/TS Client v3
from weaviate.classes.config import Property, DataType, Configure, Tokenization
# Create collection
my_collection = client.collections.create(
name="Movie",
properties=[
Property(
name="title", data_type=DataType.TEXT, tokenization=Tokenization.LOWERCASE
),
Property(
name="movie_id", data_type=DataType.TEXT, tokenization=Tokenization.FIELD
),
Property(
name="genres", data_type=DataType.TEXT_ARRAY, tokenization=Tokenization.WORD
),
],
# Other properties are omitted for brevity
)
import { vectorizer, dataType, tokenization } from 'weaviate-client';
// Create collection
const myCollection = await client.collections.create({
name: 'Movie',
properties: [
{
name: 'title',
dataType: dataType.TEXT,
tokenization: tokenization.LOWERCASE,
},
{
name: 'movie_id',
dataType: dataType.TEXT,
tokenization: tokenization.FIELD,
},
{
name: 'genres',
dataType: dataType.TEXT_ARRAY,
tokenization: tokenization.WORD,
},
],
// Other properties are omitted for brevity
});
Object insertion
- Python Client v4
- JS/TS Client v3
# Create an object
example_object = {
"title": "Rogue One",
"movie_id": "ro123456",
"genres": ["Action", "Adventure", "Sci-Fi"],
}
obj_uuid = my_collection.data.insert(example_object)
const exampleObject = {
title: 'Rogue One',
movie_id: 'ro123456',
genres: ['Action', 'Adventure', 'Sci-Fi'],
}
const obj_uuid = await myCollection.data.insert(exampleObject);
boolean / int / number
The boolean, int, and number types store boolean values, integers, and floating-point numbers, respectively.
Examples
Property definition
- Python Client v4
- JS/TS Client v3
from weaviate.classes.config import Property, DataType
# Create collection
my_collection = client.collections.create(
name="Product",
properties=[
Property(name="name", data_type=DataType.TEXT),
Property(name="price", data_type=DataType.NUMBER),
Property(name="stock_quantity", data_type=DataType.INT),
Property(name="is_on_sale", data_type=DataType.BOOL),
Property(name="customer_ratings", data_type=DataType.NUMBER_ARRAY),
],
# Other properties are omitted for brevity
)
import { dataType } from 'weaviate-client';
// Create collection
const myCollection = await client.collections.create({
name: 'Product',
properties: [
{
name: 'name',
dataType: dataType.TEXT,
},
{
name: 'price',
dataType: dataType.NUMBER,
},
{
name: 'stock_quantity',
dataType: dataType.INT,
},
{
name: 'is_on_sale',
dataType: dataType.BOOLEAN,
},
{
name: 'customer_ratings',
dataType: dataType.NUMBER_ARRAY,
},
],
// Other properties are omitted for brevity
});
Object insertion
- Python Client v4
- JS/TS Client v3
# Create an object
example_object = {
"name": "Wireless Headphones",
"price": 95.50,
"stock_quantity": 100,
"is_on_sale": True,
"customer_ratings": [4.5, 4.8, 4.2],
}
obj_uuid = my_collection.data.insert(example_object)
const exampleObject = {
name: 'Wireless Headphones',
price: 95.5,
stock_quantity: 100,
is_on_sale: true,
customer_ratings: [4.5, 4.8, 4.2],
};
const obj_uuid = await myCollection.data.insert(exampleObject);
Note: GraphQL and int64
Although Weaviate supports int64, GraphQL currently supports only int32. As a result, integer properties in Weaviate whose values fall outside the int32 range are not returned by GraphQL queries. We are working on solving this issue. The current workaround is to use a string instead.
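The int32 limit can be checked before choosing a property type. The sketch below is a minimal illustration, assuming you decide per property whether to store values as text; `fits_int32` and the sample identifier are hypothetical and not part of any Weaviate client.

```python
# Bounds of a signed 32-bit integer.
INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1


def fits_int32(value: int) -> bool:
    """Return True if the value can be represented as a signed int32."""
    return INT32_MIN <= value <= INT32_MAX


big_id = 9_007_199_254_740_993  # hypothetical value that exceeds int32

if fits_int32(big_id):
    stored_value = big_id
else:
    # Workaround from the note above: keep the value as a string instead.
    stored_value = str(big_id)

print(repr(stored_value))
```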
date
A date in Weaviate is represented by an RFC 3339 timestamp in the date-time format. The timestamp includes the time and an offset.
For example:
"1985-04-12T23:20:50.52Z"
"1996-12-19T16:39:57-08:00"
"1937-01-01T12:00:27.87+00:20"
To add a list of dates as a single entity, use an array of date-time formatted strings. For example: ["1985-04-12T23:20:50.52Z", "1937-01-01T12:00:27.87+00:20"]
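If you need to build such strings yourself, the following sketch uses only the Python standard library to format timezone-aware `datetime` objects as RFC 3339 timestamps. The `to_rfc3339` helper is a hypothetical name introduced for illustration.

```python
from datetime import datetime, timedelta, timezone


def to_rfc3339(dt: datetime) -> str:
    """Format a timezone-aware datetime as an RFC 3339 date-time string."""
    if dt.tzinfo is None:
        raise ValueError("datetime must be timezone-aware (an offset is required)")
    text = dt.isoformat()
    # isoformat() renders UTC as "+00:00"; RFC 3339 also allows the shorter "Z".
    return text[:-6] + "Z" if text.endswith("+00:00") else text


utc_time = datetime(1985, 4, 12, 23, 20, 50, tzinfo=timezone.utc)
offset_time = datetime(1996, 12, 19, 16, 39, 57, tzinfo=timezone(timedelta(hours=-8)))

print(to_rfc3339(utc_time))     # 1985-04-12T23:20:50Z
print(to_rfc3339(offset_time))  # 1996-12-19T16:39:57-08:00
```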
In specific client libraries, you may be able to use the native date object as shown in the following examples.
Examples
Property definition
- Python Client v4
- JS/TS Client v3
from weaviate.classes.config import Property, DataType
from datetime import datetime, timezone
# Create collection
my_collection = client.collections.create(
name="ConcertTour",
properties=[
Property(name="artist", data_type=DataType.TEXT),
Property(name="tour_name", data_type=DataType.TEXT),
Property(name="tour_start", data_type=DataType.DATE),
Property(name="tour_dates", data_type=DataType.DATE_ARRAY),
],
# Other properties are omitted for brevity
)
import { dataType } from 'weaviate-client';
// Create collection
const myCollection = await client.collections.create({
name: 'ConcertTour',
properties: [
{
name: 'artist',
dataType: dataType.TEXT,
},
{
name: 'tour_name',
dataType: dataType.TEXT,
},
{
name: 'tour_start',
dataType: dataType.DATE,
},
{
name: 'tour_dates',
dataType: dataType.DATE_ARRAY,
},
],
// Other properties are omitted for brevity
});
Object insertion
- Python Client v4
- JS/TS Client v3
# Create an object
# In Python, you can use the RFC 3339 format or a datetime object (preferably with a timezone)
example_object = {
"artist": "Taylor Swift",
"tour_name": "Eras Tour",
"tour_start": datetime(2023, 3, 17).replace(tzinfo=timezone.utc),
"tour_dates": [
# Use `datetime` objects with a timezone
datetime(2023, 3, 17).replace(tzinfo=timezone.utc),
datetime(2023, 3, 18).replace(tzinfo=timezone.utc),
# .. more dates
# Or use RFC 3339 format
"2024-12-07T00:00:00Z",
"2024-12-08T00:00:00Z",
],
}
obj_uuid = my_collection.data.insert(example_object)
const exampleObject = {
  artist: 'Taylor Swift',
  tour_name: 'Eras Tour',
  // Use JavaScript Date objects (months are zero-indexed: 2 = March, 11 = December)
  tour_start: new Date(2023, 2, 17),
  tour_dates: [
    new Date(2023, 2, 17),
    new Date(2023, 2, 18),
    // .. more dates
    new Date(2024, 11, 7),
    new Date(2024, 11, 8),
  ],
  // // Or, use RFC 3339 strings
  // tour_dates: [
  //   '2023-03-17T00:00:00Z',
  //   '2023-03-18T00:00:00Z',
  //   // .. more dates
  //   '2024-12-07T00:00:00Z',
  //   '2024-12-08T00:00:00Z',
  // ]
};
const obj_uuid = await myCollection.data.insert(exampleObject);
uuid
Available from v1.19
The dedicated uuid and uuid[] data types efficiently store UUIDs.
- Each uuid is a 128-bit (16-byte) number.
- The filterable index uses roaring bitmaps.
It is currently not possible to aggregate or sort by uuid or uuid[] types.
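The size claim above can be checked with Python's standard `uuid` module. This sketch parses the example UUID from the table and also shows that version 5 UUIDs are deterministic for a given name. The namespace used here (`uuid.NAMESPACE_DNS`) is an assumption for illustration and is not necessarily the one used by the Weaviate client helpers.

```python
import uuid

# The example value from the data type table.
example = uuid.UUID("c8f8176c-6f9b-5461-8ab3-f3c7ce8c2f5c")
print(len(example.bytes))  # 16 bytes, i.e. 128 bits

# Version 5 UUIDs are derived from a namespace and a name, so the same
# inputs always produce the same UUID.
first = uuid.uuid5(uuid.NAMESPACE_DNS, "The Matrix")
second = uuid.uuid5(uuid.NAMESPACE_DNS, "The Matrix")
print(first == second)  # True
```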
Examples
Property definition
- Python Client v4
- JS/TS Client v3
from weaviate.classes.config import Property, DataType
from weaviate.util import generate_uuid5
# Create collection
my_collection = client.collections.create(
name="Movie",
properties=[
Property(name="title", data_type=DataType.TEXT),
Property(name="movie_uuid", data_type=DataType.UUID),
Property(name="related_movie_uuids", data_type=DataType.UUID_ARRAY),
],
# Other properties are omitted for brevity
)
import { dataType, generateUuid5 } from 'weaviate-client';
// Create collection
const myCollection = await client.collections.create({
name: 'Movie',
properties: [
{
name: 'title',
dataType: dataType.TEXT,
},
{
name: 'movie_uuid',
dataType: dataType.UUID,
},
{
name: 'related_movie_uuids',
dataType: dataType.UUID_ARRAY,
},
],
// Other properties are omitted for brevity
});
Object insertion
- Python Client v4
- JS/TS Client v3
# Create an object
example_object = {
"title": "The Matrix",
"movie_uuid": generate_uuid5("The Matrix"),
"related_movie_uuids": [
generate_uuid5("The Matrix Reloaded"),
generate_uuid5("The Matrix Revolutions"),
generate_uuid5("The Matrix Resurrections"),
],
}
obj_uuid = my_collection.data.insert(example_object)
const exampleObject = {
title: 'The Matrix',
movie_uuid: generateUuid5('The Matrix'),
related_movie_uuids: [
generateUuid5('The Matrix Reloaded'),
generateUuid5('The Matrix Revolutions'),
generateUuid5('The Matrix Resurrections'),
],
};
const obj_uuid = await myCollection.data.insert(exampleObject);
geoCoordinates
Geo coordinates can be used to find objects within a radius around a query location. A geo coordinate value is stored as a float and is processed as decimal degrees according to the ISO standard.
To supply a geoCoordinates property, specify the latitude and longitude as floating point decimal degrees.
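To illustrate what "objects within a radius" means for decimal-degree coordinates, here is a minimal Python sketch using the haversine great-circle formula. It is a conceptual illustration only and makes no claim about how Weaviate computes distances internally; the function names and the mean Earth radius constant are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers between two decimal-degree points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(center: tuple, point: tuple, radius_km: float) -> bool:
    """Return True if point lies within radius_km of center."""
    return haversine_km(*center, *point) <= radius_km


sydney = (-33.8688, 151.2093)
melbourne = (-37.8136, 144.9631)
print(within_radius(sydney, melbourne, 500))   # False
print(within_radius(sydney, melbourne, 1000))  # True
```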
Examples
Property definition
- Python Client v4
- JS/TS Client v3
from weaviate.classes.config import Property, DataType
from weaviate.classes.data import GeoCoordinate
# Create collection
my_collection = client.collections.create(
name="City",
properties=[
Property(name="name", data_type=DataType.TEXT),
Property(name="location", data_type=DataType.GEO_COORDINATES),
],
# Other properties are omitted for brevity
)
import { dataType } from 'weaviate-client';
// Create collection
const myCollection = await client.collections.create({
  name: 'City',
  properties: [
    {
      name: 'name',
      dataType: dataType.TEXT,
    },
    {
      name: 'location',
      dataType: dataType.GEO_COORDINATES,
    },
  ],
  // Other properties are omitted for brevity
});
Object insertion
- Python Client v4
- JS/TS Client v3
# Create an object
example_object = {
"name": "Sydney",
"location": GeoCoordinate(latitude=-33.8688, longitude=151.2093),
}
obj_uuid = my_collection.data.insert(example_object)
const exampleObject = {
name: 'Sydney',
location: {
latitude: -33.8688,
longitude: 151.2093
},
};
const obj_uuid = await myCollection.data.insert(exampleObject);
Currently, geo-coordinate filtering is limited to the nearest 800 results from the source location, which will be further reduced by any other filter conditions and search parameters.
If you plan on a densely populated dataset, consider using another strategy such as geo-hashing into a text datatype, and filtering further, such as with a ContainsAny filter.
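As a sketch of the geo-hashing idea, the following Python function implements the standard geohash encoding, which maps a latitude and longitude to a short base-32 string. Nearby points usually share a prefix, so the hash (or several prefixes of it) could be stored in a text or text[] property and matched with a filter. The function name and the choice of precision are assumptions for illustration; this is not a Weaviate API.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet


def geohash_encode(lat: float, lon: float, precision: int = 6) -> str:
    """Encode decimal-degree coordinates as a geohash of the given length."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate between longitude and latitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


print(geohash_encode(57.64911, 10.40744, 6))  # u4pruy
```

Shorter prefixes cover larger cells, so the precision you index controls how coarse the pre-filter is.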
phoneNumber
A phoneNumber input is normalized and validated, unlike single-value fields such as number and string. The data field is an object with multiple fields.
{
  "phoneNumber": {
    "input": "020 1234567", // Required. Raw input in string format
    "defaultCountry": "nl", // Required if only a national number is provided, ISO 3166-1 alpha-2 country code. Only set if explicitly set by the user.
    "internationalFormatted": "+31 20 1234567", // Read-only string
    "countryCode": 31, // Read-only unsigned integer, numerical country code
    "national": 201234567, // Read-only unsigned integer, numerical representation of the national number
    "nationalFormatted": "020 1234567", // Read-only string
    "valid": true // Read-only boolean. Whether the parser recognized the phone number as valid
  }
}
There are two fields that accept input. input must always be set, while defaultCountry must only be set in specific situations. There are two possible scenarios:
- When you enter an international number (e.g. "+31 20 1234567") in the input field, no defaultCountry needs to be entered. The underlying parser will automatically recognize the number's country.
- When you enter a national number (e.g. "020 1234567"), you need to specify the country in defaultCountry (in this case, "nl"), so that the parser can correctly convert the number into all formats. The string in defaultCountry should be an ISO 3166-1 alpha-2 country code.
Weaviate will also add further read-only fields such as internationalFormatted, countryCode, national, nationalFormatted and valid when reading back a field of type phoneNumber.
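The two scenarios above can be sketched as a small helper that builds the writable part of a phoneNumber value. The `phone_payload` function is hypothetical, and the leading "+" check is a deliberate simplification: the actual recognition of international numbers is done by Weaviate's parser, not by this code.

```python
from typing import Optional


def phone_payload(raw_input: str, default_country: Optional[str] = None) -> dict:
    """Build the writable fields (input, defaultCountry) of a phoneNumber value."""
    payload = {"input": raw_input}
    # Simplifying assumption: a leading "+" marks an international number.
    is_international = raw_input.strip().startswith("+")
    if not is_international:
        if default_country is None:
            raise ValueError("a national number requires defaultCountry")
        payload["defaultCountry"] = default_country
    return payload


print(phone_payload("+31 20 1234567"))
print(phone_payload("020 1234567", default_country="nl"))
```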
Examples
Property definition
- Python Client v4
- JS/TS Client v3
from weaviate.classes.config import Property, DataType
from weaviate.classes.data import PhoneNumber
# Create collection
my_collection = client.collections.create(
name="Person",
properties=[
Property(name="name", data_type=DataType.TEXT),
Property(name="phone", data_type=DataType.PHONE_NUMBER),
],
# Other properties are omitted for brevity
)
import { dataType } from 'weaviate-client';
// Create collection
const myCollection = await client.collections.create({
name: 'Person',
properties: [
{
name: 'name',
dataType: dataType.TEXT,
},
{
name: 'phone',
dataType: dataType.PHONE_NUMBER,
},
],
// Other properties are omitted for brevity
});
Object insertion
- Python Client v4
- JS/TS Client v3
# Create an object
example_object = {
"name": "Ray Stantz",
"phone": PhoneNumber(number="212 555 2368", default_country="us"),
}
obj_uuid = my_collection.data.insert(example_object)
const exampleObject = {
name: 'Ray Stantz',
phone: {
number: '212 555 2368',
defaultCountry: 'us'
}
};
const obj_uuid = await myCollection.data.insert(exampleObject);
blob
The blob data type accepts any binary data. The data should be base64 encoded and passed as a string. Characteristics:
- Weaviate doesn't make assumptions about the type of data that is encoded. A module (e.g. img2vec) can investigate file headers as it wishes, but Weaviate itself does not do this.
- When storing, the data is base64 decoded (so Weaviate stores it more efficiently).
- When serving, the data is base64 encoded (so it is safe to serve as json).
- There is no max file size limit.
- This blob field is always skipped in the inverted index, regardless of setting. This means you cannot search by this blob field in a Weaviate GraphQL where filter, and there is no valueBlob field accordingly. Depending on the module, this field can be used in module-specific filters (e.g. nearImage{} in the img2vec-neural filter).
To obtain the base64-encoded value of an image, you can run the following command, or use the helper methods in the Weaviate clients:
cat my_image.png | base64
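The same encoding can be done with Python's standard `base64` module. In this minimal sketch, `to_blob_string` is a hypothetical helper, and the eight-byte PNG file signature stands in for real image data so that the example is self-contained.

```python
import base64


def to_blob_string(raw: bytes) -> str:
    """Base64-encode raw bytes and return the result as a string."""
    return base64.b64encode(raw).decode("ascii")


# The 8-byte PNG file signature, used here in place of a full image file.
png_signature = b"\x89PNG\r\n\x1a\n"
blob_string = to_blob_string(png_signature)
print(blob_string)  # iVBORw0KGgo=

# Decoding restores the original bytes.
print(base64.b64decode(blob_string) == png_signature)  # True
```

For a real file, you would read its bytes first (for example with `open(path, "rb").read()`) and pass them to the same helper.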
Examples
Property definition
- Python Client v4
- JS/TS Client v3
from weaviate.classes.config import Property, DataType
# Create collection
my_collection = client.collections.create(
name="Poster",
properties=[
Property(name="title", data_type=DataType.TEXT),
Property(name="image", data_type=DataType.BLOB),
],
# Other properties are omitted for brevity
)
import { dataType } from 'weaviate-client';
// Create collection
const myCollection = await client.collections.create({
name: 'Poster',
properties: [
{
name: 'title',
dataType: dataType.TEXT,
},
{
name: 'image',
dataType: dataType.BLOB,
},
],
// Other properties are omitted for brevity
});
Object insertion
- Python Client v4
- JS/TS Client v3
# Create an object (blob_string holds the base64-encoded image data)
example_object = {
"title": "The Matrix",
"image": blob_string
}
obj_uuid = my_collection.data.insert(example_object)
const exampleObject = {
title: 'The Matrix',
image: blob_string
};
const obj_uuid = await myCollection.data.insert(exampleObject);
object
Available from v1.22
The object type allows you to store nested data as a JSON object that can be nested to any depth.
For example, a Person collection could have an address property as an object. It could in turn include nested properties such as street and city.
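A minimal sketch of such a nested value, written as a Python dictionary and printed as JSON, might look like the following. The person and address values are made up for illustration.

```python
import json

# Hypothetical Person object with a nested address object.
person = {
    "name": "Jane Doe",
    "address": {
        "street": "1 Example Street",
        "city": "London",
    },
}

print(json.dumps(person, indent=2))
```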
Currently, object and object[] datatype properties are not indexed and not vectorized.
Future plans include the ability to index nested properties, for example to allow for filtering on nested properties and vectorization options.
Examples
Property definition
- Python Client v4
- JS/TS Client v3
from weaviate.classes.config import Property, DataType
# Create collection
my_collection = client.collections.create(
    name="Person",
    properties=[
        Property(name="name", data_type=DataType.TEXT),
        Property(
            name="home_address",
            data_type=DataType.OBJECT,
            nested_properties=[
                Property(
                    name="street",
                    data_type=DataType.OBJECT,
                    nested_properties=[
                        Property(name="number", data_type=DataType.INT),
                        Property(name="name", data_type=DataType.TEXT),
                    ],
                ),
                Property(name="city", data_type=DataType.TEXT),
            ],
        ),
        Property(
            name="office_addresses",
            data_type=DataType.OBJECT_ARRAY,
            nested_properties=[
                Property(name="office_name", data_type=DataType.TEXT),
                Property(
                    name="street",
                    data_type=DataType.OBJECT,
                    nested_properties=[
                        Property(name="name", data_type=DataType.TEXT),
                        Property(name="number", data_type=DataType.INT),
                    ],
                ),
            ],
        ),
    ],
    # Other properties are omitted for brevity
)
import { dataType } from 'weaviate-client';
// Create collection
const myCollection = await client.collections.create({
name: 'Person',
properties: [
{
name: 'name',
dataType: dataType.TEXT,
},
{
name: 'home_address',
dataType: dataType.OBJECT,
nestedProperties: [
{
name: 'street',
dataType: dataType.OBJECT,
nestedProperties: [
{
name: 'number',
dataType: dataType.INT,
},
{
name: 'name',
dataType: dataType.TEXT,
},
],
},
{
name: 'city',
dataType: dataType.TEXT,
},
],
},
{
name: 'office_addresses',
dataType: dataType.OBJECT_ARRAY,
nestedProperties: [
{
name: 'office_name',
dataType: dataType.TEXT,
},
{
name: 'street',
dataType: dataType.OBJECT,
nestedProperties: [
{
name: 'number',
dataType: dataType.INT,
},
{
name: 'name',
dataType: dataType.TEXT,
},
],
},
],
},
],
// Other properties are omitted for brevity
});
Object insertion
- Python Client v4
- JS/TS Client v3
# Create an object
example_object = {
"name": "John Smith",
"home_address": {
"street": {
"number": 123,
"name": "Main Street",
},
"city": "London",
},
"office_addresses": [
{
"office_name": "London HQ",
"street": {"number": 456, "name": "Oxford Street"},
},
{
"office_name": "Manchester Branch",
"street": {"number": 789, "name": "Piccadilly Gardens"},
},
],
}
obj_uuid = my_collection.data.insert(example_object)
const exampleObject = {
name: 'John Smith',
home_address: {
street: {
number: 123,
name: 'Main Street',
},
city: 'London',
},
office_addresses: [
{
office_name: 'London HQ',
street: { number: 456, name: 'Oxford Street' },
},
{
office_name: 'Manchester Branch',
street: { number: 789, name: 'Piccadilly Gardens' },
},
],
};
const obj_uuid = await myCollection.data.insert(exampleObject);
cross-reference
The cross-reference type allows a link to be created from one object to another. This is useful for creating relationships between collections, such as linking a Person collection to a Company collection.
The cross-reference type objects are arrays by default. This allows you to link to any number of instances of a given collection (including zero).
For more information, see cross-references. To see how to work with cross-references, see how to manage data: cross-references.
More information
Notes
Formatting in payloads
In raw payloads (e.g. JSON payloads for REST), data types are specified as an array (e.g. ["text"], or ["text[]"]), as it is required for some cross-reference specifications.
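As a sketch of what this looks like, the Python snippet below builds a few raw property definitions and prints them as JSON. The property names are hypothetical, and the cross-reference entry assumes that a cross-reference names its target collection in the same array.

```python
import json

# Raw property definitions: the data type is always wrapped in an array.
text_property = {"name": "title", "dataType": ["text"]}
text_array_property = {"name": "genres", "dataType": ["text[]"]}

# Assumption for illustration: a cross-reference lists the target collection.
cross_reference_property = {"name": "worksFor", "dataType": ["Company"]}

for prop in (text_property, text_array_property, cross_reference_property):
    print(json.dumps(prop))
```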