REST - /v1/batch

caution
This section of the documentation is deprecated and will be removed in the future.
Please refer to the OpenAPI documentation for the most up-to-date information.

For client examples, see this section.

Batch create objects

For sending data objects to Weaviate in bulk.

Performance

For best performance, we recommend using batching for insertion and deletion. Also keep the following in mind:

  1. The vectorization module/tool may act as a bottleneck.
  2. Avoid sending duplicate vectors for multiple data objects.
  3. Object-level errors may occur even if the batch request as a whole is successful.
  4. If your import slows down after a particular number of objects (e.g. 2M), check whether the vectorCacheMaxObjects value in your schema is larger than the number of objects. Also, see this example.

Method and URL

POST /v1/batch/objects[?consistency_level=ONE|QUORUM|ALL]

Parameters

The URL supports an optional consistency level query parameter:

Name | Location | Type | Description
consistency_level | query param | string | Optional consistency level: ONE, QUORUM (default) or ALL.

The POST body requires the following field:

Name | Type | Required | Description
objects | array of data objects | yes | Array of objects
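
As a rough illustration of the raw REST call described above, the sketch below builds (but does not send) a POST request to /v1/batch/objects using only the Python standard library. The base URL http://localhost:8080, the helper name build_batch_objects_request, and the per-object keys class and properties are assumptions made for illustration; check them against the OpenAPI documentation before relying on them.

```python
import json
import urllib.parse
import urllib.request


def build_batch_objects_request(objects, base_url="http://localhost:8080", consistency_level=None):
    """Build (without sending) a POST request for /v1/batch/objects.

    `base_url` is an assumed local address; adjust it for your instance.
    """
    url = f"{base_url}/v1/batch/objects"
    if consistency_level is not None:
        # Optional query parameter: ONE, QUORUM (default) or ALL
        url += "?" + urllib.parse.urlencode({"consistency_level": consistency_level})
    # The POST body requires a single field, "objects"
    body = json.dumps({"objects": objects}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_batch_objects_request(
    [{"class": "Author", "properties": {"name": "Jane Doe"}}],  # assumed object shape
    consistency_level="QUORUM",
)
print(request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes the URL and body easy to inspect before anything reaches the server.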

Example request

import weaviate
import weaviate.classes as wvc
from weaviate.util import generate_uuid5

client = weaviate.connect_to_local()

try:
    first_object_props = {"name": "Jane Doe"}
    first_object_uuid = generate_uuid5(first_object_props)

    with client.batch.fixed_size(  # client.batch.dynamic() or client.batch.rate_limit(requests_per_minute=<N>) also possible
        batch_size=100,
        consistency_level=wvc.ConsistencyLevel.QUORUM
    ) as batch:
        # Add objects to the batch, e.g.
        batch.add_object(
            collection="Author",
            properties=first_object_props,
            uuid=first_object_uuid,
            # tenant="tenantA"  # Optional; specify the tenant in multi-tenancy collections
        )

finally:
    client.close()

Batch create references

For batch adding cross-references between data objects in bulk.

Method and URL

POST /v1/batch/references

Parameters

The URL supports an optional consistency level query parameter:

Name | Location | Type | Description
consistency_level | query param | string | Optional consistency level: ONE, QUORUM (default) or ALL.

The POST body is an array of elements with the following fields:

Name | Type | Required | Description
from | Weaviate Beacon (long-form) | yes | The beacon, with the cross-reference property name at the end: weaviate://localhost/{CollectionName}/{id}/{crefPropertyName}
to | Weaviate Beacon (regular) | yes | The beacon, formatted as weaviate://localhost/{CollectionName}/{id}
note

For backward compatibility, you can omit the collection name in the short-form beacon format that is used for to. You can specify it as weaviate://localhost/{id}. This is, however, considered deprecated and will be removed with a future release, as duplicate IDs across collections could mean that this beacon is not uniquely identifiable. For the long-form beacon - used as part of from - you always need to specify the full beacon, including the reference property name.
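
To make the two beacon formats concrete, here is a minimal sketch that assembles one element of the POST body from its parts. The helper names (to_beacon, from_beacon, reference_entry), the target collection name Publication, and the placeholder UUIDs are hypothetical and exist only for this illustration.

```python
import uuid


def to_beacon(collection, object_id):
    """Regular beacon, used for `to`: weaviate://localhost/{CollectionName}/{id}"""
    return f"weaviate://localhost/{collection}/{object_id}"


def from_beacon(collection, object_id, reference_property):
    """Long-form beacon, used for `from`: the regular beacon plus the reference property name."""
    return f"{to_beacon(collection, object_id)}/{reference_property}"


def reference_entry(from_collection, from_id, reference_property, to_collection, to_id):
    """One element of the array sent to POST /v1/batch/references."""
    return {
        "from": from_beacon(from_collection, from_id, reference_property),
        "to": to_beacon(to_collection, to_id),
    }


# Placeholder UUIDs for illustration only
author_id = uuid.UUID(int=1)
publication_id = uuid.UUID(int=2)

entry = reference_entry("Author", author_id, "writesFor", "Publication", publication_id)
print(entry["from"])
print(entry["to"])
```

Note that the host part is always localhost, and that the collection name is included in both beacons rather than relying on the deprecated short form.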

Example request

import weaviate
import weaviate.classes as wvc
from weaviate.util import generate_uuid5

client = weaviate.connect_to_local()

try:
    first_object_props = {"name": "Jane Doe"}
    first_object_uuid = generate_uuid5(first_object_props)

    # Placeholder: replace with the UUID of the existing object to reference
    first_target_uuid = "<target-object-uuid>"

    with client.batch.fixed_size(  # client.batch.dynamic() or client.batch.rate_limit(requests_per_minute=<N>) also possible
        batch_size=100,
        consistency_level=wvc.ConsistencyLevel.QUORUM
    ) as batch:
        # Add references to the batch, e.g.
        batch.add_reference(
            from_collection="Author",
            from_property="writesFor",
            from_uuid=first_object_uuid,
            to=first_target_uuid,
            # tenant="tenantA"  # Optional; specify the tenant in multi-tenancy collections
        )

finally:
    client.close()

Batch delete

You can use the HTTP verb DELETE on the /v1/batch/objects endpoint to delete all objects that match a particular expression.

The request body takes a single where Filter and will delete all objects matched. It also returns the number of matched objects and potential errors. Note that there is a limit to the number of objects to be deleted at once using this filter.

Maximum number of deletes per query

There is an upper limit (QUERY_MAXIMUM_RESULTS) to how many objects can be deleted using a single query. This protects against unexpected memory surges and very-long-running requests which would be prone to client-side timeouts or network interruptions.

Objects are deleted in the same order that they would be returned in an equivalent Get query. To delete more objects than the limit, run the same query multiple times until no objects are matched anymore.
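
The repeat-until-empty approach can be sketched as a simple loop. In this illustration, delete_batch is a hypothetical callable that issues one batch delete and returns the results part of the response (with matches and successful counts); make_fake_delete is an in-memory stand-in so the sketch runs without a Weaviate instance. Neither is part of any Weaviate library.

```python
def delete_until_empty(delete_batch, max_rounds=1000):
    """Call `delete_batch` repeatedly until the filter matches no more objects.

    Returns the total number of successfully deleted objects.
    """
    total_deleted = 0
    for _ in range(max_rounds):
        results = delete_batch()
        if results["matches"] == 0:
            break  # nothing left to delete
        total_deleted += results["successful"]
        if results["successful"] == 0:
            break  # objects matched but none could be deleted; avoid looping forever
    return total_deleted


def make_fake_delete(total_objects, limit):
    """In-memory stand-in for one batch delete call, capped at `limit` deletions per call."""
    state = {"remaining": total_objects}

    def delete_batch():
        matched = state["remaining"]
        deleted = min(matched, limit)
        state["remaining"] -= deleted
        return {"matches": matched, "limit": limit, "successful": deleted, "failed": 0}

    return delete_batch


deleted_total = delete_until_empty(make_fake_delete(total_objects=25, limit=10))
print(f"Deleted {deleted_total} objects in total.")
```

The max_rounds cap and the early exit on zero successful deletions are defensive choices in this sketch, so that persistent failures do not cause an endless loop.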

The default QUERY_MAXIMUM_RESULTS value is 10,000. You can change it, for example by setting the QUERY_MAXIMUM_RESULTS environment variable.

Dry-run before deletion

Set the dry-run option to show which objects would be matched using the specified filter without deleting any objects. Depending on the configured verbosity, you will either receive a count of affected objects, or a list of IDs.

Method and URL

DELETE /v1/batch/objects[?consistency_level=ONE|QUORUM|ALL]

Parameters

The URL supports an optional consistency level query parameter:

Name | Location | Type | Description
consistency_level | query param | string | Optional consistency level: ONE, QUORUM (default) or ALL.

The body requires the following fields:

Name | Type | Required | Description
match | object | yes | Object outlining how to find the objects to be deleted (see example below)
output | string | no | Optional verbosity level, minimal (default) or verbose
dryRun | bool | no | If true, objects will not be deleted yet, but merely listed. Defaults to false.

A request body in detail

{
  "match": {
    "class": "<CollectionName>",            # required
    "where": { /* where filter object */ }  # required
  },
  "output": "<output verbosity>",  # Optional, one of "minimal" or "verbose". Defaults to "minimal".
  "dryRun": <bool>                 # Optional. If true, objects will not be deleted yet, but merely listed. Defaults to false.
}

Possible values for output:

Value | Effect
minimal | The result only includes counts. Information about objects is omitted if the deletes were successful. An object is described only if an error occurred.
verbose | The result lists all affected objects with their ID and deletion status, including both successful and unsuccessful deletes.
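
As a small sketch of how such a request body could be assembled in code, the helper below validates the output value and fills in the documented defaults. The function name build_batch_delete_body is hypothetical, and the shape of the where filter shown here (path, operator, valueText) is an assumption; see the filter documentation for the authoritative format.

```python
import json


def build_batch_delete_body(collection, where, output="minimal", dry_run=False):
    """Assemble the JSON body for DELETE /v1/batch/objects."""
    if output not in ("minimal", "verbose"):
        raise ValueError('output must be "minimal" or "verbose"')
    return {
        "match": {"class": collection, "where": where},  # both keys are required
        "output": output,
        "dryRun": dry_run,
    }


# Assumed where filter shape, for illustration only
where_filter = {"path": ["name"], "operator": "Equal", "valueText": "Jane Doe"}

# A verbose dry run lists the objects that would be deleted without deleting them
body = build_batch_delete_body("Author", where_filter, output="verbose", dry_run=True)
print(json.dumps(body, indent=2))
```

Running with dry_run=True first and inspecting the result is a cautious way to confirm that the filter matches what you expect before deleting anything.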

A response body in detail

{
  "match": {
    "class": "<CollectionName>",            # matches the request
    "where": { /* where filter object */ }  # matches the request
  },
  "output": "<output verbosity>",  # matches the request
  "dryRun": <bool>,
  "results": {
    "matches": "<int>",     # how many objects were matched by the filter
    "limit": "<int>",       # the maximum number of objects that can be deleted in a single query, matches QUERY_MAXIMUM_RESULTS
    "successful": "<int>",  # how many objects were successfully deleted in this round
    "failed": "<int>",      # how many objects should have been deleted but could not be deleted
    "objects": [{           # one JSON object per Weaviate object
      "id": "<id>",         # this successfully deleted object would be omitted with output=minimal
      "status": "SUCCESS",  # possible status values are: "SUCCESS", "FAILED", "DRYRUN"
      "error": null
    }, {
      "id": "<id>",         # this error object will always be listed, even with output=minimal
      "status": "FAILED",
      "errors": {
        "error": [{
          "message": "<error-string>"
        }]
      }
    }]
  }
}
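
To show how a response of this shape might be processed, the sketch below extracts the counts and collects the error messages of failed deletions. The function name summarize_delete_response and the sample values are made up for illustration; the sample only mirrors the structure outlined above.

```python
def summarize_delete_response(response):
    """Extract counts and per-object failure messages from a batch delete response."""
    results = response["results"]
    failures = {}
    for obj in results.get("objects") or []:
        if obj.get("status") == "FAILED":
            error_list = (obj.get("errors") or {}).get("error") or []
            failures[obj["id"]] = [err.get("message") for err in error_list]
    return {
        "matches": results["matches"],
        "successful": results["successful"],
        "failed": results["failed"],
        "failures": failures,
    }


# Made-up sample that follows the documented response structure
sample_response = {
    "results": {
        "matches": 2,
        "limit": 10000,
        "successful": 1,
        "failed": 1,
        "objects": [
            {"id": "id-1", "status": "SUCCESS", "error": None},
            {"id": "id-2", "status": "FAILED", "errors": {"error": [{"message": "example error"}]}},
        ],
    }
}

summary = summarize_delete_response(sample_response)
print(summary)
```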

Example request

import weaviate
import weaviate.classes as wvc

client = weaviate.connect_to_local()

try:
    authors = client.collections.get("Author")
    # authors = authors.with_tenant("tenantA")  # Optional; specify the tenant in multi-tenancy collections
    # authors = authors.with_consistency_level(wvc.config.ConsistencyLevel.QUORUM)  # Optional; specify the consistency level

    response = authors.data.delete_many(
        where=wvc.query.Filter.by_property("name").equal("Jane Doe"),
        verbose=True,
        dry_run=False,
    )

    print(f"Matched {response.matches} objects.")
    print(f"Deleted {response.successful} objects.")

finally:
    client.close()

Multi-tenancy

You can use batching in collections where multi-tenancy is enabled. For example, batch creation of objects works similarly to single object creation: pass the tenant parameter in the body of each object.
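
As a minimal sketch of what this looks like in a raw request payload, the snippet below adds a tenant key to each object before wrapping the objects in the batch body. The helper name with_tenant and the class and properties keys of the object are assumptions made for illustration.

```python
def with_tenant(obj, tenant):
    """Return a copy of the object body with the tenant set."""
    return {**obj, "tenant": tenant}


objects = [
    {"class": "Author", "properties": {"name": "Jane Doe"}},  # assumed object shape
]

# Every object in the batch carries its own tenant
payload = {"objects": [with_tenant(obj, "tenantA") for obj in objects]}
print(payload)
```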

Error handling

You can check if an error occurred, and of what kind.

Errors may occur on a batch request, for example when the connection to Weaviate is lost or when there is a mistake in any data objects.

A batch request returns an HTTP 200 status code when the request itself was successful. This indicates that:

  • The batch was successfully sent to Weaviate.
  • There were no issues with the connection or processing of the batch.
  • The request was not malformed (a malformed request would return a 4xx status code).

However, a 200 status code does not guarantee that each batch item has been added/created. For example, adding an object to the batch that is in conflict with the schema (for example a non-existing collection name) will cause an error.

Accordingly, we recommend you check the response body for errors.
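
One way to perform such a check is sketched below. It assumes that the response to a batch object request is a list with one item per submitted object, and that a failed item carries its messages under result.errors.error; this shape is an assumption here and should be verified against the OpenAPI documentation. The function name collect_batch_errors and the sample data are hypothetical.

```python
def collect_batch_errors(response_items):
    """Return (index, id, messages) for every batch item that reports an error."""
    problems = []
    for index, item in enumerate(response_items):
        result = item.get("result") or {}
        error_list = (result.get("errors") or {}).get("error") or []
        if error_list:
            messages = [err.get("message") for err in error_list]
            problems.append((index, item.get("id"), messages))
    return problems


# Made-up sample: an HTTP 200 response in which the second item failed
sample_items = [
    {"id": "id-1", "class": "Author", "result": {}},
    {
        "id": "id-2",
        "class": "NoSuchCollection",
        "result": {"errors": {"error": [{"message": "example error"}]}},
    },
]

for index, object_id, messages in collect_batch_errors(sample_items):
    print(f"Item {index} ({object_id}) failed: {messages}")
```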

Python client library

The Weaviate Python client library provides additional functionality for batch imports. For example, the latest Python client library supports various batching modes, as well as improved error handling.

Please refer to the client documentation for more detail.

note

The v4 Python client's batching is handled via the gRPC API. Please refer to the client documentation for client-specific discussions.

Notes

caution

In the beacon format, you always need to use localhost as the host, rather than the actual hostname. localhost indicates that the beacon's target is on the same Weaviate instance, as opposed to a foreign instance.

Questions and feedback

If you have any questions or feedback, let us know in the user forum.