Read all objects

Overview

Sometimes you may want to list every object in a class, for example to make a manual backup when the backup feature is not suitable. You may then also want to restore those objects, for example to a different Weaviate instance.

The best way to do this is with the after parameter, also called the cursor API.

Alternative ordering not possible

The after parameter walks through objects in the order of their IDs. It cannot be combined with any other ordering of the data, such as sorting or searching.
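To make the ID-ordering constraint concrete, here is a toy model of cursor pagination in plain Python. It is only a sketch of the idea, not Weaviate's implementation: the `OBJECTS` dictionary, the IDs, and the `fetch_page` helper are all made up for illustration.

```python
from typing import Optional

# Hypothetical in-memory "class": object ID -> properties (made-up data).
OBJECTS = {
    "00000000-0000-0000-0000-000000000003": {"title": "C"},
    "00000000-0000-0000-0000-000000000001": {"title": "A"},
    "00000000-0000-0000-0000-000000000002": {"title": "B"},
}


def fetch_page(limit: int, after: Optional[str] = None) -> list:
    """Return up to `limit` objects whose ID sorts after the cursor."""
    # The cursor is an ID, so it only identifies a position in ID order.
    ordered_ids = sorted(OBJECTS)
    if after is not None:
        ordered_ids = [object_id for object_id in ordered_ids if object_id > after]
    return [{"id": object_id, **OBJECTS[object_id]} for object_id in ordered_ids[:limit]]


first_page = fetch_page(limit=2)
# The last ID of one page becomes the cursor for the next page.
second_page = fetch_page(limit=2, after=first_page[-1]["id"])
```

Because the cursor is just "the last ID seen", re-sorting the data by another field would leave the cursor with no well-defined position to resume from.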

Retrieve and restore objects

List every object

You can list (i.e. retrieve) every object as shown in the example below, looping through the data with the after parameter.

The example below connects to a "source" instance at https://some-endpoint.weaviate.network. It also defines a function that fetches a group of objects (and their title property) from the WineReview class, using the ID of the last object retrieved as the cursor:

import weaviate

source_client = weaviate.Client(
    url="https://some-endpoint.weaviate.network",  # Replace with your endpoint
    auth_client_secret=weaviate.AuthApiKey(api_key="YOUR-WEAVIATE-API-KEY"),  # If auth enabled. Replace w/ your Weaviate instance API key
)

batch_size = 20
class_name = "WineReview"
class_properties = ["title"]
cursor = None


def get_batch_with_cursor(client, class_name, class_properties, batch_size, cursor=None):

    query = (
        client.query.get(class_name, class_properties)
        .with_additional(["id"])
        .with_limit(batch_size)
    )

    if cursor is not None:
        return query.with_after(cursor).do()
    else:
        return query.do()
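The looping logic can also be separated from the fetching logic. The sketch below wraps any batch-fetching callable in a generator; `iterate_objects` and its `fetch_batch` argument are hypothetical names introduced here, and the only assumption is that `fetch_batch(cursor)` returns a response shaped like the Get query result used on this page.

```python
from typing import Callable, Iterator, Optional


def iterate_objects(
    fetch_batch: Callable[[Optional[str]], dict], class_name: str
) -> Iterator[dict]:
    """Yield every object, requesting the next batch when one is exhausted."""
    cursor = None
    while True:
        results = fetch_batch(cursor)
        objects = results["data"]["Get"][class_name]
        if not objects:
            return  # An empty batch means everything has been read
        yield from objects
        # The ID of the last object in this batch is the next cursor
        cursor = objects[-1]["_additional"]["id"]
```

With the function defined above, `fetch_batch` could be `lambda cursor: get_batch_with_cursor(source_client, class_name, class_properties, batch_size, cursor)`.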

Fetch the schema

You can fetch the existing class definition like this:

class_schema = source_client.schema.get(class_name)
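The fetched class definition is a plain dictionary, so it can also be used to build the list of properties to retrieve rather than hard-coding it. The following is a minimal sketch under two assumptions: that the definition has a `properties` list whose entries carry `name` and `dataType`, and that cross-reference properties can be recognized because their data type is a class name starting with an uppercase letter. The `primitive_property_names` helper and the sample dictionary are hypothetical, written by hand for illustration.

```python
def primitive_property_names(class_definition: dict) -> list:
    """Collect the names of properties that are not cross-references."""
    names = []
    for prop in class_definition.get("properties", []):
        data_types = prop.get("dataType", [])
        # Assumption: a data type that starts with an uppercase letter
        # names another class, i.e. the property is a cross-reference.
        if any(data_type[:1].isupper() for data_type in data_types):
            continue
        names.append(prop["name"])
    return names


# Hand-written sample in the shape of a class definition (not real data)
sample_class_schema = {
    "class": "WineReview",
    "properties": [
        {"name": "title", "dataType": ["text"]},
        {"name": "points", "dataType": ["int"]},
        {"name": "fromWinery", "dataType": ["Winery"]},
    ],
}

class_properties = primitive_property_names(sample_class_schema)
```

Cross-references are skipped here because this page only copies plain property values.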

Restore to a target instance

You can then restore to a target instance by:

  • Creating the same class in the target instance using the fetched class definition, and
  • Then streaming the objects from the source instance to the target instance using batch imports.

target_client = weaviate.Client(
    url="https://anon-endpoint.weaviate.network",  # Replace with your endpoint
)

target_client.schema.create_class(class_schema)

aggregate_count = 0  # Running total of objects sent to the target instance

with target_client.batch(
    batch_size=50,
) as batch:

    # Batch import all objects to the target instance
    while True:
        # From the SOURCE instance, get the next group of objects
        results = get_batch_with_cursor(source_client, class_name, class_properties, batch_size, cursor)

        # If empty, we're finished
        objects_list = results["data"]["Get"][class_name]
        if len(objects_list) == 0:
            break

        # Otherwise, add the objects to the batch to be added to the target instance
        aggregate_count += len(objects_list)

        for retrieved_object in objects_list:
            new_object = dict()
            for prop in class_properties:
                new_object[prop] = retrieved_object[prop]
            batch.add_data_object(new_object, class_name=class_name)

        # Update the cursor
        cursor = objects_list[-1]["_additional"]["id"]
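The loop above copies property values only, so the target instance assigns new IDs. If the restored objects should keep their original IDs, one option is to pass the retrieved ID along when adding each object. The sketch below assumes the batch object's `add_data_object` method accepts a `uuid` keyword argument, as in the Python client used on this page; `add_with_original_id` is a hypothetical helper name.

```python
def add_with_original_id(batch, retrieved_object: dict, class_properties: list, class_name: str) -> None:
    """Queue one retrieved object for import, reusing its source ID."""
    properties = {prop: retrieved_object[prop] for prop in class_properties}
    batch.add_data_object(
        properties,
        class_name=class_name,
        # Assumption: the ID was requested via with_additional(["id"])
        uuid=retrieved_object["_additional"]["id"],
    )
```

Inside the import loop, this call would take the place of the inner `for prop in class_properties` block. It copies properties and IDs only.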

Putting it together

Putting the pieces together, the example below retrieves all objects and the schema of the WineReview class from https://some-endpoint.weaviate.network and populates https://anon-endpoint.weaviate.network with the same data:

# Retrieve data
import weaviate

source_client = weaviate.Client(
    url="https://some-endpoint.weaviate.network",  # Replace with your endpoint
    auth_client_secret=weaviate.AuthApiKey(api_key="YOUR-WEAVIATE-API-KEY"),  # If auth enabled. Replace w/ your Weaviate instance API key
)

batch_size = 20
class_name = "WineReview"
class_properties = ["title"]
cursor = None


def get_batch_with_cursor(client, class_name, class_properties, batch_size, cursor=None):

    query = (
        client.query.get(class_name, class_properties)
        .with_additional(["id"])
        .with_limit(batch_size)
    )

    if cursor is not None:
        return query.with_after(cursor).do()
    else:
        return query.do()


# Fetch the schema
class_schema = source_client.schema.get(class_name)

# Restore to a new (target) instance
target_client = weaviate.Client(
    url="https://anon-endpoint.weaviate.network",  # Replace with your endpoint
)

target_client.schema.create_class(class_schema)

aggregate_count = 0  # Running total of objects sent to the target instance

with target_client.batch(
    batch_size=50,
) as batch:

    # Batch import all objects to the target instance
    while True:
        # From the SOURCE instance, get the next group of objects
        results = get_batch_with_cursor(source_client, class_name, class_properties, batch_size, cursor)

        # If empty, we're finished
        objects_list = results["data"]["Get"][class_name]
        if len(objects_list) == 0:
            break

        # Otherwise, add the objects to the batch to be added to the target instance
        aggregate_count += len(objects_list)

        for retrieved_object in objects_list:
            new_object = dict()
            for prop in class_properties:
                new_object[prop] = retrieved_object[prop]
            batch.add_data_object(new_object, class_name=class_name)

        # Update the cursor
        cursor = objects_list[-1]["_additional"]["id"]

More Resources

If you can't find the answer to your question here, please look at the:

  1. Frequently Asked Questions,
  2. Knowledge base of old issues,
  3. Stack Overflow, for questions,
  4. Weaviate Community Forum, for more involved discussion, or
  5. Our Slack channel.