Skip to main content

Tutorial - Backup and Restore in Weaviate

· 8 min read
Stefan Bogdan

Tutorial - Backup and Restore in Weaviate

Maintaining data integrity is one of the key goals for database users. So it should come as no surprise that backing up the data is an important part of database best practices.

Although it has always been possible to back up Weaviate data, doing so used to require many manual and inelegant steps. So, we have introduced a backup feature in Weaviate v1.15 that streamlines the backup process, whether it be to a local file system or to a cloud storage provider.

If you have not yet had a chance to use this cool feature, don't worry! This tutorial will show you how to use this feature to back up your data and restore it to another Weaviate instance.

Objectives

By the end of this tutorial, you will have:

  • Spun up two instances of Weaviate,
  • Populated one Weaviate instance with a new schema & data,
  • Backed up the Weaviate instance, and
  • Restored the backup data to the other instance.

Preparations

To get started, clone github.com/weaviate-tutorials/weaviate-backup repository and spin up Weaviate:

docker compose up -d

The Docker Compose file (docker-compose.yml) has been set up to spin up two Weaviate instances for this tutorial. You should be able to connect to them at http://localhost:8080 and http://localhost:8090 respectively. We'll call them W1 and W2 from this point on for convenience.

Think of W1 as the source instance you want to back up, while W2 is the destination instance you want to restore the backup to.

Local backups

The docker-compose.yml also specifies the below parameters to enable local backups.

environment:

ENABLE_MODULES: 'backup-filesystem'
BACKUP_FILESYSTEM_PATH: '/tmp/backups'
volumes:
- ./backups:/tmp/backups

This enables the backup-filesystem module to back up data from Weaviate to the filesystem, and sets /tmp/backups as the BACKUP_FILESYSTEM_PATH, which is the backup path within the Docker container.

The volumes parameter mounts a volume from outside the container such that Weaviate can save data to it. Setting it to ./backups:/tmp/backups maps ./backups on the local device to /tmp/backups within the container, so the generated backup data will end up in the ./backups directory as you will see later on.

Now let's dive into it to see the backup functionality in action!

Utility scripts

To begin with, both Weaviate instances W1 and W2 should be empty. In order to get straight to the point of this tutorial, we've prepared a set of scripts (located in the scripts folder) that will help you prepare your Weaviate instances and test them out.

Examples available in other languages

The tutorial text refers to shell scripts (e.g. 0_query_instances.sh), but we also provide code examples in other languages including Python and JavaScript. These files are located in scripts subdirectory and we encourage you to try them out yourself using your favorite client.

If you run the 0_query_instances script, you should see that neither instances contain a schema.

scripts/0_query_instances.sh

If it is not empty, or if at any point you would like to reset your Weaviate instances, you can run the 9_delete_all script, which will delete all of the existing schema and data at those locations.

scripts/9_delete_all.sh

Populate W1 with data

As our first order of action, we will populate W1 with data. Run the following to create a schema and import data:

scripts/1_create_schema.sh
scripts/2_import.sh

Now, running the 0_query_instances script will return results showing the Author and Book classes in the schema as well as the objects. Great! We are ready to get to the main point of this tutorial –> back up and restore.

scripts/0_query_instances.sh

Back up & restore data

Now let's move on to creating our first backup. Initiating a backup involves just a short bit of code.

The below curl command will back up all classes in W1, and call the backup my-very-first-backup.

curl \
-X POST \
-H "Content-Type: application/json" \
-d '{
"id": "my-very-first-backup"
}' \
http://localhost:8080/v1/backups/filesystem
The backup_id must be unique.

The ID value is used to create a subdirectory in the backup location, and attempting to reuse an existing ID will cause Weaviate to throw an error. Delete the existing directory if one already exists.

Now try running 3_backup yourself to back up data from W1.

scripts/3_backup.sh

If you check the contents of the backup directory again, you should see a new directory called my-very-first-backup containing the backup data files.

Restoring this data can be done with a similarly short piece of code. The curl command below will restore our backup:

curl \
-X POST \
-H "Content-Type: application/json" \
-d '{
"id": "my-very-first-backup"
}' \
http://localhost:8090/v1/backups/filesystem/my-very-first-backup/restore

Try running 4_restore yourself to restore the W1 backup data to W2.

scripts/4_restore.sh

Now, check the schemas again for W1 and W2.

scripts/0_query_instances.sh

Do they both now contain the same schema? What about the data objects? They should be identical.

Great. You have successfully backed up data from W1 and restored it onto W2!

Backup features

Before we finish, let's go back to talk a little more about additional backup options, and some important notes.

Local backup location

If you wish to back up your data to a different location, edit the volumes parameter in docker-compose.yml to replace ./backups with the desired location.

For example, changing it from ./backups:/tmp/backups to ./my_archive:/tmp/backups would change the backup destination from ./backups to ./my_archive/.

Cloud storage systems

Note, you can also configure Weaviate backup to work with cloud storage systems like Google Cloud Storage (GCS) and S3-compatible blob storage (like AWS S3 or MinIO).

Each requires a different Weaviate backup module (backup-gcs and backup-s3), configuration parameters and authentication.

Learn more

Check our documentation to learn more about:

Partial backup and restore

Weaviate's backup feature allows you to back up or restore a subset of available classes. This might be very useful in cases where, for example, you may wish to partially export a subset of data to a development environment or import an updated class.

For example, the below curl command will restore only the Author class regardless of whether any other classes have been also included in my-very-first-backup.

curl \
-X POST \
-H "Content-Type: application/json" \
-d '{
"id": "my-very-first-backup",
"include": ["Author"]
}' \
http://localhost:8090/v1/backups/filesystem/my-very-first-backup/restore

Delete everything in W2 first with 8_delete_w2, and try out the partial restore with 4a_partial_restore.

scripts/8_delete_w2.sh
scripts/4a_partial_restore.sh

You should see that W2 will only contain one class even though its data was restored from a backup that contains multiple classes.

The restore function allows you to restore a class as long as the target Weaviate instance does not already contain that class. So if you run another operation to restore the Book class to W2, it will result in an instance containing both Author and Book classes.

Asynchronous operations

In some cases, Weaviate's response to your backup or restore request may have "status":"STARTED".
Isn't it interesting that the status was not indicative of a completion?

That is because Weaviate's backup operation can be initiated and monitored asynchronously.

This means that you don't need to maintain a connection to the server for the operation to complete. And you can look in on the status of a restore operation with a command like:

curl http://localhost:8090/v1/backups/filesystem/my-very-first-backup/restore

Weaviate remains available for read and write operations while backup operations are ongoing. And you can poll the endpoint to check its status, without worrying about any potential downtime.

Check out 3a_check_backup_status.sh and 4b_check_restore_status.sh for examples of how to query W1 for the backup status, or W2 for the restore status respectively.

Wrap-up

That's it for our quick overview of the new backup feature available in Weaviate. We are excited about this feature as it will make it easier and faster for you to back up your data, which will help make your applications more robust.

To recap, Weaviate's new backup feature allows you to back up one or more classes from an instance of Weaviate to a backup, and to restore one or more classes from a backup to Weaviate. Weaviate remains functional during these processes, and you can poll the backup or restore operation status periodically if you wish.

Weaviate currently supports backing up to your local filesystem, AWS or GCS. But as the backup orchestration is decoupled from the remote backup storage backends, it is relatively easy to add new providers and use them with the existing backup API.

If you would like to use another storage provider to use with Weaviate, we encourage you to open a feature request or consider contributing it yourself. For either option, join our Slack community to have a quick chat with us on how to get started.

Stay connected

Thank you so much for reading! If you would like to talk to us more about this topic, please connect with us: