Maintaining data integrity is one of the key goals for database users. So it should come as no surprise that backing up the data is an important part of database best practices.
Although it has always been possible to back up Weaviate data, doing so used to require many manual and inelegant steps. So, we have introduced a backup feature in Weaviate v1.15
that streamlines the backup process, whether it be to a local file system or to a cloud storage provider.
If you have not yet had a chance to use this cool feature, don't worry! This tutorial will show you how to use this feature to back up your data and restore it to another Weaviate instance.
Objectives
By the end of this tutorial, you will have:
- Spun up two instances of Weaviate,
- Populated one Weaviate instance with a new schema & data,
- Backed up the Weaviate instance, and
- Restored the backup data to the other instance.
Preparations
To get started, clone github.com/weaviate-tutorials/weaviate-backup repository and spin up Weaviate:
docker compose up -d
The Docker Compose file (docker-compose.yml
) has been set up to spin up two Weaviate instances for this tutorial. You should be able to connect to them at http://localhost:8080
and http://localhost:8090
respectively. We'll call them W1 and W2 from this point on for convenience.
Think of W1 as the source instance you want to back up, while W2 is the destination instance you want to restore the backup to.
Local backups
The docker-compose.yml
also specifies the below parameters to enable local backups.
environment:
…
ENABLE_MODULES: 'backup-filesystem'
BACKUP_FILESYSTEM_PATH: '/tmp/backups'
volumes:
- ./backups:/tmp/backups
This enables the backup-filesystem
module to back up data from Weaviate to the filesystem, and sets /tmp/backups
as the BACKUP_FILESYSTEM_PATH
, which is the backup path within the Docker container.
The volumes
parameter mounts a volume from outside the container such that Weaviate can save data to it. Setting it to ./backups:/tmp/backups
maps ./backups
on the local device to /tmp/backups
within the container, so the generated backup data will end up in the ./backups
directory as you will see later on.
Now let's dive into it to see the backup functionality in action!
Utility scripts
To begin with, both Weaviate instances W1 and W2 should be empty. In order to get straight to the point of this tutorial, we've prepared a set of scripts (located in the scripts
folder) that will help you prepare your Weaviate instances and test them out.
The tutorial text refers to shell scripts (e.g. 0_query_instances.sh
), but we also provide code examples in other languages including Python and JavaScript. These files are located in scripts
subdirectory and we encourage you to try them out yourself using your favorite client.
If you run the 0_query_instances
script, you should see that neither instances contain a schema.
scripts/0_query_instances.sh
If it is not empty, or if at any point you would like to reset your Weaviate instances, you can run the 9_delete_all
script, which will delete all of the existing schema and data at those locations.
scripts/9_delete_all.sh
Populate W1 with data
As our first order of action, we will populate W1 with data. Run the following to create a schema and import data:
scripts/1_create_schema.sh
scripts/2_import.sh
Now, running the 0_query_instances
script will return results showing the Author
and Book
classes in the schema as well as the objects. Great! We are ready to get to the main point of this tutorial –> back up and restore.
scripts/0_query_instances.sh
Back up & restore data
Now let's move on to creating our first backup. Initiating a backup involves just a short bit of code.
The below curl
command will back up all classes in W1, and call the backup my-very-first-backup
.
curl \
-X POST \
-H "Content-Type: application/json" \
-d '{
"id": "my-very-first-backup"
}' \
http://localhost:8080/v1/backups/filesystem
backup_id
must be unique.The ID value is used to create a subdirectory in the backup location, and attempting to reuse an existing ID will cause Weaviate to throw an error. Delete the existing directory if one already exists.
Now try running 3_backup
yourself to back up data from W1.
scripts/3_backup.sh
If you check the contents of the backup directory again, you should see a new directory called my-very-first-backup
containing the backup data files.
Restoring this data can be done with a similarly short piece of code. The curl
command below will restore our backup:
curl \
-X POST \
-H "Content-Type: application/json" \
-d '{
"id": "my-very-first-backup"
}' \
http://localhost:8090/v1/backups/filesystem/my-very-first-backup/restore
Try running 4_restore
yourself to restore the W1 backup data to W2.
scripts/4_restore.sh
Now, check the schemas again for W1 and W2.
scripts/0_query_instances.sh
Do they both now contain the same schema? What about the data objects? They should be identical.
Great. You have successfully backed up data from W1 and restored it onto W2!
Backup features
Before we finish, let's go back to talk a little more about additional backup options, and some important notes.
Local backup location
If you wish to back up your data to a different location, edit the volumes
parameter in docker-compose.yml
to replace ./backups
with the desired location.
For example, changing it from ./backups:/tmp/backups
to ./my_archive:/tmp/backups
would change the backup destination from ./backups
to ./my_archive/
.
Cloud storage systems
Note, you can also configure Weaviate backup to work with cloud storage systems like Google Cloud Storage (GCS) and S3-compatible blob storage (like AWS S3 or MinIO).
Each requires a different Weaviate backup module (backup-gcs
and backup-s3
), configuration parameters and authentication.
Learn more
Check our documentation to learn more about:
Partial backup and restore
Weaviate's backup feature allows you to back up or restore a subset of available classes. This might be very useful in cases where, for example, you may wish to partially export a subset of data to a development environment or import an updated class.
For example, the below curl
command will restore only the Author
class regardless of whether any other classes have been also included in my-very-first-backup
.
curl \
-X POST \
-H "Content-Type: application/json" \
-d '{
"id": "my-very-first-backup",
"include": ["Author"]
}' \
http://localhost:8090/v1/backups/filesystem/my-very-first-backup/restore
Delete everything in W2 first with 8_delete_w2
, and try out the partial restore with 4a_partial_restore
.
scripts/8_delete_w2.sh
scripts/4a_partial_restore.sh
You should see that W2 will only contain one class even though its data was restored from a backup that contains multiple classes.
The restore function allows you to restore a class as long as the target Weaviate instance does not already contain that class. So if you run another operation to restore the Book
class to W2, it will result in an instance containing both Author
and Book
classes.
Asynchronous operations
In some cases, Weaviate's response to your backup
or restore
request may have "status":"STARTED"
.
Isn't it interesting that the status was not indicative of a completion?
That is because Weaviate's backup operation can be initiated and monitored asynchronously.
This means that you don't need to maintain a connection to the server for the operation to complete. And you can look in on the status of a restore operation with a command like:
curl http://localhost:8090/v1/backups/filesystem/my-very-first-backup/restore
Weaviate remains available for read and write operations while backup operations are ongoing. And you can poll the endpoint to check its status, without worrying about any potential downtime.
Check out 3a_check_backup_status.sh
and 4b_check_restore_status.sh
for examples of how to query W1 for the backup status, or W2 for the restore status respectively.
Wrap-up
That's it for our quick overview of the new backup feature available in Weaviate. We are excited about this feature as it will make it easier and faster for you to back up your data, which will help make your applications more robust.
To recap, Weaviate's new backup feature allows you to back up one or more classes from an instance of Weaviate to a backup, and to restore one or more classes from a backup to Weaviate. Weaviate remains functional during these processes, and you can poll the backup or restore operation status periodically if you wish.
Weaviate currently supports backing up to your local filesystem, AWS or GCS. But as the backup orchestration is decoupled from the remote backup storage backends, it is relatively easy to add new providers and use them with the existing backup API.
If you would like to use another storage provider to use with Weaviate, we encourage you to open a feature request or consider contributing it yourself. For either option, join our Slack community to have a quick chat with us on how to get started.
Ready to start building?
Check out the Quickstart tutorial, or build amazing apps with a free trial of Weaviate Cloud (WCD).
Don't want to miss another blog post?
Sign up for our bi-weekly newsletter to stay updated!
By submitting, I agree to the Terms of Service and Privacy Policy.