Because Weaviate runs in Docker or Kubernetes, you can keep your data outside the containers by mounting a volume, and back it up from there. When Weaviate restarts, it loads the dataset from the mounted volume.
Creating backups involves two steps. First, make the setup persistent. Second, create backups by copying the folder on the host that contains the Weaviate database.
When running Weaviate with docker-compose, set the `volumes` key under the `weaviate` service, and set a unique cluster hostname with the `CLUSTER_HOSTNAME` environment variable:

```yaml
services:
  weaviate:
    volumes:
      - /var/weaviate:/var/lib/weaviate
    environment:
      CLUSTER_HOSTNAME: 'node1'
```
- About the volumes:
  - `/var/weaviate` is the location on the local machine where you want to store the data.
  - `/var/lib/weaviate` (after the colon) is the location inside the container where Weaviate stores its data. Don't change this.
- About the hostname:
  - `CLUSTER_HOSTNAME` can be any name you choose, as long as it stays the same across restarts.
If you want more verbose output, set the `LOG_LEVEL` environment variable:

```yaml
services:
  weaviate:
    environment:
      LOG_LEVEL: 'debug'
```
Here is a complete example of a Weaviate setup without modules, but with an externally mounted volume and more verbose output:
```yaml
---
version: '3.4'
services:
  weaviate:
    command:
      - --host
      - 0.0.0.0
      - --port
      - '8080'
      - --scheme
      - http
    image: semitechnologies/weaviate:v1.15.2
    ports:
      - 8080:8080
    restart: on-failure:0
    volumes:
      - /var/weaviate:/var/lib/weaviate # <== set a volume here
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: 'none'
      ENABLE_MODULES: ''
      CLUSTER_HOSTNAME: 'node1' # <== this can be set to an arbitrary name
      LOG_LEVEL: 'debug' # <== more verbose output
...
```
The folder you chose as your external volume contains the Weaviate database. To back it up, copy it to another location.
```shell
$ mkdir /var/weaviate.BAK
$ cp -a /var/weaviate/. /var/weaviate.BAK/   # -a: copy recursively, keep permissions
```
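If you take backups regularly, giving each copy a timestamp keeps newer backups from overwriting older ones. Below is a minimal Bash sketch. It assumes the host folder `/var/weaviate` from the example above. The function name `backup_weaviate` and the default backup root `/var/weaviate-backups` are hypothetical choices and are not part of Weaviate.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Weaviate): copy a Weaviate data folder
# into a new timestamped folder under a backup root and print its path.
# Usage: backup_weaviate [DATA_DIR] [BACKUP_ROOT]
backup_weaviate() {
  local data_dir="${1:-/var/weaviate}"             # host side of the volume mount
  local backup_root="${2:-/var/weaviate-backups}"  # assumed backup location
  local dest

  if [ ! -d "$data_dir" ]; then
    echo "data folder not found: $data_dir" >&2
    return 1
  fi

  dest="$backup_root/weaviate-$(date +%Y%m%d-%H%M%S)"
  mkdir -p "$backup_root" || return 1
  # Plain mkdir fails if the folder already exists, so a backup taken
  # earlier in the same second is never silently overwritten.
  mkdir "$dest" || return 1

  # -a copies recursively and preserves permissions and timestamps;
  # "$data_dir/." copies the folder's contents, including hidden files.
  cp -a "$data_dir/." "$dest/" || return 1
  echo "$dest"
}
```

For example, `backup_weaviate /var/weaviate /mnt/backups` prints the path of the new copy, which has the form `/mnt/backups/weaviate-YYYYMMDD-HHMMSS`.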
Running vs. stopped instance
- Ideally, stop the setup (`docker-compose down`) before copying. An orderly shutdown flushes everything to disk, so the copy is complete and can be read directly on the next start.
- If you create a backup from a running setup, no data is lost, but not all segments have been flushed to disk yet. On the next startup, Weaviate therefore recovers the unflushed data from the active commit log and logs the message “did Weaviate crash? Trying to recover”. This startup is slightly slower than one after an orderly shutdown.
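Because an orderly shutdown is preferable, you can wrap the stop, copy and start steps into one command. Restoring follows the same pattern in reverse: with Weaviate stopped, put a backup copy into the mounted folder and start again, so that Weaviate loads that dataset. Below is a minimal Bash sketch with these assumptions: it runs from the folder that holds `docker-compose.yml`, and the data lives in the host folder you mounted (for example `/var/weaviate`). The function names `offline_backup` and `restore_backup` and the `COMPOSE` override are hypothetical and are not part of Weaviate or Docker.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Weaviate) for an offline backup and
# restore of a docker-compose setup. Run them from the folder that holds
# docker-compose.yml. COMPOSE defaults to "docker-compose"; it is left
# unquoted below on purpose so that COMPOSE="docker compose" also works.

# Stop the stack, copy the data folder to DEST, then start the stack again.
offline_backup() {
  local data_dir="$1" dest="$2"
  local compose="${COMPOSE:-docker-compose}"
  local status=0

  [ -d "$data_dir" ] || { echo "no such folder: $data_dir" >&2; return 1; }
  [ ! -e "$dest" ] || { echo "refusing to overwrite: $dest" >&2; return 1; }

  # Orderly shutdown, so Weaviate flushes everything to disk first.
  $compose down || return 1
  mkdir -p "$dest" && cp -a "$data_dir/." "$dest/" || status=$?
  # Start again even if the copy failed, so the service does not stay down.
  $compose up -d || return 1
  return "$status"
}

# Stop the stack, replace the data folder with a backup copy, start again.
restore_backup() {
  local backup="$1" data_dir="$2"
  local compose="${COMPOSE:-docker-compose}"

  [ -d "$backup" ] || { echo "no such backup: $backup" >&2; return 1; }

  $compose down || return 1
  # Move the current data aside instead of deleting it.
  if [ -e "$data_dir" ]; then
    mv "$data_dir" "$data_dir.before-restore.$(date +%s)" || return 1
  fi
  # On failure, return with the stack still stopped rather than
  # starting Weaviate on a half-restored folder.
  { mkdir -p "$data_dir" && cp -a "$backup/." "$data_dir/"; } || return 1
  $compose up -d
}
```

For example, run `offline_backup /var/weaviate /var/weaviate.BAK` to take a backup, and later `restore_backup /var/weaviate.BAK /var/weaviate` to restore it. The restore step moves the current data folder aside rather than deleting it, so you can undo a mistaken restore by hand. The data folder is usually owned by root, so you may need to run these functions with `sudo`.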
For a Kubernetes setup, the only thing to bear in mind is that Weaviate needs a PersistentVolumeClaim. The Helm chart is already configured to store the data on an external volume.