
Embedded Weaviate

Overview

Embedded Weaviate is a new deployment model that lets you start a Weaviate instance directly from your application code, using a Weaviate client library.

Experimental

Embedded Weaviate is still in the Experimental phase.

Some of the APIs and parameters might change over time, as we work towards a perfect implementation.

How does it work?

With every Weaviate release we also publish executable Linux binaries (see assets).

This allows launching the Weaviate database server from the client instantiation call, which makes the "installation" step invisible by pushing it to the background:

import weaviate
from weaviate.embedded import EmbeddedOptions

client = weaviate.Client(
    embedded_options=EmbeddedOptions()
)

data_obj = {
    "name": "Chardonnay",
    "description": "Goes with fish"
}

client.data_object.create(data_obj, "Wine")

Embedded options

The Weaviate server spawned from the client can be configured via parameters passed at instantiation time, and via environment variables. All parameters are optional.

| Parameter | Type | Description | Default value |
| --- | --- | --- | --- |
| persistence_data_path | string | Directory where the files making up the database are stored. | When the XDG_DATA_HOME env variable is set: XDG_DATA_HOME/weaviate/. Otherwise: ~/.local/share/weaviate |
| binary_path | string | Directory where to download the binary. If deleted, the client will download the binary again. | When the XDG_CACHE_HOME env variable is set: XDG_CACHE_HOME/weaviate-embedded/. Otherwise: ~/.cache/weaviate-embedded |
| version | string | Version takes two types of input: a version number (for example "1.18.3" or "latest"), or a full URL pointing to a Linux AMD64 or ARM64 binary. | Latest stable version |
| port | integer | Which port the Weaviate server will listen to. Useful when running multiple instances in parallel. | 6666 |
| hostname | string | Hostname/IP to bind to. | 127.0.0.1 |
| additional_env_vars | key: value | Useful to pass additional environment variables to the server, such as API keys. | |

XDG Base Directory standard values

It is not recommended to modify the XDG_DATA_HOME and XDG_CACHE_HOME environment variables from the standard XDG Base Directory values, as that might affect many other (non-Weaviate related) applications and services running on the same server.
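The default-location rules from the table above can be sketched in a few lines of Python. This is only an illustration of the documented behaviour, not the client's actual code: the function name default_embedded_paths and its env and home parameters are hypothetical, and treating an empty variable as unset is an assumption borrowed from the XDG Base Directory convention.

```python
import os
from pathlib import Path
from typing import Mapping, Optional, Tuple


def default_embedded_paths(
    env: Optional[Mapping[str, str]] = None,
    home: Optional[Path] = None,
) -> Tuple[Path, Path]:
    """Return (persistence_data_path, binary_path) as described in the table.

    Hypothetical helper: `env` and `home` exist only to make the rules easy
    to try out without touching the real environment.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else home

    # Assumption: an empty value is treated the same as an unset variable.
    data_home = env.get("XDG_DATA_HOME")
    cache_home = env.get("XDG_CACHE_HOME")

    if data_home:
        data_path = Path(data_home) / "weaviate"
    else:
        data_path = home / ".local" / "share" / "weaviate"

    if cache_home:
        binary_path = Path(cache_home) / "weaviate-embedded"
    else:
        binary_path = home / ".cache" / "weaviate-embedded"

    return data_path, binary_path
```

To keep data somewhere else, pass persistence_data_path and binary_path explicitly at instantiation rather than changing the XDG variables.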

Providing the Weaviate version as a URL

To find the full URL for version:

  • head to Weaviate releases,
  • find the Assets section for the required Weaviate version,
  • and copy the link to the required (name).tar.gz file.

For example, here is the URL of the Weaviate 1.18.2 AMD64 binary: https://github.com/weaviate/weaviate/releases/download/v1.18.2/weaviate-v1.18.2-linux-amd64.tar.gz.
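If you prefer to build the URL in code, the pattern visible in the example above can be sketched as follows. The helper name release_url is hypothetical, and the sketch assumes that other versions and the ARM64 asset follow the same naming scheme as the 1.18.2 AMD64 file, which is not guaranteed. Check the Assets section for the exact file name before relying on it.

```python
def release_url(version: str, arch: str = "amd64") -> str:
    """Build a download URL following the pattern of the 1.18.2 example.

    Assumption: every release names its Linux assets
    weaviate-v<version>-linux-<arch>.tar.gz, as the 1.18.2 AMD64 asset does.
    """
    if arch not in ("amd64", "arm64"):
        raise ValueError(f"unsupported architecture: {arch}")
    # Accept both "1.18.2" and "v1.18.2".
    tag = version if version.startswith("v") else f"v{version}"
    return (
        "https://github.com/weaviate/weaviate/releases/download/"
        f"{tag}/weaviate-{tag}-linux-{arch}.tar.gz"
    )
```

The resulting string can be passed as the version parameter of the embedded options.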

Default modules

The following modules are enabled by default:

  • generative-openai
  • qna-openai
  • ref2vec-centroid
  • text2vec-cohere
  • text2vec-huggingface
  • text2vec-openai

Additional modules can be enabled by setting additional environment variables as laid out above. For instance, to add a module called backup-s3 to the set, you would pass it at instantiation as follows:

Python:

import weaviate
from weaviate.embedded import EmbeddedOptions

client = weaviate.Client(
    embedded_options=EmbeddedOptions(
        additional_env_vars={
            "ENABLE_MODULES":
            "backup-s3,text2vec-openai,text2vec-cohere,text2vec-huggingface,ref2vec-centroid,generative-openai,qna-openai"
        }
    )
)

TypeScript:

import weaviate, { EmbeddedClient, EmbeddedOptions } from 'weaviate-ts-embedded';

const client: EmbeddedClient = weaviate.client(
  new EmbeddedOptions({
    env: {
      ENABLE_MODULES: "backup-s3,text2vec-openai,text2vec-cohere,text2vec-huggingface,ref2vec-centroid,generative-openai,qna-openai",
    },
  })
);
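Both examples repeat the default modules alongside backup-s3, which suggests that ENABLE_MODULES replaces the default set rather than extending it. Under that assumption, a small helper can build the value instead of maintaining the long string by hand. The names DEFAULT_MODULES and enable_modules_value are hypothetical, and the sketch also assumes that the order of module names is not significant.

```python
from typing import Iterable, List

# The modules listed above as enabled by default.
DEFAULT_MODULES: List[str] = [
    "generative-openai",
    "qna-openai",
    "ref2vec-centroid",
    "text2vec-cohere",
    "text2vec-huggingface",
    "text2vec-openai",
]


def enable_modules_value(extra_modules: Iterable[str]) -> str:
    """Return a comma-separated ENABLE_MODULES value: extras, then defaults."""
    ordered: List[str] = []
    for name in [*extra_modules, *DEFAULT_MODULES]:
        # Skip duplicates so a module is never listed twice.
        if name not in ordered:
            ordered.append(name)
    return ",".join(ordered)
```

The returned string would then be passed as {"ENABLE_MODULES": enable_modules_value(["backup-s3"])} in additional_env_vars.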

Starting Embedded Weaviate under the hood

Here's what happens behind the scenes when the client uses the embedded options in the instantiation call:

  1. The client downloads a Weaviate release from GitHub and caches it
  2. It then spawns a Weaviate process with its data directory set to the configured location, listening on the specified port (by default 6666)
  3. The server's STDOUT and STDERR are piped to the client
  4. The client connects to this server process (e.g. to http://127.0.0.1:6666) and runs the client code
  5. After running the code (when the application terminates), the client shuts down the Weaviate process
  6. The data directory is preserved, so subsequent invocations have access to the data from all previous invocations, across all clients using the embedded option.
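Steps 2 to 5 follow a common spawn, wait, and clean-up pattern, which can be sketched with the standard library alone. This is not the Weaviate client's implementation: the helper names wait_for_port and spawn_server are hypothetical, the command to run is left to the caller, and the child's output is simply inherited by the parent rather than captured.

```python
import atexit
import socket
import subprocess
import time
from typing import List


def wait_for_port(host: str, port: int, timeout: float = 10.0) -> bool:
    """Poll until something accepts TCP connections on host:port, or time out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=0.5):
                return True
        except OSError:
            # Nothing is listening yet; retry shortly.
            time.sleep(0.1)
    return False


def spawn_server(cmd: List[str], host: str = "127.0.0.1", port: int = 6666) -> subprocess.Popen:
    """Start cmd as a child process and block until it listens on host:port."""
    # stdout/stderr are inherited, so the child's output appears in the parent's.
    proc = subprocess.Popen(cmd)
    # Ask the child to stop when the parent interpreter exits.
    atexit.register(proc.terminate)
    if not wait_for_port(host, port):
        proc.terminate()
        raise RuntimeError(f"server did not start listening on {host}:{port}")
    return proc
```

To try it without a Weaviate binary, any command that opens a port works as a stand-in, for example spawn_server([sys.executable, "-m", "http.server", "6666", "--bind", "127.0.0.1"]) after importing sys.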

Lifecycle

The embedded instance will stay alive for as long as the parent application is running.

When the application exits (e.g. due to an exception or by reaching the end of the script), the client shuts down the embedded instance, but the data persists.

Embedded with Jupyter Notebooks

An Embedded instance will stay alive for as long as the Jupyter notebook is active.

This lets you keep experimenting with your Weaviate projects and examples from cell to cell while the same instance keeps running.

Supported Environments

Operating Systems

Embedded Weaviate is currently supported on Linux only.

We are actively working to provide support for macOS. We hope to share an update in the near future.

Language Clients

Python

The Python client: v3.15.4 or newer

TypeScript

The embedded TypeScript client relies on server-side dependencies that are not available in the browser, so it has been split out into its own project. This allows the original, non-embedded TypeScript client to remain isomorphic.

The TypeScript embedded client simply extends the original TypeScript client, so once instantiated it can be used exactly the same way to interact with Weaviate. It can be installed with the following command:

npm install weaviate-ts-embedded

GitHub repositories:

More Resources

If you can't find the answer to your question here, try one of the following:

  1. Frequently Asked Questions
  2. Knowledge base of old issues
  3. Stack Overflow, for questions
  4. Weaviate Community Forum, for more involved discussion
  5. The Weaviate Slack channel