Weaviate on Stackoverflow badge Weaviate issues on Github badge Weaviate total Docker pulls badge

💡 You are looking at older or release candidate documentation. The current Weaviate version is v1.15.2

Welcome to the documentation about Weaviate! Here you will find what's Weaviate all about, how to start your own Weaviate instance and interact with it, and how to use it to perform vector searches and classification.

Like what you see? Consider giving us a ⭐ on Github.

Introduction video

What is Weaviate?

Weaviate is a cloud-native, modular, real-time vector search engine built to scale your machine learning models.

  • Highly efficient large scale vector searches
  • State of the Art classification techniques
  • Easily retrieve your data in graph format using GraphQL

You can extend Weaviate with:

  1. An out-of-the box vectorization module which creates vectors from your data on import.
  2. Your existing machine-learning models, import your vectors directly

You can use multiple vectorization modules in one Weaviate instance. For example, if you want to vectorize a text object with an NLP module and an image with an image vectorization module.


Weaviate makes it easy to use state-of-the-art AI models while giving you the scalability, ease of use, safety and cost-effectiveness of a purpose-built vector database. Most notably:

  • Fast queries
    Weaviate typically performs a 10-NN neighbor search out of millions of objects in considerably less than 100ms.

  • Any media type with Weaviate Modules
    Use State-of-the-Art AI model inference (e.g. Transformers) for Text, Images, etc. at search and query time to let Weaviate manage the process of vectorizing your data for your - or import your own vectors.

  • Combine vector and scalar search
    Weaviate allows for efficient combined vector and scalar searches, e.g “articles related to the COVID 19 pandemic published within the past 7 days”. Weaviate stores both your objects and the vectors and make sure the retrieval of both is always efficient. There is no need for a third party object storage.

  • Real-time and persistent
    Weaviate let’s you search through your data even if it’s currently being imported or updated. In addition, every write is written to a Write-Ahead-Log (WAL) for immediately persisted writes - even when a crash occurs.

  • Horizontal Scalability
    Scale Weaviate for your exact needs, e.g. High-Availability, maximum ingestion, largest possible dataset size, maximum queries per second, etc. (Currently under development, ETA Fall 2021)

  • Cost-Effectiveness
    Very large datasets do not need to be kept entirely in memory in Weaviate. At the same time available memory can be used to increase the speed of queries. This allows for a conscious speed/cost trade-off to suit every use case.

  • Graph-like connections between objects
    Make arbitrary connections between your objects in a graph-like fashion to resemble real-life connections between your data points. Traverse those connections using GraphQL.

How does Weaviate work?

Within Weaviate, all individual data objects are based on a class property structure where a vector represents each data object. You can connect data objects (like in a traditional graph) and search for data objects in the vector space.

You can add data to Weaviate through the RESTful API end-points and retrieve data through the GraphQL interface.

Weaviates vector indexing mechanism is modular, and the current available plugin is the Hierarchical Navigable Small World (HNSW) multilayered graph.

What are Weaviate modules?

Weaviate modules are used to extend Weaviate’s capabilities and they are optional. There are Weaviate modules that automatically vectorize your content (i.e., *2vec) or extend Weaviate’s capabilities (often related to the type of vectors you have.) You can also create your own modules. Click here to get started with Weaviate modules.

Why a vector search engine?

If you work with data, you probably work with search engine technology. The best search engines are amazing pieces of software, but because of their core architecture, they come with limitations when it comes to finding the data you are looking for.

Take for example the data object: { "data": "The Eiffel Tower is a wrought iron lattice tower on the Champ de Mars in Paris" }

Storing this in a traditional search engine might leverage inverted indices to index the data. This means that to retrieve the data; you need to search for “Eiffel Tower” or “wrought iron lattice”, etc to find it. But what if you have vast amounts of data and you want the document about the Eiffel Tower but you search for: “landmarks in France”? Traditional search engines can’t help you there and this is where vector search engines enter the stage.

Weaviate uses vector indexing mechanisms at its core to represent the data. The vectorization modules (e.g., the NLP module) vectorizes the above-mentioned data object in a vector-space where the data object sits near the text ”landmarks in France”. This means that Weaviate can’t make a 100% match, but a very high one to show you the results.

The above example is for text (i.e., NLP) but you can use it for any machine learning model that vectorizes, like images, audio, video, genes etcetera.

Last but not least, using a vector search engine for your machine learning is something you do for the same reason you’ve been using a traditional search engine. It allows you to scale fast, search, and classify in realtime and it can be used as a robust production environment.

When should I use Weaviate?

There are four main situations when you should consider using Weaviate.

  • If you don’t like the quality of results that your current search engine gives you.
  • If it is too much work to bring your machine learning models to scale.
  • If you need to classify large datasets fast and near-realtime.
  • If you need to scale your machine learning models to production size

People use Weaviate for cases such as semantic search, image search, similarity search, anomaly detection, power recommendation engines, e-commerce search, data classification in ERP systems, automated data harmonization, cybersecurity threat analysis, and many many more cases.

Get started

Want to get started or want to learn more? These resources might help you further:

More resources

If you can’t find the answer to your question here, please look at the:

  1. Frequently Asked Questions. Or,
  2. Knowledge base of old issues. Or,
  3. For questions: Stackoverflow. Or,
  4. For issues: Github. Or,
  5. Ask your question in the Slack channel: Slack.