
Introduction to Weaviate

What is Weaviate?


Weaviate is an open-source vector database. But what does that mean? Let's unpack it here.

Vector database

Weaviate helps you retrieve the information you need, quickly and accurately. It does this by being a vector database.

You may be familiar with traditional databases, such as relational databases that use SQL. A database can catalog, store and retrieve information. A vector database can carry out these tasks too, with the key difference being that it can perform them based on similarity.

How traditional searches work

Imagine that you are searching a relational database containing articles on cities, to retrieve a list of "major" European cities. Using SQL, you might construct a query like this:

SELECT city_name, wiki_summary
FROM wiki_city
WHERE (wiki_summary LIKE '%major European city%' OR
wiki_summary LIKE '%important European city%' OR
wiki_summary LIKE '%prominent European city%' OR
wiki_summary LIKE '%leading European city%' OR
wiki_summary LIKE '%significant European city%' OR
wiki_summary LIKE '%top European city%' OR
wiki_summary LIKE '%influential European city%' OR
wiki_summary LIKE '%notable European city%')
(and so on)

This query would return cities whose wiki_summary column contains any of these phrases ("major European city", "important European city", "prominent European city", and so on).

This works well in many circumstances. However, there are two significant limitations with this approach.

Using this type of search requires you to identify terms that may have been used to describe the concept, which is no easy feat.

What's more, this doesn't solve the problem of how to rank the list of resulting objects.

With the above search query, an entry merely containing a mention of a different European city (i.e. not very relevant) would be given equal weighting to an entry for Paris, or Rome, which would be highly relevant.
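The two limitations can be reproduced on a small scale. The following is a minimal sketch using Python's built-in sqlite3 module and an in-memory table; the four rows and their summaries are invented for illustration and are not real Wikipedia data. Only two of the phrases from the query above are used, to keep it short.

```python
import sqlite3

# Hypothetical sample rows; the summaries are invented for illustration.
rows = [
    ("Paris", "Paris is the capital of France and a major European city."),
    ("Rome", "Rome is the capital of Italy and a prominent European city."),
    ("Springfield", "A small town whose mayor once visited a major European city."),
    ("Vienna", "Vienna is the capital of Austria and a cultural centre of Europe."),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wiki_city (city_name TEXT, wiki_summary TEXT)")
conn.executemany("INSERT INTO wiki_city VALUES (?, ?)", rows)

# Substring matching only. ORDER BY is alphabetical, purely to make the
# output deterministic; it says nothing about relevance.
query = """
    SELECT city_name
    FROM wiki_city
    WHERE wiki_summary LIKE '%major European city%'
       OR wiki_summary LIKE '%prominent European city%'
    ORDER BY city_name
"""
matches = [name for (name,) in conn.execute(query)]
conn.close()

print(matches)  # ['Paris', 'Rome', 'Springfield']
```

In this sketch, Springfield is returned alongside Paris and Rome because its summary happens to contain one of the phrases, while Vienna is missed because its summary is worded differently. Nothing in the result indicates which match is more relevant.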

A vector database makes this job simpler by enabling searches based on similarity.

Instead of searching for an exact match, you could perform a query to find objects that are "nearest" to "Major European city".

What it would then return is a list of entries that are ranked by their similarity to the query.

In other words, the results would reflect their similarity to the idea, or meaning, of "Major European city".
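To make the idea of "nearest" concrete, here is a toy sketch of similarity ranking, assuming cosine similarity as the measure. The three-dimensional vectors are hand-made for illustration; in practice, vectors typically come from a machine learning model and have many more dimensions. This is not how Weaviate is implemented, only an illustration of ranking by similarity.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical vectors. The made-up dimensions loosely stand for
# (European-ness, city size, unrelated topics).
entries = {
    "Paris": [0.9, 0.9, 0.1],
    "Rome": [0.9, 0.8, 0.2],
    "Springfield": [0.1, 0.2, 0.9],
}

# Hypothetical vector standing in for the query "Major European city".
query_vector = [1.0, 1.0, 0.0]

# Rank every entry by its similarity to the query, most similar first.
ranked = sorted(
    entries,
    key=lambda name: cosine_similarity(query_vector, entries[name]),
    reverse=True,
)

print(ranked)  # ['Paris', 'Rome', 'Springfield']
```

Unlike the substring search, every entry receives a score, so the results come back ordered by how close each one is to the query. This sketch compares the query against every entry in turn, which would be slow for a large dataset.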

What's more, Weaviate "indexes" the data based on their similarity, making this type of data retrieval lightning-fast.

Weaviate can help you to do all this, and actually a lot more. Another way to think about Weaviate is that it supercharges the way you use information.

Vector vs semantic search

A vector search is also referred to as a "semantic search" because it returns results based on the similarity of meaning (therefore "semantic").

Open-source

Weaviate is open-source. In other words, its codebase is available online for anyone to see and use(1).

The codebase is the same regardless of how you use it. So whether you run Weaviate on your own computer, in a cloud computing environment, or through our managed service Weaviate Cloud (WCD), you are using the exact same technology.

So, if you want, you can run Weaviate for free on your own device, or use our managed service for convenience. You can also take comfort in knowing exactly what you are running, and you can be a part of the open-source community and help shape its development.

It also means that your knowledge about Weaviate is transferable between local, cloud, and managed instances of Weaviate. So anything you learn here about Weaviate using WCD will be equally applicable to running it locally, and vice versa. 😉

Information, made dynamic

We are used to thinking of information as static, like a book. But with Weaviate and modern AI-driven language models, we can do much more than just retrieve static information: we can easily build on top of it. Take a look at these examples:

Question answering

Given a list of Wikipedia entries, you could ask Weaviate:

We asked Weaviate:

When was Lewis Hamilton born?

And it would answer with:

Weaviate responded:

Lewis Hamilton was born on January 7, 1985. (check for yourself)

Or you can synthesize passages using retrieved information with Weaviate:

Here is an example, where we searched Weaviate for an entry on a "racing driver" and asked for the result in the following format:

We asked Weaviate:

Write a fun tweet encouraging people to read about this: ## {title} by summarizing highlights from: ## {wiki_summary}

Which produces:

Weaviate responded:

Check out the amazing story of Lewis Hamilton, the 7-time Formula One World Drivers' Championship winner! From his humble beginnings to becoming one of the world's most influential people, his journey is an inspiring one. #LewisHamilton #FormulaOne #Motorsport #Racing
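The {title} and {wiki_summary} placeholders in the prompt above are filled in with properties of the retrieved entry. The sketch below shows only that substitution step, using plain Python string formatting. The retrieved object and its summary text are hypothetical, and the step of sending the finished prompt to a language model is not shown.

```python
# The prompt template from the example above.
prompt_template = (
    "Write a fun tweet encouraging people to read about this: ## {title} "
    "by summarizing highlights from: ## {wiki_summary}"
)

# Hypothetical retrieved object; the property values are illustrative.
retrieved = {
    "title": "Lewis Hamilton",
    "wiki_summary": (
        "Lewis Hamilton is a British racing driver who has won seven "
        "Formula One World Drivers' Championship titles."
    ),
}

# Replace each placeholder with the matching property of the object.
prompt = prompt_template.format(**retrieved)

print(prompt)
```

The same template can be reused for any retrieved entry that has title and wiki_summary properties, which is what makes the retrieved information a starting point for generated text.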

We will cover these and many more capabilities, such as vectorization, summarization and classification, in our units.

For now, keep in mind that Weaviate is a vector database at its core which can also leverage AI tools to do more with the retrieved information.

Review

In this section, you learned what Weaviate is and how it works at a very high level. You were also introduced to vector search, which is a similarity-based search method.

Review exercises

  Question
What is the difference in the Weaviate codebase between local and cloud deployments?
  Question
What is the best description of vector search?

Key takeaways

  • Weaviate is an open-source vector database.
  • The core Weaviate library is the same whether you run it locally, on the cloud, or with WCD.
  • Vector searches are similarity-based searches.
  • Weaviate can also transform your data after retrieving it and before returning it to you.

Notes

(1) Subject to terms of its license, of course.
