Introduction to Weaviate
What is Weaviate?
Weaviate is an open-source vector database. But what does that mean? Let's unpack it here.
Vector database
Weaviate helps you retrieve the information you need, quickly and accurately. It does this by being a vector database.
You may be familiar with traditional databases, such as relational databases that use SQL. A database can catalog, store, and retrieve information. A vector database can also carry out these tasks; the key difference is that it can perform them based on similarity.
How traditional searches work
Imagine that you are searching a relational database containing articles on cities, to retrieve a list of "major" European cities. Using SQL, you might construct a query like this:
SELECT city_name, wiki_summary
FROM wiki_city
WHERE (wiki_summary LIKE '%major European city%' OR
wiki_summary LIKE '%important European city%' OR
wiki_summary LIKE '%prominent European city%' OR
wiki_summary LIKE '%leading European city%' OR
wiki_summary LIKE '%significant European city%' OR
wiki_summary LIKE '%top European city%' OR
wiki_summary LIKE '%influential European city%' OR
wiki_summary LIKE '%notable European city%')
(… and so on)
This query would return cities that contain any of these strings (major, important, prominent, etc.) in the wiki_summary column.
This works well in many circumstances. However, there are two significant limitations with this approach.
Limitations of traditional search
Using this type of search requires you to identify terms that may have been used to describe the concept, which is no easy feat.
What's more, this doesn't solve the problem of how to rank the list of resulting objects.
With the above search query, an entry merely containing a mention of a different European city (i.e. not very relevant) would be given equal weighting to an entry for Paris, or Rome, which would be highly relevant.
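To make both limitations concrete, here is a minimal sketch that runs a shortened version of the query above against an in-memory SQLite database. The three rows are invented for illustration and are not from any real dataset; only the table and column names come from the query above.

```python
import sqlite3

# Toy rows, invented for this illustration only.
rows = [
    ("Paris", "Paris is the capital of France and a major European city."),
    ("Rome", "Rome is the capital of Italy and one of the great cities of Europe."),
    ("Springfield", "A small town whose founder once visited a major European city."),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wiki_city (city_name TEXT, wiki_summary TEXT)")
conn.executemany("INSERT INTO wiki_city VALUES (?, ?)", rows)

# A shortened version of the keyword query: only two of the phrasings.
matches = conn.execute(
    """
    SELECT city_name
    FROM wiki_city
    WHERE wiki_summary LIKE '%major European city%'
       OR wiki_summary LIKE '%important European city%'
    ORDER BY city_name
    """
).fetchall()
conn.close()

matched_names = [name for (name,) in matches]
print(matched_names)
```

In this toy data, Rome is missed because its summary uses a phrasing we did not anticipate, while Springfield is returned alongside Paris with nothing to tell the two apart in relevance.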
A vector database makes this job simpler by enabling searches based on similarity.
Examples of vector search
Instead of searching for an exact match, you could perform a query to find objects that are "nearest" to "Major European city".
The query would then return a list of entries ranked by their similarity to the query.
In other words, the results would reflect their similarity to the idea, or meaning, of "Major European city".
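As a rough sketch of what "ranked by similarity" can mean, the example below scores a few entries against a query using cosine similarity, one common similarity measure. The three-dimensional vectors are hypothetical and hand-picked for illustration; real vectors come from a machine learning model and have many more dimensions. This is not Weaviate's implementation, only the general idea.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical vector standing in for the query "Major European city".
query_vector = [0.9, 0.8, 0.1]

# Hypothetical vectors standing in for stored entries.
entries = {
    "Springfield": [0.1, 0.2, 0.9],
    "Rome": [0.85, 0.9, 0.1],
    "Paris": [0.95, 0.85, 0.05],
}

# Sort entry names from most similar to least similar to the query.
ranked = sorted(
    entries,
    key=lambda name: cosine_similarity(query_vector, entries[name]),
    reverse=True,
)
print(ranked)
```

Unlike the keyword query, every entry receives a score, so the results come back in order of relevance instead of as an unranked set of matches.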
What's more, Weaviate indexes the data by similarity, which makes this type of retrieval very fast.
Weaviate can help you do all this, and a lot more. Another way to think about Weaviate is that it supercharges the way you use information.
A vector search is also referred to as a "semantic search" because it returns results based on the similarity of meaning (therefore "semantic").
Open-source
Weaviate is open-source. In other words, its codebase is available online for anyone to see and use(1).
And it is the same codebase, regardless of how you use it. So whether you run Weaviate on your own computer, in a cloud computing environment, or through our managed service Weaviate Cloud (WCD), you are using the exact same technology.
So, if you want, you can run Weaviate for free on your own device, or use our managed service for convenience. You can also take comfort in knowing that you can see exactly what you are running, be a part of the open-source community, and help shape its development.
It also means that your knowledge about Weaviate is transferable between local, cloud, and managed instances of Weaviate. So anything you learn here about Weaviate using WCD will be equally applicable to running it locally, and vice versa. 😉
Information, made dynamic
We are used to thinking of information as static, like a book. But with Weaviate and modern AI-driven language models, we can do much more than just retrieve static information: we can easily build on top of it. Take a look at these examples:
Question answering
Given a list of Wikipedia entries, you could ask Weaviate:
When was Lewis Hamilton born?
And it would answer with:
Lewis Hamilton was born on January 7, 1985. (check for yourself)
Generative search
Or you can synthesize passages using retrieved information with Weaviate:
Here is an example, where we searched Weaviate for an entry on a "racing driver" and asked for the result in the following format:
Write a fun tweet encouraging people to read about this: ## {title} by summarizing highlights from: ## {wiki_summary}
Which produces:
Check out the amazing story of Lewis Hamilton, the 7-time Formula One World Drivers' Championship winner! From his humble beginnings to becoming one of the world's most influential people, his journey is an inspiring one. #LewisHamilton #FormulaOne #Motorsport #Racing
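The placeholders in the prompt above, {title} and {wiki_summary}, are filled in from the retrieved entry before the prompt reaches the language model. The sketch below shows only that substitution step using plain Python string formatting; the variable names and the placeholder summary text are assumptions for illustration, and this is not how Weaviate itself implements it.

```python
# The prompt template from the example above.
PROMPT_TEMPLATE = (
    "Write a fun tweet encouraging people to read about this: ## {title} "
    "by summarizing highlights from: ## {wiki_summary}"
)

# Hypothetical retrieved entry; the summary is a stand-in, not real data.
retrieved = {
    "title": "Lewis Hamilton",
    "wiki_summary": "<summary text of the retrieved entry>",
}

# Substitute each placeholder with the matching property of the entry.
prompt = PROMPT_TEMPLATE.format(**retrieved)
print(prompt)
```

The resulting string is what a language model would receive, with the entry's own properties in place of the placeholders.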
We will cover these and many more capabilities, such as vectorization, summarization and classification, in our units.
For now, keep in mind that Weaviate is a vector database at its core which can also leverage AI tools to do more with the retrieved information.
Review
In this section, you learned what Weaviate is and how it works at a very high level. You were also introduced to vector search, which is a similarity-based search method.
Key takeaways
- Weaviate is an open-source vector database.
- The core Weaviate library is the same whether you run it locally, on the cloud, or with WCD.
- Vector searches are similarity-based searches.
- Weaviate can also transform your data after retrieving it and before returning it to you.
Notes
(1) Subject to the terms of its license, of course.
Questions and feedback
If you have any questions or feedback, let us know in the user forum.