Vectors - An overview
What is a vector?
We've covered that Weaviate is a vector database, and that a vector search is similarity-based. But what is a vector?
A vector in this context is just a series of numbers, like [1, 0] or [0.513, 0.155, 0.983, ..., 0.001, 0.932]. Vectors like these are used to capture meaning as a series of numbers.
This might seem like an odd concept. But many people have already used vectors without realizing it, for example when editing photos or picking colors in MS Paint.
How do numbers represent meaning?
The RGB system uses numbers to represent colors. For example:
- (255, 0, 0) = red
- (80, 200, 120) = emerald
In these examples, each number can be thought of as a dial for how red, green or blue a color is.
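To make the "dial" idea concrete, here is a minimal Python sketch that treats RGB colors as 3-number vectors and measures how far apart they are. Two things are assumptions added for illustration: the extra "dark red" value (139, 0, 0), which is the CSS named color darkred, and the choice of Euclidean (straight-line) distance as the measure.

```python
import math

# RGB colors as 3-dimensional vectors: (red, green, blue), each dial 0-255.
colors = {
    "red": (255, 0, 0),
    "emerald": (80, 200, 120),
    "dark red": (139, 0, 0),  # assumption: extra color added for illustration
}

def euclidean_distance(a, b):
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

red = colors["red"]
for name, rgb in colors.items():
    # Smaller distance = the two colors' dials are set more similarly.
    print(f"{name:9} {rgb} distance from red: {euclidean_distance(red, rgb):.1f}")
```

Dark red lands 116.0 units from red, while emerald is about 291.6 away, so the numbers alone tell us which colors are alike. Embeddings apply the same idea with many more dials.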
Now, imagine having hundreds, or even thousands, of these dials. That’s how vectors are used to represent meaning. Modern models, such as GPT-x or those used with Weaviate, use vectors in this manner to represent some "essence" or "meaning" of objects. This can be done for any object type, such as text, code, images, videos and more.
Each vector representation of such "meaning" is called a vector embedding.
Vector embeddings in Weaviate
Weaviate enables vector searches by indexing and storing data objects and corresponding vector embeddings from machine learning models.
In plain terms, Weaviate processes and organizes your data so that objects can be retrieved based on their similarity to a query. To perform these tasks at speed, Weaviate does two things that traditional databases do not:
- Quantifying similarity, and
- Indexing vector data
Together, these two capabilities are what make fast, similarity-based search possible in Weaviate.
Quantifying similarity
As we've mentioned, vector searches are similarity-based, but what does that actually mean? How do we determine that two pieces of data are "similar"? What does it mean for two pieces of text, two images, or two objects in general, to be similar?
The idea is relatively simple, but it becomes incredibly interesting and intricate once we dive into the details.
For now, you should know that machine learning (ML) models are key to this whole process. Models similar to those that allow clever text generation from prompts also power vector searches. Instead of generating new text, here these models capture the "meaning" of pieces of text or other media. We will cover this in more detail later on.
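Once a model has turned objects into vectors, "similar" can be quantified with a distance or similarity measure. The Python sketch below uses cosine similarity, one widely used measure (which metric a given setup uses is a configuration choice). The 4-number "embeddings" for cat, kitten and car are made-up, hypothetical values; a real model would produce them, with hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings: values invented for illustration only.
embeddings = {
    "cat": [0.9, 0.1, 0.0, 0.3],
    "kitten": [0.85, 0.15, 0.05, 0.35],
    "car": [0.1, 0.9, 0.4, 0.0],
}

query = embeddings["cat"]
for word, vec in embeddings.items():
    sim = cosine_similarity(query, vec)
    # Cosine distance (1 - similarity) turns similarity into a distance.
    print(f"{word:7} similarity={sim:.3f} distance={1 - sim:.3f}")
```

With these toy values, "kitten" scores close to 1.0 against "cat" while "car" scores far lower, which is exactly the ranking a similarity search would use.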
Indexing (vector) data
Vector searches can be very computationally intensive, because a naive search must compare the query vector against every stored vector.
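To see where the cost comes from, here is a minimal sketch of exact (brute-force) nearest-neighbor search in Python. The random vectors and the sizes (5,000 vectors of 128 dimensions) are arbitrary assumptions for illustration. The point is that every query touches every vector, so the work grows with the number of objects times the number of dimensions.

```python
import heapq
import math
import random

def cosine_distance(a, b):
    """1 - cosine similarity; 0.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1 - dot / norm

def brute_force_search(query, vectors, k=3):
    """Exact k-nearest-neighbor search: compares the query against EVERY vector."""
    scored = ((cosine_distance(query, v), i) for i, v in enumerate(vectors))
    return heapq.nsmallest(k, scored)  # list of (distance, index) pairs

random.seed(0)
dims, n = 128, 5_000  # toy sizes chosen for illustration
vectors = [[random.gauss(0, 1) for _ in range(dims)] for _ in range(n)]
query = vectors[42]

results = brute_force_search(query, vectors, k=3)
print(results[0])  # the query's own vector (index 42) is the closest match
print(f"compared the query against {n:,} vectors of {dims} dimensions")
```

Even at this toy scale, one query performs thousands of full-length vector comparisons; with millions of objects, this linear scan becomes the bottleneck that an ANN index is designed to avoid.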
To overcome this problem, Weaviate uses a combination of indexes, including an approximate nearest neighbor (ANN) index and an inverted index. The ANN index allows Weaviate to perform extremely fast vector searches, while the inverted index allows it to filter data using Boolean criteria.
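The following toy Python sketch shows the general idea of combining the two: an inverted index maps a property value straight to the ids of matching objects, and the vector search then ranks only those ids. Everything here is hypothetical and simplified for illustration (the objects, the "category" property, the function names), and it uses exact search where Weaviate would use its ANN index. It is not Weaviate's implementation.

```python
import math
from collections import defaultdict

# Hypothetical objects: each has one property ("category") and a vector.
objects = [
    {"id": 0, "category": "news", "vector": [0.9, 0.1, 0.0]},
    {"id": 1, "category": "sports", "vector": [0.8, 0.2, 0.1]},
    {"id": 2, "category": "news", "vector": [0.1, 0.9, 0.2]},
    {"id": 3, "category": "news", "vector": [0.7, 0.3, 0.0]},
]
objects_by_id = {o["id"]: o for o in objects}

# Inverted index: (property, value) -> set of ids of objects with that value.
inverted_index = defaultdict(set)
for obj in objects:
    inverted_index[("category", obj["category"])].add(obj["id"])

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1 - dot / norm

def filtered_search(query, prop, value, k=2):
    # 1. Filter: look up matching ids directly, without scanning every object.
    allowed = inverted_index.get((prop, value), set())
    # 2. Vector search: rank only the allowed objects by distance to the query.
    candidates = [objects_by_id[i] for i in allowed]
    candidates.sort(key=lambda o: cosine_distance(query, o["vector"]))
    return [o["id"] for o in candidates[:k]]

print(filtered_search([1.0, 0.0, 0.0], "category", "news"))  # → [0, 3]
```

Without the filter, object 1 would rank second because its vector is close to the query; restricting the search to "news" objects excludes it.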
We will get into this in more detail later. For now, it's enough to know that Weaviate can perform fast vector searches as well as filtering.
Review
In this section, you learned what vectors are and, at a very high level, how Weaviate uses them. You were also introduced to Weaviate's two key capabilities that help it perform vector search at speed.
Review exercise
Can you describe, in your own words, what vectors are?
Key takeaways
- A vector is a series of numbers that capture the meaning or essence of objects.
- Machine learning models help quantify similarity between different objects, which is essential for vector searches.
- Weaviate uses a combination of an approximate nearest neighbor (ANN) index and an inverted index to perform fast vector searches with filtering.