Weaviate attracts users with a wide range of backgrounds. Some have been working with containers for years, but we understand that not everyone has. Inspired by a few recent questions and comments about Docker on the Weaviate Slack, I set out to write an article that gives more background on Docker and containers in general. After reading it, your most common questions about these technologies should be answered, and nothing should stand in the way of building amazing use cases with Weaviate.
In the v1.0 release of Weaviate (docs, GitHub) we introduced the concept of modules. Weaviate modules extend the vector search engine with vectorizers or with functionality for querying your dataset. With the release of Weaviate v1.2, we introduced transformers (DistilBERT, BERT, RoBERTa, Sentence-BERT, etc.) to vectorize and semantically search through your data.
With the rising popularity of machine learning models, the demand for vector similarity search solutions has also increased dramatically. Machine learning models typically output vectors, and common search queries involve finding the closest set of related vectors. For example, in a text-based vector search, the query "landmarks in Paris" would be encoded to a vector. It is then the job of the search engine to find the documents whose vectors are closest to this query vector. The result might be a document with the title "Eiffel Tower", because its vector matches the search vector most closely.
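To make the "closest vector" idea concrete, here is a minimal sketch in plain Python. It assumes cosine similarity as the closeness measure and uses made-up three-dimensional vectors; the document titles other than "Eiffel Tower", the vector values, and the helper names `cosine_similarity` and `nearest` are all hypothetical and chosen only for illustration. Real embeddings come from a model and have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, documents):
    """Return the title whose vector is most similar to the query vector."""
    return max(documents, key=lambda title: cosine_similarity(query, documents[title]))


# Hypothetical document vectors (illustrative values, not real model output).
documents = {
    "Eiffel Tower": [0.9, 0.8, 0.1],
    "Brandenburg Gate": [0.2, 0.9, 0.3],
    "Croissant recipe": [0.7, 0.1, 0.9],
}

# Stand-in for the encoded query "landmarks in Paris" (also made up).
query_vector = [0.95, 0.85, 0.05]

best_match = nearest(query_vector, documents)
print(best_match)
```

This sketch compares the query against every document, which is fine for three entries but would not scale to a large dataset. It is meant only to show what "closest" means, not how a search engine finds the match efficiently.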
Choosing a good API, and then designing and developing it, is a crucial but time-consuming process, especially when you want to develop one within an ongoing software development project.
In this article, I want to share the history of Weaviate, how the concept was born, and where we are heading in the near future.
These days, more and more organizations are adopting a data-driven culture. Business processes and customer experience benefit from good data collection, management and analysis. But to really benefit from the available data, it is essential to also understand unstructured data, such as free text in PDF documents, emails, invoices or voice transcriptions. Unstructured data is especially hard to index, manage and understand. Since around 80% of all data is unstructured, most data is hard to search and to draw insights from.
The Weaviate search engine unlocks the potential of unstructured data. With Weaviate, it becomes possible to search by fuzzy terms and to classify rich data such as free text. It uses AI-driven indexing and search technologies to enable real-time text processing, and it classifies texts automatically using machine learning methods. Because knowledge and information are placed in context, Weaviate can find the information you are looking for and provide recommendations.