Without a doubt, 2022 has been the most exciting year for Weaviate so far. The company and the product have grown tremendously, and we are incredibly excited about 2023. Weaviate’s usage numbers are through the roof, and so is the stream of feedback and requests telling us what you’re still missing from Weaviate.
In this blog post, I will introduce you to the six pillars outlining how Weaviate will get even better in 2023. Weaviate development is highly dynamic – we don’t waterfall-plan for the entire year – but nevertheless, we want to give you the best possible overview of what to expect in the coming year.
The Six Pillars for 2023
Ingestion and Search Pipelines
Weaviate’s strong and growing module ecosystem gives you plenty of flexibility. Whether you use Weaviate as a pure vector search engine or with the addition of vectorizer, reader, and generator modules, you can always configure it to your liking. In early 2023 we even saw the addition of the generative-openai module (with other generative modules to come).
We want to give you even more flexibility in combining these steps this year. Through the proposed Pipe API, you will be able to chain arbitrary querying steps, such as reading, re-ranking, summarizing, and generating. Similarly, we want to give you more flexibility at ingestion time: how about extracting text from PDFs, or applying stemming to your BM25 and hybrid search?
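Since the Pipe API is only a proposal at this point, here is a toy sketch of what chained query steps could feel like. Every name in it (the Pipeline class, its steps) is hypothetical and not real Weaviate client code:

```python
# Hypothetical sketch of a chained query pipeline. The Pipe API is only
# proposed, so this is an illustration of the idea, not an actual API.

class Pipeline:
    """Chains query steps; each step receives and returns a list of results."""
    def __init__(self):
        self.steps = []

    def then(self, step):
        self.steps.append(step)
        return self  # allow chaining: pipeline.then(a).then(b)

    def run(self, results):
        for step in self.steps:
            results = step(results)
        return results

# Toy steps standing in for re-ranking and truncation; a real pipeline
# could also read, summarize, or generate at each stage.
def rerank(results):
    return sorted(results, key=lambda r: r["score"], reverse=True)

def top_k(k):
    return lambda results: results[:k]

hits = [
    {"text": "doc A", "score": 0.2},
    {"text": "doc B", "score": 0.9},
    {"text": "doc C", "score": 0.5},
]

pipeline = Pipeline().then(rerank).then(top_k(2))
print([h["text"] for h in pipeline.run(hits)])  # ['doc B', 'doc C']
```

The appeal of such a design is that each step is independent, so re-rankers, summarizers, or generators can be swapped in and out without touching the rest of the query.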
Beyond Billion Scale: Large-Scale Performance
In 2022, we published the Sphere Demo Dataset for Weaviate. This marked the first time (to our knowledge) that more than a billion objects and vectors were imported into Weaviate. Dealing with ever-growing datasets is not only about being able to handle their size. Our users run complex queries in production and often have strict latency requirements. This pillar is all about performance. The first big step will be the move towards a Native Roaring Bitmap Index. In the most extreme case, this new index type can speed up filtered vector search by a factor of 1000. But it doesn’t stop there; we are already thinking about the next steps. Whether you want faster aggregations or new types of specialized indexes, we will ensure you can hit all your p99 latency targets with Weaviate.
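To see why a bitmap index helps filtered vector search, consider a minimal sketch (not Weaviate’s implementation): the filter produces a set of allowed object ids, and candidates coming out of the vector index are checked against it with cheap bitwise operations. Real roaring bitmaps compress these sets far better than the plain Python int used here, but the principle is the same:

```python
# Illustration of bitmap-based pre-filtering for vector search.
# A Python int serves as a simple (uncompressed) stand-in for a
# roaring bitmap; real roaring bitmaps store the same sets compactly.

def bitmap(ids):
    """Pack a collection of integer ids into a single int used as a bitset."""
    bits = 0
    for i in ids:
        bits |= 1 << i
    return bits

# Suppose the filter (e.g. price < 100) matches these object ids:
allowed = bitmap([1, 3, 4, 8])

# Candidate ids returned by the vector index, nearest first:
candidates = [7, 3, 8, 2, 4]

# Each candidate is admitted or rejected with a single bit test,
# instead of re-evaluating the filter per object.
filtered = [c for c in candidates if allowed >> c & 1]
print(filtered)  # [3, 8, 4]
```

The speedup comes from precomputing the allow-list once per query and making each membership check (and intersections of multiple filters) a bitwise operation rather than a per-object evaluation.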
Cloud Operations & Scaling
When we introduced Replication to Weaviate in late 2022, we celebrated a significant milestone. It’s never been easier to achieve a highly available setup, and you can even dynamically scale your setup to increase throughput. 2023 is all about improving your cloud and operations experience. We will give you more control over how to structure your workloads in a distributed setup and more flexibility to adapt to your ever-changing needs. And, of course, we’re constantly working on making your distributed cluster even more resilient.
Speaking of Cloud, arguably the easiest way to spin up a new use case with Weaviate is through the Weaviate Cloud Services. After some time in private beta, the public beta is approaching rapidly.
New Vector Indexes
Last year we gave you a sneak peek into our Vector Indexing Research, and this year you will be able to try out new vector indexes for yourself. Since the beginning, Weaviate has supported vector indexing with HNSW, which leads to best-in-class query times. But not every use case requires single-digit-millisecond latencies; some prioritize cost-effectiveness instead. HNSW is inherently optimized for in-memory access, and its relatively high memory footprint means it is only cost-efficient in high-throughput scenarios. Simply storing the index or vectors on disk, or memory-mapping the index, kills performance.
This is why we will offer you not just one but two memory-saving options to index your vectors without sacrificing latency and throughput. In early 2023, you will be able to use Product Quantization, a vector compression algorithm, in Weaviate for the first time. We are already working on a fully disk-based solution which we will release in late 2023.
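To give a feel for how Product Quantization saves memory, here is a minimal sketch of the general technique (an illustration, not Weaviate’s implementation): each vector is split into M subvectors, and each subvector is replaced by the id of its nearest centroid in a small per-subspace codebook, shrinking a float vector to M one-byte codes:

```python
# Minimal Product Quantization (PQ) sketch: compress float vectors into
# a handful of byte-sized centroid ids. Illustrative only.

import numpy as np

rng = np.random.default_rng(0)

D, M, K = 8, 4, 16           # vector dim, subvectors per vector, centroids per subspace
sub = D // M                 # dimensions per subvector

vectors = rng.normal(size=(1000, D)).astype(np.float32)

# "Train" one codebook per subspace with a few k-means iterations.
codebooks = []
for m in range(M):
    part = vectors[:, m * sub:(m + 1) * sub]
    centroids = part[rng.choice(len(part), K, replace=False)]
    for _ in range(10):
        dists = np.linalg.norm(part[:, None] - centroids[None], axis=2)
        assign = dists.argmin(axis=1)
        for k in range(K):
            if (assign == k).any():
                centroids[k] = part[assign == k].mean(axis=0)
    codebooks.append(centroids)

def encode(v):
    """Compress one vector to M uint8 codes (nearest centroid per subspace)."""
    return np.array([
        np.linalg.norm(codebooks[m] - v[m * sub:(m + 1) * sub], axis=1).argmin()
        for m in range(M)
    ], dtype=np.uint8)

def decode(codes):
    """Approximate reconstruction by concatenating the chosen centroids."""
    return np.concatenate([codebooks[m][codes[m]] for m in range(M)])

v = vectors[0]
codes = encode(v)
print(codes.nbytes, v.nbytes)                     # 4 32: an 8x compression here
print(float(np.linalg.norm(v - decode(codes))))   # reconstruction error
```

Distance computations can then run directly on the codes via per-subspace lookup tables, which is what makes PQ attractive for keeping large indexes affordable.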
Improving our Client and Module Ecosystem
So far, we have only discussed features related to Weaviate Core, the server in your setup. But the Weaviate experience is more than that. Modules allow you to integrate seamlessly with various embedding providers, and our language clients make Weaviate accessible right from your application. This year, we will further improve both. This includes improved APIs on the client side, new modules (for example, for generative search), and improvements to our existing modules.
Community
The most important pillar is all of you – our community. This includes free, open-source users who self-host their Weaviate setup, paid enterprise users, and anyone using our Weaviate-as-a-Service offerings. We value your feedback and love that you are part of shaping our future.
Last year we introduced our dynamic roadmap page that allows you to create and upvote your favorite feature requests. This way, you can make sure that your voice is heard, and we can see what all of you need the most.
Conclusion: Proud of how far we’ve come, excited about the future
At the beginning of this post, I mentioned that not just the product but also the company grew significantly last year. I am incredibly proud of what we have achieved – both overall and in the past year. This wouldn’t have been possible without an absolutely fantastic team. Everyone working on Weaviate – whether a full-time employee or an open-source contributor – is doing outstanding work. I am proud of you all and highly excited about the future. Thank you all, and let’s make 2023 the most exciting year for Weaviate users so far!
Thank you so much for reading! If you would like to talk to us more about this topic, please connect with us on Slack or Twitter. Weaviate is open-source, and you can follow the project on GitHub. Don’t forget to give us a ⭐️ while you are there!