The v4
client has since been officially released! Thank you to everyone who provided feedback and helped shape the client. You can find the official release announcement here.
We've released a pre-release version of our new Python client API. It's available for you to try out and provide feedback through this form. We've made a number of major improvements, including strong typing, a collections-first approach, and performance improvements.
You can try it here - Experimental clients.
Introduction
We want to ensure that Weaviate is not just powerful, but also user-friendly.
After all, the best tools are those that are accessible and easy to integrate into your projects. This is why we prioritize client libraries for different programming languages.
Today, we're thrilled to unveil our pre-release collections
Python client API. We've designed it to be more Pythonic, leveraging modern IDE features like code completion and type hints. Moreover, we've revamped the way you work with your data.
While our internal tests have yielded positive results, we are excited to see what you, the community, think. So, we invite you to explore this new API, share your insights, and contribute to its evolution by providing feedback.
If you're keen to try it, please head over to our Experimental clients page. There, you'll find the installation, usage and feedback instructions.
Let's dive into the inspiration and core concepts behind this innovation.
Going forward, we will use the term collection
instead of class
to refer to the sets of objects to be stored in Weaviate. This is to avoid confusion with the generic word class
in object-oriented programming, and in Python.
Key changes
Highlights: Strong typing through custom Python classes, focus on individual collections for interaction
There are two key changes to the way the collections
client works in comparison to the existing Python client (let's call this the Classic
client). They relate to the object typing and the way you interact with the data.
Aspect | Classic Client | Collections Client |
---|---|---|
Object Typing | Primarily relied on untyped dict / JSONs. | Utilizes strong typing through custom Python classes. |
Unit of Interaction | Interacts with the entire database. | Focuses on individual collections for interaction. |
Let's look at each of these in more detail.
Object typing
The Classic
client primarily relied on untyped data structures like dictionaries and JSONs. While this approach offered flexibility, we saw this lead to runtime errors. Recognizing this, the collections client takes a leap forward by adding strong typing through custom Python classes in many places.
This in turn enables the things we all appreciate in our coding life - like code completion and type hints in the IDE. Take a look at how little actual typing is required to create a collection definition, thanks to the IDE support:
As a user, you will now only interact with the Python API itself and its surface rather than needing know the underlying API's surface as well. In other words, you do not have to remember the exact API parameter names, as the IDE will show you the available options. This can be a boon when you are working with less often-used options or when you are new to Weaviate.
Take a look at the next example, where we are configuring the invertedIndexConfig
for a property. The IDE shows us the available options and their descriptions.
Types are introduced for the data objects as well at creation time, as well as when retrieving them from the database. This means that you can access the properties of the data object directly.
So syntax that is currently like this:
response['data']['Get']['Article'][0]['title'] # Get the `title` property of the first object
response['data']['Get']['Article'][0]['_additional']['id'] # Get the ID of the first object
response['data']['Get']['Article'][0]['_additional']['generate']['singleResult'] # Get the generated text from a `singlePrompt` request
response['data']['Get']['Article'][0]['_additional']['generate']['groupedResult'] # Get the generated text from a `groupedTask` request
Become:
Collections
client syntaxresponse.objects[0].properties['title'] # Get the `title` property of the first object
response.objects[0].uuid # Get the ID of the first object
response.objects[0].generated # Get the generated text from a `singlePrompt` request
response.generated # Get the generated text from a `groupedTask` request
We think that these changes will reduce errors, increase productivity, and make the code easier to read and understand.
Collections-first approach
The other big change is that the collections
client focuses on individual collections for interaction.
This means that you will no longer need to specify the collection name in every request. Instead, you will create an object for each collection that you want to interact with, and then use that object for all subsequent requests.
For example, take the following syntax for performing a simple request to retrieve a few objects from the database:
response = (
client.query.get(
class_name="Article",
properties=["title", "body", "url"]
)
.with_limit(2)
.do()
)
Becomes:
Collections
client syntaxarticles = client.collection.get("Article")
response = articles.query.fetch_objects(limit=2)
You'll see that a search is now a method that originates from the collection object.
We have observed that in most cases, you will be working with a single collection at a time. So, we think that this change will improve your efficiency and make the code more readable.
This is the same for all searches, retrieval augmented generations, and data operations such as insertion, deletion, and updating. You can access these methods through submodules like:
articles.data # For data operations
articles.query # For searches
articles.generate # For retrieval augmented generations
articles.aggregate # For aggregating results
We think that these changes will make the code more readable and intuitive.
Quality-of-life improvements
Highlights: Simplified methods, easier defaults, batch import error handling
As well as making these big philosophical changes, we've also made a number of quality-of-life improvements to the API.
Simplified methods
Standalone methods with parameters now replace the builder pattern (with_
methods) for queries. So what used to be a chain of methods like this:
response = (
client.query.get(
class_name="JeopardyQuestion",
properties=["question", "answer"]
)
.with_near_text({"concepts": ["the space race"]})
.with_generate(
grouped_task="Write a haiku about these facts!",
)
.with_limit(2)
.do()
)
Becomes:
Collections
client syntaxquestions = client.collection.get("JeopardyQuestion")
response = questions.generate.near_text(
query="the space race",
limit=2,
grouped_task="Write a haiku about these facts!"
)
Property/metadata return defaults
You might have noticed that above examples do not specify properties to be returned!
We have changed the default behavior to return most properties and metadata such as the object ID, creation time, vector search distance and so on. We think this will make it easier to get started, and for production use-cases where you want to optimize the response size, you can still specify the properties you want to return.
Batch import typing
Batch object, now called insert_many
, also gets a refresh with the introduction of a DataObject
class. This class is used to define the properties of the object to be inserted, and is then passed to the insert_many
method.
from weaviate.util import generate_uuid5
articles_to_add = list()
for i, row in enumerate(data_source):
article_obj = wvc.data.DataObject(
properties={
"title": row["title"],
"body": row["title"],
},
vector=vectors[i],
uuid=generate_uuid5(row["title"])
)
articles_to_add.append(article_obj)
response = articles.data.insert_many(articles_to_add)
print(response)
This provides a more intuitive and structured way to define objects, where each object can be validated through the DataObject
class.
Oh, and if you don't need to manually specify an object ID or a vector, you can just pass in a list of dictionaries!
for i, row in enumerate(data_source):
article_obj = {
"title": row["title"],
"body": row["title"],
}
articles_to_add.append(article_obj)
response = articles.data.insert_many(articles_to_add)
print(response)
Batch import error handling
Another big improvement is how we handle errors during batch data import. Many of you let us know that you would like to get more information around any errors at a macro and micro level. So, we've added a couple of features to help with this.
One is the introduction of an Error
class that will be returned amongst the successful ID response. Here's one that we triggered by supplying an incorrect datatype:
Error(message="invalid text property 'url' on class 'TestArticle': not a string, but float64", code=None, original_uuid=None)
The other is that the overall response object will indicate whether there were any errors (has_errors
Boolean), and where they occurred (errors
dictionary).
Collection
-level iterator
The cursor
API has been a popular tool for iterating through an entire collection, for example to create manual backups or retrieving the entire set of IDs.
We've simplified the API here with a Pythonic iterator, so the syntax to extract all objects from a collection is now as simple as:
all_objects = [article for article in articles.iterator()]
You can also choose what properties/metadata to return.
Performance improvements
Highlights: Up to 60-80% faster imports, 3-4x faster queries
While a big part of this collections
client release has been about improving the developer experience, there has been some significant performance improvements as well.
Under the hood, we expand on Weaviate's gRPC (what is it?) capabilities that has been ongoing since v1.19
release.
The short story is that your query performance and import performance will be significantly improved.
Import speeds
We've internally seen that batch data import with new client using gRPC achieves a 60-80% speedup compared to the current client. For production use cases involving tens or hundreds of millions of objects, this can mean significant reduction in the hours spent getting your database ready.
Query speeds
But that's not all. Even a query will execute much faster when using gRPC in comparison with GraphQL. We've seen that a small query executes 3-4 times faster(!) which is no small feat.
How was this achieved?
These speed-ups come from two benefits of using gRPC, which is a binary protocol. This means that one, the data is sent in a compact binary format, rather than as text, which reduces the volume of data to be sent. The other factor is that to send data as text, it needs to be processed into JSON format which is a computationally expensive process. With gRPC, the data is sent in a binary format, which is much faster to process at both the sender and receiver ends.
We won't get into the details of gRPC here, but if you are interested, you can read more about it here. We will also be writing more about it in the future.
Object typing in focus
Highlights: Strong typing in queries, retrieval augmented generation, data operations, ...
We've mentioned object typing a few times already, but let's take a closer look at a few more concrete examples. Here you can see the new syntax for collection definition, queries and retrieval augmented generation, as well as types for each of thees tasks.
To get started you can import the set of submodules like shown below, and use them as needed.
import weaviate.classes as wvc # <-- import the custom classes
Collection definition
For many (including us), the collection definition step required multiples visits to the documentation. This was unsurprising, given that it required writing out a dictionary with many key/value pairs:
collection_definition = {
"class": "Article",
"vectorizer": "text2vec-openai",
"vectorIndexConfig": {
"distance": "cosine",
},
"moduleConfig": {
"generative-openai": {}
},
"properties": [
{
"name": "title",
"dataType": ["text"]
},
# ...
{
"name": "url",
"dataType": ["text"],
"tokenization": "field",
"moduleConfig": {
"text2vec-openai": {
"skip": true
},
}
},
],
}
client.schema.add_class(collection_definition)
Well, no more! Now, you can use Python classes all the way through, with your IDE guiding the way:
Collections
client syntaxclient.collection.create(
name="Article",
vectorizer_config=wvc.ConfigFactory.Vectorizer.text2vec_openai(),
vector_index_config=wvc.ConfigFactory.vector_index(distance_metric=wvc.config.VectorDistances.COSINE),
generative_config=wvc.ConfigFactory.Generative.openai(),
properties=[
wvc.Property(
name="title",
data_type=wvc.DataType.TEXT,
),
# ...
wvc.Property(
name="url",
data_type=wvc.DataType.TEXT,
tokenization=wvc.Tokenization.FIELD,
skip_vectorization=True,
),
]
)
Note that many of the previously complex options such as vectorizer, generative module, index configurations, and so on are now defined through a ConfigFactory
class:
articles = client.collection.create(
name="Article",
# Skipped for brevity ...
vectorizer_config=wvc.ConfigFactory.Vectorizer.text2vec_openai(),
vector_index_config=wvc.ConfigFactory.vector_index(distance_metric=wvc.config.VectorDistances.COSINE),
generative_config=wvc.ConfigFactory.Generative.openai(),
)
Queries
Because each query is now one method, you can see the available options in your IDE. Furthermore, options like defining filters or metadata that were defined through string parameters in the Classic
client are now defined through custom Python classes.
Retrieval augmented generation
As you saw briefly earlier, we have a .generate
submodule for retrieval augmented generation.
The structure of these mirror those of the queries, with additional parameters added for the generation task. For example, this query:
response = questions.query.near_text(
query="the space race",
limit=2,
)
Can be converted to a retrieval augmented generation task by switching the submodule to .generate
and adding the grouped_task
parameter:
response = questions.generate.near_text(
query="the space race",
limit=2,
grouped_task="Write a haiku about these facts!"
)
Now it's your turn
If you've made it this far, you're probably excited to try out the new client. We're soo excited for you to try it out too!
We have prepared a dedicated page with instructions on how to install and use the new client. The page also includes a link to the feedback form where you can share your thoughts and suggestions.
We can't wait to hear what you think.
Ready to start building?
Check out the Quickstart tutorial, or build amazing apps with a free trial of Weaviate Cloud (WCD).
Don't want to miss another blog post?
Sign up for our bi-weekly newsletter to stay updated!
By submitting, I agree to the Terms of Service and Privacy Policy.