REST - /v1/classifications
Start a classification
Weaviate's classification feature allows you to classify data objects by predicting cross-references based on the semantic meaning of the data objects. Weaviate Core (without any modules) provides one type of classification:
- kNN classification: Uses the k-nearest neighbors algorithm and requires training data to predict cross-references. Weaviate finds similar objects and checks how they were labeled in the past. The kNN algorithm is especially helpful when there isn't a logical semantic relationship in the objects that need to be classified.
The vectorizer module text2vec-contextionary provides a second type of classification. Information about this classification type can be found here.
- Contextual classification: Predicts cross-references based on the context, without training data. If you don't have any training data and want to classify how similar a source item is to a potential target item, contextual classification is the right pick, especially when there is a strong semantic relation in your data (e.g., The Landmark Eiffel Tower and The City Paris).
A classification can be started using the RESTful API, via the v1/classifications endpoint with a POST request. This triggers the start of the classification, after which it runs in the background. This can also be achieved using one of the client libraries. Use the GET method to see the status of the classification:
- Python
- JavaScript/TypeScript
- Go
- Java
- Curl
import weaviate
client = weaviate.Client("http://localhost:8080")
query_result = (
client.classification.schedule()
.with_type("knn")
.with_class_name("<className>")
.with_based_on_properties(["<property3>"]) # must be specified, but for contextual classifications the current implementation takes the whole vector of the class (objects) into account
.with_classify_properties(["<property1>", "<property2>"]) # at least one property must be specified
.with_settings({"k": 3}) # additional classification settings, optional for KNN
.with_wait_for_completion()
.do()
)
print(query_result)
import weaviate from 'weaviate-ts-client';
const client = weaviate.client({
scheme: 'http',
host: 'localhost:8080',
});
const response = await client.classifications
.scheduler()
.withType('<e.g. knn>')
.withSettings(<e.g. { 'k': 3 }>) // additional classification settings, optional for KNN
.withClassName('<className>')
.withClassifyProperties(['<property1>', '<property2>']) // at least one property must be specified
.withBasedOnProperties(['<property3>']) // must be specified, but for contextual classifications the current implementation takes the whole vector of the class (objects) into account
.withWaitForCompletion()
.do();
console.log(JSON.stringify(response, null, 2));
package main
import (
"context"
"fmt"
"github.com/weaviate/weaviate-go-client/v4/weaviate"
"github.com/weaviate/weaviate-go-client/v4/weaviate/classifications"
"github.com/weaviate/weaviate/usecases/classification"
)
func main() {
cfg := weaviate.Config{
Host: "localhost:8080",
Scheme: "http",
}
client, err := weaviate.NewClient(cfg)
if err != nil {
panic(err)
}
classification, err := client.Classifications().
Scheduler().
WithType(classifications.<e.g. KNN>).
WithSettings(<&classification.ParamsKNN{K: 3}>).
WithClassName("<ClassName>").
WithClassifyProperties([]string{"<property1>", "<property2>"}). // at least one property must be specified
WithBasedOnProperties([]string{"<property3>"}). // must be specified, but for contextual classifications the current implementation takes the whole vector of the class (objects) into account
WithWaitForCompletion().
Do(context.Background())
if err != nil {
panic(err)
}
fmt.Printf("%v", classification)
}
package io.weaviate;
import io.weaviate.client.Config;
import io.weaviate.client.WeaviateClient;
import io.weaviate.client.base.Result;
import io.weaviate.client.v1.classifications.model.Classification;
import io.weaviate.client.v1.classifications.model.ClassificationType;
import io.weaviate.client.v1.classifications.model.ParamsKNN;
public class App {
public static void main(String[] args) {
Config config = new Config("http", "localhost:8080");
WeaviateClient client = new WeaviateClient(config);
Result<Classification> result = client.classifications()
.scheduler()
.withType(<e.g. ClassificationType.KNN>)
.withSettings(ParamsKNN.builder().k(3).build())
.withClassName("<ClassName>")
.withClassifyProperties(new String[]{"<property1>", "<property2>"}) // at least one property must be specified
.withBasedOnProperties(new String[]{"<property3>"}) // must be specified, but for contextual classifications the current implementation takes the whole vector of the class (objects) into account
.withWaitForCompletion()
.run();
if (result.hasErrors()) {
System.out.println(result.getError());
return;
}
System.out.println(result.getResult());
}
}
# At least one property needs to be specified for "classifyProperties"
# "basedOnProperties" must be specified, but for contextual classifications the current implementation takes the whole vector of the class (objects) into account
curl \
-X POST \
-H "Content-Type: application/json" \
-d '{
"class": "className",
"classifyProperties": ["<property1>", "<property2>"],
"basedOnProperties": ["<property3>"],
"type": "<e.g. knn>"
}' \
http://localhost:8080/v1/classifications
This returns information about the started classification, including the classification id.
Clients and async classification
Some classification jobs can take some time to complete. With the Weaviate clients, there are two ways to deal with this. Although there is no explicit async method for classification available, you can do the following:
- Wait for the classification function to finish before continuing with the rest of the script (see examples in the code block above).
  - Python: add with_wait_for_completion() to the builder pattern.
  - Go: add .WithWaitForCompletion() to the builder pattern.
  - JavaScript: add .withWaitForCompletion() to the builder pattern.
- Don't wait for the classification to finish and return directly. You can check whether the classification is completed using the classification meta endpoint with the id of the classification (which can be found in the return body of the classification start). The field status in the return body will be either running or completed. See here how to query this information.
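The second option can be sketched as a small polling loop. This is a minimal sketch, not part of any Weaviate client: the helper name wait_for_classification and its timeout and interval parameters are hypothetical, and the fetch argument is assumed to be any callable that returns the classification body for an id (for example client.classification.get from the Python examples on this page). It treats every status other than "running" as final.

```python
import time
from typing import Callable, Dict


def wait_for_classification(
    fetch: Callable[[str], Dict],
    classification_id: str,
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
) -> Dict:
    """Poll until the classification leaves the "running" state.

    `fetch` is any callable that takes a classification id and returns
    the response body as a dict (hypothetical helper, not a client API).
    """
    deadline = time.monotonic() + timeout_s
    while True:
        info = fetch(classification_id)
        # Any status other than "running" is treated as final here.
        if info.get("status") != "running":
            return info
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"classification {classification_id} still running after {timeout_s}s"
            )
        time.sleep(interval_s)
```

With the Python client, a call such as wait_for_classification(client.classification.get, "<id>") would reuse the getter shown in the status example; passing the fetch function in keeps the loop independent of the transport.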
Get status, results and metadata
The GET endpoint returns the status, results and metadata of a previously created classification:
Method
GET /v1/classifications/{id}
Parameters
The classification id should be passed to the request. This id is obtained from the result of starting the classification.
Response
It returns the following fields for all classification types:
{
"id": "string", // classification id
"class": "string", // class name of the classified data objects
"classifyProperties": [ "string" ], // list of the class properties that are (to be) classified
"basedOnProperties": [ "string" ], // list of the class properties that the classification is based on
"status": "string", // status of the classification, can be "running" or "completed"
"meta": {
"started": "timestamp",
"completed": "timestamp",
"count": int, // total number of items to classify (only if "status" is completed)
"countSucceeded": int, // total number of items that succeeded (only if "status" is completed)
"countFailed": int // total number of items that failed (only if "status" is completed, only if >0)
},
"type": "string", // the type of classification, can be "knn" or a module specific classification
"settings": {}, // additional settings specific to the classification type
"filters": { // additional filters specific to the data objects to include in the classification, Where filters according to the GraphQL Where filter design
"sourceWhere": { … },
"trainingSetWhere": { … },
"targetWhere": { … },
}
}
The following fields are returned additionally when the classification is based on kNN:
{
"settings": {
"k": int, // the number of neighbors taken in the classification
}
}
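The meta counters described above can be turned into a short summary once a classification has finished. The following is a minimal sketch under stated assumptions: summarize_classification is a hypothetical helper name, the input is the response body already parsed into a dict, and a missing countFailed is read as zero because the field is only present when it is greater than 0.

```python
from typing import Dict, Union

Summary = Dict[str, Union[str, int, float, None]]


def summarize_classification(info: Dict) -> Summary:
    """Summarize the meta counters of a classification response body."""
    status = info.get("status")
    meta = info.get("meta") or {}
    if status != "completed":
        # The counters are only reported once the status is "completed".
        return {"status": status, "count": None, "succeeded": None,
                "failed": None, "success_rate": None}
    count = meta.get("count")
    succeeded = meta.get("countSucceeded")
    failed = meta.get("countFailed", 0)  # omitted by the API when zero
    rate = succeeded / count if count and succeeded is not None else None
    return {"status": status, "count": count, "succeeded": succeeded,
            "failed": failed, "success_rate": rate}
```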
Example
A knn classification according to the example.
The following command:
- Python
- JavaScript/TypeScript
- Go
- Java
- Curl
import weaviate
client = weaviate.Client("http://localhost:8080")
classification_info = client.classification.get("ee722219-b8ec-4db1-8f8d-5150bb1a9e0c")
print(classification_info)
import weaviate from 'weaviate-ts-client';
const client = weaviate.client({
scheme: 'http',
host: 'localhost:8080',
});
const response = await client.classifications
.getter()
.withId('ee722219-b8ec-4db1-8f8d-5150bb1a9e0c')
.do();
console.log(JSON.stringify(response, null, 2));
package main
import (
"context"
"fmt"
"github.com/weaviate/weaviate-go-client/v4/weaviate"
)
func main() {
cfg := weaviate.Config{
Host: "localhost:8080",
Scheme: "http",
}
client, err := weaviate.NewClient(cfg)
if err != nil {
panic(err)
}
classification, err := client.Classifications().Getter().
WithID("ee722219-b8ec-4db1-8f8d-5150bb1a9e0c").
Do(context.Background())
if err != nil {
panic(err)
}
fmt.Printf("%v", classification)
}
package io.weaviate;
import io.weaviate.client.Config;
import io.weaviate.client.WeaviateClient;
import io.weaviate.client.base.Result;
import io.weaviate.client.v1.classifications.model.Classification;
public class App {
public static void main(String[] args) {
Config config = new Config("http", "localhost:8080");
WeaviateClient client = new WeaviateClient(config);
Result<Classification> result = client.classifications().getter()
.withID("ee722219-b8ec-4db1-8f8d-5150bb1a9e0c")
.run();
if (result.hasErrors()) {
System.out.println(result.getError());
return;
}
System.out.println(result.getResult());
}
}
curl http://localhost:8080/v1/classifications/ee722219-b8ec-4db1-8f8d-5150bb1a9e0c
returns:
{
"basedOnProperties": [
"summary"
],
"class": "Article",
"classifyProperties": [
"hasPopularity"
],
"id": "ee722219-b8ec-4db1-8f8d-5150bb1a9e0c",
"meta": {
"completed": "0001-01-01T00:00:00.000Z",
"started": "2020-09-09T14:57:08.468Z"
},
"minimumUsableWords": 3,
"status": "running",
"tfidfCutoffPercentile": 80,
"type": "knn",
"settings": {
"k": 3
}
}
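In the response above the classification is still running, and meta.completed holds the placeholder value 0001-01-01T00:00:00.000Z. The sketch below computes the run time from the two timestamps and returns None while the job is unfinished. It assumes that a completed value starting with 0001-01-01 always means "not completed yet" and that timestamps use the millisecond format shown in the example; both are inferred from this one response, and classification_duration is a hypothetical helper name.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional

# Timestamp layout as it appears in the example response (assumption).
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"
# Placeholder date seen in "completed" while the job is still running (assumption).
UNSET_PREFIX = "0001-01-01"


def classification_duration(info: Dict) -> Optional[timedelta]:
    """Return completed - started, or None if the job has not finished."""
    meta = info.get("meta") or {}
    started = meta.get("started")
    completed = meta.get("completed")
    if not started or not completed or completed.startswith(UNSET_PREFIX):
        return None
    return (datetime.strptime(completed, TIMESTAMP_FORMAT)
            - datetime.strptime(started, TIMESTAMP_FORMAT))
```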
Evaluation of single data object results
After the classification is completed, the relevant reference properties of the data objects in the Weaviate instance are updated according to the classification. These data objects are represented like any other data objects. The results of a classification can be requested for individual data objects through the v1/objects/{id}/?include=classification RESTful endpoint or with the GraphQL _additional {classification} field.
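Both ways of requesting per-object results can be sketched as plain string builders. The REST path and the _additional {classification} field come from the paragraph above; the helper names are hypothetical, and the sub-fields requested inside classification (id, completed, classifiedFields, scope, basedOn) are an assumption that should be checked against the GraphQL schema of your Weaviate version.

```python
from typing import Sequence
from urllib.parse import quote


def object_classification_url(base_url: str, object_id: str) -> str:
    """Build the REST URL that includes classification info for one object."""
    return (f"{base_url.rstrip('/')}/v1/objects/"
            f"{quote(object_id, safe='')}/?include=classification")


def classification_graphql_query(class_name: str, properties: Sequence[str]) -> str:
    """Build a GraphQL Get query that also asks for classification metadata.

    The sub-fields of `classification` are assumptions, not taken from this page.
    """
    props = " ".join(properties)
    return (
        "{ Get { %s { %s _additional { classification "
        "{ id completed classifiedFields scope basedOn } } } } }"
        % (class_name, props)
    )
```

For example, object_classification_url("http://localhost:8080", "<object-id>") yields the URL for a single object, while classification_graphql_query("Article", ["summary"]) produces a query string that can be posted to the GraphQL endpoint.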
KNN classification
With k-nearest neighbor classification, Weaviate finds similar objects and checks how they were labeled in the past. The more objects added and correctly labeled over time, the better a future classification becomes. Especially when there isn't a logical semantic relationship in the objects that need to be classified, the kNN algorithm is helpful.
Example use cases
Email spam classification
Imagine you have a data set of emails. Some of those emails are useful, others are spam. The decision whether an email is spam follows a set of business rules which you may not know about. For example, if an email mentions certain words, such as brand names for a specific medication, it is more likely to be spam. You can let Weaviate learn from the training data you provide. Next to the "Email" class (source), you also introduce an "Importance" class, to which you add three data objects: "Spam", "Neutral" and "Important".
With kNN, Weaviate never compares source objects to target objects. Instead, it compares source objects to similar source objects and "inherits" their labeling. In turn, classification quality improves the more (correctly) labeled data you add. For example, if Weaviate finds an email object with the text "Buy the best stamina drugs for cheap prices at very-questionable-shop.com", it scans the training data set for a close match. Imagine it finds the email with "Buy cheap pills online" and similar emails. Because these pre-labeled objects were marked as spam, Weaviate labels the unseen data object as spam as well. The same happens for "neutral" and "important" emails respectively.
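The idea of inheriting labels from similar, already labeled source objects can be illustrated with a toy k-nearest-neighbors vote in plain Python. This is only a conceptual sketch: the two-dimensional vectors are made up, cosine distance and a simple majority vote are assumptions chosen for illustration, and none of this is Weaviate's actual implementation.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def knn_label(query: Sequence[float],
              labeled: List[Tuple[Sequence[float], str]],
              k: int = 3) -> str:
    """Return the majority label among the k labeled vectors closest to the query."""
    nearest = sorted(labeled, key=lambda item: cosine_distance(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up training vectors: each email vector carries its known label.
TRAINING = [
    ([0.90, 0.10], "Spam"),
    ([0.80, 0.20], "Spam"),
    ([0.85, 0.05], "Spam"),
    ([0.10, 0.90], "Important"),
    ([0.20, 0.80], "Important"),
    ([0.50, 0.50], "Neutral"),
]
```

Calling knn_label([0.95, 0.10], TRAINING) selects the three spam-labeled neighbors and returns "Spam". The query is only ever compared to other emails, never to the label objects themselves.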
Article popularity prediction
Imagine you have a property for the popularity of the Article by the audience, and you would like to predict the popularity for new articles based on known properties. You can use kNN classification to predict the popularity of new articles from the popularity of previous articles.
Requirements
- A schema with at least two classes and a cross-reference between both classes.
- Some training data: data objects in the class for which the reference to another class (the reference you want to predict for other objects) has already been made.
Endpoint and parameters
A classification can be started via the v1/classifications endpoint, which can also be accessed via the client libraries. The following fields must (required) or can (optional) be specified along with the POST request:
Required:
- type: "knn": the type of the classification, which is "knn" here.
- class: the class name of the data objects to be classified.
- classifyProperties: a list of properties whose values are to be classified. The individual properties of the class should be reference properties to other classes, each of which should refer to only one class. This is defined by the dataType in the schema, which should therefore be an array consisting of exactly one class name.
- basedOnProperties: one or more of the other properties of the class (NOTE: Weaviate currently supports only one property, so make sure to pass a list containing a single property name). This field must be specified, but the current implementation takes the whole vector of the class (objects) into account.
Optional, with default values:
- settings {k: 3}. The number of neighbors to base the classification on.
- Parameters to add limitations (based on e.g. background business knowledge). filters: {} with the following possible properties:
  - sourceWhere: {}. Parameter to determine which data objects to classify (i.e. you can use this if you want to leave out some data objects to classify them later based on background knowledge). It accepts a where filter body.
  - targetWhere: {}. Parameter to limit possible targets (i.e. when you want to make sure no data objects will be classified as such). It accepts a where filter body.
  - trainingSetWhere: {}. Parameter to limit possible data objects in the training set. It accepts a where filter body.
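The required and optional fields listed above can be assembled into a request body with a small helper. This is a minimal sketch: build_knn_request is a hypothetical function, not part of a Weaviate client, and it only encodes the rules stated in the list (at least one classify property, a single based-on property, k defaulting to 3, filters included only when given).

```python
from typing import Dict, Optional, Sequence


def build_knn_request(
    class_name: str,
    classify_properties: Sequence[str],
    based_on_property: str,
    k: int = 3,
    source_where: Optional[Dict] = None,
    target_where: Optional[Dict] = None,
    training_set_where: Optional[Dict] = None,
) -> Dict:
    """Assemble the JSON body for a POST to v1/classifications (type knn)."""
    if not classify_properties:
        raise ValueError("at least one property is required in classifyProperties")
    body: Dict = {
        "type": "knn",
        "class": class_name,
        "classifyProperties": list(classify_properties),
        # Only one property is supported, so a single-element list is sent.
        "basedOnProperties": [based_on_property],
        "settings": {"k": k},
    }
    filters = {
        name: where
        for name, where in (
            ("sourceWhere", source_where),
            ("targetWhere", target_where),
            ("trainingSetWhere", training_set_where),
        )
        if where is not None
    }
    if filters:
        body["filters"] = filters
    return body
```

Serializing the result with json.dumps gives a payload that can be sent with any HTTP client, or compared against the body used in a curl request.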
Start a kNN classification
A classification can be started through one of the clients, or with a direct curl request to the RESTful API.
- Python
- JavaScript/TypeScript
- Go
- Java
- Curl
import weaviate
client = weaviate.Client("http://localhost:8080")
trainingSetWhere = {
"path": ["wordCount"],
"operator": "GreaterThan",
"valueInt": 100
}
client.classification.schedule()\
.with_type("knn")\
.with_class_name("Article")\
.with_based_on_properties(["summary"])\
.with_classify_properties(["hasPopularity"])\
.with_training_set_where_filter(trainingSetWhere)\
.with_settings({"k":3})\
.do()
import weaviate from 'weaviate-ts-client';
const client = weaviate.client({
scheme: 'http',
host: 'localhost:8080',
});
// the following triggers a classification without waiting
let response = await client.classifications
.scheduler()
.withType('knn')
.withSettings({ k: 3 })
.withClassName('Article')
.withClassifyProperties(['hasPopularity'])
.withBasedOnProperties(['summary'])
.do();
console.log(JSON.stringify(response, null, 2));
// the following triggers a classification with waiting for completion
response = await client.classifications
.scheduler()
.withType('knn')
.withSettings({ k: 3 })
.withClassName('Article')
.withClassifyProperties(['hasPopularity'])
.withBasedOnProperties(['summary'])
.withWaitForCompletion()
.withWaitTimeout(60 * 1000)
.do();
console.log(JSON.stringify(response, null, 2));
package main
import (
"context"
"fmt"
"github.com/weaviate/weaviate-go-client/v4/weaviate"
"github.com/weaviate/weaviate-go-client/v4/weaviate/classifications"
"github.com/weaviate/weaviate-go-client/v4/weaviate/filters"
"github.com/weaviate/weaviate/usecases/classification"
)
func main() {
cfg := weaviate.Config{
Host: "localhost:8080",
Scheme: "http",
}
client, err := weaviate.NewClient(cfg)
if err != nil {
panic(err)
}
// helper that returns a pointer to an int32 value, used for the K setting
valueInt32 := func(in int32) *int32 {
return &in
}
trainingSetWhere := (&filters.WhereBuilder{}).
WithOperator(filters.GreaterThan).
WithPath([]string{"wordCount"}).
WithValueInt(100)
classification, err := client.Classifications().Scheduler().
WithType(classifications.KNN).
WithSettings(&classification.ParamsKNN{K: valueInt32(3)}).
WithClassName("Article").
WithClassifyProperties([]string{"hasPopularity"}).
WithBasedOnProperties([]string{"summary"}).
WithTrainingSetWhereFilter(trainingSetWhere).
Do(context.Background())
if err != nil {
panic(err)
}
fmt.Printf("%v", classification)
}
package io.weaviate;
import io.weaviate.client.Config;
import io.weaviate.client.WeaviateClient;
import io.weaviate.client.base.Result;
import io.weaviate.client.v1.classifications.model.Classification;
import io.weaviate.client.v1.classifications.model.ClassificationType;
import io.weaviate.client.v1.classifications.model.ParamsKNN;
import io.weaviate.client.v1.filters.Operator;
import io.weaviate.client.v1.filters.WhereFilter;
public class App {
public static void main(String[] args) {
Config config = new Config("http", "localhost:8080");
WeaviateClient client = new WeaviateClient(config);
WhereFilter trainingSetWhere = WhereFilter.builder()
.valueInt(100)
.operator(Operator.GreaterThan)
.path(new String[]{ "wordCount" })
.build();
Result<Classification> result = client.classifications().scheduler()
.withType(ClassificationType.KNN)
.withSettings(ParamsKNN.builder().k(3).build())
.withClassName("Article")
.withClassifyProperties(new String[]{ "hasPopularity" })
.withBasedOnProperties(new String[]{ "summary" })
.withTrainingSetWhereFilter(trainingSetWhere)
.run();
if (result.hasErrors()) {
System.out.println(result.getError());
return;
}
System.out.println(result.getResult());
}
}
curl \
-X POST \
-H "Content-Type: application/json" \
-d '{
"class": "Article",
"type": "knn",
"settings": {
"k": 3
},
"basedOnProperties": [
"summary"
],
"classifyProperties": [
"hasPopularity"
],
"filters": {
"trainingSetWhere": {"path": ["wordCount"], "operator": "GreaterThan", "valueInt": 100}
}
}' \
http://localhost:8080/v1/classifications
A classification is started and runs in the background. The following response is returned after starting the classification, and the status can be fetched via the v1/classifications/{id} endpoint.
{
"basedOnProperties": [
"summary"
],
"class": "Article",
"classifyProperties": [
"hasPopularity"
],
"id": "ee722219-b8ec-4db1-8f8d-5150bb1a9e0c",
"meta": {
"completed": "0001-01-01T00:00:00.000Z",
"started": "2020-09-09T14:57:08.468Z"
},
"minimumUsableWords": 3,
"status": "running",
"tfidfCutoffPercentile": 80,
"type": "knn",
"settings": {
"k": 3
}
}
Evaluation of single data object results
After the classification is completed, the relevant reference properties of the data objects in the Weaviate instance are updated according to the classification. These data objects are represented like any other data objects. The results of a classification can be requested for individual data objects through the v1/objects/{id}/?include=classification RESTful endpoint or with the GraphQL _additional {classification} field.
Zero-Shot Classification
Zero-shot classification is an unsupervised classification method, meaning you don't need any training data. Zero-shot allows you to classify data that was not seen when the classifier was built. This type of classification is a good fit if you want to label data objects with classes, but you don't have or don't want to use training data. It picks the label objects that have the lowest distance to the source objects. The link is made using cross-references, similar to existing classifications in Weaviate.
Weaviate's zero-shot classification measures how similar (how close) a data item is to a potential target item (a class or label). More specifically, Weaviate uses vector search and similarity algorithms to classify data objects with other data objects. Internally, Weaviate performs a nearVector search (which you can also perform manually with GraphQL), and takes the closest result out of a given set of options (data objects) to classify.
Zero-shot classification works with all (text/image/..) vectorizers (or no vectorizer, as long as you have vectors stored in Weaviate).
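Picking the label object with the lowest distance to a source object can be illustrated in a few lines of plain Python. This is a conceptual sketch only: the vectors and label names are made up, cosine distance is an assumption chosen for illustration, and Weaviate performs the equivalent step internally with a vector search rather than with this code.

```python
import math
from typing import Dict, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def zero_shot_label(source: Sequence[float],
                    targets: Dict[str, Sequence[float]]) -> str:
    """Return the name of the target whose vector is closest to the source."""
    return min(targets, key=lambda name: cosine_distance(source, targets[name]))


# Made-up vectors for candidate label objects (no training data involved).
CATEGORIES = {
    "Sports": [0.9, 0.1, 0.0],
    "Finance": [0.1, 0.9, 0.1],
    "Government": [0.0, 0.2, 0.9],
}
```

Here zero_shot_label([0.8, 0.2, 0.1], CATEGORIES) returns "Sports". Removing an entry from CATEGORIES before the call has the same effect as excluding a target from the set of options.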
Endpoint and parameters
A classification can be started via the v1/classifications endpoint, which can also be accessed via the client libraries. The following fields must (required) or can (optional) be specified along with the POST request:
Required:
- type: "zeroshot": the type of the classification, which is zeroshot here.
- class: the class name of the data objects to be classified.
- classifyProperties: a list of properties whose values are to be classified. The individual properties of the class should be reference properties to other classes, each of which should refer to only one class. This is defined by the dataType in the schema, which should therefore be an array consisting of exactly one class name.
Optional, with default values:
- Parameters to add limitations (based on e.g. background business knowledge). filters: {} with the following possible properties:
  - sourceWhere: {}. Parameter to determine which data objects to classify (i.e. you can use this if you want to leave out some data objects to classify them later based on background knowledge). It accepts a where filter body.
  - targetWhere: {}. Parameter to limit possible targets (i.e. when you want to make sure no data objects will be classified as such). It accepts a where filter body.
  - trainingSetWhere: {}. Parameter to limit possible data objects in the training set. It accepts a where filter body.
Start a zeroshot classification
A classification can be started through one of the clients, or with a direct curl request to the RESTful API.
- Python
- JavaScript/TypeScript
- Go
- Java
- Curl
import weaviate
client = weaviate.Client("http://localhost:8080")
source_where = {
"path": ["wordCount"],
"operator": "GreaterThan",
"valueInt": 100
}
target_where = {
"path": ["name"],
"operator": "NotEqual",
"valueText": "Government"
}
# parentheses allow the chained calls to carry inline comments
(
client.classification.schedule()
.with_type("zeroshot")
.with_class_name("Article")
.with_classify_properties(["ofCategory"])
.with_source_where_filter(source_where) # optional
.with_target_where_filter(target_where) # optional
.do()
)
import weaviate from 'weaviate-ts-client';
const client = weaviate.client({
scheme: 'http',
host: 'localhost:8080',
});
// the following triggers a classification without waiting
let response = await client.classifications
.scheduler()
.withType('zeroshot')
.withClassName('Article')
.withClassifyProperties(['ofCategory'])
.do();
console.log(JSON.stringify(response, null, 2));
// the following triggers a classification with waiting for completion
response = await client.classifications
.scheduler()
.withType('zeroshot')
.withClassName('Article')
.withClassifyProperties(['ofCategory'])
.withWaitForCompletion()
.withWaitTimeout(60 * 1000)
.do();
console.log(JSON.stringify(response, null, 2));
package main
import (
"context"
"fmt"
"github.com/weaviate/weaviate-go-client/v4/weaviate"
"github.com/weaviate/weaviate-go-client/v4/weaviate/classifications"
"github.com/weaviate/weaviate-go-client/v4/weaviate/filters"
)
func main() {
cfg := weaviate.Config{
Host: "localhost:8080",
Scheme: "http",
}
client, err := weaviate.NewClient(cfg)
if err != nil {
panic(err)
}
sourceWhere := filters.Where().
WithOperator(filters.GreaterThan).
WithPath([]string{"wordCount"}).
WithValueInt(100)
targetWhere := filters.Where().
WithOperator(filters.NotEqual).
WithPath([]string{"name"}).
WithValueString("Government")
classification, err := client.Classifications().Scheduler().
WithType(classifications.ZeroShot).
WithClassName("Article").
WithClassifyProperties([]string{"ofCategory"}).
WithSourceWhereFilter(sourceWhere).
WithTargetWhereFilter(targetWhere).
Do(context.Background())
if err != nil {
panic(err)
}
fmt.Printf("%v", classification)
}
package io.weaviate;
import io.weaviate.client.Config;
import io.weaviate.client.WeaviateClient;
import io.weaviate.client.base.Result;
import io.weaviate.client.v1.classifications.model.Classification;
import io.weaviate.client.v1.classifications.model.ClassificationType;
import io.weaviate.client.v1.filters.Operator;
import io.weaviate.client.v1.filters.WhereFilter;
public class App {
public static void main(String[] args) {
Config config = new Config("http", "localhost:8080");
WeaviateClient client = new WeaviateClient(config);
WhereFilter sourceWhere = WhereFilter.builder()
.valueInt(100)
.operator(Operator.GreaterThan)
.path(new String[]{ "wordCount" })
.build();
WhereFilter targetWhere = WhereFilter.builder()
.valueText("Government")
.operator("NotEqual")
.path(new String[]{ "name" })
.build();
Result<Classification> result = client.classifications().scheduler()
.withType(ClassificationType.ZeroShot)
.withClassName("Article")
.withClassifyProperties(new String[]{ "ofCategory" })
.withSourceWhereFilter(sourceWhere)
.withTargetWhereFilter(targetWhere)
.run();
if (result.hasErrors()) {
System.out.println(result.getError());
return;
}
System.out.println(result.getResult());
}
}
curl \
-X POST \
-H "Content-Type: application/json" \
-d '{
"class": "Article",
"type": "zeroshot",
"classifyProperties": [
"ofCategory"
],
"filters": {
"sourceWhere": {"path": ["wordCount"], "operator": "GreaterThan", "valueInt": 100},
"targetWhere": {"path": ["name"], "operator": "NotEqual", "valueText": "Government"}
}
}' \
http://localhost:8080/v1/classifications
A classification is started and runs in the background. The following response is returned after starting the classification, and the status can be fetched via the v1/classifications/{id} endpoint.
{
"class": "Article",
"classifyProperties": [
"ofCategory"
],
"id": "973e3b4c-4c1d-4b51-87d8-4d0343beb7ad",
"meta": {
"completed": "0001-01-01T00:00:00.000Z",
"started": "2020-09-09T14:57:08.468Z"
},
"status": "running",
"type": "zeroshot"
}
Evaluation of single data object results
After the classification is completed, the relevant reference properties of the data objects in the Weaviate instance are updated according to the classification. These data objects are represented like any other data objects. The results of a classification can be requested for individual data objects through the v1/objects/{id}/?include=classification RESTful endpoint or with the GraphQL _additional {classification} field.