REST - /v1/classifications
Start a classification
Weaviate's classification feature allows you to classify data objects based on their vectors.
- kNN classification: Uses the k-nearest neighbors algorithm to predict a property.
- This requires existing training data.
- The target property should be a reference property to another class.
Note that knn classification requires the target property to be included in the schema before importing the data. If you want to add a property to the schema after importing data, you will need to re-import the data.
If text2vec-contextionary is enabled, contextual classification can also be used.
- Contextual classification: Predicts cross-references based on the context.
- This does not require training data.
- This works best when there is a strong semantic relation in your data (e.g., The Landmark Eiffel Tower and The City Paris).
Classification can be triggered to run in the background with a POST request, and the GET method can be used to view its status.
- Python (v4)
- Python (v3)
- JavaScript/TypeScript
- Go
- Java
- Curl
The Python client v4
API does not yet support classification tasks. Please use the client's v3
API for this.
import weaviate
client = weaviate.Client("http://localhost:8080")
query_result = (
client.classification.schedule()
.with_type("knn")
.with_class_name("<className>")
.with_based_on_properties(["<property3>"]) # must be specified, but for contextual classifications the current implementation takes the whole vector of the class (objects) into account
.with_classify_properties(["<property1>", "<property2>"]) # at least one property must be specified
.with_settings({"k": 3}) # additional classification settings, optional for KNN
.with_wait_for_completion()
.do()
)
print(query_result)
import weaviate from 'weaviate-ts-client';
const client = weaviate.client({
scheme: 'http',
host: 'localhost:8080',
});
const response = await client.classifications
.scheduler()
.withType('<e.g. knn>')
.withSettings(<e.g. { 'k': 3 }>) // additional classification settings, optional for KNN
.withClassName('<className>')
.withClassifyProperties(['<property1>', '<property2>']) // at least one property must be specified
.withBasedOnProperties(['<property3>']) // must be specified, but for contextual classifications the current implementation takes the whole vector of the class (objects) into account
.withWaitForCompletion()
.do();
console.log(JSON.stringify(response, null, 2));
package main
import (
"context"
"fmt"
"github.com/weaviate/weaviate-go-client/v4/weaviate"
"github.com/weaviate/weaviate-go-client/v4/weaviate/classifications"
"github.com/weaviate/weaviate/usecases/classification"
)
func main() {
cfg := weaviate.Config{
Host: "localhost:8080",
Scheme: "http",
}
client, err := weaviate.NewClient(cfg)
if err != nil {
panic(err)
}
classification, err := client.Classifications().
Scheduler().
WithType(classifications.<e.g. KNN>).
WithSettings(<&classification.ParamsKNN{K: 3}>).
WithClassName("<ClassName>").
WithClassifyProperties([]string{"<property1>", "<property2>"}). // at least one property must be specified
WithBasedOnProperties([]string{"<property3>"}). // must be specified, but for contextual classifications the current implementation takes the whole vector of the class (objects) into account
WithWaitForCompletion().
Do(context.Background())
if err != nil {
panic(err)
}
fmt.Printf("%v", classification)
}
package io.weaviate;
import io.weaviate.client.Config;
import io.weaviate.client.WeaviateClient;
import io.weaviate.client.base.Result;
import io.weaviate.client.v1.classifications.model.Classification;
import io.weaviate.client.v1.classifications.model.ClassificationType;
import io.weaviate.client.v1.classifications.model.ParamsKNN;
public class App {
public static void main(String[] args) {
Config config = new Config("http", "localhost:8080");
WeaviateClient client = new WeaviateClient(config);
Result<Classification> result = client.classifications()
.scheduler()
.withType(<e.g. ClassificationType.KNN>)
.withSettings(ParamsKNN.builder().k(3).build())
.withClassName("<ClassName>")
.withClassifyProperties(new String[]{"<property1>", "<property2>"}) // at least one property must be specified
.withBasedOnProperties(new String[]{"<property3>"}) // must be specified, but for contextual classifications the current implementation takes the whole vector of the class (objects) into account
.withWaitForCompletion()
.run();
if (result.hasErrors()) {
System.out.println(result.getError());
return;
}
System.out.println(result.getResult());
}
}
# At least one property needs to be specified for "classifyProperties"
# "basedOnProperties" must be specified, but for contextual classifications the current implementation takes the whole vector of the class (objects) into account
curl \
-X POST \
-H "Content-Type: application/json" \
-d '{
"class": "<className>",
"classifyProperties": ["<property1>", "<property2>"],
"basedOnProperties": ["<property3>"],
"type": "<e.g. knn>"
}' \
http://localhost:8080/v1/classifications
The endpoint will return information about the started classification. The response will include the classification id.
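As a rough illustration of the request body used above, the following Python sketch assembles the JSON payload for POST /v1/classifications with the standard library only. The helper name build_classification_body is hypothetical and not part of any Weaviate client; the field names mirror the curl example, and the two checks reflect the comments in that example.

```python
import json


def build_classification_body(class_name, classify_properties,
                              based_on_properties, classification_type,
                              settings=None):
    """Assemble a request body for POST /v1/classifications (hypothetical helper)."""
    if not classify_properties:
        raise ValueError("classifyProperties needs at least one property")
    if not based_on_properties:
        raise ValueError("basedOnProperties must be specified")
    body = {
        "class": class_name,
        "classifyProperties": list(classify_properties),
        "basedOnProperties": list(based_on_properties),
        "type": classification_type,
    }
    # Settings are optional, so the key is only added when settings are given.
    if settings:
        body["settings"] = dict(settings)
    return body


body = build_classification_body(
    "Article", ["hasPopularity"], ["summary"], "knn", settings={"k": 3}
)
print(json.dumps(body, indent=2))
```

The printed JSON can be passed as the -d argument of the curl command, or sent with any HTTP client.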
Clients and async classification
Classification jobs can take some time to complete. With the Weaviate clients, you can:
- Wait for the classification function to finish before continuing with the rest of the script.
  - Python: add with_wait_for_completion() to the builder pattern.
  - Go: add .WithWaitForCompletion() to the builder pattern.
  - JavaScript: add .withWaitForCompletion() to the builder pattern.
- Don't wait for the classification to be finished and return directly.
  - You can check the classification task status using the classification endpoint with the task id. The field status in the return body will either be running or completed. See the Classification status, results and metadata section for how to query this information.
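When not using a client's built-in waiting, the status check described above can be wrapped in a small polling loop. The sketch below is one possible shape, not a client API: wait_for_classification and fetch_status are hypothetical names, and fetch_status stands for any function that returns the parsed body of GET /v1/classifications/{id}. A fake fetcher is used here so the example runs without a Weaviate instance.

```python
import time


def wait_for_classification(fetch_status, classification_id,
                            timeout_s=60.0, interval_s=1.0):
    """Poll until the classification reports "completed" or the timeout expires.

    fetch_status is a caller-supplied function (an assumption of this sketch)
    that takes the classification id and returns the response body as a dict.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        body = fetch_status(classification_id)
        if body.get("status") == "completed":
            return body
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"classification {classification_id} still {body.get('status')!r}"
            )
        time.sleep(interval_s)


# Stand-in for a real HTTP call: reports "running" twice, then "completed".
_responses = iter(["running", "running", "completed"])


def fake_fetch(classification_id):
    return {"id": classification_id, "status": next(_responses)}


result = wait_for_classification(
    fake_fetch, "ee722219-b8ec-4db1-8f8d-5150bb1a9e0c",
    timeout_s=5.0, interval_s=0.01,
)
print(result["status"])
```

In real use, fetch_status would issue the GET request; the timeout keeps a script from waiting indefinitely on a long-running job.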
Classification status, results and metadata
A GET
request can return the status, results and metadata of a previously created classification:
Method
GET /v1/classifications/{id}
Parameters
The classification id
should be passed to the request. This id
is obtained from the result of starting the classification.
Response
It returns the following fields for all classification types:
{
"id": "string", // classification id
"class": "string", // class name of the classified data objects
"classifyProperties": [ "string" ], // list of the class properties that are (to be) classified
"basedOnProperties": [ "string" ], // list of the class properties that the classification is based on
"status": "string", // status of the classification, can be "running" or "completed"
"meta": {
"started": "timestamp",
"completed": "timestamp",
"count": int, // total number of items to classify (only if "status" is completed)
"countSucceeded": int, // total number of items that succeeded (only if "status" is completed)
"countFailed": int // total number of items that failed (only if "status" is completed, only if >0)
},
"type": "string", // the type of classification, can be "knn" or a module specific classification
"settings": {}, // additional settings specific to the classification type
"filters": { // additional filters specific to the data objects to include in the classification, Where filters according to the GraphQL Where filter design
"sourceWhere": { … },
"trainingSetWhere": { … },
"targetWhere": { … },
}
}
The following fields are additionally returned when the classification is based on kNN:
{
"settings": {
"k": int, // the number of neighbors taken in the classification
}
}
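To show how the status and meta fields might be consumed, here is a small Python sketch that turns a response body into a one-line summary. The function name summarize_classification is hypothetical. It relies only on what the field comments state: the counters are present once the status is completed, and countFailed is present only when it is greater than zero. The counts in the second sample are made up for illustration.

```python
def summarize_classification(body):
    """Return a one-line summary of a classification status response."""
    status = body.get("status", "unknown")
    meta = body.get("meta", {})
    if status != "completed":
        # Counters are only reported once the classification has completed.
        return f"{body.get('id')}: {status} since {meta.get('started')}"
    count = meta.get("count", 0)
    succeeded = meta.get("countSucceeded", 0)
    failed = meta.get("countFailed", 0)  # absent when nothing failed
    return f"{body.get('id')}: completed, {succeeded}/{count} succeeded, {failed} failed"


running = {
    "id": "ee722219-b8ec-4db1-8f8d-5150bb1a9e0c",
    "status": "running",
    "meta": {"started": "2020-09-09T14:57:08.468Z"},
}
# Illustrative values only: these counts are invented for the sketch.
finished = {
    "id": "ee722219-b8ec-4db1-8f8d-5150bb1a9e0c",
    "status": "completed",
    "meta": {"count": 10, "countSucceeded": 10},
}
print(summarize_classification(running))
print(summarize_classification(finished))
```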
Example
A knn classification according to the example in the Start a kNN classification section.
The following command:
- Python (v4)
- Python (v3)
- JavaScript/TypeScript
- Go
- Java
- Curl
The Python client v4
API does not yet support classification tasks. Please use the client's v3
API for this.
import weaviate
client = weaviate.Client("http://localhost:8080")
classification_info = client.classification.get("ee722219-b8ec-4db1-8f8d-5150bb1a9e0c")
print(classification_info)
import weaviate from 'weaviate-ts-client';
const client = weaviate.client({
scheme: 'http',
host: 'localhost:8080',
});
const response = await client.classifications
.getter()
.withId('ee722219-b8ec-4db1-8f8d-5150bb1a9e0c')
.do();
console.log(JSON.stringify(response, null, 2));
package main
import (
"context"
"fmt"
"github.com/weaviate/weaviate-go-client/v4/weaviate"
)
func main() {
cfg := weaviate.Config{
Host: "localhost:8080",
Scheme: "http",
}
client, err := weaviate.NewClient(cfg)
if err != nil {
panic(err)
}
classification, err := client.Classifications().Getter().
WithID("ee722219-b8ec-4db1-8f8d-5150bb1a9e0c").
Do(context.Background())
if err != nil {
panic(err)
}
fmt.Printf("%v", classification)
}
package io.weaviate;
import io.weaviate.client.Config;
import io.weaviate.client.WeaviateClient;
import io.weaviate.client.base.Result;
import io.weaviate.client.v1.classifications.model.Classification;
public class App {
public static void main(String[] args) {
Config config = new Config("http", "localhost:8080");
WeaviateClient client = new WeaviateClient(config);
Result<Classification> result = client.classifications().getter()
.withID("ee722219-b8ec-4db1-8f8d-5150bb1a9e0c")
.run();
if (result.hasErrors()) {
System.out.println(result.getError());
return;
}
System.out.println(result.getResult());
}
}
curl http://localhost:8080/v1/classifications/ee722219-b8ec-4db1-8f8d-5150bb1a9e0c
returns:
{
"basedOnProperties": [
"summary"
],
"class": "Article",
"classifyProperties": [
"hasPopularity"
],
"id": "ee722219-b8ec-4db1-8f8d-5150bb1a9e0c",
"meta": {
"completed": "0001-01-01T00:00:00.000Z",
"started": "2020-09-09T14:57:08.468Z"
},
"minimumUsableWords": 3,
"status": "running",
"tfidfCutoffPercentile": 80,
"type": "knn",
"settings": {
"k": 3
}
}
KNN classification
Weaviate performs the knn classification based on vector similarity between data objects.
Due to the nature of the k-nearest neighbor algorithm, the quality of the classification will be a function of the quantity and quality of the pre-existing data.
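The general idea of k-nearest neighbors can be sketched in a few lines of plain Python. This is a conceptual illustration only, not Weaviate's implementation: the use of cosine distance, the majority vote, and the toy vectors and labels are all assumptions made for the example.

```python
import math
from collections import Counter


def cosine_distance(a, b):
    """1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def knn_classify(query_vector, labelled, k=3):
    """Return the majority label among the k nearest labelled vectors.

    labelled is a list of (vector, label) pairs standing in for objects that
    already have the target property set (the training data).
    """
    nearest = sorted(
        labelled, key=lambda item: cosine_distance(query_vector, item[0])
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy training data, invented for this sketch.
training = [
    ([1.0, 0.1], "high"),
    ([0.9, 0.2], "high"),
    ([0.8, 0.3], "high"),
    ([0.1, 1.0], "low"),
    ([0.2, 0.9], "low"),
]
print(knn_classify([0.95, 0.15], training, k=3))  # prints "high"
```

With few or poorly labelled training vectors, the k nearest neighbors are less likely to be representative, which is why the result depends on the quantity and quality of the existing data.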
Requirements
- A schema with a class to be classified, and a class to store the classification.
- A cross-reference from the class to be classified to the class to store the classification.
- Training data within the class to store the classification.
- Vectors for the class to be classified.
Parameters
Required:
- type: "knn": the type of the classification, which is "knn" here.
- class: the class name of the data objects to be classified.
- classifyProperties: an array containing the name of the target cross-reference property of the class to be classified.
- basedOnProperties: an array containing a property name.
basedOnProperties limitations
The current knn implementation uses the object vector, but requires basedOnProperties to be an array with one valid text property name.
Optional, with default values:
- settings {k: 3}: the number of neighbors to base the classification on.
- Parameters to add limitations (based on e.g. background business knowledge).
  - filters: {} with the following possible properties:
    - sourceWhere: {}: parameter to determine which data objects to classify (i.e. to leave out some data objects).
    - targetWhere: {}: parameter to limit possible targets (i.e. to exclude possible target values).
    - trainingSetWhere: {}: parameter to limit possible data objects in the training set.
    - All of the sourceWhere, targetWhere and trainingSetWhere filters accept a where filter body.
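As a minimal sketch of how the optional filters fit into a request body, the Python below collects where filter bodies under the three keys listed above and rejects anything else. The helper name build_filters is hypothetical; the filter values and the surrounding body reuse the Article example from this page.

```python
ALLOWED_FILTERS = ("sourceWhere", "targetWhere", "trainingSetWhere")


def build_filters(**where_filters):
    """Collect where filter bodies under the keys the filters object accepts."""
    unknown = set(where_filters) - set(ALLOWED_FILTERS)
    if unknown:
        raise ValueError(f"unsupported filter keys: {sorted(unknown)}")
    # Drop filters that were passed as None or as an empty dict.
    return {key: value for key, value in where_filters.items() if value}


filters = build_filters(
    trainingSetWhere={
        "path": ["wordCount"],
        "operator": "GreaterThan",
        "valueInt": 100,
    },
)
request_body = {
    "class": "Article",
    "type": "knn",
    "settings": {"k": 3},
    "basedOnProperties": ["summary"],
    "classifyProperties": ["hasPopularity"],
    "filters": filters,
}
print(request_body["filters"])
```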
Start a kNN classification
A classification can be started through one of the clients, or with a direct curl
request to the RESTful API.
- Python (v4)
- Python (v3)
- JavaScript/TypeScript
- Go
- Java
- Curl
The Python client v4
API does not yet support classification tasks. Please use the client's v3
API for this.
import weaviate
client = weaviate.Client("http://localhost:8080")
trainingSetWhere = {
"path": ["wordCount"],
"operator": "GreaterThan",
"valueInt": 100
}
client.classification.schedule()\
.with_type("knn")\
.with_class_name("Article")\
.with_based_on_properties(["summary"])\
.with_classify_properties(["hasPopularity"])\
.with_training_set_where_filter(trainingSetWhere)\
.with_settings({"k":3})\
.do()
import weaviate from 'weaviate-ts-client';
const client = weaviate.client({
scheme: 'http',
host: 'localhost:8080',
});
// the following triggers a classification without waiting
let response = await client.classifications
.scheduler()
.withType('knn')
.withSettings({ k: 3 })
.withClassName('Article')
.withClassifyProperties(['hasPopularity'])
.withBasedOnProperties(['summary'])
.do();
console.log(JSON.stringify(response, null, 2));
// the following triggers a classification with waiting for completion
response = await client.classifications
.scheduler()
.withType('knn')
.withSettings({ k: 3 })
.withClassName('Article')
.withClassifyProperties(['hasPopularity'])
.withBasedOnProperties(['summary'])
.withWaitForCompletion()
.withWaitTimeout(60 * 1000)
.do();
console.log(JSON.stringify(response, null, 2));
package main
import (
"context"
"fmt"
"github.com/weaviate/weaviate-go-client/v4/weaviate"
"github.com/weaviate/weaviate-go-client/v4/weaviate/classifications"
"github.com/weaviate/weaviate-go-client/v4/weaviate/filters"
"github.com/weaviate/weaviate/usecases/classification"
)
func main() {
cfg := weaviate.Config{
Host: "localhost:8080",
Scheme: "http",
}
client, err := weaviate.NewClient(cfg)
if err != nil {
panic(err)
}
// helper returning a pointer to an int32, as used for the K setting below
valueInt32 := func(in int32) *int32 {
return &in
}
trainingSetWhere := (&filters.WhereBuilder{}).
WithOperator(filters.GreaterThan).
WithPath([]string{"wordCount"}).
WithValueInt(100)
classification, err := client.Classifications().Scheduler().
WithType(classifications.KNN).
WithSettings(&classification.ParamsKNN{K: valueInt32(3)}).
WithClassName("Article").
WithClassifyProperties([]string{"hasPopularity"}).
WithBasedOnProperties([]string{"summary"}).
WithTrainingSetWhereFilter(trainingSetWhere).
Do(context.Background())
if err != nil {
panic(err)
}
fmt.Printf("%v", classification)
}
package io.weaviate;
import io.weaviate.client.Config;
import io.weaviate.client.WeaviateClient;
import io.weaviate.client.base.Result;
import io.weaviate.client.v1.classifications.model.Classification;
import io.weaviate.client.v1.classifications.model.ClassificationType;
import io.weaviate.client.v1.classifications.model.ParamsKNN;
import io.weaviate.client.v1.filters.Operator;
import io.weaviate.client.v1.filters.WhereFilter;
public class App {
public static void main(String[] args) {
Config config = new Config("http", "localhost:8080");
WeaviateClient client = new WeaviateClient(config);
WhereFilter trainingSetWhere = WhereFilter.builder()
.valueInt(100)
.operator(Operator.GreaterThan)
.path(new String[]{ "wordCount" })
.build();
Result<Classification> result = client.classifications().scheduler()
.withType(ClassificationType.KNN)
.withSettings(ParamsKNN.builder().k(3).build())
.withClassName("Article")
.withClassifyProperties(new String[]{ "hasPopularity" })
.withBasedOnProperties(new String[]{ "summary" })
.withTrainingSetWhereFilter(trainingSetWhere)
.run();
if (result.hasErrors()) {
System.out.println(result.getError());
return;
}
System.out.println(result.getResult());
}
}
curl \
-X POST \
-H "Content-Type: application/json" \
-d '{
"class": "Article",
"type": "knn",
"settings": {
"k": 3
},
"basedOnProperties": [
"summary"
],
"classifyProperties": [
"hasPopularity"
],
"filters": {
"trainingSetWhere": {"path": ["wordCount"], "operator": "GreaterThan", "valueInt": 100}
}
}' \
http://localhost:8080/v1/classifications
A classification is started, and will run in the background. The following response is given after starting the classification, and the status can be fetched via the v1/classifications/{id}
endpoint.
{
"basedOnProperties": [
"summary"
],
"class": "Article",
"classifyProperties": [
"hasPopularity"
],
"id": "ee722219-b8ec-4db1-8f8d-5150bb1a9e0c",
"meta": {
"completed": "0001-01-01T00:00:00.000Z",
"started": "2020-09-09T14:57:08.468Z"
},
"minimumUsableWords": 3,
"status": "running",
"tfidfCutoffPercentile": 80,
"type": "knn",
"settings": {
"k": 3
}
}
Evaluation of single data object results
The classification task will update the target property in each data object.
The results of a classification can be requested for the individual data objects through the v1/objects/{id}/?include=classification
RESTful endpoint or with the GraphQL _additional {classification}
field.
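The two ways of fetching per-object results can be sketched as follows. Both helper names are hypothetical, the object id is a placeholder, and the sub-fields requested under classification in the GraphQL query (id and completed) are an assumption that should be checked against the GraphQL additional properties reference. The sketch only builds the URL and query strings and does not send any request.

```python
from urllib.parse import quote


def object_classification_url(base_url, object_id):
    """URL for fetching one object together with its classification metadata."""
    return f"{base_url.rstrip('/')}/v1/objects/{quote(object_id)}/?include=classification"


def classification_graphql_query(class_name):
    """GraphQL query asking for the classification metadata of a class's objects.

    The sub-fields under classification are assumed, not taken from this page.
    """
    return (
        "{ Get { %s { _additional { classification { id completed } } } } }"
        % class_name
    )


# Placeholder object id, used only for illustration.
url = object_classification_url(
    "http://localhost:8080", "00000000-0000-0000-0000-000000000000"
)
query = classification_graphql_query("Article")
print(url)
print(query)
```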
Zero-Shot Classification
Zero-shot classification is an unsupervised classification method, meaning you don't need any training data.
Weaviate's zero-shot classification measures how similar (how close) a data item is to a potential target item (a class or label).
More specifically, Weaviate uses vector search and similarity
algorithms to classify data objects with other data objects. Internally, Weaviate performs a nearVector
search (which you can also perform manually with GraphQL), and takes the closest result out of a given set of options (data objects) to classify.
Zero-shot classification works with all (text/image/..) vectorizers (or no vectorizer, as long as you have vectors stored in Weaviate).
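Conceptually, picking the closest target out of a set of options looks like the Python sketch below. It is an illustration under stated assumptions, not Weaviate's implementation: cosine similarity is assumed as the closeness measure, and the category names and vectors are invented.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def zero_shot_classify(source_vector, targets):
    """Pick the target whose vector is closest to the source vector.

    targets maps a target name to its vector; no labelled examples are needed.
    """
    return max(
        targets, key=lambda name: cosine_similarity(source_vector, targets[name])
    )


# Invented vectors standing in for the target objects and one source object.
categories = {
    "Sports": [0.9, 0.1, 0.0],
    "Politics": [0.1, 0.9, 0.1],
    "Science": [0.0, 0.2, 0.9],
}
article_vector = [0.2, 0.8, 0.2]
print(zero_shot_classify(article_vector, categories))  # prints "Politics"
```

Unlike kNN, there is no voting over labelled neighbors here: each source object is compared directly with the candidate targets.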
Parameters
Required:
- type: "zeroshot": the type of the classification.
- class: the class name of the data objects to be classified.
- classifyProperties: a list of properties to classify. They should be reference properties to other classes, each only referring to one class.
Optional, with default values:
- Parameters to add limitations (based on e.g. background business knowledge).
  - filters: {} with the following possible properties:
    - sourceWhere: {}: parameter to determine which data objects to classify (i.e. to leave out some data objects).
    - targetWhere: {}: parameter to limit possible targets (i.e. to exclude possible target values).
    - trainingSetWhere: {}: parameter to limit possible data objects in the training set.
    - All of the sourceWhere, targetWhere and trainingSetWhere filters accept a where filter body.
Start a zeroshot classification
A classification can be started through one of the clients, or with a direct curl
request to the RESTful API.
- Python (v4)
- Python (v3)
- JavaScript/TypeScript
- Go
- Java
- Curl
The Python client v4
API does not yet support classification tasks. Please use the client's v3
API for this.
import weaviate
client = weaviate.Client("http://localhost:8080")
source_where = {
"path": ["wordCount"],
"operator": "GreaterThan",
"valueInt": 100
}
target_where = {
"path": ["name"],
"operator": "NotEqual",
"valueText": "Government"
}
# Parentheses are used for the chain because a backslash continuation cannot be followed by a comment
(
    client.classification.schedule()
    .with_type("zeroshot")
    .with_class_name("Article")
    .with_based_on_properties(["summary"])
    .with_classify_properties(["ofCategory"])
    .with_source_where_filter(source_where)  # optional
    .with_target_where_filter(target_where)  # optional
    .do()
)
import weaviate from 'weaviate-ts-client';
const client = weaviate.client({
scheme: 'http',
host: 'localhost:8080',
});
// the following triggers a classification without waiting
let response = await client.classifications
.scheduler()
.withType('zeroshot')
.withClassName('Article')
.withBasedOnProperties(["summary"])
.withClassifyProperties(['ofCategory'])
.do();
console.log(JSON.stringify(response, null, 2));
// the following triggers a classification with waiting for completion
response = await client.classifications
.scheduler()
.withType('zeroshot')
.withClassName('Article')
.withClassifyProperties(['ofCategory'])
.withBasedOnProperties(["summary"])
.withWaitForCompletion()
.withWaitTimeout(60 * 1000)
.do();
console.log(JSON.stringify(response, null, 2));
package main
import (
"context"
"fmt"
"github.com/weaviate/weaviate-go-client/v4/weaviate"
"github.com/weaviate/weaviate-go-client/v4/weaviate/classifications"
"github.com/weaviate/weaviate-go-client/v4/weaviate/filters"
)
func main() {
cfg := weaviate.Config{
Host: "localhost:8080",
Scheme: "http",
}
client, err := weaviate.NewClient(cfg)
if err != nil {
panic(err)
}
sourceWhere := filters.Where().
WithOperator(filters.GreaterThan).
WithPath([]string{"wordCount"}).
WithValueInt(100)
targetWhere := filters.Where().
WithOperator(filters.NotEqual).
WithPath([]string{"name"}).
WithValueString("Government")
classification, err := client.Classifications().Scheduler().
WithType(classifications.ZeroShot).
WithClassName("Article").
WithClassifyProperties([]string{"ofCategory"}).
WithBasedOnProperties([]string{"summary"}).
WithSourceWhereFilter(sourceWhere).
WithTargetWhereFilter(targetWhere).
Do(context.Background())
if err != nil {
panic(err)
}
fmt.Printf("%v", classification)
}
package io.weaviate;
import io.weaviate.client.Config;
import io.weaviate.client.WeaviateClient;
import io.weaviate.client.base.Result;
import io.weaviate.client.v1.classifications.model.Classification;
import io.weaviate.client.v1.classifications.model.ClassificationType;
import io.weaviate.client.v1.filters.Operator;
import io.weaviate.client.v1.filters.WhereFilter;
public class App {
public static void main(String[] args) {
Config config = new Config("http", "localhost:8080");
WeaviateClient client = new WeaviateClient(config);
// limit the objects to classify, matching the sourceWhere filter of the other examples
WhereFilter sourceWhere = WhereFilter.builder()
.valueInt(100)
.operator(Operator.GreaterThan)
.path(new String[]{ "wordCount" })
.build();
// exclude a possible target value
WhereFilter targetWhere = WhereFilter.builder()
.valueText("Government")
.operator(Operator.NotEqual)
.path(new String[]{ "name" })
.build();
Result<Classification> result = client.classifications().scheduler()
.withType(ClassificationType.ZeroShot)
.withClassName("Article")
.withClassifyProperties(new String[]{ "ofCategory" })
.withBasedOnProperties(new String[]{ "summary" })
.withSourceWhereFilter(sourceWhere)
.withTargetWhereFilter(targetWhere)
.run();
if (result.hasErrors()) {
System.out.println(result.getError());
return;
}
System.out.println(result.getResult());
}
}
curl \
-X POST \
-H "Content-Type: application/json" \
-d '{
"class": "Article",
"type": "zeroshot",
"classifyProperties": [
"ofCategory"
],
"basedOnProperties": [
"summary"
],
"filters": {
"sourceWhere": {"path": ["wordCount"], "operator": "GreaterThan", "valueInt": 100},
"targetWhere": {"path": ["name"], "operator": "NotEqual", "valueText": "Government"}
}
}' \
http://localhost:8080/v1/classifications
A classification is started, and will run in the background. The following response is given after starting the classification, and the status can be fetched via the v1/classifications/{id}
endpoint.
{
"class": "Article",
"classifyProperties": [
"ofCategory"
],
"id": "973e3b4c-4c1d-4b51-87d8-4d0343beb7ad",
"meta": {
"completed": "0001-01-01T00:00:00.000Z",
"started": "2020-09-09T14:57:08.468Z"
},
"status": "running",
"type": "zeroshot"
}
Evaluation of single data object results
The classification task will update the target property in each data object.
The results of a classification can be requested for the individual data objects through the v1/objects/{id}/?include=classification
RESTful endpoint or with the GraphQL _additional {classification}
field.