REST - /v1/classifications

Start a classification

Weaviate's classification feature allows you to classify data objects based on their vectors.

  • kNN classification: Uses the k-nearest neighbors algorithm to predict a property.
    • This requires existing training data.
    • The target property should be a reference property to another class.

Adding property to the schema

Note that kNN classification requires the target property to be included in the schema before importing the data. If you add the property to the schema after importing data, you will need to re-import the data.

If text2vec-contextionary is enabled, contextual classification can also be used.

  • Contextual classification: Predicts cross-references based on the context.
    • This does not require training data.
    • This works best when there is a strong semantic relation in your data (e.g., the landmark Eiffel Tower and the city Paris).

Classification can be triggered to run in the background with a POST request, and the GET method can be used to view its status.

note

The Python client v4 API does not yet support classification tasks. Please use the client's v3 API for this.

The endpoint will return information about the started classification. The response will include the classification id.
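As a rough illustration of the POST request described above, the following sketch uses only the Python standard library. The base URL `http://localhost:8080` and both helper names are assumptions made for this example, not part of Weaviate or its clients.

```python
import json
import urllib.request

# Assumed address of a local Weaviate instance; adjust for your deployment.
BASE_URL = "http://localhost:8080"


def build_classification_request(body, base_url=BASE_URL):
    """Build (but do not send) the POST request that starts a classification."""
    return urllib.request.Request(
        url=f"{base_url}/v1/classifications",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def start_classification(body, base_url=BASE_URL):
    """Send the request and return the classification id from the response."""
    request = build_classification_request(body, base_url)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["id"]
```

Only `start_classification` touches the network, so it needs a reachable Weaviate instance. `build_classification_request` can be inspected or tested offline.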

Clients and async classification

Classification jobs can take some time to complete. With the Weaviate clients, you can:

  • Wait for the classification function to finish before continuing with the rest of the script.
    • Python: add with_wait_for_completion() to the builder pattern.
    • Go: add .WithWaitForCompletion() to the builder pattern.
    • JavaScript: add .withWaitForCompletion() to the builder pattern.
  • Return immediately without waiting for the classification to finish.
    • You can check the classification task status using the classification endpoint with the task id. The status field in the response body will be either running or completed. See "Classification status, results and metadata" below for how to query this information.
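If you are not using a client's wait-for-completion helper, the status check can be wrapped in a small polling loop. The sketch below is one possible approach. The function name, the `fetch_status` callable, and the timing parameters are hypothetical; `fetch_status` is assumed to return the parsed JSON body of GET /v1/classifications/{id}.

```python
import time


def wait_for_completion(fetch_status, classification_id,
                        poll_seconds=2.0, timeout_seconds=300.0):
    """Poll until the classification's status is no longer "running".

    fetch_status is any callable that takes a classification id and returns
    the parsed response body as a dict (hypothetical helper, supplied by you).
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        result = fetch_status(classification_id)
        if result["status"] != "running":
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"classification {classification_id} is still running"
            )
        time.sleep(poll_seconds)
```

Passing the fetch function in as an argument keeps the loop independent of any particular HTTP library and makes it easy to exercise with a stub.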

Classification status, results and metadata

A GET request can return the status, results and metadata of a previously created classification:

Method

GET /v1/classifications/{id}

Parameters

The classification id should be passed to the request. This id is obtained from the result of starting the classification.

Response

It returns the following fields for all classification types:

{
  "id": "string", // classification id
  "class": "string", // class name of the classified data objects
  "classifyProperties": [ "string" ], // list of the class properties that are (to be) classified
  "basedOnProperties": [ "string" ], // list of the class properties that the classification is based on
  "status": "string", // status of the classification, can be "running" or "completed"
  "meta": {
    "started": "timestamp",
    "completed": "timestamp",
    "count": int, // total number of items to classify (only if "status" is completed)
    "countSucceeded": int, // total number of items that succeeded (only if "status" is completed)
    "countFailed": int // total number of items that failed (only if "status" is completed, only if >0)
  },
  "type": "string", // the type of classification, can be "knn" or a module-specific classification
  "settings": {}, // additional settings specific to the classification type
  "filters": { // filters on the data objects to include in the classification; each is a where filter following the GraphQL where filter design
    "sourceWhere": {},
    "trainingSetWhere": {},
    "targetWhere": {}
  }
}

The following fields are additionally returned when the classification was based on kNN:

{
  "settings": {
    "k": int // the number of neighbors taken in the classification
  }
}
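The fields above can be read defensively, since the count fields in meta are only present once the status is completed. A minimal sketch, assuming the response has already been parsed into a Python dict; the function name is hypothetical:

```python
def summarize_classification(response):
    """Return a one-line summary of a GET /v1/classifications/{id} response."""
    summary = (
        f'{response["type"]} classification {response["id"]} '
        f'on {response["class"]}: {response["status"]}'
    )
    if response["status"] == "completed":
        meta = response.get("meta", {})
        # The count fields only appear after completion, and countFailed
        # only when at least one item failed, so default each to 0.
        total = meta.get("count", 0)
        succeeded = meta.get("countSucceeded", 0)
        failed = meta.get("countFailed", 0)
        summary += f" ({succeeded}/{total} succeeded, {failed} failed)"
    return summary
```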

Example

For a kNN classification started as in the example under "Start a kNN classification", a GET request to /v1/classifications/{id} returns:

{
  "basedOnProperties": [
    "summary"
  ],
  "class": "Article",
  "classifyProperties": [
    "hasPopularity"
  ],
  "id": "ee722219-b8ec-4db1-8f8d-5150bb1a9e0c",
  "meta": {
    "completed": "0001-01-01T00:00:00.000Z",
    "started": "2020-09-09T14:57:08.468Z"
  },
  "minimumUsableWords": 3,
  "status": "running",
  "tfidfCutoffPercentile": 80,
  "type": "knn",
  "settings": {
    "k": 3
  }
}

kNN classification

Weaviate performs kNN classification based on vector similarity between data objects.

Due to the nature of the k-nearest neighbor algorithm, the quality of the classification will be a function of the quantity and quality of the pre-existing data.
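To make the idea concrete, here is a plain-Python sketch of k-nearest-neighbor voting over vectors. It illustrates the general technique only and is not Weaviate's implementation. The function names, the choice of cosine distance, and the toy data are assumptions.

```python
import math
from collections import Counter


def cosine_distance(a, b):
    """Cosine distance between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def knn_classify(vector, training_set, k=3):
    """Return the most common label among the k nearest training items.

    training_set is a list of (vector, label) pairs, i.e. objects that
    already carry the target value.
    """
    nearest = sorted(
        training_set, key=lambda item: cosine_distance(vector, item[0])
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

With few or poorly labeled training items, the k nearest neighbors may be far from the object being classified, which is why the amount and quality of existing data matters.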

Requirements

  • A schema with a class to be classified, and a class to store the classification.
    • A cross-reference from the class to be classified to the class to store the classification.
  • Training data within the class to store the classification.
  • Vectors for the class to be classified.

Parameters

Required:

  • type: "knn": the type of the classification, which is "knn" here.
  • class: the class name of the data objects to be classified.
  • classifyProperties: an array containing the name of the target (cross-reference) property of the class to be classified.
  • basedOnProperties: an array containing a property name.

basedOnProperties limitations

The current knn implementation uses the object vector, but requires the basedOnProperties to be an array with one valid text property name.

Optional, with default values:

  • settings: {k: 3}. The number of neighbors to base the classification on.
  • Parameters to add limitations (based on e.g. background business knowledge).
    • filters: {} with the following possible properties:
      • sourceWhere: {}. Parameter to determine which data objects to classify (i.e. to leave out some data objects).
      • targetWhere: {}. Parameter to limit possible targets (i.e. to exclude possible target values).
      • trainingSetWhere: {}. Parameter to limit possible data objects in the training set.
      • All of sourceWhere, targetWhere and trainingSetWhere filters accept a where filter body.
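The parameters above could be assembled into a request body along these lines. This is a sketch rather than a client API: the helper name is hypothetical, and the `wordCount` property in the usage example is an invented property used only to show the shape of a where filter.

```python
def build_knn_body(class_name, classify_property, based_on_property,
                   k=3, filters=None):
    """Assemble the JSON body for a kNN classification request."""
    if k < 1:
        raise ValueError("k must be at least 1")
    body = {
        "type": "knn",
        "class": class_name,
        # Both fields are arrays; basedOnProperties takes one text property.
        "classifyProperties": [classify_property],
        "basedOnProperties": [based_on_property],
        "settings": {"k": k},
    }
    if filters:
        # Optional keys: sourceWhere, targetWhere, trainingSetWhere.
        body["filters"] = filters
    return body


# Usage: classify only articles whose (hypothetical) wordCount exceeds 100.
body = build_knn_body(
    "Article", "hasPopularity", "summary",
    filters={
        "sourceWhere": {
            "path": ["wordCount"],
            "operator": "GreaterThan",
            "valueInt": 100,
        }
    },
)
```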

Start a kNN classification

A classification can be started through one of the clients, or with a direct curl request to the RESTful API.

The classification starts and runs in the background. Starting it returns the following response; the status can then be fetched via the v1/classifications/{id} endpoint.

{
  "basedOnProperties": [
    "summary"
  ],
  "class": "Article",
  "classifyProperties": [
    "hasPopularity"
  ],
  "id": "ee722219-b8ec-4db1-8f8d-5150bb1a9e0c",
  "meta": {
    "completed": "0001-01-01T00:00:00.000Z",
    "started": "2020-09-09T14:57:08.468Z"
  },
  "minimumUsableWords": 3,
  "status": "running",
  "tfidfCutoffPercentile": 80,
  "type": "knn",
  "settings": {
    "k": 3
  }
}

Evaluation of single data object results

The classification task will update the target property in each data object.

The results of a classification can be requested for the individual data objects through the v1/objects/{id}/?include=classification RESTful endpoint or with the GraphQL _additional {classification} field.

Zero-shot classification

Zero-shot classification is an unsupervised classification method, meaning you don't need any training data.

Weaviate's zero-shot classification measures how similar (how close) a data item is to a potential target item (a class or label).

More specifically, Weaviate uses vector search and similarity algorithms to classify data objects with other data objects. Internally, Weaviate performs a nearVector search (which you can also perform manually with GraphQL), and takes the closest result out of a given set of options (data objects) to classify.
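Conceptually, this amounts to picking the candidate target whose vector is nearest to the source object's vector. The plain-Python sketch below shows that selection step under the assumption of cosine distance. It is not Weaviate's internal code, and the function names and toy vectors are made up for illustration.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def zero_shot_classify(source_vector, targets):
    """Return the key of the target whose vector is closest to the source.

    targets maps a target identifier (for example a category name) to its
    vector. No labeled training examples are involved.
    """
    return min(
        targets,
        key=lambda target_id: cosine_distance(source_vector, targets[target_id]),
    )


# Toy usage with two-dimensional vectors.
categories = {"Sports": [1.0, 0.0], "Politics": [0.0, 1.0]}
best = zero_shot_classify([0.9, 0.2], categories)
```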

Zero-shot classification works with all vectorizers (text, image, and so on), or with no vectorizer, as long as you have vectors stored in Weaviate.

Parameters

Required:

  • type: "zeroshot": the type of the classification.
  • class: the class name of the data objects to be classified.
  • classifyProperties: a list of properties to classify. They should be reference properties to other classes, each only referring to one class.

Optional, with default values:

  • Parameters to add limitations (based on e.g. background business knowledge).
    • filters: {} with the following possible properties:
      • sourceWhere: {}. Parameter to determine which data objects to classify (i.e. to leave out some data objects).
      • targetWhere: {}. Parameter to limit possible targets (i.e. to exclude possible target values).
      • trainingSetWhere: {}. Parameter to limit possible data objects in the training set.
      • All of sourceWhere, targetWhere and trainingSetWhere filters accept a where filter body.
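A zero-shot request body needs fewer fields than a kNN one. The sketch below shows one way to assemble it. The helper name is hypothetical, and the `name` property used in the targetWhere filter is an assumed property of the target class, included only to illustrate excluding a possible target.

```python
def build_zeroshot_body(class_name, classify_properties, filters=None):
    """Assemble the JSON body for a zero-shot classification request."""
    body = {
        "type": "zeroshot",
        "class": class_name,
        # Each entry should be a reference property pointing to one class.
        "classifyProperties": list(classify_properties),
    }
    if filters:
        body["filters"] = filters
    return body


# Usage: exclude a (hypothetical) "Miscellaneous" category as a target.
body = build_zeroshot_body(
    "Article", ["ofCategory"],
    filters={
        "targetWhere": {
            "path": ["name"],
            "operator": "NotEqual",
            "valueText": "Miscellaneous",
        }
    },
)
```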

Start a zero-shot classification

A classification can be started through one of the clients, or with a direct curl request to the RESTful API.

The classification starts and runs in the background. Starting it returns the following response; the status can then be fetched via the v1/classifications/{id} endpoint.

{
  "class": "Article",
  "classifyProperties": [
    "ofCategory"
  ],
  "id": "973e3b4c-4c1d-4b51-87d8-4d0343beb7ad",
  "meta": {
    "completed": "0001-01-01T00:00:00.000Z",
    "started": "2020-09-09T14:57:08.468Z"
  },
  "status": "running",
  "type": "zeroshot"
}

Evaluation of single data object results

The classification task will update the target property in each data object.

The results of a classification can be requested for the individual data objects through the v1/objects/{id}/?include=classification RESTful endpoint or with the GraphQL _additional {classification} field.