spellcheck


💡 You are looking at older or release candidate documentation. The current Weaviate version is v1.15.2


In short

  • The SpellCheck module is a Weaviate module for spell checking of raw text in GraphQL queries.
  • The module depends on a Python spellchecking service.
  • The module adds a spellCheck {} filter to the GraphQL nearText {} search arguments.
  • The module returns the spelling check result in the GraphQL _additional { spellCheck {} } field.

Introduction

The SpellCheck module is a Weaviate module for checking the spelling of raw text in GraphQL query inputs. Using the Python spellchecker as a service, the module analyzes the text, suggests a correction, and can optionally auto-correct the query.

How to enable (module configuration)

Docker-compose

The SpellCheck module can be added as a service to the Docker-compose file. You must have a text vectorizer like text2vec-contextionary or text2vec-transformers running. An example Docker-compose file for using the spellcheck module with text2vec-contextionary is here:

---
version: '3.4'
services:
  weaviate:
    command:
    - --host
    - 0.0.0.0
    - --port
    - '8080'
    - --scheme
    - http
    image: semitechnologies/weaviate:1.9.0
    ports:
    - 8080:8080
    restart: on-failure:0
    environment:
      CONTEXTIONARY_URL: contextionary:9999
      SPELLCHECK_INFERENCE_API: "http://text-spellcheck:8080"
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: 'text2vec-contextionary'
      ENABLE_MODULES: 'text2vec-contextionary,text-spellcheck'
  contextionary:
    environment:
      OCCURRENCE_WEIGHT_LINEAR_FACTOR: 0.75
      EXTENSIONS_STORAGE_MODE: weaviate
      EXTENSIONS_STORAGE_ORIGIN: http://weaviate:8080
      NEIGHBOR_OCCURRENCE_IGNORE_PERCENTILE: 5
      ENABLE_COMPOUND_SPLITTING: 'false'
    image: semitechnologies/contextionary:en0.16.0-v1.0.2
    ports:
    - 9999:9999
  text-spellcheck:
    image: semitechnologies/text-spellcheck-model:pyspellchecker-d933122
...

Variable explanations:

  • SPELLCHECK_INFERENCE_API: the URL where the spellcheck service is running (in the example above, the text-spellcheck container)

How to use (GraphQL)

Use the spellchecker module to verify that user-provided search queries are spelled correctly and to suggest alternative, correct spellings. It works with the existing nearText function (when a text2vec module is used) and with the ask function (when the qna-transformers module is enabled). Spell-checking happens at query time.

There are two ways to use this module:

  1. It provides a new GraphQL _additional property which can be used to check (but not alter) the provided queries. See the query below.

Example query

GraphQL

{
  Get {
    Article(nearText:{
      concepts: ["houssing prices"]
    }) {
      title
      _additional{
        spellCheck{
          changes{
            corrected
            original
          }
          didYouMean
          location
          originalText
        }
      }
    }
  }
}

Python

import weaviate

client = weaviate.Client("http://localhost:8080")

near_text = {
  "concepts": ["houssing prices"],
}

result = (
  client.query
  .get("Article", ["title", "_additional {spellCheck { changes {corrected original} didYouMean location originalText}}"])
  .with_near_text(near_text)
  .do()
)

print(result)

JavaScript

const weaviate = require("weaviate-client");

const client = weaviate.client({
  scheme: 'http',
  host: 'localhost:8080',
});

client.graphql
      .get()
      .withClassName('Article')
      .withFields('title _additional {spellCheck { changes {corrected original} didYouMean location originalText}}')
      .withNearText({
        concepts: ["houssing prices"],
      })
      .do()
      .then(res => {
        console.log(res)
      })
      .catch(err => {
        console.error(err)
      });

Go

package main

import (
  "context"
  "fmt"

  "github.com/semi-technologies/weaviate-go-client/v4/weaviate"
  "github.com/semi-technologies/weaviate-go-client/v4/weaviate/graphql"
)

func main() {
  cfg := weaviate.Config{
    Host:   "localhost:8080",
    Scheme: "http",
  }
  client := weaviate.New(cfg)

  className := "Article"
  fields := []graphql.Field{
    {Name: "title"},
    {Name: "_additional", Fields: []graphql.Field{
      {Name: "spellCheck", Fields: []graphql.Field{
        {Name: "changes", Fields: []graphql.Field{
          {Name: "corrected"},
          {Name: "original"},
        }},
        {Name: "didYouMean"},
        {Name: "location"},
        {Name: "originalText"},
      }},
    }},
  }
  concepts := []string{"houssing prices"}
  nearText := client.GraphQL().NearTextArgBuilder().
    WithConcepts(concepts)

  ctx := context.Background()
  result, err := client.GraphQL().Get().
    WithClassName(className).
    WithFields(fields...).
    WithNearText(nearText).
    Do(ctx)

  if err != nil {
    panic(err)
  }
  fmt.Printf("%v", result)
}

Java

package technology.semi.weaviate;

import technology.semi.weaviate.client.Config;
import technology.semi.weaviate.client.WeaviateClient;
import technology.semi.weaviate.client.base.Result;
import technology.semi.weaviate.client.v1.graphql.model.GraphQLResponse;
import technology.semi.weaviate.client.v1.graphql.query.argument.NearTextArgument;
import technology.semi.weaviate.client.v1.graphql.query.fields.Field;

public class App {
  public static void main(String[] args) {
    Config config = new Config("http", "localhost:8080");
    WeaviateClient client = new WeaviateClient(config);

    Field title = Field.builder().name("title").build();
    Field _additional = Field.builder()
      .name("_additional")
      .fields(new Field[]{
        Field.builder()
          .name("spellCheck")
          .fields(new Field[]{
            Field.builder()
              .name("changes")
              .fields(new Field[]{
                Field.builder().name("corrected").build(),
                Field.builder().name("original").build()
              }).build(),
            Field.builder().name("didYouMean").build(),
            Field.builder().name("location").build(),
            Field.builder().name("originalText").build()
          }).build()
      }).build();

    NearTextArgument explore = client.graphQL().arguments().nearTextArgBuilder()
      .concepts(new String[]{ "houssing prices" })
      .build();

    Result<GraphQLResponse> result = client.graphQL().get()
      .withClassName("Article")
      .withFields(title, _additional)
      .withNearText(explore)
      .run();

    if (result.hasErrors()) {
      System.out.println(result.getError());
      return;
    }
    System.out.println(result.getResult());
  }
}

Curl

$ echo '{
  "query": "{ Get { Article(nearText: { concepts: [\"houssing prices\"] }) { title _additional { spellCheck { changes { corrected original } didYouMean location originalText } } } } }"
}' | curl \
    -X POST \
    -H 'Content-Type: application/json' \
    -d @- \
    http://localhost:8080/v1/graphql

GraphQL response

The result is contained in a new GraphQL _additional property called spellCheck. In the response it is a list, and each entry contains the following fields:

  • changes: a list with the following fields:
    • corrected (string): the corrected spelling if a correction is found
    • original (string): the word as originally spelled in the query
  • didYouMean: the corrected full text in the query
  • originalText: the original full text in the query
  • location: the location of the misspelled string in the query

Example response

{
  "data": {
    "Get": {
      "Article": [
        {
          "_additional": {
            "spellCheck": [
              {
                "changes": [
                  {
                    "corrected": "housing",
                    "original": "houssing"
                  }
                ],
                "didYouMean": "housing prices",
                "location": "nearText.concepts[0]",
                "originalText": "houssing prices"
              }
            ]
          },
          "title": "..."
        }
      ]
    }
  },
  "errors": null
}
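
To act on this response in application code, the spellCheck entries can be pulled out of each returned object. The following is a minimal sketch that works on the example response above as a plain Python dictionary. The helper name collect_suggestions is hypothetical and not part of any Weaviate client. Because _additional is requested per object, the sketch assumes the same check may appear on more than one result and keeps only one entry per location.

```python
from typing import Any, Dict, List

# Copy of the example response shown above, as a Python dictionary.
response: Dict[str, Any] = {
    "data": {
        "Get": {
            "Article": [
                {
                    "_additional": {
                        "spellCheck": [
                            {
                                "changes": [
                                    {"corrected": "housing", "original": "houssing"}
                                ],
                                "didYouMean": "housing prices",
                                "location": "nearText.concepts[0]",
                                "originalText": "houssing prices",
                            }
                        ]
                    },
                    "title": "...",
                }
            ]
        }
    },
    "errors": None,
}


def collect_suggestions(result: Dict[str, Any], class_name: str) -> List[Dict[str, str]]:
    """Hypothetical helper: return one suggestion per distinct query location."""
    seen = set()
    suggestions: List[Dict[str, str]] = []
    objects = ((result.get("data") or {}).get("Get") or {}).get(class_name) or []
    for obj in objects:
        checks = (obj.get("_additional") or {}).get("spellCheck") or []
        for check in checks:
            # Assumption: identical checks can repeat across objects, so deduplicate.
            key = (check["location"], check["originalText"])
            if key in seen:
                continue
            seen.add(key)
            suggestions.append(
                {
                    "location": check["location"],
                    "originalText": check["originalText"],
                    "didYouMean": check["didYouMean"],
                }
            )
    return suggestions


suggestions = collect_suggestions(response, "Article")
for s in suggestions:
    print(f'{s["location"]}: did you mean "{s["didYouMean"]}" instead of "{s["originalText"]}"?')
```

An application could show the didYouMean text to the user as a suggestion and leave the original query untouched.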
  2. It extends existing text2vec modules with an autocorrect flag, which can be used to correct a misspelled query in the background:

Example query

{
  Get {
    Article(nearText:{
      concepts: ["houssing prices"],
      autocorrect: true
    }) {
      title
      _additional{
        spellCheck{
          changes{
            corrected
            original
          }
          didYouMean
          location
          originalText
        }
      }
    }
  }
}
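
The two usage modes differ only in the autocorrect argument, so one query builder can serve both. The sketch below assembles the GraphQL string in plain Python. The helper name build_spellcheck_query is hypothetical, and the title field is taken from the Article example used on this page. The resulting JSON payload would be sent to the /v1/graphql endpoint, as in the curl example.

```python
import json
from typing import List


def build_spellcheck_query(class_name: str, concepts: List[str], autocorrect: bool = False) -> str:
    """Hypothetical helper: build a Get query that requests the spellCheck field."""
    # json.dumps produces a quoted, escaped list that is also valid GraphQL syntax.
    near_text_args = [f"concepts: {json.dumps(concepts)}"]
    if autocorrect:
        # Only add the flag when the query should be corrected in the background.
        near_text_args.append("autocorrect: true")
    near_text = ", ".join(near_text_args)
    return (
        "{ Get { "
        f"{class_name}(nearText: {{ {near_text} }}) {{ "
        "title "
        "_additional { spellCheck { "
        "changes { corrected original } didYouMean location originalText "
        "} } "
        "} } }"
    )


check_only = build_spellcheck_query("Article", ["houssing prices"])
corrected = build_spellcheck_query("Article", ["houssing prices"], autocorrect=True)

# Request body for POST /v1/graphql
payload = json.dumps({"query": corrected})
print(payload)
```

Keeping autocorrect off by default means a caller has to opt in before the query text is changed.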


More resources

If you can't find the answer to your question here, please look at the:

  1. Frequently Asked Questions. Or,
  2. Knowledge base of old issues. Or,
  3. For questions: Stackoverflow. Or,
  4. For issues: Github. Or,
  5. Ask your question in the Slack channel: Slack.