spellcheck



In short

  • The SpellCheck module is a Weaviate module for spell checking of raw text in GraphQL queries.
  • The module depends on a Python spellchecking service.
  • The module adds a spellCheck {} filter to the GraphQL nearText {} search arguments.
  • The module returns the spelling check result in the GraphQL _additional { spellCheck {} } field.

Introduction

The SpellCheck module is a Weaviate module for checking the spelling of raw text in GraphQL query inputs. Using a Python spellchecker as a service, the module analyzes the text, suggests corrections, and can optionally force an auto-correction.

How to enable (module configuration)

Docker-compose

The SpellCheck module can be added as a service to the Docker-compose file. You must have a text vectorizer such as text2vec-contextionary or text2vec-transformers running. An example Docker-compose file that uses the spellcheck module with text2vec-contextionary is shown here:

---
version: '3.4'
services:
  weaviate:
    command:
    - --host
    - 0.0.0.0
    - --port
    - '8080'
    - --scheme
    - http
    image: semitechnologies/weaviate:1.9.0
    ports:
    - 8080:8080
    restart: on-failure:0
    environment:
      CONTEXTIONARY_URL: contextionary:9999
      SPELLCHECK_INFERENCE_API: "http://text-spellcheck:8080"
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: 'text2vec-contextionary'
      ENABLE_MODULES: 'text2vec-contextionary,text-spellcheck'
  contextionary:
    environment:
      OCCURRENCE_WEIGHT_LINEAR_FACTOR: 0.75
      EXTENSIONS_STORAGE_MODE: weaviate
      EXTENSIONS_STORAGE_ORIGIN: http://weaviate:8080
      NEIGHBOR_OCCURRENCE_IGNORE_PERCENTILE: 5
      ENABLE_COMPOUND_SPLITTING: 'false'
    image: semitechnologies/contextionary:en0.16.0-v1.0.2
    ports:
    - 9999:9999
  text-spellcheck:
    image: semitechnologies/text-spellcheck-model:pyspellchecker-d933122
...

Variable explanations:

  • SPELLCHECK_INFERENCE_API: the URL where the spellcheck service is running (in the example above, the text-spellcheck container on port 8080)
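
As a quick sanity check of the configuration above, the following minimal Python sketch verifies that an environment mapping both enables the text-spellcheck module and sets SPELLCHECK_INFERENCE_API. The helper name check_spellcheck_env is hypothetical and not part of Weaviate, and the requirement that the value is an http(s) URL is an assumption based on the example file.

```python
from urllib.parse import urlparse


def check_spellcheck_env(env):
    """Return a list of problems found in a Weaviate environment mapping.

    Hypothetical helper for illustration; it is not part of Weaviate.
    """
    problems = []
    # ENABLE_MODULES is a comma-separated list of module names.
    modules = [m.strip() for m in env.get("ENABLE_MODULES", "").split(",")]
    if "text-spellcheck" not in modules:
        problems.append("text-spellcheck is missing from ENABLE_MODULES")
    # Assumption: the inference API is given as a full http(s) URL,
    # as in the Docker-compose example.
    parsed = urlparse(env.get("SPELLCHECK_INFERENCE_API", ""))
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("SPELLCHECK_INFERENCE_API is not an http(s) URL")
    return problems


# Values copied from the Docker-compose example.
env = {
    "ENABLE_MODULES": "text2vec-contextionary,text-spellcheck",
    "SPELLCHECK_INFERENCE_API": "http://text-spellcheck:8080",
}
print(check_spellcheck_env(env))
```

An empty list means both settings look consistent; each string in a non-empty list names one setting to review.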

How to use (GraphQL)

Use the spellchecker module to verify that user-provided search queries are spelled correctly, and to suggest alternative, correct spellings. It works with the existing nearText function (provided that a text2vec module is used) and with the ask function (if the qna-transformers module is enabled). Spell-checking happens at query time.

There are two ways to use this module:

  1. It provides a new GraphQL _additional property, which can be used to check (but not alter) the provided queries; see the query below.

Example query

GraphQL:

{
  Get {
    Article(nearText:{
      concepts: ["houssing prices"]
    }) {
      title
      _additional{
        spellCheck{
          changes{
            corrected
            original
          }
          didYouMean
          location
          originalText
        }
      }
    }
  }
}

Python:

import weaviate

client = weaviate.Client("http://localhost:8080")

near_text = {
  "concepts": ["houssing prices"],
}

result = (
  client.query
  .get("Article", ["title", "_additional {spellCheck { changes {corrected original} didYouMean location originalText}}"])
  .with_near_text(near_text)
  .do()
)

print(result)

JavaScript:

const weaviate = require("weaviate-client");

const client = weaviate.client({
  scheme: 'http',
  host: 'localhost:8080',
});

client.graphql
      .get()
      .withClassName('Article')
      .withFields('title _additional {spellCheck { changes {corrected original} didYouMean location originalText}}')
      .withNearText({
        concepts: ["houssing prices"],
      })
      .do()
      .then(res => {
        console.log(res)
      })
      .catch(err => {
        console.error(err)
      });

Go:

package main

import (
  "context"
  "fmt"

  "github.com/semi-technologies/weaviate-go-client/v4/weaviate"
  "github.com/semi-technologies/weaviate-go-client/v4/weaviate/graphql"
)

func main() {
  cfg := weaviate.Config{
    Host:   "localhost:8080",
    Scheme: "http",
  }
  client := weaviate.New(cfg)

  className := "Article"
  fields := []graphql.Field{
    {Name: "title"},
    {Name: "_additional", Fields: []graphql.Field{
      {Name: "spellCheck", Fields: []graphql.Field{
        {Name: "changes", Fields: []graphql.Field{
          {Name: "corrected"},
          {Name: "original"},
        }},
        {Name: "didYouMean"},
        {Name: "location"},
        {Name: "originalText"},
      }},
    }},
  }
  concepts := []string{"houssing prices"}
  nearText := client.GraphQL().NearTextArgBuilder().
    WithConcepts(concepts)

  ctx := context.Background()
  result, err := client.GraphQL().Get().
    WithClassName(className).
    WithFields(fields...).
    WithNearText(nearText).
    Do(ctx)

  if err != nil {
    panic(err)
  }
  fmt.Printf("%v", result)
}

Java:

package technology.semi.weaviate;

import technology.semi.weaviate.client.Config;
import technology.semi.weaviate.client.WeaviateClient;
import technology.semi.weaviate.client.base.Result;
import technology.semi.weaviate.client.v1.graphql.model.GraphQLResponse;
import technology.semi.weaviate.client.v1.graphql.query.argument.NearTextArgument;
import technology.semi.weaviate.client.v1.graphql.query.fields.Field;

public class App {
  public static void main(String[] args) {
    Config config = new Config("http", "localhost:8080");
    WeaviateClient client = new WeaviateClient(config);

    Field title = Field.builder().name("title").build();
    Field _additional = Field.builder()
      .name("_additional")
      .fields(new Field[]{
        Field.builder()
          .name("spellCheck")
          .fields(new Field[]{
            Field.builder()
              .name("changes")
              .fields(new Field[]{
                Field.builder().name("corrected").build(),
                Field.builder().name("original").build()
              }).build(),
            Field.builder().name("didYouMean").build(),
            Field.builder().name("location").build(),
            Field.builder().name("originalText").build()
          }).build()
      }).build();

    NearTextArgument explore = client.graphQL().arguments().nearTextArgBuilder()
      .concepts(new String[]{ "houssing prices" })
      .build();

    Result<GraphQLResponse> result = client.graphQL().get()
      .withClassName("Article")
      .withFields(title, _additional)
      .withNearText(explore)
      .run();

    if (result.hasErrors()) {
      System.out.println(result.getError());
      return;
    }
    System.out.println(result.getResult());
  }
}

Curl:

$ echo '{
  "query": "{ Get { Article(nearText: { concepts: [\"houssing prices\"] }) { title _additional { spellCheck { changes { corrected original } didYouMean location originalText } } } } }"
}' | curl \
    -X POST \
    -H 'Content-Type: application/json' \
    -d @- \
    http://localhost:8080/v1/graphql
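
Hand-writing the JSON body is error-prone because the GraphQL query itself contains double quotes. As a minimal sketch, the body can instead be generated with Python's standard json module, which escapes those quotes. The helper name build_graphql_body is hypothetical, and the request is only constructed here, not sent.

```python
import json
import urllib.request

QUERY = """
{
  Get {
    Article(nearText: {concepts: ["houssing prices"]}) {
      title
      _additional {
        spellCheck {
          changes { corrected original }
          didYouMean
          location
          originalText
        }
      }
    }
  }
}
"""


def build_graphql_body(query):
    """Wrap a GraphQL query string in a {"query": ...} JSON body (hypothetical helper)."""
    return json.dumps({"query": query}).encode("utf-8")


body = build_graphql_body(QUERY)
request = urllib.request.Request(
    "http://localhost:8080/v1/graphql",  # endpoint taken from the curl example
    data=body,
    headers={"Content-Type": "application/json"},
    method="POST",
)
# Sending is left commented out because it needs a running Weaviate instance:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
print(request.get_method(), request.full_url)
```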

GraphQL response

The result is contained in a new GraphQL _additional property called spellCheck. It contains the following fields:

  • changes: a list with the following fields:
    • corrected (string): the corrected spelling if a correction is found
    • original (string): the original spelled word in the query
  • didYouMean: the corrected full text in the query
  • originalText: the original full text in the query
  • location: the location of the misspelled string in the query

Example response

{
  "data": {
    "Get": {
      "Article": [
        {
          "_additional": {
            "spellCheck": [
              {
                "changes": [
                  {
                    "corrected": "housing",
                    "original": "houssing"
                  }
                ],
                "didYouMean": "housing prices",
                "location": "nearText.concepts[0]",
                "originalText": "houssing prices"
              }
            ]
          },
          "title": "..."
        }
      ]
    }
  },
  "errors": null
}
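
To act on such a response in application code, for example to show a "did you mean" hint, the spellCheck entries can be pulled out of the result. The following Python sketch assumes the response shape of the example above; the helper name spelling_suggestions is hypothetical.

```python
def spelling_suggestions(response, class_name):
    """Collect (location, original text, suggestion) tuples from a Get response.

    Hypothetical helper; it assumes the response shape shown in the example.
    """
    suggestions = []
    data = response.get("data") or {}
    objects = (data.get("Get") or {}).get(class_name) or []
    for obj in objects:
        checks = (obj.get("_additional") or {}).get("spellCheck") or []
        for check in checks:
            if not check.get("changes"):
                continue  # nothing was corrected for this input
            entry = (check["location"], check["originalText"], check["didYouMean"])
            if entry not in suggestions:  # result objects may repeat the same check
                suggestions.append(entry)
    return suggestions


# The example response, written as a Python dict.
response = {
    "data": {
        "Get": {
            "Article": [
                {
                    "_additional": {
                        "spellCheck": [
                            {
                                "changes": [
                                    {"corrected": "housing", "original": "houssing"}
                                ],
                                "didYouMean": "housing prices",
                                "location": "nearText.concepts[0]",
                                "originalText": "houssing prices",
                            }
                        ]
                    },
                    "title": "...",
                }
            ]
        }
    },
    "errors": None,
}

for location, original, suggestion in spelling_suggestions(response, "Article"):
    print(f"{location}: did you mean '{suggestion}' instead of '{original}'?")
```

The helper returns an empty list when the response has no data or when no corrections were found, so callers can use it without extra checks.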

  2. It extends existing text2vec modules with an autocorrect flag, which can be used to correct a misspelled query in the background:

Example query

{
  Get {
    Article(nearText:{
      concepts: ["houssing prices"],
      autocorrect: true
    }) {
      title
      _additional{
        spellCheck{
          changes{
            corrected
            original
          }
          didYouMean
          location
          originalText
        }
      }
    }
  }
}
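
When the query is assembled in application code, the autocorrect flag can be exposed as a simple switch. The sketch below builds the query string shown above with or without the flag. The helper name near_text_query is hypothetical, and json.dumps is used on the assumption that JSON string escaping is acceptable for GraphQL string literals.

```python
import json


def near_text_query(class_name, concepts, autocorrect=False):
    """Build a Get query with a nearText argument and spellCheck output.

    Hypothetical helper; class_name is inserted verbatim, so it must be trusted input.
    """
    # json.dumps renders the list as ["...", "..."] with quotes escaped.
    near_text = "concepts: " + json.dumps(concepts)
    if autocorrect:
        near_text += ", autocorrect: true"
    return (
        "{ Get { " + class_name + "(nearText: {" + near_text + "}) { "
        "title _additional { spellCheck { "
        "changes { corrected original } didYouMean location originalText "
        "} } } } }"
    )


print(near_text_query("Article", ["houssing prices"], autocorrect=True))
```

The resulting string can be sent to the /v1/graphql endpoint in the same way as the first example query.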


More resources

If you can’t find the answer to your question here, please consult:

  1. The Frequently Asked Questions.
  2. The knowledge base of old issues.
  3. Stack Overflow, for questions.
  4. GitHub, for issues.
  5. The Slack channel, to ask your question.