Spell Check
In short
- The Spell Check module is a Weaviate module for spell checking of raw text in GraphQL queries.
- The module depends on a Python spellchecking library.
- The module adds a spellCheck {} filter to the GraphQL nearText {} search arguments.
- The module returns the spell check result in the GraphQL _additional { spellCheck {} } field.
Introduction
The Spell Check module is a Weaviate module that checks the spelling of raw text in GraphQL query inputs. Using the Python spellchecker library, the module analyzes the text, suggests a correction, and can optionally autocorrect the query.
How to enable (module configuration)
Docker Compose
The Spell Check module can be added as a service to the Docker Compose file. You must have a text vectorizer such as text2vec-contextionary or text2vec-transformers running. An example Docker Compose file that uses the text-spellcheck module with text2vec-contextionary is shown here:
---
services:
  weaviate:
    command:
    - --host
    - 0.0.0.0
    - --port
    - '8080'
    - --scheme
    - http
    image: cr.weaviate.io/semitechnologies/weaviate:1.26.4
    ports:
    - 8080:8080
    - 50051:50051
    restart: on-failure:0
    environment:
      CONTEXTIONARY_URL: contextionary:9999
      SPELLCHECK_INFERENCE_API: "http://text-spellcheck:8080"
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: 'text2vec-contextionary'
      ENABLE_MODULES: 'text2vec-contextionary,text-spellcheck'
      CLUSTER_HOSTNAME: 'node1'
  contextionary:
    environment:
      OCCURRENCE_WEIGHT_LINEAR_FACTOR: 0.75
      EXTENSIONS_STORAGE_MODE: weaviate
      EXTENSIONS_STORAGE_ORIGIN: http://weaviate:8080
      NEIGHBOR_OCCURRENCE_IGNORE_PERCENTILE: 5
      ENABLE_COMPOUND_SPLITTING: 'false'
    image: cr.weaviate.io/semitechnologies/contextionary:en0.16.0-v1.0.2
    ports:
    - 9999:9999
  text-spellcheck:
    image: cr.weaviate.io/semitechnologies/text-spellcheck-model:pyspellchecker-d933122
...
Variable explanations:
- SPELLCHECK_INFERENCE_API: the URL where the spellcheck module is running
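After starting the services, it can be useful to confirm that Weaviate has actually loaded the module. The following is a minimal sketch, not an official procedure. It assumes that the /v1/meta REST endpoint returns a JSON object whose "modules" key is keyed by module name, and that Weaviate is reachable at http://localhost:8080 as in the Compose file above. The function names and the sample payload are hypothetical.

```python
import json
from urllib.request import urlopen

SPELLCHECK_MODULE = "text-spellcheck"


def fetch_meta(base_url: str = "http://localhost:8080") -> dict:
    """Fetch the /v1/meta payload from a running Weaviate instance."""
    with urlopen(f"{base_url}/v1/meta") as response:
        return json.load(response)


def spellcheck_enabled(meta: dict) -> bool:
    """Return True if the module name appears under the payload's "modules" key.

    Assumption: "modules" is an object keyed by module name.
    """
    modules = meta.get("modules") or {}
    return SPELLCHECK_MODULE in modules


# Hypothetical payload, trimmed to the part this check relies on.
sample_meta = {"modules": {"text2vec-contextionary": {}, "text-spellcheck": {}}}
print(spellcheck_enabled(sample_meta))
```

Against a live instance, the check would be spellcheck_enabled(fetch_meta()). If it returns False, ENABLE_MODULES is the first setting to review.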
How to use (GraphQL)
Use the spellchecker module to verify at query time that user-provided search queries are spelled correctly, and to suggest alternative, correct spellings. Filters that accept query text include:
- nearText, if a text2vec-* module is used
- ask, if the qna-transformers module is enabled
There are two ways to use this module: spell checking, and autocorrection.
Spell checking
The module provides a new GraphQL _additional property which can be used to check (but not alter) the provided queries.
Example query
The query is shown first in GraphQL, followed by the equivalent Python, JS/TS (client v2), Go, Java, and curl versions.
{
  Get {
    Article(nearText: {
      concepts: ["houssing prices"]
    }) {
      title
      _additional {
        spellCheck {
          changes {
            corrected
            original
          }
          didYouMean
          location
          originalText
        }
      }
    }
  }
}
# Python client v3 syntax
import weaviate

client = weaviate.Client("http://localhost:8080")

# The misspelled query text to check
near_text = {
    "concepts": ["houssing prices"],
}

result = (
    client.query
    .get("Article", ["title", "_additional {spellCheck { changes {corrected original} didYouMean location originalText}}"])
    .with_near_text(near_text)
    .do()
)

print(result)
import weaviate from 'weaviate-ts-client';

const client = weaviate.client({
  scheme: 'http',
  host: 'localhost:8080',
});

// Request the spellCheck result alongside the title
const response = await client.graphql
  .get()
  .withClassName('Article')
  .withFields('title _additional { spellCheck { changes {corrected original} didYouMean location originalText } }')
  .withNearText({
    concepts: ['houssing prices'],
  })
  .do();
console.log(JSON.stringify(response, null, 2));
package main

import (
  "context"
  "fmt"

  "github.com/weaviate/weaviate-go-client/v4/weaviate"
  "github.com/weaviate/weaviate-go-client/v4/weaviate/graphql"
)

func main() {
  cfg := weaviate.Config{
    Host:   "localhost:8080",
    Scheme: "http",
  }
  client, err := weaviate.NewClient(cfg)
  if err != nil {
    panic(err)
  }

  className := "Article"
  // Request the spellCheck result alongside the title
  fields := []graphql.Field{
    {Name: "title"},
    {Name: "_additional", Fields: []graphql.Field{
      {Name: "spellCheck", Fields: []graphql.Field{
        {Name: "changes", Fields: []graphql.Field{
          {Name: "corrected"},
          {Name: "original"},
        }},
        {Name: "didYouMean"},
        {Name: "location"},
        {Name: "originalText"},
      }},
    }},
  }

  concepts := []string{"houssing prices"}
  nearText := client.GraphQL().NearTextArgBuilder().
    WithConcepts(concepts)

  ctx := context.Background()
  result, err := client.GraphQL().Get().
    WithClassName(className).
    WithFields(fields...).
    WithNearText(nearText).
    Do(ctx)
  if err != nil {
    panic(err)
  }
  fmt.Printf("%v", result)
}
package io.weaviate;

import io.weaviate.client.Config;
import io.weaviate.client.WeaviateClient;
import io.weaviate.client.base.Result;
import io.weaviate.client.v1.graphql.model.GraphQLResponse;
import io.weaviate.client.v1.graphql.query.argument.NearTextArgument;
import io.weaviate.client.v1.graphql.query.fields.Field;

public class App {
  public static void main(String[] args) {
    Config config = new Config("http", "localhost:8080");
    WeaviateClient client = new WeaviateClient(config);

    Field title = Field.builder().name("title").build();
    // Request the spellCheck result alongside the title
    Field _additional = Field.builder()
      .name("_additional")
      .fields(new Field[]{
        Field.builder()
          .name("spellCheck")
          .fields(new Field[]{
            Field.builder()
              .name("changes")
              .fields(new Field[]{
                Field.builder().name("corrected").build(),
                Field.builder().name("original").build()
              }).build(),
            Field.builder().name("didYouMean").build(),
            Field.builder().name("location").build(),
            Field.builder().name("originalText").build()
          }).build()
      }).build();

    NearTextArgument explore = client.graphQL().arguments().nearTextArgBuilder()
      .concepts(new String[]{ "houssing prices" })
      .build();

    Result<GraphQLResponse> result = client.graphQL().get()
      .withClassName("Article")
      .withFields(title, _additional)
      .withNearText(explore)
      .run();

    if (result.hasErrors()) {
      System.out.println(result.getError());
      return;
    }
    System.out.println(result.getResult());
  }
}
echo '{
  "query": "{
    Get {
      Article(nearText: {
        concepts: [\"houssing prices\"]
      }) {
        title
        _additional {
          spellCheck {
            changes {
              corrected
              original
            }
            didYouMean
            location
            originalText
          }
        }
      }
    }
  }"
}' | curl \
  -X POST \
  -H 'Content-Type: application/json' \
  -d @- \
  http://localhost:8080/v1/graphql
GraphQL response
The result is contained in a new GraphQL _additional property called spellCheck. It contains the following fields:
- changes: a list with the following fields:
  - corrected (string): the corrected spelling if a correction is found
  - original (string): the original word in the query
- didYouMean: the corrected full text in the query
- originalText: the original full text in the query
- location: the location of the misspelled string in the query
Example response
{
  "data": {
    "Get": {
      "Article": [
        {
          "_additional": {
            "spellCheck": [
              {
                "changes": [
                  {
                    "corrected": "housing",
                    "original": "houssing"
                  }
                ],
                "didYouMean": "housing prices",
                "location": "nearText.concepts[0]",
                "originalText": "houssing prices"
              }
            ]
          },
          "title": "..."
        }
      ]
    }
  },
  "errors": null
}
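Because spellCheck sits inside each object's _additional property, a query that returns several objects repeats the same check on every one of them. The sketch below shows one possible way to collect the suggestions once per query input, for example to drive a "did you mean" prompt. The function name and the shape of the returned dictionaries are illustrative choices, not part of the module. The sample data mirrors the example response above, with a second object added to show the de-duplication.

```python
def spelling_suggestions(response: dict, class_name: str) -> list:
    """Collect spellCheck entries from a Get response, once per query input."""
    data = response.get("data") or {}
    objects = (data.get("Get") or {}).get(class_name) or []
    seen = set()
    suggestions = []
    for obj in objects:
        checks = (obj.get("_additional") or {}).get("spellCheck") or []
        for check in checks:
            # The same check is repeated on every object, so key on the input.
            key = (check.get("location"), check.get("originalText"))
            if key in seen:
                continue
            seen.add(key)
            changes = check.get("changes") or []
            suggestions.append({
                "location": check.get("location"),
                "original_text": check.get("originalText"),
                "did_you_mean": check.get("didYouMean"),
                "changes": {c["original"]: c["corrected"] for c in changes},
            })
    return suggestions


# Sample data based on the example response, with two returned objects.
spell_check = [{
    "changes": [{"corrected": "housing", "original": "houssing"}],
    "didYouMean": "housing prices",
    "location": "nearText.concepts[0]",
    "originalText": "houssing prices",
}]
example_response = {
    "data": {"Get": {"Article": [
        {"_additional": {"spellCheck": spell_check}, "title": "..."},
        {"_additional": {"spellCheck": spell_check}, "title": "..."},
    ]}},
    "errors": None,
}

for suggestion in spelling_suggestions(example_response, "Article"):
    print(suggestion["location"], "->", suggestion["did_you_mean"])
```

Keying on location and originalText keeps one entry per checked input even if the query text appears in more than one place.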
Autocorrect
The module extends existing text2vec-* modules with an autocorrect flag, which can be used to automatically correct the query if it is misspelled:
Example query
{
  Get {
    Article(nearText: {
      concepts: ["houssing prices"],
      autocorrect: true
    }) {
      title
      _additional {
        spellCheck {
          changes {
            corrected
            original
          }
          didYouMean
          location
          originalText
        }
      }
    }
  }
}
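When the query text comes from user input, the GraphQL string is usually assembled in code. The following sketch builds the query above with the autocorrect flag as an option. The helper name and its parameters are hypothetical, and it uses json.dumps to quote the concepts on the assumption that JSON string escaping is acceptable inside a GraphQL string literal.

```python
import json

# The spellCheck fields requested in the examples on this page
SPELLCHECK_FIELDS = (
    "spellCheck { changes { corrected original } "
    "didYouMean location originalText }"
)


def build_near_text_query(class_name, concepts, properties, autocorrect=False):
    """Build a Get query with nearText and the spellCheck _additional field."""
    # json.dumps quotes and escapes each concept string
    near_text = f"concepts: {json.dumps(concepts)}"
    if autocorrect:
        near_text += ", autocorrect: true"
    props = " ".join(properties)
    return (
        "{ Get { "
        f"{class_name}(nearText: {{ {near_text} }}) "
        f"{{ {props} _additional {{ {SPELLCHECK_FIELDS} }} }} "
        "} }"
    )


query = build_near_text_query(
    "Article", ["houssing prices"], ["title"], autocorrect=True
)
# Request body for a POST to /v1/graphql, as in the curl example
payload = json.dumps({"query": query})
print(query)
```

Leaving autocorrect at its default of False produces the check-only query from the Spell checking section, so the same helper covers both modes.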
Questions and feedback
If you have any questions or feedback, let us know in the user forum.