    Summarization


    In short

    • The Summarization (SUM) module is a Weaviate module that summarizes whole paragraphs into a short text.
    • The module depends on a SUM Transformers model that should be running with Weaviate. There are pre-built models available, but you can also attach another HuggingFace Transformer or custom SUM model.
    • The module adds a summary {} filter to the GraphQL _additional {} field.
    • The module returns the results in the GraphQL _additional { summary {} } field.

    Introduction

    The Summarization module is a Weaviate module that is used to summarize Weaviate text objects at query time.

    For example, it allows us to run a query on our data in Weaviate, which can take a text like this:

    "The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building, and the tallest structure in Paris. Its base is square, measuring 125 metres (410 ft) on each side. During its construction, the Eiffel Tower surpassed the Washington Monument to become the tallest man-made structure in the world, a title it held for 41 years until the Chrysler Building in New York City was finished in 1930. It was the first structure to reach a height of 300 metres. Due to the addition of a broadcasting aerial at the top of the tower in 1957, it is now taller than the Chrysler Building by 5.2 metres (17 ft). Excluding transmitters, the Eiffel Tower is the second tallest free-standing structure in France after the Millau Viaduct."

    and transform it to a short sentence like this:

    "The Eiffel Tower is a landmark in Paris, France."

    Note: for maximum query performance, transformer-based models should run on GPUs.

    CPUs can be used, but this will significantly slow down your queries.

    Available modules

    Here is the current list of available SUM modules - sourced from Huggingface:

    How to enable (module configuration)

    Docker-compose

    The SUM module can be added as a service to the Docker-compose file. You must also have a text vectorizer such as text2vec-contextionary or text2vec-transformers running. The following example Docker-compose file uses the sum-transformers module (facebook-bart-large-cnn) in combination with text2vec-contextionary:

    ---
    version: '3.4'
    services:
      weaviate:
        command:
        - --host
        - 0.0.0.0
        - --port
        - '8080'
        - --scheme
        - http
        image: semitechnologies/weaviate:1.15.2
        ports:
        - 8080:8080
        restart: on-failure:0
        environment:
          CONTEXTIONARY_URL: contextionary:9999
          SUM_INFERENCE_API: "http://sum-transformers:8080"
          QUERY_DEFAULTS_LIMIT: 25
          AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
          PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
          DEFAULT_VECTORIZER_MODULE: 'text2vec-contextionary'
          ENABLE_MODULES: 'text2vec-contextionary,sum-transformers'
          CLUSTER_HOSTNAME: 'node1'
      contextionary:
        environment:
          OCCURRENCE_WEIGHT_LINEAR_FACTOR: 0.75
          EXTENSIONS_STORAGE_MODE: weaviate
          EXTENSIONS_STORAGE_ORIGIN: http://weaviate:8080
          NEIGHBOR_OCCURRENCE_IGNORE_PERCENTILE: 5
          ENABLE_COMPOUND_SPLITTING: 'false'
        image: semitechnologies/contextionary:en0.16.0-v1.0.2
        ports:
        - 9999:9999
      sum-transformers:
        image: semitechnologies/sum-transformers:facebook-bart-large-cnn-1.0.0
    ...
    

    Variable explanations:

    • SUM_INFERENCE_API: the URL where the summarization module is running (in the example above, the sum-transformers service)
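
    To confirm that the module was picked up after starting the stack, one option is to inspect Weaviate's /v1/meta endpoint. The following is a minimal sketch, assuming the response body lists enabled modules as keys of a "modules" object and that Weaviate is reachable at http://localhost:8080 as in the Docker-compose example. The helper names enabled_modules and fetch_meta are hypothetical and not part of any client library.

```python
import json
from urllib.request import urlopen


def enabled_modules(meta):
    """Return the sorted module names found in a /v1/meta response body.

    Assumption: enabled modules appear as keys of the "modules" object.
    """
    return sorted((meta.get("modules") or {}).keys())


def fetch_meta(base_url="http://localhost:8080"):
    """Fetch /v1/meta from a running Weaviate instance (requires network access)."""
    with urlopen(f"{base_url}/v1/meta") as response:
        return json.load(response)
```

    Calling enabled_modules(fetch_meta()) against the running stack and checking that "sum-transformers" is in the returned list is a quick way to catch a missing entry in ENABLE_MODULES.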

    How to use (GraphQL)

    To make use of the module's capabilities, extend your query with the following new _additional property:

    GraphQL Token

    This module adds a search filter to the GraphQL _additional field in queries: summary{}. This new filter takes the following arguments:

    Field      | Data type       | Required | Example value   | Description
    properties | list of strings | yes      | ["description"] | The properties of the queried class that contain text (text or string data type). You must provide at least one property.
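
    When the list of properties is only known at runtime, the summary selection can be assembled programmatically. The sketch below is illustrative only: summary_field and build_query are hypothetical helpers, not part of the Weaviate client, and they assume plain property names that need no special escaping.

```python
import json


def summary_field(properties):
    """Build the `_additional { summary(...) }` selection for the given properties."""
    if not properties:
        # The module requires at least one property to summarize.
        raise ValueError("summary requires at least one property")
    # json.dumps emits a double-quoted list, which is also valid GraphQL
    # list syntax for plain property names.
    props = json.dumps(list(properties))
    return f"_additional {{ summary(properties: {props}) {{ property result }} }}"


def build_query(class_name, fields, properties, limit=1):
    """Build a complete Get query that also requests summaries."""
    selection = " ".join(list(fields) + [summary_field(properties)])
    return f"{{ Get {{ {class_name}(limit: {limit}) {{ {selection} }} }} }}"
```

    For example, build_query("Article", ["title"], ["summary"]) produces a single-line query equivalent to the example query in the next section.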

    Example query

    GraphQL:

    {
      Get {
        Article(
          limit: 1
        ) {
          title
          _additional {
            summary(
              properties: ["summary"]
            ) {
              property
              result
            }
          }
        }
      }
    }
    
    Python:

    import weaviate
    
    client = weaviate.Client("http://localhost:8080")
    
    result = (
      client.query
      .get('Article', ['title', '_additional {summary ( properties: ["summary"]) { property result }}'])
      .do()
    )
    
    print(result)
    
    JavaScript:

    const weaviate = require("weaviate-client");
    
    const client = weaviate.client({
      scheme: 'http',
      host: 'localhost:8080',
    });
    
    client.graphql
          .get()
          .withClassName('Article')
          .withFields('title _additional { summary(properties: ["summary"]) { property result } }')
          .do()
          .then(res => {
            console.log(res)
          })
          .catch(err => {
            console.error(err)
          });
    
    Go:

    package main
    
    import (
      "context"
      "fmt"
    
      "github.com/semi-technologies/weaviate-go-client/v4/weaviate"
      "github.com/semi-technologies/weaviate-go-client/v4/weaviate/graphql"
    )
    
    func main() {
      cfg := weaviate.Config{
        Host:   "localhost:8080",
        Scheme: "http",
      }
      client := weaviate.New(cfg)
    
      className := "Article"
      fields := []graphql.Field{
        {Name: "title"},
        {Name: "_additional", Fields: []graphql.Field{
          {Name: "summary(properties: [\"summary\"])", Fields: []graphql.Field{
            {Name: "property"},
            {Name: "result"},
          }},
        }},
      }
    
      result, err := client.GraphQL().Get().
        WithClassName(className).
        WithFields(fields...).
        Do(context.Background())
    
      if err != nil {
        panic(err)
      }
      fmt.Printf("%v", result)
    }
    
    Java:

    package technology.semi.weaviate;
    
    import technology.semi.weaviate.client.Config;
    import technology.semi.weaviate.client.WeaviateClient;
    import technology.semi.weaviate.client.base.Result;
    import technology.semi.weaviate.client.v1.graphql.model.GraphQLResponse;
    import technology.semi.weaviate.client.v1.graphql.query.fields.Field;
    
    public class App {
      public static void main(String[] args) {
        Config config = new Config("http", "localhost:8080");
        WeaviateClient client = new WeaviateClient(config);
    
        Field title = Field.builder().name("title").build();
        Field _additional = Field.builder()
          .name("_additional")
          .fields(new Field[]{
            Field.builder()
              .name("summary (properties: [\"summary\"])")
              .fields(new Field[]{
                Field.builder().name("property").build(),
                Field.builder().name("result").build()
              }).build()
          }).build();
    
        Result<GraphQLResponse> result = client.graphQL().get()
          .withClassName("Article")
          .withFields(title, _additional)
          .run();
    
        if (result.hasErrors()) {
          System.out.println(result.getError());
          return;
        }
        System.out.println(result.getResult());
      }
    }
    
    Curl:

    $ echo '{
      "query": "{
        Get {
          Article(
            limit: 1
          ) {
            title
            _additional{
              summary(
                properties: [\"summary\"],
              ) {
                property
                result
              }
            }
          }
        }
      }"
    }' | tr -d "\n" | curl \
        -X POST \
        -H 'Content-Type: application/json' \
        -d @- \
        http://localhost:8080/v1/graphql
    

    GraphQL response

    The answer is contained in a new GraphQL _additional property called summary, which returns a list with one entry per summarized property. Each entry contains the following fields:

    • property (string): The property that was summarized – this is useful when you summarize more than one property
    • result (string): The output summary

    Example response

    {
      "data": {
        "Get": {
          "Article": [
            {
              "_additional": {
                "summary": [
                  {
                    "property": "summary",
                    "result": "Finding the perfect pair of jeans can be a challenge."
                  }
                ]
              },
              "title": "The Most Comfortable Gap Jeans to Shop Now"
            }
          ]
        }
      },
      "errors": null
    }
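
    A response like the one above can be reduced to one property-to-summary mapping per object. This is a minimal sketch that assumes the response has already been decoded into a Python dict (JSON null becomes None); extract_summaries is a hypothetical helper name.

```python
def extract_summaries(response, class_name):
    """Return one {property: summary} dict per object in a Get response."""
    if response.get("errors"):
        raise RuntimeError(f"GraphQL errors: {response['errors']}")
    objects = response["data"]["Get"][class_name]
    return [
        {item["property"]: item["result"] for item in obj["_additional"]["summary"]}
        for obj in objects
    ]


# The example response from above, decoded into Python values.
example_response = {
    "data": {
        "Get": {
            "Article": [
                {
                    "_additional": {
                        "summary": [
                            {
                                "property": "summary",
                                "result": "Finding the perfect pair of jeans can be a challenge.",
                            }
                        ]
                    },
                    "title": "The Most Comfortable Gap Jeans to Shop Now",
                }
            ]
        }
    },
    "errors": None,
}

summaries = extract_summaries(example_response, "Article")
```

    Keying by property name keeps the results easy to read when more than one property is summarized in a single query.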
    

    Use another Summarization module from HuggingFace

    You can build a Docker image that supports any summarization model from the Hugging Face model hub with a short Dockerfile. In the following example, we build a custom image for the google/pegasus-pubmed model.

    Step 1: Create a Dockerfile

    Create a new Dockerfile. We will name it my-model.Dockerfile. Add the following lines to it:

    FROM semitechnologies/sum-transformers:custom
    RUN chmod +x ./download.py
    RUN MODEL_NAME=google/pegasus-pubmed ./download.py
    

    Step 2: Build and tag your image

    We will tag our image as google-pegasus-pubmed:

    docker build -f my-model.Dockerfile -t google-pegasus-pubmed .
    

    Step 3: That’s it!

    You can now push your image to your favorite registry or reference it locally in your Weaviate docker-compose.yaml using the Docker tag google-pegasus-pubmed.

    How it works (under the hood)

    The code for the application in this repo works well with models that take in a text input like:

    But, similar to finding a bathing suit that fits in all the right places, discovering a new pair of perfectly cut and comfortable jeans can be thrilling. The Gap jeans are as classic as they come and they give me that figure-hugging fit without cutting off my circulation. At the moment, there are TikTok videos circulating about these jeans, making me one member of an ever-growing fan club. While good jeans are priceless, these bonafide confidence boosters are also now on sale for $63. Trust me, there’s no time like the present to break up with those all day, every day sweatpants and slip on some hero denim.

    then summarize it and return information in JSON format like this:

    [
      {
        "result": "Finding the perfect pair of jeans can be a challenge."
      }
    ]
    

    The Weaviate SUM module then takes this output and converts it into the GraphQL response.
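
    The last step can be pictured as pairing each property name with the result returned for its text. The sketch below only illustrates that mapping and is not the module's actual code; to_graphql_summary is a hypothetical name, and the input shape (a dict from property name to the inference output shown above) is an assumption.

```python
def to_graphql_summary(summarized):
    """Turn {property: inference_output} into the list returned under `summary`.

    `summarized` maps each property name to the JSON the inference
    container returned for that property's text, e.g. [{"result": "..."}].
    """
    return [
        {"property": prop, "result": item["result"]}
        for prop, output in summarized.items()
        for item in output
    ]


inference_output = [
    {"result": "Finding the perfect pair of jeans can be a challenge."}
]
graphql_summary = to_graphql_summary({"summary": inference_output})
```

    The resulting list has the same shape as the summary field in the example response: one entry with a property and a result.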

    More resources

    If you can’t find the answer to your question here, please look at the:

    1. Frequently Asked Questions. Or,
    2. Knowledge base of old issues. Or,
    3. For questions: Stackoverflow. Or,
    4. For issues: Github. Or,
    5. Ask your question in the Slack channel: Slack.