The solution to TL;DRs - Weaviate's summarizer module

February 28, 2023 · 18 min read

Educator

(Note: You can skip to the TL;DR version below 😉)

How often do you find yourself facing a wall of text in an email, a report, or a paper, and letting out a sigh? Nobody enjoys hacking their way through boring, dense prose. Especially if it's just to see if the information is even relevant.

In this day and age, this is a more common problem than ever. For a while now, the bottleneck in knowledge work has been our rate of information discovery and consumption. So how do we solve this problem?

You probably already know that Weaviate, as a vector database, can help with information cataloging and discovery. But did you know that Weaviate can also summarize information during retrieval?

Our summarizer module (sum-transformers) can be added to a Weaviate instance to do exactly that.

And as a bonus, we will also show you how to use our new generative module (generative-openai) to do the same thing as well.

By the end, you will see how you can use Weaviate to reduce the amount of TL;DRs (too long\; did not read) in your life and in the lives of those around you.

`sum-transformers` in action

Since we’re talking about reducing TL;DRs - let’s cut to the chase. The sum-transformers module does one thing - summarize a piece of text into shorter text. For example, it will produce a pithy, to-the-point summary like this:

The Sydney Opera House is a multi-venue performing arts centre in Sydney, Australia. The building was designed by Danish architect Jørn Utzon and opened in 1973. It is one of the most popular visitor attractions in Australia, visited by more than eight million people annually.

From an original text that has about 7x the length!

See original text

The Sydney Opera House is a multi-venue performing arts centre in Sydney. Located on the foreshore of Sydney Harbour, it is widely regarded as one of the world's most famous and distinctive buildings and a masterpiece of 20th-century architecture. Designed by Danish architect Jørn Utzon, but completed by an Australian architectural team headed by Peter Hall, the building was formally opened by Queen Elizabeth II on 20 October 1973 after a gestation beginning with Utzon's 1957 selection as winner of an international design competition. The Government of New South Wales, led by the premier, Joseph Cahill, authorised work to begin in 1958 with Utzon directing construction. The government's decision to build Utzon's design is often overshadowed by circumstances that followed, including cost and scheduling overruns as well as the architect's ultimate resignation. The building and its surrounds occupy the whole of Bennelong Point on Sydney Harbour, between Sydney Cove and Farm Cove, adjacent to the Sydney central business district and the Royal Botanic Gardens, and near to the Sydney Harbour Bridge.

The building comprises multiple performance venues, which together host well over 1,500 performances annually, attended by more than 1.2 million people. Performances are presented by numerous performing artists, including three resident companies: Opera Australia, the Sydney Theatre Company and the Sydney Symphony Orchestra. As one of the most popular visitor attractions in Australia, the site is visited by more than eight million people annually, and approximately 350,000 visitors take a guided tour of the building each year. The building is managed by the Sydney Opera House Trust, an agency of the New South Wales State Government.

On 28 June 2007, the Sydney Opera House became a UNESCO World Heritage Site, having been listed on the (now defunct) Register of the National Estate since 1980, the National Trust of Australia register since 1983, the City of Sydney Heritage Inventory since 2000, the New South Wales State Heritage Register since 2003, and the Australian National Heritage List since 2005. The Opera House was also a finalist in the New7Wonders of the World campaign list.

Here are other examples, where the module produced summaries of biographical, mythical, and technical information:

Lewis Hamilton (80% reduction)

Summarized text

Sir Lewis Carl Davidson Hamilton (born 7 January 1985) is a British racing driver. In Formula One, Hamilton has won a joint-record seven World Drivers' Championship titles (tied with Michael Schumacher), and holds the records for the most wins (103), pole positions (103) and podium finishes (191) Hamilton joined the McLaren young driver programme in 1998 at the age of 13, becoming the youngest racing driver ever to be contracted by a Formula One team. After six years with McLaren, Hamilton signed with Mercedes in 2013.

Original text

Sir Lewis Carl Davidson Hamilton (born 7 January 1985) is a British racing driver currently competing in Formula One, driving for Mercedes-AMG Petronas Formula One Team. In Formula One, Hamilton has won a joint-record seven World Drivers' Championship titles (tied with Michael Schumacher), and holds the records for the most wins (103), pole positions (103), and podium finishes (191), among others.

Born and raised in Stevenage, Hertfordshire, Hamilton joined the McLaren young driver programme in 1998 at the age of 13, becoming the youngest racing driver ever to be contracted by a Formula One team. This led to a Formula One drive with McLaren for six years from 2007 to 2012, making Hamilton the first black driver to race in the series. In his inaugural season, Hamilton set numerous records as he finished runner-up to Kimi Räikkönen by one point. The following season, he won his maiden title in dramatic fashion—making a crucial overtake at the last corner on the last lap of the last race of the season—to become the then-youngest Formula One World Champion in history. After six years with McLaren, Hamilton signed with Mercedes in 2013.

Changes to the regulations for 2014 mandating the use of turbo-hybrid engines saw the start of a highly successful period for Hamilton, during which he won six further drivers' titles. Consecutive titles came in 2014 and 2015 during an intense rivalry with teammate Nico Rosberg. Following Rosberg's retirement in 2016, Ferrari's Sebastian Vettel became Hamilton's closest rival in two championship battles, in which Hamilton twice overturned mid-season point deficits to claim consecutive titles again in 2017 and 2018. His third and fourth consecutive titles followed in 2019 and 2020 to equal Schumacher's record of seven drivers' titles. Hamilton achieved his 100th pole position and race win during the 2021 season.

Hamilton has been credited with furthering Formula One's global following by appealing to a broader audience outside the sport, in part due to his high-profile lifestyle, environmental and social activism, and exploits in music and fashion. He has also become a prominent advocate in support of activism to combat racism and push for increased diversity in motorsport. Hamilton was the highest-paid Formula One driver from 2013 to 2021, and was ranked as one of the world's highest-paid athletes by Forbes of twenty-tens decade and 2021. He was also listed in the 2020 issue of Time as one of the 100 most influential people globally, and was knighted in the 2021 New Year Honours. Hamilton was granted honorary Brazilian citizenship in 2022.

The Loch Ness Monster (52% reduction)

Summarized text

The Loch Ness Monster is said to be a large, long-necked creature. Popular belief in the creature has varied since it was brought to worldwide attention in 1933. Evidence of its existence is disputed, with a number of disputed photographs and sonar readings. The pseudoscience and subculture of cryptozoology has placed particular emphasis on the creature.

Original text

The Loch Ness Monster (Scottish Gaelic: Uilebheist Loch Nis), affectionately known as Nessie, is a creature in Scottish folklore that is said to inhabit Loch Ness in the Scottish Highlands. It is often described as large, long-necked, and with one or more humps protruding from the water. Popular interest and belief in the creature has varied since it was brought to worldwide attention in 1933. Evidence of its existence is anecdotal, with a number of disputed photographs and sonar readings.

The scientific community explains alleged sightings of the Loch Ness Monster as hoaxes, wishful thinking, and the misidentification of mundane objects. The pseudoscience and subculture of cryptozoology has placed particular emphasis on the creature.

Bitmap Indexes (79% reduction)

Summarized text

A bitmap index is a special kind of database index that uses bitmaps. Bitmap indexes have a significant space and performance advantage over other structures for query of such data. Their drawback is they are less efficient than the traditional B-tree indexes for columns whose data is frequently updated.

Original text

Bitmap indexes have traditionally been considered to work well for low-cardinality columns, which have a modest number of distinct values, either absolutely, or relative to the number of records that contain the data. The extreme case of low cardinality is Boolean data (e.g., does a resident in a city have internet access?), which has two values, True and False. Bitmap indexes use bit arrays (commonly called bitmaps) and answer queries by performing bitwise logical operations on these bitmaps. Bitmap indexes have a significant space and performance advantage over other structures for query of such data. Their drawback is they are less efficient than the traditional B-tree indexes for columns whose data is frequently updated: consequently, they are more often employed in read-only systems that are specialized for fast query - e.g., data warehouses, and generally unsuitable for online transaction processing applications.

Some researchers argue that bitmap indexes are also useful for moderate or even high-cardinality data (e.g., unique-valued data) which is accessed in a read-only manner, and queries access multiple bitmap-indexed columns using the AND, OR or XOR operators extensively. Bitmap indexes are also useful in data warehousing applications for joining a large fact table to smaller dimension tables such as those arranged in a star schema.

If you examine these summaries, you will notice that these sentences are not lifted verbatim from the original text. Instead, what is produced is an abstractive summary, which is newly produced based on the original text.

The summarization module achieves this at query time by passing on the text that is retrieved from Weaviate to a language model that is trained specifically for summarization.

This means that you can set up your Weaviate instance to not only retrieve the most relevant query results for you but take the next step and add an overview of each retrieved object.

Summarized documents are easier to understand

So, in a parallel fashion to the generative module, you can get back the information stored in Weaviate and then some.

Instead of this:

{
  "data": {
    "Get": {
      "Article": [
        {
          "title": "Sydney Opera House",
          "url": "https://en.wikipedia.org/wiki/Sydney_Opera_House",
          "wiki_summary": ...
        },
        ...
      ]
    }
  }
}

You can get back:

{
  "data": {
    "Get": {
      "Article": [
        {
          "_additional": {
            "summary": [
              {
                "property": "wiki_summary",
                "result": "<GENERATED_SUMMARY>"
              }
            ]
          },
          "title": "Sydney Opera House",
          "url": "https://en.wikipedia.org/wiki/Sydney_Opera_House",
          "wiki_summary": ...
        },
        ...
      ]
    }
  }
}

Where the <GENERATED_SUMMARY> is actually not something that is stored in Weaviate!

Here is how:

Using `sum-transformers`

The required steps to make use of this module are as follows:

Enable the module in your Docker Compose file (e.g. docker-compose.yml).

Add this to your query:

_additional { summary ( properties: ["<FIELD_TO_BE_SUMMARIZED>"]) { property result } }

Parse the results!

That's all there is to it. Let's take a look at each step in more detail.

Configuration (`docker-compose.yml`) file

Use the sum-transformers module in addition to another vectorizer module, such as text2vec-transformers, or an inference-API-based model such as text2vec-openai/cohere/huggingface.

Accordingly, the relevant lines are:

---
services:
  weaviate:
    ...
    environment:
      ...
      ENABLE_MODULES: 'text2vec-contextionary,sum-transformers'
  ...
  sum-transformers:
    image: cr.weaviate.io/semitechnologies/sum-transformers:facebook-bart-large-cnn-1.0.0
    # image: cr.weaviate.io/semitechnologies/sum-transformers:google-pegasus-xsum-1.2.0  # Alternative model
    environment:
      ENABLE_CUDA: '0'
...

Notice that the ENABLE_MODULES variable includes sum-transformers, and there is a sum-transformers section that specifies the Docker image to be used, specifies the summarizer model, and whether to use CUDA (GPU) acceleration.

Example of an entire configuration yaml

---
services:
  weaviate:
    command:
    - --host
    - 0.0.0.0
    - --port
    - '8080'
    - --scheme
    - http
    image: cr.weaviate.io/semitechnologies/weaviate:1.17.4
    ports:
    - 8080:8080
    restart: on-failure:0
    environment:
      CONTEXTIONARY_URL: contextionary:9999
      SUM_INFERENCE_API: 'http://sum-transformers:8080'
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: 'text2vec-contextionary'
      ENABLE_MODULES: 'text2vec-contextionary,sum-transformers'
      CLUSTER_HOSTNAME: 'node1'
  contextionary:
    environment:
      OCCURRENCE_WEIGHT_LINEAR_FACTOR: 0.75
      EXTENSIONS_STORAGE_MODE: weaviate
      EXTENSIONS_STORAGE_ORIGIN: http://weaviate:8080
      NEIGHBOR_OCCURRENCE_IGNORE_PERCENTILE: 5
      ENABLE_COMPOUND_SPLITTING: 'false'
    image: cr.weaviate.io/semitechnologies/contextionary:en0.16.0-v1.0.2
    ports:
    - 9999:9999
  sum-transformers:
    image: cr.weaviate.io/semitechnologies/sum-transformers:facebook-bart-large-cnn-1.0.0
    # image: cr.weaviate.io/semitechnologies/sum-transformers:google-pegasus-xsum-1.2.0  # Alternative model
    environment:
      ENABLE_CUDA: '0'
...

With these variables specified, an instance of Weaviate can be spun up with the summarizer module enabled. Next, it's a matter of making queries:

For Weaviate Cloud - use generative-openai

At the time of writing, Weaviate Cloud (WCD) instances do not support the sum-transformers module.

However, you can perform summarizations with the generative-openai module, by providing a specific prompt. Take a look below 😉.

Results with summaries

Summaries are available at query time. In other words, they are not pre-determined but generated from the results retrieved by the sum-transformers model.

This can be triggered as an additional property, with a GraphQL query syntax like so:

{
  Get {
    <CLASS_NAME> {
      _additional {
        summary (
          properties: ["<FIELD_TO_BE_SUMMARIZED>"],
        ) {
          property
          result
        }
      }
    }
  }
}

Where the _additional { summary ... } instructs Weaviate to carry out the summarization.

In this syntax, properties specifies the fields to be summarized, where each summary will include a property that echoes the field name, and result is the summarized text output.

Thus, it will produce a result like:

{
  "data": {
    "Get": {
      "Article": [
        {
          "_additional": {
            "summary": [
              {
                "property": "wiki_summary",
                "result": "The Loch Ness Monster is said to be a large, long-necked creature. Popular belief in the creature has varied since it was brought to worldwide attention in 1933. Evidence of its existence is disputed, with a number of disputed photographs and sonar readings. The pseudoscience and subculture of cryptozoology has placed particular emphasis on the creature."
              }
            ]
          },
          "title": "Loch Ness Monster"
        },
      ]
    }
  }
}

Note that in this example, Article was the class name, wiki_summary was the input text and the query included the title field also.

See the full GraphQL query

{
  Get {
    Article (
      limit: 1
      where: {
        path: ["title"],
        operator: Like,
        valueText: "loch ness"
      }
      ) {
      title
      _additional {
        summary (
          properties: ["wiki_summary"],
        ) {
          property
          result
        }
      }
    }
  }
}

Grabbing these results is relatively straightforward. But as with many tasks that involve language models, there are a few tips and tricks to help you get the most out of this wonderful tool.

Alternative: `generative-openai`

You might have heard of our new generative-openai module already. It can take your results, and provide a prompt to a generative model to get back a response.

So, this module can also be used to generate summaries by prompting it directly to do so. Below is one such example:

{
  Get {
    Article(
      nearText: {
        concepts: ["Bitmap Index"]
      }
      limit: 1
    ) {
      title
      wiki_summary
      _additional {
        generate(
          singleResult: {
            prompt: """
              Describe the following as a short summary: {wiki_summary}
            """
          }
        ) {
          singleResult
          error
        }
      }
    }
  }
}

This produced:

A bitmap index is a special kind of database index that uses bitmaps to efficiently store and query low-cardinality data. It is often used in read-only systems such as data warehouses, and is less efficient than traditional B-tree indexes for frequently updated data. Some researchers argue that bitmap indexes can also be used for moderate or high-cardinality data, and are useful for joining large fact tables to smaller dimension tables in data warehousing applications.

From the original text below:

See original text

A bitmap index is a special kind of database index that uses bitmaps.

While this is not a custom-trained summarization model, this may be a great alternative solution where sum-transformers module cannot be used, or where you may wish to leverage flexible, large language models that are only available through APIs.

Best practice notes

GPU usage

The sum-transformers module is configured to spin up a Docker container on your system to carry out the inference (i.e. summarization) task. While inference tasks are far less resource intensive than training, they are still non-trivial.

It will run much faster on systems that support GPU acceleration with CUDA. CPUs may be fine for occasional or private, evaluation use, but for any kind of deployment, we highly recommend utilizing GPU acceleration.

Model choice

Currently, the sum-transformers module uses the bart-large-cnn model under the hood by default, with an option for the pegasus-xsum model. Both of these are well-known, high-performance models trained by Facebook and Google respectively.

In addition to these two models, however, you can use any model from the Hugging Face Hub (or your own) by following this guide.

Even when looking only at language models that are trained for summarization tasks, there is still a wide range of choices in terms of sheer numbers, which vary in the target domain (e.g. medical, legal, scientific, etc.) and size (number of parameters, i.e. speed). If you have specific needs, we recommend investigating other models.

Avoid too long an input

All transformer models have a maximum input length size. For example, bart-large-cnn has a maximum limit of 1024 tokens, where each token is part of a word (i.e. a few characters).

Inputting too long a text into the summarizer module will cause it to throw an error. Accordingly, if you envisage generating summaries using sum-transformers, we recommend considering chunking your data at import time to make sure that the content of each field is shorter than the maximum length.

But also... not too short

The Goldilocks Principle also applies here, where too short an input may cause the summarizer model to misbehave (i.e. hallucinate).

When fed insufficient input, the model may "pad" the output with tokens that have very little grounding in the truth. Generally speaking, an input length that is shorter than a typical "summary" output length by the model is inadvisable.

Do more with less

All in all, summarizing your data with Weaviate can be another valuable tool in your toolkit to reduce TL;DRs in our lives. By combining the power of vector search with summarization capabilities, Weaviate offers a practical, easy solution to the problem of information overload.

So instead of slashing your way through thickets of prose to look for the information you need, work smarter! Whether it's with generate-openai or sum-transformers - let Weaviate lift you up to give you a bird's eye view of the whole landscape.

Bonus: TL;DR version (edited):

Weaviate can also summarize information during retrieval through the use of its summarizer module, sum-transformers.

This module can shorten a piece of text into a pithy, to-the-point summary by passing the text retrieved from Weaviate to a language model trained specifically for summarization.

By using Weaviate to summarize your data, you can reduce the amount of too long; did not read (TL;DR) content in your life and reduce the problem of information overload. The sum-transformers module uses the bart-large-cnn model by default, with an option for the pegasus-xsum model, but any model from Hugging Face Hub can be used.

Ready to start building?

Check out the Quickstart tutorial, or build amazing apps with a free trial of Weaviate Cloud (WCD).

GitHub

Forum

Slack

X (Twitter)

Don't want to miss another blog post?

By submitting, I agree to the Terms of Service and Privacy Policy.

sum-transformers in action​

Original text​

Original text​

Original text​

Using sum-transformers​

Configuration (docker-compose.yml) file​

Results with summaries​

Alternative: generative-openai​

Best practice notes​

GPU usage​

Model choice​

Avoid too long an input​

But also... not too short​

Do more with less​

Bonus: TL;DR version (edited):​

Ready to start building?​

Don't want to miss another blog post?

`sum-transformers` in action

Original text

Original text

Original text

Using `sum-transformers`

Configuration (`docker-compose.yml`) file

Results with summaries

Alternative: `generative-openai`

Best practice notes

GPU usage

Model choice

Avoid too long an input

But also... not too short

Do more with less

Bonus: TL;DR version (edited):

Ready to start building?