# text2vec-transformers

## Overview

The `text2vec-transformers` module enables Weaviate to obtain vectors locally from text using a transformers-based model.

`text2vec-transformers` encapsulates models in Docker containers, which allows independent scaling on GPU-enabled hardware while keeping Weaviate on CPU-only hardware, as Weaviate is CPU-optimized.
Key notes:

- This module is not available on Weaviate Cloud Services (WCS).
- Enabling this module will enable the `nearText` search operator.
- This module is only compatible with models encapsulated in a Docker container.
- Pre-built images are available with popular models.
- You can also use other models:
  - By building an image for any publicly available model from the Hugging Face model hub.
  - By building an image for any model compatible with Hugging Face's `AutoModel` and `AutoTokenizer`.
Transformer model inference is usually about ten times faster on GPUs. If you have a GPU, use one of the GPU-enabled images.

If you use `text2vec-transformers` without GPU acceleration, imports and `nearText` queries may become bottlenecks. The ONNX-enabled images can use ONNX Runtime for faster inference on CPUs; look for the `-onnx` suffix in the image name.

Alternatively, consider one of the following options:

- an API-based module such as `text2vec-cohere` or `text2vec-openai`
- a local inference container such as `text2vec-contextionary` or `text2vec-gpt4all`
## Weaviate instance configuration

This module is not available on Weaviate Cloud Services.

### Docker Compose file

To use `text2vec-transformers`, you must enable it in your Docker Compose file (e.g. `docker-compose.yml`). While you can do so manually, we recommend using the Weaviate configuration tool to generate the Docker Compose file.
### Parameters

Weaviate:

- `ENABLE_MODULES` (Required): The modules to enable. Include `text2vec-transformers` to enable the module.
- `DEFAULT_VECTORIZER_MODULE` (Optional): The default vectorizer module. You can set this to `text2vec-transformers` to make it the default for all collections.
- `TRANSFORMERS_INFERENCE_API` (Required): The URL of the default inference container.
- `USE_SENTENCE_TRANSFORMERS_VECTORIZER` (Optional, EXPERIMENTAL): Use the `sentence-transformers` vectorizer instead of the default vectorizer (from the `transformers` library). Applies to custom images only.
Inference container:

As of Weaviate `v1.24.2`, you can use multiple inference containers with `text2vec-transformers`. This allows you to use different models for different collections by setting the `inferenceUrl` in the collection configuration.

- `image` (Required): The image name of the inference container.
- `ENABLE_CUDA` (Optional): Set to `1` to enable GPU usage. Default is `0` (CPU only).
### Example

This configuration enables `text2vec-transformers`, sets it as the default vectorizer, and sets the parameters for the transformers Docker container, including using the `sentence-transformers-multi-qa-MiniLM-L6-cos-v1` image and disabling CUDA acceleration.
```yaml
version: '3.4'
services:
  weaviate:
    image: cr.weaviate.io/semitechnologies/weaviate:1.24.4
    restart: on-failure:0
    ports:
      - 8080:8080
      - 50051:50051
    environment:
      QUERY_DEFAULTS_LIMIT: 20
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: "./data"
      ENABLE_MODULES: text2vec-transformers
      DEFAULT_VECTORIZER_MODULE: text2vec-transformers
      TRANSFORMERS_INFERENCE_API: http://t2v-transformers:8080
      CLUSTER_HOSTNAME: 'node1'
  t2v-transformers:  # Set the name of the inference container
    image: cr.weaviate.io/semitechnologies/transformers-inference:sentence-transformers-multi-qa-MiniLM-L6-cos-v1
    environment:
      ENABLE_CUDA: 0  # set to 1 to enable
  # Set additional inference containers here if desired
  # ...
```
Make sure to enable CUDA (`ENABLE_CUDA=1`) if you have a compatible GPU available, to take advantage of GPU acceleration.
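As of Weaviate `v1.24.2`, you can also define several inference containers side by side and point individual collections at them via `inferenceUrl` (see the collection configuration section below). A minimal sketch; the second service name `t2v-transformers-mpnet` is a hypothetical example:

```yaml
  t2v-transformers:
    image: cr.weaviate.io/semitechnologies/transformers-inference:sentence-transformers-multi-qa-MiniLM-L6-cos-v1
    environment:
      ENABLE_CUDA: 0
  # Hypothetical second container, reachable from collections that set
  # "inferenceUrl": "http://t2v-transformers-mpnet:8080" in their moduleConfig
  t2v-transformers-mpnet:
    image: cr.weaviate.io/semitechnologies/transformers-inference:sentence-transformers-all-mpnet-base-v2
    environment:
      ENABLE_CUDA: 0
```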
### Alternative: Run a separate container

As an alternative, you can run the inference container independently from Weaviate. To do so:

- Enable `text2vec-transformers` in your Docker Compose file,
- Omit the `t2v-transformers` parameters,
- Run the inference container separately, e.g. using Docker, and
- Use `TRANSFORMERS_INFERENCE_API` or `inferenceUrl` to set the URL of the inference container.

For example, choose any of our pre-built transformers models and spin it up:

```shell
docker run -itp "8000:8080" semitechnologies/transformers-inference:sentence-transformers-multi-qa-MiniLM-L6-cos-v1
```

Then, if Weaviate is running outside of Docker, set `TRANSFORMERS_INFERENCE_API="http://localhost:8000"`. Alternatively, if Weaviate is part of the same Docker network, e.g. because both are defined in the same `docker-compose.yml` file, you can use Docker networking/DNS, such as `TRANSFORMERS_INFERENCE_API=http://t2v-transformers:8080`.
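To quickly confirm that the standalone container is serving vectors before wiring it into Weaviate, you can query its `/vectors` endpoint directly (covered in more detail in the debugging note further below), using the `8000:8080` mapping from the `docker run` command above:

```shell
curl localhost:8000/vectors -H 'Content-Type: application/json' -d '{"text": "foo bar"}'
```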
## Collection configuration

You can configure how the module will behave in each collection through the Weaviate schema.

### Vectorization settings

You can set vectorizer behavior using the `moduleConfig` section under each collection and property:
#### Collection-level

- `vectorizer` – the module to use to vectorize the data.
- `vectorizeClassName` – whether to vectorize the collection name. Default: `true`.
- `poolingStrategy` – the pooling strategy to use. Default: `masked_mean`. Allowed values: `masked_mean` or `cls`. (Read more on this topic.)
- `inferenceUrl` – the URL of the inference container, for when using multiple inference containers (e.g. `http://service-name:8080`). Default: `http://t2v-transformers:8080`.
- `queryInferenceUrl` & `passageInferenceUrl` – the URLs of the inference containers for queries and passages respectively, for when using multiple inference containers with a DPR type model (e.g. `http://service-name:8080`).

You can only set one of `inferenceUrl` or (`queryInferenceUrl` and `passageInferenceUrl`). If you are running a DPR type model, set `queryInferenceUrl` and `passageInferenceUrl` to use different inference containers for queries and passages.
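For instance, a DPR setup typically pairs a question encoder for queries with a context encoder for passages. A minimal Docker Compose sketch, using the hypothetical service names `t2v-transformers-query` and `t2v-transformers-passage` (matching the commented URLs in the example below) and two of the pre-built DPR images listed later on this page:

```yaml
  # Hypothetical query-side encoder, referenced via
  # "queryInferenceUrl": "http://t2v-transformers-query:8080"
  t2v-transformers-query:
    image: cr.weaviate.io/semitechnologies/transformers-inference:facebook-dpr-question_encoder-single-nq-base
  # Hypothetical passage-side encoder, referenced via
  # "passageInferenceUrl": "http://t2v-transformers-passage:8080"
  t2v-transformers-passage:
    image: cr.weaviate.io/semitechnologies/transformers-inference:facebook-dpr-ctx_encoder-single-nq-base
```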
#### Property-level

- `skip` – whether to skip vectorizing the property altogether. Default: `false`.
- `vectorizePropertyName` – whether to vectorize the property name. Default: `false`.
### Example

```json
{
  "classes": [
    {
      "class": "Document",
      "description": "A collection called document",
      "vectorizer": "text2vec-transformers",
      "moduleConfig": {
        "text2vec-transformers": {
          "vectorizeClassName": false,
          "inferenceUrl": "http://t2v-transformers:8080"  // Optional. Set to use a different inference container when using multiple inference containers.
          // Note: You can only set one of `inferenceUrl` or (`queryInferenceUrl` and `passageInferenceUrl`).
          // Set `inferenceUrl` to use a different inference container when using multiple inference containers with most (i.e. non-DPR type) models.
          // Set `queryInferenceUrl` and `passageInferenceUrl` to use different inference containers for queries and passages when using multiple inference containers with a DPR type model.
          // "queryInferenceUrl": "http://t2v-transformers-query:8080",
          // "passageInferenceUrl": "http://t2v-transformers-passage:8080"
        }
      },
      "properties": [
        {
          "name": "content",
          "dataType": ["text"],
          "description": "Content that will be vectorized",
          "moduleConfig": {
            "text2vec-transformers": {
              "skip": false,
              "vectorizePropertyName": false
            }
          }
        }
      ]
    }
  ]
}
```
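The same settings can also be applied programmatically when creating the collection. A minimal sketch with the Python client (v4); the option names follow that client's `Configure.Vectorizer.text2vec_transformers()` helper and should be checked against your client version:

```python
import weaviate
from weaviate.classes.config import Configure, DataType, Property

client = weaviate.connect_to_local()

try:
    # Create a "Document" collection vectorized by text2vec-transformers
    client.collections.create(
        "Document",
        vectorizer_config=Configure.Vectorizer.text2vec_transformers(
            vectorize_collection_name=False,
            # inference_url="http://t2v-transformers:8080",  # optional; for multiple inference containers
        ),
        properties=[
            Property(name="content", data_type=DataType.TEXT),
        ],
    )
finally:
    client.close()
```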
## Select a model

To select a model, please point `text2vec-transformers` to the appropriate Docker container. You can use one of our pre-built Docker images, or build your own (with just a few lines of code). This allows you to use any suitable model from the Hugging Face model hub or your own custom model.
### Use a pre-built image

We have built images from publicly available models that, in our opinion, are well suited for semantic search. You can use any of the following:

| Model Name | Image Name |
|---|---|
| distilbert-base-uncased (Info) | semitechnologies/transformers-inference:distilbert-base-uncased |
| sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 (Info) | semitechnologies/transformers-inference:sentence-transformers-paraphrase-multilingual-MiniLM-L12-v2 |
| sentence-transformers/multi-qa-MiniLM-L6-cos-v1 (Info) | semitechnologies/transformers-inference:sentence-transformers-multi-qa-MiniLM-L6-cos-v1 |
| sentence-transformers/multi-qa-mpnet-base-cos-v1 (Info) | semitechnologies/transformers-inference:sentence-transformers-multi-qa-mpnet-base-cos-v1 |
| sentence-transformers/all-mpnet-base-v2 (Info) | semitechnologies/transformers-inference:sentence-transformers-all-mpnet-base-v2 |
| sentence-transformers/all-MiniLM-L12-v2 (Info) | semitechnologies/transformers-inference:sentence-transformers-all-MiniLM-L12-v2 |
| sentence-transformers/paraphrase-multilingual-mpnet-base-v2 (Info) | semitechnologies/transformers-inference:sentence-transformers-paraphrase-multilingual-mpnet-base-v2 |
| sentence-transformers/all-MiniLM-L6-v2 (Info) | semitechnologies/transformers-inference:sentence-transformers-all-MiniLM-L6-v2 |
| sentence-transformers/multi-qa-distilbert-cos-v1 (Info) | semitechnologies/transformers-inference:sentence-transformers-multi-qa-distilbert-cos-v1 |
| sentence-transformers/gtr-t5-base (Info) | semitechnologies/transformers-inference:sentence-transformers-gtr-t5-base |
| sentence-transformers/gtr-t5-large (Info) | semitechnologies/transformers-inference:sentence-transformers-gtr-t5-large |
| google/flan-t5-base (Info) | semitechnologies/transformers-inference:google-flan-t5-base |
| google/flan-t5-large (Info) | semitechnologies/transformers-inference:google-flan-t5-large |
| BAAI/bge-small-en-v1.5 (Info) | semitechnologies/transformers-inference:baai-bge-small-en-v1.5 |
| BAAI/bge-base-en-v1.5 (Info) | semitechnologies/transformers-inference:baai-bge-base-en-v1.5 |
| DPR Models | |
| facebook/dpr-ctx_encoder-single-nq-base (Info) | semitechnologies/transformers-inference:facebook-dpr-ctx_encoder-single-nq-base |
| facebook/dpr-question_encoder-single-nq-base (Info) | semitechnologies/transformers-inference:facebook-dpr-question_encoder-single-nq-base |
| vblagoje/dpr-ctx_encoder-single-lfqa-wiki (Info) | semitechnologies/transformers-inference:vblagoje-dpr-ctx_encoder-single-lfqa-wiki |
| vblagoje/dpr-question_encoder-single-lfqa-wiki (Info) | semitechnologies/transformers-inference:vblagoje-dpr-question_encoder-single-lfqa-wiki |
| Bar-Ilan University NLP Lab Models | |
| biu-nlp/abstract-sim-sentence (Info) | semitechnologies/transformers-inference:biu-nlp-abstract-sim-sentence |
| biu-nlp/abstract-sim-query (Info) | semitechnologies/transformers-inference:biu-nlp-abstract-sim-query |
### ONNX-enabled images (CPU only)

We also provide ONNX-enabled images for some models. These images use ONNX Runtime for faster inference on CPUs. They are quantized for ARM64 and AMD64 (AVX2) hardware. Look for the `-onnx` suffix in the image name.

| Model Name | Image Name |
|---|---|
| sentence-transformers/all-MiniLM-L6-v2 (Info) | semitechnologies/transformers-inference:sentence-transformers-all-MiniLM-L6-v2-onnx |
| BAAI/bge-small-en-v1.5 (Info) | semitechnologies/transformers-inference:baai-bge-small-en-v1.5-onnx |
| BAAI/bge-base-en-v1.5 (Info) | semitechnologies/transformers-inference:baai-bge-base-en-v1.5-onnx |
| BAAI/bge-m3 (Info) | semitechnologies/transformers-inference:baai-bge-m3-onnx |
#### Is your preferred model missing?

If your preferred model is missing, please open an issue to ask us to include it. Alternatively, follow the steps below to build a custom image.
### How to set the version

You can explicitly set the version through an image tag suffix:

- Use a version suffix such as `-1.0.0` to pin to a specific version. E.g. `semitechnologies/transformers-inference:distilbert-base-uncased-1.0.0` will always use the version with git tag `1.0.0` of the `distilbert-base-uncased` repository.
- You can explicitly set `-latest` to always use the latest version; however, this is the default behavior.
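For example, pinning the tag in your Docker Compose file:

```yaml
  t2v-transformers:
    # Pinned to git tag 1.0.0 of the distilbert-base-uncased repository
    image: semitechnologies/transformers-inference:distilbert-base-uncased-1.0.0
```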
### Build a model

To use a public model from the Hugging Face model hub, create a short, two-line Dockerfile to build the image. This example creates a custom image for the `distilroberta-base` model.

#### Step 1: Create a Dockerfile

Create a new Dockerfile called `distilroberta.Dockerfile` and add the following lines to it:
```dockerfile
FROM semitechnologies/transformers-inference:custom
RUN MODEL_NAME=distilroberta-base ./download.py
```
#### Step 2: Build and tag your Dockerfile

Build the image and tag it as `distilroberta-inference`:

```shell
docker build -f distilroberta.Dockerfile -t distilroberta-inference .
```

#### Step 3: Use the image

Push the image to a Docker registry, or reference it locally in your Weaviate `docker-compose.yml` using the Docker tag `distilroberta-inference`.
Note: When using a custom image, you have the option of using the `USE_SENTENCE_TRANSFORMERS_VECTORIZER` environment variable to use the `sentence-transformers` vectorizer instead of the default vectorizer (from the `transformers` library).
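For example, the same two-line Dockerfile pattern can be used with a `sentence-transformers` model from the hub; the model name here is only an illustrative choice:

```dockerfile
FROM semitechnologies/transformers-inference:custom
# Illustrative model choice; any hub model compatible with the base image works
RUN MODEL_NAME=sentence-transformers/all-MiniLM-L6-v2 ./download.py
```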
### Use a private or local model

You can build a Docker image which supports any model that is compatible with Hugging Face's `AutoModel` and `AutoTokenizer`.

In the following example, we build a custom image for a non-public model stored locally at `./my-model`.

Create a new Dockerfile (you do not need to clone this repository; any folder on your machine is fine). We will name it `my-model.Dockerfile`. Add the following lines to it:
```dockerfile
FROM semitechnologies/transformers-inference:custom
COPY ./my-model /app/models/model
```
The above ensures that your model ends up in the image at `/app/models/model`. This path is important, so that the application can find the model.

Now build and tag your Dockerfile; we will tag it as `my-model-inference`:

```shell
docker build -f my-model.Dockerfile -t my-model-inference .
```

That's it! You can now push your image to your favorite registry, or reference it locally in your Weaviate `docker-compose.yml` using the Docker tag `my-model-inference`.
To debug and test whether your inference container is working correctly, you can send queries to the vectorizer module's inference container directly, so you can see exactly which vectors it produces for which input.

To do so, expose the inference container in your Docker Compose file by adding something like this to your `text2vec-transformers` service:

```yaml
ports:
  - "9090:8080"
```

Then you can send REST requests to it directly, e.g.:

```shell
curl localhost:9090/vectors -H 'Content-Type: application/json' -d '{"text": "foo bar"}'
```

This prints the created vector directly.
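The exact response shape depends on the container version and model, but you should see the input text echoed back along with its vector, roughly like the following (illustrative output; a 384-dimensional model is assumed):

```json
{
  "text": "foo bar",
  "dim": 384,
  "vector": [-0.0461, 0.0127, ...]
}
```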
## Usage

### Example
**Python (v4)**

```python
import weaviate
import weaviate.classes as wvc
from weaviate.collections.classes.grpc import Move

client = weaviate.connect_to_local()

try:
    publications = client.collections.get("Publication")

    response = publications.query.near_text(
        query="fashion",
        distance=0.6,
        move_to=Move(force=0.85, concepts="haute couture"),
        move_away=Move(force=0.45, concepts="finance"),
        return_metadata=wvc.query.MetadataQuery(distance=True),
        limit=2,
    )

    for o in response.objects:
        print(o.properties)
        print(o.metadata)
finally:
    client.close()
```
**Python (v3)**

```python
import weaviate

client = weaviate.Client("http://localhost:8080")

nearText = {
    "concepts": ["fashion"],
    "distance": 0.6,  # prior to v1.14 use "certainty" instead of "distance"
    "moveAwayFrom": {
        "concepts": ["finance"],
        "force": 0.45
    },
    "moveTo": {
        "concepts": ["haute couture"],
        "force": 0.85
    }
}

result = (
    client.query
    .get("Publication", "name")
    .with_additional(["certainty", "distance"])  # note that certainty is only supported if distance==cosine
    .with_near_text(nearText)
    .do()
)

print(result)
```
**JavaScript/TypeScript**

```ts
import weaviate from 'weaviate-ts-client';

const client = weaviate.client({
  scheme: 'http',
  host: 'localhost:8080',
});

const response = await client.graphql
  .get()
  .withClassName('Publication')
  .withFields('name _additional{certainty distance}') // note that certainty is only supported if distance==cosine
  .withNearText({
    concepts: ['fashion'],
    distance: 0.6, // prior to v1.14 use certainty instead of distance
    moveAwayFrom: {
      concepts: ['finance'],
      force: 0.45,
    },
    moveTo: {
      concepts: ['haute couture'],
      force: 0.85,
    },
  })
  .do();

console.log(response);
```
**Go**

```go
package main

import (
    "context"
    "fmt"

    "github.com/weaviate/weaviate-go-client/v4/weaviate"
    "github.com/weaviate/weaviate-go-client/v4/weaviate/graphql"
)

func main() {
    cfg := weaviate.Config{
        Host:   "localhost:8080",
        Scheme: "http",
    }

    client, err := weaviate.NewClient(cfg)
    if err != nil {
        panic(err)
    }

    className := "Publication"

    name := graphql.Field{Name: "name"}
    _additional := graphql.Field{
        Name: "_additional", Fields: []graphql.Field{
            {Name: "certainty"}, // only supported if distance==cosine
            {Name: "distance"},  // always supported
        },
    }

    concepts := []string{"fashion"}
    distance := float32(0.6)
    moveAwayFrom := &graphql.MoveParameters{
        Concepts: []string{"finance"},
        Force:    0.45,
    }
    moveTo := &graphql.MoveParameters{
        Concepts: []string{"haute couture"},
        Force:    0.85,
    }

    nearText := client.GraphQL().NearTextArgBuilder().
        WithConcepts(concepts).
        WithDistance(distance). // use WithCertainty(certainty) prior to v1.14
        WithMoveTo(moveTo).
        WithMoveAwayFrom(moveAwayFrom)

    ctx := context.Background()

    result, err := client.GraphQL().Get().
        WithClassName(className).
        WithFields(name, _additional).
        WithNearText(nearText).
        Do(ctx)
    if err != nil {
        panic(err)
    }
    fmt.Printf("%v", result)
}
```
**Java**

```java
package io.weaviate;

import io.weaviate.client.Config;
import io.weaviate.client.WeaviateClient;
import io.weaviate.client.base.Result;
import io.weaviate.client.v1.graphql.model.GraphQLResponse;
import io.weaviate.client.v1.graphql.query.argument.NearTextArgument;
import io.weaviate.client.v1.graphql.query.argument.NearTextMoveParameters;
import io.weaviate.client.v1.graphql.query.fields.Field;

public class App {
  public static void main(String[] args) {
    Config config = new Config("http", "localhost:8080");
    WeaviateClient client = new WeaviateClient(config);

    NearTextMoveParameters moveTo = NearTextMoveParameters.builder()
      .concepts(new String[]{ "haute couture" }).force(0.85f).build();

    NearTextMoveParameters moveAway = NearTextMoveParameters.builder()
      .concepts(new String[]{ "finance" }).force(0.45f)
      .build();

    NearTextArgument nearText = client.graphQL().arguments().nearTextArgBuilder()
      .concepts(new String[]{ "fashion" })
      .distance(0.6f) // use .certainty(0.7f) prior to v1.14
      .moveTo(moveTo)
      .moveAwayFrom(moveAway)
      .build();

    Field name = Field.builder().name("name").build();
    Field _additional = Field.builder()
      .name("_additional")
      .fields(new Field[]{
        Field.builder().name("certainty").build(), // only supported if distance==cosine
        Field.builder().name("distance").build(),  // always supported
      }).build();

    Result<GraphQLResponse> result = client.graphQL().get()
      .withClassName("Publication")
      .withFields(name, _additional)
      .withNearText(nearText)
      .run();

    if (result.hasErrors()) {
      System.out.println(result.getError());
      return;
    }
    System.out.println(result.getResult());
  }
}
```
**Curl**

```bash
# Note: Under nearText, use `certainty` instead of `distance` prior to v1.14
# Under _additional, `certainty` is only supported if distance==cosine, but `distance` is always supported
echo '{
  "query": "{
    Get {
      Publication(
        nearText: {
          concepts: [\"fashion\"],
          distance: 0.6,
          moveAwayFrom: {
            concepts: [\"finance\"],
            force: 0.45
          },
          moveTo: {
            concepts: [\"haute couture\"],
            force: 0.85
          }
        }
      ) {
        name
        _additional {
          certainty
          distance
        }
      }
    }
  }"
}' | curl \
    -X POST \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer learn-weaviate' \
    -H "X-OpenAI-Api-Key: $OPENAI_API_KEY" \
    -d @- \
    https://edu-demo.weaviate.network/v1/graphql
```
**GraphQL**

```graphql
{
  Get {
    Publication(
      nearText: {
        concepts: ["fashion"],
        distance: 0.6  # prior to v1.14 use "certainty" instead of "distance"
        moveAwayFrom: {
          concepts: ["finance"],
          force: 0.45
        },
        moveTo: {
          concepts: ["haute couture"],
          force: 0.85
        }
      }
    ) {
      name
      _additional {
        certainty  # only supported if distance==cosine
        distance   # always supported
      }
    }
  }
}
```
## Chunking

The `text2vec-transformers` module can automatically chunk text based on the model's maximum token length before it is passed to the model. It will then return the pooled vectors. See `HuggingFaceVectorizer.vectorizer()` for the exact implementation.
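The sketch below is only an illustration of the idea (split the input at the model's token limit, vectorize each chunk, then pool the chunk vectors by averaging); it is not the module's actual code, for which see `HuggingFaceVectorizer.vectorizer()`:

```python
import numpy as np

def chunked_vector(text: str, tokenize, vectorize, max_tokens: int) -> np.ndarray:
    """Illustrative only: chunk `text` to the model's token limit and mean-pool.

    `tokenize` (text -> token list) and `vectorize` (token chunk -> vector)
    are stand-ins for the model's tokenizer and encoder.
    """
    tokens = tokenize(text)
    # Split the token sequence into chunks the model can accept
    chunks = [tokens[i : i + max_tokens] for i in range(0, len(tokens), max_tokens)]
    # Vectorize each chunk, then pool the per-chunk vectors into one by averaging
    vectors = [vectorize(chunk) for chunk in chunks]
    return np.mean(vectors, axis=0)
```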
## Model licenses

The `text2vec-transformers` module is compatible with various models, and each model has its own license. For detailed information, review the license for the model you are using in the Hugging Face Model Hub.

It is your responsibility to evaluate whether the terms of the license(s), if any, are appropriate for your intended use.
## Release notes

For details, see the t2v-transformers-model release notes.