Preparation

Pre-requisites

This course is self-contained. However, we recommend that you go through one of the 101-level courses, such as that for working with text, your own vectors, or multimodal data.

This page briefly covers the required resources and setup, including the Weaviate Python client library and a Weaviate instance with the multimodal vectorizer.

Weaviate Python client library

Install the latest (v4, e.g. 4.5.0) Weaviate Python client library with:

pip install -U weaviate-client
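To confirm that the installed client is a v4 release, a quick check along these lines may help. This is a minimal sketch using only the standard library; the helper names `major_version` and `check_client_version` are hypothetical, and the only assumption about the package is its name on PyPI, `weaviate-client`, as used in the install command above.

```python
from importlib.metadata import PackageNotFoundError, version


def major_version(version_string: str) -> int:
    """Return the leading (major) component of a dotted version string."""
    return int(version_string.split(".")[0])


def check_client_version(package: str = "weaviate-client") -> str:
    """Describe whether the installed package is a v4 (or later) release."""
    try:
        installed = version(package)
    except PackageNotFoundError:
        return f"{package} is not installed"
    if major_version(installed) >= 4:
        return f"{package} {installed} is a v4 or later release"
    return f"{package} {installed} is older than v4; run: pip install -U {package}"


print(check_client_version())
```

If the message reports a version older than v4, re-running the install command with `-U` should upgrade it.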

Set up Weaviate

Install Docker on your machine. We recommend following the official Docker installation guide.

Create a new directory and navigate to it in your terminal. Then, create a new file called docker-compose.yml and add the following content:

---
services:
  weaviate:
    command:
    - --host
    - 0.0.0.0
    - --port
    - '8080'
    - --scheme
    - http
    image: cr.weaviate.io/semitechnologies/weaviate:1.31.4
    ports:
    - 8080:8080
    - 50051:50051
    volumes:
    - weaviate_data:/var/lib/weaviate
    restart: on-failure:0
    environment:
      CLIP_INFERENCE_API: 'http://multi2vec-clip:8080'
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      ENABLE_MODULES: 'multi2vec-clip'
      ENABLE_API_BASED_MODULES: 'true'
      CLUSTER_HOSTNAME: 'node1'
  multi2vec-clip:
    image: cr.weaviate.io/semitechnologies/multi2vec-clip:sentence-transformers-clip-ViT-B-32-multilingual-v1
    environment:
      ENABLE_CUDA: '0'
volumes:
  weaviate_data:
...

Create a Weaviate instance

Run the following command to start Weaviate:

docker compose up

Your Weaviate instance details

Once the instance is created, you can access it at http://localhost:8080.
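The containers can take a little while to start, especially the first time, when the CLIP image is downloaded. The following sketch polls the instance until it reports ready. It assumes the default address above and Weaviate's REST readiness endpoint at `/v1/.well-known/ready`; the helper names `http_ok` and `wait_until_ready`, and the timeout values, are illustrative choices rather than anything prescribed by this course.

```python
import http.client
import time
import urllib.error
import urllib.request

# Assumed readiness URL for the local instance started with docker compose
READY_URL = "http://localhost:8080/v1/.well-known/ready"


def http_ok(url: str) -> bool:
    """Return True if a GET request to the URL answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=2) as response:
            return response.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, or a non-2xx status: not ready yet
        return False


def wait_until_ready(url=READY_URL, timeout=60.0, interval=2.0, probe=http_ok) -> bool:
    """Call probe(url) repeatedly until it succeeds or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling `wait_until_ready()` before connecting returns `True` once the instance responds, or `False` if it does not come up within the timeout. The `probe` parameter exists only so that the polling logic can be exercised without a running server.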

Work with Weaviate

Connect to your Weaviate instance

To connect to the Weaviate instance, use the connect_to_local function. The example also passes API keys, as request headers, for any inference APIs (e.g. OpenAI, Cohere, Google, AWS etc.) that Weaviate may use.

import weaviate
import os

headers = {
    "X-OpenAI-Api-Key": os.getenv("OPENAI_APIKEY")
}  # Replace with your own API keys

client = weaviate.connect_to_local(headers=headers)
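If an environment variable is not set, os.getenv returns None, so the headers dictionary can end up with an empty value. One way to avoid that is to include only the keys that are actually present. This is a sketch under stated assumptions: the helper name `build_headers` is hypothetical, `X-Cohere-Api-Key` is assumed to be the header for Cohere, and `COHERE_APIKEY` is an assumed environment variable name chosen to mirror `OPENAI_APIKEY` above.

```python
import os

# Header name -> environment variable that holds the key.
# The Cohere entry is an assumption added for illustration.
HEADER_ENV_VARS = {
    "X-OpenAI-Api-Key": "OPENAI_APIKEY",
    "X-Cohere-Api-Key": "COHERE_APIKEY",
}


def build_headers(environ=os.environ) -> dict:
    """Return headers only for API keys that are set and non-empty."""
    return {
        header: environ[env_var]
        for header, env_var in HEADER_ENV_VARS.items()
        if environ.get(env_var)
    }


headers = build_headers()
print(sorted(headers))  # Names of the headers that will be sent
```

The resulting dictionary can then be passed as `weaviate.connect_to_local(headers=headers)`.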

Check Weaviate status

You can check whether the Weaviate instance is up using the is_live function.

assert client.is_live()  # This will raise an exception if the client is not live

Retrieve server meta information

You can retrieve meta information about the Weaviate instance using the get_meta function.

import json

metainfo = client.get_meta()
print(json.dumps(metainfo, indent=2))  # Print the meta information in a readable format

This will print the server meta information to the console. The output will look similar to the following:

Example get_meta output

Note that this output is a little longer due to the additional details from the CLIP models.

{
  "hostname": "http://[::]:8080",
  "modules": {
    "generative-openai": {
      "documentationHref": "https://platform.openai.com/docs/api-reference/completions",
      "name": "Generative Search - OpenAI"
    },
    "multi2vec-clip": {
      "clip_model": {
        "_commit_hash": null,
        "_name_or_path": "/root/.cache/torch/sentence_transformers/sentence-transformers_clip-ViT-B-32/0_CLIPModel",
        "add_cross_attention": false,
        "architectures": [
          "CLIPModel"
        ],
        "bad_words_ids": null,
        "begin_suppress_tokens": null,
        "bos_token_id": null,
        "chunk_size_feed_forward": 0,
        "cross_attention_hidden_size": null,
        "decoder_start_token_id": null,
        "diversity_penalty": 0,
        "do_sample": false,
        "early_stopping": false,
        "encoder_no_repeat_ngram_size": 0,
        "eos_token_id": null,
        "exponential_decay_length_penalty": null,
        "finetuning_task": null,
        "forced_bos_token_id": null,
        "forced_eos_token_id": null,
        "id2label": {
          "0": "LABEL_0",
          "1": "LABEL_1"
        },
        "initializer_factor": 1,
        "is_decoder": false,
        "is_encoder_decoder": false,
        "label2id": {
          "LABEL_0": 0,
          "LABEL_1": 1
        },
        "length_penalty": 1,
        "logit_scale_init_value": 2.6592,
        "max_length": 20,
        "min_length": 0,
        "model_type": "clip",
        "no_repeat_ngram_size": 0,
        "num_beam_groups": 1,
        "num_beams": 1,
        "num_return_sequences": 1,
        "output_attentions": false,
        "output_hidden_states": false,
        "output_scores": false,
        "pad_token_id": null,
        "prefix": null,
        "problem_type": null,
        "projection_dim": 512,
        "pruned_heads": {},
        "remove_invalid_values": false,
        "repetition_penalty": 1,
        "return_dict": true,
        "return_dict_in_generate": false,
        "sep_token_id": null,
        "suppress_tokens": null,
        "task_specific_params": null,
        "temperature": 1,
        "text_config": {
          "_name_or_path": "",
          "add_cross_attention": false,
          "architectures": null,
          "attention_dropout": 0,
          "bad_words_ids": null,
          "begin_suppress_tokens": null,
          "bos_token_id": 0,
          "chunk_size_feed_forward": 0,
          "cross_attention_hidden_size": null,
          "decoder_start_token_id": null,
          "diversity_penalty": 0,
          "do_sample": false,
          "dropout": 0,
          "early_stopping": false,
          "encoder_no_repeat_ngram_size": 0,
          "eos_token_id": 2,
          "exponential_decay_length_penalty": null,
          "finetuning_task": null,
          "forced_bos_token_id": null,
          "forced_eos_token_id": null,
          "gradient_checkpointing": false,
          "hidden_act": "quick_gelu",
          "hidden_size": 512,
          "id2label": {
            "0": "LABEL_0",
            "1": "LABEL_1"
          },
          "initializer_factor": 1,
          "initializer_range": 0.02,
          "intermediate_size": 2048,
          "is_decoder": false,
          "is_encoder_decoder": false,
          "label2id": {
            "LABEL_0": 0,
            "LABEL_1": 1
          },
          "layer_norm_eps": 1e-05,
          "length_penalty": 1,
          "max_length": 20,
          "max_position_embeddings": 77,
          "min_length": 0,
          "model_type": "clip_text_model",
          "no_repeat_ngram_size": 0,
          "num_attention_heads": 8,
          "num_beam_groups": 1,
          "num_beams": 1,
          "num_hidden_layers": 12,
          "num_return_sequences": 1,
          "output_attentions": false,
          "output_hidden_states": false,
          "output_scores": false,
          "pad_token_id": 1,
          "prefix": null,
          "problem_type": null,
          "projection_dim": 512,
          "pruned_heads": {},
          "remove_invalid_values": false,
          "repetition_penalty": 1,
          "return_dict": true,
          "return_dict_in_generate": false,
          "sep_token_id": null,
          "suppress_tokens": null,
          "task_specific_params": null,
          "temperature": 1,
          "tf_legacy_loss": false,
          "tie_encoder_decoder": false,
          "tie_word_embeddings": true,
          "tokenizer_class": null,
          "top_k": 50,
          "top_p": 1,
          "torch_dtype": null,
          "torchscript": false,
          "transformers_version": "4.30.2",
          "typical_p": 1,
          "use_bfloat16": false,
          "vocab_size": 49408
        },
        "tf_legacy_loss": false,
        "tie_encoder_decoder": false,
        "tie_word_embeddings": true,
        "tokenizer_class": null,
        "top_k": 50,
        "top_p": 1,
        "torch_dtype": "torch.float32",
        "torchscript": false,
        "transformers_version": null,
        "typical_p": 1,
        "use_bfloat16": false,
        "vision_config": {
          "_name_or_path": "",
          "add_cross_attention": false,
          "architectures": null,
          "attention_dropout": 0,
          "bad_words_ids": null,
          "begin_suppress_tokens": null,
          "bos_token_id": null,
          "chunk_size_feed_forward": 0,
          "cross_attention_hidden_size": null,
          "decoder_start_token_id": null,
          "diversity_penalty": 0,
          "do_sample": false,
          "dropout": 0,
          "early_stopping": false,
          "encoder_no_repeat_ngram_size": 0,
          "eos_token_id": null,
          "exponential_decay_length_penalty": null,
          "finetuning_task": null,
          "forced_bos_token_id": null,
          "forced_eos_token_id": null,
          "gradient_checkpointing": false,
          "hidden_act": "quick_gelu",
          "hidden_size": 768,
          "id2label": {
            "0": "LABEL_0",
            "1": "LABEL_1"
          },
          "image_size": 224,
          "initializer_factor": 1,
          "initializer_range": 0.02,
          "intermediate_size": 3072,
          "is_decoder": false,
          "is_encoder_decoder": false,
          "label2id": {
            "LABEL_0": 0,
            "LABEL_1": 1
          },
          "layer_norm_eps": 1e-05,
          "length_penalty": 1,
          "max_length": 20,
          "min_length": 0,
          "model_type": "clip_vision_model",
          "no_repeat_ngram_size": 0,
          "num_attention_heads": 12,
          "num_beam_groups": 1,
          "num_beams": 1,
          "num_channels": 3,
          "num_hidden_layers": 12,
          "num_return_sequences": 1,
          "output_attentions": false,
          "output_hidden_states": false,
          "output_scores": false,
          "pad_token_id": null,
          "patch_size": 32,
          "prefix": null,
          "problem_type": null,
          "projection_dim": 512,
          "pruned_heads": {},
          "remove_invalid_values": false,
          "repetition_penalty": 1,
          "return_dict": true,
          "return_dict_in_generate": false,
          "sep_token_id": null,
          "suppress_tokens": null,
          "task_specific_params": null,
          "temperature": 1,
          "tf_legacy_loss": false,
          "tie_encoder_decoder": false,
          "tie_word_embeddings": true,
          "tokenizer_class": null,
          "top_k": 50,
          "top_p": 1,
          "torch_dtype": null,
          "torchscript": false,
          "transformers_version": "4.30.2",
          "typical_p": 1,
          "use_bfloat16": false
        }
      },
      "text_model": {
        "_commit_hash": null,
        "_name_or_path": "./models/text/0_CLIPModel",
        "add_cross_attention": false,
        "architectures": [
          "CLIPModel"
        ],
        "bad_words_ids": null,
        "begin_suppress_tokens": null,
        "bos_token_id": null,
        "chunk_size_feed_forward": 0,
        "cross_attention_hidden_size": null,
        "decoder_start_token_id": null,
        "diversity_penalty": 0,
        "do_sample": false,
        "early_stopping": false,
        "encoder_no_repeat_ngram_size": 0,
        "eos_token_id": null,
        "exponential_decay_length_penalty": null,
        "finetuning_task": null,
        "forced_bos_token_id": null,
        "forced_eos_token_id": null,
        "id2label": {
          "0": "LABEL_0",
          "1": "LABEL_1"
        },
        "initializer_factor": 1,
        "is_decoder": false,
        "is_encoder_decoder": false,
        "label2id": {
          "LABEL_0": 0,
          "LABEL_1": 1
        },
        "length_penalty": 1,
        "logit_scale_init_value": 2.6592,
        "max_length": 20,
        "min_length": 0,
        "model_type": "clip",
        "no_repeat_ngram_size": 0,
        "num_beam_groups": 1,
        "num_beams": 1,
        "num_return_sequences": 1,
        "output_attentions": false,
        "output_hidden_states": false,
        "output_scores": false,
        "pad_token_id": null,
        "prefix": null,
        "problem_type": null,
        "projection_dim": 512,
        "pruned_heads": {},
        "remove_invalid_values": false,
        "repetition_penalty": 1,
        "return_dict": true,
        "return_dict_in_generate": false,
        "sep_token_id": null,
        "suppress_tokens": null,
        "task_specific_params": null,
        "temperature": 1,
        "text_config": {
          "_name_or_path": "",
          "add_cross_attention": false,
          "architectures": null,
          "attention_dropout": 0,
          "bad_words_ids": null,
          "begin_suppress_tokens": null,
          "bos_token_id": 0,
          "chunk_size_feed_forward": 0,
          "cross_attention_hidden_size": null,
          "decoder_start_token_id": null,
          "diversity_penalty": 0,
          "do_sample": false,
          "dropout": 0,
          "early_stopping": false,
          "encoder_no_repeat_ngram_size": 0,
          "eos_token_id": 2,
          "exponential_decay_length_penalty": null,
          "finetuning_task": null,
          "forced_bos_token_id": null,
          "forced_eos_token_id": null,
          "gradient_checkpointing": false,
          "hidden_act": "quick_gelu",
          "hidden_size": 512,
          "id2label": {
            "0": "LABEL_0",
            "1": "LABEL_1"
          },
          "initializer_factor": 1,
          "initializer_range": 0.02,
          "intermediate_size": 2048,
          "is_decoder": false,
          "is_encoder_decoder": false,
          "label2id": {
            "LABEL_0": 0,
            "LABEL_1": 1
          },
          "layer_norm_eps": 1e-05,
          "length_penalty": 1,
          "max_length": 20,
          "max_position_embeddings": 77,
          "min_length": 0,
          "model_type": "clip_text_model",
          "no_repeat_ngram_size": 0,
          "num_attention_heads": 8,
          "num_beam_groups": 1,
          "num_beams": 1,
          "num_hidden_layers": 12,
          "num_return_sequences": 1,
          "output_attentions": false,
          "output_hidden_states": false,
          "output_scores": false,
          "pad_token_id": 1,
          "prefix": null,
          "problem_type": null,
          "projection_dim": 512,
          "pruned_heads": {},
          "remove_invalid_values": false,
          "repetition_penalty": 1,
          "return_dict": true,
          "return_dict_in_generate": false,
          "sep_token_id": null,
          "suppress_tokens": null,
          "task_specific_params": null,
          "temperature": 1,
          "tf_legacy_loss": false,
          "tie_encoder_decoder": false,
          "tie_word_embeddings": true,
          "tokenizer_class": null,
          "top_k": 50,
          "top_p": 1,
          "torch_dtype": null,
          "torchscript": false,
          "transformers_version": "4.30.2",
          "typical_p": 1,
          "use_bfloat16": false,
          "vocab_size": 49408
        },
        "tf_legacy_loss": false,
        "tie_encoder_decoder": false,
        "tie_word_embeddings": true,
        "tokenizer_class": null,
        "top_k": 50,
        "top_p": 1,
        "torch_dtype": "torch.float32",
        "torchscript": false,
        "transformers_version": null,
        "typical_p": 1,
        "use_bfloat16": false,
        "vision_config": {
          "_name_or_path": "",
          "add_cross_attention": false,
          "architectures": null,
          "attention_dropout": 0,
          "bad_words_ids": null,
          "begin_suppress_tokens": null,
          "bos_token_id": null,
          "chunk_size_feed_forward": 0,
          "cross_attention_hidden_size": null,
          "decoder_start_token_id": null,
          "diversity_penalty": 0,
          "do_sample": false,
          "dropout": 0,
          "early_stopping": false,
          "encoder_no_repeat_ngram_size": 0,
          "eos_token_id": null,
          "exponential_decay_length_penalty": null,
          "finetuning_task": null,
          "forced_bos_token_id": null,
          "forced_eos_token_id": null,
          "gradient_checkpointing": false,
          "hidden_act": "quick_gelu",
          "hidden_size": 768,
          "id2label": {
            "0": "LABEL_0",
            "1": "LABEL_1"
          },
          "image_size": 224,
          "initializer_factor": 1,
          "initializer_range": 0.02,
          "intermediate_size": 3072,
          "is_decoder": false,
          "is_encoder_decoder": false,
          "label2id": {
            "LABEL_0": 0,
            "LABEL_1": 1
          },
          "layer_norm_eps": 1e-05,
          "length_penalty": 1,
          "max_length": 20,
          "min_length": 0,
          "model_type": "clip_vision_model",
          "no_repeat_ngram_size": 0,
          "num_attention_heads": 12,
          "num_beam_groups": 1,
          "num_beams": 1,
          "num_channels": 3,
          "num_hidden_layers": 12,
          "num_return_sequences": 1,
          "output_attentions": false,
          "output_hidden_states": false,
          "output_scores": false,
          "pad_token_id": null,
          "patch_size": 32,
          "prefix": null,
          "problem_type": null,
          "projection_dim": 512,
          "pruned_heads": {},
          "remove_invalid_values": false,
          "repetition_penalty": 1,
          "return_dict": true,
          "return_dict_in_generate": false,
          "sep_token_id": null,
          "suppress_tokens": null,
          "task_specific_params": null,
          "temperature": 1,
          "tf_legacy_loss": false,
          "tie_encoder_decoder": false,
          "tie_word_embeddings": true,
          "tokenizer_class": null,
          "top_k": 50,
          "top_p": 1,
          "torch_dtype": null,
          "torchscript": false,
          "transformers_version": "4.30.2",
          "typical_p": 1,
          "use_bfloat16": false
        }
      }
    },
    "text2vec-openai": {
      "documentationHref": "https://platform.openai.com/docs/guides/embeddings/what-are-embeddings",
      "name": "OpenAI Module"
    }
  },
  "version": "1.23.9"
}
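Because the full output is long, it can be easier to look at just the server version and the names of the enabled modules. The sketch below does this for a dictionary shaped like the example output. The `sample` dictionary is a trimmed copy of that output, and `summarize_meta` is a hypothetical helper name.

```python
def summarize_meta(metainfo: dict) -> dict:
    """Keep only the version, hostname, and sorted module names."""
    return {
        "version": metainfo.get("version"),
        "hostname": metainfo.get("hostname"),
        "modules": sorted(metainfo.get("modules", {})),
    }


# Trimmed copy of the example output; module details are omitted here
sample = {
    "hostname": "http://[::]:8080",
    "modules": {
        "generative-openai": {"name": "Generative Search - OpenAI"},
        "multi2vec-clip": {},
        "text2vec-openai": {"name": "OpenAI Module"},
    },
    "version": "1.23.9",
}

print(summarize_meta(sample))
```

With a live connection, the same helper could be applied to the return value of `client.get_meta()`.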

Close the connection

After you have finished using the Weaviate client, you should close the connection. This frees up resources and ensures that the connection is properly closed.

We suggest using a try-finally block as a best practice. For brevity, we will not include the try-finally blocks in the remaining code snippets.

import weaviate

# Instantiate your client, e.g.:
client = weaviate.connect_to_local()

try:
    # Work with the client here - e.g.:
    assert client.is_live()

finally:  # This will always be executed, even if an exception is raised
    client.close()  # Close the connection & release resources
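An equivalent pattern uses `contextlib.closing` from the Python standard library, which calls `close()` on the wrapped object when the `with` block exits, whether or not an exception was raised. The sketch below demonstrates the behavior with a hypothetical stand-in class, `StandInClient`, so that it runs without a Weaviate instance; it is not part of the Weaviate client library.

```python
from contextlib import closing


class StandInClient:
    """Hypothetical stand-in exposing the same close() contract as a client."""

    def __init__(self):
        self.closed = False

    def is_live(self) -> bool:
        return not self.closed

    def close(self) -> None:
        self.closed = True


stand_in = StandInClient()

try:
    with closing(stand_in) as client:
        assert client.is_live()
        # Simulate a failure while working with the client
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass

print(stand_in.closed)  # True: close() ran even though an exception was raised
```

With a real connection, the same shape would be `with closing(weaviate.connect_to_local()) as client:`, assuming only that the client object has a `close()` method, as shown above.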

Source data

We are going to use a movie dataset sourced from TMDB. The dataset can be found in this GitHub repository, and it contains bibliographic information on ~700 movies released between 1990 and 2024.

As a multimodal project, we'll also use corresponding posters for each movie, which are available in the same repository.

See sample text data

	backdrop_path	genre_ids	id	original_language	original_title	overview	popularity	poster_path	release_date	title	video	vote_average	vote_count
0	/3Nn5BOM1EVw1IYrv6MsbOS6N1Ol.jpg	[14, 18, 10749]	162	en	Edward Scissorhands	A small suburban town receives a visit from a castaway unfinished science experiment named Edward.	45.694	/1RFIbuW9Z3eN9Oxw2KaQG5DfLmD.jpg	1990-12-07	Edward Scissorhands	False	7.7	12305
1	/sw7mordbZxgITU877yTpZCud90M.jpg	[18, 80]	769	en	GoodFellas	The true story of Henry Hill, a half-Irish, half-Sicilian Brooklyn kid who is adopted by neighbourhood gangsters at an early age and climbs the ranks of a Mafia family under the guidance of Jimmy Conway.	57.228	/aKuFiU82s5ISJpGZp7YkIr3kCUd.jpg	1990-09-12	GoodFellas	False	8.5	12106
2	/6uLhSLXzB1ooJ3522ydrBZ2Hh0W.jpg	[35, 10751]	771	en	Home Alone	Eight-year-old Kevin McCallister makes the most of the situation after his family unwittingly leaves him behind when they go on Christmas vacation. But when a pair of bungling burglars set their sights on Kevin's house, the plucky kid stands ready to defend his territory. By planting booby traps galore, adorably mischievous Kevin stands his ground as his frantic mother attempts to race home before Christmas Day.	3.538	/onTSipZ8R3bliBdKfPtsDuHTdlL.jpg	1990-11-16	Home Alone	False	7.4	10599
3	/vKp3NvqBkcjHkCHSGi6EbcP7g4J.jpg	[12, 35, 878]	196	en	Back to the Future Part III	The final installment of the Back to the Future trilogy finds Marty digging the trusty DeLorean out of a mineshaft and looking for Doc in the Wild West of 1885. But when their time machine breaks down, the travelers are stranded in a land of spurs. More problems arise when Doc falls for pretty schoolteacher Clara Clayton, and Marty tangles with Buford Tannen.	28.896	/crzoVQnMzIrRfHtQw0tLBirNfVg.jpg	1990-05-25	Back to the Future Part III	False	7.5	9918
4	/3tuWpnCTe14zZZPt6sI1W9ByOXx.jpg	[35, 10749]	114	en	Pretty Woman	When a millionaire wheeler-dealer enters a business contract with a Hollywood hooker Vivian Ward, he loses his heart in the bargain.	97.953	/hVHUfT801LQATGd26VPzhorIYza.jpg	1990-03-23	Pretty Woman	False	7.5	7671
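When read from a tab-separated export like the sample above, every value arrives as a string. The sketch below converts one sample row into typed Python values. It assumes the data is available as tab-separated text with the column names shown; the actual file format in the repository may differ. The helper name `parse_row` and the output key `tmdb_id` are hypothetical.

```python
import ast
import csv
import io
from datetime import date

# Column names and first row copied from the sample above
COLUMNS = [
    "", "backdrop_path", "genre_ids", "id", "original_language",
    "original_title", "overview", "popularity", "poster_path",
    "release_date", "title", "video", "vote_average", "vote_count",
]
ROW = [
    "0", "/3Nn5BOM1EVw1IYrv6MsbOS6N1Ol.jpg", "[14, 18, 10749]", "162", "en",
    "Edward Scissorhands",
    "A small suburban town receives a visit from a castaway unfinished "
    "science experiment named Edward.",
    "45.694", "/1RFIbuW9Z3eN9Oxw2KaQG5DfLmD.jpg", "1990-12-07",
    "Edward Scissorhands", "False", "7.7", "12305",
]
SAMPLE_TSV = "\t".join(COLUMNS) + "\n" + "\t".join(ROW) + "\n"


def parse_row(row: dict) -> dict:
    """Convert string fields of one row into typed values."""
    return {
        "tmdb_id": int(row["id"]),
        "title": row["title"],
        "overview": row["overview"],
        "genre_ids": ast.literal_eval(row["genre_ids"]),  # "[14, 18]" -> list
        "release_date": date.fromisoformat(row["release_date"]),
        "vote_average": float(row["vote_average"]),
        "poster_path": row["poster_path"],
    }


reader = csv.DictReader(io.StringIO(SAMPLE_TSV), delimiter="\t")
movies = [parse_row(row) for row in reader]
print(movies[0]["title"], movies[0]["release_date"].year, movies[0]["genre_ids"])
```

The `poster_path` value is kept as a string so that it can later be matched to the corresponding poster file.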

Questions and feedback

If you have any questions or feedback, let us know in the user forum.
