Skip to main content

Preparation

Pre-requisites

This course is self-contained. However, we recommend that you go through one of the 101-level courses, such as that for working with text, your own vectors, or multimodal data.

This page briefly covers the required resources and setup, including the Weaviate Python client library, and a Weaviate instance with the multi-modal vectorizer.

Weaviate Python client library

Install the latest (v4, e.g. 4.5.0) Weaviate Python client library with:

pip install -U weaviate-client

Set up Weaviate

Install Docker on your machine. We recommend following the official Docker installation guide.

Create a new directory and navigate to it in your terminal. Then, create a new file called docker-compose.yml and add the following content:

---
version: '3.4'
services:
weaviate:
command:
- --host
- 0.0.0.0
- --port
- '8080'
- --scheme
- http
image: cr.weaviate.io/semitechnologies/weaviate:1.25.8
ports:
- 8080:8080
- 50051:50051
volumes:
- weaviate_data:/var/lib/weaviate
restart: on-failure:0
environment:
CLIP_INFERENCE_API: 'http://multi2vec-clip:8080'
QUERY_DEFAULTS_LIMIT: 25
AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
DEFAULT_VECTORIZER_MODULE: 'multi2vec-clip'
ENABLE_MODULES: 'multi2vec-clip,text2vec-openai,generative-openai'
CLUSTER_HOSTNAME: 'node1'
multi2vec-clip:
image: cr.weaviate.io/semitechnologies/multi2vec-clip:sentence-transformers-clip-ViT-B-32-multilingual-v1
environment:
ENABLE_CUDA: '0'
volumes:
weaviate_data:
...

Create a Weaviate instance

Run the following command to start Weaviate:

docker compose up

Your Weaviate instance details

Once the instance is created, you can access it at http://localhost:8080.

Work with Weaviate

Connect to your Weaviate instance

To connect to the Weaviate instance, use the connect_to_local function. We also provide API keys here for any inference APIs (e.g. OpenAI, Cohere, Google, AWS etc.) that Weaviate may use.

import weaviate
import os

headers = {
"X-OpenAI-Api-Key": os.getenv("OPENAI_APIKEY")
} # Replace with your own API keys

client = weaviate.connect_to_local(headers=headers)

Check Weaviate status

You can check whether the Weaviate instance is up using the is_live function.

assert client.is_live()  # This will raise an exception if the client is not live

Retrieve server meta information

You can retrieve meta information about the Weaviate instance using the meta function.

import json

metainfo = client.get_meta()
print(json.dumps(metainfo, indent=2)) # Print the meta information in a readable format

This will print the server meta information to the console. The output will look similar to the following:

Example get_meta output

Note that this output is a little longer due to the additional details from the CLIP models.

{
"hostname": "http://[::]:8080",
"modules": {
"generative-openai": {
"documentationHref": "https://platform.openai.com/docs/api-reference/completions",
"name": "Generative Search - OpenAI"
},
"multi2vec-clip": {
"clip_model": {
"_commit_hash": null,
"_name_or_path": "/root/.cache/torch/sentence_transformers/sentence-transformers_clip-ViT-B-32/0_CLIPModel",
"add_cross_attention": false,
"architectures": [
"CLIPModel"
],
"bad_words_ids": null,
"begin_suppress_tokens": null,
"bos_token_id": null,
"chunk_size_feed_forward": 0,
"cross_attention_hidden_size": null,
"decoder_start_token_id": null,
"diversity_penalty": 0,
"do_sample": false,
"early_stopping": false,
"encoder_no_repeat_ngram_size": 0,
"eos_token_id": null,
"exponential_decay_length_penalty": null,
"finetuning_task": null,
"forced_bos_token_id": null,
"forced_eos_token_id": null,
"id2label": {
"0": "LABEL_0",
"1": "LABEL_1"
},
"initializer_factor": 1,
"is_decoder": false,
"is_encoder_decoder": false,
"label2id": {
"LABEL_0": 0,
"LABEL_1": 1
},
"length_penalty": 1,
"logit_scale_init_value": 2.6592,
"max_length": 20,
"min_length": 0,
"model_type": "clip",
"no_repeat_ngram_size": 0,
"num_beam_groups": 1,
"num_beams": 1,
"num_return_sequences": 1,
"output_attentions": false,
"output_hidden_states": false,
"output_scores": false,
"pad_token_id": null,
"prefix": null,
"problem_type": null,
"projection_dim": 512,
"pruned_heads": {},
"remove_invalid_values": false,
"repetition_penalty": 1,
"return_dict": true,
"return_dict_in_generate": false,
"sep_token_id": null,
"suppress_tokens": null,
"task_specific_params": null,
"temperature": 1,
"text_config": {
"_name_or_path": "",
"add_cross_attention": false,
"architectures": null,
"attention_dropout": 0,
"bad_words_ids": null,
"begin_suppress_tokens": null,
"bos_token_id": 0,
"chunk_size_feed_forward": 0,
"cross_attention_hidden_size": null,
"decoder_start_token_id": null,
"diversity_penalty": 0,
"do_sample": false,
"dropout": 0,
"early_stopping": false,
"encoder_no_repeat_ngram_size": 0,
"eos_token_id": 2,
"exponential_decay_length_penalty": null,
"finetuning_task": null,
"forced_bos_token_id": null,
"forced_eos_token_id": null,
"gradient_checkpointing": false,
"hidden_act": "quick_gelu",
"hidden_size": 512,
"id2label": {
"0": "LABEL_0",
"1": "LABEL_1"
},
"initializer_factor": 1,
"initializer_range": 0.02,
"intermediate_size": 2048,
"is_decoder": false,
"is_encoder_decoder": false,
"label2id": {
"LABEL_0": 0,
"LABEL_1": 1
},
"layer_norm_eps": 1e-05,
"length_penalty": 1,
"max_length": 20,
"max_position_embeddings": 77,
"min_length": 0,
"model_type": "clip_text_model",
"no_repeat_ngram_size": 0,
"num_attention_heads": 8,
"num_beam_groups": 1,
"num_beams": 1,
"num_hidden_layers": 12,
"num_return_sequences": 1,
"output_attentions": false,
"output_hidden_states": false,
"output_scores": false,
"pad_token_id": 1,
"prefix": null,
"problem_type": null,
"projection_dim": 512,
"pruned_heads": {},
"remove_invalid_values": false,
"repetition_penalty": 1,
"return_dict": true,
"return_dict_in_generate": false,
"sep_token_id": null,
"suppress_tokens": null,
"task_specific_params": null,
"temperature": 1,
"tf_legacy_loss": false,
"tie_encoder_decoder": false,
"tie_word_embeddings": true,
"tokenizer_class": null,
"top_k": 50,
"top_p": 1,
"torch_dtype": null,
"torchscript": false,
"transformers_version": "4.30.2",
"typical_p": 1,
"use_bfloat16": false,
"vocab_size": 49408
},
"tf_legacy_loss": false,
"tie_encoder_decoder": false,
"tie_word_embeddings": true,
"tokenizer_class": null,
"top_k": 50,
"top_p": 1,
"torch_dtype": "torch.float32",
"torchscript": false,
"transformers_version": null,
"typical_p": 1,
"use_bfloat16": false,
"vision_config": {
"_name_or_path": "",
"add_cross_attention": false,
"architectures": null,
"attention_dropout": 0,
"bad_words_ids": null,
"begin_suppress_tokens": null,
"bos_token_id": null,
"chunk_size_feed_forward": 0,
"cross_attention_hidden_size": null,
"decoder_start_token_id": null,
"diversity_penalty": 0,
"do_sample": false,
"dropout": 0,
"early_stopping": false,
"encoder_no_repeat_ngram_size": 0,
"eos_token_id": null,
"exponential_decay_length_penalty": null,
"finetuning_task": null,
"forced_bos_token_id": null,
"forced_eos_token_id": null,
"gradient_checkpointing": false,
"hidden_act": "quick_gelu",
"hidden_size": 768,
"id2label": {
"0": "LABEL_0",
"1": "LABEL_1"
},
"image_size": 224,
"initializer_factor": 1,
"initializer_range": 0.02,
"intermediate_size": 3072,
"is_decoder": false,
"is_encoder_decoder": false,
"label2id": {
"LABEL_0": 0,
"LABEL_1": 1
},
"layer_norm_eps": 1e-05,
"length_penalty": 1,
"max_length": 20,
"min_length": 0,
"model_type": "clip_vision_model",
"no_repeat_ngram_size": 0,
"num_attention_heads": 12,
"num_beam_groups": 1,
"num_beams": 1,
"num_channels": 3,
"num_hidden_layers": 12,
"num_return_sequences": 1,
"output_attentions": false,
"output_hidden_states": false,
"output_scores": false,
"pad_token_id": null,
"patch_size": 32,
"prefix": null,
"problem_type": null,
"projection_dim": 512,
"pruned_heads": {},
"remove_invalid_values": false,
"repetition_penalty": 1,
"return_dict": true,
"return_dict_in_generate": false,
"sep_token_id": null,
"suppress_tokens": null,
"task_specific_params": null,
"temperature": 1,
"tf_legacy_loss": false,
"tie_encoder_decoder": false,
"tie_word_embeddings": true,
"tokenizer_class": null,
"top_k": 50,
"top_p": 1,
"torch_dtype": null,
"torchscript": false,
"transformers_version": "4.30.2",
"typical_p": 1,
"use_bfloat16": false
}
},
"text_model": {
"_commit_hash": null,
"_name_or_path": "./models/text/0_CLIPModel",
"add_cross_attention": false,
"architectures": [
"CLIPModel"
],
"bad_words_ids": null,
"begin_suppress_tokens": null,
"bos_token_id": null,
"chunk_size_feed_forward": 0,
"cross_attention_hidden_size": null,
"decoder_start_token_id": null,
"diversity_penalty": 0,
"do_sample": false,
"early_stopping": false,
"encoder_no_repeat_ngram_size": 0,
"eos_token_id": null,
"exponential_decay_length_penalty": null,
"finetuning_task": null,
"forced_bos_token_id": null,
"forced_eos_token_id": null,
"id2label": {
"0": "LABEL_0",
"1": "LABEL_1"
},
"initializer_factor": 1,
"is_decoder": false,
"is_encoder_decoder": false,
"label2id": {
"LABEL_0": 0,
"LABEL_1": 1
},
"length_penalty": 1,
"logit_scale_init_value": 2.6592,
"max_length": 20,
"min_length": 0,
"model_type": "clip",
"no_repeat_ngram_size": 0,
"num_beam_groups": 1,
"num_beams": 1,
"num_return_sequences": 1,
"output_attentions": false,
"output_hidden_states": false,
"output_scores": false,
"pad_token_id": null,
"prefix": null,
"problem_type": null,
"projection_dim": 512,
"pruned_heads": {},
"remove_invalid_values": false,
"repetition_penalty": 1,
"return_dict": true,
"return_dict_in_generate": false,
"sep_token_id": null,
"suppress_tokens": null,
"task_specific_params": null,
"temperature": 1,
"text_config": {
"_name_or_path": "",
"add_cross_attention": false,
"architectures": null,
"attention_dropout": 0,
"bad_words_ids": null,
"begin_suppress_tokens": null,
"bos_token_id": 0,
"chunk_size_feed_forward": 0,
"cross_attention_hidden_size": null,
"decoder_start_token_id": null,
"diversity_penalty": 0,
"do_sample": false,
"dropout": 0,
"early_stopping": false,
"encoder_no_repeat_ngram_size": 0,
"eos_token_id": 2,
"exponential_decay_length_penalty": null,
"finetuning_task": null,
"forced_bos_token_id": null,
"forced_eos_token_id": null,
"gradient_checkpointing": false,
"hidden_act": "quick_gelu",
"hidden_size": 512,
"id2label": {
"0": "LABEL_0",
"1": "LABEL_1"
},
"initializer_factor": 1,
"initializer_range": 0.02,
"intermediate_size": 2048,
"is_decoder": false,
"is_encoder_decoder": false,
"label2id": {
"LABEL_0": 0,
"LABEL_1": 1
},
"layer_norm_eps": 1e-05,
"length_penalty": 1,
"max_length": 20,
"max_position_embeddings": 77,
"min_length": 0,
"model_type": "clip_text_model",
"no_repeat_ngram_size": 0,
"num_attention_heads": 8,
"num_beam_groups": 1,
"num_beams": 1,
"num_hidden_layers": 12,
"num_return_sequences": 1,
"output_attentions": false,
"output_hidden_states": false,
"output_scores": false,
"pad_token_id": 1,
"prefix": null,
"problem_type": null,
"projection_dim": 512,
"pruned_heads": {},
"remove_invalid_values": false,
"repetition_penalty": 1,
"return_dict": true,
"return_dict_in_generate": false,
"sep_token_id": null,
"suppress_tokens": null,
"task_specific_params": null,
"temperature": 1,
"tf_legacy_loss": false,
"tie_encoder_decoder": false,
"tie_word_embeddings": true,
"tokenizer_class": null,
"top_k": 50,
"top_p": 1,
"torch_dtype": null,
"torchscript": false,
"transformers_version": "4.30.2",
"typical_p": 1,
"use_bfloat16": false,
"vocab_size": 49408
},
"tf_legacy_loss": false,
"tie_encoder_decoder": false,
"tie_word_embeddings": true,
"tokenizer_class": null,
"top_k": 50,
"top_p": 1,
"torch_dtype": "torch.float32",
"torchscript": false,
"transformers_version": null,
"typical_p": 1,
"use_bfloat16": false,
"vision_config": {
"_name_or_path": "",
"add_cross_attention": false,
"architectures": null,
"attention_dropout": 0,
"bad_words_ids": null,
"begin_suppress_tokens": null,
"bos_token_id": null,
"chunk_size_feed_forward": 0,
"cross_attention_hidden_size": null,
"decoder_start_token_id": null,
"diversity_penalty": 0,
"do_sample": false,
"dropout": 0,
"early_stopping": false,
"encoder_no_repeat_ngram_size": 0,
"eos_token_id": null,
"exponential_decay_length_penalty": null,
"finetuning_task": null,
"forced_bos_token_id": null,
"forced_eos_token_id": null,
"gradient_checkpointing": false,
"hidden_act": "quick_gelu",
"hidden_size": 768,
"id2label": {
"0": "LABEL_0",
"1": "LABEL_1"
},
"image_size": 224,
"initializer_factor": 1,
"initializer_range": 0.02,
"intermediate_size": 3072,
"is_decoder": false,
"is_encoder_decoder": false,
"label2id": {
"LABEL_0": 0,
"LABEL_1": 1
},
"layer_norm_eps": 1e-05,
"length_penalty": 1,
"max_length": 20,
"min_length": 0,
"model_type": "clip_vision_model",
"no_repeat_ngram_size": 0,
"num_attention_heads": 12,
"num_beam_groups": 1,
"num_beams": 1,
"num_channels": 3,
"num_hidden_layers": 12,
"num_return_sequences": 1,
"output_attentions": false,
"output_hidden_states": false,
"output_scores": false,
"pad_token_id": null,
"patch_size": 32,
"prefix": null,
"problem_type": null,
"projection_dim": 512,
"pruned_heads": {},
"remove_invalid_values": false,
"repetition_penalty": 1,
"return_dict": true,
"return_dict_in_generate": false,
"sep_token_id": null,
"suppress_tokens": null,
"task_specific_params": null,
"temperature": 1,
"tf_legacy_loss": false,
"tie_encoder_decoder": false,
"tie_word_embeddings": true,
"tokenizer_class": null,
"top_k": 50,
"top_p": 1,
"torch_dtype": null,
"torchscript": false,
"transformers_version": "4.30.2",
"typical_p": 1,
"use_bfloat16": false
}
}
},
"text2vec-openai": {
"documentationHref": "https://platform.openai.com/docs/guides/embeddings/what-are-embeddings",
"name": "OpenAI Module"
},
},
"version": "1.23.9"
}

Close the connection

After you have finished using the Weaviate client, you should close the connection. This frees up resources and ensures that the connection is properly closed.

We suggest using a try-finally block as a best practice. For brevity, we will not include the try-finally blocks in the remaining code snippets.

import weaviate
import os

# Instantiate your client (not shown). e.g.:
# client = weaviate.connect_to_local()

try:
# Work with the client here - e.g.:
assert client.is_live()
pass

finally: # This will always be executed, even if an exception is raised
client.close() # Close the connection & release resources

Source data

We are going to use a movie dataset sourced from TMDB. The dataset can be found in this GitHub repository, and it contains bibliographic information on ~700 movies released between 1990 and 2024.

As a multimodal project, we'll also use corresponding posters for each movie, which are available in the same repository.

See sample text data
backdrop_pathgenre_idsidoriginal_languageoriginal_titleoverviewpopularityposter_pathrelease_datetitlevideovote_averagevote_count
0/3Nn5BOM1EVw1IYrv6MsbOS6N1Ol.jpg[14, 18, 10749]162enEdward ScissorhandsA small suburban town receives a visit from a castaway unfinished science experiment named Edward.45.694/1RFIbuW9Z3eN9Oxw2KaQG5DfLmD.jpg1990-12-07Edward ScissorhandsFalse7.712305
1/sw7mordbZxgITU877yTpZCud90M.jpg[18, 80]769enGoodFellasThe true story of Henry Hill, a half-Irish, half-Sicilian Brooklyn kid who is adopted by neighbourhood gangsters at an early age and climbs the ranks of a Mafia family under the guidance of Jimmy Conway.57.228/aKuFiU82s5ISJpGZp7YkIr3kCUd.jpg1990-09-12GoodFellasFalse8.512106
2/6uLhSLXzB1ooJ3522ydrBZ2Hh0W.jpg[35, 10751]771enHome AloneEight-year-old Kevin McCallister makes the most of the situation after his family unwittingly leaves him behind when they go on Christmas vacation. But when a pair of bungling burglars set their sights on Kevin's house, the plucky kid stands ready to defend his territory. By planting booby traps galore, adorably mischievous Kevin stands his ground as his frantic mother attempts to race home before Christmas Day.3.538/onTSipZ8R3bliBdKfPtsDuHTdlL.jpg1990-11-16Home AloneFalse7.410599
3/vKp3NvqBkcjHkCHSGi6EbcP7g4J.jpg[12, 35, 878]196enBack to the Future Part IIIThe final installment of the Back to the Future trilogy finds Marty digging the trusty DeLorean out of a mineshaft and looking for Doc in the Wild West of 1885. But when their time machine breaks down, the travelers are stranded in a land of spurs. More problems arise when Doc falls for pretty schoolteacher Clara Clayton, and Marty tangles with Buford Tannen.28.896/crzoVQnMzIrRfHtQw0tLBirNfVg.jpg1990-05-25Back to the Future Part IIIFalse7.59918
4/3tuWpnCTe14zZZPt6sI1W9ByOXx.jpg[35, 10749]114enPretty WomanWhen a millionaire wheeler-dealer enters a business contract with a Hollywood hooker Vivian Ward, he loses his heart in the bargain.97.953/hVHUfT801LQATGd26VPzhorIYza.jpg1990-03-23Pretty WomanFalse7.57671

Questions and feedback

If you have any questions or feedback, let us know in the user forum.