Multimodal search

With Weaviate, you can perform semantic searches to find similar items based on their meaning. This is done by comparing the vector embeddings of the items in the database.
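To make "comparing the vector embeddings" concrete, here is a minimal sketch of one common comparison, cosine distance, in plain Python. The function name `cosine_distance` and the toy three-dimensional vectors are made up for illustration; real embeddings have many more dimensions, and the metric actually used depends on how the collection is configured.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


# Toy "embeddings" (hypothetical values, not produced by any model)
query = [1.0, 0.0, 0.0]
similar = [0.9, 0.1, 0.0]
unrelated = [0.0, 1.0, 0.0]

d_similar = cosine_distance(query, similar)
d_unrelated = cosine_distance(query, unrelated)

# A smaller distance means the item is closer in meaning to the query
print(d_similar < d_unrelated)
```

A smaller distance indicates a closer match, which is why the search results later in this section are listed in order of increasing distance.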

Because we are using a multimodal model, we can search for objects based on their similarity to a query in any of the supported modalities. This means that we can search for movies based on their similarity to a text or an image.

Image query

Code

This example finds entries in "MovieMM" based on their similarity to this image of the International Space Station, and prints out the title, release year, and TMDB ID of the top 5 matches, along with each match's distance to the query.

Query image

import weaviate
import weaviate.classes.query as wq
import os


# Instantiate your client (not shown). e.g.:
# headers = {"X-OpenAI-Api-Key": os.getenv("OPENAI_APIKEY")}  # Replace with your OpenAI API key
# client = weaviate.connect_to_local(headers=headers)


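def url_to_base64(url):
    """Download the image at `url` and return it as a base64-encoded string."""
    import base64

    import requests

    image_response = requests.get(url, timeout=30)
    image_response.raise_for_status()  # Fail early if the download did not succeed
    content = image_response.content
    return base64.b64encode(content).decode("utf-8")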


# Get the collection
movies = client.collections.get("MovieMM")

# Perform query
src_img_path = "https://github.com/weaviate-tutorials/edu-datasets/blob/main/img/International_Space_Station_after_undocking_of_STS-132.jpg?raw=true"
query_b64 = url_to_base64(src_img_path)

response = movies.query.near_image(
    near_image=query_b64,
    limit=5,
    return_metadata=wq.MetadataQuery(distance=True),
    return_properties=["title", "release_date", "tmdb_id", "poster"]  # To include the poster property in the response (`blob` properties are not returned by default)
)

# Inspect the response
for o in response.objects:
    print(
        o.properties["title"], o.properties["release_date"].year, o.properties["tmdb_id"]
    )  # Print the title, release year, and TMDB ID (note the release date is a datetime object)
    print(
        f"Distance to query: {o.metadata.distance:.3f}\n"
    )  # Print the distance of the object from the query

client.close()

API docs

Explain the code

The results are based on similarity of the vector embeddings between the query and the database object. In this case, the vectorizer module generates an embedding of the input image.
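The query image in the example is downloaded from a URL and converted to a base64 string. If the image is already on disk, the same conversion can be done without a network request. The sketch below assumes a hypothetical helper named `file_to_base64`, which is not part of the Weaviate client, and uses a temporary file with placeholder bytes in place of a real image.

```python
import base64
import tempfile
from pathlib import Path


def file_to_base64(path):
    """Read a local file and return its contents as a base64-encoded string."""
    return base64.b64encode(Path(path).read_bytes()).decode("utf-8")


# Demonstration with placeholder bytes standing in for an image file
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = Path(tmp_dir) / "sample.bin"
    sample_path.write_bytes(b"placeholder image bytes")
    sample_b64 = file_to_base64(sample_path)

# Decoding the string gives back the original bytes
print(base64.b64decode(sample_b64) == b"placeholder image bytes")
```

The resulting string could then be passed as the near_image argument in the same way as query_b64 in the example.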

The limit parameter here sets the maximum number of results to return.
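Conceptually, a limit keeps only the closest matches. The sketch below illustrates that idea in plain Python over a small dictionary of made-up distances. It is not how Weaviate implements the search internally, and the name `top_k` and the values are hypothetical.

```python
import heapq


def top_k(distances, k):
    """Return up to k (name, distance) pairs, closest first."""
    return heapq.nsmallest(k, distances.items(), key=lambda item: item[1])


# Hypothetical distances between a query and four candidate objects
candidates = {"a": 0.44, "b": 0.35, "c": 0.61, "d": 0.38}

closest = top_k(candidates, 2)
print(closest)

# Asking for more results than exist returns everything that is available
everything = top_k(candidates, 10)
print(len(everything))
```

As in the query above, the limit is a maximum: fewer results come back if fewer candidates exist.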

The return_metadata parameter takes an instance of the MetadataQuery class to set metadata to return in the search results. The current query returns the vector distance to the query.

Note that the results closely match the theme of the query image: the top results are all space-themed movies.

Example results

Posters for the top 5 matches:

Weaviate output:

Interstellar 2014 157336
Distance to query: 0.354

Gravity 2013 49047
Distance to query: 0.384

Arrival 2016 329865
Distance to query: 0.386

Armageddon 1998 95
Distance to query: 0.400

Godzilla 1998 929
Distance to query: 0.441

Response object

The returned object is an instance of a custom class. Its objects attribute is a list of search results, each object being an instance of another custom class.

Each returned object will:

- Include all properties and its UUID by default, except those with blob data types.
  - Since the poster property is a blob, it is not included by default.
  - To include the poster property, you must specify it and the other properties to fetch in the return_properties parameter.
- Not include any other information (e.g. references, metadata, vectors) by default.
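When a blob property such as poster is requested, we would expect its value to arrive as a base64-encoded string. A minimal sketch of turning such a value back into a file is shown below. The helper name `save_blob` is hypothetical, and the sample data is placeholder bytes rather than a real poster.

```python
import base64
import tempfile
from pathlib import Path


def save_blob(blob_b64, path):
    """Decode a base64-encoded blob value, write it to path, and return the byte count."""
    data = base64.b64decode(blob_b64)
    Path(path).write_bytes(data)
    return len(data)


# Placeholder standing in for the value of a blob property
sample_b64 = base64.b64encode(b"not a real image").decode("utf-8")

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = Path(tmp_dir) / "poster.bin"
    n_bytes = save_blob(sample_b64, out_path)
    restored = out_path.read_bytes()

# The file contains the original bytes
print(restored == b"not a real image")
```

With real results, the first argument would be o.properties["poster"]. The appropriate file extension depends on the image format that was stored, which this sketch does not inspect.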

Text search

Code

This example finds entries in "MovieMM" based on their similarity to the query "red", and prints out the title, release year, and TMDB ID of the top 5 matches, along with each match's distance to the query.

import weaviate
import weaviate.classes.query as wq
import os


# Instantiate your client (not shown). e.g.:
# headers = {"X-OpenAI-Api-Key": os.getenv("OPENAI_APIKEY")}  # Replace with your OpenAI API key
# client = weaviate.connect_to_local(headers=headers)

# Get the collection
movies = client.collections.get("MovieMM")

# Perform query
response = movies.query.near_text(
    query="red",
    limit=5,
    return_metadata=wq.MetadataQuery(distance=True),
    return_properties=["title", "release_date", "tmdb_id", "poster"]  # To include the poster property in the response (`blob` properties are not returned by default)
)

# Inspect the response
for o in response.objects:
    print(
        o.properties["title"], o.properties["release_date"].year, o.properties["tmdb_id"]
    )  # Print the title, release year, and TMDB ID (note the release date is a datetime object)
    print(
        f"Distance to query: {o.metadata.distance:.3f}\n"
    )  # Print the distance of the object from the query

client.close()

API docs

Explain the code

The results are based on similarity of the vector embeddings between the query and the database object. In this case, the vectorizer module generates an embedding of the input text.

The remaining parameters are the same as in the previous example.

Note that the results include movies with red color themes in their posters. This is because the CLIP vectorizer encodes the color information of the image in the vectors.

Example results

Posters for the top 5 matches:

Weaviate output:

Deadpool 2 2018 383498
Distance to query: 0.670

Bloodshot 2020 338762
Distance to query: 0.677

Deadpool 2016 293660
Distance to query: 0.678

300 2007 1271
Distance to query: 0.682

The Hunt for Red October 1990 1669
Distance to query: 0.683

Response object

The returned object is in the same format as in the previous example.

Questions and feedback

If you have any questions or feedback, let us know in the user forum.
