Multimodal search
With Weaviate, you can perform semantic searches to find items based on their meaning. Weaviate does this by comparing the vector embedding of the query with the vector embeddings of the objects in the database.
Because we are using a multimodal model, we can search for objects based on their similarity to an input in any of the model's supported modalities. This means that we can search for movies based on their similarity to a text or an image.
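To make "comparing vector embeddings" concrete, here is a minimal sketch of cosine distance (Weaviate's default distance metric), defined as one minus the cosine similarity of two vectors. The three-dimensional vectors are made-up stand-ins for real embeddings, which have many more dimensions; the actual metric used depends on your collection's vector index configuration.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity of two equal-length vectors (0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


# Made-up "embeddings" for illustration only
query_vec = [1.0, 0.0, 0.0]
similar_vec = [0.9, 0.1, 0.0]
unrelated_vec = [0.0, 1.0, 0.0]

print(round(cosine_distance(query_vec, similar_vec), 3))  # → 0.006
print(round(cosine_distance(query_vec, unrelated_vec), 3))  # → 1.0
```

A smaller distance means the two items are closer in meaning, which is why the example results below are listed in order of increasing distance.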
Image query
Code
This example finds entries in "MovieMM" based on their similarity to this image of the International Space Station, and prints out the title, release year, and TMDB ID of the top 5 matches, along with each match's distance to the query.
Query image
import base64
import os

import requests
import weaviate
import weaviate.classes.query as wq

# Instantiate your client (not shown). e.g.:
# headers = {"X-OpenAI-Api-Key": os.getenv("OPENAI_APIKEY")}  # Replace with your OpenAI API key
# client = weaviate.connect_to_local(headers=headers)


def url_to_base64(url):
    """Download the file at `url` and return its contents as a base64 string."""
    image_response = requests.get(url, timeout=30)
    image_response.raise_for_status()  # Fail early on HTTP errors
    return base64.b64encode(image_response.content).decode("utf-8")


# Get the collection
movies = client.collections.get("MovieMM")

# Perform query
src_img_path = "https://github.com/weaviate-tutorials/edu-datasets/blob/main/img/International_Space_Station_after_undocking_of_STS-132.jpg?raw=true"
query_b64 = url_to_base64(src_img_path)

response = movies.query.near_image(
    near_image=query_b64,
    limit=5,
    return_metadata=wq.MetadataQuery(distance=True),
    # `blob` properties are not returned by default, so "poster" must be requested explicitly
    return_properties=["title", "release_date", "tmdb_id", "poster"],
)

# Inspect the response
for o in response.objects:
    # Print the title, release year (release_date is a datetime object), and TMDB ID
    print(
        o.properties["title"], o.properties["release_date"].year, o.properties["tmdb_id"]
    )
    # Print the distance of the object from the query
    print(f"Distance to query: {o.metadata.distance:.3f}\n")

client.close()
Explain the code
The results are based on the similarity of the vector embeddings of the query and the database objects. In this case, the vectorizer module generates an embedding of the input image.
The limit parameter sets the maximum number of results to return.
The return_metadata parameter takes an instance of the MetadataQuery class, which sets the metadata to return in the search results. This query returns the vector distance between each result and the query.
Note that the results reflect the theme of the query image: most of the top results are space-related movies.
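Conceptually, limit and the returned distance behave like a nearest-neighbor search: each object is scored against the query vector and the closest ones are kept. The sketch below does this by exhaustive comparison over a hypothetical in-memory list of (title, vector) pairs with made-up vectors. It is only a mental model; Weaviate itself uses a vector index (HNSW by default) that finds approximate nearest neighbors without comparing the query against every object.

```python
import heapq
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def top_k(query_vec, objects, limit):
    """Return up to `limit` (distance, title) pairs, closest first.

    `objects` is a hypothetical list of (title, vector) tuples.
    """
    scored = ((cosine_distance(query_vec, vec), title) for title, vec in objects)
    return heapq.nsmallest(limit, scored)


# Made-up "embeddings" for illustration only
objects = [
    ("Space movie A", [0.9, 0.1, 0.0]),
    ("Space movie B", [0.8, 0.2, 0.1]),
    ("Romance movie", [0.0, 0.2, 0.9]),
    ("Monster movie", [0.3, 0.9, 0.1]),
]
query_vec = [1.0, 0.0, 0.0]

for distance, title in top_k(query_vec, objects, limit=2):
    print(f"{title}: distance {distance:.3f}")
```

heapq.nsmallest keeps only the best `limit` candidates, so asking for more results than there are objects simply returns all of them, sorted by distance.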
Example results
Posters for the top 5 matches:
Weaviate output:
Interstellar 2014 157336
Distance to query: 0.354
Gravity 2013 49047
Distance to query: 0.384
Arrival 2016 329865
Distance to query: 0.386
Armageddon 1998 95
Distance to query: 0.400
Godzilla 1998 929
Distance to query: 0.441
Response object
The returned object is an instance of a custom class. Its objects attribute is a list of search results, each of which is an instance of another custom class.
Each returned object will:
- Include all properties and its UUID by default, except those with blob data types.
  - Since the poster property is a blob, it is not included by default.
  - To include the poster property, you must specify it, along with the other properties to fetch, in the return_properties parameter.
- Not include any other information (e.g. references, metadata, vectors) by default.
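Blob properties are stored as base64-encoded strings, so a fetched poster value needs decoding before it can be saved or displayed as an image. A minimal sketch, assuming o.properties["poster"] holds such a base64 string; the helper name save_poster and the output file names are hypothetical.

```python
import base64
from pathlib import Path


def save_poster(poster_b64: str, out_path: str) -> int:
    """Decode a base64-encoded blob and write the raw bytes to out_path.

    Hypothetical helper: assumes poster_b64 is the base64 string returned
    for a blob property such as "poster". Returns the number of bytes written.
    """
    raw = base64.b64decode(poster_b64, validate=True)  # Raises on invalid base64
    return Path(out_path).write_bytes(raw)


# Usage inside the results loop (assumes "poster" was requested in return_properties):
# for o in response.objects:
#     save_poster(o.properties["poster"], f"{o.properties['tmdb_id']}.jpg")
```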
Text search
Code
This example finds entries in "MovieMM" based on their similarity to the query "red", and prints out the title, release year, and TMDB ID of the top 5 matches, along with each match's distance to the query.
import os

import weaviate
import weaviate.classes.query as wq

# Instantiate your client (not shown). e.g.:
# headers = {"X-OpenAI-Api-Key": os.getenv("OPENAI_APIKEY")}  # Replace with your OpenAI API key
# client = weaviate.connect_to_local(headers=headers)

# Get the collection
movies = client.collections.get("MovieMM")

# Perform query
response = movies.query.near_text(
    query="red",
    limit=5,
    return_metadata=wq.MetadataQuery(distance=True),
    # `blob` properties are not returned by default, so "poster" must be requested explicitly
    return_properties=["title", "release_date", "tmdb_id", "poster"],
)

# Inspect the response
for o in response.objects:
    # Print the title, release year (release_date is a datetime object), and TMDB ID
    print(
        o.properties["title"], o.properties["release_date"].year, o.properties["tmdb_id"]
    )
    # Print the distance of the object from the query
    print(f"Distance to query: {o.metadata.distance:.3f}\n")

client.close()
Explain the code
The results are based on the similarity of the vector embeddings of the query and the database objects. In this case, the vectorizer module generates an embedding of the input text.
The remaining parameters are the same as in the previous example.
Note that the results include movies whose posters have red color themes. This is because the CLIP vectorizer encodes visual information such as color into the image vectors, and places text and image embeddings in the same vector space, so the text "red" can match images that look red.
Example results
Posters for the top 5 matches:
Weaviate output:
Deadpool 2 2018 383498
Distance to query: 0.670
Bloodshot 2020 338762
Distance to query: 0.677
Deadpool 2016 293660
Distance to query: 0.678
300 2007 1271
Distance to query: 0.682
The Hunt for Red October 1990 1669
Distance to query: 0.683
Response object
The returned object is in the same format as in the previous example.
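Since both examples print their results with the same loop, that loop could be factored into a shared helper. The sketch below uses a hypothetical format_results function. To keep it runnable without a Weaviate instance, it uses small stand-in dataclasses (FakeObject, FakeMetadata) that mimic only the attributes the loop reads, properties and metadata.distance; they are not the client's actual classes. The sample values mirror the first image-query result above, with a placeholder month and day.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class FakeMetadata:
    distance: float


@dataclass
class FakeObject:
    properties: dict
    metadata: FakeMetadata


def format_results(objects) -> str:
    """Format each result as: title, release year, TMDB ID, then its distance."""
    lines = []
    for o in objects:
        props = o.properties
        # release_date is a datetime object, so .year extracts the year
        lines.append(f"{props['title']} {props['release_date'].year} {props['tmdb_id']}")
        lines.append(f"Distance to query: {o.metadata.distance:.3f}")
    return "\n".join(lines)


# Stand-in data mirroring the first image-query result (month/day are placeholders)
sample = [
    FakeObject(
        properties={
            "title": "Interstellar",
            "release_date": datetime(2014, 1, 1, tzinfo=timezone.utc),
            "tmdb_id": 157336,
        },
        metadata=FakeMetadata(distance=0.354),
    )
]
print(format_results(sample))
```

With real results, the same function could be called as format_results(response.objects) for either query.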
Questions and feedback
If you have any questions or feedback, let us know in the user forum.