Multimodal RAG
Multimodal Contrastive Finetuning
Multimodal embedding models produce a joint embedding space for multimodal data such as text, images, audio, and more. In this space, similar objects lie closer together and dissimilar objects lie farther apart, so the model preserves semantic similarity both within and across modalities.
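To make "closer together" concrete, here is a minimal sketch of cross-modal retrieval in a joint embedding space using cosine similarity. The 3-dimensional vectors, the file names, and the `nearest` helper are all invented for illustration. A real multimodal embedding model would produce much higher-dimensional vectors, and nothing here reflects a specific model's API.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings, keyed by (modality, item). These toy values
# stand in for the output of a multimodal embedding model.
embeddings = {
    ("text", "a photo of a dog"): [0.9, 0.1, 0.0],
    ("image", "dog.jpg"): [0.8, 0.2, 0.1],
    ("audio", "bark.wav"): [0.7, 0.3, 0.2],
    ("image", "car.jpg"): [0.1, 0.9, 0.2],
}

def nearest(query_key, candidates):
    """Return the key of the most similar item, excluding the query itself."""
    query = candidates[query_key]
    others = [k for k in candidates if k != query_key]
    return max(others, key=lambda k: cosine_similarity(query, candidates[k]))

# A text query retrieves the closest item from any modality.
best = nearest(("text", "a photo of a dog"), embeddings)
print(best)
```

With these toy vectors, the text query sits nearest to the dog image and farthest from the car image, which is the cross-modal behavior the paragraph above describes.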