
Long-Context Retrieval Models with Monarch Mixer

Zain Hasan · 2 min read

A preview of the paper

A breakdown of the Long Context Retrieval Embedding Models from Stanford!💥

In Short⏩:

  1. They release three long-context (2k/8k/32k) BERT-like encoder embedding models on HuggingFace

  2. The models are only 80M parameters yet outperform much larger models (4-85x larger)

  3. Accessible via @togethercompute endpoints and integrated into @llama_index and @LangChainAI (see the embedding-call sketch after this list)

  4. They also release LoCo, a long-context retrieval benchmark.
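
To give a sense of the endpoint access, here is a minimal sketch that embeds a string with one of these models through Together's OpenAI-compatible embeddings API. The endpoint URL and the model identifier are assumptions and should be checked against Together's current docs.

```python
# Hedged sketch: the endpoint URL and model id are assumptions, verify them
# against Together's documentation before relying on this.
import os
import requests

resp = requests.post(
    "https://api.together.xyz/v1/embeddings",
    headers={"Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}"},
    json={
        "model": "togethercomputer/m2-bert-80M-8k-retrieval",  # assumed model id
        "input": "Monarch Mixer swaps attention for subquadratic block-diagonal ops.",
    },
)
resp.raise_for_status()
embedding = resp.json()["data"][0]["embedding"]
print(len(embedding))  # embedding dimensionality
```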

🏗️Architectural Details:

  1. They replace the attention and MLP blocks in the Transformer architecture with block-diagonal matrix operations (Monarch matrices, M2), which are hardware-efficient and subquadratic in sequence length - O(N^1.5) (see the sketch after this list)

  2. This lets both sequence length and model size scale more efficiently.
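
To make the O(N^1.5) cost concrete, here is a minimal sketch of a Monarch-style matrix-vector product, assuming N is a perfect square and using one common factorization (block-diagonal multiply, permute, block-diagonal multiply, permute). It illustrates the scaling, not the paper's implementation.

```python
# Illustrative sketch, not the M2 codebase: each block-diagonal multiply costs
# sqrt(N) blocks * (sqrt(N))^2 flops per block = N^1.5 total.
import torch

def monarch_matvec(x, L_blocks, R_blocks):
    """Apply a Monarch-style product (block-diagonal, permute, block-diagonal,
    permute) to a length-N vector, with sqrt(N) blocks of size sqrt(N)."""
    n = x.shape[-1]
    b = int(n ** 0.5)                               # number of blocks = block size = sqrt(N)
    x = x.view(b, b)                                # group input into sqrt(N) chunks
    x = torch.einsum("kij,kj->ki", R_blocks, x)     # block-diagonal multiply (R)
    x = x.T.contiguous()                            # permutation (transpose the grid)
    x = torch.einsum("kij,kj->ki", L_blocks, x)     # block-diagonal multiply (L)
    x = x.T.contiguous()                            # permutation again
    return x.reshape(n)

N = 1024                                            # perfect square for simplicity
b = int(N ** 0.5)
L_blocks = torch.randn(b, b, b)                     # sqrt(N) blocks of size sqrt(N) x sqrt(N)
R_blocks = torch.randn(b, b, b)
y = monarch_matvec(torch.randn(N), L_blocks, R_blocks)
print(y.shape)                                      # torch.Size([1024])
```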

🪃Training Details:

  1. These M2 models are trained for long-context retrieval on a mixture of long- and short-context data - surprisingly, training only on long-context data doesn't work.

  2. They use a cosine similarity loss instead of the usual supervised contrastive loss (see the sketch after this list).

    This loss can be computed independently for each datapoint in a batch, rather than summing over all the negative examples in the batch.

    So training scales to large batch sizes of long-context inputs without OOM'ing.
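
Here is a minimal sketch of the difference between the two losses, using hypothetical query/document embedding tensors and binary relevance labels; the function names and shapes are illustrative, not the paper's training code.

```python
# Illustrative comparison of a per-pair cosine similarity loss vs. in-batch
# contrastive loss; tensors and labels are made up for the example.
import torch
import torch.nn.functional as F

def cosine_similarity_loss(q, d, labels):
    """Per-pair loss: regress cosine(q_i, d_i) toward label_i.
    Memory stays O(batch) because no pairwise similarity matrix is built."""
    sims = F.cosine_similarity(q, d, dim=-1)        # shape: (batch,)
    return F.mse_loss(sims, labels)

def in_batch_contrastive_loss(q, d, temperature=0.05):
    """Standard in-batch negatives: every other document in the batch is a
    negative, so this materializes a (batch, batch) similarity matrix."""
    sims = F.normalize(q, dim=-1) @ F.normalize(d, dim=-1).T   # (batch, batch)
    targets = torch.arange(q.shape[0])
    return F.cross_entropy(sims / temperature, targets)

q = torch.randn(4, 768)                             # query embeddings
d = torch.randn(4, 768)                             # document embeddings
labels = torch.tensor([1.0, 0.0, 1.0, 1.0])         # binary relevance per pair
print(cosine_similarity_loss(q, d, labels))
print(in_batch_contrastive_loss(q, d))
```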

📜Blog

🧑‍💻Code

🔷Models

🔗 arXiv Link

📜 Download paper

Ready to start building?

Check out the Quickstart tutorial, and begin building amazing apps with the free trial of Weaviate Cloud (WCD).

Don't want to miss another blog post?

Sign up for our bi-weekly newsletter to stay updated!

