Mixture-of-Agents Enhances Large Language Model Capabilities

June 29, 2024 · 2 min read

Developer Advocate

A preview of the paper

🤖Can multiple smaller open-source LLMs be combined to outperform larger monolithic LLMs?

New paper shows that LLMs tend to generate better responses when presented with outputs from other models, even if less capable.

They use this to build a Mixture of Agents(MoA) architecture where multiple LLMs are used to iteratively enhance the generation quality.

LLMs in deeper layers are presented responses from LLMs in earlier layers and iteratively improve the response; mitigates individual model deficiencies and enhance overall response.

MoA Architecture

The complete architecture consists of LLM agents playing one of two roles:

Proposers: These models generate initial reference responses.
Aggregators: These models synthesize the different responses from the proposers into one high-quality response.

Models used: Qwen1.5-110B-Chat, Qwen1.572B-Chat, WizardLM-8x22B, LLaMA-3-70B-Instruct, Mixtral-8x22B-v0.1, dbrx-instruct
3 MoA layers and use the same set of models in each MoA layer
Qwen1.5-110BChat as the aggregator in the last layer
Some models work better as proposers and others as both proposers and aggregators.

How do you choose which models to include in the MoA??

Two metrics are used to assess which models are included in the mixture:

Performance: The average win rate of models in layer i decides if they are included in layer i + 1.
Diversity: The diversity of model outputs is important - using heterogeneous models across layers is better then using the same model

Details:

MoA achieves a new SOTA win rate of 65.8% on AlpacaEval 2.0 compared to the previous best of 57.5% achieved by GPT-4 Omni.
Overall performance comparable to GPT-4 Turbo while being 2× more cost-effective.
No finetuning required only utilizes the interface of prompting and generation of LLMs.
Extends the MoE concept to the model level by operating at the model level rather than at the activation level.
You can swap the final aggregator to any LLM of your choice (Gemini, GPT-4o, Claude3.5) and this improves performance!