- Mistral AI released a Mixture of Experts model.
- It's like having 8 Mistral 7B models combined into one.
- What is a Mixture of Experts?
- A simple LLM takes an input and passes it to a single expert, which produces the output.
- With a Mixture of Experts, the input first goes into a gating layer.
- The gating layer decides which expert should answer the question (see the sketch after this list).
- The other experts do not contribute to that answer.
- Each expert specializes in a different kind of task.
- Mixtral 8x7B has 8 different 7B-parameter experts.
- During training, each expert learns to get good at a specific kind of task, and the gating mechanism learns to recognize what the input is and route it to the right expert.
- It is also possible to build a Mixture of Experts model where the gating mechanism passes the prompt to multiple experts and another layer combines their outputs.
- Rumor has it that GPT-4 is a Mixture of Experts model.
- OpenMoE - an open-source project that releases a family of Mixture of Experts large language models.
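To make the routing concrete, here is a minimal sketch of an MoE layer in Python/PyTorch. The class name, expert architecture, sizes, and `top_k` value are illustrative assumptions, not Mixtral's actual implementation; the point is that the gate scores every expert, only the top-scoring ones run, and their outputs are mixed using the gate's weights (set `top_k=1` for the single-expert case described above).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoELayer(nn.Module):
    """Toy Mixture of Experts layer: a gate picks top_k experts per token."""

    def __init__(self, dim: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        # Each "expert" is a small feed-forward block (a stand-in for a 7B expert).
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
             for _ in range(num_experts)]
        )
        # The gating layer scores every expert for every token.
        self.gate = nn.Linear(dim, num_experts, bias=False)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, dim)
        scores = self.gate(x)                              # (tokens, num_experts)
        weights, picked = scores.topk(self.top_k, dim=-1)  # best experts per token
        weights = F.softmax(weights, dim=-1)               # mixing weights for the chosen experts

        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = picked[:, slot] == e                # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out  # experts that were not picked never run for those tokens


# 10 "tokens" with a hidden size of 32, routed through 8 toy experts.
layer = MoELayer(dim=32)
print(layer(torch.randn(10, 32)).shape)  # torch.Size([10, 32])
```

During training, the loss gradients flow through those mixing weights into both the chosen experts and the gate, which is how the gate learns to route better (real MoE training typically also adds a load-balancing term, which this sketch omits).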
Give it a try:
https://huggingface.co/DiscoResearch/mixtral-7b-8expert
https://sdk.vercel.ai/
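If you'd rather run it locally than in the playground, a minimal sketch with Hugging Face transformers might look like this. The loading details are assumptions: the DiscoResearch checkpoint linked above may rely on custom modeling code (hence trust_remote_code=True), device_map="auto" needs accelerate installed, and the full model needs a lot of GPU memory.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DiscoResearch/mixtral-7b-8expert"  # checkpoint linked above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",       # spread the experts across available GPUs (needs accelerate)
    torch_dtype="auto",
    trust_remote_code=True,  # the repo may rely on custom modeling code
)

prompt = "A Mixture of Experts model works by"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```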
Papers:
Learning Factored Representations in a Deep Mixture of Experts
Tweeted:
https://x.com/waseemhnyc/status/1734599465330438277?s=20