- Mistral AI released a Mixture of Experts model.
- It's like having 8 Mistral 7B models combined into one.
- What is a Mixture of Experts?
- A simple LLM takes an input and passes it to a single expert, which produces the output.
- With a Mixture of Experts, the input first goes into a gating layer.
- The gating layer decides which expert should answer the question (see the sketch after this list).
- The other experts do not contribute to that answer.
- Each expert specializes in a different kind of task.
- Mixtral 8x7B has 8 different 7B-parameter experts.
- During training, each expert learns to get good at a specific kind of task, and the gating mechanism learns to recognize what the input is and route it to the right expert.
- It is also possible to build a Mixture of Experts model where the gating mechanism passes the prompt to multiple experts and another layer combines their outputs.
- Rumor has it that GPT-4 is a Mixture of Experts model.
- OpenMoE - an open-source project that releases a family of Mixture of Experts large language models.
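To make the routing concrete, here is a minimal sketch of an MoE layer in Python/PyTorch. The class name, expert architecture, sizes, and `top_k` value are illustrative assumptions, not Mixtral's actual implementation; the point is that the gate scores every expert, only the top-scoring ones run, and their outputs are mixed using the gate's weights (set `top_k=1` for the single-expert case described above).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoELayer(nn.Module):
    """Toy Mixture of Experts layer: a gate picks top_k experts per token."""

    def __init__(self, dim: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        # Each "expert" is a small feed-forward block (a stand-in for a 7B expert).
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
             for _ in range(num_experts)]
        )
        # The gating layer scores every expert for every token.
        self.gate = nn.Linear(dim, num_experts, bias=False)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, dim)
        scores = self.gate(x)                              # (tokens, num_experts)
        weights, picked = scores.topk(self.top_k, dim=-1)  # best experts per token
        weights = F.softmax(weights, dim=-1)               # mixing weights for the chosen experts

        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = picked[:, slot] == e                # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out  # experts that were not picked never run for those tokens


# 10 "tokens" with a hidden size of 32, routed through 8 toy experts.
layer = MoELayer(dim=32)
print(layer(torch.randn(10, 32)).shape)  # torch.Size([10, 32])
```

During training, the loss gradients flow through those mixing weights into both the chosen experts and the gate, which is how the gate learns to route better (real MoE training typically also adds a load-balancing term, which this sketch omits).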
Give it a try:
https://huggingface.co/DiscoResearch/mixtral-7b-8expert
https://sdk.vercel.ai/
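If you'd rather run it locally than in the playground, a minimal sketch with Hugging Face transformers might look like this. The loading details are assumptions: the DiscoResearch checkpoint linked above may rely on custom modeling code (hence trust_remote_code=True), device_map="auto" needs accelerate installed, and the full model needs a lot of GPU memory.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DiscoResearch/mixtral-7b-8expert"  # checkpoint linked above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",       # spread the experts across available GPUs (needs accelerate)
    torch_dtype="auto",
    trust_remote_code=True,  # the repo may rely on custom modeling code
)

prompt = "A Mixture of Experts model works by"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```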
Papers:
Learning Factored Representations in a Deep Mixture of Experts
Tweeted:
https://x.com/waseemhnyc/status/1734599465330438277?s=20