LLMs Explained: Mistral 7B
Mistral AI surprised all of us in 2023 with an amazing release, showing that 34B parameter models can be outperformed by a shockingly small 7B parameter LLM 🤯
Mistral 7B outperforms Llama 2 13B across all evaluated benchmarks. It even beats the first-generation LLaMA 34B in mathematics and code generation while being roughly 5x smaller. Mistral 7B is a small, extremely fast LLM that performs on par with state-of-the-art models.
The key 🔑? It adopts the Llama 2 architecture but introduces Sliding Window Attention and a Rolling Buffer Cache, which speed up computation and reduce cache memory usage while preserving model quality.
Mistral 7B takes a significant step toward balancing high performance with keeping large language models efficient.
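To make these two ideas concrete, here is a minimal NumPy sketch of my own, not Mistral's actual implementation. The window size and dimensions are toy values for readability; Mistral 7B's real sliding window is 4096 tokens.

```python
import numpy as np

W = 4          # sliding window size (toy value; Mistral 7B uses 4096)
seq_len = 8    # illustrative sequence length
d = 2          # illustrative head dimension

# 1) Sliding-window causal mask: token i may attend only to tokens j
#    with i - W < j <= i, instead of all j <= i as in full causal attention.
i = np.arange(seq_len)[:, None]
j = np.arange(seq_len)[None, :]
mask = (j <= i) & (j > i - W)   # True where attention is allowed
print(mask.astype(int))

# 2) Rolling buffer cache: the key/value for position t is written to
#    slot t % W, so cache memory stays fixed at W entries no matter how
#    long the sequence grows.
cache = np.zeros((W, d))
for t in range(seq_len):
    k_t = np.full(d, t, dtype=float)  # stand-in for the key at position t
    cache[t % W] = k_t                # overwrite the oldest entry
print(cache)  # now holds keys for the last W positions only
```

The modulo trick is what keeps the cache size constant: once the sequence grows past W tokens, new keys and values simply overwrite the oldest slots, which is safe because the sliding-window mask never lets a token look further back than W positions anyway.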
Architecture
First, I should mention that Mistral 7B is based on the Llama 2 implementation, so please check out my previous blog post to familiarize yourself with the architecture of Llama 2.