LLMs Explained: Yi-6/34B and Yi-Vision (Yi-VL) by 01.AI

Ching (Chingis)
4 min read · Jan 28, 2024

Big for LLM space 🚀

Last year 01.AI, a Chinese tech unicorn founded by AI luminary Kai-Fu Lee, introduced the bilingual (Chinese/English) Yi-34B and Yi-6B models, outperforming many SOTA LLMs (such as LLaMA2-chat-70B, Claude 2, and ChatGPT) and setting new baselines in language tasks.

01.AI aims to democratize AI innovation, providing open access for academic research, while free commercial use requires permission.

Is it a Chinese version of Mistral AI? Well, there's a catch.

The key? 🔑 The authors state that although the architecture is based on LLaMA, the secret to its impressive performance lies in the training procedure, which is (unfortunately) not publicly available in detail.

However, this does not make Yi any less valuable, and it's definitely something to keep an eye on 👀 especially since 01.AI recently announced the Yi Vision Language Model (Yi-VL), which also shows superior performance on multiple tasks.

So let's dive into the details that are available to us and learn what Yi is.

TLDR: Yi

The Yi series models adopt the same model architecture as LLaMA but are NOT derivatives of LLaMA.

  • They do not use LLaMA’s weights; Yi models only follow LLaMA’s architecture.
  • Yi has independently created its own high-quality training datasets, efficient training pipelines, and robust training infrastructure entirely from the ground up.
  • Both Yi-6B and Yi-34B were trained on 3 trillion tokens, covering only Chinese and English.
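To put those 3 trillion tokens in perspective, here is a back-of-the-envelope estimate (my own, not from the Yi paper) using the common FLOPs ≈ 6·N·D rule of thumb for dense transformers, where N is the parameter count and D the number of training tokens:

```python
# Rough training-compute estimate via the common FLOPs ~= 6 * N * D
# approximation for dense transformers (N = parameters, D = tokens).
# Model sizes and token count are from the article; the formula itself
# is a standard rule of thumb, not something 01.AI reports.

def train_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training FLOPs for a dense transformer."""
    return 6 * n_params * n_tokens

TOKENS = 3e12  # 3 trillion tokens

for name, params in [("Yi-6B", 6e9), ("Yi-34B", 34e9)]:
    flops = train_flops(params, TOKENS)
    ratio = TOKENS / params  # tokens seen per parameter
    print(f"{name}: ~{flops:.2e} FLOPs, ~{ratio:.0f} tokens/param")
    # Yi-6B:  ~1.08e+23 FLOPs, ~500 tokens/param
    # Yi-34B: ~6.12e+23 FLOPs, ~88 tokens/param
```

Note that both models see far more than the ~20 tokens per parameter suggested by Chinchilla-optimal scaling, i.e., they are deliberately over-trained relative to that point, which tends to yield smaller, cheaper-to-serve models at a given capability level.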

Results:

  • For English language capability, Yi ranked 2nd (just behind GPT-4), outperforming other LLMs…
