LLMs Explained: LLaMA and Its Architecture (Part 1)

Ching (Chingis)
4 min read · Nov 5, 2023

Meta’s large language model, LLaMA, took the AI research world by storm in February 2023. It was followed by the commercially usable Llama 2 in July and Code Llama in August. As the first major freely available (‘open-source’) LLM, LLaMA gave open-source AI its moment 😵‍💫.

According to Meta, the open-source AI community has fine-tuned and released over 7,000 LLaMA derivatives on the Hugging Face platform since the model’s release 🚀.

Let’s delve deep into the workings of the groundbreaking LLaMA, a beacon of open-source AI 👀. This blog post will elucidate the technical design of LLaMA and highlight the key differences that set it apart from its counterparts 🔎.

Pre-training Data

The training dataset is a mixture of several sources that together cover a diverse set of domains: CommonCrawl, C4, GitHub, Wikipedia, Books, ArXiv, and StackExchange.

Pre-training data mixture, from the paper: https://arxiv.org/pdf/2302.13971.pdf
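To make the mixture concrete, here is a small Python sketch. The sampling proportions are the ones reported in Table 1 of the LLaMA paper; the helper names (`token_budget`, `sample_sources`) are hypothetical, and multiplying each proportion by ~1.4T tokens is only a back-of-the-envelope estimate, not a figure from the paper.

```python
import random
from collections import Counter

# Sampling proportions per pre-training source, as reported in
# Table 1 of the LLaMA paper (arXiv:2302.13971).
MIXTURE = {
    "CommonCrawl": 0.670,
    "C4": 0.150,
    "Github": 0.045,
    "Wikipedia": 0.045,
    "Books": 0.045,
    "ArXiv": 0.025,
    "StackExchange": 0.020,
}

TOTAL_TOKENS = 1.4e12  # ~1.4T tokens, the overall size reported in the paper


def token_budget(mixture, total_tokens):
    """Rough number of training tokens drawn from each source."""
    return {name: p * total_tokens for name, p in mixture.items()}


def sample_sources(mixture, n, seed=0):
    """Choose the source of each of n training examples in proportion to its weight.

    Hypothetical helper for illustration only; the paper does not publish a data loader.
    """
    rng = random.Random(seed)
    names = list(mixture)
    weights = [mixture[name] for name in names]
    return rng.choices(names, weights=weights, k=n)


if __name__ == "__main__":
    for name, tokens in token_budget(MIXTURE, TOTAL_TOKENS).items():
        print(f"{name:>13}: ~{tokens / 1e9:,.0f}B tokens")
    print(Counter(sample_sources(MIXTURE, 10_000)).most_common(3))
```

Sampling by weight like this means a small source (for example StackExchange at 2%) still shows up regularly in training batches, while CommonCrawl dominates.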

Comments: all of the data is publicly available 👨‍💻. Although the training data includes Wikipedia dumps covering 20 languages, most of the data the model sees comes from CommonCrawl, which the authors filtered to keep English pages only 🔤. Overall, the training dataset contains roughly 1.4T tokens after tokenization with byte-pair encoding (BPE).
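LLaMA’s tokenizer is a BPE model built with SentencePiece, so the toy below is not that tokenizer. It is a minimal sketch of the core BPE idea: repeatedly find the most frequent adjacent pair of symbols and merge it into a new symbol. The corpus and the function names are made up for illustration, and ties are broken by first occurrence.

```python
from collections import Counter


def pair_counts(words):
    """Count adjacent symbol pairs in a {word-as-tuple-of-symbols: frequency} table."""
    counts = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            counts[(a, b)] += freq
    return counts


def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(corpus, num_merges):
    """Learn up to num_merges merges from a whitespace-split toy corpus."""
    words = Counter(tuple(word) for word in corpus.split())
    merges = []
    for _ in range(num_merges):
        counts = pair_counts(words)
        if not counts:
            break
        # max() returns the first maximal pair it encounters, so ties go to the earliest pair.
        best = max(counts, key=counts.get)
        merges.append(best)
        words = merge_pair(words, best)
    return merges, words


if __name__ == "__main__":
    merges, words = learn_bpe("low low low lower lowest newer newer", 3)
    print(merges)
    print(words)
```

On this corpus the first three merges are ("l", "o"), ("lo", "w") and ("e", "r"): frequent character sequences such as "low" become single tokens, which is how BPE keeps common words short while still being able to spell out rare ones.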

Architecture

LLaMA is based on the transformer architecture, but the authors adopt several improvements that were proposed in other models: pre-normalization (GPT-3), the SwiGLU activation function (PaLM), and rotary positional embeddings (GPT-Neo) 🤖.

Note: Like GPT-3, LLaMA uses the Transformer’s decoder-only architecture 💡.
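Decoder-only means each token can attend only to itself and to earlier tokens, which is enforced with a causal mask. The plain-Python sketch below (function names are hypothetical; real models do this on tensors, per attention head) sets masked scores to negative infinity before the softmax, so future positions receive exactly zero weight.

```python
import math


def causal_mask(seq_len):
    """mask[i][j] is True when position i may attend to position j, i.e. j <= i."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def masked_softmax(scores, mask):
    """Row-wise softmax that gives zero weight to masked-out (future) positions."""
    weights = []
    for row, allowed in zip(scores, mask):
        # Masked entries become -inf, so exp() maps them to exactly 0.0.
        logits = [s if ok else -math.inf for s, ok in zip(row, allowed)]
        peak = max(logits)  # finite: the diagonal is always allowed
        exps = [math.exp(z - peak) for z in logits]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


if __name__ == "__main__":
    scores = [[0.0] * 3 for _ in range(3)]
    for row in masked_softmax(scores, causal_mask(3)):
        print(row)
```

With all-zero scores for three tokens, the first row is [1.0, 0.0, 0.0] and the second is [0.5, 0.5, 0.0]: each token spreads its attention uniformly over the positions it is allowed to see, and never over later ones.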

Pre-normalization [GPT3]. To improve training stability, the authors normalize the input of each transformer sub-layer (attention and feed-forward) instead of its output, using the RMSNorm normalizing function.
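Here is a minimal plain-Python sketch contrasting the two residual arrangements. `rms_norm` omits the learnable weight, `sublayer` stands in for attention or the feed-forward network, and eps=1e-6 is an assumed value. One common explanation for the stability gain is visible in the code: with pre-normalization the skip connection carries x through unchanged, whereas post-normalization rescales the whole residual sum at every layer.

```python
import math


def rms_norm(x, eps=1e-6):
    """Scale x by the reciprocal of its root mean square (no learnable weight)."""
    inv_rms = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v * inv_rms for v in x]


def pre_norm_residual(x, sublayer, norm=rms_norm):
    """Pre-normalization (as in LLaMA): normalize the sub-layer's input."""
    return [a + b for a, b in zip(x, sublayer(norm(x)))]


def post_norm_residual(x, sublayer, norm=rms_norm):
    """Post-normalization (original Transformer): normalize the residual output."""
    return norm([a + b for a, b in zip(x, sublayer(x))])


if __name__ == "__main__":
    def zero_sublayer(v):
        # Toy stand-in for attention / feed-forward that outputs zeros.
        return [0.0] * len(v)

    x = [3.0, 4.0]
    print(pre_norm_residual(x, zero_sublayer))   # residual path untouched
    print(post_norm_residual(x, zero_sublayer))  # whole sum rescaled
```

With a sub-layer that outputs zeros, `pre_norm_residual([3.0, 4.0], ...)` returns `[3.0, 4.0]` unchanged, while `post_norm_residual` returns a rescaled vector whose RMS is close to 1.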

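RMSNorm rescales each element of x by the reciprocal of the root mean square (RMS) of x’s elements. It squares every element (so all terms are non-negative and the result measures the vector’s magnitude regardless of sign), takes the mean, adds a small constant eps for numerical stability, and takes the square root; each output element is therefore x_i / sqrt(mean(x²) + eps). Unlike LayerNorm, it does not subtract the mean or add a bias. Finally, it multiplies the normalized output by a learnable per-feature weight (self.weight).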
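A minimal plain-Python sketch of the computation just described, for a single feature vector. Real implementations operate on tensors along the last dimension; the class layout and the default eps=1e-6 here are assumptions for illustration, while `weight` and `eps` play the roles described above.

```python
import math


class RMSNorm:
    """Plain-Python RMSNorm over one feature vector (illustrative sketch)."""

    def __init__(self, dim, eps=1e-6):
        self.eps = eps
        # Learnable per-feature gain, initialized to ones.
        self.weight = [1.0] * dim

    def _norm(self, x):
        # Mean of squares, plus eps, then the reciprocal square root.
        mean_sq = sum(v * v for v in x) / len(x)
        inv_rms = 1.0 / math.sqrt(mean_sq + self.eps)
        return [v * inv_rms for v in x]

    def __call__(self, x):
        if len(x) != len(self.weight):
            raise ValueError("input size must match weight size")
        return [w * v for w, v in zip(self.weight, self._norm(x))]


if __name__ == "__main__":
    norm = RMSNorm(4, eps=0.0)
    print(norm([1.0, -1.0, 1.0, -1.0]))
    print(norm([2.0, 2.0, 2.0, 2.0]))
```

With eps=0.0, `[1.0, -1.0, 1.0, -1.0]` comes back unchanged because its RMS is already 1, and `[2.0, 2.0, 2.0, 2.0]` maps to `[1.0, 1.0, 1.0, 1.0]`, whereas LayerNorm would map a constant vector to zeros because it subtracts the mean first.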
