Meta’s large language model LLaMA took the AI research world by storm in February 2023, followed by the commercially usable Llama 2 in July and Code Llama in August. With the introduction of LLaMA, the first major free ‘open source’ LLM, open-source AI began to have a moment 😵‍💫.
According to Meta, the open-source AI community has fine-tuned and released over 7,000 LLaMA derivatives on the Hugging Face platform since the model’s release 🚀.
Let’s take a closer look at how LLaMA works 👀. This blog post walks through its technical design and highlights the key differences that set it apart from its counterparts 🔎.
The training dataset is a mixture of several sources, covering a diverse set of domains (sampling proportions as reported in the LLaMA paper):

| Dataset | Sampling proportion |
| --- | --- |
| CommonCrawl | 67.0% |
| C4 | 15.0% |
| GitHub | 4.5% |
| Wikipedia | 4.5% |
| Books | 4.5% |
| ArXiv | 2.5% |
| StackExchange | 2.0% |
Comments: all data is open-source 👨‍💻. As the table shows, although LLaMA was trained on Wikipedia data covering 20 languages, most of the data the model was exposed to comes from CommonCrawl, which was filtered to keep English content only 🔤. Overall, the training dataset contains roughly 1.4T tokens after BPE tokenization.
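As a quick aside, LLaMA’s BPE tokenizer is trained with SentencePiece. Here is a minimal sketch of counting tokens with it; the `tokenizer.model` path is a placeholder for the file distributed alongside the model weights:

```python
import sentencepiece as spm

# Load LLaMA's SentencePiece BPE model (the path is a placeholder;
# the tokenizer.model file ships with the weights).
sp = spm.SentencePieceProcessor(model_file="tokenizer.model")

text = "Open-source LLMs are having a moment."
ids = sp.encode(text)

print(len(ids), ids)   # number of BPE tokens and their ids
print(sp.decode(ids))  # decodes back to the original text
```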
LLaMA is based on the transformer architecture; however, the authors incorporate several improvements that were proposed and used in other models such as GPT-3 and PaLM 🤖.
Note: Like GPT-3, LLaMA uses the Transformer’s decoder-only architecture 💡.
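To make “decoder-only” concrete: each position may only attend to itself and earlier positions, which is typically enforced with a causal attention mask. A minimal sketch (illustrative, not LLaMA’s actual code):

```python
import torch

# Causal mask for a decoder-only transformer: position i may attend
# only to positions j <= i. Adding -inf above the diagonal makes the
# attention softmax assign zero weight to future tokens.
seq_len = 4
mask = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
print(mask)
# tensor([[0., -inf, -inf, -inf],
#         [0., 0., -inf, -inf],
#         [0., 0., 0., -inf],
#         [0., 0., 0., 0.]])
```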
Pre-normalization [GPT3]. To improve the training stability, the authors use RMSNorm and normalize the input of each transformer sub-layer, instead of normalizing the output.
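Here is a minimal PyTorch sketch of RMSNorm, closely following the reference implementation in Meta’s llama repository (the `eps` default and the float32 upcast are choices taken from that reference code):

```python
import torch
from torch import nn

class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps                               # small constant for numerical stability
        self.weight = nn.Parameter(torch.ones(dim))  # learnable per-dimension gain

    def _norm(self, x: torch.Tensor) -> torch.Tensor:
        # Divide each element by the root mean square over the last dimension.
        return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize in float32 for stability, cast back, then apply the gain.
        return self._norm(x.float()).type_as(x) * self.weight
```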
As seen above, RMSNorm divides each element of x by the root mean square of the elements of x: the square root of the mean of the squares (squaring ensures the values are non-negative), with a small constant eps added for numerical stability. It then multiplies the normalized output by the learnable self.weight. Unlike LayerNorm, RMSNorm skips mean subtraction and the bias term, making it simpler and slightly cheaper to compute.
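A quick sanity check of the sketch above: since self.weight starts at ones, the output initially equals the normalized input, so each hidden vector should have approximately unit root mean square:

```python
norm = RMSNorm(dim=8)
x = torch.randn(2, 5, 8)   # (batch, seq_len, hidden_dim)
y = norm(x)
print(y.pow(2).mean(-1))   # ≈ 1.0 everywhere: unit RMS per vector
print(y.shape)             # torch.Size([2, 5, 8])
```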