What Low-Rank Adaptation Really Is and Its Applications
This blog post was inspired by a recent video uploaded by the author of Low-Rank Adaptation, Edward Hu. Low-rank adaptation (LoRA) is one of the most popular methods for fine-tuning LLMs and more!
In the video, Edward covers:
- What is LoRA?
- How to choose the rank r?
- Does LoRA work for my model architecture?
- Benefits of using LoRA
- Engineering ideas enabled by LoRA
This post connects his video with the paper and adds an idea proposed in academia to provide a detailed summary of LoRA. I hope you enjoy reading this piece!
Nomenclature
The rank of a matrix is the maximum number of linearly independent rows or columns in the matrix.
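For example, the 3×3 matrix below has three rows, but every row is a multiple of the first, so only one row is linearly independent and the rank is 1. Here is a quick, illustrative NumPy check (not from the paper, just to make the definition concrete):

```python
import numpy as np

# Every row is a multiple of the first row, so only one row is
# linearly independent: the rank is 1, even though the matrix is 3x3.
A = np.array([[1., 2., 3.],
              [2., 4., 6.],
              [3., 6., 9.]])

print(np.linalg.matrix_rank(A))  # prints 1
```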
Low-Rank-Parametrized Update Matrices
Inspiration
A neural network contains many dense layers which perform matrix multiplication. The weight matrices in these layers typically have full rank. When adapting to a specific task, Aghajanyan et al. (2020) show that pre-trained language models have a low "intrinsic dimension" and can still learn efficiently despite a random projection to a smaller subspace. Inspired by this, we hypothesize that the updates to the weights also have a low "intrinsic rank" during adaptation.
- Intrinsic dimension refers to the number of independent variables necessary to represent a dataset. In this context, a low intrinsic dimension means that the models do not require a large number of dimensions to capture the essential features of the data, i.e., the important information can be preserved in a subspace with fewer dimensions (lower rank).
- The authors hypothesize that the updates to the weights also have a low intrinsic rank during adaptation: they believe that the changes made to the model (i.e., to the weight matrices) during task-specific fine-tuning can be captured by a small number of essential features or directions, as sketched below.
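To make this concrete, the paper parametrizes the update to a frozen pre-trained weight W0 as the product of two much smaller matrices, ΔW = BA, where B is d×r and A is r×k with r much smaller than d and k; A is initialized with a random Gaussian and B with zeros, so ΔW starts at zero. Below is a minimal PyTorch sketch of this idea; the class name, layer sizes, and default hyperparameters are illustrative, not taken from any particular library.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A linear layer with a frozen pre-trained weight W0 and a
    trainable low-rank update ΔW = B @ A of rank r."""

    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        # Pre-trained weight: kept frozen during fine-tuning.
        self.W0 = nn.Parameter(
            torch.randn(out_features, in_features), requires_grad=False
        )
        # Low-rank factors: A starts as a small random Gaussian,
        # B starts at zero, so the update B @ A is zero at initialization.
        self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, r))
        self.scale = alpha / r  # scaling factor applied to the update

    def forward(self, x):
        # h = W0 x + (alpha / r) * B A x
        return x @ self.W0.T + self.scale * (x @ self.A.T @ self.B.T)
```

Because W0 stays frozen and only A and B are trained, the number of trainable parameters per layer drops from d·k to r·(d + k), which is tiny when r is small.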