Building ML Solutions on Top of Pre-trained Embeddings with Noisy/Imbalanced Data

5 min read · Feb 29, 2024

With the boom of LLM and multi-modal embeddings, building solutions on top of them has become popular for many applications. Reusing a single embedding model across multiple projects reduces development and maintenance costs. However, this can be challenging depending on the domain of your problem. Fine-tuning the model is not always an option because of time or other constraints, and off-the-shelf embeddings might not be discriminative enough for your data, which can lead to a lot of noise…

Written by Ching (Chingis)