Modern Weak Supervision for ML Ranking Applications: Combining Heuristics and LLMs
In recent years, the real-world impact of machine learning (ML) has grown by leaps and bounds, and there is no doubt about how important labels are for these constant improvements and innovations. The ImageNet dataset alone has been a major driving force behind the growth of computer vision. Accurate labels let us improve model performance and evaluate our experiments safely and reliably.
Hand-labeled training sets, however, are expensive and time-consuming to create, often requiring person-months or even years to assemble, clean, and debug, especially when domain expertise is required.
On top of this, tasks change and evolve in the real world, and labels that took months to collect may no longer be reliable. In ranking, implicit feedback (clicks, purchases, likes, and other engagement signals) has been another go-to source of ground-truth labels. Relying on it alone, however, might not be sufficient: these signals, while valuable, are often noisy, biased, or incomplete. They may also be inapplicable, for example when we want to produce other features for a ranking model that itself relies on implicit feedback.
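To make the label-generation step concrete, here is a minimal sketch (not from any of the works summarized here) of how raw engagement logs might be aggregated into graded relevance labels for a learning-to-rank dataset; the event schema and the action-to-grade mapping are illustrative assumptions.

```python
# A minimal sketch of turning raw implicit feedback into graded relevance
# labels. Field names and the grading scheme are illustrative assumptions.
from collections import defaultdict

# Each event: (query, document, action), where action comes from engagement
# logs such as "impression", "click", or "purchase".
events = [
    ("running shoes", "doc_1", "impression"),
    ("running shoes", "doc_1", "click"),
    ("running shoes", "doc_2", "impression"),
    ("running shoes", "doc_3", "purchase"),
]

# Map stronger engagement to higher relevance grades (a common heuristic).
ACTION_GRADE = {"impression": 0, "click": 1, "purchase": 2}

labels = defaultdict(int)
for query, doc, action in events:
    # Keep the strongest signal observed for each (query, document) pair.
    labels[(query, doc)] = max(labels[(query, doc)], ACTION_GRADE[action])

for (query, doc), grade in sorted(labels.items()):
    print(query, doc, grade)
```

Even this simple aggregation inherits all the problems mentioned above: position bias in clicks, sparse purchases, and no labels at all for documents that were never shown.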
For all these reasons, practitioners have increasingly been turning to weaker forms of supervision.
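As a preview of the heuristics-plus-LLM idea in the title, here is a minimal Snorkel-style sketch in which several labeling functions vote on a (query, document) pair and the votes are combined by simple majority. The labeling functions, including the placeholder LLM judge, are illustrative assumptions, and a production setup would typically learn a label model rather than majority-voting.

```python
# A minimal sketch of weak supervision with heuristic labeling functions (LFs).
# Each LF votes RELEVANT (1), NOT_RELEVANT (0), or ABSTAIN (-1).
ABSTAIN, NOT_RELEVANT, RELEVANT = -1, 0, 1

def lf_title_match(query: str, doc_title: str) -> int:
    # Heuristic: the query appearing in the title suggests relevance.
    return RELEVANT if query.lower() in doc_title.lower() else ABSTAIN

def lf_too_short(query: str, doc_title: str) -> int:
    # Heuristic: very short documents are rarely good answers.
    return NOT_RELEVANT if len(doc_title.split()) < 2 else ABSTAIN

def lf_llm_judge(query: str, doc_title: str) -> int:
    # Hypothetical LLM-based judgment; a real version would call a model API
    # and parse its answer into one of the three vote values.
    return ABSTAIN

LFS = [lf_title_match, lf_too_short, lf_llm_judge]

def weak_label(query: str, doc_title: str) -> int:
    # Combine LF votes by simple majority; ties and all-abstain fall back
    # to ABSTAIN (the pair is left unlabeled).
    votes = [lf(query, doc_title) for lf in LFS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    pos, neg = votes.count(RELEVANT), votes.count(NOT_RELEVANT)
    return RELEVANT if pos > neg else NOT_RELEVANT if neg > pos else ABSTAIN

print(weak_label("running shoes", "Best running shoes for 2024"))  # prints 1
```

The appeal of this setup is that each heuristic can be cheap, noisy, and partial; the value comes from combining many of them, which is exactly where the works summarized below pick up.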
This piece collects and summarizes several works and blog posts on the topic, and offers ideas for future directions in this line of research.