LLM Bias and Calibration: What Causes the High Variance?
4 min read · Jan 5, 2024
The motivation behind “prompt engineering” is that not all prompts lead to the same accuracy. Thus, one should tune the prompt’s format and examples to achieve the best possible performance.
Many different prompting strategies have been proposed, and this remains a major and popular area of research to this day. However, have you ever wondered what causes such variance in performance? In this blog, we will look at some findings on how prompts affect LLM performance and the different biases that cause this variance.
How Do Prompts Affect LLM Performance?
- An LLM’s accuracy depends heavily on both the selection and the ordering of the in-context training examples. The authors demonstrate that LLM performance varies significantly, from random guess…
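To make the "selection and ordering" point concrete, here is a minimal sketch (not from the papers discussed here) of how the same few-shot examples, permuted into different orders, yield different prompts. The task, labels, and `build_prompt` helper are all hypothetical; the point is simply that each ordering is a distinct prompt whose accuracy can differ when fed to a model.

```python
from itertools import permutations

# Hypothetical sentiment-classification demonstrations.
examples = [
    ("The movie was fantastic.", "Positive"),
    ("I hated every minute of it.", "Negative"),
    ("An average, forgettable film.", "Neutral"),
]

def build_prompt(demos, query):
    """Concatenate the demonstrations and the test input into one prompt string."""
    parts = [f"Review: {text}\nSentiment: {label}" for text, label in demos]
    parts.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(parts)

query = "The plot was thin, but the acting was great."

# Every permutation of the same three demonstrations produces a different prompt;
# the findings above say accuracy can swing widely across these orderings.
for order in permutations(examples):
    prompt = build_prompt(order, query)
    # score = llm(prompt)  # evaluate each variant with your model of choice
    print(prompt.splitlines()[0], "...")
```

Running the loop over all six orderings (and over different subsets of candidate examples) is exactly the kind of sweep used to measure this variance.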