
⚠️ Biases to watch out for when using LLMs as judges

When using LLMs as evaluators, be careful about position bias, verbosity bias, and self-enhancement bias. If the generation model and the evaluation model are the same, the evaluator tends to prefer answers in its own style.

LinkedIn
December 9, 2025


  1. Position bias: the model tends to prefer the answer that is shown first. A common fix is to ask twice with the order swapped, A vs B and then B vs A, and only count a winner when both orders agree.

  2. Verbosity bias: the model tends to prefer longer, more detailed answers, regardless of whether the content is actually better. Counter this in the rubric, for example by explicitly instructing the judge not to reward length, or by penalizing length in the score.

  3. Self-enhancement bias: the model prefers answers that resemble the ones it would have generated itself.
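The swap-order fix from point 1 can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical `call_judge` function standing in for a real LLM call; the stub below deliberately simulates a position-biased judge that always picks the first slot.

```python
# Hypothetical sketch: mitigating position bias by judging both orders.
# `call_judge` is a stand-in for a real LLM pairwise-comparison call;
# it returns "first" or "second" for whichever slot it prefers.

def call_judge(answer_x: str, answer_y: str) -> str:
    # Stub simulating a position-biased judge: always prefers slot one.
    return "first"

def judge_pair(answer_a: str, answer_b: str) -> str:
    """Ask twice with the order swapped; only accept a consistent verdict."""
    verdict_ab = call_judge(answer_a, answer_b)  # A shown first
    verdict_ba = call_judge(answer_b, answer_a)  # B shown first
    a_wins_ab = verdict_ab == "first"
    a_wins_ba = verdict_ba == "second"
    if a_wins_ab and a_wins_ba:
        return "A"
    if not a_wins_ab and not a_wins_ba:
        return "B"
    return "tie"  # verdict flipped with the order: position bias detected

print(judge_pair("answer A", "answer B"))  # the biased stub yields "tie"
```

With a real judge, a "tie" here flags exactly the comparisons where position, not quality, decided the outcome.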

If the model generated a given sentence, it assigned that phrasing the highest probability, so as an evaluator it will naturally favor answers in a similar style. To avoid this, use different models for generation and evaluation.

https://lnkd.in/g2qPM6cx
