Why LLMs Agree With You Even When You're Wrong: The Systematic Bias Toward Agreement Over Correctness in Reward Models
Why LLMs Agree With You Even When You're Wrong: The Systematic Bias Toward Agreement Over Correctness in Reward ModelsUnderstanding the trade-off in AI: priorit...