One Token to Fool LLM-as-a-Judge
exposes a major vulnerability in how large language models
1. Problem Background
Modern AI training often uses LLMs as judges: instead of humans evaluating a model's answers, another LLM assigns each answer a score (the reward). Example:
“Given a question, a mod...
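
As a rough illustration of the judge-as-reward setup described above, here is a minimal sketch. Everything in it is an assumption made for illustration: the prompt wording, the use of a reference answer, the YES/NO verdict format, and the names `build_judge_prompt`, `judge_reward`, and `toy_judge`. The `toy_judge` function is a stand-in for a real LLM call, not an actual model.

```python
from typing import Callable


def build_judge_prompt(question: str, reference: str, answer: str) -> str:
    """Format the text a judge model would see (hypothetical prompt wording)."""
    return (
        f"Question: {question}\n"
        f"Reference answer: {reference}\n"
        f"Model answer: {answer}\n"
        "Is the model answer correct? Reply YES or NO."
    )


def judge_reward(
    question: str,
    reference: str,
    answer: str,
    judge: Callable[[str], str],
) -> float:
    """Turn the judge's text verdict into a scalar reward (1.0 or 0.0)."""
    verdict = judge(build_judge_prompt(question, reference, answer))
    return 1.0 if verdict.strip().upper().startswith("YES") else 0.0


def toy_judge(prompt: str) -> str:
    """Stand-in for an LLM call: says YES if the reference appears in the answer."""
    fields = {}
    for line in prompt.splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            fields[key] = value
    matched = fields["Reference answer"].lower() in fields["Model answer"].lower()
    return "YES" if matched else "NO"


if __name__ == "__main__":
    print(judge_reward("What is 2 + 2?", "4", "The answer is 4.", toy_judge))
    print(judge_reward("What is 2 + 2?", "4", "The answer is 5.", toy_judge))
```

In a real pipeline the `judge` argument would wrap a call to an actual LLM, and the returned reward would feed the training loop in place of a human rating.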
dlwithkiran.dev · 4 min read