How I A/B test LLM prompts without fooling myself
I have shipped a prompt change on a number that meant nothing, and paid for it the next morning.
The pattern is always the same. Someone makes a small tweak to the support agent's groundedness prompt.
kartiknvjk.hashnode.dev · 6 min read