Prompt Engineering: Why Structure Is the Only Thing Between You and a Hallucination
If RLHF trains a model to prefer answers humans like, how does it still say something completely false — and say it with total confidence?
RLHF never trained the model to be right. It trained the mode
changeofbasis.hashnode.dev · 4 min read