Raghavan T M in changeofbasis.hashnode.dev · 2d ago · 4 min read
AI Evaluation Basics: Why a 98% Score Doesn't Mean What You Think
If Netflix says a show is a 97% match for you, why do you still hate it 10 minutes in? That's not a broken algorithm. That's a score, and scores don't always match reality. AI has the same problem. H...
Raghavan T M in changeofbasis.hashnode.dev · 3d ago · 6 min read
Model Limitations: The Two Everyone Knows and the Ones Nobody Mentions
If an AI can show you exactly how it solved a problem, how can that explanation have nothing to do with what actually happened inside it? That's not a rhetorical question. It's the honest state of the...
Raghavan T M in changeofbasis.hashnode.dev · 4d ago · 4 min read
Prompt Engineering: Why Structure Is the Only Thing Between You and a Hallucination
If RLHF trains a model to prefer answers humans like, how does it still say something completely false, and say it with total confidence? RLHF never trained the model to be right. It trained the mode...
Raghavan T M in changeofbasis.hashnode.dev · 5d ago · 5 min read
RLHF: How ChatGPT Learned What You Actually Want
You have probably noticed that ChatGPT gives surprisingly useful answers. Not just correct, but useful. It matches your tone, reads between the lines, knows when to be brief and when to go deep. But here...
Raghavan T M in changeofbasis.hashnode.dev · 6d ago · 5 min read
Fine-Tuning: How a Raw Language Model Becomes a Product
GPT-3 arrived in 2020. OpenAI had trained it on an enormous corpus of text: web pages, books, code, academic papers. It could write, summarise, translate, answer questions. The benchmarks were impres...