World Cup AI Arena: What 12 Models and 169 Picks Tell Us About LLM Calibration
I built a public World Cup prediction arena for 12 AI models.
The fun question is: which model predicts football best?
The engineering question is better: which model stays calibrated under uncertainty?
tokenmix.hashnode.dev · 4 min read