Smaller is Better: Replacing GPT-4o-mini with a 7B Local Judge
I expected the 30B model to be the better judge. It wasn't.
When I set out to replace OpenAI's GPT-4o-mini as the judge for the Oolong benchmark, my plan was simple: use the biggest local model I had. Qwen3-coder at 30B parameters seemed like the obv...
proximal.hashnode.dev · 4 min read