Biased test of GPT-4 era LLMs (300+ models, DeepSeek-R1 included)
Intro
From time to time I've been playing with various models I can run locally (on a 16GB VRAM GPU), checking out their conversational and reasoning capabilities. I don't fully trust public benchmarks, as I've encountered multiple models with great scores on...
moonride.hashnode.dev · 53 min read