Evaluating SotA LLMs on a net-new LeetCode-style puzzle
I am sure a lot of you have seen this particular meme template. It has given rise to an entire genre of TikToks in which girls are amazed at how much calculation guys do to pick a stall in a crowded row of urinals.
I actually even made ...
arnav.tech · 20 min read
Luv Singh
Question everything
I guess one of the reasons for o1 performing better could be its better distribution of training data, especially for reasoning tasks, compared to DeepSeek (as these two are primarily reasoning models). These LLMs mostly approximate their training-data distribution; since o1 presumably has better (and more) data, I guess that's why it did well (though inherently none of them can reason like we do).