Thanks for reading and for your reply. However, you are overlooking key findings in the research that clearly indicate a lack of actual ‘reasoning’ or ‘thinking’. It’s important to recognize that your framing reflects anthropomorphism, a cognitive bias that attributes human traits to AI.
While reinforcement learning introduces a different training signal than the next-token prediction used in pre-training, the underlying mechanism in post-training is still backpropagation. Adjusting weights through gradient descent does not amount to cognitive reasoning, and the transformer’s behavior at inference time is unchanged. Post-training is more accurately described as ‘stochastic funneling’ or ‘manifold sculpting’: reshaping the already-learned distribution to favor outputs resembling the examples seen during fine-tuning, as determined by the data and tasks involved.
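To make that point concrete, here is a minimal, purely illustrative sketch (the three-way “policy” and the reward values are invented for this toy, not taken from any real RLHF pipeline): a REINFORCE-style RL update is still ordinary gradient ascent on weights, the same mechanism as pre-training, just driven by a reward signal instead of a next-token loss.

```python
import math
import random

random.seed(0)

logits = [0.0, 0.0, 0.0]    # toy "weights" over three candidate outputs
rewards = [1.0, -1.0, 0.0]  # hypothetical preference scores (assumed values)

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

lr = 0.5
for _ in range(200):
    probs = softmax(logits)
    a = random.choices(range(3), weights=probs)[0]  # sample an output
    r = rewards[a]
    # REINFORCE gradient of r * log pi(a): (1[i == a] - probs[i]) * r
    for i in range(3):
        grad = ((1.0 if i == a else 0.0) - probs[i]) * r
        logits[i] += lr * grad  # an ordinary weight update, nothing more

final = softmax(logits)
print([round(p, 3) for p in final])
```

The probability mass ends up concentrated on the rewarded output: the distribution has been sculpted toward preferred examples, but the update rule itself is the same gradient arithmetic used everywhere else in training.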
Reinforcement Learning (RL), especially RLHF (Reinforcement Learning from Human Feedback), was once seen as a promising path to improving large language model reasoning. It helped align models with human preferences and refine outputs, but its limitations became clear: RL tends to optimize for appearing right rather than actually reasoning correctly. As a result, focus has shifted toward alternatives such as supervised fine-tuning, tool use, and architectural innovations that promote genuine reasoning over reward hacking.