Comment by Areeba Nishat on "The Rise and Fall of Reinforcement Learning for LLM “Reasoning”"

Thanks for reading and for your reply. However, you are overlooking key findings in the research that clearly indicate a lack of actual ‘reasoning’ or ‘thinking’. It’s important to recognize that your framing reflects anthropomorphism, a cognitive bias that attributes human traits to AI.

While reinforcement learning introduces a different signal compared to next-token prediction during pre-training, the underlying backpropagation mechanism used in post-training remains unchanged. Adjusting weights through this process does not amount to cognitive reasoning. The transformer’s behavior during inference also remains the same. Post-training is more accurately described as ‘stochastic funneling’ or ‘manifold sculpting’, modeling existing distributions to favor the examples seen during fine-tuning, based on the data and tasks involved.

Search Hashnode