L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning
Controlling Reasoning Length: framing and goals
Motivation and high-level objective
At first glance, the problem is simple to state but thorny in practice: modern reasoning models improve when they think longer, yet researchers have lacked mechanisms to direct...
paperium.hashnode.dev · 5 min read