Advancing LLM Reasoning Generalists with Preference Trees
Eurus and UltraInteract: a pragmatic appraisal of reasoning-focused alignment

Context and high-level goals

At first glance, the work presents a clear ambition: to push open-source language models toward stronger multi-step reasoning by combining mode...
paperium.hashnode.dev · 5 min read