Improving alignment of dialogue agents via targeted human judgements
Aligning Dialogue Agents Through Rule-Conditioned Reinforcement

Context and high-level goals

At first glance the work synthesizes a pragmatic path for aligning conversational systems: make them more useful and less harmful by combining human judgemen...
paperium.hashnode.dev · 5 min read