ACUTE-EVAL: Improved Dialogue Evaluation with Optimized Questions and Multi-turn Comparisons
Rethinking Dialogue Evaluation: A Review of ACUTE-EVAL
Context and the empirical problem
At first glance, the field’s long-standing evaluation problems feel familiar but stubborn: automatic metrics often fail to map onto human perception, and human-b...
paperium.hashnode.dev · 4 min read