Stop Measuring Code Gen with BLEU: Interviewers Want This Instead
Automated code generation is getting better — but our evaluation methods haven't always kept up. BLEU (and similar surface-similarity metrics) reward output that looks like reference ...
blog.bugfree.ai · 4 min read
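To make the surface-similarity point concrete, here is a minimal sketch. It uses a clipped bigram precision as a simplified stand-in for BLEU (real BLEU combines several n-gram orders and a brevity penalty) and contrasts it with an execution-based check. The execution check is an assumption of this sketch, not something the excerpt spells out, and the names `ngram_precision`, `passes_tests`, and the three snippets are hypothetical.

```python
from collections import Counter


def ngram_precision(candidate: str, reference: str, n: int = 2) -> float:
    """Clipped n-gram precision over whitespace tokens.

    A simplified stand-in for BLEU: one n-gram order, no brevity penalty.
    """
    cand = candidate.split()
    ref = reference.split()
    cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    total = sum(cand_ngrams.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram counts at most as often as it appears in the reference.
    overlap = sum(min(count, ref_ngrams[gram]) for gram, count in cand_ngrams.items())
    return overlap / total


def passes_tests(source: str, func_name: str, cases) -> bool:
    """Run the generated source and check it against (args, expected) pairs.

    Assumption: the source is trusted. Real evaluation harnesses execute
    generated code in a sandbox with time and resource limits.
    """
    namespace = {}
    try:
        exec(source, namespace)
        return all(namespace[func_name](*args) == expected for args, expected in cases)
    except Exception:
        return False


# Hypothetical reference solution and two hypothetical model outputs.
REFERENCE = "def add(a, b):\n    return a + b\n"
LOOKALIKE = "def add(a, b):\n    return a - b\n"  # one token off, but wrong
REWRITE = "def add(x, y):\n    total = sum((x, y))\n    return total\n"  # different text, correct

CASES = [((1, 2), 3), ((0, 0), 0), ((-1, 5), 4)]

if __name__ == "__main__":
    for name, src in [("lookalike", LOOKALIKE), ("rewrite", REWRITE)]:
        score = ngram_precision(src, REFERENCE)
        ok = passes_tests(src, "add", CASES)
        print(f"{name}: bigram precision={score:.2f}, passes tests={ok}")
```

In this toy setup the buggy lookalike shares most of its bigrams with the reference yet fails the test cases, while the correct rewrite shares none and passes. That is the gap between looking like the reference and behaving like it.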