You push a change to a prompt and your eval score drops from 0.91 to 0.84. Something got worse, but the score doesn’t tell you what. So you start re-running the pipeline, tweaking your inputs, pouring
engineering.fractional.ai · 8 min read