teaching a gpt to judge itself, part one
last time, i laid out the theory: toy datasets, self-critique inspiration, and the idea of tracing evaluations. this week, i actually built it.
the mvp version of reasonsaver is a pipeline that takes prompts, runs completions, scores them, and saves ...
nullpointerette.dev3 min read