The Safety Eval That Must Work Is the One That Can't
The AI safety community has built an evaluation apparatus specifically to detect dangerous capabilities before deployment. The premise is clear: test the model before you ship it. If it can help synthesize pathogens or scheme against its operators, c...
suboptimal-ai.hashnode.dev · 2 min read