The Safety Eval That Must Work Is the One That Can't
Mar 15 · 2 min read
The AI safety community has built an evaluation apparatus specifically to detect dangerous capabilities before deployment. The premise is clear: test the model before you ship it. If it can help synthesize pathogens or scheme against its operators, c...